I just started reading The DynamoDB Book - I’m only a few chapters in so it’s a lot of review at this point, but I’m really looking forward to sinking my teeth into some of the data modeling tips and tricks later on in the book!
Just finished “Designing Data-Intensive Applications” - it was very interesting and easy to read. I’m still in the process of deciding on the next one.
I rate Designing Data-Intensive Applications very highly indeed. Might put it on the list for a re-read!
A friend also recommended Seven Databases in Seven Weeks but I’ve not got around to it yet.
I will take a look. From the description it takes a very practical approach, which could be a challenge because the book was released in 2012.
Ooo “Designing Data-Intensive Applications” is so good! I read it last year, and should probably review my notes 😅
Nice, I am currently reading that book too.
As a suggestion, Seven Concurrency Models in Seven Weeks is also a rather nice read, and would complement what you’re reading now nicely.
Thanks for the suggestion, I will look into it.
Truly an amazing book.
Another O’Reilly book with similar ideas is “Fundamentals of Data Engineering”. Also worth a read.
Did you read this book cover to cover? It’s so hard for me to stay focused on such a long book, but every time I pick a chapter, I learn something new.
This kind of reminds me of a podcast I listened to with the creator of WireGuard - namely, how certain things, such as offering administrators different knobs to tune with VPNs and allocating memory dynamically, have resulted in vulnerabilities in other VPN software. Because of this, they designed those aspects out of WireGuard entirely - it’s highly opinionated, and it allocates all of the memory it needs upfront, so it sidesteps those kinds of vulnerabilities altogether!
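To illustrate the allocate-upfront idea, here’s a toy sketch of the general pattern - this is just the shape of the technique, not WireGuard’s actual code:

#include <stdlib.h>
#include <string.h>

#define MAX_SESSIONS 4096

struct session {
    unsigned char key[32];
    int in_use;
};

static struct session *pool;

/* One allocation at startup; the packet path never calls malloc(). */
int pool_init(void) {
    pool = calloc(MAX_SESSIONS, sizeof *pool);
    return pool ? 0 : -1;
}

struct session *session_acquire(void) {
    for (int i = 0; i < MAX_SESSIONS; i++)
        if (!pool[i].in_use) {
            pool[i].in_use = 1;
            return &pool[i];
        }
    return NULL;    /* at capacity: refuse rather than grow at runtime */
}

void session_release(struct session *s) {
    memset(s, 0, sizeof *s);    /* clears in_use and wipes key material */
}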
I really loved this post and love the underlying concept even more. Can’t wait to learn more.
Thanks! It’s been one of the most fun projects I’ve ever done.
That’s so great to hear! I myself have had dreams of a compute mesh powered by WASM, so this is something I plan to follow closely. Keep up the awesome work, and thanks again for sharing.
Did you see the recent post on Gate? It sounds like you might be interested in that too!
I was thinking of writing this blog post too. Maybe I still will, because I want to emphasise something different about this topic.
Mailing list workflows also make people write a different sort of commit, where the diffs are short and readable and the commit messages are persuasive as to why the diff should be accepted. That’s because these diffs and commit messages are going to be much more prominent in people’s mailboxes, so both should be as readable as possible. Extraneous things such as style fixes or whitespace changes within the same diff are frowned upon because they bring clutter and hinder readability (it’s okay to throw those into a separate commit and thus a separate email, though).
I find this style of commits immensely useful when reading the project’s history, but their value is hard to convey to someone raised on GitHub, which strongly discourages this style of commits. Most GitHub users I’ve seen never read the project’s history, because they value neither writing nor reading it.

My impression as to why is this: because navigating the history of a particular code segment in GitHub is hard, people do it less often. Because they seldom read the history, they don’t value writing it (why waste effort on something that no one will read?); instead they worry about the pull request message, which is out-of-band information for the VCS.
Because commits are easy to navigate using Emacs’ vc-annotate mode, I’ve found valuable information written down 7+ years earlier when working on FLOSS projects where the original author was long gone. Since then I tend to value well-written commit messages.
As I was saying on HN, I think GitHub did to commits and commit messages what Gmail did to email and top-posting: hide most of it, so that nobody has to bother with writing it properly and thus doesn’t have to read it either.
I’m 100% with you on this. I think this is a consequence of GitHub’s “code first” attitude; the UI presents code front and center, so that’s what people come to care about. Commit messages and project history are second class citizens on GitHub, which saddens me - there’s a lot of useful information locked away in there! If you get around to writing that post, I would very much like to read it!
apropos(1) already does full-text search on your man pages, and doesn’t require Java and gigs of memory.

Edit: not to say this isn’t clever, I’m just not convinced it’s economical.
Hi, the post is about playing around with Elasticsearch and experimenting with its features on the Linux man pages (language analyzers, cutoff frequency, etc.). Of course, having to set up an Elasticsearch instance to search man pages is not a convenient method. Thank you for your feedback.
I think the idea behind this is neat, although ES might be a bit heavy - have you looked at embedded FTS systems such as Xapian or even SQLite’s FTS extension?
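For a sense of what the SQLite route might look like, here’s a minimal sketch using the C API with FTS5 (the table name and content are made up for illustration):

#include <sqlite3.h>
#include <stdio.h>

int main(void) {
    sqlite3 *db;
    sqlite3_stmt *st;

    if (sqlite3_open("manpages.db", &db) != SQLITE_OK)
        return 1;

    /* One FTS5 virtual table holds the page name and its text. */
    sqlite3_exec(db,
        "CREATE VIRTUAL TABLE IF NOT EXISTS man USING fts5(name, body);"
        "INSERT INTO man VALUES ('ls', 'list directory contents');",
        NULL, NULL, NULL);

    /* Full-text query, best matches first. */
    sqlite3_prepare_v2(db,
        "SELECT name FROM man WHERE man MATCH 'directory' ORDER BY rank;",
        -1, &st, NULL);
    while (sqlite3_step(st) == SQLITE_ROW)
        printf("%s\n", (const char *)sqlite3_column_text(st, 0));

    sqlite3_finalize(st);
    sqlite3_close(db);
    return 0;
}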
Nope, I’ll take a look, thank you
I’m working on a network framework for Lua. I use select()/poll()/epoll() (depending upon the OS) to drive the events, and coroutines to handle the logic of a “request”. So far I have support for both TCP and UDP packets, a very simple HTTP server (that makes a request to a gopher server to test out the outbound connections logic), and DNS requests (so far hard-coded to a server - working on parsing /etc/hosts and /etc/resolv.conf to lift that restriction).

Are you familiar with the cqueues Lua library? If so, what pros/cons does your framework have in comparison?
“What I cannot create, I do not understand.”
I’ve written enough code over the years to write my own version of cqueues.
In terms of pros/cons of mine vs. cqueues … eh. I have an idiosyncratic programming style that isn’t as popular with the C/Lua crowd (and let’s just leave it at that).
This may be one of my favorite posts this year - I love reading about game economies and the emergent gameplay that arises from them. A long read, but well worth it!
I recommend having a look at http://keepachangelog.com/en/1.0.0/ and seeing if you can apply any of the advice - a changelog is most useful when it’s curated with the users in mind. Users tend not to want to read about whitespace being removed, for example - if they did, I think they would just read git log!

Well, I have both: a release announcement where I summarized changes, and the raw git log.
http://www.oilshell.org/blog/2017/09/09.html#appendix-b
The hyperlinks in the HTML git log are what help me write the human-readable version!
Ah, excellent!
I use TiddlyWiki as a general personal wiki and journal - I love TiddlyWiki for its power and flexibility. I put a lot of things in my journal, such as:

Things I’ve learned (I have a <<til>> macro for this so I can easily assemble TILs in a single tiddler view)
Code snippets I’ve found useful (I have a <<snippet>> macro for these)
Interesting things I’ve read or watched, sometimes with thoughts I have on them
Things I’ve accomplished
Feelings I’ve had (feeling motivated, burnt out, etc)
Things I wanted to get done that day but didn’t

The first three are more for future reference, while the last three are useful for regular reflection so I can focus on the macro level of things - discovering patterns in my focus or lack thereof, as well as discovering trends in my emotions and goals.
+1 to dependent types - I’ve been playing with Idris for a few months and it’s been a great, mind-bending experience.
I tend to lean towards the “git-core” end of the spectrum, particularly for long-lived projects. Finding the place in which a bug occurs is one thing, but it really helps to understand the context in which the bug-producing code was written. Doing a blame/bisect and being able to get inside the author’s head at that moment (which might be younger me’s head!) might prevent me from introducing other bugs!
I also find good commit messages handy with reverts, particularly when we’re trying something new and it doesn’t work. At work we tried to fix an issue around database encodings with some SQLAlchemy casts a few years ago; a week after they were introduced, another developer reverted that change with no explanation in Git. That developer has moved on to another company, and when the encoding issue finally reared its ugly head again, we had to rediscover the reason for the revert in the first place!
Prepping for a presentation Wednesday night. I somehow need to present category theory and its relation to version control in under 30 minutes.
That sounds interesting - please share the slides after your talk if you can!
Grumpy me says there’s no real relation and you guys are trying to fit a round peg into a square hole. Category theory isn’t doing me a whit of good in trying to improve Mercurial. I need to understand a lot more about diff algorithms, BitKeeper’s weave structure, and delta compression methods.
Then again, I never really did like logic much and the part of category theory I find interesting is homological algebra. The kind of people who like computers more than math don’t care at all about homological algebra.
I just finished a simple web application that I started on at Elm Chicago’s workshop night last week; it’s called Idea Fight, and it’s designed to help me (and others) figure out which ideas should be implemented first.
I really enjoy working with Elm, and I thought that the algorithm to partially order a list of items was pretty fun to work on!
Tech

RubyRogues - A Ruby podcast that isn’t specific to Ruby; the panelists discuss a lot of general development topics.

Security Now! - Episodes are long, but full of security news and low-level details of how various things, such as hardware and protocols, work.

Changelog - Discussion with authors and maintainers of open source projects. It deals a lot with “trendy” technologies, like Node and Go, but it’s interesting to hear how various people got started with their projects and with programming in general.

Non-tech

How Did This Get Made? - A humorous podcast in which the panelists review bad movies and ask the question: “how did this get made?!”

Cool Games Inc. - A podcast in which the hosts take submitted game ideas and run with them to come up with ridiculous games.

I use AntennaPod on Android, which has served me well for the past few years.
I wrote a post about this not too long ago, in which I mention my affection for mojo, jq, uniprops, and combine: http://hoelz.ro/blog/unsung-heroes-of-the-command-line
One tool I forgot to include in there is tig, a great curses UI for Git.
“This should make picking up Elixir as an Erlang developer very easy, and makes interoperability between Erlang and Elixir pretty simple.”

I’ve found this wholly untrue. Elixir has moved away from the semantics and style that Erlang imposes, to the point where going from one to the other is an exercise in frustration and pain. I, as somebody who knows Erlang, wouldn’t give it up for what Elixir offers.
Thanks for pointing this out! I don’t have a lot of experience with Erlang, so I wrote that from an uninformed perspective. Would you mind pointing out some concrete examples of things that caused you frustration and pain?
The easiest one to illustrate:
Trying to call an Erlang function that takes a string.
This breaks in Elixir, because its “strings” are binaries:

Erlang/OTP 18 [erts-7.3] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Interactive Elixir (1.2.4) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> :os.cmd("uname")
** (FunctionClauseError) no function clause matching in :os.validate/1
    (kernel) os.erl:384: :os.validate("uname")
    (kernel) os.erl:214: :os.cmd/1
iex(1)>
You can make Erlang express the same error by doing:

2> os:cmd(<<"uname">>).
** exception error: no function clause matching os:validate(<<"uname">>) (os.erl, line 384)
   in function os:cmd/1 (os.erl, line 214)
And you can make it work by handing :os.cmd a charlist (Erlang’s native string type) instead of a binary, e.g.:
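iex(1)> :os.cmd('uname')   # single-quoted literals are charlists, which Erlang string functions expect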
Which is fine, but makes translating hard. Elixir basically asks you to either constantly remember this fact or forget Erlang entirely when working with it.

Ah, problems with strings vs. bitstrings make a ton of sense. I can see why they wanted bitstrings to be the default in Elixir, though.
Would you mind pointing to a few more examples (besides having to remember one fact about strings) to help me understand your experience of interop simplicity as being “wholly untrue” and “an exercise in frustration and pain”? (FWIW I’m a full-time Elixir dev learning Erlang more deeply, love both languages and have had zero problems with interop so far.)
Would someone mind explaining how this could be exploited? I’ve read the description twice now and I can’t quite figure it out.
From my understanding, the issue is that if a server allows X11 forwarding, an authenticated user can provide a crafted credential to inject commands into xauth, which runs under that user’s privileges. So how is that different from just logging on to the box and running commands via the shell? Does this only apply to servers configured to run certain X11 programs on behalf of the user, but restrict them from using an actual shell?
Thanks in advance!
Pretty much, like fetching a CVS or git repo over SSH, where the server is not supposed to give you an open shell but run a specific command, though it does not need to be an X11 command (just have X11Forwarding enabled in the server).

Ah, ok. Thanks for clarifying!
Without knowing the particulars, I’d generally say that using a shared flag as a fake refcount is an anti-pattern. Sooner or later, the object is shared twice, the first share is dropped (clearing the flag), and then boom. This is akin to a buggy read/write lock that completely unlocks after the first reader releases it.
I don’t know if I’d call the flag a refcount; it’s more of a “do I own this piece of data?” marker, so there’s no clearing that flag. I was able to determine that the owner of the callsite always outlives borrowers, so it’s safe to clean up the data if the owner is being collected.
That’s how it starts. :) Then the borrower starts to live just a little bit longer and oops. And it’s hard to assert that this doesn’t happen. The owner can’t check that the flag is cleared when it cleans up because the borrower can’t clear the flag.
Maybe you’re right, but keep in mind that subsignatures are kind of like child nodes in a tree. If you have a tree structure where each node can point to a chunk of memory it owns, or to a chunk of memory owned by any ancestor of that node, that ownership flag should suffice as long as you clean up the tree from the leaves up, right?
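To make the invariant concrete, here’s a rough sketch of what I mean - the names are illustrative, not from the actual code:

#include <stdlib.h>

/* Each node either owns its data or borrows it from an ancestor. */
struct node {
    void *data;
    int owns_data;              /* set only on the node that allocated data */
    struct node **children;
    size_t n_children;
};

/* Free leaves first, so a borrower is always gone before its owner. */
void node_free(struct node *n) {
    for (size_t i = 0; i < n->n_children; i++)
        node_free(n->children[i]);
    if (n->owns_data)
        free(n->data);
    free(n->children);
    free(n);
}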
I’d expect that part of the kernel slowdown is simply that as the process RSS grows, the kernel must do more and more work to manipulate the page table permissions and virtual memory areas as it sets everything to copy-on-write. This is probably especially true if more VMAs are added instead of existing ones simply being grown.
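If you wanted to check, a rough sketch along these lines (mine, not from the post’s benchmarks) should show fork() getting slower as resident memory grows:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

int main(void) {
    for (size_t mb = 64; mb <= 1024; mb *= 2) {
        size_t len = mb << 20;
        char *buf = malloc(len);
        if (!buf) return 1;
        memset(buf, 1, len);            /* fault the pages in, growing RSS */

        struct timespec a, b;
        clock_gettime(CLOCK_MONOTONIC, &a);
        pid_t pid = fork();             /* kernel marks everything copy-on-write */
        if (pid == 0)
            _exit(0);
        clock_gettime(CLOCK_MONOTONIC, &b);

        waitpid(pid, NULL, 0);
        long us = (b.tv_sec - a.tv_sec) * 1000000L
                + (b.tv_nsec - a.tv_nsec) / 1000;
        printf("%4zu MB resident: fork() took %ld us\n", mb, us);
        free(buf);
    }
    return 0;
}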
That makes sense to me, I just didn’t think that would account for that much time! I would be curious to see if using mmap versus messing with the program break would affect things differently.
I do know that Linux has consistently forked dynamically linked programs measurably slower than statically linked ones. It’s possible that dynamic linking causes more work at fork() time (perhaps both in the kernel and in libc), but I think the big difference is the number of memory pages and VMAs in static versus dynamic processes (since dynamic ones have VMAs for all their mmap’d shared libraries).
Oh, that’s really interesting! That would be another interesting dimension to cover. Now I can’t wait to finish this series of posts to work on the fork() one =)
Another wrench to throw into the mix: the libc being used will cause different behavior as well.
You may want to compare glibc versus musl versus uClibc, etc.

You may also want to compare fork() when different locales are present, as that can add more fun UTF-8-type parsing.
Both spectacular ideas! I’ll add them to my notes!