Yay! Congrats to the folks working hard to make this happen!
So now that modules are in Go, it is, to quote Brad Fitzpatrick, asymptotically approaching boring. The only big release left to bike-shed is the (right now, hypothetical) Go 2!
Kidding aside, I’m excited about more wasm in this and future releases and looking forward to some boring xml stuff in 1.12.
Congrats again!
Another great/thorough writeup. And kudos to the Dgraph team (and all of the other previous Jepsen customers) for putting their money where their mouths are and having their work publicly torture-tested like this.
@aphyr, 3 questions if you don’t mind the tangent:
Thanks
Thank you! To answer your questions…
There’s no FDB analysis planned; they haven’t approached me, and since I just moved and took a couple months off, I really need to focus on taking paying gigs and rebuilding funds. My next client is all lined up though, and I should have more results to show in winter. :)
It’s hard to say! I get PRs from maybe 5 active orgs, and I know of… maybe a dozen orgs who use it independently? I’ve also trained… maybe 150 people in writing Jepsen tests, but I don’t necessarily know whether those folks went on to use Jepsen at their orgs internally, adapted the techniques to their own test suites, or moved on to other things. I think the techniques are more important than the tool itself, so even if folks aren’t using Jepsen itself, I’m happy that they’re doing more testing, fault injection, and generative testing!
I have sort of a “part of this balanced breakfast” take on Jepsen–it exists on a spectrum of correctness methods: normal proofs, machine-checked proofs, model checking, simulation testing, the usual unit & integration tests, Jepsen-style tests, internal self-checks, production telemetry, fault injection/chaos engineering, and user reports.

In the early design phase, you want provable algorithms, but complexity might force you to give up on machine-checked proofs and move to model checking; model-checking covers weird parts of the state space but isn’t exhaustive, so it’ll miss some things. The map is never the territory, so we need simulations and tests for individual code components and the system as a whole, to verify that each piece and the abstraction boundaries between them hold up correctly.

As you move to bigger tests, you cover more system interactions, but the state space generally explodes: larger tests explore less of the state space. Jepsen’s at the far end of that testing continuum, looking at all the interactions of a real production system, but only over short, experimentally accessible trajectories–a simulation test, like FDB does, is going to cover a lot more ground in the core algorithm, but may not catch bugs at the simulation layer itself or in untested components, e.g. a weird interaction between the filesystem and database which wouldn’t arise in an in-memory test. And Jepsen is specifically constrained to simple, testable workloads; it’s never gonna hit the data or request volumes, or query diversity, that real users will push at the system–that’s why we need user reports, telemetry from production, self-checks, etc.
There’s a lot of “formal methods” in Jepsen; every test encodes, more or less explicitly, an abstract model of the system being evaluated. We take a range of approaches for performance and coverage reasons, so some actually involve walking graphs of state spaces, and others are just checking for hand-proved invariants. Developing new and faster checkers is a great place to apply your formal methods knowledge, if you’re looking to contribute!
Re: FoundationDB. I wanted to see that, too. He had a good reason for not doing it, described here. In that thread, he said he hasn’t tested it because “their testing appears to be waaaay more rigorous than mine.” Still might be good for independent replication, though. Plenty of scientific papers look like they have a lot of rigor until you find that they missed something, used an incorrect algorithm, or just made stuff up for fame or fortune.
I say that as someone who was wowed by FoundationDB. Hopefully, Jepsen just confirms it was as good as it appeared. If not, people get to fix any problems he finds. It’s an all-win scenario unless he finds a problem they can’t fix somehow.
That was based on a phone conversation I had with one of the FDB team members–they were doing a bunch of tests, like hardware faults and simulation tests, that weren’t really feasible for Jepsen because a.) I didn’t have custom hardware, and b.) simulation testing has to be built into the database code itself, and Jepsen takes a black-box approach. FDB also spun up their own Jepsen test, but I can’t tell you how deeply they explored there.
Then FDB got eaten by Apple, and fell off my radar–but I’m happy it’s re-emerging now! We don’t have any plans to work together right now, and I’ve got my hands full with other clients, but I’d be happy to work on FDB tests in the future. :-)
Ugh, as if FLOSS license proliferation wasn’t bad enough, now there are more of these “openish” licenses. After reading @antirez’s comments, it seems they’re trying to take a stand that keeps them making money and isn’t completely closed source.
I wish them well because
but they are starting on the slippery slope of being more closed-source.
Or it’s a discovery process, exploring licensing options and business models that open-source advocates intentionally ignored for ideological reasons. It might lead to more paid developers writing software that’s free as in beer or speech, or even to sustainable, high-quality, low-cost, closed-source software. I’m all for more experimentation with hybrid licenses to see what happens. Personally, I think those routes can be better than FOSS, where stuff like Linux mostly gets contributions from a handful of companies for their own proprietary reasons that fortunately benefit us on the side. The gap between contributions from users, personal or commercial, and their use and revenue generation is so large that we should be trying everything to close it.
A: Because keynote speakers who make bad life choices are poor role models (from the talk).
I’m not saying James Mickens is a national treasure, but shouldn’t we, as a nation, treasure him?
Thanks for posting this! I’ve been on another @bcantrill watching spree recently and this is a good one to top it off with (he does a good job of being entertaining, historically informative, and having an engineering aesthetic of building stuff that solves problems).
One of the cool ideas I’ve run across (I think from Paul Graham’s On Lisp) is petrification of a program - stabilizing and formalizing the program past the quick and dirty stage. I know that type hints/gradual typing are helping this, but would love to see more ideas (besides @andyc’s Oil) that can transition shell/quick scripts to something with more types, error handling, composability (besides pipes).
There is the Oh shell: https://github.com/michaelmacinnis/oh
Excellent point. I finished watching the BSDCan video (from the Lobsters discussion), but haven’t dug into playing with it yet.
Hooray for multiple security protocol implementations! I have high hopes for Project Everest’s miTLS when it comes to raising the bar for verification in practical, security-critical, performance-sensitive domains.
If another version control system eventually supersedes git, I hope that a UI this powerful and easy will be considered a minimum requirement (I’m looking at you, Fossil/Hg/Pijul).
Neat stuff!
I wonder what @andyc has to say about Oh vs. Oil :)
Thanks for the shout out :) I listened to the end of the talk (thanks for the pointer by @msingle), and basically everything he says is accurate. I found his 2010 master’s thesis several years ago, and it is a great read. Section 3 on Related Work is very good.
https://scholar.google.com/scholar?cluster=9993440116444147516&hl=en&as_sdt=0,33
Oh and Oil have very similar motivations – treating shell as a real programming language while preserving the interactive shell. Oh seems to be more Lisp-like while Oil simply retains the compositionality of the shell; it doesn’t add the compositionality of Lisp. (In other words, functions and cons cells compose, but processes and files also compose).
I mention NGS here, which probably has even more similar motivations:
http://www.oilshell.org/blog/2018/01/28.html
The main difference between Oh and Oil is the same as the difference between NGS/Elvish/etc. and Oil: Oil is designed to be automatically converted from sh/bash.
My thesis is that if this conversion works well enough, Oil could replace bash. If I just designed a language from scratch, I don’t think anyone would use it. Many people seem to agree with this. After all, fish has existed for 15 years, and is a nicer language than bash for sure, but I’ve seen it used zero times for scripts (and I’ve looked at dozens if not a hundred open source projects with shell scripts.)
However as he correctly notes (and I point out in the FAQ), Oil doesn’t exist yet! Only OSH exists.
The experience of implementing OSH and prototyping the OSH-to-Oil translation gave me a lot of confidence that this scheme can work. However, it’s taking a long time, longer than expected, like pretty much every software project ever.
I’m not really sure how to accelerate it. Maybe it will get there and maybe it won’t :-/ No promises!
I “front-loaded” the project so that if I only manage to create a high-quality implementation of OSH (a bash replacement), then I won’t feel bad. I’m pretty sure people will use that if it exists.
I maintain a “help wanted” tag for anybody interested in contributing:
https://github.com/oilshell/oil/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22
This is mainly helping with OSH, as Oil is a concentrated design/implementation effort that is hard to parallelize. Feel free to join https://oilshell.zulipchat.com too!
Michael MacInnis mentions talking to @andyc ~44:25 in the talk, but I’d also like to hear his opinion on Michael’s work.
Oh! (no pun intended :p) I’m still at the ~33:00 mark into the video and hadn’t gotten to that part. But yeah, I’d like to hear that.
Here is a reply from linux-crypto. Among other things:
Your patch description is also missing any mention of crypto accelerator hardware. Quite a bit of the complexity in the (kernel) crypto API, such as scatterlist support and asynchronous execution, exists because it supports crypto accelerators. AFAICS your new APIs cannot support crypto accelerators, as your APIs are synchronous and operate on virtual addresses. I assume your justification is that “djb algorithms” like ChaCha and Poly1305 don’t need crypto accelerators as they are fast in software. But you never explicitly stated this and discussed the tradeoffs. Since this is basically the foundation for the design you’ve chosen, it really needs to be addressed.
I’ve had enough code reviews and read enough LKML threads to appreciate Eric Biggers’ critiques of the code. Let’s hope that all parties rapidly converge on code they are happy with and merge it in!
@SeanTallen, glad to see another milestone. Nice release notes!
Thanks @msingle.
Was a team effort across the board, including the release notes.
Looks very cool! Of course I see this literally 1 day after I finally went to the effort of installing Zotero to keep track of academic-ish papers/PDFs that I have been accumulating.
Ha, that’s too bad, but I think that Zotero is on the other extreme of the spectrum in terms of features. I’d say Zotero is great if you have a need (or personal preference) to have a big organizational system, where you have lots of context, additional notes & citation management, super fine-grained filing, many more different file types etc.
I wrote Paperboy to be the opposite: don’t force me to make too many decisions, just help me rename and move some files, then get out of my way :)
I was actually motivated by this feeling that if I put a lot of effort into feeding a tool, then I can never get rid of it, because a) it has a lot of magic metadata somewhere, b) I feel like I invested so much time that would then have been wasted. Even worse when it has a slightly intrusive online-component!
Do people no longer turn on extensions in Cabal files? I’m shocked to see numbers so low. For my current Haskell project, now at about 3000 lines, I still have lots of files with >5 extensions, even though I deliberately tried to keep things simple and avoid advanced language features. (This is still nothing compared to Cubix, my last project. When I speak to Haskell meetups, I even have a slide joking about it.)
How many files are there that only use the extensions in the list? They didn’t answer that question (they only gave counts for individual extensions), though they made claims about it.
Just so I understand the scope, since I’m not a Haskell programmer: 3000 files sounds like … a lot. I’ve used lots of C-family languages, and a project in any of those with 3000 files would be pretty significant; is that the case with Haskell as well?
There are a lot of bad practices that become self perpetuating. I was involved in some PCI-DSS stuff at work and their password requirements are … not very good for end users. Normal users pick worse passwords when having to rotate them frequently, but PCI requires password changes every 90 days and you can’t use any of the previous 4 or 5 (too lazy to dig up the link), therefore encouraging users to do password1, password2, password3, etc.
I was the only person on my team who had a random password; the system was pretty good at giving you easy-to-remember passwords to choose from, like “frank+8Fell”. Everyone else ended up adding an incrementing number or prepending a letter, e.g. password1, password2, etc., or password, ppassword, pppassword…
It’s one of the styles of English possessive for singular words that end in an ‘s’. When making a plural word that ends in ‘s’ into a possessive, all authorities agree that you just add an apostrophe (“the employees’ salaries”). But when it’s a singular word that happens to end in an ‘s’, some styles prefer that you treat it the same way as any other singular word and add apostrope-s (“Alger Hiss’s trial”), while others prefer that you treat it in the same way as plural words ending in ‘s’, and add just apostrophe (“Alger Hiss’ trial”). Both styles are pretty common for a few centuries now I think. I tend to use the apostrophe-s style because it’s how I would speak (I’d say “hiss-es trial”, or in this case, “boats-es personal barricade”, to indicate the possessive). I guess this one is extra-weird because the person’s handle, boats, is a plural English word, but adopted as a handle for a single individual.
I’ll add a citation in honor of @mjn’s fine reply: Wikipedia (Wikisource) has the rule from the original Strunk & White text. Strunk and White is one of the better (and readable) style guides that most people should use for the English language.
Strunk and White is one of the better (and readable) style guides that most people should use for the English language.
It really depends who you ask. See, for example, the paper linked in https://www.washingtonpost.com/news/volokh-conspiracy/wp/2015/04/21/against-strunk-whites-the-elements-of-style/.
Agreed. If you are at the point where you disagree based on an actual reason, like in the linked rebuttal, or are even aware of other style guides, then weigh the pros and cons appropriately. If your discipline/profession/place of work doesn’t have one and you aren’t being supervised by a professor, this is a pretty good default.
I actually hesitated at wording it as a rule and would have preferred guideline, but my link had it titled as a rule, so take things with a grain of salt.
In practice, I would guess most authors do something simpler than S&W and just stick to either the apostrophe-only or the apostrophe-s form, though I have no data on that. Seems a bit fiddly to recommend apostrophe-s almost always, but then carve out an exception for “ancient proper names ending in -es and -is”, a second exception specifically for Jesus, and a third one for traditional punctuation of phrases like “for righteousness’ sake”. I could imagine that working as a publication’s house style that their copyeditors enforce, but I would be surprised to find it much in the wild.
Not surprised. With Oracle’s Java copyright lawsuit going the way it is, Google faces license-fee extortion if they stick with Java, so they pretty much need to kill Android if the lawsuit succeeds. Building their own platform also gives them the copyrights, so the ruling will be helpful in maintaining a tight grip on it.
As I understand it (and as noted by alva above), Fuchsia competes with Linux, not Java. It’s a microkernel, not a language VM or a language. The article was technically confused – the Oracle line was just a throwaway and not terribly accurate.
Java is going to be in Android forever, simply because there are hundreds of thousands of apps written in Java. Kotlin was a logical move because it’s very compatible with Java.
I assume the thing everyone is kind of getting at is Flutter, which is the preferred app framework for Fuchsia, and it’s not Java-encumbered.
The argument against succinctness seems odd to me. Yes, regular expressions (the example given) are notoriously succinct, but is
a(b{3,10}|c*)d
really harder to read than
("a" (or (repeat "b" 3 10) (any "c")) "d")
or some other notation? The verbose one might be easier to read for someone unfamiliar with regex notation, but once you’ve learned regexes, it becomes a lot easier to see “the whole picture” with a succinct notation (IMHO).
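Both spellings denote the same language; here's a quick sanity check of the succinct form in Python (assuming the usual `re` syntax):

```python
import re

# The succinct pattern from above: "a", then either 3-10 "b"s
# or any number of "c"s (possibly zero), then "d".
pat = re.compile(r"a(b{3,10}|c*)d")

print(bool(pat.fullmatch("abbbd")))   # True: three b's
print(bool(pat.fullmatch("accccd")))  # True: the c* branch
print(bool(pat.fullmatch("abbd")))    # False: only two b's
```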
It’s like saying we should write arithmetic like (to use an example that totally isn’t a real programming language ahem):
ADD 1 TO X GIVING Y
instead of
Y = 1 + X
Mathematical notation is notoriously succinct, and it has succeeded because it makes communicating mathematics much easier (yes, I’m intentionally echoing “Notation as a Tool of Thought” here). Standard mathematical notation is the world’s most common DSL, so widespread as to be ubiquitous.
Many of the arguments in TFA against regex notation seem to be at least partially answered by extended regex notation, which allows whitespace. To wit:
a
(
b{3,10}
|
c*
)
d
is just as readable as the s-expr above, if not more so (again IMHO), and still lets me see the trees and the forest.
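For what it's worth, this whitespace-extended layout maps directly onto Python's `re.VERBOSE` flag, which treats unescaped whitespace in the pattern as insignificant and allows comments:

```python
import re

# re.VERBOSE ignores unescaped whitespace and permits "#" comments,
# so the pattern can be laid out exactly as shown above.
pat = re.compile(
    r"""
    a
    (
        b{3,10}   # three to ten b's...
        |
        c*        # ...or any number of c's
    )
    d
    """,
    re.VERBOSE,
)

print(bool(pat.fullmatch("abbbbd")))  # True
```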
I suppose the argument comes down to ease-of-use for beginners versus ease-of-use for experienced users. Experienced users want brevity and conciseness, and beginners want code that is self-explanatory.
I think there should be tools for converting from a DSL to the unsugared, powerful syntax. As much as I love regexes, not everyone knows them, and there are lots of subtleties, complexities, and variations (is that Perl, Vim, or shell?).
The tricky parts of regex don’t go away with verbalising the operators.
For example: is the statement above evaluated in a greedy or non-greedy fashion?
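That subtlety is easy to demonstrate: in Python (and most regex dialects) repetition is greedy by default, and neither the succinct nor the verbose spelling says so; only a trailing `?` changes it.

```python
import re

# Greedy: b{3,10} grabs as many b's as it can (here, all five).
# Lazy:   b{3,10}? stops at the minimum (three), since nothing after
#         the group forces it to consume more.
s = "abbbbbd"
greedy = re.match(r"a(b{3,10})", s)
lazy = re.match(r"a(b{3,10}?)", s)

print(greedy.group(1))  # bbbbb
print(lazy.group(1))    # bbb
```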
The problem with regular expressions isn’t so much that they are overly succinct but that the sub-expressions typically go unnamed. E.g. we might have a regular expression for IPv4 addresses (from https://stackoverflow.com/a/5284410):
re = /\b((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)(\.|$)){4}\b/
but this would be much easier to read if we wrote:
octet = /25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?/
re = /\b(#{octet})\.(#{octet})\.(#{octet})\.(#{octet})\b/
and as a side benefit it does stricter validation and correctly captures all 4 octets.
The most important facility any language can provide IMO is the ability to give names to constructs we create.
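To sketch the naming idea in Python (the variable names here are mine, purely illustrative): define the octet pattern once, then compose the address pattern from it.

```python
import re

# Name the octet pattern once, then build the IPv4 pattern from it;
# each octet gets its own capturing group.
OCTET = r"(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)"
IPV4 = re.compile(r"\b" + r"\.".join([OCTET] * 4) + r"\b")

m = IPV4.search("host at 192.168.0.1 responded")
print(m.groups())  # ('192', '168', '0', '1')
```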
Excellent point. Giving regexes names and recursion gets you Parsing Expression Grammars, though there’s no standardized notation for them.
Once you can name subexpressions you have the question of recursion. If you support recursion this is essentially PEG. That’s why I call PEG “regex++”.
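As a toy illustration of why recursion is the dividing line (a hand-rolled sketch, not any standard PEG library): a recursive rule can match balanced parentheses, which no true regular expression can.

```python
# PEG-style grammar: balanced <- ("(" balanced ")" balanced) / ""
def balanced(s: str, i: int = 0) -> int:
    """Return the index just past the longest balanced prefix of s[i:]."""
    while i < len(s) and s[i] == "(":
        j = balanced(s, i + 1)          # recurse into the nested group
        if j >= len(s) or s[j] != ")":  # unmatched "(" -- stop here
            return i
        i = j + 1                       # consume the ")" and keep going
    return i

def is_balanced(s: str) -> bool:
    return balanced(s) == len(s)

print(is_balanced("(()())"))  # True
print(is_balanced("(()"))     # False
```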
I just recently ran across Sequoia-PGP, which doesn’t answer the original post’s concerns, but hopefully it will become popular enough to encourage more thinking about the UX concerns in the post.
This is a great talk, Byrd is excellent at explaining the elegance and depth of the mind-bending concepts here in an accessible way.
For those unfamiliar with Byrd’s work: miniKanren and, most recently, Barliman are fascinating uses of relational/logic programming.