1. 8

Sooo, how is it actually a lie? At a glance, the PDF seems to be purely about some internal stuff in the (an?) OpenBSD package manager. I totally don’t understand how the title applies (apart from it actually being the subtitle of the slides, though I don’t understand why that is, either).

1. 4

It might make more sense if you take it from the other side: Transmitting programs continuously over the network is highly dangerous: if one can alter the transmitted data, one can apply modifications to the OS or add a backdoor to a program.

So what should be used as transport? HTTPS? The talk questions whether HTTPS is really strong enough to support transmitting packages. It then goes on to discuss how to mitigate the potential weaknesses of HTTPS.

1. 3

if one can alter the transmitted data

That’s why almost every package manager has package signatures… and that’s also why many package managers are still using HTTP.

1. 5

HTTP would still leak what packages you installed and their exact versions, very interesting stuff to know for a potential attacker.

HTTPS would also guard against potential security problems in the signing, i.e. layered security. If the signing process has an issue, HTTPS still provides some protection, though how much depends on your threat model.

1. 1

Totally true indeed. I was pointing out that, for the moment, these security issues do not seem to be considered a high threat, and are therefore not being addressed (not that I know of, anyway).

2. 1

So since HTTP does not provide strong enough security on its own, other mechanisms are used. I like the practice of not relying 100% on the transport to provide security.

1. 1

The thing is, HTTPS doesn’t only certify that what you asked for is what you got; it also encrypts the traffic (which is arguably important for package managers).

So at the moment, HTTP + signing is reasonable enough to be used as a « security mechanism ».

1. 3

The thing is, HTTPS doesn’t only certify that what you asked for is what you got; it also encrypts the traffic (which is arguably important for package managers).

That’s something the slides dispute. Packages have predictable lengths, especially when fetched in a predictable order. Unless the client and/or server work to pad them out, the HTTPS-encrypted update traffic is as good as plaintext.
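To make the traffic-analysis point concrete, here is a toy sketch (package names and sizes invented for illustration): if an observer knows the byte size of every package in a public repository, mapping an observed transfer size back to a package name is just a dictionary lookup.

```go
package main

import "fmt"

// catalog maps a package's on-the-wire size to its name. In reality an
// attacker would build this from the public repository metadata.
var catalog = map[int]string{
	1834672: "openssh-7.7p1",
	412998:  "curl-7.61.0",
	96213:   "sudo-1.8.23",
}

// identify guesses which package was fetched from the ciphertext length
// (a fixed TLS framing overhead is ignored here for simplicity).
func identify(observedSize int) string {
	if name, ok := catalog[observedSize]; ok {
		return name
	}
	return "unknown"
}

func main() {
	fmt.Println("client probably fetched:", identify(412998))
}
```

Padding transfers to a common size bucket is the usual countermeasure, at the cost of extra bandwidth.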

2. 1

I’m completely blanking on which package manager it was, but there was a recent CVE (probably this past month) where the package manager did something unsafe with a package before verifying the signature. HTTPS would’ve mitigated the problem.

Admittedly, it’s a well-known and theoretically simpler rule to never do anything before verifying the signature, but you’re still exposing more attack surface if you don’t use HTTPS.

1. 14

I have to disagree, at least partially. A couple of years ago I started learning the Norman layout on a columnar keyboard (an Ergodox, then a Planck). I crashed from ~100 wpm to 9 wpm, and it took weeks to crawl above 30 wpm (I took regular typing tests). I was absolutely a less productive programmer, and I had plenty of time to think about it as I crawled my way through the days. I think there are two main factors this doesn’t take into account.

First, I think this back-of-the-envelope calculation is roughly accurate, but it’s about lines of finished code. Producing 100 lines of production code can require producing more than ten times that much in drafts, tests, debugging, typos, and intermediate forms. We’ve all had that day (…or week…) debugging that ends with a one-line fix, right?

Second, this only accounts for code. A significant amount of my work is communicating with coworkers and users on Slack/email/tickets or writing documentation. That all greatly outweighs the raw code I produce, and I would not be an effective programmer if I wasn’t doing it.

More subjectively, it’s a distraction to not be able to type quickly. Not just if you’re crawling along at 10 wpm and have to consciously think about each key, but when I was only typing at 40 or 50 wpm I couldn’t get down thoughts as fluidly as I could think them. Above 60 or 70 wpm this faded away and I’d agree that it’s not a bottleneck, but it is a loss of performance.

1. 8

More subjectively, it’s a distraction to not be able to type quickly.

I can’t agree more. I didn’t think text input speed was an issue until I started trying to type code on a phone keyboard. The speed of doing that is just unbearably slow.

1. 5

I see what you’re saying, but I think it misses the spirit of the argument in the article, which is probably better summarized as: “For developers who can touch-type, the amount of code required to be typed is not a programming bottleneck.” We could probably soften the statement a bit more and still get to the heart of what the author is saying, which is: “Exchanging verbosity for functional simplicity and reliability is a net win.” I agree with both of those statements.

I don’t think the argument about wpm is even necessary. I’ve failed to ship plenty of features on time, and never once was the problem that I couldn’t type the code in fast enough. That’s never even been close to being a concern. On the other hand, chasing down runtime type errors, shared mutable state, isolated data in invalid states, and similar issues has often cost me and my teammates quite a bit of time.

Of course, poor organization and planning can outweigh any of those factors, but when it comes to the actual code, I’ll take verbose, stronger guarantees over terse, weak contracts any day of the week.

1. 2

Notice that the author presents a class with a builder that’s 78 lines alongside the Mythical Man-Month finding that programmers produce 10 lines a day.

Something tells me he would not be OK taking an entire day, or even 8 days(!), to produce it. Reasoning about averages has a lot of pitfalls.

1. 5

I’ve been working on a regex compiler that produces a Java class file that matches just that regex: https://github.com/hyperpape/BytecodeStringMatching. I have the skeleton done, so I’ll be trying to:

1. write a nice API
2. extend the syntax it supports
3. start performance testing it
1. 2

Where did the name come from? I initially assumed it would be about bitcoin (which is a negative to me), but it doesn’t seem like it is.

1. 2

Nah, it’s just a domain that I have owned for a long time and I decided to use it because it’s a short, pronounceable .com

1. 4

Surely I’m not going to be the only one expecting a comparison here with Go’s GC. I’m not really well versed in GC, but this appears to mirror Go’s quite heavily.

1. 12

It’s compacting and generational, so that’s a pair of very large differences.

1. 1

My understanding, and I can’t find a link handy, is that the Go team is on a long-term path to change their internals to allow for a compacting and generational GC. There was something about the Azul guys advising them a year+ ago, IIRC.

Edit: I’m not sure what the current status is, since I haven’t been following, but see this from 2012 and look for Gil Tene’s comments:

1. 3

This presentation from this July suggests they’re averse to taking almost any regressions now, even if they get good GC throughput out of it. rlh tried freeing garbage at thread (goroutine) exit if the memory wasn’t reachable from another thread at any point, which seemed promising to me but didn’t pan out. aclements did some very clever experiments with fast cryptographic hashing of pointers to allow new tradeoffs, but rlh seemed doubtful even about the long-term prospects of that approach.

Compacting is a yet harder sell because they don’t want a read barrier and objects moving might make life harder for cgo users.

Does seem likely we’ll see more work on more reliably meeting folks’ current expectations, like by fixing situations where it’s hard to stop a thread in a tight loop, and we’ll probably see work on reducing garbage through escape analysis, either directly or by doing better at other stuff like inlining. I said more in my long comment, but I suspect Java and Go have gone on sufficiently different paths they might not come back that close together. I could be wrong; things are interesting that way!

1. 1

Might be. I’m just going on what I know about the collector’s current state.

2. 9

Other comments get at it, but the two are very different internally. Java GCs have been generational, meaning they can collect common short-lived garbage without looking at every live pointer in the heap, and compacting, meaning they pack together live data, which helps them achieve quick allocation and locality that can help processor caches work effectively.

ZGC is trying to maintain all of that and not pause the app much. Concurrent compacting GCs are hard because you can’t normally atomically update all the pointers to an object at once. To deal with that you need a read barrier or load barrier, something that happens when the app reads a pointer to make sure that it ends up reading the object from the right place. Sometimes (like in Azul C4 I think) this is done with memory-mapping tricks; in ZGC it looks like they do it by checking a few bits in each pointer they read. Anyway, keeping an app running while you move its data out from under it, without slowing it down a lot, is no easier than it sounds. (To the side, generational collectors don’t have to be compacting, but most are. WebKit’s Riptide is an interesting example of the tradeoffs of non-compacting generational.)
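The “checking a few bits in each pointer” idea can be sketched in a few lines. This is a toy model only: the bit layout below is invented for illustration, and ZGC’s real layout and barrier logic differ.

```go
package main

import "fmt"

// A toy model of colored pointers: the high bits of a 64-bit pointer
// carry GC metadata, and a load barrier inspects and strips them on
// every pointer read.
const (
	addrMask  = uint64(1)<<44 - 1 // low 44 bits: the object's address
	markedBit = uint64(1) << 44   // object was seen by the current mark phase
	remapBit  = uint64(1) << 45   // pointer already fixed up after relocation
)

// loadBarrier models what conceptually runs on each pointer read: strip
// the metadata to get a usable address, and report whether the collector
// would have to "heal" the pointer because the object may have moved.
func loadBarrier(p uint64) (addr uint64, needsFixup bool) {
	return p & addrMask, p&remapBit == 0
}

func main() {
	p := uint64(0x1000) | markedBit // marked, but not yet remapped
	addr, fix := loadBarrier(p)
	fmt.Printf("addr=%#x needsFixup=%v\n", addr, fix)
}
```

The cost the comment describes is exactly this extra check on every read, which is why making it cheap (and rarely taken) is the hard part.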

In Go all collections are full collections (not generational) and no heap compaction happens. So Go’s average GC cycle will do more work than a typical Java collector’s average cycle would in an app that allocates equally heavily and has short-lived garbage. Go is by all accounts good at keeping that work in the background. While not tackling generational, they’ve reduced the GC pauses to more or less synchronization points, under 1ms if all the threads of your app can be paused promptly (and they’re interested in making it possible to pause currently-uncooperative threads).

What Go does have going for it throughput-wise is that the language and tooling make it easier to allocate less, similar to what Coda’s comment said. Java is heavy on references to heap-allocated objects, and it uses indirect calls (virtual method calls) all over the place that make cross-function escape analysis hard (though JVMs still manage to do some, because the JIT can watch the app running and notice that an indirect call’s destination is predictable). Go’s defaults are flipped from that, and existing perf-sensitive Go code is already written with the assumption that allocations are kind of expensive. The presentation ngrilly linked to from one of the Go GC people suggests at a minimum the Go team really doesn’t want to accept any regressions for low-garbage code to get generational-type throughput improvements. I suspect the languages and communities have gone down sufficiently divergent paths about memory and GC that they’re not that likely to come together now, but I could be surprised.
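The “non-scalar value types” point can be shown with a small sketch: a Go slice of structs stores the data inline, so iterating over it copies values on the stack and gives the GC nothing per-element to trace, unlike a Java-style array of references to heap objects.

```go
package main

import "fmt"

// point is a plain value type; []point holds the coordinates inline,
// not as pointers to separately heap-allocated objects.
type point struct{ x, y float64 }

// centroid averages the points without any per-element heap allocation:
// p is a stack copy on each iteration.
func centroid(ps []point) point {
	var c point
	for _, p := range ps {
		c.x += p.x
		c.y += p.y
	}
	c.x /= float64(len(ps))
	c.y /= float64(len(ps))
	return c
}

func main() {
	ps := []point{{0, 0}, {2, 0}, {1, 3}}
	fmt.Println(centroid(ps)) // {1 1}
}
```

This is the kind of layout Java code can’t easily express today, which is part of why an equally allocation-heavy workload leans harder on the collector there.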

1. 1

One question that I don’t have a good feeling for is: could Go offer something like what the JVM has, where there are several distinct garbage collectors with different performance characteristics (high throughput vs. low latency)? I know simplicity has been a selling point, but like Coda said, the abundance of options is fine if you have a really solid default.

1. 1

Doubtful they’ll have the user choose; they talk pretty proudly about not offering many knobs.

One thing Rick Hudson noted in the presentation (worth reading if you’re this deep in) is that if Austin’s clever pointer-hashing-at-GC-time trick works for some programs, the runtime could choose between using it or not based on how well it’s working out on the current workload. (Which it couldn’t easily do if, like, changing GCs meant compiling in different barrier code.) He doesn’t exactly suggest that they’re going to do it, just notes they could.

2. 1

This is fantastic! Exactly what I was hoping for!

3. 3

There are decades of research and engineering efforts that put Go’s GC and Hotspot apart.

Go’s GC is a nice introductory project, Hotspot is the real deal.

1. 3

Go’s GC designers are not newbies either and have decades of experience: https://blog.golang.org/ismmkeynote

1. 1

Google seems to be the nursing home of many people who had one lucky idea 20 years ago and are content to ride on their fame until retirement, so “famous person X works on it” doesn’t mean much when associated with Google.

The Train GC was quite interesting at its time, but the “invention” of stack maps is just like the “invention” of UTF-8 … if it hadn’t been “invented” by random person A, it would have been invented by random person B a few weeks/months later.

Taking everything together, I’m rather unconvinced that Go’s GC will even remotely approach G1, ZGC’s, Shenandoah’s level of sophistication any time soon.

2. 3

For me it is kind of amusing that huge amounts of research and development went into the HotSpot GC, but on the other hand there seem to be no sensible defaults, because its parameters often need to be hand-tuned. In Go I don’t have to jump through those hoops, and I’m not advised to, but I still get very good performance characteristics, at least comparable to (in my humble opinion, even better than) those of a lot of Java applications.

1. 12

On the contrary, most Java applications don’t need to be tuned, and the default GC ergonomics are just fine. For the G1 collector (introduced in 2009, a few months before Go, and made the default a year ago), setting the JVM’s heap size is enough for pretty much all workloads, except for those which have always been challenging for garbage-collected languages: large, dense reference graphs.

The advantages Go has for those workloads are non-scalar value types and excellent tooling for optimizing memory allocation, not a magic garbage collector.

(Also, to clarify — HotSpot is generally used to refer to Oracle’s JIT VM, not its garbage collection architecture.)

1. 1

Thank you for the clarification.

3. 2

I had the same impression while reading the article, although I also don’t know that much about GC.

1. 16

Some of these are really unsurprising. I mean, it’s a “WAT” that Go’s variables are block scoped? Really?

I think this kind of “look out for these slightly unexpected behaviors” talk is a good thing, but trying to liken them to Gary Bernhardt’s WAT talk seems a bit disingenuous.

1. 4

So I don’t know whether that is intense or not, but most of the content is covered very clearly in the Go spec, which is not exactly big, as far as specs go… So yeah. This.

1. 2

I mean, it’s a “WAT” that Go’s variables are block scoped? Really?

It’s not just block scope, it’s block scope plus shadowing, plus allowing you to declare and assign a visible variable using the value from the variable that you’re shadowing. You could have block scope without allowing either of the latter two choices.

Java doesn’t let you do this, even though it has block scope: the second declaration of a fails. I was mildly surprised that C lets you do the same thing, though given the origins of Go, I shouldn’t have been.
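The combination being described fits in a few lines of Go: the inner := both introduces a brand-new variable and reads the one it shadows on its right-hand side.

```go
package main

import "fmt"

// shadowDemo shows block scope plus shadowing plus initializing the new
// variable from the shadowed one, all at once.
func shadowDemo() (inner, outer int) {
	x := 1
	{
		// A new x, initialized from the outer x it shadows.
		x := x + 1
		inner = x
	}
	// The outer x is untouched by the inner declaration.
	outer = x
	return
}

func main() {
	inner, outer := shadowDemo()
	fmt.Println(inner, outer) // 2 1
}
```

The equivalent re-declaration is a compile error in Java, while C accepts it with the same semantics as Go.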

1. 1

I think the link is more an appeal to popular culture.

Either way, WAT8 was pretty surprising to me. I would never expect that to happen; it seems more like a quirk of code generation than an intentional feature.

1. 2

I use this feature a lot. I think the article is spot on about the problems with the feature, but some people might be imagining it as more vigorous/violent than it is.

It’s hard to communicate this, but I find it requires substantially less speed than most people would put into shaking dice in their hand before rolling them. I’m thinking of people who roll dice one-handed, not the people who go wild with two hands like they’re shaking a cocktail.

1. 9

Because it forces you to use a phone number; the author works for Facebook and recommends against GPG; they did not want people to use the F-Droid free app store; and they are against federating with people hosting their own servers.

1. 8

Who works for Facebook? Not Moxie, as far as I know.

1. 5

Moxie used to work for Twitter, but no longer does.

1. 5

Meta-point: it is frustrating that despite the importance of this issue (a bad password manager can make you less secure), your options for getting answers are:

1. Teach yourself enough crypto and security engineering to look for vulnerabilities in the code.
2. Follow a bunch of security experts on twitter/hacker news/elsewhere, and hope that they’ll say positive or negative things about various password managers.
3. Ask on a forum, and pray that the right people will actually answer.

Someone needs to consolidate this information and keep it up to date (I don’t think it should be me. I can probably handle point 1, but don’t know enough about the area to trust my ability to adequately synthesize the material and relay it).

1. 1

Someone needs to consolidate this information and keep it up to date

I agree. I was saving threads I saw to help with that. Then my bookmarks started disappearing (overflowing?). Anyway, I’m keeping the idea in mind since I’ll probably use one or more people’s advice myself in near future.

1. 1

Reordered my steps. What I can do is 2, not 1.

1. 8

I’ve been happy-enough with LastPass - I can’t point to any reason beyond inertia, so really what I’m curious about in this thread: are there any significant differentiators that could sway a person to switch?

1. 7

A big reason for me would be moving away from proprietary stuff for securing my passwords.

1. 5

To my knowledge, at least by staying mainstream there’s a team of individuals working on the product. I’ve used LastPass for years, and while there have been issues in the past… there is a large userbase and community scrutinizing it.

Going the self-hosted route negates a lot of that large community, and the trial by fire already endured by legacy solutions like LastPass.

They also provide an export mechanism …

2. 4

I’ve stuck with LastPass for a while. AFAIK, no security issues that I’ve judged to be significant. I appreciate that, compared to the other solutions that I know of, it seems to be widely compatible and simple to use on all platforms.

Only minor beef that I have is that the browser plugins, or at least the Chrome one, seems to have gotten slower and a little bit buggier over time instead of better and faster.

1. 1

I use LastPass, but am not happy with it, as in the past, it had some pretty serious security issues:

I would switch to 1Password, but it does not have Linux support (edit: it has a browser extension for Linux, which is suboptimal, but probably better than LastPass). I’ve almost talked myself into switching to KeePass, but I’ll have to find out how trustworthy the iOS version is.

1. 0

I think this article has a small misconception about vgo (aka go modules): it doesn’t take the minimum version. go get always downloads the latest version. Thereafter the MVS algorithm picks the maximum of all the constraints.

EDIT: also I notice that it confuses the terms minimal and minimum. The Go algorithm is minimal because Russ Cox feels that nothing else can be taken away.

1. 3

“The key to minimal version selection is its preference for the minimum allowed version of a module.” –Russ Cox

The maximum of the values of the constraints is the minimum of the versions allowed by the constraints.

It is clearer to call it the minimum, since the algorithm gives lower and lower versions as constraints are removed: it can only be pushed towards higher versions by adding constraints. Conversely, the Cargo algorithm “wants” the maximum version, and can only be dissuaded from it by adding constraints (or lockfiles).

1. 2

It does take the minimum version. Yes, the name is minimal, not minimum, but one of the properties of that minimal algorithm is that it takes the minimum version.

An example should be clarifying. B is available in versions 1.0 through 1.10. A declares a dependency on B >= 1.5. vgo resolves B 1.5; Cargo (and other package managers) resolve B 1.10.

1. 1

Yep, I understand that. My point was that if A requires 1.5, C requires 1.2 and D requires 1.6 then the maximum of those is selected, i.e. 1.6. This has the side effect of requiring a deliberate upgrade act to get version 1.10. However the benefit is that if I run the resolution algorithm today then you run it next week when version 1.11 is released, we both get exactly the same set of dependencies and can reproduce one another’s builds.
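The resolution step being described can be sketched in a few lines (this is an illustration of the idea, not the real cmd/go code): every importer declares a minimum required version, and the chosen version is the maximum of those minimums, i.e. the lowest version that still satisfies every “>=” constraint.

```go
package main

import "fmt"

// newer reports whether version a is higher than b, comparing
// major.minor numerically (so "1.10" > "1.9", unlike a string compare).
func newer(a, b string) bool {
	var amaj, amin, bmaj, bmin int
	fmt.Sscanf(a, "%d.%d", &amaj, &amin)
	fmt.Sscanf(b, "%d.%d", &bmaj, &bmin)
	return amaj > bmaj || (amaj == bmaj && amin > bmin)
}

// mvsSelect picks the maximum of the declared minimum versions.
func mvsSelect(minimums []string) string {
	best := minimums[0]
	for _, v := range minimums[1:] {
		if newer(v, best) {
			best = v
		}
	}
	return best
}

func main() {
	// A needs >= 1.5, C needs >= 1.2, D needs >= 1.6.
	fmt.Println(mvsSelect([]string{"1.5", "1.2", "1.6"})) // 1.6
	// Even though 1.10 exists, nothing asks for it, so it is not chosen.
}
```

Nothing here consults “the latest release”, which is exactly why the result is reproducible next week without a lockfile.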

1. 2

Yes, I think we are all in agreement about what happens. The question is whether it is good. The drawback of vgo argued in the article is that B will inevitably get bug reports against 1.6 that are already fixed in 1.10. Another is that real-world testing of B is spread across all versions from 1.0 to 1.10, while in Cargo most testing is against 1.10 while 1.10 is the latest.

Cargo (and other package managers) solve reproducibility with lockfiles. A lockfile is admittedly not “minimal”, but apart from minimality, it solves the technical problem equally well.

1. 8

I usually link to Betteridge’s Law when I write a post like this, but didn’t this time.

Apparently a significant portion of people found the title to be clickbait-y, but I thought it was a pretty straightforward question. Oh well!

1. 6

This knee-jerk reaction against “clickbait” kind of annoys me. IMO there is nothing wrong with an article having a title that attempts to engage readers and pique their interest. I would also much rather a title pose a question and answer it in the article than contain the answer in the title itself. (The latter can lead to people just reading the title and missing any nuance the article conveys.)

1. 7

I agree. Clickbait really implies that the article has no meaningful content. If the article is actually worth reading, it’s not clickbait, it’s catchy.

2. 1

It’s a fine title, imo. Maybe there’s a better one possible, but it’s fine.

1. 2

“WebAssembly is not the return of Java Applets and Flash.”

Edit: I did enjoy the article, however.

Edit2: As site comment:

I had no idea what the “kudos” widget was, moved my mouse to it, saw some animation happening, and realized I had just “upvoted” a random article, with no way to undo it. Wonderful design. >.<

1. 1

That’s fine, and probably an improvement, but worth a correction? I don’t really think so.

1. 3

It might work up to a certain point for buyers who would otherwise buy proprietary software. Their EULAs are already ridiculous. I’ll note that the military has been known to sometimes just steal stuff if they need it. Here are Army and Navy examples. In theory, they can make it classified, too, to try to block you from proving it in court. At that point, you’re trying to beat them with DRM plus online license checks to reduce the odds of that. That annoys regular customers, though.

This seems most doable with a SaaS solution.

1. 3

Exactly: the freedom to study and share is a pre-sales experience.

1. 1

A case where the government settled for \$50 million is a bit ambiguous: they suffered a consequence for that theft. If this license led the military to make regular payouts for violating licenses, I would count that as a partial success.

1. 1

That was a case where they got caught. Most acts of piracy don’t get caught. More likely in organizations where it’s illegal to even discuss what they’re doing.

1. 14

This idea comes up from time to time. It’s an old idea. Here are two rms articles that address it.

https://www.gnu.org/philosophy/programs-must-not-limit-freedom-to-run.html

Basically: if you’re evil enough to do evil stuff, violating copyright is something you won’t think is very evil at all. So even without the argument about how impossible it is to define evil, a copyright-based license isn’t going to stop anyone from doing evil.

1. 5

I don’t think this holds up in countries with established rule of law. It’s easy to forget that you can sue the government in court in the US, and then the government will stop (at least most of the time). It’s just that the Overton window has shifted so much that we only think of “evil” in terms of things that don’t happen in this day and age, when terrible things are happening all the time and continue to be enabled by technology.

If there’s anything that unites most people, it’s the fear of having all their assets frozen. And the spectrum of evil stops way before “evil mastermind with 1000 offshore accounts and 20 fake identities”.

1. 4

I agree. Julian Sanchez made this point about the NSA/CIA recently: the bulk of their abuses of power inside the USA are either legal or, if not obviously legal, fall into a legal grey area where the right person in the chain of command said they were legal.

Career government officials tend to have a habit of following most rules of the organizations they inhabit, but may do a lot of shady things that don’t obviously violate those rules.

1. 1

I think I should have read my own links. rms also argues that such copyright-based restrictions on use are likely unenforceable. I don’t recall ever reading about a case where someone violated a license’s conditions on usage (e.g. using Java in a nuclear reactor) and was thus found to be infringing copyright. Has that happened?

1. 1

Also: trying to sue the US government for copyright infringement because they used some software to facilitate torture (for example) doesn’t seem to me like a fruitful approach. Maybe with some optimism something could be done about human rights abuses in the US, but going the copyright infringement path doesn’t seem likely to work.

2. 2

Glad to hear it’s been addressed already by gnu. I was thinking along similar lines, like “Eh, I see the problem, but I don’t think giving up freedom zero is the answer.” Of course, I don’t have a good solution either, other than a better more democratic government with well-informed citizens and a functioning justice system.

1. 3

Cool (and encouraging) analysis. Does this account for putting extensions in the default-extensions in a .cabal file?

1. 3

It doesn’t. I’d expect at least the relative frequencies to be similar, but this could very well increase how often extensions pop up. But I think it’s probably much more common to enable extensions on a file-by-file basis than on a project basis.

It’d be cool to see what effect adding those back has, but it’s probably not doable with the GitHub API.

1. 5

I wouldn’t be surprised if adding it in pushed the most popular ones higher, because why worry about which files need OverloadedStrings? But that’s just a random guess, and it may be that not enough people use the feature for it to matter.

1. 4

I should probably spend my free time this week on catching up on the prolog course.

However, this weekend, I hacked together a tool to use code coverage/property based testing tools to show you input-output pairs that take different paths through code (https://github.com/hyperpape/QuickTheories). I’d like to implement shrinking, and I’m also poking around at symbolic execution, as it seems like that’s the right way to implement a robust version of the tool. I’d also like to create an IDE plugin that lets me trigger this for methods in my code, and see if it’s as helpful as I imagine it being.

Right now at work: I don’t know what I’ll be working on before the day starts…

1. 17

I’ve heard the “binary logs are evil!!!” mantra chanted against systemd so many times that it isn’t funny anymore. It’s a terrible argument. With so many big players putting their logs into databases, and with the popularity of the ELK stack, it is pretty clear that storing logs in a non-plaintext format works. Way back in 2015, I wrote two blog posts about the topic.

The gist of it is that binary logs can be awesome, if put to good use. That journald is not the best tool is another matter; journald being buggy doesn’t mean binary logs are bad. It just means that journald is possibly not the most carefully engineered thing out there. There are many things to criticize about systemd and the journal, and they both have their fair share of issues, but binary storage of logs is not one of them.

1. 10

Okay, so can we just assume all complaints about “binary logs” are just about these binary logs and get on with things?

The journald/systemd people don’t act like they have any clue what’s going on in the real world: people can’t use the tools they used to, and these tools evidently suck. Plain text sucked less, so what’s the plan to get anything better?

1. 8

I don’t think that’s entirely reasonable. It’s converting a complaint about principle (“don’t do binary logs”) into a complaint about practice, and that makes a big difference. If journald is a bad implementation of an ok idea, that requires very different steps to fix than if it’s a fundamentally bad idea.

What you’re describing makes sense for people on the systemd project to say (“woah, people hate our binary logs, maybe we should work on them”[0]), but not for the rest of us trying to understand things.

[0] I fear they’re not saying that, as they seem somewhat impervious to feedback

1. 2

I feel like @geocar is against binary logs as a source format, but not as an intermediate or analytics format. Even if your application uses structured logging, it can still be stored in a text file, for example as JSON, at the source. It can be converted to a binary log later in the chain, for example on a centralized logging server, using ELK, SQL, MongoDB, Splunk or whatever. The benefit is that you keep a lot of flexibility at the source (in terms of supporting multiple formats depending on the source application) and are still able to go back to the plain text log if you encounter a problem.

1. 4

I’m not even against binary logs “as a source format.”

Firstly: I recognise that “complaints about binary logs” is directed at journald and isn’t the same thing about complaints about logs in some non-text format.

I think systemd getting in deep forced sysadmins to retool on top of journald, and that hurt a lot for very little gain (if there was any gain at all; for most workflows I suspect there wasn’t). This has almost certainly put people off of binary logs, and has almost certainly got people complaining about binary logs.

To that end: I don’t think those feelings around binary logs are misplaced.

Some humility is [going to be] required when trying to win people over with binary logs, but appropriating the term “binary logs” to also cover tools the sysadmin chooses is like pulling the rug out from under somebody, and that’s not helping.

1. 2

Thank you very much for clarifying. I agree that forcing sysadmin “to retool on top of journald” hurts.

2. 2

No, it’s recognising that when enough people are complaining about “the wrong thing”, telling them it’s the wrong thing doesn’t help them. It just causes them to dig in.

What’s the right thing?

I think that’s the point of the bug…

3. 1

Okay, so can we just assume all complaints about “binary logs” are just about these binary logs and get on with things?

As soon as the complaints start to be about journald and not “binary logs”, and the distinction is made explicit, yeah, we can. It’s been four years, so I’m not going to hold my breath.

and these tools evidently suck

For a lot of use cases, they do not suck. For many, they are a vast improvement over text logs.

what’s the plan to get anything better?

Stop logging unstructured text to syslog or stdout, and either log to files or to a database directly. Pretty much what you’ve been (or should have been) doing the past few decades, because both syslog and stdout are terrible interfaces for logs.

1. 9

As soon as the complaints start to be about journald and not “binary logs”, and the distinction is made explicit, yeah, we can. It’s been four years, so I’m not going to hold my breath.

People complain about things that hurt, and between Windows and journald it should not be a surprise that “binary logs” is getting the flak. journald has a lot of outreach work to do if they want to fix it.

For a lot of use cases, [the tools] do not suck. For many, they are a vast improvement over text logs.

And yet when programmers make mistakes implementing them, the sysadmins are left cleaning up after them.

Text logs have the massive material advantage that the sysadmin can do something with them. Binary logs need tools to do things, and the journald implementation has a lot of work to do.

Most of the “big players” use a transparent structuring layer rather than making binary logs their golden source of knowledge. This allows people to get a lot of the advantages of binary logs with few disadvantages (and given how cheap disk is, the price is basically zero).

Stop logging unstructured text to syslog or stdout, and either log to files or to a database directly. Pretty much what you’ve been (or should have been) doing the past few decades, because both syslog and stdout are terrible interfaces for logs.

These are directions to developers, not to sysadmins. Sysadmins are the ones complaining.

Are we really to interpret this as “refuse to install any software that doesn’t follow this rule”?

I’m willing to whack some perl together to get the text log data queryable for my business, but give me a binary turd and I need tools and documentation and advice.

1. 4

Most of the “big players” use a transparent structuring layer rather than making binary logs their golden source of knowledge.

What do you mean by a “transparent structuring layer”?

1. 2

Something to structure the plain text logs into some tagged format (like JSON or protocol buffers).

Splunk e.g. lets users create a bunch of regular expressions to create these tags.
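As a rough illustration of such a structuring layer (the log line below is a made-up Apache-style example, and the named-group regex is only a stand-in for Splunk’s field-extraction mechanism, not its actual syntax):

```python
import json
import re

# Hypothetical Apache-style access log line for illustration.
LINE = '203.0.113.7 - - [10/Oct/2017:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326'

# A named-group regex plays the role of a field extraction:
# it turns free-form text into tagged fields.
PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<bytes>\d+)'
)

def structure(line):
    """Convert one text log line into a tagged, JSON-serialisable record."""
    m = PATTERN.match(line)
    if m is None:
        return {"raw": line}  # keep unparseable lines as-is
    record = m.groupdict()
    record["status"] = int(record["status"])
    record["bytes"] = int(record["bytes"])
    return record

print(json.dumps(structure(LINE)))
```

The text file stays the golden source; the tagged JSON is just a queryable view derived from it.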

1. 2

Got it now. Thanks for clarifying!

2. 0

Text logs have the massive material advantage that the sysadmin can do something with them. Binary logs need tools to do things, and the journald implementation has a lot of work to do.

For some values of “can do”, yes. Most traditional text logs are terrible to work with (see my linked blog posts, not going to repeat them here, again). Besides, as long as your journal files aren’t corrupt (which happens less and less often these days, I’m told), you can just use journalctl to dump the entire thing, and grep in the logs, just like you grep in text files. Or filter them first, or dump in JSON and use jq, and so on. Plenty of options there.
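A small Python sketch of that filtering workflow, built on the one-JSON-object-per-line format that `journalctl -o json` emits (the sample records below are invented, but `MESSAGE`, `PRIORITY`, and `_SYSTEMD_UNIT` are real journal field names):

```python
import json

# Simulated `journalctl -o json` output: one JSON object per line.
SAMPLE = """\
{"MESSAGE": "Started Nginx.", "PRIORITY": "6", "_SYSTEMD_UNIT": "nginx.service"}
{"MESSAGE": "worker exited on signal 11", "PRIORITY": "3", "_SYSTEMD_UNIT": "nginx.service"}
{"MESSAGE": "Reached target Timers.", "PRIORITY": "6", "_SYSTEMD_UNIT": "init.scope"}
"""

def errors_for(unit, stream):
    """Yield messages at syslog priority err (3) or worse for one unit."""
    for line in stream.splitlines():
        rec = json.loads(line)
        if rec.get("_SYSTEMD_UNIT") == unit and int(rec.get("PRIORITY", 7)) <= 3:
            yield rec["MESSAGE"]

print(list(errors_for("nginx.service", SAMPLE)))
```

In practice you would pipe real `journalctl -o json` output into something like this, or let `jq` do the same job.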

Most of the “big players” use a transparent structuring layer rather than making binary logs their golden source of knowledge.

Clearly our experience differs. Most syslog-ng PE customers (and customers of related products) made binary logs (either PE’s LogStore, or an SQL database) their golden source of knowledge. A lot of startups - and bigger businesses - outsourced their logging to services like loggly, which are a black box like binary logs.

These are directions to developers, not to sysadmins. Sysadmins are the ones complaining.

These are directions to sysadmins too. The majority of daemons support logging to files, or use a logging framework where you can set them up to log directly to a central collector, or to a database directly. For a huge list of applications, bypassing syslog has been possible since day one. Apache, Nginx, pretty much any Java application can all do this, just to name a few things. There are some notable exceptions such as postfix which will always use syslog, but there are ways around that too.

You can bypass the journal with most applications, some support that easily, some require a bit more work, but it has been doable by sysadmins all these years. I know, because I’ve done it without modifying any code.

I’m willing to whack some perl together to get the text log data queryable for my business, but give me a binary turd and I need tools and documentation and advice.

With the journal, you have journalctl, which is quite well documented.

1. 2

Clearly our experience differs. Most syslog-ng PE customers…

Do you believe that syslog-ng has even significant market share of users responsible for logging? Even excluding SMB/VSMB?

outsourced their logging to services like loggly, which are a black box like binary logs.

I would be surprised to find that most people that use loggly don’t keep any local syslog files.

What exactly are you arguing here?

Plenty of options there.

And?

You can bypass the journal with most applications, some support that easily, some require a bit more work, but it has been doable by sysadmins all these years. I know, because I’ve done it without modifying any code.

Right, and the goal is to get people using journald, right?

If journald doesn’t want to be used, what is its reason for existing?

1. 0

Do you believe that syslog-ng has even significant market share of users responsible for logging? Even excluding SMB/VSMB?

Yes.

I would be surprised to find that most people that use loggly don’t keep any local syslog files.

Most I’ve seen only keep local logs because they’re too lazy to clean them up, and just leave them to the default logrotate. In the past… six or so years, all loggly (& similar) users I worked with never looked at their text logs, if they had any to begin with.

Right, and the goal is to get people using journald, right?

For systemd developers, perhaps. I’m not one of them. I don’t mind the journal, because it’s been working fine for my needs. The goal is to show that you can bypass it, if you don’t trust it. That you can get to a state where your logs are processed and stored efficiently, in a way that is easy to work with - easier than plain text files. Without using the journal. But with it, it may be slightly easier to get there, because you can skip the whole getting around it dance for those applications that insist on using syslog or stdout for logging.

1. 2

Do you believe that syslog-ng has even significant market share of users responsible for logging? Even excluding SMB/VSMB?

Yes.

I think you’re completely wrong.

There are a lot of Debian/RHEL/Ubuntu/*BSD (let alone Windows) machines out there, and they’re definitely not using syslog-ng by default…

Debian publishes install information: syslog-ng versus rsyslogd. It’s no contest.

A big bank I’m working with has zero: all rsyslogd or Windows.

Also, the world is moving to journald…

So, why exactly do you believe this?

In the past… six or so years, all loggly (& similar) users I worked with never looked at their text logs, if they had any to begin with.

Most I’ve seen only keep local logs because they’re too lazy to clean them up, and just leave them to the default logrotate.

Okay, but why do you think this contradicts what I say?

You’re talking about people who have built a custom (text based!) logging system, streaming via the syslog protocol. The golden source was text files.

The goal is to show that you can bypass it, if you don’t trust it.

Ah well, this is a very different topic than what I’m replying to.

I can obviously bypass it by not using it.

I was simply trying to explain why people who complain about binary logging aren’t ignorant/crackpots, and are complaining about something important to them.

1. 1

I think you’re completely wrong.

I think I know better how many syslog-ng PE customers there are out there (FTR, I work at BalaBit, who make syslog-ng). It has a significant market share. Significant enough to be profitable (and growing), in an already crowded market.

A big bank I’m working with has zero: all rsyslogd or Windows.

…and we have big banks who run syslog-ng PE exclusively, and plenty of other customers, big and small.

Also, the world is moving to journald…

…and syslog-ng plays nicely with it, as does rsyslog. They nicely extend each other.

You’re talking about people who have built a custom (text based!) logging system, streaming via the syslog protocol. The golden source was text files.

I think we’re misunderstanding each other… What I consider the golden source may be very different from what you consider. For me, the golden source is what people use when they work with the logs. It may or may not be the original source of it.

I don’t care much about the original source (unless it is also what people query), because that’s just a technical detail. I don’t care much how logs get from one point to another (though I prefer protocols that can represent structured data better than the syslog protocol). I care about how logs are stored, and how they are queried. Everything else is there to serve this end goal.

Thus, if an application writes its logs to a text file, which I then process and ship to a data warehouse, I consider that to be binary logs, because that’s how it will ultimately end up. Since this warehouse is the interface, the original source can be safely discarded once it has shipped. As such, I can’t consider those the golden source.

If we restricted “binary logs” to stuff that originated as binary from the application, then we should not consider the Journal to use binary logs either, because most of its sources (stdout and syslog) are text-based. If the Journal uses binary logs, then anything that stores logs as binary data should be treated the same. Therefore, everything that ends up in a database ultimately makes use of binary logs. Even if their original form, or the transports by which they arrived, were text.

(Transport and storage are two very different things, by the way.)

I was simply trying to explain why people who complain about binary logging aren’t ignorant/crackpots, and are complaining about something important to them.

I never said they are. All I said is that storing logs in binary is not inherently evil, linked to blog posts where I explain pretty much the same thing, and give examples for how binary storage of logs can improve one’s life. (Ok, I also asserted that syslog and stdout are terrible interfaces for logs, and I maintain that. This has nothing to do with text vs binary though - it is about free-form text being awful to work with; see the linked blog posts for a few examples why.)

1. 1

I think I know better how many syslog-ng PE customers there are out there

Or we just have different definitions of significant.

Significant enough to be profitable (and growing), in an already crowded market.

Look, I have an advertising business that makes enough money to be profitable, and is growing, but I’m not going to say I have a “significant” market share of the digital advertising business.

But whatever.

All I said is that storing logs in binary is not inherently evil

And I didn’t disagree with that.

If you try and re-read my comments knowing that, maybe it’ll be more clear what I’m actually pointing to.

At this point, we’re just talking past each other, and there’s no point in that.

2. 2

Thanks for linking to the blog posts, they were most informative.

1. 2

I think this could use more motivation. I have never written a TODO app in js, but if the library illustrates something valuable, that might not matter.

One note about code: render() has repeated calls to Object.keys(obj).

1. 1

Maybe “TODO” in the title is a little unfortunate. The library can be used for many things: rendering any kind of DOM element, such as lists, menus, even single HTML elements. It’s like a very small version of React or Vue.js with a completely different architecture and logic.

1. 2

Yeah, TODO has an association with toy apps, so I’d drop that word. Looking back at the readme, you might not need any other changes.

1. 1

That is actually a great suggestion!

1. 4

My experience is that this is just utterly wrong - I’m not even sure how to start to respond. Of course the best way to express a program fragment is a programming language. Of course the best way to think about a program is with a programming language. There is no distinction between programming and mathematics - of course you want to think mathematically about what you’re constructing, but the best languages for that are programming languages. Why would you want two subtly different descriptions of your program that need to be kept in sync when you could have one description of your program? Some programming languages distract from writing a good expression of your construction by making you specify irrelevant execution details, but the appropriate response is to avoid those languages. A good mathematical description of your algorithm is a program that implements your algorithm, given a decent compiler - and thankfully we’re good enough at those these days.

1. 9

Lobster’s own Hillel expressed it really well just a few days ago:

So many software development practices - TDD, type-first, design-by-contract, etc - are all reflections of the same core idea:

It’s reasonable to want that “plan ahead” stuff to be incorporated in the program (design by contract, tdd), but using an external plan can have a large chunk of the benefit.

1. 6

Maybe, but that’s not an argument for using a non-integrated, harder-to-check plan if you have the option of building the “plan” right into the program.

1. 4

Because there’s design tradeoffs in specification. Integration is a pretty big benefit but also a pretty big cost, often reducing your expressiveness (what you can say) and your legibility (what properties you can query). As a couple of examples, you can’t use integrable specifications to assert a global property spanning two independent programs. You also can’t distinguish between what are possible states of the system and what are valid states, or what behavioral properties must be satisfied.

2. 7

Strong disagree here; you get a lot more expressive power when using a specification language. Let me pose a challenge: given a MapReduce algorithm with N workers and 1 reducer, how do you specify the property “if at least one worker doesn’t crash or stall out, eventually the reducer obtains the correct answer”? In TLA+ it’d look something like this:

(\E w \in Workers: WF_vars(Work(w))) /\ WF_vars(Reducer)
=> <>[](reducer.result = ActualResult)


Why would you want two subtly different descriptions of your program that need to be kept in sync when you could have one description of your program?

I’ve written 100 line TLA+ specs that captured the behavior of 2000+ lines of Ruby. Keeping them in sync is not that hard.

1. 9

I’ve written 100 line TLA+ specs that captured the behavior of 2000+ lines of Ruby. Keeping them in sync is not that hard.

Keeping code in sync with comments that literally live along side them is even more “not that hard”, and yet fails to happen on an incredibly regular basis.

In my experience, in any given system where two programmer artifacts have to be kept in sync manually, they will inevitably fall out of sync, and the resulting conflicting information, and confusion or mistaken assumption of which one is correct, will result in bugs and other programmer errors impacting users. The solution is usually to either generate one artifact from the other, or try to restructure one artifact such that it obviates the need for the other.

1. 4

Keeping code in sync with comments that literally live along side them is even more “not that hard”, and yet fails to happen on an incredibly regular basis.

The difference is that if your code falls out of sync with your comments, your comments are wrong. But if your code falls out of sync with your formal spec, your code probably has a subtle bug. So there’s a lot more institutional pressure to update your spec when you update the code, just to make sure it still satisfies all of your properties.

The solution is usually to either generate one artifact from the other, or try to restructure one artifact such that it obviates the need for the other.

This has been a cultural problem with formal methods for a long time: people don’t value specifications that aren’t directly integrated into code. This has held the field back, because actually getting direct integration is really damn hard. It’s only in the past 15ish years that we’ve accepted that it’s alright to write specs that can’t generate code, and that’s why Alloy and TLA+ are becoming more popular now.

1. 6

What justifies that assumption? Some junior will inevitably, in response to some executive running in with their hair on fire over some “emergency”, alter the behaviour of the code to “get it done quick” and defer updating the spec until a “later” that may or may not ever arrive. Coming along later and altering the code to match the spec then re-introduces the emergency situation.

The fundamental problem here is that you’ve created two sources of truth about what the application should be doing, and you cannot a priori conclude that one or the other is always the correct one.

1. 6

And what happens when that “emergency” fix loses your client data, or breaks your data structure, or ruins your consistency model, or violates your customer requirements, or melts your xbox, or drops completed jobs?

Yes, it’s true that sometimes the spec needs to be changed to match changing circumstances. It’s also seen again and again that specs catch serious bugs and that diverging from them can be seriously dangerous.

1. 2

And what happens when that “emergency” fix loses your client data, or breaks your data structure, or ruins your consistency model, or violates your customer requirements, or melts your xbox, or drops completed jobs?

Nobody’s arguing that the spec is useless, just that the reality is that it does introduce risks that require care and attention and which cannot be handwaved away with “keeping them in sync is not that hard” because sync issues will bite organizations in the ass.

2. 2

It’s only in the past 15ish years that we’ve accepted that it’s alright to write specs that can’t generate code, and that’s why Alloy and TLA+ are becoming more popular now.

It would be very helpful if you could at least generate test cases from those specs though. But then that’s why I work on a model based testing tool ;)

1. 2

Which one?

1. 3

Proprietary of my employer (Axini). Based on symbolic transition systems, a Promela and LOTOS inspired modeling language and the ioco conformance relation. Related open source tools are TorX/JTorX/TorXakis. Our long term goal is model checking, but we believe model based testing is a good (necessary?) intermediate step to convince the industry of the added value by providing a way where formal modeling can directly help them test their software more thoroughly.

1. 2

Really neat stuff. Thanks. I’ll try to keep Axini in mind if people ask about companies to check out.

1. 2

Thanks, I’ve also been regularly forwarding articles and comments by you to colleagues :)

1. 1

Cool! Glad they might be helping y’all out. :)

3. 2

This was a problem in high-assurance systems. All write-ups indicated it takes discipline. That’s no surprise given that’s what good systems take to build regardless of method. Many used tools like Framemaker to keep it all together. That said, about every case study I can remember found errors via the formal specs. Whether it was easy or not, they all thought they were beneficial for software quality. It was formal proof that varied considerably in cost and utility.

In Cleanroom, they use semi-formal specs meant for human eyes that are embedded right into the code as comments. There was tooling from commercial suppliers to make its process easier. Eiffel’s Design-by-Contract kept theirs in the code as well with EiffelStudio layering benefits on top of that like test generation. Same with SPARK. The coder that doesn’t change specs with code or vice versa at that point is likely just being lazy.
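A back-of-the-envelope sketch of the Design-by-Contract idea, in Python rather than Eiffel (Eiffel and SPARK support this at the language level; the decorator here is purely illustrative of how the spec can live right next to the code):

```python
import functools

def contract(pre, post):
    """Attach a precondition and a postcondition to a function.
    pre(*args) must hold on entry; post(result, *args) on exit."""
    def wrap(fn):
        @functools.wraps(fn)
        def checked(*args):
            assert pre(*args), "precondition violated"
            result = fn(*args)
            assert post(result, *args), "postcondition violated"
            return result
        return checked
    return wrap

@contract(pre=lambda xs: len(xs) > 0,
          post=lambda r, xs: r in xs and all(r <= x for x in xs))
def smallest(xs):
    return min(xs)

print(smallest([3, 1, 2]))  # 1
```

Because the contract is part of the function definition, changing the code without changing the spec fails loudly at the call site instead of silently diverging.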

2. 3

Why would you want two subtly different descriptions of your program that need to be kept in sync when you could have one description of your program?

For example to increase the number and variety of reviewers and thus reducing bugs.

A good mathematical description of your algorithm is a program that implements your algorithm

Are you thinking of a specific compiler? I agree that programmers think mathematically even when they use programming languages to express their reasoning, but I still feel some “impedance” in every language I use.

1. 3

For example to increase the number and variety of reviewers and thus reducing bugs.

That seems very unlikely though. Even obscure programming languages are better-known than TLA+. More generally I can’t imagine getting valuable input on this kind of subject from anyone who wasn’t capable of understanding a programming language. I find the likes of Cucumber to be worse than useless, and the theoretical rationale for those is stronger since test cases seem further away from the essence of the program than program analysis is.

Are you thinking of a specific compiler? I agree that programmers think mathematically even when they use programming languages to express their reasoning, but I still feel some “impedance” in every language I use.

I mostly work in Scala so I guess that influences my thoughts. There are certainly improvements to the language that I can imagine, but not enough to be worth using something other than the official language when communicating with other people.

1. 1

Somewhere out there, there is an exchange where Matthias Felleisen says something along the lines of dynamic languages allow a programmer to fruitfully apply two mutually incompatible, or at least hard to reconcile, typing disciplines to a program at once. I thought it was on Bob Harper’s blog, but I can’t find the source.