TL;DR Lobbying groups are pushing for new standards and “certifications”, so that they can sell audits, or certify audit resellers. Most CTOs are happy because there’s the word cyber in there too, and it even talks about Linux, so it must be important.
The security audit industry isn’t exactly looking for more work at the moment. I think what we’re seeing is quite the opposite: there’s no Volkswagen/PSA-sized European software company giving politicians their marching orders, which leaves them with the impression that the economic cost of implementing these directives will mostly be borne by the US and China.
They’re not considering that the multi-million-dollar US-made office product they’re using might have a critical dependency on a library made by a 27-year-old living in a small apartment in Riga. And if they are, they probably see it as something that should be outlawed.
They’re not considering that a lot of employment in the software industry (especially in the EU) isn’t found in the offices of multinationals making consumer apps that are used by billions, but in small shops producing tailor-made software for a specific sector of industry or country.
The security audit industry isn’t exactly looking for more work at the moment.
Clearly, a remarkable fact about the tech industry is that it is content with a paced growth rate, and not looking for ways to make the line go up faster…
This act will mainly make vendors slap another badge on their website to prove that they have jumped through various hoops and filed security compliance forms. That it is about “cyber” security is almost a detail…
Security audits (currently) don’t scale the way the rest of the tech industry does; automated scanning tools exist, but as far as I understand, a lot of the auditing process is fairly labour-intensive work.
I think we can count ourselves lucky if this act amounts to “just” another badge to slap on a website.
Xcode does. Visual Studio (Code and regular) does. The whole IntelliJ suite does. The Java JDK has telemetry, but I think it’s opt-in.
It’s a very normal feature that’s really useful for people who work on tools to know that they’re investing in the right places.
The difference is that Go proposed something transparent rather than obscured and at the whims of large companies.
But we can’t have nice things.
It’s a very normal feature
It isn’t. You just gave a few examples, which means that most compilers and most interpreters, such as Python, GCC, LLVM/Clang, Perl, PHP, Ruby, Tcl, D, SBCL, CLisp, and so on, do no such thing, and feel no need to. Trying to normalize it is creepy, and trying to do so by merely stating that it’s normal is really something else.
It is a normal thing for proprietary software. I think that is one of the driving factors making this controversial: Golang is ostensibly an open-source platform, but that brings expectations that are sometimes at odds with its historical origin as a Google initiative.
The informal, soft power companies can have over open-source technologies that people depend on creates resentment.
Yeah, I read those last few comments, and which compilers had telemetry, and I think you’ve hit the nail on the head. Go-with-telemetry has to be considered a proprietary platform in a way that go-without-telemetry doesn’t.
Careful how you use ‘proprietary’ here, I’m sure some pedant somewhere would point out that the license is still OSI. However, governance has always been a blind spot of open licensing, and that is where this issue falls.
All of those, with the exception of the JDK, are IDEs. They are not compilers.
It’s somewhat defensible to have telemetry in an IDE, and as far as I’m aware, IntelliJ and Visual Studio both asked me before collecting it.
The reasons they give for wanting telemetry in the Go compiler – the public reasons, notwithstanding any private reasons that we don’t know – are weak at best, and just serve to reinforce the reasons I dislike Go at worst.
For example, tracking what platforms people compile for. Why not just let the community maintain ports? It amazes me that LLVM can manage to have community-built and driven ports to the M68K platform, despite LLVM being a significantly more complex codebase than Go. Yet, Go won’t even let users of Power support ISAs lower than Power8. Even when the community gave them PRs and offered CI, they refused it! Large commercial customers using Go on Power7/AIX were even told to pound sand, let alone those of us trying to run Linux workloads on older Power hardware.
I don’t know what Go compiler authors want telemetry for, but as a compiler maintainer I would certainly be interested in some telemetry-style data to see the wrong code that people write, and be able to use that to improve error messages and/or think about how to avoid common mistakes. It is easy to find valid code in my language over the internet, but people almost never commit code that does not compile. All the intermediate states that are ill-parsed or ill-typed, but people wrote because it felt natural, this is what I would love to have access to. Of course this could be opt-in, and a good rule of thumb would be to only collect this for projects that are already publicly available – to make sure that there are as few privacy concerns as possible.
I thought of a design once: have the compiler create a git repository somewhere on user machines (one git repository per project in the language), and then commit incorrect/invalid files there on each failed compile. Once in a while, show users a message saying: “hey, would you like to send your git repo for us to look at your errors and improve the compiler?”. (In particular, users decide to send their data over the network, and it is in a format that they can easily inspect to make sure they are okay with its content.)
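For what it’s worth, here’s a minimal sketch of that design, shelling out to the git CLI; the `.compile-errors/` directory name and the call site are purely hypothetical, not taken from any real compiler:

```go
// A rough sketch of the design described above, not any real compiler's
// telemetry: on a failed compile, snapshot the offending file into a local,
// per-project git repository that the user can inspect and later decide
// to share, or not.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
)

// recordFailedCompile copies srcFile into the local repo and commits it.
// Nothing ever leaves the machine; sharing would be a separate, manual step.
func recordFailedCompile(projectDir, srcFile, compilerError string) error {
	repo := filepath.Join(projectDir, ".compile-errors")
	if err := os.MkdirAll(repo, 0o755); err != nil {
		return err
	}
	// "git init" is safe to repeat on an existing repository.
	if err := run(repo, "git", "init", "--quiet"); err != nil {
		return err
	}
	data, err := os.ReadFile(srcFile)
	if err != nil {
		return err
	}
	if err := os.WriteFile(filepath.Join(repo, filepath.Base(srcFile)), data, 0o644); err != nil {
		return err
	}
	if err := run(repo, "git", "add", "."); err != nil {
		return err
	}
	// Keep the compiler error in the commit message so it is easy to review.
	msg := fmt.Sprintf("failed compile: %s\n\n%s", filepath.Base(srcFile), compilerError)
	return run(repo, "git", "commit", "--quiet", "--allow-empty", "-m", msg)
}

func run(dir, name string, args ...string) error {
	cmd := exec.Command(name, args...)
	cmd.Dir = dir
	return cmd.Run()
}

func main() {
	// Hypothetical call site, invoked by the compiler after a failed build.
	_ = recordFailedCompile(".", "main.go", "syntax error: unexpected '}'")
}
```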
I don’t know what Go compiler authors want telemetry for, but as a compiler maintainer I would certainly be interested in some telemetry-style data to see the wrong code that people write, and be able to use that to improve error messages and/or think about how to avoid common mistakes.
And this is a reason people run screaming away from telemetry, even if it’s arguably well-intentioned: If my compiler is sending my code to some other entity, that code can be used against me in a court of law. Am I writing encryption code? Am I writing code some AI flags as detrimental to some DRM scheme? It’s impossible to tell what could happen, potentially, and some of the scenarios are so horrible they outweigh the potential good.
I brought this up in the GitHub discussion and here, but got shouted down and silenced quite effectively.
Which is a bit suspicious.
as a compiler maintainer I would certainly be interested in some telemetry-style data to see the wrong code that people write
So do I, but I’d hate to actually be responsible for processing it. People accidentally paste a load of sensitive things into code, or write almost-correct commercially sensitive code all the time. The only way I’d be happy collecting this would be to have a local service trying to extract representative samples and then an explicit manual step for users to approve uploading them.
Yes, see the design I sketched above with local git repositories:
Unfortunately your idea of “representative samples” sounds in fact very, very hard to do right. In general I don’t know what I’m looking for in this data yet, my queries may change over time, and I don’t know how to summarize it in a way that remains useful. There has been work on automatically minimizing buggy code, and we could consider doing it, but minimizing is compute-intensive (so would users really want that?). I think that for parsing errors, one could design specific summary formats that would make sense. For typing errors it is much harder in general (but doable; the keyword is “type error slicing”), and understanding a typing error usually benefits from being able to build the affected parts of the project and also understand the code, which minimization could easily prevent. And for “what kind of bugs do people introduce in their code that passes the type-checker but is caught by the testsuite”, automatically minimizing that in a way that does not hamper our ability to analyze errors and turn them into language/tooling design feedback, well, that sounds like a research project of its own (starting from the existing work on program slicing, but in a way that preserves useful context for code comprehension).
And I think that for the people worried that their code could contain something very problematic, minimization/summarization is not necessarily going to reassure them. They will disable any feedback process in any case. Their choice! So maybe working hard on summarization is not worth it if the intent is to reassure people. I think it is just easier to work with the other sort of people who, like, “would enjoy streaming their coding session anyway but never bothered to set it up”.
I wonder if it would make sense to generate an “errors digest” or something, as build output, which could optionally (but encouraged) be committed directly to source, like a lockfile (a hypothetical entry is sketched below).
It’d have to be trivially merged by git, but the compiler itself could provide a merge tool and some registry somewhere where projects could opt-in, probably associated with a package manager.
Then full opt-out is “gitignore”,
the default is “people working on this project can use that telemetry”, since the tools are built in
and opt-in is “register my project with the telemetry scraper, which pulls from git”
I guess this doesn’t handle the “private code” argument, but it would allow for some of what you’re looking for, I think, on a per-project rather than per-user basis, which I think helps with the PII argument.
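To make that digest idea a bit more concrete, here’s a purely hypothetical sketch of what one record could look like (one JSON object per line; not an existing format, and Go is just an arbitrary choice for the illustration):

```go
// Purely hypothetical illustration of an "errors digest" record, emitted as
// build output and committed like a lockfile. One JSON object per line keeps
// the file diff-friendly; a dedicated merge driver (as suggested above) would
// still be needed to resolve concurrent appends.
package main

import (
	"encoding/json"
	"fmt"
)

type DigestEntry struct {
	CompilerVersion string `json:"compiler_version"`
	ErrorCode       string `json:"error_code"` // e.g. "undeclared_identifier"
	Count           int    `json:"count"`      // occurrences since the last digest update
	File            string `json:"file"`       // project-relative path only, no source contents
}

func main() {
	entry := DigestEntry{
		CompilerVersion: "0.0.0-hypothetical",
		ErrorCode:       "undeclared_identifier",
		Count:           3,
		File:            "src/parser.x",
	}
	line, _ := json.Marshal(entry)
	fmt.Println(string(line))
}
```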
Yes, committing errors into the project proper is one way to go about it, if we can make the UI unobtrusive enough.
(This does not handle “private code” but I think that it is realistic to assume that, fundamentally, if people are not willing to make their code public, they probably don’t want to export telemetry information about it either unless you force them to. Luckily there are many programmers willing to write public code.)
Anecdotes about stolen backpacks in high school are cute, but how is this different from the many other redis-backed locks on rubygems.org, or at least the most popular, redlock?
Redlock is distributed. You use multiple Redis instances to decide who gets the lock, similar to Paxos or Raft. Most people don’t need this complexity.
Simple mutexes like this one use a single Redis and aren’t fault tolerant if Redis dies.
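For anyone unfamiliar with the pattern, a rough sketch of such a single-Redis mutex (not this gem’s actual code; go-redis v9 and the key name are just assumptions for illustration):

```go
// Minimal single-Redis mutex sketch: SET NX with a TTL to acquire, and a
// tiny Lua script on release so we only delete a lock we still own.
// No fault tolerance: if this one Redis dies, the lock goes with it.
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

const releaseScript = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
else
  return 0
end`

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})

	// Random token so we can tell our lock apart from a later holder's.
	buf := make([]byte, 16)
	rand.Read(buf)
	token := hex.EncodeToString(buf)

	key := "lock:nightly-report"

	// Acquire: succeeds only if the key does not exist; the TTL keeps a
	// crashed holder from wedging the lock forever.
	ok, err := rdb.SetNX(ctx, key, token, 30*time.Second).Result()
	if err != nil || !ok {
		fmt.Println("could not take the lock (held elsewhere, or Redis is down)")
		return
	}
	// Release only if the stored value still matches our token.
	defer rdb.Eval(ctx, releaseScript, []string{key}, token)

	fmt.Println("lock held, doing the critical-section work…")
}
```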
Thanks Mike, that’s exactly what the decision was here. I needed a simple mutex to reach for, and didn’t want to manage multiple Redis instances for this. It wasn’t something I was worried about if the single Redis went down, etc. So I made something I thought was easy to grok and maintain.
Google’s justification, that the cost is too high, does not seem entirely honest from the outside. Their proxy is broken if it needs several full git clones daily.
Sourcehut didn’t apply for delisting when offered, which would have led them to the same state some 2 years ago… I guess they preferred serving the traffic all this time when it was already clear that Google wouldn’t lift a finger.
Both parties here show a great deal of entitlement, which is not helping when you’d want to root for the underdog.
Do you mean SourceHut’s? SourceHut is complaining that Google’s proxy is causing them high cost.
And how the hell is SourceHut showing “entitlement” here? Not DoSing another server with your huge enterprise pipe, pointlessly, because you can’t be bothered to code the most basic internal synchronization is not entitlement; it’s very basic etiquette.
They offered to stop 2 years ago and he refused out of principle. You may care or not care about the ideology he’s espousing but it is disingenuous to say they didn’t offer to solve the problem in a way that didn’t harm end users.
They offered to stop DDOSing the official instance only, not other instances, so that wouldn’t really solve the problem.
You should not have to opt out of denial of service attacks.
When you get immense traffic and the source of the traffic offers to exempt you specifically from the massive demand their bad code is creating for no good reason, then it is morally righteous to take a stand and say “no. fix your shit.”
Sourcehut did not suffer from a DOS attack from Google though. They complain about excessive traffic, and have been able to sustain it for at least 2 years.
I’m sure they deserve being called “morally righteous” though. 🤣
How dare he want a solution that would benefit all people using his software and not just him! So entitled!
When someone [Google] offers you a solution to a particular issue, but you refuse it for reasons and instead require them to fix the issue in a different manner that better suits you, I call that entitlement.
I’d add self-righteousness to the mix for ranting about how bad they’ve been to you, and the rest of the Internet, and how you’re forced to take a solution that may be detrimental to your users, instead of trying to find a middle ground.
Luckily, sr.ht managed to implement account deletion (it only took 2 years).
If someone was periodically breaking into your house and offered to stop if you write your address on a list, would you consider that fair?
Is that a fair analogy though? I’m not trying to defend Google, they have a problem and are not solving it, but internet sites and houses aren’t really the same, especially in this case where they both run publicly available services.
Perhaps a business analogy would work better.
If you ran a small shoe store, with only your spouse and you working, and Nike kept calling and asking if you had time to repair some shoes, “just in case someone needed some repairs”, that would be closer, I think? I’m probably just overthinking it.
Personally I agree that yes, Google is acting entitled here; and Drew, he’s not exactly entitled, but he is ignoring reality.
I know I’m mostly wrong, but it does look like this from the sidelines: “Sorry we’re doing an inconvenience. Want us to avoid bombing your site?” - “No, I don’t, I want you to accept the same ideology as I have.”
Is it weird that I’m hoping the source code is all made public? It’s so hard to find modern enterprise production codebases; most people only have access to the code from whatever company they work for. I’d like to study it!
You wouldn’t steal a car
You wouldn’t steal a handbag
You wouldn’t steal a television
You wouldn’t steal a movie
Downloading pirated source code is stealing,
stealing is against the law,
PIRACY. IT’S A CRIME
No, it’s not weird; I’m mildly curious too. I’d be surprised if it’s made public very soon; it’s probably for sale somewhere… :)
Depending on what you’re looking for it might be not too hard to come by.
A while ago almost all Twitch source code was leaked. 125 GB. Mostly Ruby/Rails, IIRC.
GitLab is pretty much completely open source.
GitHub’s source protection used to be (it probably still is, but it used to be, too) token. Google/duck/whatever: deobfuscate GitHub Enterprise. A relatively clean, bog-standard Rails app.
Back in the day, almost the complete Windows source code was leaked. I’m sure it’s still out there somewhere if you’re into that sort of thing.
A few AAA games with source were leaked, too.
I’m sure this list is incomplete.
The blockchain will finally be powered by AI, and will make Web 3.0 an attainable reality. This will revolutionize the way that we interact with people online, and accelerate the Fortran movement of virtual WASM desktops on OpenBSD.
(Feel free to help me train the next generation of ChatGPT in replies.)
Isn’t filtering based on the name of the executable pretty naive? In a real setup you’d probably add a wildcard allow for stuff like Firefox, curl, etc. A malicious program could then just use them to access the network, bypassing mighty-snitch. A few years back I used to use simplewall, it had the same problem.
If you want to filter network requests by program, then one better approach could be to only allow network access from processes launched by some magic wrapper program, which’d ask for your permission every time.
For example - instead of running firefox you’d run wrapper firefox, which would ask for your permission, and then run Firefox with full network access. Similarly, if you were doing anything network-related on the terminal, you’d run wrapper sh to give yourself network access, without letting unrelated processes abuse it in the meantime.
absolutely, but it’s much better than nothing. i don’t wildcard anything, though i do wildcard subdomains a lot, especially for firefox.
full address wildcard exists because for a lot of people they might not use a snitch without it. we have to cater to convenience or no one will use security enhancing software. they can understand the tradeoffs and improve their usage over time. or not. blocking ads and trackers is still good even for someone with very limited security needs.
firefox doesn’t typically have a cmdline, but curl does. so a rule for curl can be specific to: curl -v google.com. wildcard address with wildcard cmdline is probably not great.
the possible duration of a rule is 1-minute, 24-hour, and forever. more secure on the left, more convenient on the right. i’m experimenting with more liberal use of non-forever durations. permanent rules considered harmful.
a snitch isn’t going to make you perfectly secure. who can write to the rules file? who can modify binaries specified in rules? who owns a domain you’ve whitelisted?
a snitch should help you observe and consider unusual network activity. a snitch may help you prevent a malicious program from functioning. most programs should not be making network requests. most network requests should be to domains that make sense. most network requests should be at times that make sense. the rest get an eyebrow raise, some consideration, and a block.
future work is securing the filesystem and using the checksum of binaries as a part of the rule. finding a way to make firefox run with a distinct cmdline per url would also be good.
i explored including filesystem filtering in this snitch via lsm path hooks, but ended up dropping it for now. likely fuse is a better approach for this, but i’m undecided and don’t have a solution yet. i would like to know which binaries are reading my aws credentials file, and raise my eyebrows accordingly.
a snitch isn’t going to make you perfectly secure. who can write to the rules file? who can modify binaries specified in rules? who owns a domain you’ve whitelisted?
I was mostly thinking about how a snitch could work in an otherwise secure system. Not that we really have any at the moment :(
i explored including filesystem filtering in this snitch via lsm path hooks, but ended up dropping it for now.
It would be wonderful if you eventually figured that out! Personally I think Linux is just too lax about security for something like that to be viable, but I’d love to be proven wrong!
I was mostly thinking about how a snitch could work in an otherwise secure system. Not that we really have any at the moment :(
lol, true. this is fine. kind of working snitch feels better than zero snitch. using my iphone feels like network roulette. yes, i do feel lucky.
on my github i have another project called tiny-snitch. it is otherwise identical except that it doesn’t know about exe/cmdline. the benefit of this is that it can run upstream, ie tiny-snitch runs on your wireguard server and your iphone/laptop/windows all tunnel through that. then it sends prompts to sms/signal/email/somewhere? upstream-snitch feels like a potentially good idea, but exe/cmdline capable local-snitch is so convenient.
It would be wonderful if you eventually figured that out! Personally I think Linux is just too lax about security for something like that to be viable, but I’d love to be proven wrong!
i haven’t published my attempt but it definitely kind of works via the lsm route. lsm has many path_* hooks. my takeaway was that it is very brittle and will take a long time to stabilize, if it even can. linux boots fine without network, not so much without filesystem.
moving secrets out of environment variables and into files guarded by a snitch feels like a good idea. my next attempt will be via fuse, which can access caller pid/tgid via fuse_get_context. i’m not sure snitch for the entire filesystem is a good idea, but for a single place it might be. there’s no place like ~/secure/*.
If you want to filter network requests by program, then one better approach could be to only allow network access from processes launched by some magic wrapper program, which’d ask for your permission every time. For example - instead of running firefox you’d run wrapper firefox (…)
That sounds a lot like firejail (and other implementations of that idea, of course).
while a good approach, it doesn’t help you when some malicious foss library drops an executable and crontab somewhere on your system. unless you firejail pid1, which does actually kind of work!
Did you look at the stdlib documentation, or are you looking for something else? Here is a short example from ziglearn.org.
If I’m not mistaken, apart from threads, zig also has some async features that should get a bit more love in the future™️.
I think that’s the first time I’ve seen communication from SH indicating a clear way to get your account deleted: by refusing the ToS change rather than via a “delete my account” button. Good!
My bank does the same thing occasionally “here are the new ToS for your account, go away if you don’t like them”. :)
Best comment in the tracker so far:
I guess my question is: how do you gauge ecosystem interest? What are we (large and small entities alike) missing to signal that properly?
They actually did nothing of the sort.
We’ll know the real reasons from some biography published 20-50 years from now.
I thought this Meta burn was pretty good too:
… this thread alone shows a great deal interest from some small time companies like adobe and intel, other smaller companies like facebook have expressed interest in it too.
I really dislike the cattle vs. pets analogy, because it reinforces a speciesist world view that sadly is very prevalent in almost all parts of the world.
I like it because it refers to a perspective shared among those who see it. Even those who disagree, understand the meanings behind it.
Ehhhhhh… I mostly agree in this instance, but this is unfortunately not a great line of argument to take in general. You can apply the argument to any kind of human discrimination in history, and it’s just as true. If you describe something as “X for white’s, not spic’s” people will generally understand the meanings behind it, but that doesn’t mean you couldn’t do better.
Reproducible and non-reproducible configurations. The key feature of cattle that we are trying to isolate is the ability to spawn new machines with a known-good configuration on a reliable basis; in other words, cattle are reproducible.
I really like that term. It also doesn’t take size into account so much, and it leaves the question of how these goals are achieved wide open.
Other than that, it is a really bad analogy in general. In most situations it’s just “now your cluster is the pet”. It’s also bad because it somehow makes it sound like keeping cattle is easier than keeping a pet.
On top of that it’s in the same line as tons of other lines meant to shut people up before arguing, so you don’t have to know what you are talking about.
A similar one is that complaining about the term serverless is like complaining about horseless carts, when in reality it’s more like calling a cab “carless”.
I really think those statements and analogies do the industry a huge disservice and should be abandoned altogether. Not because analogies are always bad (they are pretty much always imprecise, though), but because they don’t even serve the purpose of analogies, which is explaining things well. We have good analogies in IT that mostly work, from cryptographic keys, to files, to directories (or folders if you are into that). What they have in common is that they explain something better than any technical terms. Cattle and pets don’t. At best they make bold claims about how things work, but they tend to break easily no matter what direction you go. Think about protecting your cattle or your pet: how does the analogy even work in terms of security, which is a big part? For files, again, it works. Putting a file into the trash bin, retrieving it, emptying the trash, all of that works pretty well. Protecting your file also works well with analogies.
I think the difference is that some of these “bad” analogies are being mostly used for marketing and like mentioned to bring points across when you lack good arguments.
This is for people who need to administer a handful of machines, all fairly different from each other and all Very Important. Those systems are not Cattle! They’re actually a bit more than Pets. They’re almost Family. For example: a laptop, workstation, and that personal tiny server in Sweden. They are all named after something dear.
They’re almost Family
Sounds like a good motivation to murder them personally and replace with Cattle provisioning.
Backups are great, until you want to upgrade to a new version of $software or $os. Then the backup needs to be applied, but are you sure each part needs to be there? Or that you didn’t miss something?
Additive configuration, like we use for cattle, will work when you change something underlying, like the OS.
FreeBSD and NixOS both let you keep the last old version around and reboot into it whenever you want. Others may or may not.
disagree, cattle techniques don’t mean you can’t have extensible config management
although Chef isn’t easy to learn for a lot of folks, i’m glad i already know it, it’s easier to see when you have exactly as much extensibility as you need in your config management and not more than that… just like writing good software
Slight disagree on the idea and completely disagree on the threshold. In my experience, cattle management is extremely worth it on anything above 1. Otherwise at some point a change will be made to one of the hosts but not the other, or multiple things will change out of order. It’s basically inevitable with enough employees and time.
For the idea itself, I’m finding it worth it to manage everything that way. After a disk failure I could rebuild my home server in minutes from the NixOS description, rather than trying to recover backups and figure out exactly how things were configured.
I’ve embraced this strategy as well now for at least the last decade, just swapping out nix for a config management system.
I keep backups of data (ex. Fileserver), but not system/program state. I could never go back, it feels wasteful of time and disk space now.
Maaaaaybe. Ansible does pretty well for me with about 4 pets of various kinds. Some effort goes into making sure they are all quite similar though: all run Debian, they’re all within one version of each other, all have the same copy of my home dir on it even if they only need a few bits of it, etc. Each just has their own little config file that layers atop that general setup.
Hard agree. And that’s what I like about this post. But I think having systems that are very easily replaceable pays off even at small scale. Like someone offering me 3 free months of hosting for my lone cloud server if I move to their platform.
Good thing we’re not relatives. /s :)
I’d also rather use the larger ops tools: if only because you’ve got more chances to encounter them elsewhere, and that’s knowledge you’ll be able to reuse. Pets would not work for me, but I’m sure it’ll be useful to someone else. I’ll stick to ansible playbooks for now.
Yeah, even if it’s one app, I’d rather make a terraform / ansible deployment strategy because I’ll be able to recreate it when requirements inevitably start requiring redundancy or what have you.
It’s a widely used library implementing both new and legacy algorithms, where almost every line has security implications. Why would you not expect a significant issue with it periodically?
They’re not doing the same thing though. Libre started with removing a lot of code and rarely used ciphers. It’s a good thing of course, but they did limit the scope, so the issue rate fell.
Then openssl adopted some of the generic improvement ideas from libre, so they are getting closer as well.
Yes, that’s a very valid method of getting to a more secure library. That’s like saying “you can’t compare an SUV’s mileage to a hybrid car’s mileage because they’re different weights” - they are indeed different weights, but that’s the point and that’s why they’re being compared.
But even setting that aside, a lot of the code LibreSSL removed was not just rarely used ciphers. It was stuff like, an internal, bespoke malloc() implementation that defeated tools like Valgrind and OpenBSD’s hardened malloc() implementation (OpenSSL would call libc malloc() once at startup to allocate what essentially amounts to an arena and then divvy up the arena to callers with an OpenSSL function). Things like, support code targeting the lowest common denominator so OpenSSL could run on e.g. VMS (except that even when there were better platform APIs, they wouldn’t be used over the lowest common denominator stuff). All of this stuff is complexity that increases the likelihood of bugs, security or otherwise, but does not impact what ciphers/APIs/whatever other properties you care about as a consumer are available.
Look at virtually anything on this (admittedly snarky) blog for more.
Exciting times.
I’ve been sneaking it in at work to replace internal tools that have 1.5 second startup delay and 200+ MB on runtime dependencies with fast, static little zig exes that I can cross-compile to every platform used in the workplace.
I find your story more flattering than any big tech company deciding to adopt Zig. Thank you very much for sharing!
My impression of Zig and you all who are behind it has been that you care about these use cases at least as much as enabling big complex industrial applications, and not only in words but in action. :)
I actually started out with Rust, which I thought would be more easily accepted. I work in the public sector and tech choices are a bit conservative, but Rust has the power of hype in addition to its nice qualities, and has some interest from techy people in the workplace.
But then the easiest way to cross-compile my initial Rust program seemed to be to use Zig, and I didn’t really want to depend on both of them!
Seems like Go would be a natural choice. Far more popular than Zig and cross-compiles everywhere. Why Zig?
Not OP, but I can’t stand programming in Go. Everything feels painful for no reason. Error handling, scoping rules, hostile CLIs, testing, tooling, etc.
My greatest hope for Zig is that I can use it to replace Go, not just to replace C.
@kristoff what’s your take on that? Given that Zig has higher-level constructs like async/await built-in, with the support of higher-level APIs, are there reasons programming in Zig can’t be as convenient as programming in higher-level languages like Go?
I’m not going to argue with that, but if you’re my report and you’re building company infrastructure in some esoteric language like Zig, for which it will be impossible to find team members to maintain said infrastructure after you leave, we’re going to have a serious talk about the clash between the company’s priorities and your priorities.
OP said “sneaking in at work”. When working in a team, you use tooling that the team agrees to use and support.
Oh, can’t disagree there. I’m just hoping that someday I can replace my boring corporate Go with boring corporate Zig.
Two half-baked thoughts on this:
1. small, well-scoped utilities should not be hard for some future engineer to come up to speed on, especially in a language with an ever-growing pool of documentation. if OP was “sneaking in” some Brainfuck, that’s one thing. Zig? that’s not a horribly unsafe bet - it’s a squiggly brace language that looks and feels reasonably familiar, with the bonus of memory management thrown in
2. orgs that adhere religiously to “you use tooling that the team agrees to use and support” tend to rarely iterate on that list, which can make growth and learning hard. keeping engineers happy often entails a bit of letting them spread their wings and try/learn/do new things. this seems like a relatively lower-risk way to allow for that. mind you, if OP were “sneaking in” whole database engines or even Zig into hot-path app code without broader discussion, that’s a whole other problem, but in sidecar utility scripts? not much worse than writing a Bash script (which can often end up write-only anyway) IMO
Pretty much this, in my case.
The “sneaking” part was not entirely serious.
I have used it before at work to implement external functions for Db2, which has a C API, which is very easy to use with Zig: import the C headers, write your code, add a C ABI wrapper on top. Using it just as “a better C” in that case.
And while we mostly use “boring” old languages, there are some other things here and there. It’s not entirely rigid, especially not outside of the main projects.
(1) assumes that there is no cost to adding an additional toolchain simply because it’s for a small/self-contained utility, which I’d hope people understand is simply not true.
(2) you’re not wrong about tooling conservatism, but that’s because your statement (1) is false: adding new tools has a real cost. The goal of a project is not to help you learn new things; that’s largely a happy coincidence. More to the point, you’re artificially limiting who can fix things later, especially if it’s a small out-of-the-way tool: once you’re gone, if any issues arise, any bug fix first requires learning a new toolchain not used elsewhere.
At least in my own domain (stuff interacting with other stuff on internet) I could say the same thing about Go, or most languages that aren’t Java/JS/C#/PHP/Python/Ruby. Maybe we will get to live in the 90’s forever :)
I am not a Zig user, but a Go user, yet I disagree about the team part.
In my experience that’s not really true, and my assumption here is that this is because there are not just fewer people looking for a job using language X, but also fewer companies for those developers to choose from.
More than that, I’d argue that the programming language might not be the main factor. As in, that’s something you can learn if it’s interesting.
Of course all of that depends on a lot of other context as well. The domain of the field that you’ll actually work on, the team, its mentality, frameworks being used, alignment of values within the profession and potentially ones outside as well.
I also would assume that using Zig, for example, might make it a lot easier to find a fitting candidate when compared to, let’s say, Java, where you might get a very low percentage of applications where the candidates actually fit. Especially when looking for a less junior position. Simply because that’s what everyone learns in school.
So I think having a hard time finding (good) devs using Zig or other smaller languages (I think esoteric means something else for programming languages) is not a given.
I don’t think that Zig can be a Go replacement for everyone, but if you are comfortable knowing what lies behind the Go runtime, it can be. I can totally see myself replacing all of my usage of Go once the Zig ecosystem becomes mature enough (which, even optimistically, is going to take a while, Go has a very good ecosystem IMO, especially when it comes to web stuff).
Zig has some nice quality of life improvements over Go (try, sane defer, unions, enums, optionals, …), which can be enough for me to want to switch, but I also had an interest in learning lower level programming. If you really don’t want to learn anything about that, I don’t think Zig can really be a comfortable replacement, as it doesn’t have 100% fool-proof guard rails to protect you from lower level programming issues.
In Go, deferred function calls inside loops will execute at the end of the function rather than the end of the scope.
Oh I didn’t realize Zig had block scoped defer. I assumed they were like Go. Awesome! Yeah that’s a huge pain with Go.
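A tiny illustration of the Go behaviour in question; nothing here is project-specific:

```go
// In Go, defer is function-scoped: every deferred call inside the loop
// piles up and only runs when main returns, in LIFO order. Zig's defer,
// by contrast, fires at the end of the enclosing block.
package main

import "fmt"

func main() {
	for i := 0; i < 3; i++ {
		defer fmt.Println("deferred", i) // none of these run inside the loop
	}
	fmt.Println("loop done")
	// Prints: "loop done", then "deferred 2", "deferred 1", "deferred 0".
}
```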
I “agree to disagree” on many of the listed issues, but one of them sincerely piqued my interest. Coming from Go and now Rust (and before C, C++, and others), I am actually honestly interested in Zig (as another tool in my toolbox), and tried dabbling in it a few times. However (apart from waiting for better docs), one thing I’m still super confused by and how I should approach it, is in fact error handling in Zig. Specifically, that Zig seems to be missing errors with “rich context”. I see that the issue is still open, so I assume there’s still hope something will be done in this area, but I keep wondering, is this considered not a pain point by Zig users? Is there some established, non-painful way of passing error context up the call stack? What do experienced Zig devs do in this area when writing non-trivial apps?
I see that the issue is still open, so I assume there’s still hope something will be done in this area
You are right, no final decision has been made yet, but you will find that not everybody thinks that errors with payloads are a good idea. They clearly are a good idea from an ergonomics perspective, but they also have some other downsides, and I’m personally in the camp that thinks not having them is the better choice overall (for Zig).
I made a post about this in the Zig subreddit a while ago: https://old.reddit.com/r/Zig/comments/wqnd04/my_reasoning_for_why_zig_errors_shouldnt_have_a/
You will also find that not everybody agrees with my take :^)
Cool post, big thanks!!! It gives me an understandable rationale, especially making sense in the context of Zig’s ideals: squeezing out performance (in this case esp. allocations, but also potentially useless operations) wherever possible, in simple ways. I’ll need to keep the diagnostics idea in mind for the next time with Zig, and see what I think about it after trying. Other than that, after reading it, my main takeaway is that I was reminded of a feeling I got some time ago: that errors & logging seem a big, important, yet still not well understood nor “solved” area of our craft :/
I used zig-clap recently, which has diagnostics that you can enable and then extract when doing normal Zig error handling. I think that’s an okay compromise.
And easier than all those libraries that help you deal with the mess of composing different error types and whatnot.
There are gotchas/flaws like this: https://github.com/golang/go/discussions/56010
I feel like I run into shadowing issues, and then there’s things like where you’re assigning to err a bunch of times and then you want to reorder things and you have to toggle := vs =, or maybe you do err2, err3, etc. In Zig all that error-handling boilerplate is gone and operations become order-independent because you just try.
And don’t get me started on the fact that Go doesn’t even verify that you handle errors, you need to rely on golangci-lint for extra checks the language should do…
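To make the := vs = shuffle concrete, here’s a small contrived sketch (file names made up):

```go
// Whether each step uses := or = depends on whether anything on the left
// is new, so reordering steps means re-juggling operators; and using :=
// inside a block quietly creates a new err that shadows the outer one.
package main

import (
	"fmt"
	"os"
	"strconv"
)

func main() {
	data, err := os.ReadFile("config.txt") // first assignment: must be :=
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}

	n, err := strconv.Atoi(string(data)) // n is new, err is reused: still :=
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}

	if n > 0 {
		// Writing ":=" here would shadow the outer err inside this block,
		// a classic way for an error to be silently dropped.
		err = os.WriteFile("out.txt", data, 0o644)
		if err != nil {
			fmt.Println("write failed:", err)
		}
	}
}
```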
Edit: also as Andrew points out, Go doesn’t block-scope things when it should: https://lobste.rs/s/csax21/zig_is_self_hosted_now_what_s_next#c_g4xnfw
Edit: ohh yeah part of what I meant by “scoping” was also “visibility” rules. It’s so dumb that changing the visibility (public/private) of an identifier also makes you change its name (lowercase vs uppercase initial letter).
It’s so dumb that changing the visibility (public/private) of an identifier also makes you change its name (lowercase vs uppercase initial letter).
Especially since some people write code in their native language(s) (like at my job), and not all writing systems even have this distinction.
I’ve had better results (and more fun) with Rust and Zig in my personal projects, so Go didn’t really cross my mind.
If it was already in use in this workplace, or if there had been interest from coworkers in it, I might agree that it would be a “natural choice”.
Edit: I think it’s also easier to call Zig code from the other languages we use (and vice versa), than to call Go code. That might come in handy too.
The platforms are those used by people in the organisation, currently Linux, Windows, and macOS. Mostly x86, some ARM.
Do the other devs all know Zig? If not, seems like a large downside is that tools formerly editable by anyone become black boxes?
To be clear, I hate unnecessarily bloated tools too. I’m just considering the full picture.
They don’t/didn’t, but it’s a quick one to pick up when you already know C and some other ones.
I didn’t know the previous language before I started contributing to these tools either.
It was pretty easy when someone else had already done the foundation, and I think/hope I am providing a solid foundation for others as well.
I totally agree this is a big issue. I’ve been forwarding email from my domain to Gmail for the past 10 years. Over the past 2 years, a bunch of things started getting marked as spam. The Gmail UI also got extremely slow for me this year.
So I actually decided to go self-hosted. Using the nixos-mailserver project made this pretty easy but as this blog post points out, it can be impossible to figure out why an email is being marked as spam when sending to Gmail, Outlook, etc. Sometimes just luck is involved!
Not sure about Outlook, since email delivery there always just worked, but I was once helping someone with an ancient setup. Google has a list of things to do to get your emails through.
Things like properly setting up DKIM, SPF, etc., but something people tend not to do, because it’s rarely mentioned in mail server setup guides, is setting the PTR record correctly. It’s easy to forget, and how you do it depends on where you host your server. Once you know everything you need to set, though, it’s easy and quick to do.
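In case it helps anyone doing the same dance, here’s a small sketch (Go standard library only; the domain and IP are placeholders) for checking the SPF TXT record and the PTR record from the outside. DKIM lives under a per-setup selector, so it’s left out here:

```go
// Quick outside-in sanity check of a mail setup's DNS: the SPF policy is a
// TXT record on the sending domain, and the PTR record for the server's IP
// should point back at a hostname that belongs to you.
package main

import (
	"fmt"
	"net"
	"strings"
)

func main() {
	domain := "example.org"  // placeholder: your sending domain
	mailIP := "203.0.113.25" // placeholder: your mail server's public IP

	// SPF: look for a TXT record starting with "v=spf1".
	txts, err := net.LookupTXT(domain)
	if err != nil {
		fmt.Println("TXT lookup failed:", err)
	}
	for _, t := range txts {
		if strings.HasPrefix(t, "v=spf1") {
			fmt.Println("SPF record:", t)
		}
	}

	// PTR: reverse lookup of the mail server's IP.
	names, err := net.LookupAddr(mailIP)
	if err != nil {
		fmt.Println("PTR lookup failed:", err)
	}
	for _, name := range names {
		fmt.Println("PTR points at:", name)
	}
}
```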
That’s a question a spammer would ask. ;)
I would not recommend OVH, as their cheap VPSes were (are?) often abused by spammers, so it’s a clean-IP lottery. I got one clean IP some years ago but got tired of self-hosting too (I offloaded that service to Gandi to do more interesting things with my time).
I heard that Hetzner has a decent reputation these days.
a spammer
Heh, maybe I am one :) I doubt though, I don’t run any services or anything, I only spam people with Azure’s notification emails about failed builds these days :)
I have a tiny Vultr VM that handles outgoing mail for me. Vultr disables outbound SMTP by default. To get it enabled, you have to raise a support ticket that tells them the amount of mail you want to send and why. This took about 2 hours from becoming a customer for the first time to getting a working relay. It probably provides enough friction to prevent scammers from doing it and, importantly, the fact that they shut down outbound spam senders means that their IPs generally don’t end up on block lists (or, if they do, are cleaned quickly).
I think it’s either small providers or just getting lucky. You usually notice quickly if you’re unlucky and got a bad IP, so it’s not like you have to search for them. I personally didn’t have that problem, but if you end up with a bad IP I’d honestly just ask the hosting company to give you a new one, because otherwise you cannot run the mail service you intend to run.
Of course you can also check blacklists, but I’d recommend to just test it, sending emails to different email servers after the initial setup (SMTP server, SPF, DKIM, PTR, DMARC, putting it on dnswl.org) is complete. Judging by the comments here it’s probably enough to just check whether the big ones, such as Gmail, receive your emails correctly, preferably at addresses that haven’t received mail from you yet, so you don’t end up having it cleared for just one user. I think there are also services you can send emails to for testing purposes, which check against blacklists and usually also whether DKIM and so on work correctly.
If that works you’re probably fine; if not, ask for a new IP and try again. If it didn’t work, it may be worthwhile to check whether you can find out which blacklist it’s on, both to check whether the whole block is listed and to see if you can’t just get it unlisted. Some of them offer that. Of course that’s only for public blacklists.
Of course you can also check blacklists, but I’d recommend to just test it, sending emails to different email servers after the initial setup
But not to the big players. You’ll have a lot more luck if the first email that you send to a gmail account is a reply to a message that was sent to your domain. Gmail uses outbound emails to increase reputation of domains and servers. If Google’s customers have been sending emails to your server for a while then they assume that you’re more likely to be real than if you just appear and start sending emails.
The goal here was not to be lucky, but the opposite: crafting a worst-case scenario so you catch problems as early as possible.
I disagree. The goal is to have email working. You are dealing with a remote state machine that has a permanent failure state. If you transition into this state, you have lost. Your goal is to avoid ever reaching that state. Doing things that move you towards that state early will help you catch the problems but will make it impossible to remedy them.
How can you disagree on what my goal was?
Also: The question was how to check whether your IP causes issues.
The goal was to figure out whether you need to switch the IP, so if I get into that permanent failure state early on I’ll just start over.
On top of that I have my doubts on that triggering a permanent failure, but since that’s just us guessing I will leave it there.
Can’t sent email addresses be spoofed pretty well?
Yes and no: With DMARC / DKIM, you can ensure that the message came from the server that it claims to. The server might be incorrectly configured. Without them, this is far harder to verify.
Note that you also need DNSSEC on the DKIM records, otherwise an attacker who can intercept and spoof DNS can give the server a fake public key and then send a spoofed message with the corresponding private key.
I’m that one luser who doesn’t click “mailto” links, because that opens the Mac email client, which I don’t use.
This is an interesting use case. One of the problems with mailto links is that they’ll usually open with the default mail client (no idea why you’ve set the default to the one that you don’t use), but a lot of people have separate work and personal accounts and use separate clients (especially if they’re web-based ones). It’s quite likely that they’d open with one that wasn’t the one they wanted to log in with, and then they’d wonder why logging in didn’t work.
Yes and no: With DMARC / DKIM … DNSSEC … DKIM
The waterfall of multi-letter acronyms I’ve been reading in all these discussions suggests to me this will be trouble.
no idea why you’ve set the default to the one that you don’t use
I think the simplest explanation here is the correct one: I didn’t bother to change the default because I don’t use mailto: links, and I use webmail.
My computer is a tool, not a home. I don’t spend much time decorating it just right. If the default isn’t right for me, I’m not going to waste time discovering what the right setting is.
I have a sneaking suspicion this is the most common user persona.
That’s SSE; HTTP/2 push was primarily for pre-emptively sending assets to the client before they were requested.
Don’t forget SSE; and whatever they’ll add in the next Chromium sprint…
I don’t have a solution for this, but suggesting GitHub as an alternative to now-hostile Docker seems to move the issue from one silo to the next.
Decentralization is not going to happen, at least not as long as the decentralization is being pushed by the kinds of people who comment in favor of decentralization on tech forums.
Decentralization means that each producer of a base image chooses their own way of distributing it. So the base image for your “compile” container might be someone who insists on IPFS as the one true way to decentralize, while the base image for your “deploy and run” container is someone who insists on BitTorrent, and the base image for your other service is someone who insists on a self-hosted instance of some registry software, and…
Well, who has time to look up and keep track of all that stuff and wire up all the unique pipelines for all the different options?
So people are going to centralize, again, likely on GitHub. At best they’ll start having to specify the registry/namespace and it’ll be a mix of most things on GitHub and a few in other registries like Quay or an Amazon-run public registry.
This is the same reason why git, despite being a DVCS, rapidly centralized anyway. Having to keep track of dozens of different self-hosted “decentralized” instances and maintain whatever account/credentials/mailing list subscriptions are needed to interact with them is an absolute pain. Only needing to have a single GitHub account (or at most a GitHub account and a GitLab account, though I don’t recall the last time I needed to use my GitLab account) is such a vast user-experience improvement that people happily take on the risks of centralization.
Maybe one day someone will come along and build a nice, simple, unified user interface that covers up the complexity of a truly decentralized system, and then we’ll all switch to it. But none of the people who care about decentralization on tech forums today seem to have the ability to build one and most don’t seem to care about the lack of one. So “true” decentralized services will mostly go on being a thing that only a small circle of people on tech forums use.
(even Mastodon, which has gained a lot of use lately, suffers from the “pick a server” onboarding issue, and many of the people coming from Twitter have just picked the biggest/best-known Mastodon server and called it a day, thus effectively re-centralizing even though the underlying protocol and software are decentralized)
I imagine that image producers could also agree on one distribution mechanism that doesn’t have to rely on a handful of centralized services. It doesn’t have to be a mix of incompatible file transfer protocols either; that would be really impractical.
The main reason was (is) probably convenience yes, but I think that Git has a different story: I may be wrong, but I don’t think that Docker was about decentralizing anything, ever. I would rather compare Git and GitHub’s relation to that of SMTP and Gmail.
Maybe, that would be convenient. I may be a bit tired, or reading too much into your response, but I feel that you’re annoyed when someone points out that more centralization isn’t the best solution to centralization issues.
I fear that, because Docker sold us the idea that there’s only their own Dockerfile format, building on images that must be hosted on Docker Hub, we didn’t think about alternatives – well, until they added “and now you must pay to keep using things that we offered for free”. Let’s not discard all discussions on the topic of decentralization too quickly, as we could improve on Docker, and we need more ideas.
In the various threads about Docker, people are already proposing all sorts of incompatible transfer protocols and hosting mechanisms, and displaying no interest in cooperating with each other on developing a unified standard.
The best I think we can hope for is that we get a duopoly of popular container registries, so that tooling has to accommodate the idea that there is no single “default” (as happens currently with Docker treating their own registry as the default). But I think it’s more likely that network effects and cohesiveness of user experience will push most people onto a single place, likely GitHub.
My point was to say “look, this thing that was explicitly designed to be decentralized, and is in a category of tools that literally have the word ‘decentralized’ in the name, still ended up centralized, that does not give high confidence that trying to decentralize Docker registries, which were not even designed with decentralization in mind, will succeed in a meaningful way”.
I will just be brutally honest here: the success rate of truly “decentralized” systems/services in the real world is incredibly low. Partly this is because they are primarily of interest to tech people who are willing to put up with a tool that is metaphorically covered in poisoned razor blades if it suits some theoretical ideal they have in mind, and as a result the user experience of decentralized systems/services tends to be absolutely abysmal. Partly this is because social factors end up concentrating and centralizing usage anyway, and this is a problem that is hard/impossible to solve at the technical level, but most people who claim to want decentralized systems are only operating on the technical design.
So I do discard discussions on decentralization, and quickly. Perhaps this time someone really will come up with a decentralized solution that gets mass adoption and manages to stay de facto decentralized even after the mass adoption occurs. But to me that statement is like “I should buy a lottery ticket, because perhaps this time I will win”.
Well, what can I say. These are some strong opinions, probably soured from various experiences. I’m not trying to convince you in particular, but do hope that we can aim higher than a duopoly. Thanks for the chat. :)
GitHub has benefits from centralisation because, as a publisher of an open-source project, I want it to be easy for people to file bugs and contribute code. Most people who might do this have a GitHub account and so anything either hosted on GitHub or hosted somewhere with sign-in with GitHub makes this easy.
I’m not sure that this applies to container images. People contributing to the code that goes into a container or raising issues against that code will still go to GitHub (or similar), not DockerHub. People interact with DockerHub largely via {docker,buildah} pull, or FROM lines in Dockerfiles. These things just take a registry name and path. As a consumer of your image, it makes absolutely no difference to me what the bit before the first / in your image name is. If it’s docker.io, quay.io, azurecr.io, or whatever, the tooling works in precisely the same way. The only place where it makes a difference is in private registries, where ideally I want to be able to log in with some credentials that I have (and, even more ideally, I want it to be easy to grab the image in CI).
I see container registries as having more in common with web sites than code forges. There are some incentives towards centralisation (lower costs, outsourced management), but they’re not driven by network effects. I might host my web site using GitHub Pages so that GitHub gets to pay for bandwidth, but visitors find it via search engines and don’t care where it’s hosted, especially if I use my own domain for it.
Indeed. You need a place that offers large amounts of storage and bandwidth for low cost, if not free entirely. And bandwidth has had a ridiculous premium for a very long time, which makes it very hard to find such a service.
You could find VPS providers at a reasonable rate for both of these, but now you’re managing a server on top of that. I’m not opposed to that effort, but that is not a common sentiment. 😅
Time for a project that combines Docker with Bittorrent.
A shared redirector should need less space/bandwidth than actual independent hosting, but backend-hopping would become easier… And the projects themselves (who don’t want to annoy the backends too much) don’t even need to acknowledge it aloud.