I think OCaml’s syntax won’t present much of a challenge for any seasoned programmer, especially for someone with a passing knowledge of a similar language like Haskell.
I find the hardest part of OCaml is remembering the syntax :). I made https://ocamlsyntax.com because there are certain constructs that just will not stick in my brain and I have to look them up every time I need them (looking at you, polymorphic abstract type variables). And that doesn’t even include any of the object-oriented stuff! That’s a whole other mirror universe of different syntax on top of the rest of the language.
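For example, here is the sort of annotation I mean, sketched from memory (the explicitly polymorphic form and the locally abstract form):

```ocaml
(* Explicit polymorphic annotation: the 'a is universally quantified. *)
let id : 'a. 'a -> 'a = fun x -> x

(* Locally abstract type (the "type a." spelling I can never remember),
   the form you need for polymorphic recursion and GADT matches. *)
let rec length : type a. a list -> int = function
  | [] -> 0
  | _ :: rest -> 1 + length rest
```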
no syntax for list comprehensions (I often use those in Clojure, Haskell and Erlang, although I’m well aware they are not something essential)
Definitely a personal style thing – even in Haskell I prefer an explicit do to list comprehensions, which OCaml… sort of has… if you install a syntax extension… okay, yeah, not great.
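For reference, here is roughly what I write instead of a comprehension in plain OCaml (a sketch; List.concat_map is in the stdlib since 4.10, List.filter_map since 4.08):

```ocaml
(* Haskell's [ (x, y) | x <- xs, y <- ys, x /= y ] spelled out by hand. *)
let pairs xs ys =
  List.concat_map
    (fun x -> List.filter_map (fun y -> if x <> y then Some (x, y) else None) ys)
    xs
```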
the compiler is super fast, but that was expected given that OCaml is famous for it.
Really?? That’s fascinating. My only significant experience with OCaml was working on a large-ish codebase, and the unbelievable slowness of even incremental recompilation was one of my biggest gripes with the language. And doing an optimized build, especially post-flambda… is it really famous for being a fast compiler? (Is this a “compared to GHC” situation?)
I was amused to see that the popular book Real World OCaml directly recommends replacing the standard library with a third-party alternative.
For context, one of the co-authors of Real World OCaml wrote the third-party alternative that the book advocates. (I mean, he didn’t literally write it all by himself, but you know what I mean.)
There’s also support to target JavaScript from OCaml sources, which looks interesting, but I haven’t tried it yet.
For the curious, I think js_of_ocaml is shockingly good. It really does what it says on the tin. The generated output isn’t slim – I don’t know if there’s some kind of tree-shaking thing you can do to improve that – but if you can afford to serve 1 MB .js files, you really can write a webapp in OCaml.
My only significant experience with OCaml was working on a large-ish codebase, and the unbelievable slowness of even incremental recompilation was one of my biggest gripes with the language.
With a large codebase, surely the build time is mostly influenced by what build tool is used and how the build is done. Was the build optimized properly, i.e. were modules being rebuilt that didn’t need to be? And were the correct settings used to optimize the dev workflow for incremental compilation, e.g. the -opaque flag?
I don’t know! It was a long time ago, and I never tried to dig into the build system to see if I could speed it up. There was a team dedicated to build performance (and tooling), and I trust that they knew what they were doing, but I can’t make any more intelligent claims than that. Incremental recompilation was definitely much faster than not-incremental recompilation, and they built a system to make almost all builds into incremental builds with a shared artifact cache, so I assume they were doing all the right things. But I couldn’t swear to it. We had knobs to compile modules without linking anything, which made incremental rebuilds tolerably fast, but that meant no tests… so yeah.
Anyway I am genuinely asking about that point, because I know very little about OCaml outside of my bubble. And because this was the largest codebase I have ever worked on in any compiled language, I have no intuition for what a “fast” or “slow” compiler feels like at that scale.
If it was a few years ago it may have also changed drastically. The compiler has always been very fast, but the build tools need to exploit it (and parallelize and avoid rebuilds where possible). Anecdotally, I worked on a project that counted about a million lines of OCaml. It used to build with omake, which would take about 10 minutes (not including tests). When we ported it to jbuilder/dune, we could build the full toolstack (so the huge project + all its related libraries and extra components + test components) in less than two minutes, with rebuilds (usually) on the order of seconds.
This was the topic of my internship back in 2017 working on the Reason compiler: integrating ppx_show into the language. I’m glad it has finally arrived (albeit in a different form) in OCaml.
What do you mean? Pretty-printers have been around for ages (the article is from 2017, but the fmt library has been there since 2015), and the ability to install pretty-printers in the toplevel predates even that.
The goal was to automatically provide a to_string function for every type created, not just in the toplevel but in normal execution too.
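For a sense of what that looks like in OCaml today, here is a minimal sketch using ppx_deriving’s show plugin (not the exact ppx_show from the internship, but the same idea):

```ocaml
(* With the ppx_deriving.show preprocessor enabled, the attribute below
   generates pp_point and show_point automatically. *)
type point = { x : int; y : int } [@@deriving show]

let () = print_endline (show_point { x = 1; y = 2 })
(* Prints something like { x = 1; y = 2 }; exact formatting depends on the plugin. *)
```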
What/where is OCaml used? What is the language like, compared to “mainstream” things? I think I sometimes see mentions of OCaml on my Linux day-to-day, but I don’t know much beyond that.
Is it something used a lot, or is it very niche (kinda like Rust vs Zig maybe)?
Comparing it to more mainstream things: it’s a bit like Go, in that it’s an ahead-of-time compiled language that can produce fast, native binaries, but with a runtime that includes a garbage collector.
It’s a strictly evaluated imperative language, and you can read the code and predict quite accurately what instructions the compiler will produce. You can also write in a pretty high-level functional style: OCaml has a very wide range from “high level” to “low level” coding. You basically never have to call out to C in order to do something “fast enough”: as long as you avoid heap allocations (the GC only runs in response to heap allocation), you can get a very strongly typed language without any runtime overhead if you’re careful.
It also has a type system reminiscent of Haskell’s, which means you can make massive changes to large codebases pretty fearlessly. But — unlike Haskell — OCaml supports implicit side effects (like most languages), so it doesn’t have much of a learning curve. It also lacks typeclasses, and most of the other fancy type system things that make Haskell tricky to learn.
OCaml also has a shockingly good JavaScript backend — you can compile OCaml code to JS and use that to share code between client and server, if you’re doing web stuff. Autogenerate correctly typed APIs and stuff (if, you know, your only clients are also using OCaml). I don’t know any other language that comes close here.
Subjectively: OCaml is a very ugly language, with lots of weird syntax and strange language warts. But if you can look past that, it’s a very practical language. It’s not fun the way that Haskell is, but it’s old and stable and works well, and the type system is the best you’re going to find in an imperative language. (Reason — an attempt to provide an alternate syntax for the OCaml compiler — was disappointingly incomplete the last time I checked. Don’t know if it’s still a thing.)
But the community is very small. Jane Street publishes some very thorough libraries covering all the basic stuff — containers, async scheduling, system IO, etc — but coverage for what you might think of as basic necessities (especially if you’re doing web development) is a lot more spotty.
So it occupies sort of a weird place in the world. It’s a solid, conservative, relatively performant language. But you probably don’t want to build a product on top of it, because hiring will be pretty expensive. And I don’t think it’s particularly interesting from a mind-expanding point of view — Haskell has a lot more bang for the buck there.
It’s a strictly evaluated imperative language
What’s your definition of imperative? If you limit “functional” to “pure”, then it’s quite against the mainstream opinion that classifies Scheme and often even CommonLisp as “functional”. Presence of mutable values does not make a language non-functional—absence of support for first-class functions and primitives for passing immutable values between them does.
Most real-life OCaml code, at least in public repositories, is as functional as typical Haskell, i.e. centered around passing an immutable state around, with wide use of benign side effects (like logging) and occasional use of mutable values when it’s required for simplicity or efficiency.
(For the uninitiated: you need to declare mutable variables or record fields explicitly; by default everything is immutable, unlike in Scheme.)
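A tiny sketch to make that concrete:

```ocaml
(* Mutability has to be opted into explicitly. *)
type counter = { mutable count : int }     (* the field is marked mutable *)

let c = { count = 0 }
let () = c.count <- c.count + 1            (* assignment works only on mutable fields *)

let total = ref 0                          (* a ref cell is the usual "mutable variable" *)
let () = total := !total + 1
```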
It also lacks typeclasses, and most of the other fancy type system things that make Haskell tricky to learn.
What you aren’t saying, and what someone who doesn’t know it yet may want to hear, is that the lack of type classes makes type inference decidable. With any “normal” (non-GADT) types, the compiler will infer the type of any value/function automatically. There are no situations where adding an annotation will make ill-typed code well-typed. The only reason to add type annotations is for humans, but humans can just as well view them in the editor (via Merlin integration).
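A trivial illustration: no annotation anywhere, yet the compiler infers the most general type on its own:

```ocaml
(* Inferred as: val compose : ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b *)
let compose f g x = f (g x)
```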
Well, module interfaces do need type annotations, which brings up another thing you seem to dismiss: the module system. Functors provide ad hoc polymorphism when it’s required, and their expressive power is greater. My recent use case was to provide a pluggable calendar library dependency for a TOML library; OCaml is the only production-ready language that allows anything like that.
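Roughly, the shape of that pluggable-dependency setup (all names here are invented for illustration):

```ocaml
(* The TOML library is written as a functor over a calendar signature,
   so users plug in whichever calendar implementation they prefer. *)
module type Calendar = sig
  type t
  val of_string : string -> t option
  val to_string : t -> string
end

module Make_toml (C : Calendar) = struct
  type value =
    | String of string
    | Datetime of C.t

  let parse_datetime s = Option.map (fun d -> Datetime d) (C.of_string s)
end
```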
But — unlike Haskell — OCaml supports implicit side effects (like most languages), so it doesn’t have much of a learning curve. […] And I don’t think it’s particularly interesting from a mind-expanding point of view — Haskell has a lot more bang for the buck there.
Not mind-expanding for someone who has already seen dependently-typed languages, for sure. For someone with only a Go or Python background, it’s going to be as mind-blowing as Haskell, or any actually functional language for that matter.
Technically, it’s possible to write OCaml as if it were Pascal, but it’s neither what people actually do nor something encouraged by the standard library. People will also run into monads pretty soon, whether a built-in one (Option, Result) or in concurrency libs.
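For instance, Option’s monadic bind with the standard library’s binding-operator syntax (OCaml 4.08+), a sketch of the style you hit almost immediately:

```ocaml
let ( let* ) = Option.bind

let safe_div x y = if y = 0 then None else Some (x / y)

(* Each step short-circuits to None on failure; no explicit matching needed. *)
let compute a b c =
  let* q1 = safe_div a b in
  let* q2 = safe_div q1 c in
  Some (q1 + q2)
```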
Jane Street publishes some very thorough libraries covering all the basic stuff — containers, async scheduling, system IO
My impression is that the last time you looked was quite a while ago. Sure they do, but for each of those there’s at least one non-Jane Street alternative; in the case of Lwt, one that is more popular than the Jane Street offering. Compare the reverse dependencies of Async vs Lwt.
Sure, that community is still smaller than those of many other languages, but it’s far from “you will never find a lib you need”.
What’s your definition of imperative? If you limit “functional” to “pure”, then it’s quite against the mainstream opinion that classifies Scheme and often even CommonLisp as “functional”.
By imperative I mean that OCaml has statements that are executed in order, as opposed to something like Prolog or APL or a (primarily!) expression-oriented language like Haskell. I avoided calling it a “functional language” because I don’t know what that term means to the person I was replying to. I would describe OCaml as functional as well. I don’t think the label is mutually exclusive with imperative.
What you aren’t saying and what someone who doesn’t know it yet may want to hear is that lack of type classes makes type inference decidable.
If this tips anyone over the fence into learning OCaml, I will be delightfully surprised :)
Which is another thing you seem to dismiss: the module system. Functors provide ad hoc polymorphism when it’s required, and their expressive power is greater.
I think you’re reading more into my comment than is really there. I was trying to give a rough overview of “what is OCaml” to someone who does not know OCaml. The module system is neat. I’m not dismissing it. Typing on a phone takes a long time.
Not mind-expanding for someone who already saw dependently-typed languages for sure. For someone with only Go or Python background, it’s going to be as mind-blowing as Haskell, or any actually functional language for that matter.
Yeah, this is fair. If the choice is between OCaml or nothing, definitely study OCaml! But Haskell has a larger community, a lot more learning resources, and will force you to think differently in more ways than OCaml. Which makes it hard to recommend OCaml to someone who is functional-curious, as much as I personally like the language.
My impression is that the last time you looked was quite a while ago. Sure they do, but for each of those there’s at least one non-JaneStreet alternative, in case of Lwt, more popular than the JaneStreet one. Compare the reverse dependencies of Async vs Lwt.
From this response I get the impression that you read my comment as “the only libraries that exist are the ones Jane Street published.” What I meant was to assure the person I was replying to that OCaml has a healthy set of basic libraries available, with an existential proof of that statement.
Sure, that community is still smaller than those of many other languages, but it’s far from “you will never find a lib you need”.
We are in complete agreement here.
Subjectively: OCaml is a very ugly language, with lots of weird syntax and strange language warts. But if you can look past that, it’s a very practical language. It’s not fun the way that Haskell is, but it’s old and stable and works well, and the type system is the best you’re going to find in an imperative language.
The syntax does have its share of odd corners, but I don’t find it ugly on the whole. I quite enjoy working in it. Also, having given both a decent try, I found it more fun than Haskell, and ultimately it was the FP language I ended up sticking with.
The “standard” reply is a company called Jane Street, which apparently requires every employee(?) to take a course in OCaml.
It is not used a lot, but its user base has been growing fast recently. Companies/institutions using OCaml include Citrix (XenServer), Facebook, Bloomberg (where ReScript was born), Tezos, Ahrefs, INRIA (Coq, to name one project), Aesthetic Integration, and Tarides, to name a few. It is used a lot for writing compilers (Rust also started with an OCaml implementation), but it is a pretty good language for systems programming, and for most general-purpose programming in fact.
The community is not huge, so you don’t have as many libraries as other languages do, but the ones that are there are usually pretty solid
It’s getting fairly popular. I have posted Haskell and OCaml skills in HN Who’s Hiring threads, and have been getting tons of emails back lately due to the OCaml part. I have known OCaml (and SML) for a good 15 years, and it has gone from really niche to decently easy to find a job that uses it.
I think this is due to the increasing popularity of functional programming and modern type systems. Aside from this, Facebook and many others use it for building static analyzers. Furthermore, it’s a good companion for Coq.
Sadly, Haskell is still quite unpopular, but that’s a topic for another discussion.
To complement the sibling replies, consider: OCaml is already a mainstream language. You are most likely to experience it on your desktop through FFTW, a ubiquitous signal-processing library which has been available for a couple decades.
A lot of people have answered your question, I’d also add that F# is a direct descendant of OCaml and shares a lot of the core syntax. It’s probably more widely used than OCaml and seems to be the CLR language that people get the most enthusiastic about.
I get a “Video Unavailable: this video is private” message from that link.
I think the same content is being uploaded to https://watch.ocaml.org/video-channels/ocaml2021/videos to watch afterwards.
Indeed, the live stream is no longer available. Maybe the maintainers can replace the link with yours
It is a joke; it refers to GitHub Copilot and its carelessness about licenses. If you look at the code, you will see that after some sleep time it will serve you back your original file :D
I assumed there was a real NN behind this satire until I read this thread. I think the problem here is that the website miscommunicates its purpose.
Also, I couldn’t find any direct reference to the source code, and a quick search on DuckDuckGo and GitHub doesn’t turn up anything.
Is the source code for Copilot available somewhere? I doubt it, but I wonder whether it would change anything if it were.
If you receive copyrighted material and process it, beyond the associated costs and computational power, how much would you risk in legal terms?
Copyrighted material must be processed in order to play it; by the very nature of how computers work, the material must be copied in part or in full a number of times during processing. There is an actual exemption in copyright law to allow for this, otherwise the very act of playing back material would be illegal by the letter of the law.
Q: What would the NN actually do? You want just enough learning/wiggle room for it to be controversial like Microsoft Copilot, methinks. Perhaps a NN that generates a song inspired by the input song, with a slider for how similar you want the song to be.
Then you could break it down by degree - at what point is the song “the same song with a note or two different”, vs “a different song that shares most of the notes”?
I wish I could upvote this story multiple times. The perfect combination of being approachable while still being packed with (to me) new information. Readable without ever being condescending.
One thing I learned was that DNA printers are a thing nowadays. I had no idea. Are these likely to be used in any way by amateur hackers, in the sense that home fusion kits are fun and educational, while never being useful as an actual energy source?
So you can actually paste a bit of DNA into a website and they’ll print it for you. They ship it out by mail in a vial. Where it breaks down is that before you inject anything into a human being, you need to be super duper extra totally careful. And that doesn’t come from the home printer; it needs labs with skilled technicians.
Could any regular person make themselves completely fluorescent using this method? Asking for a friend.
You may be interested in this video: https://www.youtube.com/watch?v=2hf9yN-oBV4 Someone modified the DNA of some yeast to produce spider silk. The whole thing is super interesting (if slightly nightmarish at times if you’re not a fan of spiders).
So that’s going to be the next bioapocalypse then. Autofermentation but where as well as getting drunk, you also poop spider silk.
Thanks for the awesome article! Are there any specific textbooks or courses you’d recommend to build context on this?
Not really - I own a small stack of biology books that all cover DNA, but they cover it as part of molecular biology, which is a huge field. At first I was frustrated about this, but DNA is not a standalone thing. You do have to get the biology as well. If you want to get one book, it would have to be the epic Molecular Biology of the Cell. It is pure awesome.
You can start with molecular biology, and then a quick study of bioinformatics should be enough to get you started.
If you need a book, I suggest this one; it is very well written IMO and covers all this stuff.
Great article! I just have one question. I am curious why this current mRNA vaccine requires two “payloads” ? Is this because it’s so new and we haven’t perfected a single shot or some other reason?
As I understand it[1] a shot of mRNA is like a blast of UDP messages from the Ethernet port — they’re ephemeral and at-most-once delivery. The messages themselves don’t get replicated, but the learnt immune response does permeate the rest of the body. The second blast of messages (1) ensures that the messages weren’t missed and (2) acts as a “second training seminar”, refreshing the immune system’s memory.
[1] I’m just going off @ahu’s other blogs that I’ve read in the last 24 hours and other tidbits I’ve picked up over the last 2 weeks, so this explanation is probably wrong.
It’s just the way two current mRNA vaccines were formulated, but trials showed that a single shot also works. We now know that two shots are not required.
The creators of the vaccine put it differently here: https://overcast.fm/+m_rp4MLQ0 If I remember correctly, they claim that one shot protects you but doesn’t prevent you from being infectious, while the second makes sure that you don’t infect others.
Not an expert either, but I think this is linked to the immune system response: as with some other vaccines, the system starts to forget, so you need to remind it what the threat was.
Is there any information on pseudouridine and tests on viruses incorporating it into their RNA?
The one reference in your post said that there is no machinery in cells to produce it, but the wiki page on it says that it is used extensively in the cell outside of the nucleus.
It seems incredibly foolhardy to send out billions of doses of the vaccine without running extensive tests, since naively any virus that mutated to use it would make any disease we have encountered so far seem benign.
From https://en.wikipedia.org/wiki/Pseudouridine#Pseudouridine_synthase_proteins:
Pseudouridines are RNA modifications that are made post-transcription, i.e. after the RNA is formed.
That seems to mean (to me, who is not a biologist) that a virus would have to grow the ability to do/induce such a post-processing step. Merely adding Ψ to sequences doesn’t provide a virus with a template to accelerate such a mutation.
And were this merely a nuclear reactor or adding cyanide to drinking water I’d agree. But ‘I’m sure it will be fine bro’ is how we started a few hundred environmental disasters that make Chernobyl look not too bad.
‘We don’t have any evidence because it’s obvious so we didn’t look’ does not fill me with confidence given our track record with biology to date.
Something like pumping rats full of pseudouridine up to their gills, then infecting them with rat HIV for a few dozen generations and measuring whether any of the virus starts incorporating pseudouridine into its RNA, would be the minimum study I’d start considering as proof that this is not something that can happen in the wild.
As I mentioned, I’m not a biologist. For all I know they did that experiment years ago already. Since multiple laymen on this forum came up with that concern within a few minutes of reading the article, I fully expect biologists to be aware of the issue, too.
That said, in a way we have that experiment already going on continuously: quickly evolving viruses (such as influenza) that mess with the human body for generations. Apparently they encountered pseudouridine regularly (and were probably at times exposed to PUS1-5 and friends that might have swapped out a U for a Ψ in a virus accidentally), but still didn’t incorporate it into their structure despite the presumed improvement to their fitness (while eventually leading our immune system to incorporate a response to that).
Which leads me to the conclusion that
Due to lack of time (and a list of things I want to do that already spans 2 or 3 lifetimes) I’ll stick to 2.
I enjoyed the article, reminded me of my days at the university :-)
So here are some quick questions in case you have an answer:
It is called negative selection. It works like this:
How does info spread through the body
I came across this page relatively recently and it really blew my mind.
glucose is cruising around a cell at about 250 miles per hour
The reason that binding sites touch one another so frequently is that everything is moving extremely quickly.
Rather than bringing things together by design, the body can rely on high-speed stochastic events to find solutions.
This seems related, to me, to sanxiyn’s post pointing out ‘random combination’ - the body:
This constant, high-speed process can still take a day or two to come up with a shape that’ll attack whatever cold you’ve caught this week - but once it does, that shape will be copied all over the place.
I did some projects in grad school with simulating the immune system to model disease. Honestly we never got great results because a lot of the key parameters are basically unknown or poorly characterized, so you can get any answer you want by tweaking them. Overall it’s less well understood than genetics, because you can’t study the immune system in a petri dish. It’s completely fascinating stuff though: evolution built a far better antivirus system for organisms than we could ever build for computers.
In an interesting turn of events, the investigation of the whole SolarWinds compromise led to the discovery of an additional malware that also affects the SolarWinds Orion product but has been determined to be likely unrelated to this compromise and used by a different threat actor.
404, archive here: https://archive.vn/795yh
Strange, for me the page works but not the archive. I have made a new snapshot on the web archive, just in case: https://web.archive.org/web/20201218113458/https://www.reuters.com/article/us-usa-cyber-breach-idUSKBN28R2ZJ
Couldn’t Apple avoid this mess by making a local cache of approved certs on each Mac and updating it periodically? I.e. the same thing APT does for package metadata (for all packages available in enabled repositories). You can search the cache (with apt search or apt-cache search) without contacting any servers. Only, in the case of Apple, it wouldn’t be a cache of package metadata, but a cache of approved developer certs. Then they could check all the apps without contacting any servers. This way both privacy and performance concerns are addressed. It would also allow users to launch programs during server downtime.
That is indeed what they do. For security reasons they don’t require an explicit action from the user to refresh the cache, but they do have a cache, and only check with the server if a certain amount of time has passed. From the article:
I should also add that after closing Firefox and opening it again, no requests were made. This is reasonable, and indicates that certificate checking isn’t performed at each launch but only after it hasn’t been performed for a certain period of time.
That’s a different thing though. I’m talking about a cache of certificates, while the article seems to be talking about a cache of cert check responses. It means that server still gets some metadata about what programs the user launches. It’s also unclear why in this case people started having problems launching any non-Apple programs during server downtime.
It needs to check in with the OCSP server occasionally to see whether the developer certificate has been revoked, and the only way to check using OCSP is to send the developer certificate to the OCSP server.
People started having problems because the OCSP server was reachable, but never responded with an OK, presumably because it was being overloaded by new Big Sur requests.
yakubin meant having a full copy of all certificates on the computer and updating it in full every so often. Therefore no data about individual apps is sent over the network when launching an app.
One could also use the Google Safe Browsing approach, where a partial hash of the app is sent and a bunch of matching hashes is sent back, such that the server doesn’t know which app the client is actually using.
That only solves the privacy issue. It doesn’t solve the problem of the time needed to launch an app and reliability of the system in the face of server downtime or packets dropped by a firewall. Generally, I don’t think that execve on its own should ever prompt any network communication.
EDIT: Your solution, by introducing frequent collisions, makes the whole mechanism essentially useless. Now you can’t differentiate between detected malware and legitimate software. To make it work at this point, you need to send the full hash to the server, defeating the point. I misunderstood, see @pgeorgi’s reply.
Your solution, by introducing frequent collisions, makes the whole mechanism essentially useless. Now you can’t differentiate between detected malware and legitimate software. To make it work at this point, you need to send the full hash to the server, defeating the point.
The solution works like this:
Client to server: “I have a hash here, starting with 12.”
Server to client: “I have a set of good hashes 1234, 1235, 1237 and bad hashes 1236, 1239. This set is good for 24 hours.”
That way the server or a listener can’t infer which hash is specifically requested, while the client gets precise information to work with (plus some more that they probably won’t need).
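To make that concrete, a toy sketch of the exchange (data and names invented; a real deployment would use longer hash prefixes and signed responses):

```ocaml
type verdict = Good | Bad | Unknown

(* Server side: return every known hash sharing the requested prefix,
   so it never learns which exact hash the client holds. *)
let server_lookup ~good ~bad prefix =
  let matches h =
    String.length h >= String.length prefix
    && String.sub h 0 (String.length prefix) = prefix
  in
  (List.filter matches good, List.filter matches bad)

(* Client side: only the short prefix ever leaves the machine. *)
let check ~good ~bad full_hash =
  let prefix = String.sub full_hash 0 2 in
  let good', bad' = server_lookup ~good ~bad prefix in
  if List.mem full_hash bad' then Bad
  else if List.mem full_hash good' then Good
  else Unknown

let () =
  let good = [ "1234"; "1235"; "1237" ] and bad = [ "1236"; "1239" ] in
  match check ~good ~bad "1235" with
  | Good -> print_endline "ok to launch"
  | Bad -> print_endline "blocked"
  | Unknown -> print_endline "no verdict"
```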
Sure, it doesn’t solve all the issues, but some minor additions can. I think there are two reasonable approaches:
If Apple had just set a really small timeout on their request this OCSP system would have worked fine and no one would have noticed this outage.
From the privacy point of view, looks like they will introduce encryption: https://www.macrumors.com/2020/11/15/apple-privacy-macos-app-authenticaion
Thanks for the link. Obviously this approach only offers increased privacy vs. actors that aren’t Apple. But if you’re running a Mac you obviously have to trust Apple a bit, so I don’t think it’s unreasonable.
The neat thing is that this means Apple can effectively remove software from your computer in addition to watching what you run.
Everyone who ever built an antivirus or any sort of malware-removal tool is an active foot-soldier in the war against general-purpose computing, apparently. Seeing as all of those tools are designed to identify and remove software from your computer…
So a professional is going to be checking if Transmission got their dev server owned this week instead of every single user having to check, cool.
There is a response by apple to the Privacy issue: https://www.macrumors.com/2020/11/15/apple-privacy-macos-app-authenticaion/
I have been on both sides of GitHub DMCA, so I can speak from experience.
GitHub strongly favors keeping the content up.
So yeah, it’s down now. But the repo owners can send a Counter Notice. Then the RIAA has 14 days to file a copyright infringement suit in federal court, and then present a copy of the filing to GitHub.
If GitHub does not receive the filing in time, or anything is wrong with the paperwork, the repo will go back up.
https://docs.github.com/en/free-pro-team@latest/github/site-policy/dmca-takedown-policy
Do you know if this is also true in this case? The letter is not really the usual takedown notice; more here: https://twitter.com/xor/status/1319861757301710848
Sorry, but I can’t follow that. Twitter is absolute garbage. Maybe that guy can repost it to a blog or something.
https://nitter.net/xor/status/1319861757301710848
Nitter is a free and open source alternative Twitter front-end focused on privacy.
One reason Twitter easily spirals down to utter garbage is the lack of downvotes. We can only retweet, and if we disagree, we can only shout back. There seems to be very little moderation. And of course, the length limit on the damn tweets effectively bans nuanced thought, which requires too many characters. (Incidentally, this limitation makes me wonder how Twitter managed to get so popular.)
A free front end is unlikely to solve those problems.
I wrote this article to give other people an introduction to NaN-boxing. Please let me know what you think!
Are you getting an actual 404 or the “Cannot find article” message? If it’s the latter, please try reloading the page. JS isn’t my strong point :P
Sorry, I was not aware of that; I do not have an account myself, but I can see the article. That’s bad (and it is a pity since the article is really good). I’ll avoid sharing Medium articles from now on.
There is a very interesting new article on it! Hackers Tell the Story of the Twitter Attack From the Inside https://www.nytimes.com/2020/07/17/technology/twitter-hackers-interview.html
To be fair, I’ve never felt a need for binary distributions.
If you don’t care about carbon usage, it’s fair to say the default toolchain is quite fast to build.
However, you might want to use the flambda or flambda2 compilers, i.e. OCaml on steroids, and they will probably need much more time to build than just downloading the corresponding binaries.
About opam-repository: it was a decision by the maintainers to keep all versions of packages, and that decision progressively makes the repo huge. I personally think it’s a mistake: only a few versions of every package really need to be kept, and this small set could easily be computed so that every package could be installed for every OCaml version it was available for, while still keeping the minimal number of versions.
I agree with you, but the last two suggestions (avoiding the cmt* files by default and omitting linkall where not necessary) seem pretty reasonable to implement. I wonder if that would also reduce compilation times.
As the article notes, an opam switch effectively maintains a “binary distribution” within .opam, so the increase in size affects everyone.
The opam-bin plugin can be used to share files (through hard-links) between switches containing the same OCaml compiler (typically, a global switch and per-project local switches), making your disk space usage a little smaller.