“People who really love a language criticizing it” is one of my favorite genres of blog post. I don’t know why but I love to read this stuff even if I have no intention of using Julia.
The startup and memory-intensive issues are really familiar to me as someone who’s spent a lot of time in Clojure. Really hammers home the point that there’s no such thing as a one-size-fits-all language, and it’s OK to focus on being excellent in a niche even if it means that no one can use your language to implement (say) grep.
Such sentiment is frankly one of my favorite indications of someone’s experience with a tool. One cannot truly understand a tool until one can also (constructively) criticize it. My favorite software engineer interview question is to ask the interviewee what their favorite or preferred language is and what they like about it, then ask them what they dislike about it or warn new users about it. I get stereotypical answers to the softball first part, but the second part gets either some really interesting answers, when someone really knows their stack, or a real indication that someone just doesn’t have the depth of experience I may be looking for in a role necessitating significant knowledge of both sides of the blade.
My favorite software engineer interview question is to ask the interviewee what their favorite or preferred language is and what they like about it, then ask them what they dislike about it or warn new users about it.
My problem is I’m way more enthusiastic talking about the second one than the first one. There’s just so much more to talk about there!
But the problem is fundamentally unsolvable, because it’s built into Julia on a basic design level.
Would love to read more about this! My naive understanding is that this is mostly just a quality-of-implementation issue: Julia uses LLVM to compile code during execution, and lacks a tiered JIT. Are there reasons why we can’t just add an interpreter tier to Julia, beyond “someone has to do that work”?
My (mostly uninformed) feeling is that the job of the compiler is to get rid of abstractions (e.g. monomorphizing code), and perfect optimization is a global process. So when any piece of code can be redefined at any time, that makes recompilation slow.
Of course that doesn’t mean you can’t do some tricks, but it’s working against the grain of the problem.
I’d also say from reading about v8 over the years, the tiers seem to become a huge tax. It’s not just “someone has to do that work”, but “every language change from now on requires more work” (from a limited group of people; it’s not orthogonal).
I don’t think the Julia language is done evolving, so I can see that duplicating the semantics of the language in multiple places is something they would be reluctant to do. (again this is pure speculation) Hopefully there is some kind of IR that makes this less burdensome, but compilers are always messier than you’d like to think :)
edit: I think the quote is a shorter way to explain it.
Just as separate compilation isn’t possible for C++ template code, it’s a challenge for Julia as well.
e.g. if all your C++ code is in templates – and there are some styles that lean that way for zero runtime cost – then C++ doesn’t have incremental compilation at all. It has plenty of duplicate compilation if you like :)
It also sounds like they can improve the caching / precompilation:
While currently precompile can only save the time spent on type-inference, in the long run it may be hoped that Julia will also save the results from later stages of compilation. If that happens, precompile will have even greater effect, and the savings will be less dependent on the balance between type-inference and other forms of code generation.
Yeah, caching is the first thing I thought of when I saw the “unsolvable” problem. I wonder if caching the whole heap could be an option here as it is in some Standard ML implementations, SBCL, …
Given the existence of things like ghci/runhaskell that also have to compile a complex language before they start to run, I feel like it can’t be unsolvable
You can already set the compilation level of Julia per function, and the lowest level is sort of an interpreter.
There has been some work on a fully ahead of time compiler for Julia and the core team have mentioned using more conventional JIT techniques with an interpreter level, too.
Yeah I mean it’s not hard to see why “just go and create a second implementation of your language that retains perfect compatibility with the first” isn’t really something a lot of people want to hear.
I imagine there’s a way you could do it incrementally with careful planning, but I don’t know enough about Julia internals to make any real statements about the level of difficulty that entails. Could be really easy for all I know.
We might be able to use a Rust library in Julia with little friction, but no-one would use a Julia library if they could avoid it. So if you want to code up some universally used library, you better go with a static language.
Julia reminds me of Terra in that it’s sort of a dynamically-typed language that can generate C-like statically typed code. Terra lets you export the generated low-level code:
Terra was designed so that it can run independently from Lua. In fact, if your final program doesn’t need Lua, you can save Terra code into a .o file or executable.
Couldn’t you do the same thing in Julia? Or, are Julia programs more likely to actually depend on LLVM at runtime? For example, if a library is very generic, then you’d need LLVM to be able to instantiate the functions on new types. (But if you had a very generic Rust library, that would also be hard to call from C, for the same reason: you’d need the Rust compiler to instantiate the functions on new types.)
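To make the parenthetical about generic Rust libraries concrete, here is a minimal sketch. The function names `dot` and `dot_f64` are made up for illustration. The generic function has no machine code until a concrete type is chosen, so only a wrapper pinned to one type can be handed to C.

```rust
use std::ops::{Add, Mul};

// Generic dot product: no machine code exists for this function until the
// compiler sees a concrete `T`, so it cannot be exported to C as-is.
pub fn dot<T>(a: &[T], b: &[T]) -> T
where
    T: Copy + Default + Add<Output = T> + Mul<Output = T>,
{
    a.iter()
        .zip(b.iter())
        .fold(T::default(), |acc, (&x, &y)| acc + x * y)
}

// A C-callable wrapper pins `T = f64`. A real export would also need a
// `no_mangle` attribute and a C header; both are omitted in this sketch.
pub extern "C" fn dot_f64(a: *const f64, b: *const f64, len: usize) -> f64 {
    if a.is_null() || b.is_null() {
        return 0.0;
    }
    // SAFETY: the caller promises both pointers reference `len` valid f64s.
    let (a, b) = unsafe {
        (
            std::slice::from_raw_parts(a, len),
            std::slice::from_raw_parts(b, len),
        )
    };
    dot(a, b)
}
```

A C caller wanting `dot` over some other element type would need another wrapper compiled ahead of time, which is the same shape of problem as instantiating a generic Julia method on a new type without the compiler present.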
Here’s what I posted on the Julia Zulip. For context, I’ve been using Julia for several years and am a minor contributor to the core language and ecosystem.
Spicy. Thanks for posting.
I think 2, 3 and 4 are serious problems for common use cases, but disagree that they’re impossible to solve. At least for smaller programs, a fully ahead of time compiler (like the GPU ones we already have) could solve this and it’s a project that gets some intermittent interest. PackageCompiler is suitable for some use cases, too.
Agree with 5, 6, 7, 8. Tho I think some fairly simple support for declaring a function to be a required part of an interface could be enough to mostly deal with that.
Generally, I do methodswith(AbstractDict) or whatever, but it’s definitely not exactly what I want.
Agree that the type system situation is a bit odd and the Rust-style interface system makes much more sense to me.
Definitely something to be said that inheritance of data makes implementing new types very quick (tho it can also be dangerous and confusing). I think we could get most of the ease by having some commonly accepted way of defining methods for the interface of the wrapped type that just forward to the wrapped type. There are some macros for this, but there’s not a commonly accepted way of doing it and you have to discover what the interface is first (and you need to keep your subclass up to date with changes to that interface).
Undecided on 9, 10. I mostly agree with 10, but also know that others like the filter and map functions as-is.
I think everyone agrees that not having a proper Path type was a mistake and there are semi-frequent threads about introducing one and eventually deprecating our use of strings as paths.
Unfortunately the IO and paths stuff was copied from python shortly before python introduced the path types, so we inherited their mistakes (as they inherited the mistakes of other programming languages).
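As a rough illustration of the two ideas in this comment (declaring which functions an interface requires, and forwarding them to a wrapped type), here is how it might look with a Rust trait. `Dict` and `CountingDict` are hypothetical names and this is not a proposal for Julia syntax, only a sketch of the pattern.

```rust
use std::collections::HashMap;

// Hypothetical interface: the trait lists exactly what an implementor owes.
pub trait Dict {
    fn get(&self, key: &str) -> Option<&i64>;
    fn insert(&mut self, key: String, value: i64) -> Option<i64>;
    fn len(&self) -> usize;

    // Provided method: comes for free once the required ones exist.
    fn is_empty(&self) -> bool {
        self.len() == 0
    }
}

// A wrapper that adds one behaviour (counting writes) and forwards the rest.
#[derive(Default)]
pub struct CountingDict {
    inner: HashMap<String, i64>,
    writes: usize,
}

impl CountingDict {
    pub fn writes(&self) -> usize {
        self.writes
    }
}

impl Dict for CountingDict {
    fn get(&self, key: &str) -> Option<&i64> {
        self.inner.get(key) // plain forwarding to the wrapped map
    }

    fn insert(&mut self, key: String, value: i64) -> Option<i64> {
        self.writes += 1; // the only behaviour this wrapper adds
        self.inner.insert(key, value)
    }

    fn len(&self) -> usize {
        self.inner.len()
    }
}
```

If the trait later gains a required method, the wrapper stops compiling until it is updated, which addresses the "keep up to date with changes to that interface" worry at the cost of writing the forwarding by hand.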
They could always fix it. It’s a bit like with climate change, the longer you wait, the more painful the change becomes.
Same with …
map, filter and split are eager, returning Array.
… and a few other things. (I think the only language still in denial about this issue is Scala.)
Yes, that may require backtracking on …
Julia released 1.0 in 2018, and has been committed to no breakage since then.
… but only shows that people shouldn’t make promises they can’t keep.
For me, such promises are a sign of language design immaturity – it’s the 21st century, design your language to provide facilities to deal with necessary changes, instead of promising not to change anything!
Every language needs a well-defined process for deprecation and removal of language and library items, simply winging it is not an option.
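For readers unsure what the eager-versus-lazy complaint about map and filter amounts to, here is a small sketch using Rust iterators, which are lazy by default. The function name `lazy_vs_eager` is made up. It counts how often the mapping closure runs before and after a consumer pulls items.

```rust
use std::cell::Cell;

// Returns (closure calls before consuming, calls after consuming, results).
pub fn lazy_vs_eager() -> (usize, usize, Vec<i32>) {
    let calls = Cell::new(0usize);
    let data = [1, 2, 3, 4, 5, 6];

    // Lazy: building the pipeline does no work yet.
    let pipeline = data
        .iter()
        .map(|x| {
            calls.set(calls.get() + 1);
            x * 10
        })
        .filter(|x| x % 20 == 0);
    let calls_before = calls.get();

    // Work happens only when a consumer pulls items; `take(2)` stops early,
    // so the last two elements are never mapped.
    let firsts: Vec<i32> = pipeline.take(2).collect();
    (calls_before, calls.get(), firsts)
}
```

An eager `map` returning an array would instead run the closure on all six elements and allocate an intermediate array before `filter` ever sees a value.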
Julia v2 is coming soon and some of these things are scheduled to be changed then, so there’s the deprecation plan :)
map and friends may change in v2 as well. I’m undecided on replacing them with generators, but I’m confident the Julia contributors will do something sensible with them.
Yeah, we agree that we should have a Path type, but there’s limited engineering resource and it’s just not enough of a priority for anyone yet, so no one has done it yet, tho third party packages have existed for a while: https://github.com/rofinn/FilePaths.jl
that’s another gripe, there is no such type as a Path in Julia - it just uses strings. Why not? I honestly don’t know, other than perhaps the Julia devs wanted to get 1.0 out and didn’t have time to implement them.
Well this is just how paths are represented in the Unix C API: plain old nul-terminated strings. And if I remember correctly Windows isn’t much different.
The problem isn’t the underlying implementation, the problem is that paths are not strings, they just happen to be represented as them. It’d be like if Rust used Vec<u8> as its string type instead of String/str, or if instead of std::time::Instant you had u32 or whatever.
Would love to read more about this! My naive understanding is that this is mostly just a quality-of-implementation issue: Julia uses LLVM to compile code during execution, and lacks a tiered JIT. Are there reasons why we can’t just add an interpreter tier to Julia, beyond “someone has to do that work”?
There is some debate about it here, including from a core Julia dev.
https://news.ycombinator.com/item?id=27961251
It also sounds like they can improve the caching / precompilation:
https://julialang.org/blog/2021/01/precompile_tutorial/
I have had this exact thought; I don’t think there is a fundamental reason why not, though it would take a tremendous amount of refactoring.
My advice after having done this a few times:
Fix everything you can. Don’t put things off. When the time of Julia 2 → Julia 3 comes you absolutely want to have fewer broken things to fix than you had in Julia 1 → Julia 2.
The complaints about subtyping are… interesting. In a language that has unlimited multiple dispatch, how often do you need to lean heavily on subtyping?
What’s good about it? Honestly asking. I’ve tried to browse through the intro material but did not find my answer to this question.
I don’t use Julia myself – just thought this was an interesting article – but as I understand it, its main goal is a more modern replacement for Fortran, mostly for use in scientific computing and other types of “number crunching” applications. Fortran is still wide-spread in that area due to its low overhead and high performance, while still being relatively easy.
See e.g. “Julia: come for the syntax, stay for the speed”, and “Fast as Fortran, Beautiful as Python” pretty much sums up its value proposition.
If you’re an application programmer or the like then Julia is probably a bad fit.
By the way, Rust’s PathBuf is just a wrapper over OsString. There’s nothing fancy under the hood: https://doc.rust-lang.org/src/std/path.rs.html#1076-1078
The point is that OsString has different implementations based on what the underlying operating system APIs use.¹
This is a requirement for many cases, ranging from “OS paths allow a superset of the bytes that would be valid in the language’s string encoding” down to avoiding “we helpfully converted the OS paths to UTF-8 for you and now we can’t find the file using that string anymore, because OS path → language string → OS path doesn’t result in the same bytes”.
¹ https://doc.rust-lang.org/std/ffi/struct.OsString.html
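The round-trip failure described above can be reproduced in a few lines. This is a sketch that assumes a Unix target (it relies on the Unix-only `OsStrExt` trait), and `survives_utf8_round_trip` is a made-up helper name.

```rust
use std::ffi::{OsStr, OsString};
use std::os::unix::ffi::OsStrExt; // Unix-only extension trait (assumption: Unix target)

// Returns true if converting an OS string to a language string and back
// preserves the original bytes.
pub fn survives_utf8_round_trip(name: &OsStr) -> bool {
    // `to_string_lossy` replaces invalid UTF-8 sequences with U+FFFD.
    let as_string: String = name.to_string_lossy().into_owned();
    let back = OsString::from(as_string);
    back.as_bytes() == name.as_bytes()
}
```

A file name such as the bytes `f`, `o`, `0xff` is legal on most Unix filesystems but is not valid UTF-8, so the helper returns false for it: the string you get back no longer names the same file.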
Rust’s PathBuf is also a gigantic pain in the ass to use. Paths should be lists of path components, IMO. The string is just a serialization format.
Not to mention that the semantics of Path::join (i.e. PathBuf::push) are just crazy.
Yeah, I want to have an operation that does two completely different things without telling me which one actually happened! /s
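For anyone who has not run into it, the "two completely different things" are appending and replacing. The sketch below assumes Unix-style paths. `std_join_demo` and `join_strict` are hypothetical helper names, the second being one possible way to make the caller find out which case occurred.

```rust
use std::path::{Path, PathBuf};

// Standard behaviour: a relative argument is appended, an absolute argument
// silently replaces the base.
pub fn std_join_demo() -> (PathBuf, PathBuf) {
    (
        Path::new("/etc").join("passwd"),  // appended: /etc/passwd
        Path::new("/etc").join("/bin/sh"), // replaced: /bin/sh
    )
}

// Hypothetical stricter variant: refuse instead of replacing `base` when
// `rel` turns out to be absolute.
pub fn join_strict(base: &Path, rel: &Path) -> Option<PathBuf> {
    if rel.is_absolute() {
        None
    } else {
        Some(base.join(rel))
    }
}
```

Returning `Option` pushes the decision back to the caller, which matters when `rel` comes from user input or a config file.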
What would the type of the individual components be?
I built something like this a while ago:
I had AbsolutePaths and RelativePaths (to prevent invalid path operations at compile-time) with PathSegments that were either OsStrings or placeholders like <ROOT_DIR>, <HOME_DIR>, <CACHE_DIR> etc. (that the library understood to serialize and deserialize such that you could e.g. use these paths in config files without having to manually implement this for each use-case).
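A design along those lines might look roughly like the following. This is a guess at the shape, not the commenter's actual library: every type and method name here is hypothetical, and the serialization uses a lossy conversion that a real implementation would have to replace with a proper escaping scheme for non-UTF-8 names.

```rust
use std::ffi::OsString;

// Hypothetical segment type: either a literal name or a symbolic placeholder.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum PathSegment {
    Name(OsString),
    RootDir,
    HomeDir,
    CacheDir,
}

// Distinct types, so "absolute joined onto absolute" does not type-check.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RelativePath(Vec<PathSegment>);

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AbsolutePath(Vec<PathSegment>);

impl RelativePath {
    pub fn from_names(names: &[&str]) -> Self {
        RelativePath(
            names
                .iter()
                .map(|n| PathSegment::Name(OsString::from(*n)))
                .collect(),
        )
    }
}

impl AbsolutePath {
    pub fn home() -> Self {
        AbsolutePath(vec![PathSegment::HomeDir])
    }

    // Only a RelativePath can be appended to an AbsolutePath.
    pub fn join(mut self, rel: RelativePath) -> Self {
        self.0.extend(rel.0);
        self
    }

    // Serializes with placeholders left symbolic, e.g. for a config file.
    // Lossy for non-UTF-8 names; acceptable only in a sketch.
    pub fn to_config_string(&self) -> String {
        self.0
            .iter()
            .map(|seg| match seg {
                PathSegment::Name(n) => n.to_string_lossy().into_owned(),
                PathSegment::RootDir => "<ROOT_DIR>".to_string(),
                PathSegment::HomeDir => "<HOME_DIR>".to_string(),
                PathSegment::CacheDir => "<CACHE_DIR>".to_string(),
            })
            .collect::<Vec<_>>()
            .join("/")
    }
}
```

With this, `AbsolutePath::home().join(RelativePath::from_names(&["projects", "notes.txt"]))` serializes to `<HOME_DIR>/projects/notes.txt`, and the placeholder is only resolved on the machine that reads the config.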