As far as I can tell, this doesn’t actually refute the central point of the post it’s disagreeing with (that Julia has correctness problems). It seems to basically be listing a bunch of cool Julia features, which is nice, but not if they don’t produce correct results?
As someone who isn’t a Julia programmer, I find the biggest omission in the original post, and the thing that makes it hard to respond to, is the lack of an alternative. If the author doesn’t recommend Julia anymore, what do they recommend? As I see it, you have two high-level choices for numerical computation:
1. Write everything from scratch.
2. Rely on an existing ecosystem.
Option 1 is a terrible idea for anything non-trivial. If I tried to implement a statistics package, I can guarantee, with 100% certainty, that it would contain more bugs than an off-the-shelf Julia one, for example. If you eliminate option 1, then you have a choice between a fairly small number of ecosystems, and most of the criticisms from the article seem applicable to most of them. The concrete example of @inbounds is interesting, but it gives the same behaviour as unannotated C/C++, and most of the Python and other scripting-language alternatives rely on C/C++/Fortran implementations for their hot paths. If Julia ignored @inbounds in debug builds (does it already?), then I’d assume that most of these cases would be caught: you’d hit them during development, and if you don’t hit them then the same shape of data in production won’t trigger them.
Yeah, basically for a lot of problems your choices are Matlab, Python, C/C++, or Fortran … from that perspective Julia looks very good. It’s faster than the first two and higher-level than the latter two.
But I’d say Julia is a pretty specialized language; some people have been trying to use it as a general-purpose language.
Yes, @inbounds is always ignored during unit testing, and whenever you pass the command-line flag to disable it (which you might also do for integration testing, or even when deploying correctness-critical code).
One of the issues that Yuri brought up in his original post concerned incorrect computations when composing types from different packages, in this case some quite old StatsBase.jl code. There were some legit bugs w.r.t. bounds handling that have since been addressed. See: https://discourse.julialang.org/t/discussion-on-why-i-no-longer-recommend-julia-by-yuri-vishnevsky/81151/14
Just a correction: the claim that composability is only possible through a classless language design is simply false. For example, NumPy has the array interface protocol. You can call numpy.sin(array) (sine of the variable array), where array could be a NumPy array, a CuPy array in GPU memory, a Dask array distributed on a cluster, a combination of both, or any other object which supports the array interfaces. Those libraries are developed by different people and organizations, and besides talking about the standardization of the array interface they don’t communicate much in order to make it work.
Having said that, I would be very keen to learn the advantages and additions of Julia’s multiple dispatch beyond that.
I haven’t used Julia, but I know function dispatch from R, and object-dot-method dispatch from Python/pandas, and I can tell you a few advantages of function dispatch. They boil down to this: you can start a new package to expand what an existing class can do.
I’m going to use “method” to mean a (function, class) combination, regardless of whether it’s done R-style (define a specialized function) or Python-style (define a method in a class’s namespace).
Very brief description of R’s S3 system for generic function dispatch / object-oriented-programming-before-Java-narrowed-what-that-means:
specialized functions have names like summary.glm, summary.lm, summary.data.frame, etc. (NB: these are ordinary functions with a dot in the name.)
User calls the generic function summary(mymodel), which dispatches by looking at the object’s class(es) and then calling an appropriately-named specialized function.
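For instance, a minimal sketch (the class name myclass and its contents are made up):

summary.myclass <- function(object, ...) {
  # Our own specialized function; the generic summary() finds it by name
  cat("myclass object with", length(object$data), "observations\n")
}
obj <- structure(list(data = rnorm(10)), class = "myclass")
summary(obj)  # dispatches to summary.myclass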
Whether you’re starting with a generic function or with a class, you can list the available methods:
methods(summary) # See the different classes that `summary` supports
methods(class="lm") # See the different functions that support `lm`
First story: Data Frames Forever, or, how R’s Core Data Type Supported Every New Way Of Working
The data frame is a fundamental object type in R. It is defined in the base package.
Here’s how the split-apply-combine work pattern changed over the years – let’s assume a data frame (player, year, runs) of baseballers-or-cricketers-or-joggers, and we want to define a new column ‘career_year’ based on when the athlete first started.
It used to require some rather manual coding that I won’t reproduce here.
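Then came plyr’s ddply, something like this (a sketch; assume the data frame is called games):

library(plyr)
games <- ddply(games, "player", transform,
               career_year = year - min(year) + 1)

and then came dplyr (and the %>% pipe syntax innovation):

library(dplyr)
games <- games %>%
  group_by(player) %>%
  mutate(career_year = year - min(year) + 1) %>%
  ungroup()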
All this time, data.frame did not need to change. ‘But that’s just defining new functions,’ I hear you holler. Yes, it is; but because they were functions, you could define new ones in your own package without touching data.frame.
You see this a lot in R: an old package’s interface isn’t much used anymore, but its data structures are still used as a lingua franca while the interfaces evolve.
You don’t see this a lot in Python, because if you want to add new methods .argh(), .grr(), and .hmm() to a class you have to extend the class itself. Pandas, as a result, is a huge library. At some point Pandas sprouted the df.pipe(myfunction) syntax, which allows you to pass your own function; but that has not stopped the Pandas package from growing.
Second story: Broom, or, How a Whole Bunch of Classes All Got A New Method.
broom is an R package that extracts information from a variety of fitted models into a tidy data frame.
It does this by defining 3 generic functions: tidy() (per-component), glance() (per-model), and augment() (per-observation); and a whole host of specialized variants to support models from many different packages.
It started smallish, and its obvious usefulness and extensibility made it really take off. It now supports a whole host of models from many different packages: some implemented unilaterally in the broom package itself; some contributed to broom by the model packages’ authors.
And I seem to recall, but cannot currently search, that some packages include functions like tidy.ourmodeltype themselves in order to support Broom’s interface. And that is only possible if it doesn’t matter where the specialized myfunction.myclass lives.
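From the model package’s side it would look something like this (ourmodeltype and its coef field are hypothetical):

library(broom)  # provides the tidy() generic
tidy.ourmodeltype <- function(x, ...) {
  # Return the model's components as one row per term
  tibble::tibble(term = names(x$coef), estimate = unname(x$coef))
}
fit <- structure(list(coef = c(a = 1.5, b = -0.3)), class = "ourmodeltype")
tidy(fit)  # dispatches to tidy.ourmodeltype, wherever it is defined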
NB: this is different from giving a new class a few of the same methods (same interface) as an existing class; this is about thinking up a few new methods (an interface), and then supporting a whole bunch of already-existing classes.
… I am fully aware that the above is neither rigorous argumentation nor inspiring rhetoric, but I hope it nonetheless captures some of the positive effects of namespace-agnostic method dispatch that I’ve seen in the R ecosystem. Which probably also apply to Julia.
“you can start a new package to expand what an existing class can do.”
You can also do this in a number of OO languages that support extending classes. These include Objective-C (which calls these “categories”), Swift, and … damn, I’m drawing a blank on the others. (I’m sure Ruby, because you can do anything to a class at runtime in Ruby.)
The common functionality is that you can declare an extension of a class, regardless of its provenance, and within that scope add methods or declare conformance with protocols/interfaces.
Nim is interesting in that it blurs together functions and methods, so f(x, y) is semantically the same as x.f(y). Since it also has C++-style overloading by parameter type, you get a lot of the same functionality as Julia/R, with the caveat that it’s (mostly) earlier-bound, i.e. the binding is done by declared type not runtime type. (The exception is that you can declare polymorphic methods on classes, and calls to those are late-bound, “virtual” in C++ parlance.)
Sometimes I wonder if it’s the case that any language you use enough for it to become your fave will have this problem. I’ve been using Chicken for twenty years and it feels like with almost every project I start in it I find upstream bugs. And then I think: I should switch languages and then I think: but then those languages will also have bugs that I’d find after a while.
I am not sure; I would guess that the presence of bugs is a function both of how the language is developed (like what amount of academic rigor is applied in its development) and of how many users it has (to shake out the bugs in the corner cases).