I had an extensive config, with lots of custom stuff in it, but eventually realized that 90% of it could be covered by spacemacs, holy mode (I’ve been using emacs for a long time, no interest in vim bindings!), and a few settings on top of that. So my .spacemacs has about 170 lines of miscellaneous stuff in the user config, 35 lines of package/layer configurations, and some various top-level config stuff. And since switching to that, a couple years ago, I’ve essentially stopped messing around with the config: it’s emacs, and it gets out of my way.
A (perhaps) bigger problem with the single-mindedness around the progress/preservation type soundness is that it relies (fundamentally) upon “not going wrong” at runtime actually meaningfully capturing whatever static invariants your type system was supposed to enforce.
For example: let’s say you come up with a fancy new type system to enforce array bounds checks statically. Since you did this to make your language fast, you don’t check anything at runtime. But, to ensure this is safe, you prove that your type system is sound, and all is well. Except all this naive “type soundness” result will say is that you don’t get “stuck”, and you (intentionally) don’t actually check anything at runtime to ensure that you are within bounds, so you never get stuck anyway: your soundness theorem is meaningless. It would only actually tell you something if you actually checked every access for boundedness and failed (got “stuck”) on any out-of-bounds access. In that case, not getting stuck actually means that you never go out of bounds, and your type system does something.
Of course, a more sensible approach is probably to change the theorem itself, so that rather than generic “stuckness” (which then requires you to augment the runtime semantics with stuff you don’t want it to have), you actually describe what behavior you want not to occur.
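In made-up notation, the shift looks something like this sketch:

% Stuckness-based: well-typed programs never get stuck.
\vdash e : \tau \;\wedge\; e \rightarrow^{*} e' \implies e' \text{ is a value} \;\vee\; \exists e''.\, e' \rightarrow e''

% Behavior-based: well-typed programs never perform the specific bad thing.
\vdash e : \tau \;\wedge\; e \rightarrow^{*} e' \implies e' \text{ is not an out-of-bounds access}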
Maybe there are more details not included in this article, but it seems like the bigger problem is numerical instability in whatever algorithms they are dealing with, not the unpredictability of sort.
From the paper:
In the end, the inconsistency was traced to differences in the default file-sorting algorithm in Python across platforms, as shown in Figure 2, and the fact that, as written, the script “nmr-data_compilation” assumes that the frequency and NMR files are sorted in the same order.
So it doesn’t have anything to do with numerical instability.
bigger problem is numerical instability in whatever algorithms they are dealing with, not the unpredictability of sort.
The rules of glob are well defined by POSIX. However, it’s likely the case that since Python runs on non-POSIX systems, it implements its own version entirely, leading to a mismatch in expectations, and therefore a problem. (Though, I don’t recall if glob specifies the order, or just the special characters and how they match.)
So, given that chemists aren’t known for the most elegant of programs, and he mentions “When I wrote the scripts 6 years ago, the OS was able to handle the sorting,” (i.e. he doesn’t fully understand the underlying implementation, but expects a certain behavior – not an uncommon thing from even experienced, professional programmers, mind you) my guess is that there’s something more sinister going on like:
part1 = process1(files[0:2])
part2 = process2(files[2:7])
part3 = process3(files[7:])
result = part1 - (part2 / part3)
Regardless of the actual root cause, the fact that 150-160 studies were cargo culted based on the original research is the real story here.
From the glob specification:
The pathnames are in sort order as defined by the current setting of the LC_COLLATE category, see the XBD specification, LC_COLLATE.
If I understand correctly, two computers running the same OS can sort in different ways and still be POSIX compliant.
(That might be out of date)
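You can see the same thing from Python; a quick sketch (assumes the en_US.UTF-8 locale is installed):

# The same names sort differently under locale-aware collation than under
# Python's default bytewise comparison.
import locale

names = ['B.out', 'a.out', 'b.out']
print(sorted(names))  # bytewise: ['B.out', 'a.out', 'b.out']

locale.setlocale(locale.LC_COLLATE, 'en_US.UTF-8')
print(sorted(names, key=locale.strxfrm))  # order depends on the collation rules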
my guess is that there’s something more sinister going on like
You don’t have to guess. The code for both is available in the “supplementary materials” zip files for the original Nature paper and the recent paper:
The script reads all the *.out files into one big list, which was previously not explicitly sorted:
# https://gist.github.com/sjl/a675c449a5452cb96a9fd8ce49741888#file-foo-py-L362-L370
def read_gaussian_outputfiles():
    list_of_files = []
    for file in glob.glob('*.out'):
        list_of_files.append(file)
    if (len(list_of_files) == 0):
        for file in glob.glob('*.log'):
            list_of_files.append(file)
    list_of_files.sort()
    return list_of_files
The data for each molecule is contained in two separate files: nmr-….out and freq-….out. So then the script splits the big list of files into two separate lists of nmr and freq files by iterating over the list and skipping files that don’t contain a particular word:
# https://gist.github.com/sjl/a675c449a5452cb96a9fd8ce49741888#file-foo-py-L346-L360
def read_gaussian_nmr_outfiles(list_of_files):
    list_of_nmr_outfiles = []
    for file in list_of_files:
        if file.find('nmr-') !=-1:
            list_of_nmr_outfiles.append([file,int(get_conf_number(file)),open(file,"r").readlines()])
    return list_of_nmr_outfiles

def read_gaussian_freq_outfiles(list_of_files):
    list_of_freq_outfiles = []
    for file in list_of_files:
        if file.find('freq-') !=-1:
            list_of_freq_outfiles.append([file,int(get_conf_number(file)),open(file,"r").readlines()])
    return list_of_freq_outfiles
These two lists are processed individually for a while and then are passed to get_chemical_shifts, which iterates through the nmr list and retrieves the corresponding freq entry by indexing into the freq list (comments mine):
# https://gist.github.com/sjl/a675c449a5452cb96a9fd8ce49741888#file-foo-py-L285-L301
def get_chemical_shifts(lofc_nmr, lofe):
    ATOM_NUMBER = 0; ATOM_SYMBOL = 1; ISOTROPIC_VALUE = 4
    counter = 0
    # ITERATING THROUGH FIRST LIST
    for file in lofc_nmr:
        proton_chemicalshift_table = []
        carbon_chemicalshift_table = []
        for line in file[2]:
            if "Isotropic" in line:
                linesplit = line.split()
                if linesplit[ATOM_SYMBOL] == "C":
                    carbon_chemicalshift_table.append([linesplit[ATOM_NUMBER],linesplit[ISOTROPIC_VALUE]])
                if linesplit[ATOM_SYMBOL] == "H":
                    proton_chemicalshift_table.append([linesplit[ATOM_NUMBER],linesplit[ISOTROPIC_VALUE]])
        # INDEXING INTO SECOND LIST
        lofe[counter].append(carbon_chemicalshift_table)
        lofe[counter].append(proton_chemicalshift_table)
        counter += 1
    return lofe
If the original list is sorted, everything works, because it looks like:
[freq1, freq2, freq3, nmr1, nmr2, nmr3]
and then gets split into:
[freq1, freq2, freq3]
[nmr1, nmr2, nmr3]
and then the iteration and indexing pairs the correct files with each other. But if the original list isn’t sorted:
[nmr3, freq1, freq3, nmr1, freq2, nmr2]
then the splitting produces lists like:
[nmr3, nmr1, nmr2]
[freq1, freq3, freq2]
and the iteration/indexing pairs up the wrong files with each other.
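To make the mispairing concrete, here’s a minimal sketch of that split-then-index logic, with made-up file names:

# Hypothetical file names, in an arbitrary (unsorted) glob order:
files = ['nmr-3.out', 'freq-1.out', 'freq-3.out', 'nmr-1.out', 'freq-2.out', 'nmr-2.out']

nmr = [f for f in files if 'nmr-' in f]    # ['nmr-3.out', 'nmr-1.out', 'nmr-2.out']
freq = [f for f in files if 'freq-' in f]  # ['freq-1.out', 'freq-3.out', 'freq-2.out']

# Pairing by index, as get_chemical_shifts does, matches the wrong files:
for i, n in enumerate(nmr):
    print(n, '<->', freq[i])
# nmr-3.out <-> freq-1.out
# nmr-1.out <-> freq-3.out
# nmr-2.out <-> freq-2.out
# A files.sort() before splitting restores the intended pairing.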
You don’t have to guess. The code for both is available in the “supplementary materials” zip files for the original Nature paper and the recent paper:
I didn’t have time to research the issue. Perhaps that means I should have kept my mouth shut, but the obvious conclusion (which happened to be in the same ballpark, but out in left field) is that a list gets built and split up in some way for different processes.
Thanks for doing the research and analysis of the root cause. It was a really interesting read!
Python’s glob.glob() indeed doesn’t use the platform libc’s glob() at all. Instead it uses functions from Python’s os module to get a list of filenames in the searched directory/directories, and does matching of the pattern against the filenames in Python. You can find the Python 3.8.0 implementation of glob.glob() here, for example.
The cross-platform variation comes from the first part of that: Python’s os.scandir()/os.listdir() are implemented in C and call the appropriate low-level directory-listing functions for the operating system you’re using. And that’s not guaranteed to order the same way, or at all, on every platform/filesystem.
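Roughly, the shape is something like this (a simplified sketch, not the actual stdlib code):

# A simplified sketch of what glob.glob() does: match the pattern in
# Python against whatever names the OS hands back, in whatever order.
import fnmatch
import os

def simple_glob(pattern, directory='.'):
    # os.listdir() makes no ordering guarantee; the order reflects the
    # platform and filesystem.
    return [name for name in os.listdir(directory)
            if fnmatch.fnmatch(name, pattern)]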
And if anyone’s wondering why Python does it this way: I don’t know for certain, but my guess from reading the implementation of Python’s os.scandir() is that it’s mostly for normalization of the different platforms’ directory-listing results.
Kinda related… is there a way to get my email – on a domain I control – to be delivered to two different mail providers? I’d love to try out FastMail, but not at the cost of losing my existing mail service during the trial period.
At a protocol level, no (redundancy of MX records is for when one server is down) — but most mail providers have some mechanism of transparently forwarding messages which should allow this kind of trial?
but most mail providers have some mechanism of transparently forwarding messages which should allow this kind of trial?
This is exactly what I do when trialing new providers I am interested in. This way no mail is lost and I can still play around with the new provider without fear of losing anything.
I think you can try out a mail provider on a subdomain. I don’t remember if I ever tried that myself though lol
The challenges of teaching software engineering are git, commandline fiddling, etc.? What about the (essentially fundamental) problem that the discipline exists to solve problems at a scale that seems impossible to replicate in a classroom (certainly in a 4 day course) — so you just end up parroting a bunch of stuff, hoping they remember enough so that once they mess it up in production they remember the advice you gave as an alternate approach.
People seem to realize that programming can only be learned by doing (hence our programming courses involve programming). But I’ve asked a bunch of people who teach software engineering how they do that for, e.g., teaching students how to deal with large legacy codebases, and everyone says it’s too hard — they just end up teaching green field project design, at a tiny scale (i.e., the prototype that a professional team could put together in a few days).
Well, the challenges of teaching software engineering are that you need to suffer a load of git and command-line fiddling before you can get to the point where you learn any software engineering. If it were at all realistic to dump people into a Smalltalk, LISP machine, or microcomputer BASIC interpreter, then they would not have those problems. They would, instead, have different ones…
But I’ve asked a bunch of people who teach software engineering how they do that for, e.g., teaching students how to deal with large legacy codebases, and everyone says it’s too hard — they just end up teaching green field project design, at a tiny scale (i.e., the prototype that a professional team could put together in a few days).
We don’t say it’s too hard. In the section on refactoring, for example, we give the student some existing code and encourage them to use the refactorings they know about to restructure the code, to help them understand it and communicate their understanding.
Well, the challenges of teaching software engineering are that you need to suffer a load of git and command-line fiddling before you can get to the point where you learn any software engineering.
This could be a massive endeavour, but… how about cleanly separating the teaching environment from the professional environments? You don’t really give a crap about Git when trying to teach version control. You don’t really give a crap about ls when trying to teach files and directories. You don’t really give a crap about C when trying to teach imperative programming…
The tools we use professionally do embody the fundamentals you want to teach, and learning them will cause the students to acquire a host of transferable skills. But they will also learn some fairly specific or useless cruft along the way. Such as this idea of staging, which is only useful for fairly big patches, or the fact that git checkout -b my_new_branch is a shortcut for git branch my_new_branch; git checkout my_new_branch.
We need a simple, limited command line that is made for teaching how command lines work. We need a simple, limited programming language that is made for teaching programming (or maybe one language per major paradigm). We need half done or buggy projects, made for teaching maintenance. And so on.
Leave the “real” stuff for more advanced courses. And to squash any misgivings about not teaching useful stuff (I remember resenting being confined to the “Caml environment” (basically the OCaml REPL) instead of being taught “real” programming), take a few minutes to talk about all the uninteresting cruft “real” stuff comes with. That it will be taught, but that we need to start with the basics.
What is the size of the codebase they are refactoring? There’s a real phase shift of how refactoring works, I think, once you are dealing with systems that are too large to actually fit inside your head. Either you need to make them instead be made up of smaller parts (which is a refactoring — oops!), or you need to figure out how to make changes that mechanically will be checked (obviously the language you’re using changes the tools available for doing this).
That was something I never quite learned until being confronted with an almost completely untested, ~7 year old large and nearly unreadable ruby code base that nonetheless was making 7 figures for the client (enough that they wanted to keep adding to it, but not enough to do too much work). “Refactoring” that code involved adding tons of instrumentation and then carving out clean, tested code paths, that we knew, from testing and instrumentation, didn’t mess up the rest of the codebase. But these were (intentionally) minimal changes, done almost in parallel with the existing code, because untested ruby code is a pretty hard thing to wrangle.
Marginally off-topic: What is the deal with using braces everywhere if the code is well-indented and semicolons aren’t required? It doesn’t make the code any easier to read, (if anything, it has the opposite effect).
It’s very convenient to be able to automatically indent code. If you re-organize code (say, move it from one scope to another, or copy from one part of the codebase to another), not having to worry about fixing the whitespace manually (especially because screwing this up can have important, semantic, consequences) is a really nice property.
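A small Python illustration of that hazard:

# Moving a line and slipping one indentation level silently changes meaning.
total = 0
for x in [1, 2, 3]:
    total += x
    print(total)   # inside the loop: prints 1, 3, 6

total = 0
for x in [1, 2, 3]:
    total += x
print(total)       # dedented one level: prints 6, once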
Lol. If you want to advocate for the advantage of having a third-party hosted CMS for a blog, do so, but please don’t put “blazing fast” as the primary reason in the title, because I’m going to pretty much guarantee that Nginx serving HTML/CSS is waaaay faster, and I can generate that from pretty complex structured data (blog or otherwise) using static site generators.
Uggh. This is lazy reporting:
But computer systems don’t deal well with abstract concepts like “city,” “state,” and “country,” so MaxMind offers up a specific latitude and longitude for every IP address in its databases (including its free, widely-used, open-source database). Along with the IP address and its coordinates is another entry called the “accuracy radius.”
City, state, and country are perfectly easy to represent abstractly (certainly as easy to represent as lat/lon). MaxMind just has a shitty API because they didn’t bother to think about these consequences (that people would take the center of the accuracy radius as the exact location).
I find this site helpful when looking for a new VPN (though they are not free): https://thatoneprivacysite.net/vpn-section/
Consider using ProtonVPN if you’re not sure which. Their incentives are at least aligned with you vs shady folks and governments. Just be sure to use the jurisdictions that don’t cooperate much with the U.S. Traffic will be slower, but the odds of secret warrants are lower.
Mullvad is very well reviewed on That One Privacy Site, and that review was one of the reasons I decided to use it. I’ve been using it for about a year now without problems and without ever giving them any real information about myself.
I’ve never tested it, though I’d like to – they claim that you can mail them cash and they’ll convert it from whatever currency you send and credit it to the account you specify (which is just a number). Given that almost any service will require an email (they don’t) or a credit card (which prepaid cards can get around) or paypal (…), this is a pretty good sign.
I didn’t try the mail-them-cash option, but rather paid in bitcoin, which was straightforward enough.
Can anyone compare Ur/Web to Yesod? From this overview, it sounds like they share many of the same goals surrounding safety.
Yesod doesn’t have code that is transparently compiled to client-side (at least not without a lot of GHCjs hackery, which wouldn’t be part of Yesod), any of the transparent reactive stuff (which works smoothly with the client-server stuff), and in general, at least in my experience (which is somewhat limited – I’ve done a lot of Haskell web dev, though only a little with Yesod, and I’ve built a few applications with Ur/Web, but nothing big), things in Yesod are much more first order. i.e., functions in Ur/Web become handlers in a natural way, can be called by client-side code with rpc, etc. Urls exist, of course (and they are not obfuscated), but you won’t really be using them (even as typed constants like with Yesod) unless interacting from outside the application. And in general everything feels like it is at a much higher level of abstraction, vs. Yesod has the express goal (AFAIU) to essentially make Rails-style web development type safe.
Most of that makes sense, but what do you mean by “won’t really be using [URLs]”? My Yesod experience is also pretty shallow (got a ~2k LOC app with 13 handlers). In a server rendered form, you’ll have something that looks like: <form method=post action=@{FooR var}>. What looks different in Ur?
Maybe this is just my knowledge being dated (I last experimented with Yesod years ago), but I had thought you had to declare routes to handlers, and use identifiers generated by that process for urls, form actions, etc. Can you just put any function that has the right return type in a url / action? (that’s essentially what Ur/Web lets you do).
I see, and your memory is right, based on my understanding. Sounds like there’s a bit less ceremony. I’ve edited my example for clarity: FooR is what you use, and it would correspond to a postFooR :: Text -> Handler Html function.
In case people are looking for more, I wrote a post describing (in semi-literate style) a tiny application I made while experimenting with Ur/Web ~6 years ago. I would guess that the Ur/Web code still works (I don’t think the language has changed much), but the browser stuff might not, and I’m not hosting a demo of the application anymore (sorry!).
https://dbp.io/essays/2013-05-21-literate-urweb-adventure.html
The obsession with connecting teaching with particular languages is totally bizarre. This quote, in particular, is somewhat mind-boggling:
Don’t get me wrong. [SICP is] a good textbook. I’m sure it made me a better programmer. But it wasn’t something I applied very often in my Java/ActionScript/PHP/Python/Ruby/JavaScript career.
If it taught you to be a better programmer, then you absolutely applied it. That’s the whole point. People should be taught how to program, not how to use particular languages (you can learn the basics of a language by reading a book, and the rest you’ll learn by using it).
Now there are plenty of things wrong with SICP, but to say that since it is teaching functional programming, it is totally useless if you aren’t writing Scheme/Clojure/Haskell/OCaml is nuts. It teaches abstraction, problem solving, how to break problems into smaller ones, etc. It’s a particularly spare presentation (as it starts with such small building blocks), which makes it a bit extreme, but the process of building data structures and algorithms out of smaller pieces is an absolutely fundamental part of computer science, and the rush to replace these curricula with “more realistic languages” seems to lose some of this (e.g., everyone wants intro classes in Python, a supposedly “simple” language, but one where it’s impossible to define data structures in a straightforward way without bringing in half of the class system).
If I recall correctly, they mention right in the beginning that what they’re after is figuring out the nature of computation. In that sense, it’s a much more fundamental (dare I say philosophical?) book than most, which may turn some people off or leave them with the idea that they didn’t learn any directly “practical” skill. However, it will shape your thinking about problems in such a way that indeed, you will be a better programmer. It transcends language and even programming paradigm. Of course, you will have a more FP-oriented mindset after you’ve finished it, but that can be applied in almost every language.
I didn’t find this blog post very insightful, and I reject the notion that “FP is hard”. Yes, it’s a different way of thinking, but it’s only difficult if you try to go too fast while teaching it to programmers who are firmly stuck in a different paradigm and expect them to just grok it immediately. Teaching is hard, and teaching FP just seems harder if you are leaning too hard on students already knowing how to program and expect them to just pick it up like the syntax of a new language. It’s actually quite similar to try to teach object-oriented programming properly to people who have very little experience with programming. That’s actually much harder than you think! The whole idea of “objects” sending “messages” to each other is really very abstract.
I think teaching functional programming to people who have never programmed is much easier as you can rely on some of the math knowledge they already have. Teaching old-school imperative programming (think BASIC, Pascal, Logo) is much easier too, but in some circles there’s something of a taboo against teaching that kind of programming as it supposedly teaches people “bad habits”.
The “lacks” of Go in the article are highly opinionated and given without any context of what you’re trying to solve with the language.
Garbage collection is something bad? Can’t disagree harder.
The article ends with a bunch of extreme opinions like “Rust will be better than Go in every possible task”.
There’re use cases for Go, use cases for Rust, for both, and for none of them. Just pick the right tool for your job and stop bragging about yours.
You love Rust, we get it.
Yes, I would argue GC is something that’s inherently bad in this context. Actually, I’d go as far as to say that a GC is bad for any statically typed language. And Go is, essentially, statically typed.
It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left. In other words, you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.
That’s why Go has the “defer” statement, it’s there because of the GC. Otherwise, destructors could be used to defer cleanup tasks at the end of a scope.
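The same pattern shows up in any GC’d language; a quick Python sketch (file name made up):

# You can't rely on a finalizer running promptly under GC, so deterministic
# cleanup has to be requested explicitly: the role defer plays in Go.
class Resource:
    def __del__(self):
        print("finalized")   # may run late, or not at all on some paths

r = Resource()               # when this gets finalized is up to the runtime

with open("example.txt", "w") as f:
    f.write("hello")         # f is closed exactly at the end of the block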
So that’s what makes a GC inherently bad.
A GC, however, is also bad because it “implies” the language doesn’t have good resource management mechanisms.
There was an article posted here, about how Rust essentially has a “static GC”, since manual deallocation is almost never needed. Same goes with well written C++, it behaves just like a garbage collected language, no manual deallocation required, all of it is figured out at compile time based on your code.
So, essentially, a GC does what languages like C++ and Rust do at compile time… but it does it at runtime. Isn’t this inherently bad? Doing something at runtime that could be done at compile time? It’s bad from a performance perspective and also bad from a code validation perspective. And it has essentially no upsides, as far as I’ve been able to tell.
As far as I can tell the main “support” for GC is that they’ve always been used. But that doesn’t automatically make them good. GCs seem to be closer to a hack for a language to be easier to implement rather than a feature for a user of the language.
Feel free to convince me otherwise.
It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left.
Why do you think this would be the case? A language with GC can also have linear or affine types for enforcing that resources are always freed and not used after they’re freed. Most languages don’t go this route because they prefer to spend their complexity budgets elsewhere and defer/try-with-resources work well in practice, but it’s certainly possible. See ATS for an example. You can also use rank-N types to a similar effect, although you are limited to a stack discipline which is not the case with linear/affine types.
So, essentially, a GC does what languages like C++ and Rust do at compile time… but it does it at runtime. Isn’t this inherently bad?
No, not necessarily. Garbage collectors can move and compact data for better cache locality and elimination of fragmentation concerns. They also allow for much faster allocation than in a language where you’re calling the equivalent of malloc under the hood for anything that doesn’t follow a clean stack discipline. Reclamation of short-lived data is also essentially free with a generational collector. There are also garbage collectors with hard bounds on pause times which is not the case in C++ where a chain of frees can take an arbitrary amount of time.
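For a feel of why nursery allocation is cheap, here’s a minimal bump-allocator sketch (the Nursery class and its names are hypothetical):

# Allocation in a generational GC's nursery is a bounds check plus a
# pointer bump; reclaiming the (mostly dead) nursery is resetting `top`
# after survivors are copied out.
class Nursery:
    class Full(Exception):
        """Signals that a minor collection is needed."""

    def __init__(self, size):
        self.size = size
        self.top = 0

    def alloc(self, n):
        if self.top + n > self.size:
            raise Nursery.Full
        addr = self.top
        self.top += n          # the entire cost of an allocation
        return addr

    def reset(self):
        self.top = 0           # everything not evacuated is gone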
Beyond all of this, garbage collection allows for a language that is both simpler and more expressive. Certain idioms that can be awkward to express in Rust are quite easy in a language with garbage collection precisely because you do not need to explain to the compiler how memory will be managed. Pervasive use of persistent data structures also becomes a viable option when you have a GC that allows for effortless and efficient sharing.
In short, garbage collection is more flexible than Rust-style memory management, can have great performance (especially for functional languages that perform a lot of small allocations), and does not preclude use of linear or affine types for managing resources. GC is hardly a hack, and its popularity is the result of a number of advantages over the alternatives for common use cases.
What idioms are unavailable in Rust or in modern C++, because of their lack of GC, but are available in a statically typed GC language?
I perfectly agree with GC allowing for more flexibility and more concise code as far as dynamic languages go, but that’s neither here nor there.
As for the theoretical performance benefits and real-time capabilities of a GCed language… I think the word theoretical is what I’d focus my counter upon there, because they don’t actually exist. The GC overhead is too big, in practice, to make those benefits outshine languages without runtime memory management logic.
I’m not sure about C++, but there are functions you can write in OCaml and Haskell (both statically typed) that cannot be written in Rust because they abstract over what is captured by the closure, and Rust makes these things explicit.
The idea that all memory should be explicitly tracked and accounted for in the semantics of the language is perhaps important for a systems language, but to say that it should be true for all statically typed languages is preposterous. Languages should have the semantics that make sense for the language. Saying a priori that all languages must account for some particular feature just seems like a failure of the imagination. If it makes sense for the semantics to include explicit control over memory, then include it. If it makes sense for this not to be part of the semantics (and for a GC to be used so that the implementation of the language does not consume infinite memory), this is also a perfectly sensible decision.
there are functions you can write in OCaml and Haskell (both statically typed) that cannot be written in Rust because they abstract over what is captured by the closure
Could you give me an example of this?
As far as I understand and have been told by people who understand Rust quite a bit better than me, it’s not possible to re-implement this code in Rust (if it is, I would be curious to see the implementation!)
https://gist.github.com/dbp/0c92ca0b4a235cae2f7e26abc14e29fe
Note that the polymorphic variables (a, b, c) get instantiated with different closures in different ways, depending on what the format string is, so giving a type to them is problematic because Rust is explicit about typing closures (they have to talk about lifetimes, etc).
My God, that is some of the most opaque code I’ve ever seen. If it’s true Rust can’t express the same thing, then maybe it’s for the best.
If you want to understand it (not sure if you do!), the approach is described in this paper: http://www.brics.dk/RS/98/12/BRICS-RS-98-12.pdf
And probably the reason why it seems so complex is because CPS (continuation-passing style) is, in general, quite hard to wrap your head around.
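For a tiny taste of the style in Python (just the flavor, nothing to do with the printf trick itself):

# In CPS, each function passes its result to "the rest of the computation"
# (the continuation k) instead of handing it back to the caller directly.
def add_cps(a, b, k):
    return k(a + b)

def square_cps(x, k):
    return k(x * x)

# Direct style: (1 + 2) ** 2. In CPS, control flow is threaded explicitly:
print(add_cps(1, 2, lambda s: square_cps(s, lambda r: r)))  # 9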
I do think that the restrictions present in this example will show up in simpler examples (anywhere where you are trying to quantify over different functions with sufficiently different memory usage, but the same type in a GC’d functional language), this is just a particular thing that I have on hand because I thought it would work in Rust but doesn’t seem to.
FWIW, I spent ~10 minutes trying to convert your example to Rust. I ultimately failed, but I’m not sure if it’s an actual language limitation or not. In particular, you can write closure types in Rust with 'static bounds which will ensure that the closure’s environment never borrows anything that has a lifetime shorter than the lifetime of the program. For example, Box<FnOnce(String) + 'static> is one such type.
So what I mean to say is that I failed, but I’m not sure if it’s because I couldn’t wrap my head around your code in a few minutes or if there is some limitation of Rust that prevents it. I don’t think I buy your explanation, because you should technically be able to work around that by simply forbidding borrows in your closure’s environment. The actual thing where I got really hung up on was the automatic currying that Haskell has. In theory, that shouldn’t be a blocker because you can just introduce new closures, but I couldn’t make everything line up.
N.B. I attempted to get any Rust program working. There is probably the separate question of whether it’s a roughly equivalent program in terms of performance characteristics. It’s been a long time since I wrote Haskell in anger, so it’s hard for me to predict what kind of copying and/or heap allocations are present in the Haskell program. The Rust program I started to write did require heap allocating some of the closures.
It’s inherently bad since GC dictates the lack of destruction mechanisms that can be reliably used when no references to the resource are left. In other words, you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.
Deterministic freeing of resources is not mutually exclusive with all forms of garbage collection. In fact, this is shown by Rust, where reference counting (Rc) does not exclude Drop. Of course, Drop may never be called when you create cycles.
(Unless you do not count reference counting as a form of garbage collection.)
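For what it’s worth, CPython shows the same split; a rough sketch:

# CPython is reference counted: finalizers run deterministically when the
# last reference dies... unless a cycle keeps the count from reaching zero.
class R:
    def __del__(self):
        print("dropped")

a = R()
del a        # refcount hits zero: "dropped" prints immediately

b = R()
b.me = b     # a reference cycle
del b        # refcounting alone can't reclaim this; "dropped" waits for
             # the backup cycle collector (gc.collect())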
Well… I don’t count shared pointers (or RC pointers or w/e you wish to call them) as garbage collected.
If, in your vocabulary, that is garbage collection then I guess my argument would be against the “JVM style” GC where the moment of destruction can’t be determined at compile time.
If, in your vocabulary, that is garbage collection
Reference counting is generally agreed to be a form of garbage collection.
I guess my argument would be against the “JVM style” GC where the moment of destruction can’t be determined at compile time.
In Rc or shared_ptr, the moment of the object’s destruction can also not be determined at compile time. Only the destruction of the Rc itself can; put differently, the moment of the reference count decrement can be determined at compile time.
I think your argument is against tracing garbage collectors. I agree that the lack of deterministic destruction is a large shortcoming of languages with tracing GCs. It effectively brings back a parallel to manual memory management through the backdoor: it requires manual resource management. You don’t have to convince me :). I once wrote a binding to Tensorflow for Go. Since Tensorflow wants memory aligned on 32-byte boundaries on amd64 and Go allocates (IIRC) on 16-byte boundaries, you have to allocate memory in C-land. However, since finalizers are not guaranteed to run, you end up managing memory objects with Close() functions. This was one of the reasons I rewrote some fairly large Tensorflow projects in Rust.
However, since finalizers are not guaranteed to run, you end up managing memory objects with Close() functions.
Hmm. This seems a bit odd to me. As I understand it, Go code that binds to C libraries tend to use finalizers to free memory allocated by C. Despite the lack of a guarantee around finalizers, I think this has worked well enough in practice. What caused it to not work well in the Tensorflow environment?
When doing prediction, you typically allocate large tensors relatively rapidly in succession. Since the wrapping Go objects are very small, the garbage collector kicks in relatively infrequently, while you are filling memory in C-land. There are definitely workarounds to put bounds on memory use, e.g. by using an object pool. But I realized that what I really want is just deterministic destruction ;). But that may be my C++ background.
I have rewritten all that code probably around the 1.6-1.7 time frame, so maybe things have improved. Ideally, you’d be able to hint the Go GC about the actual object sizes including C-allocated objects. Some runtimes provide support for tracking C objects. E.g. SICStus Prolog has its own malloc that counts allocations in C-land towards the SICStus heap (SICStus Prolog can raise a recoverable exception when you use up your heap).
So Python, Swift, Nim, and others all have RC memory management … according to you these are not GC languages?
One benefit of GC is that the language can be way simpler than a language with manual memory management (either explicitly like in C/C++ or implicitly like in Rust).
This simplicity then can either be preserved, keeping the language simple, or spent on other worthwhile things that require complexity.
I agree that Go is bad, Rust is good, but let’s be honest, Rust is approaching a C++-level of complexity very rapidly as it keeps adding features with almost every release.
you can’t have basic features like the C++ file streams that “close themselves” at the end of the scope, when they are destroyed.
That is a terrible point. The result of closing the file stream should always be checked and reported or you will have buggy code that can’t handle edge cases.
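A Python sketch of why (file name made up): the final flush happens at close, so that’s where write errors can surface.

f = open("out.txt", "w")
f.write("data")            # may only land in a userspace buffer
try:
    f.close()              # the flush happens here; so can the failure
except OSError as e:
    print("write actually failed:", e)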
You can turn off garbage collection in Go and manage memory manually, if you want.
It’s impractical, but possible.
Is this actually used with any production code? To my knowledge it was meant to be more of a feature for debugging and language developers, rather than a true GC-less option like the one a language like D provides.
Here is a shocking fact: For those of us who write programs in Go, the garbage collector is actually a wanted feature.
If you work on something where having a GC is a real problem, use another language.
It’s an abstract for a workshop talk of work that is quite early. So, perhaps it shouldn’t have been shared at this point (but on the other hand, we believe in openness…). Regardless, it’s certainly not finished work, and if it sounds somewhat interesting, that’s basically the goal, as it is just a workshop talk!
Possibly relevant is this (old, but accomplishing what you describe; not sure about how costly though): https://lobste.rs/s/kun536/radio_e_mail_west_africa_2002
Just read through this from another post above: awesome implementation and really good detail. I’m taking notes and will see if I can get my hands on one of these modems (as well as seeing if using one where I am violates any laws)…
This seems very relevant to the thread posted just the other day: https://lobste.rs/s/cpbngl/remote_data_access_is_hard_need_lobste_rs , which is asking for essentially a solution to the same problem, >15yrs later!
I wonder if those modems still exist!
Your local second-hand store is probably the best place to begin searching for a modem! I know mine is for sure.
Do people know of any fully abstract compiler? The only one I know of is Microsoft Research’s Fully Abstract Compilation to JavaScript.
They wrote a compiler from F* to JavaScript and proved full abstraction in Coq. It is kind of amazing, considering that full abstraction means it is secure and immune from changes to Object.prototype, etc.
Just a minor correction – that paper is for a language called “f*” (a little f, not a big F), which is a much much smaller language (no dependent types, for example) than F*. It’s an unfortunately close name, as it can be confusing!
Are you looking for industrial / “large” compilers? The answer to that is no (or, not yet… there didn’t used to be any realistic correct compilers either, so…).
As yet, most of the fully abstract compilers that exist are for small languages, in addition to the f*->JS one you mentioned, Devriese et al built one from the simply typed lambda calculus with recursion to the untyped lambda calculus (pdf), and people in my group have built a couple – e.g., from a lambda calculus with recursive types to a language with exceptions (pdf), from a language with information flow properties into F omega (pdf), etc. And of course there are probably others that I don’t know about!
One thing I never understood (after hearing this sequence countless times) is why it was audible. Once the handshake was completed, the speaker is turned off. I guess if things were going wrong, I could kind of figure out (as it didn’t sound normal), but presumably the modem would know that anyway and would give that feedback (I can’t remember that well, but when I had an external modem, it definitely had lights that let you know what was going on, and when the modems were internal, presumably software could do that).
Well, it was pretty fast to hear if the handshake was, say, hitting the wrong number or if your own line had somebody talking on it.
Would not surprise me if the real reason was just as a marketing gimmick or similar.
Here’s a StackOverflow answer that makes sense. Corroborates friendlysock’s prediction, too.
Fake news has nothing to do with “sounding journalistic”. This is how one ends up confusing propaganda (fake news) with truth, which is at the heart of a lot of journalism today.
The only thing such software will accomplish is further reinforce and convince the gullible that if someone sounds like a journalist then what they have to say must be true.
Yeah, this is an incredibly terrible idea.
It’s akin to other (terrible) ideas where people think they can solve (really hard) societal problems with (insert hot technical solution), e.g., using algorithms to “certify” that certain algorithms are fair according to either naive or just plain wrong understanding of what fairness is (like, assigning money bail amounts based on an algorithm that is “fair” according to some specification: for example, that it’ll produce the same result whether or not race is an input, ignoring: A. the entire money bail system is terrible, doesn’t work, and shouldn’t be given further credibility and B. algorithms can easily recreate bias working around whatever minor corrections are attempted)
So I would be an “end-developer” in this story? (Well, if I were writing software for Fuchsia.) So I’m down to Dart, C or C++?
I’m not in these waters, but this could slow down adoption by a lot, right? I don’t see many people embracing Dart, and C/C++ has its well-known pain points that most people currently developing third-party software don’t have to deal with on their current platforms. I mean, if I were writing a Windows, Android or an iOS app, I would likely not be dealing with memory. And I somehow don’t see a lot of people developing Linux apps jumping to Fuchsia.
Google is weird.
These days my reaction to seeing new stuff being developed in unsafe languages (like C/C++) is to shake my head and close the page again.
Should we rewrite e. g. Linux in Rust? Maybe, maybe not.
Should we start writing new software in C/C++? Absolutely not! (Some exceptions may apply.)
People who say “you can write safe code in C/C++ if you are careful bla bla bla” are exactly the kind of people whose software I try to actively avoid.
Should we start writing new software in Rust? Probably not. Very immature ecosystem, a lot of language churn, no established GUI frameworks.
What other languages are you going to use? I refuse to touch JVM/.NET with a ten foot barge pole, I’m certainly not going to write a desktop application in Go, or a scripting language, or a functional language like Haskell. What’s left other than C?
C would be my last choice to develop a desktop application among the languages you mentioned. Memory safety is more important than memory usage and performances in my humble opinion.
I don’t think that memory usage and performance are the big wins for C. I think stability is. C has a stable ABI and is the natural language for interacting with the rest of the Unix platform. There’s no impedance mismatch between C and your operating system like I feel with every other language.
Plus it’s fun to write.
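That stability is why everything else ends up binding to C. Even from Python it’s a few lines (a sketch, Unix assumed):

# Call straight into libc through the stable C ABI.
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
print(libc.getpid())   # calls directly into libc's getpid()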
Rust has first class support for making calls into libraries that conform to those ABIs, and for producing libraries that conform to those ABIs. It also, like C, doesn’t mandate a particular threading model or garbage collection scheme. It’s the first language I’ve seen in a long time that has successfully convinced me that I can have a bunch of extra safety without giving up first class access to even more advanced OS facilities. It also seems to enable an incremental approach to subsystem replacement, being even useful for kernel module development with no_std. It’s not a panacea, but it’s absolutely worth investigating, even if you get started with a thin FFI wrapper around an existing C library.
I’m not quite sure I’d call Rust’s support here “first class.” If it were first class, then I personally think I’d be able to do things like, “include this C header file and have the obvious things available to me.” Without that, you wind up needing to re-create a lot of the conditional compilation used in C headers for specific platform support. It’s not insurmountable, but there’s a lot of friction there.
It feels first class compared to Go, but second class compared to C++.
I think D is pretty good. It has optional GC, and you can write code which looks a lot like C. In addition, it has some nice sugar like foreach and ranges which make writing common C stuff easier.
I struggle to see how D really differentiates itself from C++ tbh
Generics are done differently (no templates) which yields better compile times. It’s less of a kitchen sink language, since it was developed with 15 years of C++ hindsight. There is GC for when you want it.
The bad compile times for C++ templates are due to monomorphisation and then optimisations being run for every single template instance basically independently. How does D improve on that? Does it optimise before instantiation?
Well, I guess that eliminates C then?
If I have to choose between
I’ll pick the former immediately.
C developers had almost 50 years to demonstrate that they are able to write acceptable code. They couldn’t.
Let’s abandon this failed experiment and retrain them in a way that reduces the harm to the outside world, just like we did with the coal miners a few decades ago.
I don’t think C has had language churn ever, its ecosystem is very mature, and there are heaps of established stable GUI frameworks.
I don’t really understand what you’re getting at here.
No language can or will stop people from writing broken, unsafe, poor quality software.
Rust has a good record for stopping some really bad classes of breakage though, which is tremendously valuable. Sure you can still implement something with crypto weaknesses or arithmetic errors, but you’re probably not going to set the program counter to a user-provided buffer unless you really work for it.
It has a good record at preventing some very narrow, specific classes of breakage. It achieves this with some large costs. I don’t like that people talk up the positives of Rust but never seem to remember to mention the downsides, and if anyone does mention those downsides they get downvoted into being hidden by the Rust Evangelism Strike Force.
What I said was: “No language can or will stop people from writing broken, unsafe, poor quality software.” I said that in response to “inflicting another 2 decades of broken, unsafe, poor quality software upon innocent users”. I disagree with that because C does not cause software to be broken, unsafe or poor quality. In fact, most of the frustratingly poor quality programmes I’m forced to use are those written in languages other than C. Broken, poor quality software in my experience is mostly big bloated overly complicated programmes that try to do too much. What language they’re written in doesn’t seem to have much impact, with one big exception: C.
C is not really designed for writing big programmes, and as a result, people that write C keep their programmes small. As a result, they tend to be of very high quality. This is just my experience, but it’s a very consistent experience. C programmers tend to care about producing good quality code.
I’m not sure I understand the small programme claim: is the Linux kernel a small programme? is the Windows kernel a small programme? is GIMP a small programme? also, about quality, I think it’s hard to discuss, in essence boiling down to a “he said she said” & very subjective argument.
As to Rust downsides, I totally agree it has those, but note that C advocates totally don’t mention C downsides in those discussions either :) also, all languages have some downsides, so every choice here is a compromise, question is, what weights one attaches to which pros & cons.
Programming language discussions are always very subjective.
Vulnerabilities (i.e., that get reported, get CVEs) are very much not subjective… and there are still vast numbers that are memory unsafety errors (seems like >50% in most studies I’ve seen), and they are going up. So being blasé that “C is fun to write” seems wildly irresponsible. Humans have empirically been shown to be incapable of writing memory safe code. Starting new projects in memory unsafe languages because of vague ideas about software ecosystem is just plain irresponsible (if you are writing code for a hardware platform where there is literally no other option, that’s a different story…)
Where could one read about Rust downsides? Something relatively objective? Maybe the biggest 3 points fit in a comment here?
Not a Rust user, just interested in the discussion.
I’d say they are
I’d be happy to be corrected.
That vague adjective sure is doing a lot of heavy lifting in this sentence.
In this situation, I’m kinda thinking Nim could be an interesting option for the “end-developers”, given that it compiles to C. Though I also think some people from the Rust community may try to provide non-Google-backed support for Fuchsia too.
“Dart is an object-oriented, class-based, garbage-collected language with C-style syntax.” In other words, if you know Java, you can probably pick it up pretty quickly.
The same cannot be said for Rust.