I’m new to all the computer science stuff.
So it would be great to get a few hints on good papers.
I am especially (but not exclusively) interested in distributed systems.
Thank you in advance :)
This is a nice resource: https://github.com/papers-we-love/papers-we-love
Just as a heads-up, Papers We Love does not love all areas of Computer Science equally. There is a lot more coverage of Programming Languages and Systems papers than of, say, Machine Learning and Theory.
Very true. The SF chapter was kind of notorious for having a lot of FAANG folks involved, and the papers tended to skew heavily towards distributed systems. I can’t say I really have an issue with that, seeing as that’s what people wanted to read, but PWL definitely has a bit of a bias.
PWL organizer here. Every chapter is left to its own devices to run itself, so yeah, the local organizers have a lot of sway in picking speakers. If you dig through the repository you can find a lot of interesting papers spanning many topics.
@zxtx - PRs welcome for more Machine Learning and Theory papers.
I’ll always recommend Claude Shannon’s A Mathematical Theory of Communication, as well as his master’s thesis, A Symbolic Analysis of Relay and Switching Circuits.
It really does still hold up.
I’m going to throw Joe Armstrong’s thesis onto the pile: Making Distributed Systems Reliable in the Presence of Software Errors
I’ll pair that with this classic on fault-tolerant systems from the makers of NonStop:
Why Computers Stop and What Can Be Done About It (1985)
Lambda: The Ultimate Imperative
Lambda: The Ultimate Declarative
Lambda: The Ultimate GOTO
Lambda: The Ultimate Opcode
See here for more information about these: https://en.wikipedia.org/wiki/History_of_the_Scheme_programming_language#The_Lambda_Papers
Also, thanks for the reminder to check these out; I’d never gotten around to reading them until now!
Since others are chiming in on distributed systems papers and PLT papers, here are some machine learning papers, focused more on recent techniques around NNs.
Why are people so obsessed with distributed systems? There’s a ton of other cool stuff out there!
Maybe because that topic is highly relevant to the work that a lot of people who work in software do?
We all have our “Why are people obsessed with [topic I am not interested in]?” gripes. The thing is, they’re based on our personal interests / skills / tastes, which are idiosyncratic.
I personally wish people weren’t so obsessed with terminal-based editors, CLI tools, unreadable languages like LISP, the boring minutiae of Linux distros, MS Windows, antiquated “retro” systems, statistics, etc. etc. But I don’t see a reason to ask why, any more than I would complain about people liking Vegemite or country music.
It is an interesting topic, and many people need at least a basic understanding of the problems involved. If you are working with any service at scale, you must know the trade-offs.
Although I’m not a frequent reader of distributed systems topics, I can recommend a few resources as a starting point:
The Morning Paper has unfortunately ended, but the archives are still full of great content.
Data Communications: The First 2500 Years
Also, the Royce ‘waterfall paper’. Most people haven’t read it and don’t know that it doesn’t advocate for what is now called waterfall. That paper, along with the 50-year-old book The Mythical Man-Month, is required reading.
Alternative link for the first (as the spinroot webserver seems flaky): https://www.researchgate.net/profile/Gerard-Holzmann/publication/221330073_Data_Communications_The_First_2500_Years/links/00b495230b06e67454000000/Data-Communications-The-First-2500-Years.pdf
I’d say this is among the first 10 papers you should read in distributed systems: it’s short, communicates an important idea, and is highly cited; the author did a lot of important work in the field, and won a Turing Award a few years ago:
Time, clocks, and the ordering of events in a distributed system
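The core idea of that paper, logical clocks, fits in a few lines. Below is a minimal Python sketch, assuming a hypothetical `Process` class that keeps one integer counter; it leaves out real messaging and the paper’s tie-breaking rule for turning the partial order into a total order.

```python
from dataclasses import dataclass


@dataclass
class Process:
    """Hypothetical process holding a Lamport logical clock (one integer counter)."""

    name: str
    clock: int = 0

    def local_event(self) -> int:
        # Tick the clock before every local event.
        self.clock += 1
        return self.clock

    def send(self) -> int:
        # Sending is itself an event: tick, then attach the new value to the message.
        self.clock += 1
        return self.clock

    def receive(self, msg_timestamp: int) -> int:
        # On receipt, move past both the local clock and the message's timestamp,
        # so the receive event is ordered after the matching send.
        self.clock = max(self.clock, msg_timestamp) + 1
        return self.clock


p, q = Process("P"), Process("Q")
a = p.local_event()  # P's clock: 1
t = p.send()         # P's clock: 2, carried in the message
q.local_event()      # Q's clock: 1
b = q.receive(t)     # max(1, 2) + 1 = 3
print(a, t, b)
```

Note that the guarantee only runs one way: if event x happened before event y, then x gets a smaller timestamp, but a smaller timestamp on its own does not prove that x caused y.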
I’d also recommend “Fallacies of Distributed Computing” for getting in the right mindset for building: https://en.wikipedia.org/wiki/Fallacies_of_distributed_computing
Papers that make connections to adjacent fields are also good, like operating systems:
Your computer is already a distributed system. Why isn’t your OS?
For the adjacent field of programming languages, I would look up the original MapReduce and Spark papers (and follow their citations if you like). Everyone who works on distributed systems has to understand something about programming languages. If you don’t, you end up with languages written in YAML (not really joking about that).
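To make the programming-model point concrete: in MapReduce the user writes only a map function and a reduce function, and the framework takes care of grouping and distribution. Here is a single-process Python sketch of that model using word count; `map_reduce` and the helper names are hypothetical, and there is no partitioning, distribution, or fault tolerance.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple

Pair = Tuple[Hashable, int]


def map_reduce(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterator[Pair]],
    reducer: Callable[[Hashable, List[int]], int],
) -> Dict[Hashable, int]:
    """Hypothetical single-process MapReduce: map, group by key (the shuffle), reduce."""
    groups: Dict[Hashable, List[int]] = defaultdict(list)
    for record in inputs:
        # Map phase: each record emits zero or more intermediate (key, value) pairs.
        for key, value in mapper(record):
            # Shuffle: collect all values that share a key.
            groups[key].append(value)
    # Reduce phase: fold each key's values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_map(line: str) -> Iterator[Pair]:
    for word in line.split():
        yield word.lower(), 1


def word_count_reduce(word: Hashable, counts: List[int]) -> int:
    return sum(counts)


docs = ["the quick brown fox", "The lazy dog", "the end"]
counts = map_reduce(docs, word_count_map, word_count_reduce)
print(counts["the"])  # 3
```

Because the two user functions are pure and share no state, a real framework is free to run them on many machines and simply re-run them after a failure. That restriction on what the programmer can express is the language-design side of the system.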
Computer networking is another adjacent field; I don’t have a good paper recommendation there, but maybe someone else does. Maybe a textbook will cover the connections. I try to read about the things that “won” (like TCP/IP) and also a little about the things that didn’t.
Don’t forget the Bitcoin whitepaper: https://bitcoin.org/en/bitcoin-paper
A Case For Learned Index Structures is probably my favorite paper. They reimplement traditional CS data structures like B-Trees and bitmap indices as deep neural networks, then make the case that the learned versions are both faster and more memory efficient, given a GPU or TPU. The paper itself leaves a ton of unanswered questions, but if you dig, you’ll find that a lot of follow-up work has been done since to close the gaps.
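The central trick is easier to see in miniature. You treat an index as a model that maps a key to its position in a sorted array, then correct the model’s guess with a small local search. The Python sketch below is a simplification: it swaps the paper’s neural and staged models for a single least-squares line, and the `LinearLearnedIndex` class is hypothetical.

```python
import bisect
from typing import List, Sequence


class LinearLearnedIndex:
    """Hypothetical toy learned index over a sorted array of numeric keys.

    A least-squares line predicts a key's position; the largest training error
    bounds the window that has to be searched.
    """

    def __init__(self, keys: Sequence[float]) -> None:
        if not keys:
            raise ValueError("need at least one key")
        self.keys: List[float] = sorted(keys)
        n = len(self.keys)
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var_k = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var_k if var_k else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case distance between predicted and true position over all stored keys.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key: float) -> int:
        return round(self.slope * key + self.intercept)

    def lookup(self, key: float) -> int:
        """Return the position of key in the sorted array, or -1 if it is absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the search window to the array and keep lo <= hi.
        lo = min(n, max(0, guess - self.max_err))
        hi = max(lo, min(n, guess + self.max_err + 1))
        # Binary search only inside the error window instead of the whole array.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return -1


keys = [x * x for x in range(1000)]  # deliberately non-linear data
index = LinearLearnedIndex(keys)
print(index.lookup(144), index.lookup(145))  # 12 -1
```

The lookup cost depends directly on how well the model fits the key distribution: a poor fit, as with a single line on quadratic data, widens the error window and erodes the advantage over a plain binary search.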
I’ve found this to be a good compilation of papers on distributed systems: http://muratbuffalo.blogspot.com/2021/02/foundational-distributed-systems-papers.html
The most memorable paper in my experience has been The Byzantine Generals Problem. You can get all of Lamport’s papers at https://lamport.azurewebsites.net/pubs/pubs.html
The Dynamo paper (Amazon’s key-value store, the predecessor of DynamoDB) was also very influential: http://www.cs.cornell.edu/courses/cs5414/2017fa/papers/dynamo.pdf
I also found HyperLogLog to be kind of a miracle of computer science and math: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
The paper is a bit over my head, so blog posts like this one were really helpful in understanding it.
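For anyone else who finds the HyperLogLog paper dense, here is a simplified Python sketch of the core estimator. Each item is hashed, the first b bits of the hash pick a register, each register keeps the largest position of the leftmost 1-bit seen in the remaining bits, and the registers are combined with a harmonic mean. The 64-bit BLAKE2b hash is an assumption for illustration, and the paper’s large-range correction (needed with 32-bit hashes) is omitted.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog cardinality estimator with m = 2**b registers."""

    def __init__(self, b: int = 12) -> None:
        if not 4 <= b <= 16:
            raise ValueError("b must be in [4, 16]")
        self.b = b
        self.m = 1 << b
        self.registers = [0] * self.m
        # Bias-correction constant alpha_m as given in the paper.
        self.alpha = {16: 0.673, 32: 0.697, 64: 0.709}.get(
            self.m, 0.7213 / (1 + 1.079 / self.m)
        )

    @staticmethod
    def _hash64(item: str) -> int:
        # Assumption: any well-mixed 64-bit hash works; BLAKE2b truncated to 8 bytes here.
        digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item: str) -> None:
        x = self._hash64(item)
        rest_bits = 64 - self.b
        j = x >> rest_bits                    # first b bits select a register
        w = x & ((1 << rest_bits) - 1)        # remaining 64 - b bits
        rho = rest_bits - w.bit_length() + 1  # 1-based position of the leftmost 1-bit
        self.registers[j] = max(self.registers[j], rho)

    def estimate(self) -> float:
        # Raw estimate: alpha * m * (harmonic mean of 2**register over all registers).
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction from the paper: linear counting on empty registers.
            return self.m * math.log(self.m / zeros)
        return raw


hll = HyperLogLog(b=12)
for i in range(100_000):
    hll.add(f"item-{i}")
print(round(hll.estimate()))
```

With 4096 registers the expected standard error is about 1.04 / sqrt(4096), roughly 1.6%, while the whole sketch stores only 4096 small integers no matter how many items are added.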
Backus’ paper Can Programming Be Liberated from the von Neumann Style?: https://dl.acm.org/doi/pdf/10.1145/359576.359579
Typing Haskell in Haskell
It has been invaluable for understanding how to implement type inference and type classes.
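At the heart of the type inference that paper implements is unification: finding a substitution that makes two types equal. Here is a small Python sketch of first-order unification with an occurs check. The `TVar`/`TCon` representation is an assumption for illustration, not the paper’s actual Haskell definitions, and type classes are not covered.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class TVar:
    name: str  # a type variable such as a


@dataclass(frozen=True)
class TCon:
    name: str                       # a type constructor such as Int, List, or ->
    args: Tuple["Type", ...] = ()


Type = Union[TVar, TCon]
Subst = Dict[str, Type]


def apply(s: Subst, t: Type) -> Type:
    """Replace bound type variables in t, following chains of bindings."""
    if isinstance(t, TVar):
        return apply(s, s[t.name]) if t.name in s else t
    return TCon(t.name, tuple(apply(s, a) for a in t.args))


def occurs(name: str, t: Type) -> bool:
    if isinstance(t, TVar):
        return t.name == name
    return any(occurs(name, a) for a in t.args)


def unify(a: Type, b: Type, s: Optional[Subst] = None) -> Subst:
    """Return a substitution extending s that makes a and b equal, or raise TypeError."""
    s = dict(s or {})
    a, b = apply(s, a), apply(s, b)
    if isinstance(a, TVar):
        if a == b:
            return s
        if occurs(a.name, b):
            # Occurs check: binding a to a type containing a would be an infinite type.
            raise TypeError(f"infinite type: {a.name} ~ {b}")
        s[a.name] = b
        return s
    if isinstance(b, TVar):
        return unify(b, a, s)
    if a.name != b.name or len(a.args) != len(b.args):
        raise TypeError(f"cannot unify {a} with {b}")
    for x, y in zip(a.args, b.args):
        s = unify(x, y, s)
    return s


def arrow(arg: Type, res: Type) -> Type:
    return TCon("->", (arg, res))


# Unify (a -> Int) with (Bool -> b).
s = unify(arrow(TVar("a"), TCon("Int")), arrow(TCon("Bool"), TVar("b")))
print(apply(s, TVar("a")), apply(s, TVar("b")))
```

Inference then walks the program, invents fresh type variables for unknowns, and calls `unify` wherever two types must agree (for example, a function’s parameter type and its argument’s type).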
I love most of the seL4 research papers.
I am a huge fan of Luca Cardelli’s paper on Mobile Computational Ambients which introduces the Ambient calculus. I keep going back to it as well.
MIT’s 6.824 Distributed Systems class has a nice collection of distributed systems papers. Paper selection varies a bit from year to year, so you can check out older years for even more papers.
Stanford’s CS208 Canon of Computer Science has a nice list of seminal papers in computer science. These are mostly older papers, all from before 2000.
In general, I’ve found course websites to be a great place to find lists of papers. The papers have been selected with care, so they are likely to be good ones. Choosing papers from conference proceedings (if you’re interested in distributed systems, check out e.g. SOSP and OSDI) is a good way to get a sense of current research directions.
It may seem distant from your practical concerns, but reading Turing’s “On Computable Numbers, With an Application to the Entscheidungsproblem” is like watching imagination at play. It’s available online, but I found Petzold’s book on this paper to be a real joy: The Annotated Turing: A Guided Tour Through Alan Turing’s Historic Paper on Computability and the Turing Machine.
There’s another book on Turing’s work more broadly that I really like, The Essential Turing, which alternates between his papers, letters, and lectures on the one hand and interpretations for a modern lay audience on the other.
Oh my, well that’s going to be a must-read. Thanks.
Great Works in Programming Languages
Classic Papers in Programming Languages and Logic