1. 85

There’s another story with some interesting papers, but I think they are interesting more from a historical perspective than because they still hold novel, unexplored ideas today.

Some things I often reach for:

These are all things that you can use today. Does anyone have any others?

    1. 6

      My favorite two-short-article introduction to machine learning is A Few Useful Things to Know about Machine Learning [PDF] on the more positive side, and the paper you link on the more cautionary side.

      1. 5

        Probably the best title for any paper I have ever seen.

        In one brief phrase you understand what he is going to say, and with brief reflection, you know in your gut he is right.

        1. 2

          Oh that’s one I haven’t seen before.

          § 2.2 describes a trap I’ve seen a lot of programmers fall into simply because they forgot what their model actually means.

          Thanks for that!

        2. 15

          +1 on Bernstein’s “Some thoughts on security after ten years of qmail 1.0”, it’s truly impressive what he’s achieved and (more importantly) what we can learn from that :)

          1. 3

            When I wrote qmail I rejected many languages as being much more painful than C for the end user to compile and use. I was inexplicably blind to the possibility of writing code in a better language and then using an automated translator to convert the code into C as a distribution language.

            Fascinating. I wonder what he would say nowadays with widespread package managers.

            edit: …and who calls compilers “automated translators” in 2007?

            1. 3

              who calls compilers “automated translators” in 2007?

              When I read that I assumed he meant quite literally “translator” - takes in a language and spits out C, which is then compiled. I guess the term “transpiler” is more common today.

              1. 1

                he meant quite literally “translator”

                But … that’s exactly what a compiler is. One language in, another language out; that’s what it’s always meant for as long as compilers have existed.

                1. 1

                  Yes, fair point. It’s always hard to ascertain someone’s intention purely from language, but I’d guess he chose the word “translator” to make it more obvious that one language is being translated to another.

          2. 12

            The first thing to impress me and inspire my style of programming was the Cleanroom methodology for low-defect software development. The best description was here:

            http://infohost.nmt.edu/~al/cseet-paper.html

            Just about anything on Design-by-Contract, given that assertions combine well with lightweight provers (e.g. SPARK Ada) or spec-based generation of tests. An example I found with Google, from Meyer:

            http://se.inf.ethz.ch/~meyer/publications/computer/contract.pdf

            An example of a company that delivers commercial solutions with SPARK, using a Correct-by-Construction methodology with very low defect rates:

            http://www.anthonyhall.org/c_by_c_secure_system.pdf

            For security, I prefer citing one of the inventors of INFOSEC, Paul Karger, whose evaluation of MULTICS found all kinds of problems that others kept rediscovering and getting credit for. They usually didn’t adopt the solutions, though, which became the B3 and A1 criteria for developing secure systems. Maybe throw in Myers’ landmark work on subversion that followed, to show you everything you couldn’t trust in the lifecycle. :) Karger later applied those lessons in building the first high-assurance VMM for VAX/VMS, an MLS-aware CPU, and a smartcard OS w/ provable security.

            http://hack.org/mc/texts/classic-multics.pdf

            http://seclab.cs.ucdavis.edu/projects/history/papers/myer80.pdf

            Also worth bringing up, even though I discovered it late, is Margaret Hamilton et al.’s work during the Apollo program on their flight software and software assurance in general. They pretty much invented most aspects of software QA on their own in a vacuum. Then they made a formalism for describing & synthesizing systems that are correct on the first go. She later invented the term “software engineering” to describe how they did things. Pretty badass even if the tool had issues. See “Apollo Beginnings” in the first document for a description of that process; the next link describes the capabilities of their tool from the 1980s (was “higher-order software”).

            http://htius.com/Articles/r12ham.pdf

            http://htius.com/

            1. 3

              Cleanroom allows compilation, just not execution or testing, meaning (to me) it would work really nicely with OCaml but pretty badly with Python.

              Looks like it would tie in well with a book I hype often, “Engineering a Safer World” by Nancy Leveson.

              1. 2

                The reason for that requirement was that access to compilers was limited and the process was time-consuming. No need to keep it now that people have computers in their pockets. ;) Not to mention execution has its own benefits. I’d ignore the statistical certification part, too.

                Other than that, the methodology held up pretty well as is over time.

            2. 11

              Often debated is Out of the Tar Pit - http://shaffner.us/cs/papers/tarpit.pdf

              I personally recommend it to those who are stuck in OOP land.

              1. 1

                That’s a good one I forgot about!

              2. 7

                I made a similar thread about this. Here were some of my favorites:

                1. 5

                  I’ve been using these testing techniques more and more to help get coverage into legacy codebases:

                  1. 5

                    Here are a few from literature I like:

                    End-To-End Arguments in System Design - all about correct placement of functionality where it can be of most use.

                    Why Do Computers Stop and What Can Be Done About It

                    QuickCheck

                    On Understanding Types, Data Abstraction, and Polymorphism answering the important question “what do we mean by ‘type’?”.

                    1. 1

                      I like this sentence from the abstract of your last suggestion:

                      We christen this language Fun because ‘fun’ instead of λ is the functional abstraction keyword and because it is pleasant to deal with.

                    2. 5

                      i think ousterhout’s “scripting: higher level programming for the 21st century” is a must-read. the pendulum of programming fashion seems to swing back and forth between “scripting languages will save us all” and “no static types? are you crazy?!”, but both camps could stand to read this paper to get a good feel for where the ideas behind scripting languages came from in the first place, and the strengths, weaknesses and tradeoffs they entail.

                      1. 2

                        I haven’t seen that before. I really liked seeing this nugget, since it’s particularly challenging:

                        OO programming does not provide a large improvement in productivity because it neither raises the level of programming nor encourages reuse.

                      2. 4

                        Don’t forget to check out Papers We Love.

                        Papers We Love is a repository of academic computer science papers and a community who loves reading them.

                        Also, I wrote a bot that tweets random papers from their repo. You can follow the bot @loveapaper as well as their main twitter account @papers_we_love for updates on new papers and remote chapters.

                        1. 3

                          If you write server-side systems that are configurable, read Systems Approaches to Tackling Configuration Errors: A Survey.

                          Update: It is full of horror stories and bad design decisions. I forget whether it was in this one or in one of the references (by one of the co-authors): apparently Hadoop(?) by default at one point stored data in /tmp/. So people would deploy it, everything would work, then they’d reboot… and bye-bye data.
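To make the /tmp anecdote concrete, here is a minimal sketch of a startup check that warns when a configured data directory sits inside a location that may be wiped on reboot. It is not taken from the survey or from Hadoop: the names `is_under` and `check_data_dir` and the default list of volatile directories are assumptions for illustration, and the path handling assumes a POSIX-style filesystem.

```python
import os


def is_under(path, parent):
    """Return True if path is parent itself or lies inside it (no symlink resolution)."""
    path = os.path.abspath(path)
    parent = os.path.abspath(parent)
    return os.path.commonpath([path, parent]) == parent


def check_data_dir(data_dir, volatile_dirs=("/tmp",)):
    """Hypothetical config check: return warnings, or an empty list if none apply."""
    warnings = []
    for volatile in volatile_dirs:
        if is_under(data_dir, volatile):
            warnings.append(
                "data_dir %r is inside %r and may be wiped on reboot"
                % (data_dir, volatile)
            )
    return warnings


# Example: a default like this would be flagged before any data is written.
problems = check_data_dir("/tmp/service-data")
```

Running a check like this at startup, and refusing to start or logging loudly when it returns warnings, turns a silent bad default into an error the operator sees on day one.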

                          1. 1

                            Dealing with this issue right now in many of our systems. In large orgs when there are many teams working on different areas, unless there is standardized good design up front, you pay for it harshly down the road. We did have somewhat of a standard but IMO it was overcomplicated and error prone. Thanks for the recommendation.

                          2. 3

                            Simple Testing Can Prevent Most Critical Failures [PDF] - sub-titled “An Analysis of Production Failures in Distributed Data-intensive Systems”

                            In case you haven’t read it, the abstract sums it up thusly: “We found the majority of catastrophic failures could easily have been prevented by performing simple testing on error handling code – the last line of defense – even without an understanding of the software design.”

                            Particularly damning is the finding that “A majority of the production failures (77%) can be reproduced by a unit test.”

                            My conclusion is that code reviews should focus particularly on the error handling code and tests thereof.
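As a minimal sketch of what “simple testing on error handling code” can look like, the unit test below forces a write failure with a mock and checks that the fallback path actually keeps the data. The function `save_record`, the `store` object, and the fallback log are hypothetical and not taken from the paper.

```python
import unittest
from unittest import mock


def save_record(store, record, fallback_log):
    """Hypothetical function: write a record, deferring it locally if the write fails."""
    try:
        store.write(record)
        return "stored"
    except OSError as exc:
        # The error-handling path: easy to write, rarely exercised.
        fallback_log.append((record, str(exc)))
        return "deferred"


class SaveRecordErrorPathTest(unittest.TestCase):
    def test_write_failure_is_deferred_not_lost(self):
        store = mock.Mock()
        # Inject the fault: every write raises as if the disk were full.
        store.write.side_effect = OSError("disk full")
        fallback_log = []

        result = save_record(store, {"id": 1}, fallback_log)

        self.assertEqual(result, "deferred")
        self.assertEqual(fallback_log, [({"id": 1}, "disk full")])
```

Saved to a file, this can be run with `python -m unittest`. The point is only that the failure is injected deliberately, so the handler gets executed in a test rather than for the first time in production.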

                            1. 3

                              Error handling is hard. Many programmers are very bad at it, and I think that the only language/environment that actually “got it” was CL (Common Lisp), but its implementation is unwieldy enough that most CL programmers don’t bother.

                              If you’re not familiar, the basic idea is that errors are interactively continuable:

                              • Those values are out of range: ask for different values.
                              • The network port is closed: prompting the user to retry, supply a different host/port, or return the failure sounds suspiciously similar to the “Abort, Retry, Fail” we saw thirty years ago. And yet if we handle it interactively, most of the time it won’t matter that the error (fail) path isn’t well tested when propagated back up the pipeline, because the user (or sysadmin, or monitor) will retry until they get bored and are prepared to deal with the fallback.
                              • The disk is full: showing an error message and waiting for disk space works as a strategy because we all have multitasking operating environments these days, and if the work the program did just before this message (say, building a complex model) is significant, then this is a cheap and extremely user-friendly solution.

                              And so on.
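The cases above can be sketched roughly as follows. Python has no built-in equivalent of the Common Lisp condition system, so this is a much simplified stand-in that wraps a single call: the helper `call_with_restarts`, the action names, and `parse_percentage` are all made up for illustration, and the “prompt” is a scripted callback rather than a real interactive one.

```python
def call_with_restarts(operation, args, choose_restart, max_attempts=5):
    """Run operation(*args); on failure, ask choose_restart what to do next.

    choose_restart(exc, args) returns one of:
      ("retry", None)         try again with the same arguments
      ("use-args", new_args)  try again with different arguments
      ("abort", None)         give up and re-raise the original error
    """
    for _ in range(max_attempts):
        try:
            return operation(*args)
        except Exception as exc:
            action, payload = choose_restart(exc, args)
            if action == "retry":
                continue
            if action == "use-args":
                args = payload
                continue
            raise  # "abort" or anything unrecognised: propagate the error
    raise RuntimeError("gave up after %d attempts" % max_attempts)


def parse_percentage(text):
    """Hypothetical operation that rejects out-of-range values."""
    value = int(text)
    if not 0 <= value <= 100:
        raise ValueError("out of range: %d" % value)
    return value


def scripted_choice(exc, args):
    # Stand-in for an interactive prompt: always supply a replacement value.
    return ("use-args", ("50",))


result = call_with_restarts(parse_percentage, ("250",), scripted_choice)
```

In an interactive program `choose_restart` would ask the user, while a daemon might consult a monitor or a policy table. Either way, the decision about how to recover lives outside the code that detected the error.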

                              Modern UNIX even has a mechanism for implementing this kind of system (signal handlers) that is so easy to use that there’s no reason not to develop new applications with it in mind, except that it may be unfamiliar to other users/sysadmins who are used to losing their work and wasting their time.

                              1. 1

                                Error handling is hard. Many programmers are very bad at it

                                Indeed, I’m not very good at it either. Unfortunately, as with software development in general, bad error handling looks no different from good error handling until the error actually happens, and then it’s either too late to make it better, or, even worse, the damage goes unnoticed until failure occurs (much?) later (earlier if you were paranoid and checksummed your data often).

                                And so, we (as a “profession”) don’t focus on making it better, either with better tools, or with better use of the (admittedly insufficient) tools we have.

                                1. 1

                                  Let me add one more link from Stratus that illustrates why it’s intrinsically hard.

                                  https://klibert.pl/statics/RobustProgramming.pdf

                              2. 3

                                Even though I don’t actually like Lua the language, the papers are quite good:

                                https://www.lua.org/papers.html

                                Also, the ZINC experiment, which details the implementation of the predecessor to OCaml, is lengthy but worth a read:

                                https://caml.inria.fr/about/papers.en.html

                                1. 2

                                  Terminology in Digital Signal Processing explains all^W a lot of the basic terms used in this field. A great read for everyone who wants to start programming DSP stuff.

                                    1. 2

                                      Another Baker classic is “Equal Rights for Functional Objects” on the subject of equality predicates: http://home.pipeline.com/~hbaker1/ObjectIdentity.html

                                      1. 1

                                        Baker is a lot of gold, and too many programmers think malloc is magic.

                                      2. 1

                                        I have only read DJB’s reflections on security from the initial post, and I liked it.

                                        How would I go about finding critiques of, follow-ups to, or addenda to, for example, that paper? I don’t read a lot of papers, and I imagine it takes some skill to find relevant responses to them.

                                        1. 2

                                          You can try google searching for link:http://cr.yp.to/qmail/qmailsec-20071101.pdf. This shows who refers to that article elsewhere on the Internet.