1. 37

^basically, the title.

  1. 11

    We should do a weekly thread on this instead. There are just too many papers with snippets of valuable information!

    1. 35

      Monthly is probably the right cadence (maybe?). Papers tend to be dense and it takes a minute to really absorb them.

      1. 1

        Yep, agreed

      2. 6

        that’s a nice idea! I shall gladly take this and post this every month

        1. 2

          Mildly related. Maybe tags would be great for anything lobsterish. Not just “ask”.

          The reason is that I’d probably be interested in that, but the “What do you do this Weekend/Week” threads feel largely off-topic to me. I mean, I hope everyone is super good and leading a great life, but the information that someone had a child, has a birthday coming up, got a dog, died, etc. feels kind of off-topic. I tend to skim through them anyway, because every now and then someone mentions they’re working on an interesting GitHub project or change or something.

          Hope that doesn’t come off badly, but maybe it would be nice to have dedicated tags for regular-interval things, so people can choose what they see. Or maybe someone has a better idea? Maybe a general private-life weekly thread where people can write more generic things? I imagine an “off-topic/private life” weekly and a tech (programming projects, papers) “misc on-topic, interesting small bits” thread. Maybe the latter should be its own post, though.

          Also, given that so many people nowadays break the “When submitting a URL, the text field is optional and should only be used when additional context or explanation of the URL is needed. Commentary or opinion should be reserved for a comment, so that it can be voted on separately from the story” rule, I think we should either remove the description field for non-ask submissions or remove the rule.

          But that’s going off-topic. I just didn’t want to spam the meta tag, in case someone has a better idea.

        2. 8

          Are you asking for ones published in 2022? The best one that I read in 2022 was published in 2017. It’s Refinement through Restraint: Bringing Down the Cost of Verification. It’s about the Cogent language, which is used to write formally verified filesystems. The way it does this is amazing.

          It starts with the paradigm of certifying compilation, where the compiler outputs a proof of its own correctness in addition to the actual compiled target code. The theory behind compiler correctness is super interesting (refinement and simulation), and this paper gives a great overview of it. On top of that, in order to produce this certificate proof, the compiler produces multiple different artifacts on each run, including an embedding of the source code in the Isabelle proof assistant. It’s just a really cool, mind-expanding idea.

          Now, most people don’t formally verify their entire application. But this paper inspired me to work on my own language project that takes this high-level idea and makes a couple of tweaks. First, I drop the formality requirement and produce a test for correctness instead of a proof. Second, the compiler isn’t intended to produce machine code, but to produce an entire application from a high-level specification. Like, I’m using it to generate web applications, not assembly code.

          This is a really interesting workflow where you get tests generated for free: instead of writing tons of unit tests, you can just run this one test to see if your compiled application is correct with respect to the specification. No free lunch, of course - the generated test is a large integration test, so you end up trading the time it takes to write tests for test execution time. But scaling up test runners is easy. Scaling up programmers is more difficult, in my opinion.
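
          To make this concrete, here’s a hypothetical sketch of what such a certification test could look like, in TypeScript. The counter spec and every name here are invented for illustration - the real tool generates this from the model rather than writing it by hand. The idea: run the same operations against the executable model (the spec) and the implementation, then compare their observable state.

          ```typescript
          // Hypothetical certification-test sketch. CounterModel plays the role
          // of the high-level specification; CounterImpl stands in for the
          // compiled application. All names here are invented for illustration.

          type Op = { kind: "add"; n: number } | { kind: "reset" };

          // Executable model: the specification itself.
          class CounterModel {
            total = 0;
            apply(op: Op) {
              if (op.kind === "add") this.total += op.n;
              else this.total = 0;
            }
          }

          // Stand-in for the compiled implementation under test.
          class CounterImpl {
            private state = { total: 0 };
            apply(op: Op) {
              if (op.kind === "add") this.state.total += op.n;
              else this.state.total = 0;
            }
            get total() { return this.state.total; }
          }

          // The single generated integration test: random operation sequences,
          // then a state comparison between model and implementation.
          function certify(runs: number): boolean {
            for (let i = 0; i < runs; i++) {
              const model = new CounterModel();
              const impl = new CounterImpl();
              const numOps = 1 + Math.floor(Math.random() * 20);
              for (let j = 0; j < numOps; j++) {
                const op: Op = Math.random() < 0.8
                  ? { kind: "add", n: Math.floor(Math.random() * 100) }
                  : { kind: "reset" };
                model.apply(op);
                impl.apply(op);
              }
              if (model.total !== impl.total) return false; // refinement violated
            }
            return true;
          }
          ```

          One run of certify replaces a pile of hand-written unit tests, at the cost of being a slower, coarser-grained check.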

          So yea. Pretty niche stuff, but this was the most influential paper I read last year.

          1. 2

            Are you asking for ones published in 2022? The best one that I read in 2022 was published in 2017.

            I was asking about the ones you read in 2022. For me, it was the WiscKey paper, but that was published in 2016.

            1. 1

              Those typing rules look overwhelming.

              Thanks for sharing this. Do you have any other filesystem papers to recommend in general?

              1. 1

                There’s definitely a lot going on, but that’s because it’s a real language building actual software, which I think is more interesting than an oversimplified research language.

                I honestly don’t have any filesystem papers to recommend unfortunately, I’m interested in the overall design of Cogent wrt verification.

              2. 1

                That sounds like an interesting system that you’ve made. Is there more information or an example of use available publicly?

                1. 1

                  It’s very much work-in-progress, but I’m happy to share. Within the next couple weeks, I’ll be at a pretty good stopping point and am going to write a post summarizing where I’m at. I’ve written about what the certification test would look like before as well, though there are optimizations coming there too.

                  This is the closest thing to a working example of the current system: https://github.com/amw-zero/edsl/blob/refinement-certification/effects.sl. Parts of it work, parts don’t, but you can see the rough idea. You create a high-level model with entity and process definitions. This is your data and behavior.

                  Then you transform this model into whatever you want via metaprogramming. I’ve been focusing on web apps, so I generate a TypeScript client with the same interface as the model, like this:

                      {{* Model.schemas.map(toTsInterface) }}
                      {{ clientClass() }}

                  During the metaprogramming phase, you can refer to the Model as a variable - you can see with {{* Model.schemas.map(toTsInterface) }} that all entities defined in the model are transformed into TS interfaces in the implementation. If you look further, the clientClass() that gets generated has a corresponding method for each method defined in the model, which makes a network request and sends the appropriate parameters to a corresponding endpoint that’s also generated.
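
                  For a sense of the output, here’s a hand-written guess at the shape of the generated TypeScript. The entity name (Todo), the endpoint path, and the injectable fetch function are all invented for this sketch, not taken from the real generator:

                  ```typescript
                  // Hypothetical sketch of the generated TypeScript client for a
                  // model with one entity and one action. All names are invented.

                  interface Todo {        // what Model.schemas.map(toTsInterface) might emit
                    id: number;
                    title: string;
                  }

                  type DoFetch = (url: string, init: { method: string; body: string }) => Promise<unknown>;

                  // What clientClass() might emit: one method per model action, each
                  // making a network request to a generated endpoint of the same name.
                  class Client {
                    constructor(private doFetch: DoFetch) {}

                    createTodo(todo: Todo): Promise<unknown> {
                      return this.doFetch("/CreateTodo", {
                        method: "POST",
                        body: JSON.stringify(todo),
                      });
                    }
                  }
                  ```

                  Injecting the fetch function keeps the sketch testable without a network; the real generated code would presumably call fetch directly.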

                  Speaking of generating endpoints, your implementation can consist of multiple different processes, which you can handle by producing auxiliary files to the implementation:

                      def toServerEndpoint(action: Action):
                        let endpointBody = tsClosure([tsIden("req"), tsIden("resp")], action.body)
                        tsMethodCall("app", "post", [action.name, endpointBody])

                      file server:
                          {{* Model.actions.map(toServerEndpoint) }}

                  Here, the actions in the model (the methods that modify state) are transformed into Express.js-style endpoints and output to a server.ts file.
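
                  As a rough, hand-written illustration of the shape that generated server.ts might take (with a tiny in-file stub standing in for Express so the sketch is self-contained, and an invented action name):

                  ```typescript
                  // Hypothetical shape of the generated server.ts. The Express-style
                  // "app" is a minimal stub; the action name "CreateTodo" is invented.

                  type Handler = (
                    req: { body: unknown },
                    resp: { send: (v: unknown) => void },
                  ) => void;

                  const routes = new Map<string, Handler>();
                  const app = {
                    post(path: string, handler: Handler) {
                      routes.set(path, handler);
                    },
                  };

                  // What tsMethodCall("app", "post", [action.name, endpointBody])
                  // could expand to for an action named CreateTodo:
                  app.post("CreateTodo", (req, resp) => {
                    // ...the translated action body, operating on req.body, goes here
                    resp.send({ ok: true });
                  });
                  ```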

                  You then write a small amount of code to bind the client class into a frontend SPA (I’ve only used React thus far, but because it’s just metaprogramming you can connect it to whatever you want), and to pull the endpoint definitions into a real server. This is what I mean by “compiling a full-stack web application” - it sounds crazy at first, but it definitely can be done.

                  This edsl repo is actually a spinoff from https://github.com/amw-zero/sligh. I hit a wall with the design, so I spun off a temp repo to figure some things out, but now I’m almost back to parity with the initial version (I have to generate SQL and finalize the certification test). I was building a real application with the Sligh repo, so that has more end-to-end functionality right now, including a working certification test. The ceiling on the logic you can write with it is pretty low - it can only handle basic CRUD - but even from that I know the workflow can work, so I’m investing the time in the edsl updates before merging it back.

                  So yea - it’s certainly not a plug and play tool right now, but I actually see a path to that on the horizon and will def share more when there’s something worth sharing.

              3. 4

                Brent Yorgey, Monoids: Theme and Variations (Functional Pearl)

                Monoids are an incredibly common structure, arising nearly anywhere you have an operation of type (a, a) -> a. But Yorgey adds a compact demonstration of some really useful additional ideas: monoid homomorphisms, which turn one monoid into another in a structure-preserving way, and monoid actions, where the values of a monoid describe functions on some other type and compose in the right way.

                There’s more to it than that, and you should definitely read the paper. I’m interested in the way patch-like operations can become much clearer when expressed in terms of a monoidal patch acting on a data structure.
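
                For anyone who hasn’t met these terms, here’s a small TypeScript sketch of the two ideas (my own illustration, not code from the paper):

                ```typescript
                // Illustrative sketch. A monoid is an associative combine with an
                // identity; a homomorphism maps one monoid to another while
                // preserving that structure; an action lets monoid values behave
                // as composable functions on some other type.

                interface Monoid<A> {
                  empty: A;
                  combine(x: A, y: A): A;
                }

                const concatM: Monoid<string> = { empty: "", combine: (x, y) => x + y };
                const sumM: Monoid<number> = { empty: 0, combine: (x, y) => x + y };

                // length is a monoid homomorphism from (string, concat) to
                // (number, +): length(x + y) === length(x) + length(y), and
                // length("") === 0.
                const length = (s: string) => s.length;

                // A monoid action: lists of patches (a monoid under concatenation)
                // acting on a document. Applying a combined patch list is the same
                // as applying the patches one after another.
                type Patch = (doc: string) => string;
                const applyAll = (ps: Patch[], doc: string) => ps.reduce((d, p) => p(d), doc);
                ```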

                1. 3

                  Last year I explored how navigational meshes in game engines work. The best (and, well, for the most part only) paper I’ve read related to that was the master’s thesis by Doug Demyen. I’m not sure it’s state of the art, but even if not, it gives you a very good foundation and vocabulary for future papers.

                  I think the graph abstraction optimisation in Chapters 6 and 7 was particularly interesting, as it sounds like a cheap optimisation you could do on more or less any graph you want to search, assuming you can store the entire graph and do linear passes over it.

                  1. 2

                    “On the Paradox of Learning to Reason from Data” by Zhang et al. (paper), on how neural networks have a hard time learning even simple logic from examples.

                    An old concept that I’m just now catching up with, via various papers on Belief Propagation:

                    • “Generalized Belief Propagation” by Yedidia et al. (paper)
                    • “Residual Belief Propagation for Topic Modeling” by Zeng et al. (paper)
                    • “Parallel Splash Belief Propagation” by Gonzalez et al. (paper)

                    “Multi-Scale Truchet Patterns” by Carlson (paper)

                    “The Dynamics of the Transition from Kardashev Type II to Type III Galaxies Favor Technosignature Searches in the Central Regions of Galaxies” by Wright et al. (paper)

                    “Fast Poisson Disk Sampling in Arbitrary Dimensions” by Bridson (paper)

                    1. 2

                      Instant Neural Graphics Primitives with a Multiresolution Hash Encoding shows how to speed up learning a neural radiance field, a function f(x, y, z, ray_direction) -> (r, g, b), by an order of magnitude. This allows them to turn sets of photos into beautiful volumetric renderings in dozens of seconds instead of minutes.

                      Their innovation is to use a multilevel hash table to replace earlier Fourier-inspired frequency encodings. In practice, a neural net can’t learn the function mentioned above as-is; its inputs must be expanded into more dimensions. Their approach is computationally simple, cache-friendly, and handles hash collisions gracefully.

                      A special mention goes to their hash function, which needs three prime numbers as factors - but they choose to make one of them 1 for performance reasons :)
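
                      For reference, here’s a sketch of that per-level spatial hash in TypeScript. The prime constants are the ones given in the paper, to the best of my reading; the rest is my own illustration:

                      ```typescript
                      // The paper's spatial hash for one level of the multiresolution
                      // grid: XOR the grid coordinates, each multiplied by a large
                      // prime, then mod the table size. The first "prime" is just 1.
                      const PI1 = 1;
                      const PI2 = 2654435761; // primes as given in the Instant NGP paper
                      const PI3 = 805459861;

                      function hash(x: number, y: number, z: number, tableSize: number): number {
                        // Math.imul gives 32-bit overflow semantics, matching uint32 math;
                        // >>> 0 reinterprets the result as an unsigned 32-bit integer.
                        const h = (Math.imul(x, PI1) ^ Math.imul(y, PI2) ^ Math.imul(z, PI3)) >>> 0;
                        return h % tableSize; // tableSize is typically a power of two
                      }
                      ```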

                      1. 1

                        I didn’t read a lot of papers, but I’m a huge nerd for CBOR & other practical encodings so I enjoyed

                        1. 1

                          As a meta-question, how do you select what to read? I tried following the fire hose that is the arXiv CS feed for a while, but the titles were all either hyper-specialized to the point where I knew none of the nouns, or so general that they sounded more like a manifesto than research.

                          1. 3

                            I used to read “the morning paper” which had a CS paper a day, but that unfortunately stopped. The archives are still up though: https://blog.acolyer.org/archives/

                            1. 2

                              People post papers here sometimes.

                              1. 2

                                Semantic Scholar has a recommendation engine that will give you suggestions for new papers related to your interests. Aside from that, I skim conference proceedings for titles that seem interesting.

                                1. 2

                                  Honestly, I just relentlessly google phrases that come to mind, until I find projects / papers / books that I elevate to my personal canon. Once you find major cornerstones like that, you can get pretty far by reading everything related to them and everything that they reference.

                                  For example, the two biggest cornerstones I’ve found in the last couple of years are TLA+ and the seL4 project. Luckily, both of these have dozens of papers related to them, and each paper references their previous work as well as related work in the field.

                                  Seriously - try putting into words what you’re looking for, even if it’s very high level. I was googling really vague things like “software correctness,” and even that will get you going. The trick is figuring out exactly what it is you’re interested in, and putting that into words.

                                  1. 1

                                    Sounds like something ChatGPT might do even better now. Worth a shot anyway.

                                    1. 1

                                      No, I don’t think it would. By finding things “the hard way,” I’ve learned many things and connected many brain pathways that would never be reproduced by just getting a list of answers to your questions right away.

                                      1. 1

                                        A couple of weeks ago I experimented with asking ChatGPT for fiction book recommendations. When I asked for more books like one of them, it told me that book was the first in a trilogy. The trilogy proved to be non-existent. The titles were real, but by other authors.

                                        I’ve seen mentions before of it confidently making up plausible-sounding references to fictitious papers. So do be careful.

                                    2. 2

                                      This is how I do it: start with a paper, then find:

                                      1. cited papers
                                      2. papers which cite the same papers
                                      3. other papers by the same author
                                      1. 1
                                        • Find journals and conferences on topics you’re interested in. Go to their pages, browse the latest editions, and read titles/abstracts. Expand from there.
                                          Here’s an example from ACM with a list of all conferences: https://dl.acm.org/conferences
                                        • Find papers you like, see where they were published, and expand from there.
                                        • Find #papers-we-love on GitHub, Twitter, YouTube, etc. https://paperswelove.org/

                                        Note: sometimes the PDFs are behind paywalls (unless you’re at a uni that pays for licenses). There are ways to get the paper, like from the author’s website (or libgen!).

                                        1. 1

                                          To add to the other suggestions by the sibling replies, try seeing if there are any State-of-the-Art Report (STAR) papers on your topic, or other good literature reviews. Those can often give you a nice starting place with an overview of the topic and a ton of links to other worthwhile papers to read.

                                          E.g.: for graphics.

                                        2. 1

                                          I have not finished either yet, but two standouts are:

                                          1. 1

                                            I had a lot of fun reading Towards Alias-Free Pointers, if only as a reminder of how old some ideas are. The paper argues for many of the benefits of unique pointers and similar constructs, also with respect to concurrency. It basically describes a lot of concepts used by modern programming languages - in 1995.

                                            One from 2022 I liked was [Here We Go Again: Why Is It Difficult for Developers to Learn Another Programming Language?](Here We Go Again: Why Is It Difficult for Developers to Learn Another Programming Language?). There’s not a lot of tangible research on education, learning, and the spread of programming languages, so I soak up everything from that angle. I like this paper particularly because it makes tangible an issue I see in day-to-day life as a programming-language trainer: programming languages are rarely learned in a structured fashion, but there’s a lot of value in doing so.

                                            1. 3

                                              Missing the link for the education paper?

                                              1. 2

                                                Uff, never post on the way out of the door. Thanks!