1. 2

    I’ve done this tour. While it looks like Brucker was let go last year for giving tours on his days off, it was remarkably accessible.

    1. 17

      Hi Lobsters, author here.

      I wanted to give a little bit of background on the motivation behind this post. For a while, I’ve been making academic posters using PowerPoint, Keynote, or Adobe Illustrator, and while it’s possible to get a high-quality result from these tools, I’ve always been frustrated by the amount of manual effort required to do so: having to calculate positions of elements by hand, manually laying out content, manually propagating style changes over the iterative process of poster design…

      For writing papers (and even homework assignments), I switched to LaTeX a long time ago, but for posters, I was still using these frustrating GUI-based tools. The main reason was the lack of a modern-looking poster theme: there were existing LaTeX poster templates and themes out there, but most of them felt 20 years old.

      A couple weeks ago, I had to design a number of posters for a conference, and I finally decided to take the leap and force myself to use LaTeX to build a poster. During the process, I ended up designing a poster theme that I liked, and I’ve open-sourced the resulting theme, hoping that it’ll help make LaTeX and beamerposter slightly more accessible to people who want a modern and stylish looking poster without spending a lot of time on reading the beamerposter manual and working on design and aesthetics.
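
      For anyone who hasn’t used beamerposter before, the skeleton is quite small. Here’s a hedged sketch (the theme, dimensions, and names below are placeholders I made up, not the theme from the post):

      ```latex
      \documentclass[final]{beamer}
      % Poster geometry in centimetres; adjust to your conference's spec.
      \usepackage[size=custom,width=120,height=90,scale=1.4]{beamerposter}
      \usetheme{Berlin} % placeholder; load your poster theme of choice here

      \title{A Modern-Looking Poster}
      \author{Jane Doe}
      \institute{Example University}

      \begin{document}
      \begin{frame}[t]
        \begin{columns}[t]
          \begin{column}{0.45\textwidth}
            \begin{block}{Motivation}
              Content goes in blocks; the theme takes care of styling, so
              style changes propagate everywhere automatically.
            \end{block}
          \end{column}
          \begin{column}{0.45\textwidth}
            \begin{block}{Results}
              No hand-calculated positions: columns and blocks lay
              themselves out.
            \end{block}
          \end{column}
        \end{columns}
      \end{frame}
      \end{document}
      ```

      The whole point is that the layout and styling live in the theme, so iterating on content never means re-dragging boxes.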

      1. 4

        Yes, I use LaTeX or ConTeXt for most of my writings, apart from notes in plain text.

        No, I just don’t think TeX is a great way to make posters. Probably because I am a control freak about making posters: I really want my prominent content/figures exactly where they are supposed to be, and exactly as large as I want them, on the poster. Sometimes I ferociously shorten my text just to pull the next section a little higher, so its title does not fall out of the main poster viewing area. So, yes, I still use Pages.

        I guess the difference is whether I am more focused on explaining things, for which I use LaTeX, or on laying out text blocks and figures, at which GUI-based tools excel.

        1. 2

          I often want something in between. Like I want to click and draw arrows and figures but have that turned into LaTeX code so I can still style around that.

      1. 3

        I think the value of ML comes in very particular scenarios.

        1. Tasks where business logic would be too cumbersome to implement. For example, language detection of a document, and for that matter most NLP and image recognition tasks.

        2. Tasks where it’s easy to collect data in hindsight. For example, recommendation systems, detecting nudity in videos, and predicting user demographics.

        3. Tasks that are simply statistical tasks. A/B testing and election forecasting involve doing some amount of calculation and there is no equivalent in terms of SQL and business logic.

        4. Tasks which have an optimization component. This could mean deciding what prices to set for which users, or predicting server utilization to save power.

        All these settings can and should be combined with business logic, but business logic alone is unlikely to be enough.

        1. 5

          If you found this article interesting, I suggest checking out http://www.fakenewschallenge.org. They thought long and hard about what a fake-news dataset should look like, and there are some models trained against it.

          1. 3

            I’m restricting myself to predictions that seem to be already happening, so consider this in the spirit of Gibson’s “the future is already here, just not fully distributed yet.” Also, historically I usually get all my predictions spectacularly wrong, so this might be more predictive as a set of things that won’t happen.

            Virtual Reality

            As pixel density increases in the next 5 years you are going to see VR in more common places. Expect to see stuff like Google Daydream being used on flights to watch movies. Expect to see pop-up sports centers doing VR games and events. Think stuff like laser tag, mini golf, or squash.

            Programming Languages

            Over a 5-10 year horizon, expect to see gradual typing enter the mainstream. As the languages figure out how to avoid constantly checking the annotations at runtime, they will start to have a performance profile like today’s scripting languages.

            Expect in 5 years the first practical applications of dependent types to appear in the wild.


            Phones

            In 5 years, a large set of users will do all their computing through their cell phones. They will have accessories for connecting them to a bigger screen and keyboard, but no dedicated desktop / laptop.


            Cryptocurrency

            In 5 years, another cryptocurrency will supplant Bitcoin by basically fixing all the engineering issues in the protocol, fixing the governance issues in the project, and delivering on the claims the technology originally made. This won’t be obvious until about a year before it happens. Most blockchain startups will be out of business or will have been acquired by a bank at this point.

            Machine Learning

            In 5 years, you will start to see machine learning algorithms and systems become truly engineered. Between the need for fairness in ML, GDPR, and a general need for reliable diagnostics in devops, expect lots of work on making ML systems easier to debug and engineer, with strong reliability guarantees.

            Expect to see more everyday products using image recognition. People will create fan films featuring celebrities who didn’t actually act in them, even ones who are still alive.

            In 5 years, conversational agents will still be a niche without clear successes.

            In 5 years, there will be a startup that compellingly writes news articles in multiple languages.

            ML will start making inroads into other areas of CS. Expect to hear about new state-of-the-art results where deep learning is used to augment network protocols, program synthesis, theorem provers, and compiler optimizations.

            With the exception of small geofenced areas, we will not have self-driving cars widely deployed.

            1. 2

              Should you do this, I’ve found this config works best for making Emacs load light and quick


              1. 4

                Recovering from errors is overrated, IMO; just give me one good error message. Trying to recover from errors just gives you a bunch of false later errors caused by the original one. This post addresses parse errors, but doesn’t really help with all the false semantic errors you get when a syntax error obscures a symbol or type definition.

                1. 2

                  Recovering doesn’t necessarily mean you continue parsing. The idea is to use as much of the parser’s state as possible to construct a useful error message. Often, parsers operate at too low a level to give a particularly helpful message; structuring the parser so that some higher-level information is preserved can really help guide the end user.

                1. 3

                  So this feels like a referentially-transparent R. I’m curious what you think your FFI story will look like. Or is the intention to implement things like BLAS, HTTP, CSV, JSON, etc. in the language rather than in libraries? I feel like new serialization formats and other ways to get and munge data keep appearing, and historically the way they initially enter a system is through the FFI.

                  1. 1

                    No FFI!

                    That’s tantamount to a user-defined library.

                  1. [Comment removed by author]

                      1. 1

                        Which of those list implementations is immutable?

                        I don’t remember reading the PSA about how bad lists are the first time I went through this. To me it stinks of rationalization. There’s a large space of possible data structures out there, and Rust’s borrow checking can’t represent a big chunk of them – including such examples as binary trees with parent pointers. Are they all things we shouldn’t be teaching students of programming?

                        1. 19

                          The decision procedure seems straightforward to me:

                          1. If you want a tree structure with parent pointers in Rust, see if someone has written a generic data structure that will suit your needs. If not, continue.
                          2. Implement the tree structure yourself. If you need to express shared ownership, then use a reference counted type. If this is too annoying or has too much performance overhead, continue.
                          3. Use raw pointers and unsafe to express your parent pointers, just like you’d do in C. Spend the time to make sure you’ve gotten it right and expose a safe interface at a module boundary. Enjoy the benefits of using safe code elsewhere. If this doesn’t work for whatever reason, continue.
                          4. Compile time borrow checking might not be suitable for your use case. Don’t use Rust.
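
                          Step 3 can be made concrete with a tiny sketch (my own illustrative code, not from any library, and with soundness caveats hand-waved; a production version must also worry about nodes being moved or swapped): the parent link is a raw pointer, but the only unsafe block is hidden behind a small safe API.

                          ```rust
                          use std::ptr;

                          // A tree node whose parent link is a raw pointer. `new` always
                          // boxes the node, so the heap address a child's parent pointer
                          // refers to stays stable even if the owning Vec reallocates.
                          pub struct Node {
                              value: i32,
                              parent: *mut Node,
                              children: Vec<Box<Node>>,
                          }

                          impl Node {
                              pub fn new(value: i32) -> Box<Node> {
                                  Box::new(Node { value, parent: ptr::null_mut(), children: Vec::new() })
                              }

                              pub fn add_child(&mut self, value: i32) {
                                  let parent_ptr: *mut Node = self; // implicit reborrow + coercion
                                  let mut child = Node::new(value);
                                  child.parent = parent_ptr;
                                  self.children.push(child);
                              }

                              pub fn child(&self, i: usize) -> &Node {
                                  &self.children[i]
                              }

                              // The only unsafe block, justified by the boxing invariant above.
                              pub fn parent_value(&self) -> Option<i32> {
                                  if self.parent.is_null() {
                                      None
                                  } else {
                                      Some(unsafe { (*self.parent).value })
                                  }
                              }
                          }
                          ```

                          Callers only ever see safe methods, so if a memory bug shows up, there is exactly one unsafe block to audit.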

                          I don’t know exactly what would cause someone to go from 3 to 4, although I imagine there are legitimate reasons. Annoyance could be one. I don’t really understand that one too much myself though, unless you for some reason can’t find a way to hide unsafe behind a safe API (which you absolutely should be able to do with a binary tree).

                          Maybe the future holds more sophisticated borrow checking. But this is what we have for now. If you think expounding on trade offs is equivalent to “stinking of rationalization,” then so be it. But it seems entirely appropriate to me.

                          There is of course a separate issue where many Rust programmers will suggest that one should not implement these data structures themselves. My “rationalization” for this is a conflation of target audience. If you’re just wanting to get a taste for Rust and you want to try it out by implementing a doubly linked list, then people are going to rightfully steer you away from that because it might not be the best way to dip your toes in. If, however, you have a specific use case and know that’s what you need and existing generic data structures aren’t suitable for legitimate reasons, then receiving the aforementioned advice sounds downright patronizing. But that doesn’t make it wrong in every case.

                          Common questions have common answers, and not everyone is discerning enough to know when the common answer is inappropriate. We shouldn’t hold that against them. Use it as an opportunity to educate (or, sometimes, delightfully, be educated, because everyone is wrong once in a while).

                          1. 5

                            I’ll add to your excellent analysis that, at 4, one might just use an external tool to verify the unsafe Rust. There are quite a few tools, especially for C, that can check that either partly or totally. They require their own expertise. They can be hard to use or have serious limitations. One must also be careful about mismatches between the meaning of the Rust code and the “equivalent” version. However, they might be a better option than no mechanical checking of the unsafe part, or no use of Rust at all in a given project.

                            “Maybe the future holds more sophisticated borrow checking.”

                            Definitely. The CompSci folks are working on building more flexible models all the time. Most I saw were building on things like linear types rather than affine ones, or using functional languages. There’s at least potential if these people start doing similar things with Rust’s model. Even if the main language doesn’t adopt it, an extension for the unsafe pieces of Rust, or something Rust can call over FFI, would still be beneficial.

                            1. 2

                              Thanks, I hadn’t really considered using unsafe, under the assumption that if I need to use unsafe I might as well use C. But it sounds like there are benefits here. I’ll dig deeper into this.

                              I think “this is a limitation of borrow checking” is a perfectly great statement. Where it edges into rationalization is when it says “you shouldn’t need this” as the above link does. Or “if you need this you’re stupid or doing something wrong”, as the above link kinda implies. I see this in the Go community as well, where conversation often jumps from “we don’t know how to do generics well” to “you shouldn’t need generics”. This isn’t from the leaders, but it does get said a lot. (As an ex-Lisp hand I’m reminded of all the times we said “you shouldn’t need this” to noobs, and how it affected Lisp’s adoption.)

                              The worldview I’m approaching Rust from is to determine if it can replace C or Java (or other system languages that expose pointers) to the point where someone can go through their career without needing to use those older languages (modulo interop scenarios). That makes me care more about learning situations even if they are rarely relevant in “the real world”.

                              1. 6

                                Sure. It’s a classic hedge. If you’re writing material and you want to address common mistakes, then these sorts of “don’t do these things” rules make sense, especially if they are prone to overuse. For example, if a lot of beginners are picking up Rust and struggling because they’re trying to write linked lists, then we can either:

                                1. Make the process of writing linked lists easier.
                                2. Nudge them in another direction.

                                The presumption being here that those of us who aren’t beginners in Rust will know when to break these rules. We could probably be more explicit about this in more areas, but it’s hard to be 100% precise all of the time. And you certainly can’t control what others say either. There’s likely a phenomenon at play here too, one that I’ve heard described as “there is nothing like the zeal of the newly converted.” :-)

                                Thanks, I hadn’t really considered using unsafe, under the assumption that if I need to use unsafe I might as well use C. But it sounds like there are benefits here. I’ll dig deeper into this.

                                To clarify, because I think this is important: a key part of Rust’s value proposition isn’t necessarily that you never need to use unsafe, but rather that, when you do, it’s usually possible to hide it behind a safe abstraction. That means that when a memory-violation bug occurs, you know exactly where to look.

                                Of course, that isn’t the complete story. If you legitimately ended up needing to use unsafe in a ton of places, then that would weaken the aforementioned value proposition. The key is striking a balance, and one part of learning Rust is learning when to use unsafe and when not to. It can be tricky, but that’s why a good first approximation is “don’t use unsafe.” :-) The classic uses of unsafe in my experience are:

                                1. FFI.
                                2. In the implementation of generic data structures that must have as little overhead as possible. These uses are probably the most difficult form of unsafe to get right, partially because of generics. (The Nomicon is instructive here.)
                                3. Getting around checks like bounds checks, doing unaligned loads/stores, avoiding redundant UTF-8 checks, etc. These are as hard or as easy to get right as in C.
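
                                As a minimal, hypothetical illustration of the third category (my own example, not a real library): validate an invariant once at construction, then elide the per-lookup bounds check behind a safe method.

                                ```rust
                                // A lookup table whose indices are validated once up front,
                                // letting every later lookup skip the bounds check safely.
                                pub struct Palette {
                                    colors: Vec<u32>,
                                    indices: Vec<u8>, // invariant: every entry < colors.len()
                                }

                                impl Palette {
                                    pub fn new(colors: Vec<u32>, indices: Vec<u8>) -> Option<Palette> {
                                        // Check the invariant once, at construction time.
                                        if indices.iter().any(|&i| (i as usize) >= colors.len()) {
                                            return None;
                                        }
                                        Some(Palette { colors, indices })
                                    }

                                    pub fn color_at(&self, pixel: usize) -> u32 {
                                        let idx = self.indices[pixel] as usize;
                                        // Safe: `new` checked every index against colors.len().
                                        unsafe { *self.colors.get_unchecked(idx) }
                                    }
                                }
                                ```

                                The unsafe block is sound only because of the check in `new`, which is exactly the kind of reasoning that is as easy or as hard as it is in C.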
                                1. 5

                                  The tragic thing here is that that page is awesome from an information content perspective. It’s just a matter of how the parts of the argument are phrased. Compare:

                                  “Here’s how you do linked lists in Rust. Be warned, it’s going to be a little klunky compared to other languages, but fortunately you’ll rarely need it in the real world, and there’s often better alternatives.”


                                  “Linked lists are awful, but you’re twisting my arm so here’s how you do them in Rust.”

                                  What’s nice about the first approach is that it shortcuts the whole cycle of mutual accusations of defensiveness that often happen in these situations. Because really nobody is being defensive, we’re just wondering internally if the other side is being defensive. I learned programming using linked lists, and if you tell me they suck it doesn’t trigger defensiveness. My identity isn’t wrapped up in using linked lists. What it triggers is skepticism. And it’s totally unnecessary and distracting.

                                2. 4

                                  There are plenty of benefits to writing Rust instead of C even if you typed every single line of code within an unsafe { } block. Rust has a lot of ergonomic improvements compared to C and C++ (my personal favorite is a type system which supports good algebraic data types).

                                  It’s also worth mentioning that the reason the unsafe keyword exists in Rust is precisely because there are lots of useful things you might want to do in a program that will violate the borrow checking rules, and you need some way to sidestep it on occasion. The fact that the keyword is named “unsafe” gives people pause when using it - which is normally good, because you should think carefully about writing code that you know can’t be automatically checked for memory safety - but that doesn’t mean that it’s wrong to write a Rust program that uses unsafe blocks, even if you are a beginner.

                                  If I want a doubly-linked list in Rust, I can, in about a dozen lines of code, do:

                                    struct List<T> {
                                        item: T,
                                        next: Option<*mut List<T>>,
                                        prev: Option<*mut List<T>>,
                                    }

                                    fn main() {
                                        let mut first: List<i32> = List { item: 1, next: None, prev: None };
                                        let mut second: List<i32> = List { item: 5000, next: None, prev: Some(&mut first as *mut _) };
                                        first.next = Some(&mut second as *mut _);
                                        println!("first item: {}", first.item);
                                        println!("second item: {}", unsafe { (*first.next.unwrap()).item });
                                        println!("first item again: {}", unsafe { (*second.prev.unwrap()).item });
                                    }
                                  and this will compile with no errors and have exactly the same behavior as the equivalent C program. It’s unsafe of course - and the fact that you have to use unsafe blocks is a good sign that you should think about whether this is a good way to write this program, or at least that you should be very, very careful when writing it. But you can do it. Even if you are a beginner to Rust.

                                  1. 2

                                    Thanks, I hadn’t really considered using unsafe, under the assumption that if I need to use unsafe I might as well use C. But it sounds like there are benefits here. I’ll dig deeper into this.

                                    It’s a long-time practice in safe systems languages to support escape hatches to do what the safety mechanisms won’t allow. This included writing OS kernels in Pascal, PL/S, Oberon, and Ada. The developers of such languages realized that most of an app doesn’t have to be done unsafely, so they’re usually safe by default. Then, if you need unsafety, you can put it in a specific module that turns off one or more safety features just for that module. The compiler still automatically does static or dynamic checks for everything else. This focuses the mind on the most dangerous modules when looking for low-level problems.

                                    Finally, the common practice was to wrap the unsafe code in function calls that (a) may do input validation to ensure the unsafe code receives sane input and/or (b) had their own checkable rules for being called safely from the rest of the safe code. Some went further, with formal specifications for correct use of all code: the unsafe code’s logical correctness was checked there, with language-level correctness checked by eye and code-analysis tools.

                                    So, there’s the big picture of how this has been done going back to the 1960’s, with Burroughs doing an ALGOL CPU/OS combo. It consistently worked, too, if you look at what caused most crashes and security bulletins. At one point, when hardware was expensive and slow, one source said Burroughs even offered to turn off the safety checks for their customers for a performance boost. The customers said no: they didn’t want the headaches. Software is a lot more complex today, on much faster machines. Might as well keep the good practice. :)

                                  2. 2

                                    I think that’s totally reasonable. In the future, programmers might figure out how stuff that can currently only be done in unsafe Rust can actually be done in safe Rust. I don’t think the full implications of borrow checkers have been worked out. As they’re better understood, some of these data structures will be revisited. Rust is still useful in the interim.

                            1. 2

                              How are the error messages it creates? One of the big weaknesses of *parsec in Haskell is how inscrutable the error messages can get.

                              1. 2

                                Lark provides you with the line & column in the text where the error occurred (it counts them automatically), and also which input it expected.

                                You can see for yourself how Lark utilizes this information to provide useful errors when users make mistakes in the grammar: https://github.com/erezsh/lark/blob/master/lark/load_grammar.py#L593

                                Of course, it’s not always clear what the error is, or what the best way to solve it is. I am open to hearing about ways I can improve the error messages.

                                1. 3

                                  That’s a great start. Some helpers for displaying the offending line, and perhaps the relevant text span, would be useful too. One other useful thing is to display which grammar rule the parse failed within. Many tools support that in principle, but the API could do more to encourage it.
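
                                  A sketch of what such an error value might look like (illustrative names only, not Lark’s actual API): position, the enclosing grammar rule, the expected inputs, and a helper that renders the offending line with a caret.

                                  ```rust
                                  // A parse error that keeps higher-level context around:
                                  // 1-based line/column, the grammar rule being parsed, and
                                  // what inputs would have been accepted at that point.
                                  pub struct ParseError {
                                      pub line: usize,
                                      pub col: usize,
                                      pub rule: &'static str,
                                      pub expected: Vec<&'static str>,
                                  }

                                  impl ParseError {
                                      // Render the error with the offending source line and a caret.
                                      pub fn display(&self, source: &str) -> String {
                                          let text = source.lines().nth(self.line - 1).unwrap_or("");
                                          format!(
                                              "error at {}:{} while parsing `{}`: expected one of {:?}\n{}\n{}^",
                                              self.line, self.col, self.rule, self.expected,
                                              text,
                                              " ".repeat(self.col - 1),
                                          )
                                      }
                                  }
                                  ```

                                  Carrying `rule` through the parser is the part that usually needs explicit design; position and expected-set tend to fall out of the machinery for free.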

                                  1. 2

                                    Good ideas!

                              1. 2

                                Asking for an email address as an ID, and then also a “tag” or username, is a way to use this three-pointed identity without it being cumbersome to the user. Everyone knows their email won’t be shared publicly, so it gets the point across that one is internal and one is external. You don’t even have to validate that the email is really an email.

                                1. 2

                                  Email addresses get reused, and people change email addresses. Having the internal ID not be the email makes that migration easier to implement. Ideally the ID is an implementation detail the user never sees.

                                  1. 2

                                    Yeah, I’m suggesting the email is used to log in, a hash is used as the internal ID, and a username is used for the public.
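
                                    A sketch of that split (illustrative names, with a plain counter standing in for the hash, for brevity): the internal ID is independent of both the login email and the public username, so changing the email is cheap and nothing that refers to the user by ID has to migrate.

                                    ```rust
                                    use std::collections::HashMap;

                                    #[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
                                    struct UserId(u64);

                                    struct User {
                                        email: String,    // private: used only to log in
                                        username: String, // public handle shown to everyone
                                    }

                                    struct Users {
                                        next_id: u64,
                                        by_id: HashMap<UserId, User>,
                                        by_email: HashMap<String, UserId>, // login lookup
                                    }

                                    impl Users {
                                        fn new() -> Users {
                                            Users { next_id: 0, by_id: HashMap::new(), by_email: HashMap::new() }
                                        }

                                        fn register(&mut self, email: &str, username: &str) -> UserId {
                                            let id = UserId(self.next_id);
                                            self.next_id += 1;
                                            self.by_email.insert(email.to_string(), id);
                                            self.by_id.insert(id, User {
                                                email: email.to_string(),
                                                username: username.to_string(),
                                            });
                                            id
                                        }

                                        // Changing the login email never touches the stable internal ID.
                                        fn change_email(&mut self, id: UserId, new_email: &str) {
                                            if let Some(user) = self.by_id.get_mut(&id) {
                                                self.by_email.remove(&user.email);
                                                user.email = new_email.to_string();
                                                self.by_email.insert(new_email.to_string(), id);
                                            }
                                        }
                                    }
                                    ```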

                                    1. 1

                                      Ah I think I had misunderstood you.

                                1. 4

                                  I actually disagree with this article on some level. I think we should respect outcomes, but some activities really are about the journey more than the destination. I think there are good reasons to value working. The time you spend improves your skills. Making the wrong choices gives you perspective. If anything the issue is people refusing to value “not working”. If you value outcomes you can still find yourself working long hours, just with many short projects. If you value “not working” you will find time to think or be more contemplative.

                                  1. 1

                                    Do people have any guidelines for dealing with nested tmuxes? I often find that I use a tmux locally and another one when I ssh. The trouble is that keybindings on remote servers aren’t always detected; for example, it’s hard to make, say, Shift-left-arrow work to move between terminals on the remote end.

                                    1. 1

                                      Perhaps similar to this: https://marc.info/?l=openbsd-misc&m=149476496718738&w=2

                                      I find that when I want to interact with the inner tmux, I have to ^B, count to 1, ^B whatever. If you’re too fast, the outer tmux slurps it up.

                                    1. 3

                                      Rust has lots of attractive features (chiefly a great type system and a familiar syntax) that make me want to use it, but the cognitive overhead of the memory system still makes Go and other GC languages the better value proposition for the overwhelming majority of projects I take on. To some extent, this will improve with familiarity, but the gap can never close completely (Rust will always require more thinking about memory than GC languages) and I doubt it will close enough to change the calculus. Still, I applaud the intentional attitude that the community takes toward continuous improvement.

                                      1. 4

                                        If you don’t mind me asking, how long have you spent with it, and what did you struggle most with?

                                        We have some decent stuff coming down the pipeline to ease up the learning curve, but there’s always more. I wonder how much what you’ve experienced lines up with other people’s.

                                        1. 2

                                          What kind of stuff is coming to ease using the language? As someone who mostly works in Python and Haskell, even basic stuff in Rust still trips me up. Things like: when should I be using a reference vs. passing a value in directly? Which data structures should I be using for different problems? Etc. There is a mental overhead that is still slowing me down, so anything to help me get past that would be great!

                                          1. 1

                                            Hey sorry, I missed this!

                                            https://blog.rust-lang.org/2017/12/21/rust-in-2017.html is a good summary, see the “ergonomics initiative” section.

                                          2. 1

                                            Sorry, I missed this. I don’t keep a very good inventory of things I bump into, and part of the problem is that if I understood my frustrations well enough to articulate them, they probably wouldn’t be so frustrating to begin with. Sort of a “beginner’s paradox”. I’ve been playing with Rust in my free time on and off since about 2014, but I still don’t feel like I’ve climbed the learning curve well enough to be passably productive with Rust (I might feel differently if my bugs could kill people, mind you!).

                                        1. 1

                                          One other thing worth pointing out: if you are going to randomly generate trees, be ready to throw them away early, as naive sampling methods end up with lots of small trees, a few gigantic ones, and very little in the middle: https://byorgey.wordpress.com/2013/04/25/random-binary-trees-with-a-size-limited-critical-boltzmann-sampler-2/
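
                                          A quick way to see the problem (a self-contained sketch using a toy xorshift generator, so there are no dependencies): generate “critical” random binary trees, where each node is a leaf with probability 1/2, and look at the sizes.

                                          ```rust
                                          // xorshift64, just to keep this dependency-free; seed must be nonzero.
                                          fn next_bit(state: &mut u64) -> bool {
                                              *state ^= *state << 13;
                                              *state ^= *state >> 7;
                                              *state ^= *state << 17;
                                              *state & 1 == 0
                                          }

                                          // Node count of a random tree: a node is a leaf with probability
                                          // 1/2, otherwise it gets two recursive children. The `depth` cap
                                          // keeps the occasional runaway tree finite.
                                          fn random_tree_size(state: &mut u64, depth: u32) -> u64 {
                                              if depth == 0 || next_bit(state) {
                                                  1
                                              } else {
                                                  1 + random_tree_size(state, depth - 1) + random_tree_size(state, depth - 1)
                                              }
                                          }

                                          fn main() {
                                              let mut state = 0x9E3779B97F4A7C15u64;
                                              let sizes: Vec<u64> = (0..1000).map(|_| random_tree_size(&mut state, 25)).collect();
                                              let tiny = sizes.iter().filter(|&&s| s <= 3).count();
                                              let max = sizes.iter().max().unwrap();
                                              // Heavy-tailed: mostly tiny trees, an occasional huge one.
                                              println!("{} of 1000 trees have <= 3 nodes; largest has {} nodes", tiny, max);
                                          }
                                          ```

                                          Roughly 5/8 of samples have at most 3 nodes, while the largest trees are orders of magnitude bigger, which is exactly the skew the linked post’s size-limited Boltzmann sampler works around.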

                                          1. 1

                                            I wish there were apps that displayed maps this way. Anybody have favourite alternatives to Google Maps?

                                            Incidentally, I’m extremely frustrated with Google’s algorithm for directions. It frequently over-optimizes for driving time over other considerations. I have on more than one occasion played with OpenStreetMap data to see if I can do better.

                                            1. 1

                                              Waze has way too many ads and too much gamification of driving.

                                              I like a lot of the stuff in Here Maps for navigation (speed limits and such). Their nav is better, their map colour scheme sucks though and for some reason it crashes mobile data on my phone after a few minutes (wtf?!)

                                              Honestly, I miss my old Pioneer in-dash navigation system and might get another one for my car soon. I liked how, as you approach a turn, it switches to a photo of the turn, the lane you should be in, and a distance meter. I have a feeling this tech is patented and Google/Here don’t want to pay the fee, but companies like Honda/Ford/Pioneer/TomTom are fine with it; hence the car in-dash units are just so much better from a UX perspective.

                                            1. 10

                                              Our goal is to deliver the best experience for customers, which includes overall performance and prolonging the life of their devices. Lithium-ion batteries become less capable of supplying peak current demands when in cold conditions, have a low battery charge or as they age over time, which can result in the device unexpectedly shutting down to protect its electronic components.

                                              Last year we released a feature for iPhone 6, iPhone 6s and iPhone SE to smooth out the instantaneous peaks only when needed to prevent the device from unexpectedly shutting down during these conditions. We’ve now extended that feature to iPhone 7 with iOS 11.2, and plan to add support for other products in the future.

                                              Come on. If this is really about managing demand spikes, why limit the “feature” to the older phones? Surely iPhone 8 and X users would also prefer that their phones not shut down when it’s cold or the battery is low?

                                              1. 6

                                                I would assume most of those phones are new enough that the battery hasn’t accumulated enough cycles to wear it down and trip the governor, and/or battery technology improved on those models.

                                                It’s really a lose-lose for Apple whichever way they do it, and IMHO they picked the best compromise. Either run the phone normally on a worn battery, reducing battery life further and risking it just shutting off when the battery can’t deliver the necessary voltage on bursty workloads; or throttle performance to try to keep battery life consistent and the phone running on a battery delivering reduced voltage.

                                                1. 6

                                                  Apple could have also opted to make the battery replaceable, and communicate to the user when to do that. But then that’s not really Apple’s style.

                                                  1. 3

                                                    I believe that’s called “visiting an Apple store.” Besides, as I’ve said elsewhere in this thread, replacing a battery on an iPhone is pretty easy: remove the screen (it’s held in with two screws and comes out with a suction cup) and the battery is right there.

                                                  2. 4

                                                    and plan to add support for other products in the future.

                                                    They probably launched on older phones first since older phones are disproportionately affected.

                                                    1. 2

                                                      Other media reports indicate that battery performance loss is not just a function of age but of other things like exposure to heat. They also indicate that this smoothing doesn’t just happen indiscriminately but is triggered by some diagnostic checks of the battery’s condition. So it seems like making this feature available on newer phones would have no detrimental effect on most users (because their batteries would still be good) and might help some users (whose batteries have seen abnormally harsh use or environmental conditions). So what is gained by limiting it only to those using older models? Why does a brand new iPhone 7 bought new from Apple today, with a brand new battery, have this feature enabled while an 8 does not?

                                                      1. 2

                                                        Probably easier for the test team to find an iPhone 7 or 6 with a worn battery than an 8. The CPU and some other components are different.

                                                        1. 3

                                                          There are documented standards for rapidly aging different kinds of batteries (for lead-acid batteries, like in cars, SAE J240 says you basically sous-vide cook them while rapidly charging and draining them), and I’d be appalled if Apple didn’t simulate battery aging for two or more years as part of engineering a product that makes or breaks the company.

                                                  1. 3

                                                    Is the barrier to entry really that low? You need a certain mathematical maturity to understand the papers, and most of the new results live in papers. If you can do that, what’s the big deal? More people working in an area means more research directions get explored. And if you’re a charlatan out fooling non-experts, how is that any different from any other field? The same thing has happened with software in general.

                                                    The comments seem obsessed with how we need engineers more than researchers, but that feels like a strawman as well. Unless your problem was already solved in a paper published at ICLR/NIPS/ICML/etc., you are going to do at least a little research to adapt the technology to your problem. This will require at least some creativity and intuition for these statistical models. You might not get a publication out of the work, but you’ll still work damn hard to get things working.

                                                    1. 3

                                                      I can whip up a non-trivial ML solution in about 15 minutes using commoditized tools like TensorFlow. That system will even work pretty well on my sample test data: well enough that I think I’ve just built a system that accurately catalogs sentiment or identifies a trait in a photograph. I can run to production with that and get pretty good results… until I don’t.

                                                      If I need to do something more complicated, I can snag a lightweight, pop-press ML book that guides me through some of the statistical concepts. Again, I get a solution that looks good, under a cursory inspection. I can roll that package out to production and let people start consuming its results, and its flaws will manifest in subtle, difficult to detect ways.

                                                      1. 2

                                                        That might not be so bad. Like I don’t think that is any worse than any other bugs that somehow only manifest in production. As long as you have good monitoring in place, I think you can get pretty far on that attitude. More cynically, a colleague once told me, “These models usually fail a little after you’ve been promoted so it’s not your problem”.

                                                        1. 3

                                                          I think there are real ethical concerns. I mean, simple stuff, like that research team that labeled pictures as “beautiful” in their training set and ended up creating what was essentially a robotic racist. A lot of our assumptions about how the world works, when encoded via machine learning, magnify what are normally minor issues to an industrial scale.

                                                          1. 4

                                                            I completely agree. But fairness and ethics in machine learning is not something you can fix with barriers. That’s something that needs to be out there as an idea. I’m not even sure how to go about doing that.

                                                    1. 2

                                                      There usually aren’t too many machine learning papers on this list so I have to suggest a few.

                                                      Bandit based Monte-Carlo planning

                                                      This is the paper that introduced Monte-Carlo tree search, a core part of AlphaGo. The algorithm is super simple, and most of the paper is the theory behind it, which is actually not incomprehensible.
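                                                      To give a flavor of the core idea, here is the UCB1 selection rule the paper builds on, as a toy multi-armed bandit rather than the full tree-search variant (the names and the two-arm setup are my own illustration, not code from the paper):

                                                      ```python
                                                      import math
                                                      import random

                                                      def ucb1_select(counts, values, c=math.sqrt(2)):
                                                          """Pick the arm maximizing mean reward + exploration bonus (UCB1)."""
                                                          total = sum(counts)
                                                          def score(i):
                                                              if counts[i] == 0:
                                                                  return float("inf")  # try every arm at least once
                                                              return values[i] / counts[i] + c * math.sqrt(math.log(total) / counts[i])
                                                          return max(range(len(counts)), key=score)

                                                      # Simulate two arms with different payout probabilities.
                                                      random.seed(0)
                                                      probs = [0.3, 0.7]
                                                      counts, values = [0, 0], [0.0, 0.0]
                                                      for _ in range(1000):
                                                          i = ucb1_select(counts, values)
                                                          counts[i] += 1
                                                          values[i] += 1.0 if random.random() < probs[i] else 0.0
                                                      print(counts)  # the better arm (index 1) ends up pulled far more often
                                                      ```

                                                      MCTS in the paper (UCT) applies exactly this rule at every node of the search tree, treating each child as an arm.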

                                                      Statistical Modeling: The Two Cultures

                                                      Leo Breiman’s paper on how machine learning people solve problems versus how statisticians do is as timeless as the day it was written. Notably, he doesn’t take sides, which is very refreshing.

                                                      A Mathematical Theory of Communication

                                                      Claude Shannon’s paper is extremely accessible for something so foundational. This is the paper that I would argue started machine learning. All the ideas that we use to this day are in there. The way we think about the problem hasn’t changed all that much. As a bonus, Shannon’s paper The Bandwagon is still relevant to navigating AI hype.
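                                                      As a tiny illustration of how accessible the paper’s central quantity is, Shannon entropy is only a few lines (my own toy sketch):

                                                      ```python
                                                      import math

                                                      def entropy(probs):
                                                          """Shannon entropy in bits: H = -sum(p * log2(p))."""
                                                          return -sum(p * math.log2(p) for p in probs if p > 0)

                                                      # A fair coin flip carries exactly one bit of information.
                                                      print(entropy([0.5, 0.5]))  # 1.0
                                                      # A biased coin carries less.
                                                      print(entropy([0.9, 0.1]) < 1.0)  # True
                                                      ```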

                                                      1. 4

                                                        It’s kind of weird to me to see computer people keep naming more things after mathematicians who already have a lot of things named after them. It’s a funny cultural phenomenon: for computer people, the mathematicians are more like distant relatives than close friends, so cluttering the namespace further doesn’t seem like a problem to them. It’s kind of funny how everything gets called Bayes this, Gauss that, where those mathematicians have the most tenuous relationship to the things computer people are naming.

                                                        I had to think about what a Poincaré embedding could be… maybe an embedding of a higher-dimensional manifold into R^2 or R^3? Higher-dimensional topology is something Poincaré really is known for. But no, it’s just the disk model of the hyperbolic plane. Most of the time, I don’t even grace that with Poincaré’s name, kind of like how mathematicians just call finite fields “finite fields”, and very rarely Galois fields.

                                                        My nitpick isn’t purely pedantic: this unfamiliarity with mathematics has caused some real problems, such as an AI winter. It’s blindingly obvious that a perceptron just defines a plane to separate inputs, and that lots of data sets can’t be separated by a plane, but because of hype, and because Minsky pointed out what really should have been obvious to everyone, connectionism, neural networks, deep learning, or whatever the next rebranding will be, fell into a deep AI winter. I know I sound like an ass, but it’s both cute and worrying to see computer people struggling with and rediscovering my familiar friends. It almost reminds me of Tai’s model, but a little less severe.
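                                                        The Minsky point is easy to demonstrate: a perceptron draws a plane, and XOR has no separating plane. A brute-force search over weights illustrates this (it’s not a proof, just a sanity check of my own):

                                                        ```python
                                                        # A perceptron computes sign(w1*x + w2*y + b): a plane through input space.
                                                        # XOR is not linearly separable, so no (w1, w2, b) classifies all four points.
                                                        from itertools import product

                                                        xor = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

                                                        def perceptron_ok(w1, w2, b):
                                                            return all((w1 * x + w2 * y + b > 0) == bool(t)
                                                                       for (x, y), t in xor.items())

                                                        grid = [i / 2 for i in range(-4, 5)]
                                                        print(any(perceptron_ok(w1, w2, b)
                                                                  for w1, w2, b in product(grid, repeat=3)))  # False
                                                        ```

                                                        Replacing `xor` with AND or OR makes the same search succeed immediately, which is the whole asymmetry Minsky pointed out.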

                                                        1. 2

                                                          But clearly a Poincaré embedding is a reference to https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model. It’s not as if machine learning researchers chose the name arbitrarily; the embedding is named after the metric you choose for it. These names are informative. When someone says Euclidean, I know nothing fancy is happening. When someone says Gaussian, I know there is a normal distribution somewhere in the formulation. When someone says Bayesian, I know I can expect a place to inject priors. The naming isn’t arbitrary.
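                                                          The metric behind the name is small enough to write out. This is the standard Poincaré disk distance, sketched by me for illustration (not code from any embedding library):

                                                          ```python
                                                          import math

                                                          def poincare_distance(u, v):
                                                              """Hyperbolic distance between points inside the unit disk:
                                                              d(u, v) = arcosh(1 + 2|u-v|^2 / ((1-|u|^2)(1-|v|^2)))."""
                                                              nu = sum(x * x for x in u)
                                                              nv = sum(x * x for x in v)
                                                              diff = sum((a - b) ** 2 for a, b in zip(u, v))
                                                              return math.acosh(1 + 2 * diff / ((1 - nu) * (1 - nv)))

                                                          print(poincare_distance((0.0, 0.0), (0.5, 0.0)))   # ln(3) ≈ 1.0986
                                                          print(poincare_distance((0.0, 0.0), (0.99, 0.0)))  # ≈ 5.29
                                                          ```

                                                          Distances blow up near the boundary, which is why hyperbolic space can embed tree-like hierarchies with little distortion, and why the metric, not the name, is the informative part.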

                                                          You suggest using the terms mathematicians use, but it’s not clear that makes the work any more accessible. For non-mathematicians, it just means they are more likely to end up on some unrelated paper that doesn’t help them understand the idea. I get where you are coming from: I remember, back when kernels were a big thing, watching people struggle with what are essentially inner products and their properties. It never helped to tell them they needed to understand inner products. I just had to give them LADR and that was enough.

                                                          I think there is some confusion between the deep learning hype and the people practicing it. The practitioners are mostly aware of the mathematics. It’s everyone downwind that gives the impression of ignorance.

                                                          1. 3

                                                            When someone says Gaussian, I know there is a normal distribution somewhere in the formulation.

                                                            For example, Gaussian integers and Gaussian curvatures, right?

                                                            1. 1

                                                              I think I could make a connection for Gaussian curvature, but fair point.

                                                              1. 2

                                                                I know both probability theory and differential geometry, and I don’t see the connection (pun not originally intended, but thoroughly enjoyed).

                                                                1. 1

                                                                  Sorry for the delay in responding. One connection I might draw is if you sample points from a multivariate Gaussian, that cloud of points resembles a sphere with Gaussian curvature. It’s a bit of a reach.

                                                            2. 3

                                                              I agree the researchers seem to usually know the mathematics, but they speak with such a funny foreign accent. Learning rate instead of step size, learning instead of optimising, backpropagation instead of chain rule, PCA instead of SVD… everything gets a weird, new name that seems to inspire certain superstitions about the nature of the calculations (neural! learning! intelligent!). And they keep coming up with new names for the same thing; inferential statistics becomes machine learning and descriptive statistics becomes unsupervised learning. Later they both become data science, which is, like we say in my country, the same donkey scuffled about.

                                                              There are other consequences of this cultural divide. For example, the first thing any mathematician in an optimisation course learns is steepest descent and why it sucks, although it’s easy to implement. The rest of the course is spent on better alternatives and on how particulars such as its line search can be improved (for example, the classic text Nocedal & Wright proceeds in this manner). People who learn optimisation without the optimisation vocabulary never proceed beyond gradient descent, and end up writing the moral equivalent of bubble sort because it’s more familiar than quicksort and has less scary mathematics.
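                                                              The “why it sucks” part is easy to see numerically: fixed-step gradient descent on a simple quadratic slows down roughly in proportion to the condition number. A toy sketch of my own (function and names invented for illustration):

                                                              ```python
                                                              import math

                                                              def gd_steps(kappa, tol=1e-6, max_iter=1_000_000):
                                                                  """Iterations of fixed-step gradient descent on
                                                                  f(x, y) = 0.5 * (x**2 + kappa * y**2), started at (1, 1)."""
                                                                  x, y = 1.0, 1.0
                                                                  step = 1.0 / kappa  # stability requires step < 2/L, with L = kappa
                                                                  for k in range(max_iter):
                                                                      gx, gy = x, kappa * y  # the gradient of f
                                                                      if math.hypot(gx, gy) < tol:
                                                                          return k
                                                                      x, y = x - step * gx, y - step * gy
                                                                  return max_iter

                                                              print(gd_steps(1.0))    # well-conditioned: converges in a step
                                                              print(gd_steps(100.0))  # condition number 100: over a thousand steps
                                                              ```

                                                              This is exactly the behavior a line search (or a second-order method) is designed to mitigate, which is the part the vocabulary-free treatment never gets to.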

                                                              1. 1

                                                                Is PCA really the same thing as SVD? I suspect I may finally be able to understand PCA!

                                                                1. 2

                                                                  It’s essentially the SVD. The singular vectors are the directions of highest variation and the singular values are the size of this variation. You do need to recentre your data before you take the SVD, but it’s, like we say in the business, isomorphic.

                                                                  And if you know it’s SVD, then you also know that there are better algorithms to compute it than eigendecomposition.
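                                                                  The equivalence is easy to check numerically; a small sketch of mine using NumPy:

                                                                  ```python
                                                                  import numpy as np

                                                                  rng = np.random.default_rng(0)
                                                                  X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated data
                                                                  Xc = X - X.mean(axis=0)  # recentre first

                                                                  # PCA via SVD: right singular vectors are the principal directions,
                                                                  # and squared singular values / (n - 1) are the explained variances.
                                                                  U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
                                                                  var_svd = s**2 / (len(X) - 1)

                                                                  # The same variances via eigendecomposition of the covariance matrix.
                                                                  evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
                                                                  print(np.allclose(np.sort(var_svd), evals))  # True
                                                                  ```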

                                                                  1. 1

                                                                    SVD is a tool you can use to perform a PCA.