1. 2

    How good are the error messages it creates? One of the big weaknesses of *parsec in Haskell is how inscrutable the error messages can get.

    1. 2

      Lark provides you with the line & column in the text where the error occurred (it counts them automatically), and also what input it expected.

      You can see for yourself how Lark utilizes this information to provide useful errors when users make mistakes in the grammar: https://github.com/erezsh/lark/blob/master/lark/load_grammar.py#L593
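
      Roughly, getting at that information from user code looks like this (a quick illustrative sketch, assuming a reasonably recent Lark; the toy grammar is made up):

      ```python
      from lark import Lark
      from lark.exceptions import UnexpectedInput

      # tiny made-up grammar: one or more "name = number" assignments
      parser = Lark(r"""
          start: assign+
          assign: NAME "=" NUMBER
          %import common.CNAME -> NAME
          %import common.NUMBER
          %import common.WS
          %ignore WS
      """)

      text = "x = 1\ny = oops"
      try:
          parser.parse(text)
      except UnexpectedInput as err:
          # line and column are filled in automatically by Lark
          print("parse error at line {}, column {}".format(err.line, err.column))
          # get_context() returns the offending line with a caret underneath
          print(err.get_context(text))
      ```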

      Of course, it’s not always clear what the error is, or what the best way to solve it is. I am open to hearing about ways that I can improve the error messages.

      1. 3

        That’s a great start. Some helpers for displaying the line, and perhaps the relevant text span, would help. Another useful thing is displaying which grammar rule the parse failed within. Many tools support this in principle, but the API could do more to encourage it.

        1. 2

          Good ideas!

    1. 2

      Asking for an email address as an ID and then also a “tag” or username is a way to use this three-pointed identity without having it be cumbersome to the user. Everyone knows an email won’t be shared publicly, so it gets the point across that one is internal and one is external. You don’t even have to validate that the email is really an email.

      1. 2

        Email addresses have been reused. Also, people change email addresses. Having the internal ID not be the email makes that migration easier to implement. Ideally the ID is an implementation detail the user never sees.

        1. 2

          Yeah, I’m suggesting the email is used to log in, a hash is used as the internal ID, and a username is used publicly.

          1. 1

            Ah I think I had misunderstood you.

      1. 4

        I actually disagree with this article on some level. I think we should respect outcomes, but some activities really are about the journey more than the destination. I think there are good reasons to value working. The time you spend improves your skills. Making the wrong choices gives you perspective. If anything the issue is people refusing to value “not working”. If you value outcomes you can still find yourself working long hours, just with many short projects. If you value “not working” you will find time to think or be more contemplative.

        1. 1

          Do people have any guidelines on dealing with nested tmuxes? I often find that I use a tmux locally and another one when I ssh. The trouble is keybindings on remote servers aren’t always detected. For example, it’s hard to make, say, S-left-arrow work to move between terminals on the remote end.

          1. 1

            Perhaps similar to this: https://marc.info/?l=openbsd-misc&m=149476496718738&w=2

            I find that when I want to interact with the inner tmux, I have to hit ^B, count to one, then ^B plus whatever command. If you’re too fast, the outer tmux slurps it up.

          1. 3

            Rust has lots of attractive features (chiefly a great type system and a familiar syntax) that make me want to use it, but the cognitive overhead of the memory system still makes Go and other GC languages the better value proposition for the overwhelming majority of projects I take on. To some extent, this will improve with familiarity, but the gap can never close completely (Rust will always require more thinking about memory than GC languages) and I doubt it will close enough to change the calculus. Still, I applaud the intentional attitude that the community takes toward continuous improvement.

            1. 4

              If you don’t mind me asking, how long have you spent with it, and what did you struggle most with?

              We have some decent stuff coming down the pipeline to ease up the learning curve, but there’s always more. I wonder how much what you’ve experienced lines up with other people’s.

              1. 2

                What kind of stuff is coming to ease using the language? As someone who mostly works in Python and Haskell, even basic stuff in Rust still trips me up. Things like: when should I be using a reference vs directly passing a value in? Which data structures should I be using for different problems? Etc. There is a mental overhead that is still slowing me down, so anything to help me get past that would be great!

                1. 1

                  Hey sorry, I missed this!

                  https://blog.rust-lang.org/2017/12/21/rust-in-2017.html is a good summary, see the “ergonomics initiative” section.

                2. 1

                  Sorry, I missed this. I don’t keep a very good inventory of things I bump into, and part of the problem is that if I understood my frustrations well enough to articulate them, they probably wouldn’t be so frustrating to begin with. Sort of a “beginner’s paradox”. I’ve been playing with Rust in my free time on and off since about 2014, but I still don’t feel like I’ve climbed the learning curve well enough to be passably productive with Rust (I might feel differently if my bugs could kill people, mind you!).

              1. 1

                One other thing worth pointing out: if you are going to randomly generate trees, be ready to throw them away early, as naive sampling methods end up with lots of small trees, a few gigantic ones, and very little in the middle: https://byorgey.wordpress.com/2013/04/25/random-binary-trees-with-a-size-limited-critical-boltzmann-sampler-2/
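
                If it helps, the idea in that post boils down to something like this (a rough Python sketch, not the post’s Haskell):

                ```python
                import random

                class TooBig(Exception):
                    """Raised as soon as a tree grows past the upper size bound."""

                def gen(size, max_size):
                    # critical Boltzmann sampler for binary trees: each call is a leaf
                    # or an internal node with two recursive children, each with probability 1/2
                    if size > max_size:
                        raise TooBig()
                    if random.random() < 0.5:
                        return None, size                   # leaf; size unchanged
                    left, size = gen(size + 1, max_size)    # count this internal node
                    right, size = gen(size, max_size)
                    return (left, right), size

                def sample_tree(min_size=50, max_size=100):
                    """Retry until a tree lands inside the target size window."""
                    while True:
                        try:
                            tree, size = gen(0, max_size)
                        except TooBig:
                            continue                        # too big: abandoned early
                        if size >= min_size:
                            return tree
                        # too small: fall through and resample
                ```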

                1. 1

                  I wish there were apps that displayed maps this way. Anybody have favourite alternatives to Google Maps?

                  Incidentally, I’m extremely frustrated with Google’s algorithm for directions. They frequently over-optimize for driving time over other considerations. I have on more than one occasion played with OpenStreetMap data to see if I can do better.

                  1. 1

                    Waze has way too many ads, and too much gamification of driving style.

                    I like a lot of the stuff in Here Maps for navigation (speed limits and such). Their nav is better, but their map colour scheme sucks, and for some reason it crashes mobile data on my phone after a few minutes (wtf?!)

                    Honestly I miss my old Pioneer in-dash navigation system and might get another one for my car soon. I liked how, once you approached a turn, it switched to a photo of the turn, showing the lane you should be in and a distance meter. I have a feeling this tech is patented and Google/Here doesn’t want to pay the fee, but companies like Honda/Ford/Pioneer/TomTom are fine with it; hence the car in-dash units are just so much better from a UX perspective.

                  1. 10

                    Our goal is to deliver the best experience for customers, which includes overall performance and prolonging the life of their devices. Lithium-ion batteries become less capable of supplying peak current demands when in cold conditions, have a low battery charge or as they age over time, which can result in the device unexpectedly shutting down to protect its electronic components.

                    Last year we released a feature for iPhone 6, iPhone 6s and iPhone SE to smooth out the instantaneous peaks only when needed to prevent the device from unexpectedly shutting down during these conditions. We’ve now extended that feature to iPhone 7 with iOS 11.2, and plan to add support for other products in the future.

                    Come on. If this is really about managing demand spikes, why limit the “feature” to the older phones? Surely iPhone 8 and X users would also prefer that their phones not shut down when it’s cold or the battery is low?

                    1. 6

                      I would assume most of those phones are new enough that the battery hasn’t gone through enough charge cycles to wear it down and trip the governor, and/or battery technology improved on those models.

                      It’s really a lose-lose for Apple whichever way they do it, and IMHO they picked the better compromise. The choice is: run the phone normally on a worn battery, reducing battery life further and risking the phone just shutting off when the battery can’t deliver the necessary voltage on bursty workloads; or throttle performance to try to keep battery life consistent and the phone running on a battery that delivers reduced voltage.

                      1. 6

                        Apple could have also opted to make the battery replaceable, and communicate to the user when to do that. But then that’s not really Apple’s style.

                        1. 3

                          I believe that’s called “visiting an Apple store.” Besides, as I’ve said elsewhere in this thread, replacing a battery on an iPhone is pretty easy: remove the screen (it’s held in with two screws and comes out with a suction cup) and the battery is right there.

                        2. 4

                          and plan to add support for other products in the future.

                          They probably launched on older phones first since older phones are disproportionately affected.

                          1. 2

                            Other media reports indicate that battery performance loss is not just a function of age but of other things like exposure to heat. They also indicate that this smoothing doesn’t just happen indiscriminately but is triggered by some diagnostic checks of the battery’s condition. So it seems like making this feature available on newer phones would have no detrimental effect on most users (because their batteries would still be good) and might help some users (whose batteries have seen abnormally harsh use or environmental conditions). So what is gained by limiting it only to those using older models? Why does a brand new iPhone 7 bought new from Apple today, with a brand new battery, have this feature enabled while an 8 does not?

                            1. 2

                              Probably easier for the test team to find an iPhone 7 or 6 with a worse battery than an 8. The CPU and some other components are different.

                              1. 3

                                There are documented standards for rapidly aging different kinds of batteries (for lead-acid batteries, like in cars, SAE J240 says you basically sous-vide cook them while rapidly charging and draining them), and I’d be appalled if Apple didn’t simulate battery aging for two or more years as part of engineering a product that makes or breaks the company.

                        1. 3

                          Is the barrier to entry that low? You need a certain mathematical maturity to understand the papers, and most of the new results are about stuff in papers. If you can do that, what’s the big deal? More people working in an area means more research directions get explored. Now if you are a charlatan out fooling non-experts, how is that any different from any other field? The same thing has happened with software in general.

                          The comments seem to get obsessed with how we need engineers more than researchers, but that feels like a strawman as well. Unless your problem was already solved in a paper published at ICLR/NIPS/ICML/etc, you are going to have to do at least a little bit of research to adapt the technology to your problem. This is going to require at least some creativity and intuition for these statistical models. You might not get a publication out of the work, but you’ll still work damn hard to get things working.

                          1. 3

                            I can whip up a non-trivial ML solution in about 15 minutes using commoditized tools like TensorFlow. That system will even work pretty well for my sample test data, well enough that I think I’ve just built a system that accurately catalogs sentiment or identifies a trait in a photograph. I can run to production with that, and get pretty good results… until I don’t.

                            If I need to do something more complicated, I can snag a lightweight, pop-press ML book that guides me through some of the statistical concepts. Again, I get a solution that looks good under cursory inspection. I can roll that package out to production and let people start consuming its results, and its flaws will manifest in subtle, difficult-to-detect ways.
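
                            To make the “15 minutes” concrete, here’s roughly what I mean, sketched with a recent TensorFlow’s Keras API (the dataset and hyperparameters are just placeholders):

                            ```python
                            import tensorflow as tf

                            # toy sentiment classifier on the built-in IMDB data; numbers are arbitrary
                            (x_train, y_train), (x_test, y_test) = tf.keras.datasets.imdb.load_data(num_words=10000)
                            x_train = tf.keras.preprocessing.sequence.pad_sequences(x_train, maxlen=200)
                            x_test = tf.keras.preprocessing.sequence.pad_sequences(x_test, maxlen=200)

                            model = tf.keras.Sequential([
                                tf.keras.layers.Embedding(10000, 16),
                                tf.keras.layers.GlobalAveragePooling1D(),
                                tf.keras.layers.Dense(16, activation="relu"),
                                tf.keras.layers.Dense(1, activation="sigmoid"),
                            ])
                            model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
                            model.fit(x_train, y_train, epochs=5, validation_split=0.2)
                            print(model.evaluate(x_test, y_test))
                            ```

                            It looks done, and it will even score reasonably on held-out data, which is exactly why it’s so easy to ship before understanding what it actually learned.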

                            1. 2

                              That might not be so bad. Like I don’t think that is any worse than any other bugs that somehow only manifest in production. As long as you have good monitoring in place, I think you can get pretty far on that attitude. More cynically, a colleague once told me, “These models usually fail a little after you’ve been promoted so it’s not your problem”.

                              1. 3

                                I think there are real ethical concerns. I mean, simple stuff, like that research team that labeled pictures as “beautiful” in their training set and ended up creating what was essentially a robotic racist. A lot of our assumptions about how the world works, when encoded via machine learning, magnify what are normally minor issues to an industrial scale.

                                1. 4

                                  I completely agree. But fairness and ethics in machine learning is not something you can fix with barriers. That’s something that needs to be out there as an idea. I’m not even sure how to go about doing that.

                          1. 2

                            There usually aren’t too many machine learning papers on this list so I have to suggest a few.

                            Bandit based Monte-Carlo planning

                            This is the paper that introduced Monte-Carlo tree search and is a core part of AlphaGo. The algorithm is super simple, and most of the paper is the theory behind it, which is actually not incomprehensible.
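
                            The core of it, UCB1 applied at each node of the search tree, fits in a few lines. Here is a rough sketch, where Node is a hypothetical class carrying visits, total_reward and children:

                            ```python
                            import math

                            def uct_select(node, c=1.4):
                                """Pick the child maximizing average reward plus an exploration bonus (UCB1)."""
                                def score(child):
                                    if child.visits == 0:
                                        return float("inf")   # always try unvisited children first
                                    exploit = child.total_reward / child.visits
                                    explore = c * math.sqrt(math.log(node.visits) / child.visits)
                                    return exploit + explore
                                return max(node.children, key=score)
                            ```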

                            Statistical Modeling: The Two Cultures

                            Leo Breiman’s paper on how machine learning people solve problems versus how statisticians do is as timeless as the day it was first written. Notably, he doesn’t take sides, which is very refreshing.

                            A Mathematical Theory of Communication

                            Claude Shannon’s paper is extremely accessible for something so foundational. This is the paper that I would argue started machine learning. All the ideas that we use to this day are in there. The way we think about the problem hasn’t changed all that much. As a bonus, Shannon’s paper The Bandwagon is still relevant to navigating AI hype.

                            1. 4

                              It’s kind of weird to me to see computer people keep naming more things after mathematicians who already have a lot of things named after them. It’s a funny cultural phenomenon: for computer people, the mathematicians are more like distant relatives than close friends, so cluttering the namespace further doesn’t seem like a problem to them. Everything gets called Bayes this or Gauss that, even where those mathematicians have only the most tenuous relationship to the things computer people are naming.

                              I had to think about what a Poincaré embedding could be… maybe an embedding of a higher-dimensional manifold into R^2 or R^3? Higher-dimensional topology is something Poincaré really is known for. But no, it’s just the disk model of the hyperbolic plane. Most of the time, I don’t even grace that with Poincaré’s name, kind of like how mathematicians just call finite fields “finite fields”, and very rarely Galois fields.

                              My nitpick isn’t purely pedantic: this unfamiliarity with mathematics has caused some real problems, such as an AI winter. It’s blindingly obvious that a perceptron was just defining a plane to separate inputs, and that lots of data sets couldn’t be separated by a plane, but because of hype, and because Minsky pointed out what really should have been obvious to everyone, connectionism, neural networks, deep learning, or whatever the next rebranding will be, all fell into a deep AI winter. I know I sound like an ass, but it’s both cute and worrying to see computer people struggling with and rediscovering my familiar friends. It almost reminds me of Tai’s model, but a little less severe.

                              1. 2

                                But clearly Poincaré embedding is a reference to https://en.wikipedia.org/wiki/Poincar%C3%A9_disk_model. It’s not like machine learning researchers chose the name arbitrarily. The embedding is named after the metric you choose to use for it. These names are informative. When someone says Euclidean, I know nothing fancy is happening. When someone says Gaussian, I know there is a normal distribution somewhere in the formulation. When someone says Bayesian, I know I can expect a place to inject priors. The naming isn’t arbitrary.
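
                                For reference, the metric in question is the distance in the Poincaré ball (this is the form used in the recent embedding work, if I have it right):

                                ```latex
                                d(u, v) = \operatorname{arcosh}\left( 1 + 2\,\frac{\lVert u - v \rVert^2}{(1 - \lVert u \rVert^2)(1 - \lVert v \rVert^2)} \right)
                                ```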

                                You suggest using the terms mathematicians use, but it’s not clear that makes the work any more accessible. For non-mathematicians it just means they are more likely to end up on some unrelated paper that doesn’t help them understand the idea. I get where you are coming from; I remember, back when kernels were a big thing, watching people struggle with what are essentially inner products and properties of inner products. It never helped to tell them they needed to understand inner products. I just had to give them LADR and that was enough.

                                I think there is some confusion between the deep learning hype and the people practicing it. The practitioners are mostly aware of the mathematics. It’s everyone downwind that gives the impression of ignorance.

                                1. 3

                                  When someone says Gaussian, I know there is a normal distribution somewhere in the formulation.

                                  For example, Gaussian integers and Gaussian curvatures, right?

                                  1. 1

                                    I think I could make a connection for Gaussian curvature, but fair point.

                                    1. 2

                                      I know both probability theory and differential geometry, and I don’t see the connection (pun not originally intended, but thoroughly enjoyed).

                                      1. 1

                                        Sorry for the delay in responding. One connection I might draw is that if you sample points from a multivariate Gaussian, the cloud of points resembles a sphere with Gaussian curvature. It’s a bit of a reach.

                                  2. 3

                                    I agree the researchers seem to usually know the mathematics, but they speak with such a funny foreign accent. Learning rate instead of step size, learning instead of optimising, backpropagation instead of chain rule, PCA instead of SVD… everything gets a weird, new name that seems to inspire certain superstitions about the nature of the calculations (neural! learning! intelligent!). And they keep coming up with new names for the same thing; inferential statistics becomes machine learning and descriptive statistics becomes unsupervised learning. Later they both become data science, which is, like we say in my country, the same donkey scuffled about.

                                    There are other consequences of this cultural divide. For example, the first thing any mathematician in an optimisation course learns is steepest descent and why it sucks, although it’s easy to implement. The rest of the course is spent seeing better alternatives, and discussing how particulars of it, such as its line search, can be improved (for example, the classic text Nocedal & Wright proceeds in this manner). People who learn optimisation without the optimisation vocabulary never proceed beyond gradient descent, and are writing the moral equivalent of bubble sort because it’s more familiar than quicksort and has less scary mathematics.

                                    1. 1

                                      Is PCA really the same thing as SVD? I suspect I may finally be able to understand PCA!

                                      1. 2

                                        It’s essentially the SVD. The singular vectors are the directions of highest variation and the singular values are the size of this variation. You do need to recentre your data before you take the SVD, but it’s, like we say in the business, isomorphic.

                                        And if you know it’s SVD, then you also know that there are better algorithms to compute it than eigendecomposition.
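
                                        A tiny numpy sketch of that claim, with random data standing in for anything real:

                                        ```python
                                        import numpy as np

                                        X = np.random.randn(500, 10)          # rows = samples, columns = features
                                        Xc = X - X.mean(axis=0)               # recentre first

                                        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
                                        components = Vt                       # principal directions, one per row
                                        explained_variance = s**2 / (len(Xc) - 1)
                                        scores = Xc @ Vt.T                    # the data expressed in the new basis
                                        ```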

                                        1. 1

                                          SVD is a tool you can use to perform a PCA.

                                  1. 5

                                    One thing I often bring up is that if you do scientific computing, you probably use code written in Fortran, since Lapack is written in Fortran. It’s also underappreciated that Fortran compilers are really good at optimizing array computation, and lots of numerical code is scarcely more than just that.
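
                                    For example, something as mundane as this numpy call bottoms out in LAPACK’s Fortran solver (gesv) under the hood:

                                    ```python
                                    import numpy as np

                                    A = np.random.randn(1000, 1000)
                                    b = np.random.randn(1000)

                                    # LU factorization + solve, handled by LAPACK (i.e. Fortran)
                                    x = np.linalg.solve(A, b)
                                    print(np.allclose(A @ x, b))
                                    ```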

                                    1. 8

                                      May I suggest another to this list:

                                      Algorithm Design by Jon Kleinberg and Éva Tardos

                                      The trouble I have with CLRS and Dasgupta is that the theory is really hard to work through. Kleinberg and Tardos have an incredible gift for giving the intuition behind not just the algorithm, but also why the theorems and properties of the algorithm hold. If you care to prove properties of algorithms and not just code them, their book teaches you that skill brilliantly. They also spend much more time on randomized and approximation algorithms, which I think are criminal not to cover in 2017.

                                      1. 2

                                        This really resonated with me. I definitely had a phase where I felt as a programmer that I needed to build big generic things. I think scratching your own itch or solving the problems of actual people can be more rewarding.

                                        1. 3

                                          If stuff like this interests you, I would suggest reading about Concentration Inequalities [1][2]

                                          Concentration inequalities unify both of these inequalities and provide tools to make new ones for whatever situations you have.

                                          1. https://www.stat.berkeley.edu/~mjwain/stat210b/Chap2_TailBounds_Jan22_2015.pdf
                                          2. http://84.89.132.1/~lugosi/anu.pdf
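
                                          To give a flavour: one classic result in this family is Hoeffding’s inequality, which for n independent X_i taking values in [0, 1] bounds how far their mean can stray from its expectation:

                                          ```latex
                                          \Pr\left( \lvert \bar{X} - \mathbb{E}[\bar{X}] \rvert \ge t \right) \le 2 \exp(-2 n t^2)
                                          ```
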
                                          1. 2

                                            It’s a shame the part where PyMC3 gets used is behind a paywall. There really needs to be more material from people who aren’t core developers.

                                            1. 2

                                              Hi, I’m one of the PyMC3 developers. I also would love to see more material from outside the core devs (one of the problems with this is that we tend to invite the people who write this sort of material to join the team). We have been talking lately about how we can improve our documentation, so I’d love to hear your feedback on what we can do better. Thanks!

                                              1. 2

                                                I’m @zaxtax on twitter. I’m obviously a gigantic fanboy of PyMC! I meant I want to see other people writing about the library. I do think too much of the documentation is buried in the notebooks but don’t know how others want the documentation organized.

                                                1. 1

                                                  I do think too much of the documentation is buried in the notebooks but don’t know how others want the documentation organized.

                                                  100% agree. Separating the API docs from Bayesian modeling how-tos is high on our list, so stay tuned!

                                            1. 2

                                                    IME lobsters isn’t the best place for AI stuff. You’re better off going to a subreddit or arXiv or even HN and certain groups of people on twitter. It’s unfortunate but true.

                                              1. 4

                                                But arxiv isn’t really a place for discussion. There is a MachineLearning subreddit but it’s pretty obsessed with Deep Learning, so any machine learning that’s not that gets treated in a rather hostile way. HN sometimes has people who know what’s going on, but you can pretty often get incorrect information arrogantly presented as fact.

                                                      Twitter discussions are wonderful, but I feel they are really ephemeral and hard to point to. Recently, Yoav Goldberg posted a rant (https://medium.com/@yoav.goldberg/an-adversarial-review-of-adversarial-generation-of-natural-language-409ac3378bd7) about the flag-planting nature of current machine learning practice, and it provoked a ton of back and forth between different groups of people. All of that is really invisible if you aren’t following the right people.

                                              1. 10

                                                meta: the ml tag was intended for stories relevant to programming languages in the ML family. But this is hardly the first time I’ve seen it used to tag Machine Learning posts here… and I’m sympathetic, since there is currently no tag designated for machine learning, which is a large and growing application area. Maybe we could use such a tag?

                                                1. 4

                                                  Please can we get a machinelearning tag?

                                                  1. 3

                                                    Hardly the first, more like the thousandth. This consistent user error makes the ML tag borderline useless.

                                                  1. 4

                                                              I had almost skipped the article because the title suggested an opinion piece. I am glad I didn’t: the article lays out a clear problem, discusses how Bayesian deep learning can solve it, shows examples of BDL in action, and links to research papers.

                                                    For those wondering whether to read it, this paragraph should give you an idea.

                                                              The main issue is that traditional machine learning approaches to understanding uncertainty, such as Gaussian processes, do not scale to high dimensional inputs like images and videos. To effectively understand this data, we need deep learning. But deep learning struggles to model uncertainty. In this post I’m going to introduce a resurging field known as Bayesian deep learning (BDL), which provides a deep learning framework which can also model uncertainty.

                                                    1. 1

                                                                I mean, that was the title on the article. I’m not sure what more I could have provided to help.

                                                      1. 2

                                                        Oh, I agree! I did not mean to blame you, and I am sorry that I gave that impression.

                                                        I mentioned the title to make clear my own mistake (prejudging an article by its title), in the hope that I might prevent others from making the same mistake.

                                                        Finally, thank you for posting this article. I found it very interesting indeed, and enjoyed reading it.

                                                    1. 4

                                                      Meta comment: Why don’t we have a machine learning or statistics tag? I feel like these days machine learning is a major aspect of technology and technology news.