1. 3

    This is really exciting. Do you have a list of the projects using nom in practice? You might want to submit to http://langsec.org/ if they have a CfP in 2016.

    1. 4

      well, I already presented it at Langsec 2015. For Langsec 2016, I would have to show something new. Maybe I have, maybe not ;)

      To find users, I take a look at GitHub searches. I also know it is used at some companies like Dropbox.

      1. 1

        awesome… I don’t know how I missed that. Here’s a link to your vid: https://www.youtube.com/watch?v=b7M8Uj7k_0Y

    1. 1

      I think it’s great to see “combinator style” parsers; after writing my first parser using Parsec I feel bad when writing quick-and-dirty parsers in C++/Java.

      However, and this may be pedantic, I feel the claim “safe by default” is a bit strong. It seems the author wants to say that it is memory safe (no use-after-free/out-of-bounds access), but in the absence of a safety specification for what the parser should do, and a way to show the parser meets that specification, I don’t think you could call it safe. Additionally, the fact that the library was fuzzed to “verify” it is safe seems to hint that it does not have a proof of the absence of such errors: such a proof is what I would call safe.

      Again, this is probably just me having my own definitions of what “safe” and “verify” mean.

      1. 7

        Again, this is probably just me having my own definitions of what “safe” and “verify” mean.

        In the context of Rust, ‘safe’ means ‘memory safe’.

        in the absence of a safety specification for what the parser should do and a way to show the parser meets that specification I don’t think you could call it safe

        Well, Rust as a language guarantees memory safety as long as you don’t use unsafe, and then it’s up to you to use it correctly. nom currently uses unsafe four times: twice to perform an unsafe cast, and twice to copy some pointers. As long as these four uses are correct, then if nom isn’t memory safe, it’s a bug in Rust.
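
        For illustration, here is the kind of cast that shows up in parsers. This is a generic sketch, not nom’s actual code, and `bytes_to_str` is a made-up name:

        ```rust
        // Sketch of an unsafe cast a parser might use: reinterpreting bytes
        // that were already validated as UTF-8, skipping the runtime check.
        fn bytes_to_str(input: &[u8]) -> &str {
            // SAFETY: the caller must guarantee `input` is valid UTF-8.
            unsafe { std::str::from_utf8_unchecked(input) }
        }

        fn main() {
            // Safe here because the literal really is valid UTF-8.
            assert_eq!(bytes_to_str(b"nom"), "nom");
        }
        ```

        The unsafe block is the only place where memory safety rests on a manual argument; everything around it is still checked by the compiler.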

        1. 2

          I know, absence of proof is not proof of absence. But outside of actual proof of correctness for the generated machine code (which exists in some specific cases, not applicable here yet), correct combinators able to withstand tools like AFL are a pretty good deal.

        1. 9

          Under “Why is Nom faster?”

          It uses the slice heavily, a Rust data structure containing a pointer and a length

          This is how ByteString works in Haskell, so I don’t think it’s a cogent explanation of why Nom is faster if your attoparsec parser is using ByteString.

          It might very well be the case that Nom does a better job avoiding copying/allocations, but ByteString in Haskell already lets you do that. Vector and Text do too, but via different means.
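
          To make the slice point concrete, here is a minimal Rust sketch of the pointer-plus-length claim (my own example, taken from neither library):

          ```rust
          // A &[u8] slice is just a (pointer, length) pair: subslicing
          // adjusts the pair without copying or allocating.
          fn main() {
              let buffer: Vec<u8> = b"GET /index.html HTTP/1.1".to_vec();
              let method: &[u8] = &buffer[..3]; // borrows from `buffer`
              assert_eq!(method, b"GET");
              // The subslice starts at the same address as the buffer:
              // no bytes were copied.
              assert_eq!(method.as_ptr(), buffer.as_ptr());
          }
          ```

          ByteString slicing in Haskell behaves the same way, which is the point being made above.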

          First glance at the attoparsec HTTP parser looks reasonable, but the benchmarking methodology is wildly different for Rust and Haskell AFAICT, so I’m a little suspicious of the comparison. It would’ve been better to compare Rust ~ Rust with the same benchmarking kit.

          1. 6

            I would really like to improve the benchmarking methodology, the one for C/C++ is not great either. It is still useful to establish a range and see in which category nom lies. The HTTP parser may have been less tested, but people spent some time on improving the MP4 parsers in Haskell.

            I’d like to investigate and see why there is a difference between Rust and Haskell here if attoparsec does it the same way.

            1. 3

              It would’ve been better to compare Rust ~ Rust with the same benchmarking kit.

              The thesis is really that you can write safe parsers without sacrificing speed. Comparing nom to other empirically fast parsers is directly in support of that.

              While the benchmarking might be “wrong”, I didn’t read it as if the author cares that it’s faster. But, rather, just fast enough that you don’t feel like you are giving up speed in exchange for safety. So, unless the benchmark is really wrong, I’m going to submit that this is fine.

              1. 2

                Errr, that’s not really how he put it when he first announced the benchmarks pre-1.0, but okay, we can go with that interpretation if it suits you.

                Also, “sacrificing speed” - are you aware of how fast attoparsec is? Attoparsec already proved you could write safe parsers that were fast. It was already fast and has improved leaps and bounds in the last few years.

                1. 7

                  I don’t have any historical reference here, so I’m taking the post at face value. If you have other information that would help me change my mind here, so be it. I just reread the post given your reaction. There are more claims of speed than I remembered, but I still don’t see that as the real thesis, nor do I see these claims as outlandish. There are links to benchmarks (even 1 that is a nom ~ {various other date parsing libraries in Rust}), with source available, and even discussion that the benchmarks are intended to discuss “relative” performance (my words). His words:

                  The goal is to compare their usability and their performance on a real world binary file format. As with all benchmarks, the results must be taken with a grain of salt. This is not a formal comparison of languages, but an experiment to check where the nom parser library stands against more established parser libraries, in terms of performance and usability. I welcome any idea or contribution to improve performance for either of the parsers, or improve statistical significance. The parsers have been written in the most naive way, to make them as deterministic as possible. In each benchmark, the files are completely loaded in memory before measuring, and the parser is applied repeatedly to the buffer in memory. The hammer parser has some slight memory leaks, the developers have been notified of this and the bugs will be fixed in the future.

                  source

                  Why do I care so much about this that I reread the post and pulled out quotes and stuff? Because the general reaction that I see when someone claims something is “faster” is almost always “the benchmark is wrong.” “The benchmark isn’t good enough.” “The benchmark doesn’t take into account $X.” You just happen to be on the receiving end of my reaction to it. Sorry.

                  Surveys have very much the same response, btw. We discover flaws after the fact. This sucks, but in many cases the results are “good enough” to draw a conclusion that is “close enough.” I feel, in this case, that holds. Especially since I feel that the benchmarks only substantiate the claim that you can write a safe parser that doesn’t sacrifice speed.

                  Also, “sacrificing speed” - are you aware of how fast attoparsec is? Attoparsec already proved you could write safe parsers that were fast. It was already fast and has improved leaps and bounds in the last few years.

                  I’m not aware, but can you use an attoparsec parser and link it easily with C code? Or Python? Or Ruby? Or Go? That’s another one of the claims stated in the post.
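
                  As a rough sketch of what that linking claim means on the Rust side (the function name `parse_status` is made up, not nom’s API), a parser entry point can be exported with the C ABI:

                  ```rust
                  // Exporting a Rust function with a C-compatible ABI, so C (or any
                  // language with a C FFI) can call it directly.
                  #[no_mangle]
                  pub extern "C" fn parse_status(input: *const u8, len: usize) -> i32 {
                      // SAFETY: the caller must pass a valid pointer/length pair.
                      let bytes = unsafe { std::slice::from_raw_parts(input, len) };
                      if bytes.starts_with(b"HTTP/") { 0 } else { -1 }
                  }

                  fn main() {
                      let data = b"HTTP/1.1 200 OK";
                      assert_eq!(parse_status(data.as_ptr(), data.len()), 0);
                      assert_eq!(parse_status(data.as_ptr(), 2), -1);
                  }
                  ```

                  Built as a `cdylib` or `staticlib`, this exposes a plain C symbol with no extra language runtime required on the other side, which is what makes the embedding story straightforward.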

                  1. 2

                    “The benchmark isn’t good enough.”

                    Criterion in Haskell establishes statistical significance and will let you know when the result is weak. The Rust benchmark harness isn’t doing anything of the sort AFAICT. If you’re going to use a trivial benchmark harness, at least compare like-with-like. The point isn’t that it would change the result, per se.
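
                    To illustrate what statistics add over naive timing, here is a toy harness sketch of my own (Criterion does far more, e.g. outlier detection and regression analysis):

                    ```rust
                    use std::time::Instant;

                    // Toy benchmark harness: time a closure over many runs and report
                    // mean and standard deviation. A large stddev relative to the mean
                    // signals that single-shot timings would be unreliable.
                    fn bench<F: FnMut()>(runs: usize, mut f: F) -> (f64, f64) {
                        let samples: Vec<f64> = (0..runs)
                            .map(|_| {
                                let start = Instant::now();
                                f();
                                start.elapsed().as_secs_f64()
                            })
                            .collect();
                        let mean = samples.iter().sum::<f64>() / runs as f64;
                        let var = samples.iter().map(|s| (s - mean).powi(2)).sum::<f64>() / runs as f64;
                        (mean, var.sqrt())
                    }

                    fn main() {
                        let (mean, stddev) = bench(100, || {
                            let _ = (0..10_000u64).sum::<u64>();
                        });
                        assert!(mean >= 0.0 && stddev >= 0.0);
                        println!("mean {:e}s, stddev {:e}s", mean, stddev);
                    }
                    ```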

                    You can call into Haskell code from C via the FFI. I’ve liked Haskell’s FFI w/ C better than I did Python’s.

                    We’ve now exceeded the scope of what I cared about on this particular topic, which was mostly about measuring and process.

                    1. 4

                      at least compare like-with-like.

                      But how is he not comparing like for like? You have two parsers doing the same amount of work.

                      Would it have been reasonable for him to do a “real”, Criterion-based benchmark for attoparsec (which establishes the significance) and then do the same amount of work in the nom equivalent? Is that what you’re after here?

                      1. -1

                        I already described what I thought would’ve been more appropriate given reasonable time investment constraints in my original comment.

                        I’m asking politely, but directly now: please stop replying now, we have gone beyond the scope of what I care about here.

                        1. 3

                          Feel free to ignore me, but I don’t understand why you think it’s not “like-with-like.” I must be missing something trivial.

                          If I have two runners, one is 6' tall, and the other is 5' tall, and I ask them to run a mile as fast as they can, and then time them: is that a like-with-like situation? Or are you going to argue that the wind, humidity, and amount of debris on the road somehow make it different?

            1. 6

              This is so important. We are still building web (and other) software like amateurs, and calling ourselves professionals because of our big tools. There’s a common idea that once you get the right framework/library/text editor/OS, things will get easy. In fact, when complexity increases, tools matter less; architecture and process move to the front.

              Once the whole project gets more complex, you need more reliability. It is easy to debug 1000 lines of code. It is harder to debug multiple services with mutual dependencies when one of them does not check its invariants. This calls for a systematic approach to software quality. Unit tests are useful, but clearly not enough: you need mutation testing to make sure the tests are relevant, and property-based testing to verify larger ranges of values. Types are useful, but unless you’re using dependent typing, some business logic will slip through the cracks. Do you have continuous integration? Do you know the performance bounds of your system (not “I know it is fast” but “I know its steady state, and when it will break down”)? How do you handle failure? How do you document failure?
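
              As one concrete example of the property-based idea, here is a hand-rolled round-trip check in Rust (no framework; `encode`/`decode` are stand-ins for real serialization code):

              ```rust
              // Round-trip property: decoding an encoded value returns the input.
              fn encode(n: u32) -> Vec<u8> {
                  n.to_be_bytes().to_vec()
              }
              fn decode(b: &[u8]) -> u32 {
                  u32::from_be_bytes([b[0], b[1], b[2], b[3]])
              }

              fn main() {
                  // A real property-based tester generates random inputs and
                  // shrinks failures; a deterministic sweep stands in for it here.
                  for n in (0..1_000_000u32).step_by(7919) {
                      assert_eq!(decode(&encode(n)), n, "round-trip failed for {}", n);
                  }
                  println!("all round-trips passed");
              }
              ```

              The point is the shape of the test: one property checked across many inputs, instead of a handful of hand-picked cases.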

              That kind of testing takes time, but that’s how you can build rock solid components.

              1. 2

                You mean the fact that my dev environment is Vim+GNU/Linux doesn’t automagically make me an epic developer? That makes me sad.

                1. 2

                  That is absolutely how you build rock solid components…

                  …that aren’t being sold, that took too long to develop, that lose their solidity once normal developers who don’t know a fucking thing about proper engineering infiltrate your project, etc. etc.

                  The thing is right now that 99.99% of people only want a doghouse, and even the skyscraper folks have only budgeted for a duplex.

                  Quality is not a goal unto itself.

                  1. 4

                    I really agree and didn’t want to present quality as a goal in and of itself. I see skyscraper to doghouse as a spectrum, most problems as higher up that spectrum than we assume, and the cost of appropriately addressing them to be much lower than we fear.

                    1. 2

                      Quality is not the goal of a project, it is a requirement for which you have to make compromises. Sure, for a fire-and-forget web app, you won’t need to care about reliability. But for a library used by somebody else, for a web framework that will be reused, or for any project that must be maintained long term, focusing on quality will make the difference.

                      The things I cited (mutation testing, property-based testing, type systems, performance testing) are easy to do right now. And putting them through a CI (which is also rather easy these days) ensures the quality won’t drop in the future, even when you get new developers onboard.

                      The real problem when making those compromises is that we automatically assume skyscraper<->good quality, doghouse<->bad quality. Those are orthogonal dimensions: complexity and quality. There are complex projects badly handled, and there are simple projects really well done. What people often settle for is getting lots of features (i.e. increased complexity) shipped fast, and this is the recipe for unmaintainable software.

                  1. 19

                    I very much disagree. Languages can change a lot more than just the syntax. Sure, switching between Python and Ruby (or any other imperative language) is trivial for anyone with significant experience using either, but switching to a language that works in a different paradigm will require you to change how you think and will require you to find different solutions. Anyone who has moved from imperative to (purely) functional will tell you that it’s like learning to program all over again.

                    Languages can impose totality requirements and having to write a total program to solve a specific problem will make you think very hard about the problem you’re supposed to solve. The language will force you to think of every possible case and you will often need to ask for specification requirements from a stakeholder due to this.

                    Choosing a language with dynamic types and/or type coercion will require a lot of discipline to write tests for every part of the program. If someone on the team fails to adhere to test-discipline for some part of the implementation, you might find yourself with parts of your program you can’t refactor. Not being able to refactor will lead to a drop in velocity and in turn to delays in shipping the product. At worst, a lack of testing-discipline will require a full rewrite of an otherwise impossible-to-refactor program and even more delays or missed sprint goals.

                    Language matters and it matters a lot.

                    1. 5

                      I agree that other languages can make you think differently and impose various implementation practices, but I don’t think that’s what the article is talking about. (The title of the article is a bit provocative with the use of “syntax”. I think it means “syntax and semantics”, but using both in the title isn’t as catchy).

                      Whether you use C or Haskell doesn’t matter. What matters is how you express a problem and the solution via the program. In short, good programming transcends any programming language.

                      1. 11

                        Haskell code written like good C code is not good Haskell code. C code written like good Haskell code is not good C code. As the paradigms go further afield (SQL, Prolog, Forth) the tradeoffs change.

                        1. 2

                          It’s not about the idioms of the language so much as it’s “what data structure makes sense here,” “how can I decompose this problem into concurrent processes,” “how do I adapt this idea that I know to cut with the grain of this language so someone else can come use this work later.” This kind of stuff transcends language, and I’d argue that it’s the important stuff. No one is arguing that learning both Haskell and C isn’t valuable, but I think the article is arguing putting undue emphasis on proficiency in a particular language or syntax is missing the forest for the trees - even when evaluating yourself or others critically.

                          1. 4

                            Data structures especially are vastly different in C and Haskell. There are whole books written about lazy and pure data structures, which rely on very specific things provided by the language (and runtime). They are rather hard to write in languages that don’t impose certain semantics.

                            While many of them could be implemented in C, one could argue that such a development comes close to developing a whole different sublanguage.

                            1. 2

                              Data structures are very different in an immutable language like Haskell. In Haskell it is quite common to use finger trees or zip lists. These immutable structures are a BAD IDEA if you’re not using a pure language where the compiler can inline aggressively, but they are nice and fast in Haskell. Concurrent/parallel programming in a pure functional language is very different from what you would do in Ruby, so almost everything you learnt when working with pthreads is almost useless. Concurrency abstractions are different. Haskell programmers don’t use mutexes, for one, preferring software transactional memory or MVars.

                              I believe you’re vastly underestimating how much a language can change everything. Out of curiosity, do you have significant work experience in a non-imperative language?

                              1. 1

                                Lest this turn into a discussion that goes nowhere, I want to attempt to illustrate what I think the article is getting at with this statement:

                                Concurrent/parallel programming in a pure functional language is very different from what you would do in Ruby

                                I think the article is pointing out that it’s important to know the concepts of concurrent/parallel programming and know how to use them independent of the programming language. Sure, you’ll use them differently in Haskell and Ruby, but the principles remain. Crudely speaking, concurrency is a kind of API and languages just realize them in different ways.

                                I’m learning Haskell right now and I’m not learning much that I didn’t already know, nor is it changing my perspective on software development that much. The reason for this is that years (ages?) ago I studied things like GADTs, lazy evaluation, and type theory outside the world of any particular programming language. Then, when studying programming languages, I was able to see how these concepts manifested in different implementations. But when writing software, I think in terms of these general concepts then express them in some (hopefully) appropriate manner with the language at hand.

                                What this resolutely does not mean is taking the implementation of the concept in one language and writing it in another. So yes, it’s typically a bad idea to take the immutable structures pervasive in Haskell and just use them in C, but that doesn’t mean you can’t make use of them in some way, if they seem appropriate.

                                Concurrency, immutability, and the like exist outside of programming languages, and you do not need a programming language to study, learn from, or problem-solve with them. Great programmers know this and use it to great effect. I believe that is what the article is talking about.

                          2. 3

                            I am not sure problems and solutions really transcend any programming language. The same way some concepts are better expressed in some spoken languages, programming languages shape the way we think, through their syntax, their architecture, and the current practices of the community.

                            It is not easy to transfer a solution between two languages with very different approaches, or if it is easy, the result will not be idiomatic, or will not be a good expression of that solution in that language.

                            The same way, estimating the complexity of a problem depends a lot on the language.

                            1. 2

                              Not really, you can’t really apply most concepts from OOP to a pure functional language. Best practices are different. Even at an algorithmic level you can’t expect solutions in an imperative language to map one-to-one to solutions in a pure functional language. Good code in Haskell is very, very different from good code in Ruby. The skills aren’t very transferable either.

                          1. 7

                            I’m willing to say this is the experience a lot of people get when first learning Rust. It forced me to reevaluate my way of thinking when writing my first few programs with it. I had to fight the borrow checker and the type system quite a bit before I could grok the compiler messages.

                            1. 2

                              Yup! It seems very common. A month or two in, though, it kind of just clicks and you don’t fight with it very often.

                              1. 1

                                It can be hard at first because it emphasizes structure, even more so with IO. Once you are working with your data structures, it gets easier.

                                I am still slow when starting a new project, but once the plumbing is done, modifying anything is easy, since the compiler has my back.

                                I find a resemblance to Haskell development, where any IO is soon abstracted away, to work only with deterministic functions.

                            1. 2

                              I am still not sure that Rust is the right platform to build websites on under today’s time constraints, but the robustness of the code is undeniable

                              1. 1

                                What do you mean by time constraints? I think if anything, the only parts lacking in the Rust landscape are mature third-party libraries.

                                With that said, there are Iron and Hyper as the author pointed out. Once this scene gains more traction, I think that Rust will become very viable for web services.