1. 4

    Anyone have a snippet of the kind of code that C2rust ends up generating? Would be interesting to see how much work cleanup would end up being

    1. 5

      I did a fairly deep investigation but it was a while ago, the tools have probably changed since: https://wiki.alopex.li/PortingCToRust

      1. 4
        1. 4

          Not sure if the website is still being kept up to date, but www.c2rust.com lets you try it online. The code is pretty un-idiomatic (using unsafe pointers instead of references and so on), but note that there is also a refactor step being actively worked on. Based on the video, it doesn’t look like they used that here, but this is where the code lives: https://github.com/immunant/c2rust/tree/master/c2rust-refactor.

          Full disclosure: I used to work on this, but I’ve not kept up with developments from the past few months.

          1. 2

            Here’s dwm converted with it.

            1. 2

              You can see yourself. There’s an online translator: https://c2rust.com/

              Rust intentionally tries to discourage use of raw pointers and pointer arithmetic by not having syntax sugar for them, so directly translated C code ends up very ugly.

              It starts looking OK only after you disambiguate pointers into proper references/slices/optionals. In C it’s not clear whether a pointer argument to a function means 0 or 1, exactly 1, or N elements. Rust has separate types for these cases, and syntax sugar and safety come with them.

              1. 1

                It’s a start though! I’d rather start from there, and refactor towards idiomatic than start from scratch in some cases.

            1. 5

              I can’t wait to see performance benchmarks for this! Pie in the sky dream: if this started being competitive with DirectWrite and CoreText it would be perfect for Alacritty - ligatures for free and a cross-platform Rust dependency instead of a mess of cfg’s.

              1. 5

                Note that this is not a DirectWrite competitor, although it can be a part of that. Allsorts does no rendering.

                1. 1

                  I don’t think Alacritty uses DirectWrite for rendering though (so as far as Alacritty is concerned, it is a competitor). But it is the case that DirectWrite and CoreText do interesting rendering work that is out of scope for Allsorts.

                2. 2

                  That would indeed be pretty cool. I’d like to see it get worked into the existing font/type ecosystem in Rust to make adding it to a project easier. That’s probably not a work time project though. :)

                1. 2

                  A common real-world example for Church encoding in Haskell is parsing monads. Both parsec and megaparsec use it. My understanding is that GHC does a great job of inlining functions, so higher order parser combinators end up being better optimized. For instance:

                  newtype ParsecT s u m a
                      = ParsecT {unParser :: forall b .
                                   State s u
                                -> (a -> State s u -> ParseError -> m b) -- consumed ok
                                -> (ParseError -> m b)                   -- consumed err
                                -> (a -> State s u -> ParseError -> m b) -- empty ok
                                -> (ParseError -> m b)                   -- empty err
                                -> m b
                               }
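
                  A minimal sketch of the same idea (not parsec’s actual code, and the names are made up for illustration): a Church-encoded Maybe, where the consumer passes continuations instead of pattern matching on constructors, which gives GHC more opportunities to inline:

                  {-# LANGUAGE RankNTypes #-}
                  -- MaybeC replaces the Just/Nothing constructors with two continuations.
                  newtype MaybeC a = MaybeC { runMaybeC :: forall b. (a -> b) -> b -> b }
                  justC :: a -> MaybeC a
                  justC x = MaybeC (\ok _ -> ok x)
                  nothingC :: MaybeC a
                  nothingC = MaybeC (\_ no -> no)
                  -- Bind never allocates a Just/Nothing; it just composes continuations,
                  -- which is what lets the optimizer flatten chains of combinators.
                  bindC :: MaybeC a -> (a -> MaybeC b) -> MaybeC b
                  bindC m k = MaybeC (\ok no -> runMaybeC m (\x -> runMaybeC (k x) ok no) no)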
                  
                  1. 1

                    Along the lines of the malformed class name error, there is a similar restriction around resource names. I found out the hard way recently that findResource won’t find resources named, say, bundle.js.map. This is because bundle.js.map does not follow the format of name.extension that is expected. The crazy bit is that you can still get at those resource files (they do end up in the JAR), just not with the conventional methods.

                    1. 1

                      Have you checked how it behaves on newer JVMs?

                    1. 12

                      This isn’t a bad post, but beyond giving a very rudimentary overview of BFS there doesn’t seem to be much in the article demonstrating Go’s standard library. You could do this same exercise in virtually any language that has list and map types in its standard library and the code would look the same.

                      Serious question: do Go users actually re-use the standard list type everywhere instead of having proper types for stacks and queues?

                      1. 3

                        Serious question: do Go users actually re-use the standard list type everywhere instead of having proper types for stacks and queues?

                        If the list type handles both and you can just pick which methods to call in your use case, why would that be an issue? Stacks and queues are very similar internally… You can sometimes see a very simple implementation of them just using slice access really.

                        1. 3

                          If the list type handles both and you can just pick which methods to call in your use case, why would that be an issue?

                          No fundamental issue. I often rely on type signatures to help understand what I’m dealing with, and it is also sometimes helpful to get a compile error (i.e. if I try to use a list that was intended to represent a queue as a stack). I guess folks using Go probably rely on comments explicitly indicating whether the list is a stack or queue (and helpfully named variables).

                          1. 2

                            From what I’ve seen (using go since 2012) people don’t usually pass around a slice expecting you to push or pop.

                            If you are using a slice as a queue you make one of two choices: option 1 is write a wrapper type. Option 2 is reuse some wrapper type built on the empty interface. Soon we’ll get a third option with go2 generics.

                            1. 2

                              Did they decide to add generics in Go 2? That would’ve been pretty big news.

                              1. 2

                                Here’s one proposal from the core go team: https://go.googlesource.com/proposal/+/master/design/go2draft-contracts.md

                                I don’t know which release it will land in, but Go getting generics is a question of when, not if.

                                1. 1

                                  I feel like if they had decided that they will implement generics eventually, they would’ve told us. Seems possible that they could explore the design space and decide there is no possible design they are satisfied with.

                        2. 2

                          Yes, you are right, I’m not diving deep into BFS. I actually said in the post there was plenty of pretty awesome material online. I wanted to demonstrate the power of the standard library. You must surely admit having a simple BFS implementation in ~15 lines of code is pretty cool :-)

                          do Go users actually re-use the standard list type everywhere instead of having proper types for stacks and queues

                          I don’t personally do this all the time, but I often tend to start with it just to get things going, then gradually improve on it, sometimes eventually completely replacing some of the standard library data structures.

                          1. 0

                            Why not? A stack is just a vertical list.

                            1. 2

                              No, a stack is not a vertical list. A stack is a LIFO. If you’re arbitrarily indexing into a stack, it’s a list.

                              1. -1

                                That’s picking nits, and not really useful ones. The stack, as in the C/hardware stack, is pushed to on subroutine call and indexed into throughout the subroutine. It’s hard to argue that it’s not a stack.

                                1. 3

                                  It’s not a nitpick. “A stack is just a vertical list” is nonsense, else we’d just call it a list.

                                  1. 1

                                    Have you ever tried physically rotating a data structure? Of course it’s nonsense! :)

                                    Anyway, it depends on how restrictively you define a stack. Is it just nil, push and pop? What about rot, tuck and roll? I’d say nil, cons, car and cdr form both a list and a stack minimally and adequately. Both data structures have a head and a tail, and arbitrary indexing is O(n), so it’s just a matter of perspective.

                                    Also you can arbitrarily index into a mutable stack non-destructively if you keep another stack around to form a zip list.

                                    1. 1

                                      A stack doesn’t really have a tail; if you’re pushing and popping the “tail” it’s a list or deque. rot is not an operation that can be performed on a stack alone, it requires some kind of external storage; in Forth, >r swap r> swap, and swap too requires some external storage. Same for tuck. roll is not a real stack operation, it’s treating a stack like a list and indexing arbitrarily. Names for these data structures exist for a reason. A stack is not a vertical list because the point of a stack is you push and pop the top item. If you wanted to randomly index into some data structure that might be appended or truncated, the first data structure to come to mind would not be a stack, because that isn’t the point of a stack.

                                      1. 1

                                        These new external storage requirements aside (even pop requires some external storage), I’m not arguing what things are for, I’m saying what they are. Absolutely you wouldn’t arbitrarily index into a stack, but if you take an index-able data structure like a list (not that cdddr is any different to pop pop pop) and treat it like a stack, you’ve got yourself a stack. You wouldn’t reimplement a list type just to remove a method so you can rename it to “stack.”

                          1. 10

                            It is a straw-man argument to say that macros in LISP-like languages are amazing because you don’t need to generate strings and eval them. Most macro systems I’ve used (in Scala, Haskell, and Rust) work over ASTs instead of strings. In some cases (typed TH), you even get to reason about the type of the expressions represented by your ASTs. Even LISP quoting/unquoting facilities have their analogue in other languages with quasiquotation.

                            I view extra syntax as having a similar benefit as syntax highlighting: it enables my eyes to quickly navigate to specific parts of the code (branching constructs, type signatures, side effects, etc.). In both cases, there’s obviously a balance for everyone.

                            1. 6

                              The point about macros was mostly in comparison with languages that don’t have them at all. I haven’t done much macro work in any language, probably the most in Clojure, a bit in Haskell, and I guess I’ve done template programming in C, if you want to count that, so I’m not really the right person to ask when it comes to the tradeoffs of the different macro systems out there. I’m also very torn on how I feel about them, because in many languages they end up being a pain to debug, so I mostly stay clear of them, but I think they can provide a lot of value to library authors.

                              I like that argument about syntax highlighting/visual distinction, that is a good point.

                            1. 7

                              I recently also implemented a neural net in Rust targeting the MNIST dataset (https://github.com/harpocrates/rust-mnist) while working through http://neuralnetworksanddeeplearning.com. The main surprise I had was about how nice the dependency management story was.

                              • Using ndarray (instead of Vec), it was 5LOC to add a compile time feature flag to enable using the GPU (which then resulted in ~35% speedup)
                              • Adding a web UI (for interactively identifying digits drawn on an HTML5 canvas) took 100LOC of Rust with the simple-server crate

                              I found both these libraries by searching through https://crates.io/ and figured out how to use them by looking at the automatically generated docs.

                              1. 18

                                For the love of God, do we have to label 50 year old software methods with some misused term from Category Theory?

                                1. 22

                                  The term “monad” does apply here though, albeit without its full power. We should be glad to see that old techniques are being understood from another angle. Understanding monads has helped me see lots of patterns in code I write every day. It’s not merely a useless theoretical construct.

                                  1. 6

                                    I’m interested in hearing about what insights you got from “monad”. To me, it’s just pretentious.

                                    1. 10

                                      For me, monads are a concept that lets me think about computation that carries baggage from one statement to the next. The baggage can be errors, a custom state object, or context related to IO or async computations. Different things, similar way it looks in code. Recognizing the monad concept helps me understand the contours of the implementation and also helps me generate better designs. It goes both ways. I do think there’s value in using the word. It’s a way to compact thought.
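
                                       A minimal sketch of that “baggage” idea, using only types from base (the helper names here are made up for illustration). The do-block shape is identical; the monad decides what gets threaded between statements:

                                       -- Maybe: the baggage is "this may have failed somewhere along the way".
                                       lookupBoth :: String -> String -> [(String, Int)] -> Maybe (Int, Int)
                                       lookupBoth k1 k2 env = do
                                         a <- lookup k1 env
                                         b <- lookup k2 env
                                         pure (a, b)

                                       -- Either String: the baggage also carries a reason for the failure.
                                       parsePositive :: Int -> Either String Int
                                       parsePositive n
                                         | n > 0     = Right n
                                         | otherwise = Left ("not positive: " ++ show n)

                                       sumPositives :: Int -> Int -> Either String Int
                                       sumPositives x y = do
                                         a <- parsePositive x
                                         b <- parsePositive y
                                         pure (a + b)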

                                      1. 6

                                        You may have reasons to dislike Haskell, but that shouldn’t mean you need to deprive yourself of the benefits of these fundamental concepts. Take the signatures of the following functions:

                                        ($)      ::                                     (a -> b) ->   a ->   b
                                        (<$>)    ::                      Functor f =>   (a -> b) -> f a -> f b
                                        (<*>)    ::                  Applicative f => f (a -> b) -> f a -> f b
                                        (=<<)    ::                        Monad f => (a -> f b) -> f a -> f b
                                        foldMap  ::         (Foldable t, Monoid m) => (a -> m  ) -> t a -> m
                                        foldMap  ::     (Foldable t, Monoid (f b)) => (a -> f b) -> t a -> f b
                                        traverse :: (Traversable t, Applicative f) => (a -> f b) -> t a -> f (t b)
                                        

                                         These are all different ways of combining a function that may have some super powers (in a functor context, producing a functor context etc.) with some value that may have some super powers, to obtain something with super powers. And note how you can combine them together to build a composite value. By the way, there are infinitely many like these, and new useful ones are discovered all the time.

                                         Having an intuition for these combinators, combined with a knowledge of common types that satisfy the Functor, Applicative, Monad, Foldable, Monoid and Traversable classes, helps you write reusable code. Instead of tying down the implementation to a specific combination of concepts, you recognize the actual pattern the first time around, so your code becomes easier to understand* and reusable from the start, with less typing!

                                        (*) Easier to understand for someone who has an intuition for these concepts. Because if you see a function with signature Applicative f => f A -> B -> f C, you immediately know that the function may decide to invoke the f effect you gave it (f A) a number of times and it may base that decision on B and it may also use your A and B to construct the C, but it’s impossible for it to use the A to decide on how to invoke the effects of f.

                                        Reading tips:

                                        • a :: b: a has type b
                                        • a -> b: a function from a to b
                                        • f a: Type constructor f (for instance List) applied to type a (to produce “list of a”)
                                        • a -> b -> c: Same as a -> (b -> c): a function that takes a and produces another function that will take b to produce c, a two-argument function in practice.
                                        • c a => ... a ...: a is any type that has an instance (implementation) of the typeclass (interface) c
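
                                         As a concrete (and hedged) instance of the table above: traverse with t = [] and f = Maybe, using readMaybe from Text.Read. Swapping f for IO, [] or a parser reuses the same one-liner; only the per-element function changes:

                                         import Text.Read (readMaybe)

                                         -- Parse every element, or fail as a whole.
                                         parseAll :: [String] -> Maybe [Int]
                                         parseAll = traverse readMaybe

                                         -- parseAll ["1", "2", "3"]  --> Just [1,2,3]
                                         -- parseAll ["1", "oops"]    --> Nothing
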
                                        1. 5

                                          Most people don’t know this, but the Haskell version of monad is a crude simplification of the category theory definition of monad. It’s a lot less general.

                                          1. 4

                                            It’s always possible that I am wrong, but to me, “monad” as used in Haskell is partly just a pompous way of describing well known and trivial programming patterns and partly a painful effort to permit the absolutely necessary use of state variables within a system based on the incorrect theory that one should program statelessly.

                                            http://www.yodaiken.com/2016/12/22/computer-science-as-a-scholarly-discipline/

                                            1. 11

                                              Full disclosure, I’m a monad-lovin’ fool, but to me the utility of the concept is to see that some (yes) well-known and trivial programming patterns are in fact conceptually the “same thing”. It unifies ideas, that many programmers only learn by convention, underneath a single concept which has more than just the weight of convention behind it.

                                              It’s totally possible to go overboard with it, or to over-rely on the flimsy “mathiness” of FP as a way of dismissing or trivializing real problems, but imo if you want a bedrock understanding of programming patterns that is based in something factual (as opposed to flavor-of-the-month tribalism or appeals to the current authority), FP is the way to go.

                                              1. 5

                                                I’m with you on the pompous and unenlightening terminology, but that’s a pretty unfair characterization of the role monads play in demarcating state-dependent code.

                                                1. 4

                                                  I think there are a couple of real insights in FP, but the “math” continues the basic CS confusion between meta-mathematics/axiomatics and working mathematics - which is often about processes and algorithms and state. And the project falls for a common CS error, in which we note that doing X is hard, messy, error prone, so we create a system which doesn’t do X because that’s clean/light/well-defined “easy to reason about” (even if not in practice) and then we discover that X is essential, so we smuggle it back in awkwardly. I just don’t see the insight that Category theory brings to understanding state.

                                                  1. 4

                                                    My experience with describing stateful systems is that it’s much more elegant and bug-free in pure languages because you get to reason about it explicitly. I don’t see many imperative languages letting me statically prove propositions about the system state (“this resource exists in that state at that point in the code”). You need equational reasoning and powerful type systems for that. Monads help.

                                                    The point of purity isn’t about avoiding state and mutability, it’s about avoiding implicit state.

                                                    1. 2

                                                      That’s great. Do you have a link to a paper/post which works out a proof of some non-trivial program?

                                                      1. 1

                                                        The most complex I’ve seen is probably the state management in my video game, where it ensures proper initialization, resource management and deallocation. It’s very simple on the user side: it consists only of writing and complying with the state transition specifications in the types, e.g. quitRenderer : (r : Var) -> ST m () [remove SRenderer r]. This is a method in the Draw interface whose type says that it removes the renderer from the context m. SRenderer is some Type defined by the implementation, in my case a more complex composite type, and if I were to implement that method without properly handling those subcomponents, it wouldn’t compile. There is much more to the “complying” part though. ST tracks your state as you further develop your function, and you can just print it out. It’s like a static debugger. You see how your states evolve as you unwrap and modify them without even running your code, and I can’t overstate how rarely additional changes are needed after testing, compared to other languages.

                                                        I haven’t worked on other serious Idris projects, I’ve just done the book exercises and went to code my game. I definitely recommend the book though.

                                                    2. 1

                                                      Yeah, I’m with you on the category theory, that could easily be obfuscatory math-envy (still looking for an example of a software situation where category-theory concepts carry more than their weight.) But separating out X as much as possible, and making it clear where X has to be mixed back in, that’s just modularization, usually a good policy.

                                                      1. 0

                                                        But modularization is hard and where module boundaries don’t work, creating a backdoor often just makes things less modular. To me, what would be more interesting than “pure functions” plus escape hatches is to clarify exactly what, if any, state dependencies exist. I don’t want to bury e.g. file system pointer dependency in some black box, I want to know f(fd,y,z) depends on fd.seekpointer internally and fd.file contents externally, or, for a simpler example, that f(x) has a dependency on time-of-day. I don’t see much serious work on trying to expose and clarify these dependencies.

                                                        1. 4

                                                          That’s exactly the role monads are supposed to play in Haskell.

                                                  2. 1

                                                    I don’t have much problem with naming the idea of monads, though I definitely agree that pure functional programming is not a good way to program.

                                                    1. 1

                                                      I read your linked post. Since you don’t provide commentary, it’s hard to understand what your “Oh” is meant to imply. I’m guessing that you’re trying to highlight a contradiction between “Why functional programming matters” and the “Unix paper”. Could you explain your idea here?

                                                      1. 1

                                                        Hughes describes as a novel advance from FP something that was well known a decade earlier in systems programming - and described in one of the most famous CS papers ever published. The concept of programs streaming data over a channel, of course, also predates UNIX, which is why Dennis Ritchie, who did know the literature, did not make a claim as ignorant as Hughes’ - or the one in the paper discussed here.

                                                        1. 2

                                                          Hmm. I don’t know the literature either. Your critique that much of the first paragraph from “Why functional programming matters” quoted in your blog post isn’t novel in light of the “Unix paper” from more than a decade earlier seems true, given the context in your post.

                                                          That said, there are some novel things which lazy-evaluation brings to the table. Hughes hints at them in the section that you quoted:

                                                          1. It makes it practical to modularize a program as a generator that constructs a large number of possible answers, and a selector that chooses the appropriate one.

                                                          2. You can take this further: The “possible answers” may be large data structures which are expensive to compute, and lazy evaluation will only compute enough of the data structure to determine whether you want to use it, leaving the rest unevaluated when the “selector” rejects them.

                                                          Strict evaluation doesn’t give you either of these advantages unless explicitly constructed. You can get #1 with blocking semantics on reading & writing unix pipes, like Ritchie showed. You probably can’t get #2 in a general way without lazy evaluation.
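
                                                          A small sketch of #1 and #2 (the names are made up for illustration): the generator conceptually produces infinitely many candidates, and laziness means the selector forces only as many as it needs.

                                                          -- Generator: an infinite list of squares; nothing is computed up front.
                                                          candidates :: [Integer]
                                                          candidates = map (^ (2 :: Integer)) [1 ..]

                                                          -- Selector: forces elements only until one passes the test.
                                                          firstSquareOver :: Integer -> Integer
                                                          firstSquareOver n = head (dropWhile (<= n) candidates)

                                                          -- firstSquareOver 1000 --> 1024, after evaluating only the first 32 squares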

                                              2. 19

                                                misused term

                                                Can we please stop fighting jargon? Do other industries have people constantly calling for things to be named differently? Like, does the Fashion Industry have people fighting to rename “balaclava” to “winter weather knitted headwear” because it’s headwear named after a Battle, and not really descriptive?

                                                If not “monad,” what’s a better alternative?

                                                1. 8

                                                  It seems pretty clear to me that the purpose of this post is to shame Rob Pike; people already think he’s too stupid to understand the Hindley-Milner type system, so this just adds onto that.

                                                  1. 6

                                                    I don’t think the post shames Rob Pike. I think it’s meant to point out that a pattern that shows up in the Go stdlib also shows up elsewhere.

                                                  2. 2

                                                    I think in the pursuit of knowledge it makes sense to name & build understanding around the things that we already do, or observe nature doing. That’s what applying a “misused term” to “50 year old software methods” accomplishes.

                                                    1. 2

                                                      I don’t think jargon confers knowledge.

                                                      1. 1

                                                        I’d rather we name concepts than ceaselessly describe them in imprecise terms. That’s what the article is doing, and it serves the function that people can understand the pattern if they’ve already used it elsewhere, that communication is clearer because it is more concise, and that mistakes can easily be identified.

                                                        So, while jargon might not confer knowledge, it enables communication and facilitates thought.

                                                        1. 1

                                                          To me, there is nothing precise about the concept of monad as used in FP. It’s a collection of messy hacks. The Moggi paper is a travesty of Category Theory.

                                                    2. 1

                                                      You really need to relax.

                                                    1. 4

                                                      This article misses one of the core differences between ByteBuffer and ByteArrayInputStream/ByteArrayOutputStream: the former is not thread safe while the latter is. Granted, when you are deserializing, you are usually reading from the buffer from only one thread.

                                                       It is also a bit disappointing that there is no ByteBuffer-backed structure (also not thread safe and also fast) that dynamically allocates new buffers to support writing data whose final size you don’t yet know. Sure, I could roll my own (or use one of the many half-baked library implementations), but given that the JDK supports ConcurrentSkipListSet, surely it has space for this simple addition?

                                                      1. 1

                                                         Oh! That could explain a lot. Taking a mutex costs ~15 ns on contemporary CPUs, even in the completely happy zero-contention path (since that is about how long an uncontended lock cmpxchg takes, last time I benchmarked it on a single-socket dual core Intel laptop from about 2017ish).

                                                         Honestly, “thread safe byte stream” sounds like a really silly API to me. If two different threads are pulling bytes out of the same file descriptor, it’s almost completely implausible that they won’t tread on each other’s toes. Even with a mutex guarding their read() calls so that each attempt to get a given number of bytes forms a critical section (even if it happens to involve multiple underlying read(2) syscalls), the only plausible data formats where you wouldn’t cause confusion would have to be ones where each record is fixed-size (so you never get one thread causing another to read halfway through a record) and the ordering of the records doesn’t matter at all.

                                                      1. 2

                                                        Great article!

                                                        It’s a bit crummy that optimizations in GHC are so hit and miss. We really need a good story for reliable optimizations. As the author notes, it is way too easy for your carefully tuned code to suddenly fall off the good path due to some change in GHC. I wish there were some way to at least say: “don’t compile unless this optimization manages to fire!”. Same goes for hackery around rewrite rules.

                                                        1. 3

                                                          I wish there were some way to at least say: “don’t compile unless this optimization manages to fire!”. Same goes for hackery around rewrite rules.

                                                          That’s actually possible! There’s a recent package called inspection-testing which (with the help of a GHC plugin) allows testing certain properties of an expression’s compiled form, e.g. that calling some high-level API in a certain way is equal to some particular optimised implementation. AFAIK, since it runs at compile time, it will bail out upon failure, like you ask :)

                                                          I’ve also come across other testing libraries for non-functional properties (i.e. things other than input/output behaviour), e.g. http://hackage.haskell.org/package/StrictCheck

                                                          It’s also feasible to check this sort of thing with benchmarks (e.g. criterion, weigh or bench). Projects like asv (which I’ve used for Haskell via the asv-nix plugin) can track performance over commits and look for step-changes, although that requires a reasonably stable environment to keep timings consistent. It might be more robust to just check the performance of multiple implementations, e.g. that implementation X takes between 1.2 and 1.5 times as long as hard-optimised implementation Y, and between 0.1 and 0.2 times as long as naive implementation Z.

                                                          1. 1

                                                            If I’ve got time to make a hard-optimised implementation which I know won’t regress, that’s what I’ll use!

                                                            I agree that tracking performance over time is a good technique, if sometimes a bit tricky to set up properly. GHC itself has a cute approach of storing performance metrics inside of git notes (all as part of its CI process).

                                                             That’s not to say that GHC-plugin-backed solutions aren’t pretty neat too. I forget the particulars, but I know the generic-lens package has some neat testing to check that generically-derived lenses are as efficient (meaning all the generic code inlines properly) as manually written ones.

                                                            1. 1

                                                              If I’ve got time to make a hard-optimised implementation which I know won’t regress, that’s what I’ll use!

                                                              I meant something more like the blog post, which compares against alternatives written in C and Rust, which turned out to be unusable in their project for different reasons. I also like the blog’s use of a dummy implementation to provide a lower bound, which we could do if there are no suitable alternatives to benchmark against.

                                                              Also, just in case I was misunderstood about inspection-testing, the idea there is that we compare a general solution called with some known arguments, against a non-solution which just-so-happens to hard-code those arguments. For example, if we want to check that the function prefixEachWith will unroll/inline the recursion on its first argument, we can compare:

                                                              lhs = prefixEachWith ["foo", "bar", "baz"]
                                                              
                                                              rhs = map (\x -> [x:"foo", x:"bar", x:"baz"])
                                                              

                                                              The rhs is optimised, in a way that won’t regress, but only for this particular test. It isn’t a general solution to the problem.
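
                                                               For completeness, here is a hedged sketch of how that pair might be wired up with inspection-testing (the module name and the prefixEachWith definition are made up; inspect and === come from Test.Inspection):

                                                               {-# LANGUAGE TemplateHaskell #-}
                                                               {-# OPTIONS_GHC -O -fplugin=Test.Inspection.Plugin #-}
                                                               module UnrollTest where

                                                               import Test.Inspection

                                                               prefixEachWith :: [String] -> String -> [[String]]
                                                               prefixEachWith prefixes = map (\x -> map (x :) prefixes)

                                                               lhs, rhs :: String -> [[String]]
                                                               lhs = prefixEachWith ["foo", "bar", "baz"]
                                                               rhs = map (\x -> [x : "foo", x : "bar", x : "baz"])

                                                               -- Compilation aborts unless GHC optimises lhs to the same Core as rhs,
                                                               -- i.e. unless the recursion over the prefix list is fully unrolled.
                                                               inspect $ 'lhs === 'rhs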

                                                              GHC itself has a cute approach of storing performance metrics inside of git notes (all as part of its CI process).

                                                              It’s a nice idea, but unfortunately the only time I see it come up is when the allowed boundaries on the timings get bumped to stop spurious-seeming build failures :( I think tracking benchmarks over time is good for human perusal, and maybe some warnings, but not worth aborting a build over. Benchmarking different implementations (perhaps in different languages, perhaps dummies) is less fragile, and perhaps worth aborting for, since we’re measuring relative performance, which should cancel out many aspects of the particular machine being used.

                                                        1. 3

                                                           This problem has been particularly annoying for GHC (I bet that might have something to do with why ezyang, a frequent GHC contributor, blogged about it in the first place). More recently, Shayan Najd has been implementing the Trees that Grow approach in GHC. If you haven’t heard about it, it is an interesting take on extensible and well-typed ASTs using type families.

                                                          1. 3

                                                            I’m curious to see pictures of what happens when the person in the seat in front of you decides to lean their seat back…

                                                            1. 4

                                                              Ok, someone replied to me and deleted his reply yesterday about this. I’m guessing he wants privacy, since only I saw it through the email system. I looked at his publications and affiliations to find he’s smart as hell, worked with DARPA (among others), and maybe even did a project with Galois. So, the source is probably reliable on what can or can’t work with DARPA. Here’s a paraphrasing of his claims:

                                                              1. Program managers can set up programs so that pre-pub review isn’t required for the whole program or for specific projects.

                                                              2. He did that himself with the entire project not subject to pre-pub review. He had the whole thing in open repositories that people did the work in.

                                                              3. Another project had subject areas defined as “designated, unclassified, subject areas.” Anything in them didn’t have to be reviewed. This is where you’d put the FOSS part.

                                                              4. He believes the prepub requirement by default is program managers just being lazy. Defaulting to review is the safest approach. It requires putting no thought or effort into whether something needs secrecy in the first place. So, they just do it even though they don’t have to. It wastes a lot of people’s time.

                                                              So, in his view, Galois could definitely get an exemption for whole projects or pieces of them that could be in a live repo. He also claimed to have done this at least two times: once for a whole project, once for pieces of a project. Maybe the Galois people interacting with DARPA should try to default to doing that for anything that will be published to begin with, if it’s possible at all at the current time. And I thank them for all the neat stuff they FOSS’d for us up to this point since they didn’t even have to do that. Still a great company. :)

                                                              @harpocrates can you pass this along to them so they at least consider it? I’d appreciate it.

                                                              1. 3

                                                                I don’t currently write proposals, but I’ll pass the message on.

                                                                Don’t hold your breath though - this sounds like something Galois would already have done (perhaps we have? I don’t know…) if possible. We really do make an effort to open-source when possible.

                                                                1. 1

                                                                  Thanks!

                                                              1. 18

                                                                I’m a primary contributor to c2rust and I may be the person “stolen” away from Corrode. I’d like to apologize if it feels like we ripped off ideas without giving due credit - the project wasn’t really supposed to be “discovered” so soon. The website was a throwaway idea, a means of easily sharing our work in a limited circle while avoiding both the DARPA approval and the sub-par build process (you have to build a huge chunk of Clang).

                                                                So here is me acknowledging Jamey’s work: I personally did take inspiration from Corrode, and I was expecting to work on Corrode proper when I joined Galois (I even wrote a pure Haskell library to parse/pretty-print Rust code - with Corrode in mind). I’ve re-read the CFG module of Corrode several times (as well as the Mozilla paper, and some older literature).

                                                                All that said, I also want to point out that Corrode hasn’t had any activity at all since last April - and that’s not for want of PRs piling up. I’m not criticizing here, since I understand that managing an open source project can be quite time-consuming and stressful, but I feel like this also does need to be mentioned. Also, c2rust can be freely forked. Once the DARPA funding runs out, it is my hope that the Rust community will become the maintainers.

                                                                Finally, regarding the many improvements that can be automated: that is next up on our plate!

                                                                1. 4

                                                                  Hi, nice to see you here!

                                                                  Regarding idiomatic improvements, in case you missed it, C2Eif is the most impressive research in this area in my opinion. It can create classes and methods based on function signature analysis. It can also mark methods as private based on call graph analysis.

                                                                  It is also impressive in coverage. It handles Vim, it handles inline assembly, it translates long jumps to exceptions, it handles variadic functions (Eiffel, like Rust, doesn’t have them; they are translated to library calls), it translates printf format strings to Eiffel formatting, etc.

                                                                  1. 2

                                                                    Marco Trudel, the author of C2Eif, has a company called mtSystems that sells a C to Java translator.

                                                                    1. 1

                                                                      Very cool! This is the sort of tool I wish was open source, but probably won’t be because nobody has any financial incentive to do so.

                                                                    2. 1

                                                                      I didn’t see it in search. That project would’ve been a good submission for Lobsters. :)

                                                                  1. 5

                                                                     Interesting example given in Haskell about type system complexity:

                                                                    length (1, 2)  --> 1    wut?
                                                                    length (1, 2, 3)  --> *incomprehensible error* 
                                                                    
                                                                    1. 4

                                                                      FWIW, this is all caused by a Foldable ((,) a) instance that is already quite controversial in the Haskell community [1]. It isn’t the only controversial Foldable instance either - did you know there is a Foldable (Either a) [2]?

                                                                      The main friction is that removing instances that were previously there may cause code that currently compiles to stop compiling. One suggestion I personally like is to have a compiler warning for the pathological cases [3].
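
                                                                      For reference, this is what those instances do (GHCi-style, results after the arrows): the pair and Either are treated as containers of at most one element - the second component of the pair, or the Right value.

                                                                      length (1, 2)        --> 1
                                                                      sum ("ignored", 5)   --> 5
                                                                      foldr (:) [] (1, 2)  --> [2]
                                                                      length (Left 1)      --> 0
                                                                      length (Right 1)     --> 1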

                                                                      1. 3
                                                                        <interactive>:6:1: error:
                                                                            • No instance for (Foldable ((,,) t0 t1))
                                                                                arising from a use of ‘length’
                                                                        

                                                                        — what’s incomprehensible about this?

                                                                        1. 4

                                                                          Hmm, well, this is what I got, which is pretty incomprehensible to someone starting out with Haskell, I think.

                                                                          <interactive>:4:1: error:
                                                                              • No instance for (Foldable ((,,) t0 t1))
                                                                                  arising from a use of ‘length’
                                                                              • In the expression: length (1, 2, 3)
                                                                                In an equation for ‘it’: it = length (1, 2, 3)
                                                                          <interactive>:4:9: error:
                                                                              • Ambiguous type variable ‘t0’ arising from the literal ‘1’
                                                                                prevents the constraint ‘(Num t0)’ from being solved.
                                                                                Probable fix: use a type annotation to specify what ‘t0’ should be.
                                                                                These potential instances exist:
                                                                                  instance Num Integer -- Defined in ‘GHC.Num’
                                                                                  instance Num Double -- Defined in ‘GHC.Float’
                                                                                  instance Num Float -- Defined in ‘GHC.Float’
                                                                                  ...plus two others
                                                                                  ...plus three instances involving out-of-scope types
                                                                                  (use -fprint-potential-instances to see them all)
                                                                              • In the expression: 1
                                                                                In the first argument of ‘length’, namely ‘(1, 2, 3)’
                                                                                In the expression: length (1, 2, 3)
                                                                          <interactive>:4:11: error:
                                                                              • Ambiguous type variable ‘t1’ arising from the literal ‘2’
                                                                                prevents the constraint ‘(Num t1)’ from being solved.
                                                                                Probable fix: use a type annotation to specify what ‘t1’ should be.
                                                                                These potential instances exist:
                                                                                  instance Num Integer -- Defined in ‘GHC.Num’
                                                                                  instance Num Double -- Defined in ‘GHC.Float’
                                                                                  instance Num Float -- Defined in ‘GHC.Float’
                                                                                  ...plus two others
                                                                                  ...plus three instances involving out-of-scope types
                                                                                  (use -fprint-potential-instances to see them all)
                                                                              • In the expression: 2
                                                                                In the first argument of ‘length’, namely ‘(1, 2, 3)’
                                                                                In the expression: length (1, 2, 3)
                                                                          
                                                                          1. 5

                                                                            Yeah, sadly GHC error messages are pointlessly hard to read.

                                                                            The first should just say “There is no instance of Foldable for (a,b,c)”.

                                                                            The other two are very standard messages you’ll see all the time. You don’t even need to read them. Actually, GHC should be taught to simply not produce them in cases like this. They’re a consequence of the previous error. GHC should be printing something like “I don’t know what type to assign to literal ‘1’ because there are no constraints on it. If there are other type errors fixing them may add additional constraints. If not, annotate the literal with a type like (1::Int)”.

                                                                            Basically, 1 doesn’t mean much. It could be an integer, a double, a km/s, a price, the unit vector, etc. As long as the type has a Num instance available, 1 can be converted to it. Since the type controls the behavior of that object, you need to know what it is before you can run the code.
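
                                                                            For example (GHCi-style), pinning the literal’s type both resolves the ambiguity and changes what the code does:

                                                                            show (1 :: Int)          --> "1"
                                                                            show (1 :: Double)       --> "1.0"
                                                                            length [1, 2, 3 :: Int]  --> 3    -- lists, unlike 3-tuples, have a Foldable instance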

                                                                            1. 3

                                                                              I agree that the amount of information that GHC outputs is overwhelming (the GHC typechecker looks to me like a complicated solver environment that might be better served by its own interactive mode and type-level debugger, Coq-style). On the other hand, the source of the error is clearly written in two lines at the start of the message, that’s why it’s hardly “incomprehensible”.

                                                                              1. 2

                                                                                To me this looks like the unreadable messages from g++. You just have to learn to read through it!

                                                                              2. 3

                                                                                For someone new to Haskell, the notion that – were you to invest the time to learn, this would be a readable message – is hard to fathom. I think that sort of imagination barrier is why generally things with steep learning curves are less popular.

                                                                                1. 9

                                                                                  There’s nothing Haskell-specific about bad error messages. And it’s nothing to do with imagination or the learning curve of Haskell. It’s just the obtuse way that GHC error messages are written and the lack of interest to make them better.

                                                                                  If this message said “(,,) is not an instance of Foldable” no one would find it difficult to comprehend.

                                                                                  That being said, the tuple instance of Foldable is really horrible and confusing. Either length shouldn’t be in Foldable and we should use some other concept (slotness or something), or that instance shouldn’t be there by default. But this has nothing to do with Haskell or type systems. It’s just as if Java had created a terrible misnamed class that gave you the wrong answer to an obvious query.