1. 3

    Reading more of “Combinators: A Centennial View” by Stephen Wolfram.

    I’m really enjoying it, but I don’t think I’ve ever seen an author so in need of validation. Every 4 pages or so Wolfram will throw in something that looks complicated next to a line about how he revolutionized some field just to remind you that you’re reading the work of a genius.

    If you can eye-roll through those parts, though, I highly recommend it. It’s a fun pop-math read with a lot of historical sections to give context, fun jaunts through different applications, and of course a pretty good introduction to combinators and the SKI calculus. Plus, it’s absolutely gorgeous. Much care was put into typesetting and the (color!) diagrams.

    1. 5

      Learning more about time on computers. Timer data structures, leap units, mechanics around NTP. More conceptual things like “what even is a year”. Organizational/standards things like whether there’s a difference between RFC 3339 and ISO 8601 and, if so, why they’re often treated interchangeably…are there quirks and edge cases introduced? Is anything concrete or is it all just a mess of compromise and people winking at each other because we have to define time somehow?

      I started a project that has to do with time, and y’know what? I’m this far in, may as well just see if this rabbit hole has a bottom…or at least walls.
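
      (For a concrete taste of the quirk hunting, here’s a quick Python sketch – assuming Python 3.11+, where fromisoformat accepts more of ISO 8601 – of strings that live in one standard but not the other:)

      from datetime import datetime

      # Valid in both standards: full date, "T" separator, explicit offset.
      print(datetime.fromisoformat("2021-01-05T12:30:00+00:00"))

      # RFC 3339 allows "Z" for UTC; Python only accepts it since 3.11.
      print(datetime.fromisoformat("2021-01-05T12:30:00Z"))

      # ISO 8601 allows omitting the offset entirely (a "local" time);
      # RFC 3339 requires an offset or "Z" on every timestamp.
      print(datetime.fromisoformat("2021-01-05T12:30:00"))  # naive datetime

      # ISO 8601 also has week dates like 2021-W01-2; RFC 3339 has no such
      # form, and Python's parser (which targets a common subset) rejects it.
      try:
          datetime.fromisoformat("2021-W01-2")
      except ValueError as err:
          print("rejected:", err)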

      1. 4

        I wrote a bit on that topic, though not touching on the different formats part. Maybe it can be helpful.

        1. 1

          This looks awesome, thank you!

      1. 4

        Don’t rewrite it in Rust, unless you need to make major changes. Don’t distribute C/C++ source that nobody can easily build, either. We should be compiling legacy stuff to WASM, and running it sandboxed.

        1. 4

          WASM is less secure than native code along some important dimensions – it has a flat memory space and lacks protections on the stack and heap. See figure 1 in the paper linked below for a summary.

          Many C libraries need access to many syscalls (curl, etc.) or have transitive dependencies that do. The bigger the attack surface, the more important these issues are in practice.

          https://www.usenix.org/conference/usenixsecurity20/presentation/lehmann

          https://news.ycombinator.com/item?id=24216764

          https://lobste.rs/s/fr8ki1/everything_old_is_new_again_binary

          1. 2

            WASM is still missing some major bits of functionality like C++ exceptions and tail-calls.

            Also, how much legacy stuff uses system APIs that aren’t available in WASM’s sandbox?

            1. 2

              Is WASM anywhere near the point where that’s feasible? Actual question. It seems like we’d need incredibly fast interpreters for various architectures and OSs and all of those take time to build.

              1. 3

                It’s there, it just doesn’t solve the problem @Sophistifunk is implying. It is easy to run a library in a sandboxed environment with or without WASM as long as the library is designed with that use case in mind. A lot of Windows libraries export their interfaces entirely in terms of COM objects with clearly-defined copy semantics for all buffers and no shared state. You can run these in an unprivileged DCOM server and have strong isolation. Unfortunately, the libraries where you actually want to run them sandboxed are not these ones, they’re the ones that are optimised for performance at the expense of everything else and have shared mutable state propagating across the library boundary in all sorts of places. WASM doesn’t help here at all because your WASM-compiled library has a different ABI to the code containing it and so can’t be used directly (except from a language like Verona that has first-class sandboxing of foreign code as part of its core type system).

                1. 1

                  These libraries simply need to be abandoned. Slightly faster CVEs is not acceptable, and I look forward to the day that acting so negligently means your customers can sue you.

                  1. 2

                    A back of the envelope calculation suggests that the cost of rewriting all of that code is on the order of $10T. That’s probably a lowball estimate and I wouldn’t be surprised if it’s low by 1-2 orders of magnitude. So, while I agree that we should get rid of it, it’s not going to happen any time soon.

                2. 2

                  Firefox is already doing that with a few libraries (graphite, ogg), using RLBox + wasm2c (not lucet anymore).

                  Also, article about easy wasm2c usage: https://kripken.github.io/blog/wasm/2020/07/27/wasmboxc.html

                  1. 1

                    It’s very close. And no need to interpret it, it’s easy to compile. It’s also easy to inspect and verify, and the modules only have access to things you give them.

                    1. 2

                      At some point we have to commit to safety and correctness at the cost of a speed hit; otherwise the anti-safety crowd can always use the speed difference, no matter how minuscule, to prevent us from achieving safety.

                      We already take a huge hit with things like Java, Python, Node, etc. Given the variability between hardware platforms, and that single-core performance has been largely flat for years, arguing about the absolute speed cost of mitigations is a ruse. Somehow now, with this code running on the fastest processors in the world, we can’t sacrifice an XX% reduction in throughput compared to native, but the same code running on the previous generation was OK?

                      Focusing on top-of-the-line speed above all else will enable the anti-safety folks to keep moving the goalposts.

                      lol, edit: I see /u/unrelentingtech posted the wasmboxc link. It is very much what we are all looking for.

                1. 2

                  I’m playing around with a procedurally generated choose your own adventure game by reusing components from various generators I’ve built. No idea how well the components will actually map over. But even if I end up dropping the game (likely), it’s interesting to look at the procedural narrative space from a different viewpoint.

                  Also I’m vaguely putting out feelers for a new job, so that will likely end up taking more headspace than I expect.

                  1. 40

                    Graph database author here.

                    It’s a very interesting question, and you’ve hit the nerve of it. The long and the short of it is that, much like lambda calculus can represent any program, relational algebra can represent pretty much all database queries. The question comes to what you optimize for.

                    And so, unlike the table-centric view, whose benefits are much better known: what happens when you optimize for joins? Because that’s the graph game – deep joins. A graph is one huge set of join tables. That also makes a graph really hard to shard – which is connected to why NoSQL folks are always anti-join. It’s a pain in the rear. Similarly, it’s really easy to write a very expensive graph query where starting from the other side would be much cheaper.

                    So then we get to the bigger point; in a world where joins are the norm, what the heck are your join tables, ie, schema? And it’s super fluid – which has benefits! – but it’s also very particular. That tends to be the first major hurdle against graph databases: defining your schema/predicates/edge-types and what they mean to you. You’re given a paintbrush and a blank canvas and have to define the world, one edge at a time. And $DEITY help you if you want to share a common schema with others! This is what schema.org is chasing, choosing some bare minimum.

                    This is followed on by the fact that most of the standards in the graph database world are academic in nature. If I have one regret, it’s trying to follow the W3C with RDF. RDF is fine for import/export but it’s not a great data model. I wanted to standardize. I wanted to do right by the community. But, jeez, it’s just so abstract as to be useless. OWL goes another meta-level and defines properties about properties, and there are simpler versions of OWL, and there’s RDFS/RDF* which is RDF about RDF, and on and on… It’s super cool that triples alone can represent pretty much anything, but that doesn’t help you much when you’re trying to be efficient or define your schema. Example: there’s a direct connection to the difference between a vector and a linked list – they both represent an ordered sequence. You can’t do a vector in triples, but you can do a linked list.
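
                    (To make that last point concrete, here’s a tiny Python sketch of encoding the list [1, 2, 3] as triples, roughly the way RDF collections do it – the node and predicate names are made up. You get a chain of cons cells, never a random-access vector:)

                    # Each list node points at its value and at the next node.
                    # ("first"/"rest" stand in for RDF's rdf:first/rdf:rest.)
                    triples = {
                        ("n0", "first", 1), ("n0", "rest", "n1"),
                        ("n1", "first", 2), ("n1", "rest", "n2"),
                        ("n2", "first", 3), ("n2", "rest", "nil"),
                    }

                    def walk(node):
                        # Reading element i means chasing i "rest" edges: O(n),
                        # like a linked list. No triple layout gives O(1) indexing.
                        index = {(s, p): o for (s, p, o) in triples}
                        while node != "nil":
                            yield index[(node, "first")]
                            node = index[(node, "rest")]

                    print(list(walk("n0")))  # [1, 2, 3]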

                    I know I’m rambling a little, but now I’ll get to the turn; I still think there’s gold in them hills. The reason it’s not popular is all of the above and more, but it can be really useful! Especially when your problem is graph-shaped! I’ve implemented this a few times, and things like mapping, and social networks, and data networks, and document-origin-tracing – generally anything that would take a lot of joins – turn out swimmingly. Things that look more like tables (my example is always the back of a baseball card) look kind of nuts in the graph world, and things that look like a graph are wild in third normal form.

                    So I think there’s a time and a place for graph databases. I just think that a combination of the above counter-arguments and the underlying needs are few enough that it’s under-explored and over-politicized. They work great in isolation, ironically enough.

                    I’m happy to chat more, but that’s my quick take. Right tool, right job. It’s a shame about how that part of database theory has gone.

                    1. 10

                      Full disclosure: I work for Neo4j, a graph database vendor.

                      Very well said.

                      I’d add that most of the conversation in responses to OP assumes “transactional” workloads. Graph databases for analytic workloads are a whole other topic to explore. Folks should check out Stanford Prof. Jure Leskovec’s research in the space…and a lot of his lectures about graphs for machine learning are online.

                      1. 2

                        The long and the short of it is that, much like lambda calculus can represent any program, relational algebra can represent pretty much all database queries.

                        When faced with an unknown data problem, I always choose an RDBMS. It is a known quantity. I suspect I’d choose differently if I understood graph DBs better.

                        I would love to see more articles here on practical uses for graph DBs. In particular, I’d love to know if they are best deployed as the primary datastore or maybe just for the subset of data that you’re interested in querying (e.g., perhaps just the products table in an ecommerce app).

                        this is connected to why NoSQL folks are always anti-join. It’s a pain in the rear.

                        Interesting. People use NoSQL a lot. They simply do joins in the application. Maybe that’s the practical solution when it comes to graph dbs? Then again, the point of graph solutions is generally to search for connections (joins). I’d love to hear more on this aspect.

                        Thank you and the OP. I wish I could upvote this more. :)

                        1. 1

                          Yeah, you’re entirely right that the joins happen in the application as a result. The reason they’re a pain is that they represent a coordination point — a sort of for-all versus for-each. Think of how you’d do a join in a traditional MapReduce setting; it requires a shuffle! That’s not a coincidence. A lot of the CALM stuff from Cal in ~2011 is related here and def. worth a read. That’s what I meant by a pain. It’s also why it’s really hard to shard a graph.
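
                          (A toy illustration of that coordination point, with made-up data: even joining two tiny datasets in plain Python forces you to route every record to its key’s bucket first – the shuffle – before any per-key work can start:)

                          from collections import defaultdict

                          users = [(1, "ada"), (2, "bob")]                # (user_id, name)
                          orders = [(1, "book"), (1, "pen"), (2, "mug")]  # (user_id, item)

                          # The "shuffle": a for-all barrier. Nothing can be emitted
                          # until every record has been grouped under its join key.
                          buckets = defaultdict(lambda: ([], []))
                          for uid, name in users:
                              buckets[uid][0].append(name)
                          for uid, item in orders:
                              buckets[uid][1].append(item)

                          # Only after the shuffle is each bucket independent (for-each).
                          joined = [(uid, n, i)
                                    for uid, (names, items) in buckets.items()
                                    for n in names for i in items]
                          print(joined)  # [(1, 'ada', 'book'), (1, 'ada', 'pen'), (2, 'bob', 'mug')]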

                          I think there’s something to having a graph be a secondary, problem-space-only engine, at least for OLTP. But again, lack of well-known engines, guides, schema, etc — it’d be lovely to have more resources and folks to explore various architectures further.

                        2. 2

                          You’re given a paintbrush and a blank canvas and have to define the world, one edge at a time.

                          That’s such a great way to put it :)

                          Especially when your problem is graph-shaped!

                          I think we need collective experience and training in the industry to recognize problem shapes. We’re often barely able to precisely formulate our problems/requirements in the first place.

                          Which database have you authored?

                          1. 5

                            Cayley. Happy to see it already mentioned, though I handed off maintainership a long while ago.

                            (Burnout is real, kids)

                          2. 2

                            Thanks for Cayley! It’s refreshing to have such a direct and clean implementation of the concept. I too think there’s a lot of promise in the area.

                            Since you’re here, I was wondering (no obligation!) if you had any ideas around enforcing schemas at the actual database level? As you mentioned, things can grow hairy really quickly, and once they’re in such a state, the exploration to figure out what needs to be fixed and the requisite migrations are daunting.

                            Lately I’ve been playing with an idea for a graph DB that is by default a triplestore under the hood, but with a (required!) schema that would look something like a commutative diagram. This would allow for discipline and validation of data, but also let you recognize multi-edge hops that are always present, so for some things you could move them out of the triplestore into a quad- or 5-store to produce more compact disk representations, yielding faster scans with fewer indexes and giving the query planner a bit of extra choice. I haven’t thought it through too much, so I might be missing something or it might just not be worth it.

                            Anyway, restriction and grokkability of the underlying schema/ontology does seem like the fundamental limiter to me in a lot of cases, and I was curious whether, as someone with a lot of experience in the area, you had thoughts on how to improve the situation?

                            1. 1

                              If you don’t mind me joining in, have you heard of https://age.incubator.apache.org/ ? I’m curious to hear your opinion about whether it can be an effective solution to this problem.

                              1. 1

                                If I have one regret, it’s trying to follow the W3C with RDF. RDF is fine for import/export but it’s not a great data model. […] it’s super cool that triples alone can represent pretty much anything, but that doesn’t help you much when you’re trying to be efficient

                                I’ve been using SPARQL a little recently to get things out of Wikidata, and it definitely seems to have pain points around that. I’m not sure at exactly what level the fault lies (SPARQL as a query language, Wikidata’s engine, etc.), but things that seem to me like they should be particularly easy in a graph DB, like “is there a path from ?x and ?y to a common node, and if yes, give me the path?” end up both hard to write and especially hard to write efficiently.

                                1. 2

                                  This goes a bit to the one reply separating graphs-as-analytics and graphs-as-real-time-query-stores.

                                  SPARQL is the standard (once again, looking at you W3C) but it’s trying to be SQL’s kid brother – and SQL has its own baggage IMO – instead of trying to build for the problem space. Say what you will about Cypher, Dude, at least it’s an ethos. Similarly Gremlin, which I liked because it’s easy to embed in other languages. I think there’s something in the spectrum between PathQuery (recently from a Google paper – I remember the early versions of it and some various arguments) and SPARQL that would target writing more functional paths – but that’s another ramble entirely :)

                              1. 60

                                Access patterns (imo). Very few applications I’ve run into have graph hops and pattern matching from a known starting node as their primary access pattern. It’s usually supplemental. Even if you can conceptually think of your data model as a graph (which is something you can commonly do), when your app actually uses your data the main access patterns tend to be selecting by attributes (e.g., find account by email address), doing relational joins of 1 or maybe 2 hops, or aggregating summary stats for display to users or exploratory analysis.

                                Compared to traditional relational databases, graph databases are terrible at all of those. Usually most queries involve a massive scan across all the nodes to find an entry point for a specific pattern before the matching starts…but that scan isn’t usually as fast as it is in relational databases, and the index structures on top tend to be more expensive space-wise. Every time I’ve looked into graph databases (Neo4j, TigerGraph, and Neptune primarily, for different projects) we ran benchmarks and things fell over for our use cases very fast.

                                Don’t take this as saying graph databases are bad. What they are good at is fast hops and pattern matching across long chains. Like if the primary way you used a relational database was doing 5-10 table joins per query, but you don’t want the heavy memory usage, latency, and IO that comes with that. I can see a giant knowledge-graph-style expert system or some interesting weighted path stuff being a good fit. But at the same time, I feel like if you were doing something like that at scale then you’d probably want control over what’s in memory to avoid a bunch of disk access…but I’m probably biased due to the stuff I’ve worked on.

                                Speaking of the stuff I’ve worked on, one project ended up settling for custom graph application logic on top of a KV store and it worked out well. The other project used Cayley (an open-source graph DB) as an in-memory index for a while. We could rebuild it from our DB if it went down, but while it was up we could quickly use Gremlin to do some queries and then flesh out the results by querying Postgres for the rest of the record attributes. It worked fine. But in the end we dropped it and just used two Postgres tables with Subject, Verb, and Object columns. Subject and Object are foreign keys to normal relational tables. There was some tuning involved, but in the end it worked just as well and we didn’t have to know/run two services. This is what I’d reach for first for any “graphy” project I come up against, and if it fails I’ll at least know what the bottleneck is so I can optimize for it next.
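
                                (A minimal sketch of that last setup – table and column names invented, SQLite standing in for Postgres. A two-hop traversal is just a self-join on the edges table:)

                                import sqlite3

                                db = sqlite3.connect(":memory:")
                                db.executescript("""
                                    CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT);
                                    CREATE TABLE edges (
                                        subject INTEGER REFERENCES nodes(id),
                                        verb    TEXT,
                                        object  INTEGER REFERENCES nodes(id)
                                    );
                                    -- the "tuning": covering indexes for both traversal directions
                                    CREATE INDEX edges_svo ON edges (subject, verb, object);
                                    CREATE INDEX edges_ovs ON edges (object, verb, subject);
                                """)
                                db.executemany("INSERT INTO nodes VALUES (?, ?)",
                                               [(1, "alice"), (2, "bob"), (3, "carol")])
                                db.executemany("INSERT INTO edges VALUES (?, ?, ?)",
                                               [(1, "follows", 2), (2, "follows", 3)])

                                # Two hops of "follows": a self-join, resolved back to names.
                                rows = db.execute("""
                                    SELECT a.name, c.name
                                    FROM edges e1
                                    JOIN edges e2 ON e2.subject = e1.object AND e2.verb = 'follows'
                                    JOIN nodes a ON a.id = e1.subject
                                    JOIN nodes c ON c.id = e2.object
                                    WHERE e1.verb = 'follows'
                                """).fetchall()
                                print(rows)  # [('alice', 'carol')]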

                                Massive disclaimer: My opinions are colored by the domains I’ve worked in (large scale web services, automated supply chain logistics and tracking, and general user facing web apps).

                                1. 2

                                  Working on a blog post on how I model character desire and intent in my procedural cartoon generator. This of course comes with the requisite yak shaving on the blog itself.

                                  I’ll post it here if I can figure out the right tags for it…just the “programming” tag seems lame because it’s so generic and we don’t have anything like a “modeling” tag. Maybe “math” since it’s basically a self contained logic? I dunno, maybe it’s off topic.

                                  1. 6

                                    This is really cool! It’s always nice to see how fairly abstract domains (a graph of documents meant for consumption by people) are mapped to concrete structures. I hadn’t seen that method of creating the reference index before and it’s really awesome.

                                    This does make me wonder, though: has anyone put forth a spec for a Zettelkasten transfer format? It seems like there are new tools every day that are actually unique and not just clones of each other. If everyone home-rolls their own format then trying new tools seems prohibitive. A consistent way to import/export would be nice. Some might balk at the sqlite dependency, but I can see this as a first step towards something like that.

                                    One suggestion: it might be better to have an explicit tag for a reference ID. This would stop conflicts in case someone decides to store hex in their note, or wants to include an ID without it being a reference. It also means that if I build a tool on top of this, I don’t have to explicitly query the database to know whether a string is a reference. I imagine wanting to show references without extra queries would be common for tools built on top of it, especially for visual tools.

                                    1. 7

                                      This is my favorite blogpost of all time. It’s someone thoroughly exploring and playing with the idea of the Sierpinski Triangle and having an absolute ball. I don’t understand every part of the post, but to me this is the perfect embodiment of the fun of “doing math” and really exploring a space.

                                      • They play the Sierpinski Triangle on a piano
                                      • They build it with cellular automata
                                      • They plot it on a 3D model of a Cow
                                      • They explore Chaos with it
                                      • They explore it in higher dimensions
                                      • They smear different terms across space during construction
                                      • They view it as a Markov Chain
                                      • They view it as an L-System
                                      • They view it as a graph

                                      Just SO MUCH Sierpinski Triangle! I saw that u/pushcx posted this a few years ago and no one commented…but I really wanted to post it again because when I think of what types of blogs I want to read this always pops into my head. True exploration.

                                      1. 1

                                        This book links it with the Towers of Hanoi game.

                                      1. 2

                                        Plugging away at a complete rewrite (or…maybe the first actual write, since the previous attempts sort of imploded?) of my procedural animated TV show generator Harmon. Specifically plot and narrative generation first this time. There’s a real irony in it being so fun and all-consuming, yet so pointless and unlikely to bear fruit.

                                        Also starting to work through the book “Term Rewriting and All That” as I have time.

                                        1. 2

                                          Well that’s another book added to my list, thank you. :)

                                          As for your Harmon project…that sounds incredible.

                                          1. 2

                                            That sounds very interesting, have you any links to share yet?

                                            1. 1

                                              Not yet, though I do plan on doing some write ups.

                                              Right now it’s effectively a term rewriting system that operates on an AST-ish structure informed by some theories from narratology. The AST-ish output isn’t super pretty and is hard to understand without context. I’m focusing on the story generation right now so the workflow tends to be:

                                              • Read some of the work by Polti, Booker, or Propp on ways to break down and classify stories.
                                              • Become frustrated that they are ill-defined and very much non-rigorous (which wasn’t their intent anyway)
                                              • Read a bunch of math books and find ways to formalize things just enough that I can separate out code and data in a way that won’t exhaust my complexity budget.
                                              • Repeat.

                                              I’m trying really hard not to do anything with graphics, scene layout, constraint solving, etc. yet. Previous attempts have shown I can get from plot -> rudimentary animation, but the sheer scope, and my lack of personal discipline in modularization and conceptual cleanliness, make the project bloated and hard to work with. This eventually kills the project until I come back to it. So I’m pinky-swearing that this time will be different, but avoiding that pitfall makes everything look like it’s going super slowly. It feels fast-paced from the inside though, haha.

                                              I am planning on writing some debugging tools for it, so maybe when I do that I can try to make them accessible and make a short video using them to show progress.

                                          1. 5

                                            I appreciate this run through. My continually relevant tweet from 6 years ago is relevant once again, https://twitter.com/losvedir/status/636034419359289344.

                                            I will say that one area where the array language influence “stuck” was with CSS. For a while I preferred one-line class definitions, with no line breaks between related classes, e.g.:

                                            .foo{display: flex; border: 1px solid #ddd;}
                                            .foo-child{flex: 0 0 100px; padding: 1rem;}
                                            

                                            But then that made me more receptive to Tailwind-style utility CSS, so that’s where I am now.

                                            But array languages are so cool, and I really wonder how much is syntactic (terseness as a virtue, all these wonderful little operators), and how much is semantic (working on arrays, lifting operators to work across many dimensions). What would a CoffeeScript-like transpiler from more traditional syntax to, say, kdb/q be like?

                                            1. 6

                                              IME, the real magic of APL, and what the numerous APL-influenced array languages have consistently lost in translation, is the concatenative, compositional, functional operators that give rise to idiomatic APL. They have taken the common use cases but forgone the general ones. For example, numpy provides cumsum as a common function, but APL & J provide a more general prefix scan operator which can be used with any function, whether primitive or user-defined, giving rise to idioms like “running maximum” and “odd parity”, to name just a couple. Likewise, numpy has inner, but it only computes the ordinary “sum product” algorithm, while APL & J have the matrix product operator that affords the programmer the ability to easily define all sorts of unusual matrix algorithms that follow the same inner pattern.
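
                                              (The closest Python analogue I know of is itertools.accumulate, which takes an arbitrary binary function the way APL’s scan takes any verb – a rough sketch of the difference in spirit:)

                                              from itertools import accumulate
                                              from operator import add, xor

                                              xs = [3, 1, 4, 1, 5, 9, 2, 6]

                                              # The special case numpy bakes in as cumsum:
                                              print(list(accumulate(xs, add)))    # running sum

                                              # Same operator, different verb -- "running maximum" is the
                                              # same scan idiom, not a new API:
                                              print(list(accumulate(xs, max)))    # running maximum

                                              # "odd parity" over bits: once more, just the scan shape.
                                              print(list(accumulate([1, 0, 1, 1, 0], xor)))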

                                              This is not even to mention the fantastic sorts of other operators, like the recursive power of verb or the sort-of-monadic under that AFAICT have no near equivalent in numpy.

                                              1. 1

                                                Is there a simple way for other languages to replicate the success, or do the designers just need to be brilliant?

                                                1. 7

                                                  I doubt brilliance has much to do with it. It’s likely more about exposure to the concepts coinciding with the motivation required to model them in a language or library. Especially in a way that’s accessible to people who don’t have previous exposure. Learning the concepts thoroughly enough to make it simple, and doing the work required to create an artifact people can use and understand is really difficult.

                                                  You see similar compositional surprises when looking at some of the Category Theory and Abstract Algebra inspired Haskell concepts. I imagine the current wave of “mainstream” interest in Category Theory will result in these ideas seeping into more common usage, and exposed in ways that don’t require all the mathematical rigor.

                                                  It’s important to realize that APL-isms are beautiful, but they are especially striking to people because they’re new to them. Set theory, the lambda calculus, and relational algebra are just some of the things that have been similarly inspiring in the past (and continue to be!) and have spread into common programming to the extent that casual users don’t realize they came from formalized branches of mathematics. In my opinion this is a good thing!

                                                  Another exciting thing happening right now is the re-discovery of Forth. It has similar compositional flexibility, but goes about things in a very different way that corresponds to Combinatory logic. I would expect some people are going to reject the Abstract Algebra/Category Theory things as “too far removed from the hardware”, but be jealous of the compositional elegance. This will result in some very excited experimentation with combinatory logic working directly on a stack. Not that this hasn’t been happening in the compiler world with stack machines for decades…but it’s when non-specialists get ahold of things that innovation happens and things get interesting.

                                            1. 7

                                              J really bent my head. The approach is so different it feels like learning to program again. Each line is its own fun little puzzle. There aren’t many “concepts” to wrestle with like in languages with unique type systems or logic languages. You just get a few different starting tools that are quick to grok, but then have to figure out how to actually use them to do anything real.

                                              As a data science person, you probably have some exposure to array programming ideas through stuff like numpy. But J and other APLs go 100% in that direction, and the world looks really different through that prism.

                                              I never really write serious J. But the REPL is my go-to calculator and I have a personal library of work and life related functions that I add to as needed. It’s crazy how good it is as a scratchpad, and how much the language conforms to how you think about a domain. Forth has a similar property to a greater degree, but feels like it gets “unruly” in a way that J doesn’t.

                                              Familiarity with Array Programming has also helped me come up with cleaner designs in other languages. It’s also due for a renaissance imo, as GPUs and SIMD become more relied upon and as the data people realize how wasteful we’ve been with our machines.

                                              1. 3

                                                a personal library of work and life related functions

                                                Would you mind sharing some examples? Not asking for source code, just curious as to the sort of tasks you use this for :)

                                                1. 2

                                                  APL is def alien. It feels like an insanely powerful programmable calculator with the keys labeled with mysterious glyphs. The calculator’s data model is basically a Rubik’s cube filled with numbers, and to get good at APL you have to learn the macros / idioms that let you do a really powerful thing with a canned series of twists (I mean keystrokes).

                                                1. 1

                                                  Removing an assumption baked into my tool for understanding event based systems. I know how to remove it, but it makes the user think about one more concept. That’s ok, but I’ve been working hard to keep it as simple as possible. I’m really feeling the tradeoff.

                                                  1. 2

                                                    I have a couple small projects I might poke at, but overall I think I’m gonna avoid the screen and get my apartment in order. Yesterday was my last day of employment, and I’m starting my own little business on Monday…so I want this weekend to actually be a weekend. Plus, it’s absolutely gorgeous outside right now. Prime “listen to podcast while walking to the further-out grocery store” weather.

                                                    1. 16

                                                      The year of Prolog! Yes, I’m serious: in the last few years we’ve seen a new wave of Prolog environments flourish (Tau, Trealla, and Scryer) which this year could reach production-ready status. At least, that’s what I hope, and I’m helping these environments with some patches as well.

                                                      1. 19

                                                        year_of(prolog, 2021).

                                                        1. 6

                                                          There was even a new stable release of Mercury late last year. It’s, uh, I’m not personally betting on it getting wide scale adoption, but I do personally feel that it’s one of the most aesthetically pleasing bits of technology I’ve ever tried.

                                                          1. 5

                                                            A couple years ago I hacked on a Python type inferencer someone wrote in Prolog. I wasn’t enlightened, despite expecting to be after a bunch of HN posts like this.

                                                            https://github.com/andychu/hatlog

                                                            For example, can someone add good error messages to this? It didn’t really seem practical. I’m sure I am missing something, but there also seemed to be a lot of deficiencies.

                                                            In fact I think I learned the opposite lesson. I have to dig up the HN post, but I think the point was “Prolog is NOT logic”. It’s not programming and it’s not math.

                                                            (Someone said the same thing about Project Euler and so forth, and I really liked that criticism. https://lobste.rs/s/bqnhbo/book_review_elements_programming )

                                                            Related thread but I think there was a pithy blog post too: https://news.ycombinator.com/item?id=18373401 (Prolog Under the Hood)

                                                            Yeah this is the quote and a bunch of the HN comments backed it up to a degree:

                                                            Although Prolog’s original intent was to allow programmers to specify programs in a syntax close to logic, that is not how Prolog works. In fact, a conceptual understanding of logic is not that useful for understanding Prolog.

                                                            I have programmed in many languages, and I at least have a decent understanding of math. In fact I just wrote about the difference between programming and math with regards to parsing here:

                                                            http://www.oilshell.org/blog/2021/01/comments-parsing.html#what-programmers-dont-understand-about-grammars

                                                            But I had a bad experience with Prolog. Even if you understand programming and math, you don’t understand Prolog.

                                                            I’m not a fan of the computational complexity problem either; that makes it unsuitable for production use.

                                                            1. 2

                                                              Same. Every time I look at Prolog-ish things I want to be enlightened. It just never clicks. However, I feel like I know what the enlightenment would look like.

                                                              I don’t fully grok logic programs, so I think of them as incredibly over-powered regexes over arbitrary data instead of regular strings. They can describe the specific shape of hypergraphs and stuff like that. So it makes sense to use it when you have an unwieldy blob of data that can only be understood with unwieldy blobs of logic/description, and you need an easy way to query or calculate information about it.

                                                              I think the master would say “ah, but what is programming if not pattern matching on data?” And at these words a PhD student is enlightened. It seems to make sense for both describing the tree of a running program and smaller components like conditionals. It also seems like the Haskell folk back their way into similar spaces. But my brain just can’t quite get there.

                                                              1. 2

                                                                Sorry to hear that. For me, Prolog is mainly about unification (which is different from most pattern matching I’ve seen, because you need to remember the unifications between variables you’ve done before) and backtracking (which was criticized for being slow, but in modern systems you can use different strategies for every predicate; the most famous alternative is tabling). For the rest, it should be used as a purely functional language (it is not, and lots of tutorials use side effects, but by keeping yourself pure you can reason about a lot of things, making debugging way easier).
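
                                                                (A toy unifier in Python – nothing like a real Prolog engine, just enough to show how bindings between variables must be remembered and threaded through, unlike one-shot pattern matching. Variables are strings starting with “?”:)

                                                                def walk(term, subst):
                                                                    # Chase a variable through the substitution to its current binding.
                                                                    while isinstance(term, str) and term.startswith("?") and term in subst:
                                                                        term = subst[term]
                                                                    return term

                                                                def unify(a, b, subst):
                                                                    # Return an extended substitution, or None if a and b can't match.
                                                                    a, b = walk(a, subst), walk(b, subst)
                                                                    if a == b:
                                                                        return subst
                                                                    if isinstance(a, str) and a.startswith("?"):
                                                                        return {**subst, a: b}  # no occurs check, like classic Prolog
                                                                    if isinstance(b, str) and b.startswith("?"):
                                                                        return {**subst, b: a}
                                                                    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
                                                                        for x, y in zip(a, b):
                                                                            subst = unify(x, y, subst)
                                                                            if subst is None:
                                                                                return None
                                                                        return subst
                                                                    return None

                                                                # "?x" occurs twice, so the second occurrence must honor the binding
                                                                # made by the first -- that memory is what pattern matching lacks.
                                                                print(unify(("likes", "?x", "?x"), ("likes", "alice", "alice"), {}))
                                                                print(unify(("likes", "?x", "?x"), ("likes", "alice", "bob"), {}))  # None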

                                                                I did Prolog at university (which is not very rare in Europe) and we studied the logic parts of Prolog and where they come from. Yes, it’s logic, but it’s heavily modified from the “usual way” to perform better, and it’s not 100% mathematically equivalent (for example, using negation can produce bad results, there’s no occurs check, …), and it uses backward chaining, which is the reverse of what people usually learn. Also, lots of people use !, the cut, which can improve performance by cutting solutions, but it makes the code non-pure and harder to reason about.

                                                                However, what I really liked about Prolog was the libraries that are built from the simple constructs. Bidirectional libraries that are very useful, like DCGs (awesome stuff – I did some Advent of Code problems using only this “pattern matching over lists” helper), findall, clpz, reif, dif, … CHR, if you want forward-chaining logic, is also available in most Prolog systems.

                                                                Yes, computational complexity is a problem; having backtrackable data structures will always carry a penalty, but it’s not unfixable, and there are ongoing efforts like the recent hashtable library.

                                                                That said, in the end it’s also a matter of preference. I’ve seen in the repo that you consider Haskell easier. In my case it’s just the opposite: Prolog fits my mind better; there are fewer abstractions going on than in Haskell, IMO.

                                                                For some modern Prolog you can check out my HTTP/1.0 server library made in Prolog: https://github.com/mthom/scryer-prolog/pull/726/files

                                                                1. 1

                                                                  FWIW I think I’m more interested in logic programming than Prolog.

                                                                  I am probably going to play with the Souffle datalog compiler.

                                                                  And a hybrid approach of ML + SMT + Datalog seems promising for a lot of problems:

                                                                  https://arxiv.org/abs/2009.08361

                                                                  Prolog just feels too limited and domain specific. I think I compared it to Forth, which is extremely elegant for some problems, but falls off a cliff for others.

                                                            1. 10

                                                              I’m putting in my two weeks’ notice today so I can go off and do my own thing! So I would guess this week will be kicking off the wrap-up process, answering questions, and documenting things.

                                                              1. 3

                                                                Good luck. What’s next for you?

                                                                1. 5

                                                                  I’m building a tool to help observe and understand event-based systems. I’ve needed it at every single job I’ve ever had, and I’ve been frustrated that tooling to scale event pipelines and event data gets so much attention but no one puts effort into tooling to understand the data. I’ve put a lot of thought into it over the last few years and think I can hide most of the mechanics to just give direct insight. Ideally it should be clear enough that you don’t have to be technical at all to view the data and understand what’s happening.

                                                                  There wasn’t a way for me to build it when working at a company because event systems were how things were implemented, but the business problem we were solving was totally separate so I could never get time to dedicate to attacking the problem. I’m now taking the time to try to actually solve it.

                                                              1. 5

                                                                Nope, it’s the holiday week. Doing things is out of the question.

                                                                1. 6

                                                                  I’m definitely not an expert on garbage collection and haven’t thought it through, but the first thought I had was how useful this would be as an “infrequent access” GC’d heap in addition to a normal GC’d heap. So you allocate things in a normal mark-and-sweep heap, and if a thing hasn’t been dereferenced within a certain number of sweeps then it’s worth the overhead of moving it to the infrequent-access GC’d heap. That way the normal “hot” GC doesn’t have large pauses checking everything, and you only pay the O(log32(n)) dereference cost for things you know aren’t being accessed frequently.

                                                                  Again, I know very little about GC, and it’s likely generational GCs already do something like this. But I can think of many data services I’ve personally written where it seems like that’d be an almost ideal way to handle things (assuming a GC at all). One could imagine a “stow” keyword in a language to explicitly set aside objects you want to keep around but probably won’t use for a while.
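
                                                                  (To make the policy concrete, a toy Python model of the bookkeeping – the names and the threshold are invented, and a real collector would obviously look nothing like this:)

                                                                  COLD_AFTER = 3  # sweeps without a deref before an object is stowed

                                                                  class Handle:
                                                                      def __init__(self, heap, value):
                                                                          self.heap, self.value = heap, value
                                                                          self.idle_sweeps = 0

                                                                      def deref(self):
                                                                          self.idle_sweeps = 0        # touching it keeps it hot
                                                                          if self in self.heap.cold:  # pay the promotion cost only
                                                                              self.heap.promote(self)  # for infrequently used objects
                                                                          return self.value

                                                                  class TwoTierHeap:
                                                                      def __init__(self):
                                                                          self.hot, self.cold = set(), set()

                                                                      def alloc(self, value):
                                                                          h = Handle(self, value)
                                                                          self.hot.add(h)
                                                                          return h

                                                                      def sweep(self):
                                                                          # Only the hot set is scanned, so pauses scale with the
                                                                          # working set, not with everything ever allocated.
                                                                          for h in list(self.hot):
                                                                              h.idle_sweeps += 1
                                                                              if h.idle_sweeps >= COLD_AFTER:
                                                                                  self.hot.discard(h)
                                                                                  self.cold.add(h)  # the "stow" move

                                                                      def promote(self, h):
                                                                          self.cold.discard(h)
                                                                          self.hot.add(h)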

                                                                  1. 7

                                                                    Thank you for commenting! I was trying to fully remove mark-and-sweep from the main thread, while also providing compaction/defragmentation. I wanted to get down to small O(1) pause times, even at the expense of making dereferences of handles more expensive, because I felt that this would be beneficial for the ergonomics of a language. (Once one accepts dereferences being a little more expensive, the cost is distributed evenly throughout the program, and GC pauses essentially go away / are free.) I think this is a sensible tradeoff. There may be other useful tradeoffs to make, where perhaps mark-and-sweep still exists to some degree on the main thread, as you’re suggesting, but I imagine present generational garbage collectors may already hit the mark (no pun intended) there.

                                                                  1. 42

                                                                    xsv! At work people often communicate data via CSV/TSV. For better or for worse these can sometimes be multiple gigabytes. xsv lets me easily slice and dice things, or shove together a quick Unix pipeline to process it in well under a minute, before someone else can even set up whatever unnecessary distributed big-data tool they’re primed to reach for. Plus, being able to easily “join” two CSVs on a column without any prep is a godsend.
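
                                                                    For a taste of what those pipelines look like (file and column names invented, but the subcommands are real):

                                                                    # Quick orientation on a big file
                                                                    xsv headers orders.csv
                                                                    xsv count orders.csv

                                                                    # Slice and dice: project a few columns, pretty-print a sample
                                                                    xsv select user_id,total orders.csv | xsv slice -l 20 | xsv table

                                                                    # Join two CSVs on a shared column, no prep required
                                                                    xsv join user_id orders.csv id users.csv | xsv table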

                                                                    /u/burntsushi should set up that GitHub Sponsors thing (or equivalent). Between xsv and the Rust regex crate I owe him at least a couple beers.

                                                                    1. 35

                                                                      Thanks so much for the kind words. Instead of sponsorship or beers, I suggest donating to your favorite charity. :-)

                                                                      1. 2

                                                                        Out of curiosity: is there a CSV tool that supports conversion between narrow and wide representations of data? By this I mean that most people prefer a presentation with many columns (and hence many values per row), but conceptually it is often better to have a narrow representation where each row represents a single observation. In R this is a common theme.

                                                                        1. 3

                                                                          xsv flatten will do that. The output is itself CSV, so you can xsv flatten data.csv | xsv table to get a nice aligned output display.