Threads for eatonphil

  1. 16

    I wish ‘newly born language’ posts would include:

    • What makes it special? Why is this language not like all other languages?
    • What makes it hard? What new concepts do you need to wrap your brain around?
    • What would it be great at writing? What classes of applications, e.g., hard real-time, built for the cloud, etc., does it address well?
    1. 5

      It makes delightful software.

      Roc is a new purely functional programming language built for speed and ergonomics.

      The readme doesn’t explain anything. It is a work in progress (what isn’t?).

      https://github.com/roc-lang/roc/blob/main/examples/algorithms/quicksort.roc

      https://github.com/roc-lang/roc/blob/main/examples/benchmarks/NQueens.roc

It looks like ML with a Haskell-like syntax.

      1.  

        The readme doesn’t explain anything. It is a work in progress (what isn’t?).

        (This readme.) If it doesn’t explain anything, it’s not a work in progress yet.

I clicked the link in the submission and the readme didn’t tell me anything at all. Not even the slightest hint of why software written in this language is delightful. That is subjective, but I was curious about the claim anyway.

        1.  

          Why does it look like an ML?

          1.  

            It’s heavily inspired by Elm. It’s Elm, but not for front end websites, and all that entails.

            1.  

              I don’t understand what that has to do with ML. As far as I can see these (Elm and Roc) don’t look anything like ML. They just look like Haskell.

              1.  

                …and Haskell looks like ML

                1.  

                  Haskell is an ML descendent. It’s branched off a bit, but it’s related. See the family tree in the timeline.

                  https://courses.cs.washington.edu/courses/cse341/04wi/lectures/02-ml-intro.html

          2.  

            My understanding from previous materials:

• It’s a pure functional programming language. They are also using the Perceus algorithm for reference counting so that they can perform opportunistic mutation of immutable data structures (rough sketch at the end of this comment). With some thought it can run imperative algorithms as fast as imperative languages (e.g., Go, Java).
• There are performant, general-purpose pure functional languages, but they tend to be difficult to learn. Elm is an easier-to-learn pure functional language, but it’s built for front ends. This language is going to try to bridge the gap.
            • CLI and web services, I believe

            That said, this is all kind of new. If I’m wrong about any of this, I’d love to know.
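To make the opportunistic-mutation idea concrete, here’s a toy Python sketch (this is not Roc’s actual runtime; Ref, push, and the explicit rc field are made up for illustration):

    class Ref:
        # Toy value with an explicit reference count, standing in for
        # what a Perceus-style runtime tracks at run time.
        def __init__(self, items):
            self.items = items
            self.rc = 1

    def push(ref, x):
        if ref.rc == 1:
            # Unique owner: mutate in place (O(1)), invisible to callers.
            ref.items.append(x)
            return ref
        # Shared: copy instead, preserving immutable semantics.
        ref.rc -= 1
        return Ref(ref.items + [x])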

          1. 3

            SBCL (or Common Lisp in general) is pretty boring and supports single binary deploys.

            For example: https://stackoverflow.com/questions/14171849/compiling-common-lisp-to-an-executable.

            1. 5

              Hey Sam, welcome to lobsters!

              1. 2

                Thank you!

              1. 3

                I guess I’m out of the loop. I thought R7RS has been around for a while now. What is the process discussion about?

                1. 5

                  R7RS small was ratified in 2013, but work on R7RS large is ongoing. This work seems to largely involve adding SRFIs to the large language standard, which has evolved over a series of “editions”. https://github.com/johnwcowan/r7rs-work/blob/master/R7RSHomePage.md

                  1. 5

                    Ongoing discussion on R7RS-large is happening here nowadays: https://codeberg.org/scheme/r7rs/issues

                  2. 3

                    I recall that in the early days, the Scheme standard was controlled by a large group of notable researchers, and a unanimous vote was required to add a new feature. That process collapsed when R7RS-large was proposed. Now it looks like the work is being continued by John Cowan and 3 other people (looking at the codeberg link from @Drakonis), using the SRFI process to generate and test proposals (each SRFI is implemented by multiple Scheme implementations before it gets added to the standard).

                  1. 12

Parts of functional programming that I still like: everything is an expression (especially “if”), map/filter, constant values by default.
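To make “everything is an expression” concrete (a Python-flavored example, using its conditional expression as the closest analogue):

    n = 7
    # "if" as an expression: the chosen branch produces a value
    # directly, instead of assigning inside statement branches.
    label = "even" if n % 2 == 0 else "odd"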

                    Parts of functional programming I’m over: recursion for everything, reduce, variable shadowing. I’m sure there’s more but I can’t remember right now.

                    Separate from this is a discussion of type systems. But expressive type systems and functional programming have no real connection.

                    1. 9

                      I’m curious why you like map/filter but dislike fold/reduce?

                      recursion for everything

                      Most recursion should be invisible anyway

                      1. 4

                        Map and filter are stateless, basically.

                        1. 3

                          I’m curious what you think of these:

                          map f as = reduce (a bs -> f a : bs) empty as

                          filter p as = reduce (a bs -> if p a then a : bs else bs) empty as

                          1. 7
def map(f, xs):
    out = []
    for x in xs:
        out.append(f(x))
    return out
                            
                            1. 1

                              That’s just an imperative way to write the same thing.

                              1. 2

                                That’s my point. Saying “you can implement map as a reduce” isn’t any more insightful than saying you can implement it as a for loop.

                                1. 1

                                  But then why would having reduce be worse than having for loops?

                                  1. 1

                                    The conversation was about why @eatonphil likes map but not reduce, “even though” you can implement map with reduce.

                                  2. 1

My motivation was more that state is present regardless of which you use; it’s just hiding behind the interface of those functions. I have no qualms about writing a for-loop :)

                              2. 4

If you use them under the hood, sure. I just don’t like reviewing code with reduce in it, because it takes me longer to figure out what it’s doing than if it were a for loop modifying a variable.

                                1. 12

                                  A for loop modifying a variable doesn’t have any less state than a fold. Arguably, it has more. Fold forces the stateful parts of the routine to be isolated and made explicit.

                                  1. 3

                                    That’s a good argument for what I said about statelessness (which I agree was not well put) but you’re responding to my better argument that it’s really about readability.

                            2. 3

                              I have written many KLOC of FP and recursion was rarely needed. It’s cryptic and not very declarative.

                              I like point-free style as in: https://pages.cpsc.ucalgary.ca/~robin/class/449/Evolution.htm

                            3. 4

                              reduce

The way I like to think about it, there’s reduce, and there’s fold. reduce is a fold where the operation is monoidal – there’s a neutral element, and the operation is associative (so the directionality of the fold doesn’t matter; it can even be computed in parallel). This reduce is good and readable: sum(xs) = reduce(+, xs). The general fold, where directionality matters, is indeed unreadable. Sadly, few languages provide reduce without providing fold.
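A quick Python illustration of the distinction (xs and the operators are just for the example):

    from functools import reduce
    import operator

    xs = [1, 2, 3, 4]

    # Monoidal reduce: + is associative with neutral element 0, so
    # grouping (even parallel evaluation) cannot change the result.
    total = reduce(operator.add, xs, 0)                   # 10

    # Directional fold: - is not associative, so left and right folds
    # disagree and the reader has to track the direction.
    left = reduce(operator.sub, xs)                       # ((1-2)-3)-4 == -8
    right = reduce(lambda acc, x: x - acc, reversed(xs))  # 1-(2-(3-4)) == -2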

                              1. 1

Recursion is useful (try writing a parser without it!), but tail recursion can take a hike. If you don’t need to keep unique state over the tree, that’s a good indicator that recursion is overkill for your use case.

                              1. 6

I’m not an expert, so maybe that’s it, but I found the original paper kinda weird.

                                Copying my comment (and quote of the original paper) from a previous submission of the original paper:

Despite these apparent success stories, many other DBMSs have tried—and failed—to replace a traditional buffer pool with mmap-based file I/O. In the following, we recount some cautionary tales to illustrate how using mmap in your DBMS can go horribly wrong.

Saying “many other DBMSs have tried — and failed” is a little weirdly put, because above that they show a list of databases that use or used mmap, and the number still using mmap (MonetDB, LevelDB, LMDB, SQLite, QuestDB, RavenDB, and WiredTiger) is greater than the number they list as having once used mmap and moved off it (Mongo, SingleStore, and InfluxDB). Maybe they just omitted some others that moved off it?

                                True they list a few more databases that considered mmap and decided not to implement it (TileDB, Scylla, VictoriaMetrics, etc.). And true they list RocksDB as a fork of LevelDB to avoid mmap.

My point being, the paper seems to downplay the number of systems it introduces as still using mmap. And it didn’t go much into the benefits that, say, SQLite or LMDB see in keeping mmap as an option, other than the introduction where they mention perceived benefits. Or maybe I missed it.

                                1. 5

                                  I’m maybe at the “apprentice-expert” level, and I also had issues with the original paper’s conclusions, although it did make a lot of good points. Real, production DBs do use mmap successfully. (You can add libMDBX to that list, although it’s a fork of LMDB.)

                                  Anyway, this response article is solid gold. Great stuff. I’m not familiar with Voron — it’s interesting that it apparently uses a writeable mapping and lets the kernel write the pages. Other mmap-based storage engines I know of, including my incomplete one, use read-only mappings and explicit write calls. The writeable approach seems dangerous to me:

                                  • You don’t know when those pages are actually going to be written. (OTOH, you don’t really know that with a regular write either; and in any case you’re going to make some kind of flush/sync call later when committing the transaction, after which everything is known to be on-disk.)
                                  • A writeable memory map is vulnerable to stray writes. So unless your engine is written in a memory-safe language (and so is everything else it links with, and the app itself if this is an embedded db), a memory bug might cause garbage to get written into any resident page. That’s horrible, and nearly impossible to debug if you can’t reproduce it in an instrumented build…
                                  1. 2

                                    I agree the response article was good.

                                    I once did writable mmaps with backing store over NFS (meaning slow) and the buffer cache became 100% dirty pages since storage could not even come close to keeping up with some rapid creation of data. At least back in 2009, this led to a rapid Linux kernel crash/panic, essentially from an OOM condition. Even back then, I imagine a sysadmin could have tuned some /proc|/sys stuff to make it somewhat more robust, but I did not have admin rights. They may have hardened Linux defaults/behaviors to such scenarios by now, but my guess would be that “default tuning” remains risky on any number of OSes for mmap writes the IO cannot keep up with.

Note, this is a more subtle problem than OOM killer tuning/etc. Once you dirty a virtual memory page, it becomes the kernel’s responsibility to flush. So there is no process to kill. You just have to hope you can flush the page eventually. An NFS server that was fully hung could dash that hope indefinitely. (Not to suggest that running a DBMS on top of any kind of network filesystem is a smart idea… just elaborating on the general topic of “hazards of writable mmaps”.)
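A minimal sketch of the pattern being described (Python for illustration; the file name is made up):

    import mmap, os

    # Writable mmap: a plain store dirties the page with no write()
    # syscall, and the kernel then owns flushing it. msync (m.flush
    # here) is the only way to bound when it reaches storage, which
    # is exactly what a slow or hung NFS backend breaks.
    fd = os.open("data.bin", os.O_RDWR | os.O_CREAT)
    os.ftruncate(fd, 4096)
    with mmap.mmap(fd, 4096) as m:
        m[0:4] = b"page"  # dirty the page in memory
        m.flush()         # msync: force writeback now
    os.close(fd)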

                                    1. 4

                                      Yes, everything I’ve heard about mmap and network filesystems comes down to basically “never do this.”

(Which has implications for using memory-mapped databases as application data formats — since documents are not unlikely to live on network volumes. That’s one reason the toy storage manager I was working on supported either mmap or a manual buffer store.)

                                      1. 1

Re: your “basically”, one can be pretty sure application data is “small” sometimes, though, like in this little demo of a thesaurus command-line utility that I did (inspired by this conversation, actually): https://github.com/c-blake/nio/blob/main/demo/thes.nim (needs cligen-HEAD to compile at this time). :-) :-)

At least with that Moby Thesaurus, the space written with mmap is a mere 1.35 MiB, and saving the answer unsurprisingly provides a speed-up of many orders of magnitude. The bigger 10 MiB file can be more easily streamed to storage.

(Yes, yes, it could be another 4-8X faster still, or maybe much more in a worst-case sense, but it would get longer, uglier, and harder to understand, defeating its demo-code purpose, as well as needing to do a second pass & more random writing – not that 10..20 MB scale is a real problem.)

                                1. 2

                                  I’m a raw beginner at these things but I love a title that gives me ideas before I’ve even read the paper. Now to try to comprehend and find out how much I don’t know.

                                  1. 3

                                    I recommend checking out /r/databasedevelopment if you want to learn more. :)

                                    1. 1

                                      /r/databasedevelopment

                                      I subbed a few weeks back, thank you for this ;)

                                  1. 6

                                    I’m a bit suspicious of the results of this one. I would expect a well-implemented SQLite implementation to beat PostgreSQL for this class of read-heavy application due to skipping the network overhead between the server and the database - my impression was that the queries look to be simple enough that deep algorithmic differences between the two databases shouldn’t have much if any impact.

So my hunch is this is more about differences in how the Python/SQLAlchemy code interacts with the databases than a fair comparison of the databases themselves. Or maybe it shows that, without careful optimization (well-selected indexes, for example), PostgreSQL outperforms SQLite - which is a useful observation if that’s the case.

                                    I could well be wrong in this hunch though.

                                    1. 5

                                      I agree. I’m suspicious of Python’s single-threaded interpreter here. If the SQLite glue calls the C SQLite API directly, it’s going to tie up the interpreter, resulting in poor concurrency. Whereas Postgres glue will end up sending a request-response over TCP, which is probably optimized to work like other Python I/O and not tie up the interpreter.

                                      1. 3

                                        I’m a bit suspicious of the results of this one. I would expect a well-implemented SQLite implementation to beat PostgreSQL for this class of read-heavy application due to skipping the network overhead between the server and the database - my impression was that the queries look to be simple enough that deep algorithmic differences between the two databases shouldn’t have much if any impact.

                                        FWIW, there’s only one SQLite implementation. Different vendors and distros may compile it differently, but most of them aren’t going to make any substantial changes.

                                        And there’s no network overhead in this case because PostgreSQL is running on localhost, at least that’s what the connection string in the article shows. In most cases reading and writing to localhost can skip huge swathes of the networking stack, so there should be very little overhead.

                                        1. 1

                                          FWIW, there’s only one SQLite implementation.

                                          That is not completely true. :) But I don’t think this is what Simon is trying to say.

                                        2. 1

                                          That was my hunch as well. We need to know what the indexes are before it’s a fair comparison.

                                          1. 12

                                            Read the rest of this story with a free account.

                                            no thanks

                                            1. 6

                                              I’m using scribe.rip as an alternative frontend to medium. Just replace it in the URL. (Fully agreeing that one shouldn’t have to do this)

                                          1. 1

It looks like at the moment this treats each SQL query as a separate process invocation - it runs ssh (and the sqlite3 command over that SSH connection) to completion for each query it executes.

                                            Evidently this works OK - and maybe it performs well enough that it’s not a problem for an application with just one user. But I wonder if it could instead maintain a persistent SSH connection with the sqlite3 shell running on the remote machine as a persistent process, then feed it commands one by one and read the responses?

                                            The implementation would be quite a bit more complex so this would only be worthwhile if the performance improvement made things materially better for the user.
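A minimal sketch of the persistent approach (hypothetical host and path; assumes key-based ssh auth and a remote sqlite3 new enough for -json; responses are framed with sqlite3’s .print meta-command, since an empty result set otherwise produces no output at all):

    import json
    import subprocess

    SENTINEL = "__done__"

    # One long-lived ssh+sqlite3 pair, fed queries one at a time.
    proc = subprocess.Popen(
        ["ssh", "myhost", "sqlite3", "-json", "/path/to/db.sqlite3"],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True, bufsize=1,
    )

    def query(sql):
        proc.stdin.write(sql.rstrip().rstrip(";") + ";\n")
        proc.stdin.write(".print " + SENTINEL + "\n")
        proc.stdin.flush()
        lines = []
        while True:
            line = proc.stdout.readline()
            if not line or line.strip() == SENTINEL:
                break
            lines.append(line)
        return json.loads("".join(lines)) if lines else []

    print(query("SELECT 1 AS one"))  # -> [{'one': 1}]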

                                            1. 3

Hey, it’s a big improvement on how remote SQLite works in DataStation right now, which is to copy the entire file locally before querying. I’m going to have to port this idea.

                                              1. 1

I also thought about having a persistent sqlite3 shell. In the past, I’ve tried playing with persistent subprocesses reading and writing data and always had problems with stdin/out getting locked, buffered writes, and other issues which haven’t let me get a working implementation (I tried this with a background ripgrep process). At some point, I plan to try again using asyncio subprocesses (rough sketch below). I think most systems could support a few hundred ssh subprocesses as long as the connections are multiplexed and the initial authentication is only done by the first connection.

Having a persistent shell also has the “clean-up problem”: if the litexplore app dies, I didn’t want to leave the shell running. I’m considering offering an option by which a user can upload a temporary sqlite3 binary compiled with the -json flag; this makes me more cautious about what I do in the remote VM.
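Something like this, perhaps (untested sketch; same hypothetical host/path and sentinel framing as the subprocess version above):

    import asyncio, json

    async def main():
        # Pipes are explicit StreamReader/StreamWriter objects here,
        # which sidesteps some of the blocking/buffering traps of
        # plain subprocess pipes.
        proc = await asyncio.create_subprocess_exec(
            "ssh", "myhost", "sqlite3", "-json", "/path/to/db.sqlite3",
            stdin=asyncio.subprocess.PIPE, stdout=asyncio.subprocess.PIPE,
        )
        proc.stdin.write(b"SELECT 1 AS one;\n.print __done__\n")
        await proc.stdin.drain()
        lines = []
        while True:
            line = (await proc.stdout.readline()).decode()
            if not line or line.strip() == "__done__":
                break
            lines.append(line)
        print(json.loads("".join(lines)) if lines else [])
        proc.stdin.close()
        await proc.wait()

    asyncio.run(main())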

                                                I’ve also been doing some (promising, for now) tests with sshfs.

                                              1. 2

Has anybody benchmarked this against other options for similar results, such as CSVKit’s csvsql or the Zed project’s zq query tool? Less SQL-like options, like the AWK variant that has direct CSV support, and duckdb or others would be interesting to see benchmarked too.

                                                1. 1

                                                  There are links to benchmarks in this post. :)

                                                  1. 1

                                                    Only 1 of the 4 I listed is benchmarked or even mentioned in any of the links I see.

                                                1. 2

                                                  Reminds me of the textql utility which always seemed very cool to me.

                                                  1. 1

                                                    Huh, I hadn’t seen textql before. Looks like it even uses SQLite under the hood!

                                                    1. 3

                                                      See also: sqlite-utils, dsq (I develop), harelba/q, trdsql. General comparison here, benchmarks here.

                                                      1. 6

This was extremely interesting, thanks. And it showed me there’s also an org Babel plug-in! https://github.com/fritzgrabo/ob-dsq

                                                  1. 7

This is written by a Kubernetes developer, and it doesn’t consider the argument that one should not use Kubernetes at all. Nor that a few bare metal servers (without containers) might be most cost-effective for what I suspect is 95% of companies.

                                                    1. 1

                                                      Even if you’re big, sometimes. I was building a new service at a big cloud provider recently, and we ended up doing basic Linux VMs, no containers. Our service involved a number of stateful moving parts that had to interact with each other on the same system, and containerization would have made that far more difficult than straightforward systemd services.

                                                    1. 2

                                                      Automatic query projection: The compiler currently only inspects the lambda passed to the filter function. However, if we inspect whole endpoints, we can — via escape analysis — determine what properties of an entity are actually used, and perform automatic projection. That is, the runtime can simply not SELECT unused columns in the underlying SQL query.

                                                      Interesting idea but not necessarily viable to do automatically since you can’t be sure clients aren’t using some undocumented/untyped fields in a response.

                                                      Overall good article though. This is the first time I’m starting to understand what chiselstrike is.

                                                      1. 1

                                                        Right, so the idea is that if we can prove in the endpoint handler that an object never escapes, we can project the query automatically. For example, imagine the following (pseudo) code:

                                                        export default async function (req: Request) {
                                                            const users = User.findAll();
                                                            return users.map(user => user.username);
                                                        }
                                                        

we know that only the username property escapes and, therefore, can project the query to just that column in the database (i.e., emit something like SELECT username FROM users instead of selecting every column).

                                                        1. 2

That’s exactly what I’d be worried about. Why would you only return the one column? Nowhere in there did I indicate I wanted only one column returned, or that I expect the clients to only use the username field.

                                                          1. 1

                                                            Sorry for not being clear, in that example endpoint, I specifically wrote code to return a response with a list of usernames (that map() call there does it).

                                                            IOW, what that endpoint would return is something like:

                                                            [
                                                              "penberg",
                                                              "eatonphil"
                                                            ]
                                                            

                                                            That is, in the example, the logic of the endpoint is to return only some specific properties, not whole objects. Clients cannot, therefore, expect any other properties to be there because it’s not in the contract of that endpoint.

                                                            If you want an endpoint with more stuff, then you’d return a list of user objects:

                                                            export default async function (req: Request) {
                                                                const users = User.findAll();
                                                                return users;
                                                            }
                                                            

                                                            and now no projection can happen, obviously, and you will get all the properties that User has.

                                                            (And if this explanation still wasn’t clear, happy to continue discussion at our Discord, for example.)

                                                            1. 2

                                                              Oh I’m sorry I missed the map. I was thinking it was a find/filter. Yes that makes sense.

                                                      1. 4

I’ve been a fish shell user for more than a decade. I cannot for the life of me understand this fixation on zsh. Why is it more popular than fish?

The author suggests writing autocompletion and installing a plugin that depends on an external program. If you write a manpage (which is also suggested), you get all three things for free with fish. Out of the box.

                                                        1. 9

                                                          Why is it more popular than fish?

                                                          zsh is mostly bash compatible and fish is not really. I don’t want to learn a new shell language right now. The few times I need to write something bashy interactively (not a script I can call bash on) it normally works inside zsh.

                                                          1. 5

                                                            Having tried both, I think fish is nicer but I couldn’t use it. I already had muscle memory for bourne shell syntax and switching to fish made that stop working - I was constantly trying to type things that did not work, and then having to go manually look up the fish equivalent.

                                                            I would go so far as to say that there are a lot of places where fish is gratuitously different from sh, in that the syntax is different but not meaningfully better, just different.

                                                            1. 3

                                                              Off topic: I’m a full-time fish user now, and I had no problem with it, although I’m rather used to using multiple languages at the same time. I contributed new completions for fish—something I never got to do with bash. However, these days I also avoid shell scripting like a plague, so I may be starting to forget how to write in Bourne shell already, but it doesn’t really bother me anymore. ;)

On topic: I agree that CLI confusion with TUI is a grave sin. Borders and other decorations have no place in CLIs at all. All CLI tools must be capable of detecting non-interactive use and automatically stopping all output except actual data (and should provide a way to switch to that mode by hand; see the sketch below). In a TUI, of course, anything goes, although accessibility considerations still exist there.
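A tiny sketch of that detection (Python; the --plain flag name is made up):

    import sys

    # Decorate only when stdout is an interactive terminal; emit plain
    # machine-readable data when piped, with a manual override flag.
    plain = "--plain" in sys.argv[1:] or not sys.stdout.isatty()
    for name, count in [("foo", 1), ("bar", 2)]:
        if plain:
            print(f"{name}\t{count}")              # data only
        else:
            print(f"| {name:<10} | {count:>5} |")  # decorated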

                                                              1. 2

                                                                From what I remember, completion scripts are one of the places where fish is vastly easier to understand than bash or zsh.

                                                                However, these days I also avoid shell scripting like a plague

                                                                It wasn’t that I was trying to write shell scripts, it’s that at that point in my life I was doing a lot of ad-hoc for X in $(y); do ... done and xyz | while read X; do ... done one-off one-liners. For scripts worth reusing I had Python and liked it.

                                                              2. 1

                                                                This comment reflects my experience and opinions on fish as well. In summary: in theory, I like it more than zsh, but in practice its uniqueness gets in the way too much.

                                                              3. 1

                                                                Bash compatibility really seems to be the primary reason. Unix refuses to die, and so does Bash (I blame POSIX sh).

                                                                1. 1

                                                                  I’ve personally found fzf-tab to be miles better than Fish tab-completion, especially alongside “z” or others like it (z.lua, zoxide), fzf history search, etc. Being able to fuzzy-search directories and tab-completions with the same interface, often with preview windows, is an absolute game-changer.

                                                                  I just used fzf-tab as an example of the benefit of adding simple, short, busybox-style help text alongside a more comprehensive manpage. Manpage explanations of CLI flags tend to be too long for ideal shell-completion; I’d be less likely to get the most relevant fuzzy matches first.

                                                                1. 3

                                                                  Go 1.19 supports the Loongson 64-bit architecture LoongArch on Linux (GOOS=linux, GOARCH=loong64).

                                                                  Interesting, I thought loongson was just an x86 clone. I wonder how they do automated testing for this too. It didn’t seem like I could get any loongson SoCs/dev boards last time I looked. But the Go team is better connected than I am…

                                                                  1. 1

                                                                    Loongson has always been a MIPS derivative AFAIK.

                                                                    1. 1

                                                                      Oh yeah you’re right. Either way I thought it was just a clone. But I guess it differs enough from MIPS that they needed a new architecture to represent it in Go.

                                                                  1. 4

                                                                    Five part series on building a minimal but high quality ecommerce site. Pretty interesting.

                                                                    1. 1

                                                                      Very cool, though I kept expecting the article to get into CPU vector instructions, or vector libraries. But I guess since they’re coding in Go they can’t directly access that level of optimization(?) Does Go’s compiler do any auto-vectorization? Or did they get this speedup entirely from other factors like fewer heap allocations and better cache coherency?

                                                                      1. 1

                                                                        Yeah I was a little confused too.

                                                                        Does Go’s compiler do any auto-vectorization?

                                                                        No I don’t think it does.

                                                                        Or did they get this speedup entirely from other factors like fewer heap allocations and better cache coherency?

                                                                        I assume this is the case.

                                                                      1. 1

                                                                        One or two people should be handle it — it’s not a huge industrial project.

                                                                        I think you’re missing a word there.

                                                                        For example the leaf Token type can be used a variant of both word_t and expr_t.

                                                                        And there.

                                                                        1. 1

                                                                          Thanks, fixed!

                                                                          Aside: I just wrote a little Markdown/HTML spell checker with Python/shell after trying some canned Unix tools, which was super useful on the docs. Wondering if they would have caught these non-spelling typos :-/

                                                                          1. 3

                                                                            My low-tech technique is to read the post out loud. Normally a bunch of times after I post because I’m lazy. ;) Somehow you seem to catch missing words more easily when reading out loud than just reading silently as usual.