1. 5

    I love Mermaid! I use Mermaid diagrams in my Obsidian notes all the time.

    1. 2

      Do I need a plugin for that?

      1. 2

        It’s supported out of the box: you can create a Markdown code block with the language set to mermaid (start with three backticks followed by mermaid, and end with three backticks).
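
        For example, a minimal block (the diagram content is just an illustration):

        ```mermaid
        graph TD
            A[Write note] --> B[Rendered diagram]
        ```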

      2. 1

        One great thing about Azure DevOps is the wiki supports mermaid

        1. 1

          That reminds me that GitLab also supports Mermaid: https://about.gitlab.com/handbook/tools-and-tips/mermaid/. It can be pretty handy in the wiki or when answering issues.

      1. 3

        Multimodal and sparse models aren’t a new idea. They’ve also been implemented before. I’m not exactly sure what this blog entry is supposed to convey?

        1. 2

          I’m not an AI specialist by any means, but I was also confused because it did not mesh at all with my understanding of the field. In my limited experience, most ML models aren’t trained from scratch, they’re built by taking an existing computer vision or language processing model and hacking it to work with a new domain (which leads to some very fun artefacts). Transfer learning is also very much a thing when reusing models across similar domains and my understanding was that existing ML toolchains are quite heavily geared to that workflow.

          1. 1

            I agree it’s lacking details, though Jeff Dean has earned my respect, and I’m eager to see what this means once the details are public. To be fair, he didn’t claim any discoveries or inventions; it’s clear that it’s just a new way of thinking about problems. I’ve seen many times in software engineering how thinking about an old problem in new ways can lead to breakthroughs.

          1. 18

            This resembles my experience in adding types to existing projects: you almost always find a couple of real bugs. The other thing is that typechecking speeds up development: mypy is usually quicker to run than the test suite, so you waste less time before finding out you’ve made a silly mistake.
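
            A toy illustration of the kind of bug that turns up (hypothetical code, not from any real project):

            from typing import Optional

            users = {1: "alice"}

            def find_user(user_id: int) -> Optional[str]:
                return users.get(user_id)

            # mypy flags the next line before any test even runs:
            # Item "None" of "Optional[str]" has no attribute "upper"
            name = find_user(42)
            print(name.upper())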

            1. 4

              I wholeheartedly agree; however, the type errors can be dizzying for programmers who aren’t software engineers. I work with data scientists & product managers who contribute Python code, and adding mypy types had some negative effects on their ability to contribute. Overall, I think we came out ahead; I’m thankful for mypy. I’d love to see better error messages.

              1. 5

                Yeah, this is somewhere where I think most type checkers/compilers leave a ton of value on the table – tracking down a bug caught by a type error is usually much easier than one caught by a test suite (or in prod…), because it points you to the source of the error rather than the eventual consequences of not catching it. But then many type checkers do a poor job of explaining the error, which undermines this. Elm deserves mention for doing a particularly good job here.

                1. 3

                  I would rather teach data scientists who use Python how to use type annotations than forgo using them in Python programs just in case a data scientist needs to touch that code.

                  1. 2

                    I work on pytype, and we do try to improve the error messages where we can (e.g. here’s a recent commit improving “primitive types ‘str’ and ‘int’ aren’t comparable” to “primitive types ‘x: str’ and ‘10: int’ aren’t comparable”). However, when you’re down in the weeds of developing a type checker, it can often be hard to notice that an error message is not readily comprehensible or helpful. I would encourage you to file a bug with mypy whenever you find an error message hard to read.

                1. 2

                  What does this mean? (regarding AVIF)

                  It offers much better lossless compression compared to PNG or JPEG formats, with support for higher color depths and transparency.

                  PNG and JPEG are lossy. Is AVIF lossy also and it’s just a typo? Or is it lossless and yet still better compression than PNG or JPEG?

                  1. 8

                    PNG can be lossless, though it’s not always used that way.

                    “Although PNG is a lossless format, PNG encoders can preprocess image data in a lossy fashion to improve PNG compression.”
                    https://en.wikipedia.org/wiki/Portable_Network_Graphics#Lossy_PNG_compression

                    1. 6

                      According to Wikipedia AVIF supports lossless and lossy compression: https://en.wikipedia.org/wiki/AVIF

                      1. 5

                        PNG is lossless…

                      1. 5

                        I’m a huge fan of both Rust and Delta Lake, but my eyebrows shot off my face when I saw “exactly once delivery”

                        EDIT

                        I asked about this years ago and @aphyr and @mjb gave me memorable answers about why exactly once delivery isn’t possible. I highly recommend reading it. More recently, we’ve discovered that there are some cases where exactly once delivery is possible, but the semantics are very difficult to grok, to the point that it’s probably best to only claim “exactly once” when you’re in the presence of extremely knowledgeable people. Any “exactly once” guarantees require a strict protocol with the client, so the semantics don’t bubble up to larger systems.

                        For example, someone might extrapolate from this title that when I stream the Delta Lake table, I’ll receive the message exactly once. That’s not true: Delta Lake doesn’t give those guarantees, and neither does Kafka. Only the connector from Kafka into Delta Lake gives the guarantees.

                        It’s still useful, to be sure. But be careful about the semantics.

                        1. 4

                          Thanks @kellogh for the links. I fully agree with what you said and what was said in the discussion you linked. Like you said, it all comes down to semantics. The kafka-delta-ingest project is a Kafka to Delta connector. What I meant in the title is that we deliver a message from Kafka to a Delta table exactly once. Notice I used the phrase “from Kafka to Delta Lake” in the title, not “from Kafka to Delta Lake to your client” ;) It certainly doesn’t make sense to talk about exactly once delivery to an opaque client in a physical sense. In real-world distributed systems, messages get redelivered all the time. The consumer of a Delta table or Kafka topic will need to have its own atomic progress tracking capability in order to process the message exactly once logically.
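
                          A minimal sketch of that atomic progress tracking on the consumer side (table names and message shape are made up; the point is committing the effect and the offset in one transaction, so redeliveries become no-ops):

                          import sqlite3

                          conn = sqlite3.connect("consumer.db")
                          conn.executescript("""
                              CREATE TABLE IF NOT EXISTS progress (partition INTEGER PRIMARY KEY, next_offset INTEGER);
                              CREATE TABLE IF NOT EXISTS results  (key TEXT PRIMARY KEY, value TEXT);
                          """)

                          def process(partition: int, offset: int, key: str, value: str) -> None:
                              with conn:  # one transaction: the effect and the offset commit together
                                  row = conn.execute(
                                      "SELECT next_offset FROM progress WHERE partition = ?", (partition,)
                                  ).fetchone()
                                  if row is not None and offset < row[0]:
                                      return  # duplicate delivery; already processed
                                  conn.execute("INSERT OR REPLACE INTO results VALUES (?, ?)", (key, value))
                                  conn.execute(
                                      "INSERT OR REPLACE INTO progress VALUES (?, ?)", (partition, offset + 1)
                                  )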

                        1. 11

                          These aren’t for comments, but rather for replies on a review thread. I think it is unwise to overload the term ‘comment’ in computing.

                          For code comments, I have been using https://www.python.org/dev/peps/pep-0350/ for this for a long time, and recommend it to others.
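
                          For anyone unfamiliar, PEP 350 codetags are structured comments: a mnemonic, free text, and an optional field list in angle brackets, along the lines of:

                          # TODO: handle unicode filenames <MDE 2005-09-26>
                          # FIXME: crashes when the input list is empty <p:2>
                          # HACK: temporary workaround until the upstream fix lands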

                          For review responses, I suppose this looks decent enough, although the use of bold assumes styled text; I would prefer all-caps, as has been conventional in unstyled text for quite a while. When styles are available, bold plus all-caps is quite visually distinct.

                          1. 4

                            These aren’t for comments, but rather for replies on a review thread. I think it is unwise to overload the term ‘comment’ in computing.

                            These are comments; readers are supposed to understand that we’re talking about something different from code comments by context. This is absolutely not an unreasonable expectation. Both my kids have understood contextual words without being taught. Context really is intuitive to human nature, and it’s perfectly reasonable to use the same word in different contexts to mean something different.

                            1. 4

                              “Reviews” is the standard word for this.

                              1. 1

                                The only real context here outside of TFA itself is computing, or maybe slightly more broadly, technology. All we see on this site, which is generally full of programming minutia, is “comments”, in both the title and domain name. The use of the word “conventional” only makes it worse: conventions in code comments are an almost universally recognized and common thing, conventions in reviews, not so much. One might even argue that this is nearing territory considered off-topic on this site (being not particularly technical).

                                I’d be low-key surprised if anyone here assumed differently. This is actually my second click through to this article, because although I read the whole thing the first time, it didn’t even occur to me that this link was to that article, and not something on code commenting practices or whatever that I missed before.

                                Sure, anyone who actually reads the whole thing and comes away confused… well, has bigger problems… but it’s still a poor choice of words. Maybe this is a superficial bikeshed, but that sort of thing is pretty important when the whole point is to define a soft standard for things with a standard name. Even in the context this is specifically intended for (code review), I’d assume that “conventional comments” was something about the code (did I get the Doxygen tags wrong or something?), because of course I would. That’s what a code review is.

                            1. 2
                              1. 10

                                You might also like alas if in Squire. :)

                                journey sq_dump(args) {
                                    arg = run(args.a1)
                                
                                    if kindof(arg) == 'string' { proclaim('String(' + arg + ')') }
                                    alas if kindof(arg) == 'number' { proclaim('Number(' + string(arg) + ')') }
                                    alas if kindof(arg) == 'boolean' { proclaim('Boolean(' + string(arg) + ')') }
                                    alas if kindof(arg) == 'unbenknownst' { proclaim('Null()') }
                                    alas { dump(arg) }
                                
                                    proclaim("\n")
                                    reward arg
                                }
                                
                                1. 1

                                  Haha amazing. ‘kindof’ could also have been ‘natureof’

                                2. 7

                                  Ada, Perl, Ruby and a couple of languages inspired by them.

                                  When I was much younger and jumping between programming languages a lot, that felt like the main thing I always got wrong: elif, elsif, elseif, and else if.

                                  The last one, despite being the most to type, feels the most logical to me: it’s a combination of what’s already there (else and if), and it’s also the closest to natural language/English.

                                  1. 1

                                    It should really be else, if or else; if to be even more like English and to really make it hard for parsers.

                                    1. 3

                                      x equals thirty-three or else... if your hat is green, of course. Good luck, parser! o7

                                    2. 1

                                      After discovering cond in Lisp, I wished every language had it instead of if, else and the various combinations of those two.

                                    3. 4

                                      ruby?

                                      1. 3

                                        Ada uses elsif. I wish all these elif, elsif, elseif, and else if keywords were interchangeable.

                                      1. 4

                                        Has anyone seen it in the wild? Other than Apple?

                                        1. 3

                                          Many people. Check the HN thread.

                                          1. 3

                                            I’m guessing not, because their goal is a lower-level “building blocks” interface:

                                            FoundationDB (FDB) [5] was created in 2009 and gets its name from the focus on providing what we saw as the foundational set of building blocks required to build higher-level distributed systems. It is an ordered, transactional, key-value store natively supporting multi-key strictly serializable transactions across its entire key-space. Unlike most databases, which bundle together a storage engine, data model, and query language, forcing users to choose all three or none, FDB takes a modular approach: it provides a highly scalable, transactional storage engine with a minimal yet carefully chosen set of features. It provides no structured semantics, no query language, data model or schema management, secondary indices or many other features one normally finds in a transactional database. Offering these would benefit some applications, but others that do not require them (or do so in a slightly different form) would need to work around them. Instead, the NoSQL model leaves application developers with great flexibility. While FDB defaults to strictly serializable transactions, it allows relaxing these semantics for applications that don’t require them with flexible, fine-grained controls over conflicts.
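
                                            To make the “building blocks” point concrete, here’s a rough sketch using FDB’s Python bindings (the key layout and values are made up, and it assumes a running cluster with the balances already present): everything is just ordered keys plus strictly serializable transactions.

                                            import fdb

                                            fdb.api_version(630)
                                            db = fdb.open()  # assumes the default cluster file

                                            @fdb.transactional
                                            def transfer(tr, src, dst, amount):
                                                # Read-modify-write across two keys; on conflict the
                                                # @fdb.transactional decorator retries the whole function.
                                                src_key = fdb.tuple.pack(("balance", src))
                                                dst_key = fdb.tuple.pack(("balance", dst))
                                                src_balance = fdb.tuple.unpack(tr[src_key])[0]
                                                dst_balance = fdb.tuple.unpack(tr[dst_key])[0]
                                                tr[src_key] = fdb.tuple.pack((src_balance - amount,))
                                                tr[dst_key] = fdb.tuple.pack((dst_balance + amount,))

                                            transfer(db, "alice", "bob", 10)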

                                          1. 2

                                            If actor models aren’t OO then what is? Naming an abstraction and sending instances polymorphic messages? Doesn’t get more OO than that

                                            1. 3

                                              Actor models are “OO done right” as opposed to “popular OO” seen in C++ and Java.

                                              1. 1

                                                I read “actors are OO, but people don’t call them that” as proof that OOP as a phrase is in decline (paraphrasing)

                                                1. 1

                                                  I feel like most of the time, when people say OO they really just mean Java.

                                                  Python is OO and has had lambdas and closures for a long time, and that doesn’t make it any less OO. It’s good that we have different paradigms working together, instead of trying to beat each other.

                                                  1. 2

                                                    Python is OO and has had lambdas and closures for a long time

                                                    Smalltalk was the language created by the person who coined the term ‘object oriented’ to embody that style of programming, and closures and message passing (for method invocation) were the only forms of control flow it had. An if statement in Smalltalk is implemented by sending an ifTrue: or ifFalse: message with a closure as its argument to an object (typically a boolean object), which then either invokes the closure or doesn’t, depending on its internal value. Loops are built in a similar way.
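
                                                    A rough Python rendition of the idea (just a sketch; in Smalltalk, True and False are real classes whose methods take blocks):

                                                    class STTrue:
                                                        def if_true(self, block):
                                                            return block()   # the receiver runs the closure...
                                                        def if_false(self, block):
                                                            return None      # ...or ignores it

                                                    class STFalse:
                                                        def if_true(self, block):
                                                            return None
                                                        def if_false(self, block):
                                                            return block()

                                                    # "if" becomes a message send with a closure argument:
                                                    st_bool = STTrue() if 2 + 2 == 4 else STFalse()
                                                    st_bool.if_true(lambda: print("four!"))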

                                                1. 6

                                                  Zig article -> 44 upvotes, Scala 3 -> 14 upvotes.

                                                  I am not sure what Scala is used for anymore. F# / Rust / Kotlin and probably OCaml ate away much of the user base over time. If you are an ex Scala coder, what language did you switch to?

                                                  1. 11

                                                    There are 18 Apache projects written in Scala:

                                                    • Apache CarbonData
                                                    • Apache Clerezza
                                                    • Apache Crunch (in the Attic)
                                                    • Apache cTAKES
                                                    • Apache Daffodil
                                                    • Apache ESME (in the Attic)
                                                    • Apache Flink
                                                    • Apache Hudi
                                                    • Apache Kafka
                                                    • Apache Polygene (in the Attic)
                                                    • Apache PredictionIO (in the Attic)
                                                    • Apache Samza
                                                    • Apache ServiceMix
                                                    • Apache Spark
                                                    • Apache Zeppelin

                                                    There are some big names there (Spark, Kafka, Flink, Samza), especially in data movement. Also, Netflix has atlas (a time-series DB) and Microsoft has hyperspace. It seems like most Scala code is associated with Spark in one way or another.

                                                    1. 2

                                                      Huh, I thought Kafka, Flink and Samza were written in Java. Shows what I know. Neat link!

                                                      1. 2

                                                        This overstates the case for Scala a bit. Check the founding years of these projects. Kafka and Spark, which are two of the most popular projects in this list, were created in 2011 and 2014, at the height of Scala popularity. Both projects were written in Scala but had to put significant effort into engineering a first-class pure-Java API. The Kafka team even rewrote the clients in Java eventually. GitHub repo analysis has the Kafka codebase as 70% Java and 23% Scala.

                                                        It’s true that Spark does use Scala a bit more. GitHub there has Scala at 70% of the codebase, with Python at 13% and Java at 8%. But Spark might just be the “perfect” use case for a language like Scala, being as focused as it is on concurrency, immutability, parallel computing, higher-order programming, etc.

                                                        I also closely tracked development of Apache Storm (created 2011), and it started as a Clojure project, but was eventually rewritten (from scratch) in Java. There are lots of issues with infrastructure software not sticking with vanilla Java (or other “systems” languages like C, Go, etc.). Apache Cassandra and Elasticsearch stuck with pure Java, and had fewer such issues. Durability, simplicity, and ecosystem matter more than programming language features.

                                                    2. 8

                                                      It’s still pretty big in data engineering. Apache Spark was written in Scala.

                                                      1. 6

                                                        The company I work for uses Scala for data engineering. I don’t think that team has any plans to move away from it. I suspect that the use of Scala is a product of the time: the company is about ten years old; Scala was chosen very early on. It was popular back then.

                                                      2. 7

                                                        Elixir and loving it for almost 6 years now. I miss a couple of Scala’s features; in particular, implicits were nice for things like DI, execution contexts, etc. I don’t miss all the syntax and I certainly don’t miss all the Haskell that creeps in in a team setting.

                                                        1. 6

                                                          If you are an ex Scala coder, what language did you switch to?

                                                          Python, now Go. At one point I figured I could ship entire features while waiting for my mid-sized Scala project to compile.

                                                          I hope they addressed that problem.

                                                          1. 3

                                                            YAML in mid-2015 (I took a detour into infrastructure), but Clojure since 2019, now that I’m a dev again.

                                                            FWIW I liked Scala as a “better Java” when I started using it around mid-2012, until early 2015 when I left that gig.

                                                            I remember that I found it very difficult to navigate Scala’s matrix-style documentation, and that I hated implicits and the operator precedence. I loved case classes, and I think I liked object classes (not sure that’s the right terminology). And I liked that vals were immutable.

                                                            Compile times didn’t bother me that much, perhaps because I worked on early-stage greenfield projects with one or two other devs. (So didn’t have lots of code.)

                                                            I liked programming with actors (we used Akka) but we found it difficult to monitor our services. Some devs were concerned about loss of type safety when using Akka actors.

                                                            1. 2

                                                              Akka actors have been type-safe for several years now.

                                                              1. 2

                                                                Off topic: there’s a lot of conversation/blog posts to be had about devs finding themselves in infrastructure. I’m there at the moment, and it’s a very different world.

                                                              2. 3

                                                                I’m using Scala for my hobby projects. It does all I need. I like multiple argument lists and the ability to pass the last argument as a block. Implicits are cool, though they need to be controlled, because sometimes they’re confusing; for example, I don’t really understand how the uPickle library works inside, even though I know how to use it. Compilation times are not as bad as some people say; maybe they were bad in some early Scala versions, but they’re not as bad as e.g. C++. It works with Java libraries (though sometimes it’s awkward to use Java-style libs in Scala, but it’s the same story as using C from C++ – it’s a matter of creating some wrappers here and there).

                                                                1. 3

                                                                  I wrote several big scala projects a decade ago (naggati, scrooge, kestrel) but stopped pretty much as soon as I stopped being paid to. Now I always reach for typescript, rust, or python for a new project. Of the three, rust seems the most obviously directly influenced by (the good parts of) scala, and would probably be the most natural migration.

                                                                  Others covered some of the biggest pain points, like incredibly long compile times and odd syntax. I’ll add:

                                                                  Java interoperability hurt. They couldn’t get rid of null, so you often needed to watch out and check for null, even if you were using Option. Same for all the other Java warts, like boxing and type erasure.

                                                                  They never said “no” to features. Even by 2011 the language was far too big to keep in your head, and different coders would write using different subsets, so it was often hard to understand other people’s code. (C++ has a similar problem.) Operator overloading, combined with the ability to make up operators from nothing (like <+-+>), meant that some libraries would encourage code that was literally unreadable. Implicits were useful to an extent, but required constant vigilance or you would be squinting at code at 3am going “where the heck did this function get imported from?”

                                                                  1. 2

                                                                    2 days later, Scala 3 -> 44 upvotes ;)

                                                                    1. 1

                                                                      I’m currently on a Scala hiatus after nearly eight years with it as my primary stack, with a splash of Rust, Ruby, Groovy, and a whole lot of shell scripting, across several products at three companies. This included an IBM product that peaked near $100M/yr in sales; the major component I managed was JVM-only, and 3/4 of it was Scala.

                                                                      For the last few months, I’m working in Python doing some PySpark and some Tensorflow and PyTorch computer vision stuff. While I concede that the Python ecosystem has certainly matured in the 15 years since I did anything material with it, my preference would be to rewrite everything I’m writing presently in Scala if the key libraries were available and their (re)implementations were mature.

                                                                    1. 5

                                                                      Meanwhile, PyPy is around 4x faster than CPython.

                                                                      1. 6

                                                                        Anecdote ain’t data, but I’ve never been successful at getting PyPy to provide improved performance. My use cases have been things like running tooling (Pylint is extremely slow under PyPy, much more so than under CPython), just running web apps, and a lot of other things that aren’t benchmarks.

                                                                        I don’t want to be too critical of PyPy; I imagine it does a lot of what a lot of people want. But I don’t know what real workloads end up benefiting from it.

                                                                        1. 4

                                                                          PyPy upstream generally treats slowness as a bug and is willing to expend resources to fix it, if you’re willing to file issues with minimal test cases. (Here is a recent example bug about slowness.)

                                                                          Anecdotes aren’t data, but about a decade ago, I ported a Minecraft server from Numpy and CPython to array.array and PyPy, and at the time, I recorded a 60x speedup on a microbenchmark, and around a 20x speedup for typical gameplay interactions, resulting in a backend that spent most of its time sleeping and waiting for I/O.

                                                                          As long as we’re on the topic, it’s worth knowing that PyPy comes with a toolkit, RPython, which allows folks to generate their own JITs from Python. So, if one wanted more speed than was available with Python’s language design, then PyPy provides a route for forking the interpreter and standard library, and making arbitrarily distant departures from Python while still having high performance. For example, if we can agree that Dolphin implements “real workloads”, then PyGirl (code, paper) probably does as well.
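
                                                                          For a flavor of what that looks like, here’s the skeleton of the classic RPython tutorial interpreter (heavily elided; it only does something interesting once translated with the RPython toolchain, which generates a tracing JIT from the annotated loop):

                                                                          from rpython.rlib.jit import JitDriver

                                                                          # greens: values identifying a position in the program (constant per loop);
                                                                          # reds: everything else the loop touches.
                                                                          jitdriver = JitDriver(greens=["pc", "program"], reds=["acc"])

                                                                          def mainloop(program):
                                                                              pc = 0
                                                                              acc = 0
                                                                              while pc < len(program):
                                                                                  jitdriver.jit_merge_point(pc=pc, program=program, acc=acc)
                                                                                  op = program[pc]
                                                                                  if op == "+":
                                                                                      acc += 1
                                                                                  elif op == "-":
                                                                                      acc -= 1
                                                                                  pc += 1
                                                                              return acc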

                                                                          1. 3

                                                                            Yeah, to me it helps to think of workloads in these categories (even if there are obviously way more than this, and way more dimensions):

                                                                            1. String / hash / object workloads (similar to web apps, to a linter, and to Oil’s parser)
                                                                            2. Numeric workloads (what people write Cython extensions for; note that NumPy is written largely in Cython.)

                                                                            JITs are a lot better at the second type of workload than the first. My experience matches yours – when I tried running Oil with PyPy, it was slower and used more memory, not faster.

                                                                            Also, I think that workload 1 is the more important one for Python. If I want to write fast numeric code, it’s not painful to do in C++. On the other hand, doing string/hash/object graph workloads in C++ is very painful. It’s also less-than-great in Rust, particularly graphs.

                                                                            So while I think PyPy is an astonishing project (and that impression grows after learning more about how it works), I also think it doesn’t speed up the most important workloads in Python. Not that I think any other effort will do so – the problems are pretty fundamental and there have been a couple decades of attempts.

                                                                            (In contrast, I got much better performance results adding static types manually and semi-automatically translating Oil to C++. This is not a general solution, as it’s labor-intensive and restricts the language, although there are some other benefits to that.)

                                                                            1. 1

                                                                              I see the outline of your point, but I’m not sure on the specifics. In particular, a mechanism is missing: What makes strings, dictionaries, and user-customized classes inherently hard to JIT, particularly with a PyPy-style tracing metainterpreter?

                                                                              Edit: Discussion in #pypy on Freenode yielded the insight that CPUs have trouble with anything which is not in their own list of primitive types, requiring composite operations for composite types. Since JITs compile to CPU instructions, they must struggle with instruction selection for composite types. A lesson for language designers is to look for opportunities to provide new primitive object implementations, using the CPU’s existing types in novel ways.

                                                                              Our experience in the Monte world is that our RPython-generated JIT successfully speeds up workloads like parsing and compiling Monte modules to bytecode, a task which is string- and map-heavy. Our string and map objects are immutable, and this helps the JIT remove work.

                                                                              1. 1

                                                                                Yes the JITs do a lot better on integers and floats because they’re machine types.

                                                                                The performance of strings and hash tables is sort of “one level up”, and the JITs don’t seem to help much at that level (and for some reason lots of people seem to misunderstand this.)

                                                                                As an anecdote, when Go was released, there were some benchmarks where it was slower than Python, just because Python’s hash tables were more optimized. And obviously Go is compiled and Python is interpreted, but that was still true. So that is a similar issue.

                                                                                So there are many dimensions to performance, and many workloads. Saying “4x faster” is doing violence to reality. In some cases it’s the difference between being able to use PyPy and not being able to use it.

                                                                              2. 1

                                                                                SciPy has some Cython code along with a bunch of Fortran code, but NumPy is all C.

                                                                                1. 1

                                                                                  Ah sorry you are right, I think I was remembering Pandas, which has a lot of Cython in its core:

                                                                                  https://github.com/pandas-dev/pandas/tree/master/pandas/_libs

                                                                                2. 1

                                                                                Cython is also a translator to C. Why didn’t you use Cython for Oil?

                                                                                  1. 1

                                                                                  It generates code that depends on the Python runtime, and Cython is a different language than statically typed Python. I don’t want to be locked into the former, and translating the code is probably even more labor-intensive than what I’m doing (I leveraged the mypy team’s work on automatic type annotation, etc.). It also wouldn’t be fast enough, as far as I can tell.

                                                                                3. 3

                                                                                  PyPy is 4x faster… for long-running tasks that allow the JIT to warm up. Lots of Python workloads (e.g. pylint) run the interpreter as a one-off, so PyPy won’t help there. Interpreter startup speed is also critical for one-off workflows, and PyPy isn’t optimized for that either.

                                                                                  1. 3

                                                                                    I think it’s more like 10x-100x faster OR 10% slower for different workloads – “4x” doesn’t really capture it. See my sibling comment about string/hash/object vs. numeric workloads.

                                                                                  2. 2

                                                                                    I used PyPy recently for the first time, and I had a nice experience. I am experimenting with SQLite and trying to figure out fast ways to insert 1B rows. My CPython version was able to insert 100M rows in 500 seconds; the same code under PyPy took 150 seconds.

                                                                                    The best part was, I did not have to change anything in my original code. It was just drop-in, as advertised. I ran it with PyPy and got the speed bumps.
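
                                                                                    For reference, the shape of the insert loop I’m benchmarking looks roughly like this (schema made up; batching executemany inside one transaction is the main trick, under either interpreter):

                                                                                    import sqlite3

                                                                                    conn = sqlite3.connect("rows.db")
                                                                                    conn.execute("PRAGMA journal_mode = WAL")
                                                                                    conn.execute("CREATE TABLE IF NOT EXISTS t (a INTEGER, b TEXT)")

                                                                                    BATCH = 100_000
                                                                                    batch = []
                                                                                    for i in range(1_000_000):
                                                                                        batch.append((i, "x"))
                                                                                        if len(batch) == BATCH:
                                                                                            conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
                                                                                            batch.clear()
                                                                                    if batch:
                                                                                        conn.executemany("INSERT INTO t VALUES (?, ?)", batch)
                                                                                    conn.commit()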

                                                                                  3. 2

                                                                                    Specifically, we want to achieve these performance goals with CPython to benefit all users of Python including those unable to use PyPy or other alternative virtual machines.

                                                                                    1. 1

                                                                                      Apparently the goal is a 2x speedup by 3.11 and a 5x speedup in 4 years.

                                                                                      1. 4

                                                                                        Yes. Assuming that those numbers are not exaggerated, I expect that PyPy will still be faster than CPython year after year. The reasoning is due to the underlying principle that most improvements to CPython can be ported to PyPy since they have similar internal structure.

                                                                                        In GvR’s slides, they say that they “can’t change base layout, object layout”. This is the only part of PyPy’s interpreter which is structurally different from CPython. The same slide lists components which PyPy derived directly from CPython: the bytecode, the stack frames, the bytecode compiler, and bytecode interpreter.

                                                                                        Specializing bytecode has been tried for Python before; I recall a paper which monomorphized integers and other common builtin types. These approaches tend to fail unless they can remove some interpretative overhead. I expect that a more useful product of this effort will be a better memory model and simpler bytecodes, rather than Shannon’s grand explosion of possible bytecode arrangements.

                                                                                        1. 1

                                                                                          I’m curious about mypyc, personally. It seems to me like (C)Python is just hard to optimize and depends too much on implementation details (the C API) to be changed; to get a significant leap in performance, it seems like using a statically typed, less dynamic subset would give much higher speedups. Of course, the downside is that it doesn’t work for old code (unless that code happens to already be in this subset).

                                                                                          1. 1

                                                                                            Monomorphizing code does not always speed it up. There are times when tags/types can be checked for free, thanks to the dominating effects of cache thrashing, and so the cost of dynamically-typed and statically-typed traversals ends up being similar.

                                                                                            It’s not an accident that some half-dozen attempts to monomorphize CPython internals have failed, while PyPy’s tracing JIT is generally effective. Monomorphization can remove inner-interpreter work, but not interpretative overhead.

                                                                                            1. 2

                                                                                              Well, by “less dynamic” I also mean not having a dictionary per class and that kind of stuff :-). I should have been clearer. Tag checks are one thing, but performing dictionary lookups all the time to resolve identifiers or fields is also very heavy. As for the statically typed aspect, I have no idea if it’s truly necessary, but it’d make it easier to implement, right?

                                                                                    1. 3

                                                                                      A Case For Learned Index Structures is probably my favorite paper. They reimplement traditional CS data structures like B-Trees and bitmap indices as deep neural networks, then they make the case that they’re both faster and more memory efficient, given a GPU or TPU. The paper itself leaves a ton of unanswered questions, but if you dig, a lot of follow up work has been done to close the gaps since it was written.
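
                                                                                      The core trick is small enough to sketch: learn an approximation of the CDF of the sorted keys, predict a position, then fix up with a bounded local search. Here a linear fit stands in for the paper’s neural nets, and the error bound is assumed rather than computed per model as the paper does:

                                                                                      import bisect
                                                                                      import numpy as np

                                                                                      keys = np.sort(np.random.randint(0, 10**9, size=100_000))

                                                                                      # Model position as a function of key; real learned indexes train
                                                                                      # (hierarchies of) models and track their max error per segment.
                                                                                      slope, intercept = np.polyfit(keys.astype(float), np.arange(len(keys)), 1)

                                                                                      def lookup(k, max_err=1024):  # max_err: assumed model error bound
                                                                                          guess = int(slope * k + intercept)
                                                                                          lo = max(0, guess - max_err)
                                                                                          hi = min(len(keys), guess + max_err)
                                                                                          i = bisect.bisect_left(keys, k, lo, hi)
                                                                                          return i if i < len(keys) and keys[i] == k else None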

                                                                                      1. 4

                                                                                        I learned it 9 years ago, to contribute to VsVim (at the time, VsVim was almost entirely F# but since then parts have been moved over to C#). It was a great experience. I learned a ton about Vim and software engineering in general.

                                                                                        1. 2

                                                                                          To be fair to SQLite, some normalization of this schema would have helped a lot, such as creating tables for licenses and repo URLs and referring to those with foreign keys. That would have reduced those columns in the main table to “a few” bytes per row (SQLite uses varints, so I can’t say exactly; probably 1 byte for the license, 2 or 3 for the URL?).
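
                                                                                          Something like this, sketched with made-up table and column names:

                                                                                          import sqlite3

                                                                                          conn = sqlite3.connect(":memory:")
                                                                                          conn.executescript("""
                                                                                              -- Factor the repeated strings out into lookup tables...
                                                                                              CREATE TABLE license (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
                                                                                              CREATE TABLE repo    (id INTEGER PRIMARY KEY, url  TEXT UNIQUE);
                                                                                              -- ...so the wide table stores small integer foreign keys instead.
                                                                                              CREATE TABLE package (
                                                                                                  name       TEXT,
                                                                                                  license_id INTEGER REFERENCES license(id),  -- ~1 byte as a varint
                                                                                                  repo_id    INTEGER REFERENCES repo(id)      -- ~2-3 bytes as a varint
                                                                                              );
                                                                                          """)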

                                                                                          1. 2

                                                                                            In row-based systems, that’s your only option. In column-based systems, you can normalize or rely on run-length-encoding. I do a lot of data engineering, and Parquet is great because I frequently don’t have control over the schema or I don’t have multi-table transactions.

                                                                                            1. 2

                                                                                              Normal forms were created in an era when disk space was hilariously expensive. You are right for an OLTP use case, even today. However, OLAP use cases favor flat schemas with no JOINs, because they are much simpler to query. The RCFile white paper investigated the best layout for OLAP and concluded that columnar formats are the best. There was also a lot of effort put into columnar optimizations (RLE, dict encoding, etc.) that makes normalisation largely irrelevant. Why would you split data into many tables when you can realise the same benefits without having multiple tables?
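
                                                                                              As a concrete sketch (made-up columns): in Parquet, a low-cardinality string column in a flat, denormalized table dictionary-encodes on write, so each value costs roughly an index into a small dictionary rather than the full string:

                                                                                              import pyarrow as pa
                                                                                              import pyarrow.parquet as pq

                                                                                              table = pa.table({
                                                                                                  "name": [f"pkg{i}" for i in range(100_000)],
                                                                                                  "license": ["MIT", "Apache-2.0"] * 50_000,  # low cardinality
                                                                                              })
                                                                                              # use_dictionary is on by default; shown here for emphasis
                                                                                              pq.write_table(table, "packages.parquet", use_dictionary=True)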

                                                                                            1. 3

                                                                                               Does anyone know if there is a good way to use Parquet files in Python without using a data frame? I have a number of pipelines that stream data to files, but I’m not a huge fan of pandas. (The lack of nullable ints (until recently) really turned me off of it.) Last time I checked, it didn’t seem like there were any good options.

                                                                                              1. 3

                                                                                                https://github.com/uber/petastorm (not sure if it uses pandas under the hood or not)

                                                                                                1. 1

                                                                                                  It looks like they are using pyarrow under the covers.

                                                                                                2. 3

                                                                                                   PyArrow has a Table. Would that work for you?

                                                                                                  https://arrow.apache.org/docs/python/generated/pyarrow.Table.html
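
                                                                                                   And if you want streaming rather than one big table in memory, something like this stays entirely in Arrow (file name made up) and reads record batches incrementally:

                                                                                                   import pyarrow.parquet as pq

                                                                                                   pf = pq.ParquetFile("data.parquet")
                                                                                                   total = 0
                                                                                                   for batch in pf.iter_batches(batch_size=65_536):
                                                                                                       total += batch.num_rows  # each batch is a pyarrow.RecordBatch, no pandas
                                                                                                   print(total)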

                                                                                                  1. 2

                                                                                                     I’ll check it out! We try to stream data in our pipelines to keep memory overhead low (as much as that’s possible in Python), so I’ve been hesitant to use table-based solutions. I know that it’s possible to stream pandas via HDF5, but again, I was a bit turned off because of other issues. So I’m admittedly a bit ignorant of the space as a result.

                                                                                                  2. 2

                                                                                                    ORC is a similar columnar compressed format and pyorc reads/writes it with a nice minimal interface.
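
                                                                                                      Usage looks roughly like this (schema and values made up):

                                                                                                      import pyorc

                                                                                                      with open("rows.orc", "wb") as f:
                                                                                                          with pyorc.Writer(f, "struct<id:int,name:string>") as writer:
                                                                                                              writer.write((1, "alice"))
                                                                                                              writer.write((2, "bob"))

                                                                                                      with open("rows.orc", "rb") as f:
                                                                                                          for row in pyorc.Reader(f):
                                                                                                              print(row)  # rows come back as tuples by default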

                                                                                                    1. 1

                                                                                                      That’s the sort of interface that I was hoping for. It looks like ORC is a compatible data format for Azure Data Warehouse (er, Synapse Analytics) too. I’ll have to test it out. Thanks!

                                                                                                      1. 2

                                                                                                        Yep! For anyone else reading, you can query ORC via SQL with Presto/Trino if you’re self hosting, and if you’re in Amazon-land it works with AWS’s Athena (their Presto deployment) and Redshift (for import or via Spectrum).

                                                                                                    2. 2

                                                                                                      pyspark is a very heavy tool, but it’s got a much better type system than Pandas. I think when you say “streaming”, you would be satisfied by Spark’s laziness, but Spark also supports streaming from Kafka (queue-style message processing) or from Delta tables (streaming changes to a table).

                                                                                                    1. 12

                                                                                                       I have to wonder whether widespread adoption of Java applets might have led to an outcome qualitatively better than the modern web. I mean, the Java runtime was intended to be an application platform, whereas the web is a document delivery system abused and contorted to make do as an application platform.

                                                                                                      1. 12

                                                                                                         Except we had widespread adoption of Java applets, and the web platform turned out to be a better application platform. On the desktop we’re running VS Code (the web platform) rather than Eclipse (Java).

                                                                                                         I wrote Java applets professionally in the 90s and then web apps. Even back in the pre-dynamic-HTML days, native web apps were better for most interesting stuff.

                                                                                                        1. 4

                                                                                                          we had widespread adoption of java applets

                                                                                                          We did?

                                                                                                          My memory isn’t what it used to be but I can’t remember a single instance of seeing this in the wild.

                                                                                                          1. 4

                                                                                                            I recall Yahoo using these for apps/games and what not.

                                                                                                            1. 4

                                                                                                              Not widespread like today where a large fraction of websites run JS on load. But I did run across pages here and there that would present an applet in a frame on the page, and you’d wait for it to load separately.

                                                                                                              1. 4

                                                                                                                 They were supported in all popular browsers. Java was widely taught and learned. There definitely were lots and lots of applets deployed, but compared to the web they were bad for actually building the applications people wanted to use.

                                                                                                                1. 4

                                                                                                                  I remember quite a few. Maybe you didn’t really notice them? Even today I occasionally run across a site with an applet that won’t load, especially older sites for demonstrating math/physics/electronics concepts. It also used to be a popular way to do remote-access tools in the browser, back when you couldn’t really do any kind of realtime two-way communication using browser APIs, but you could stick a vnc viewer in an applet.

                                                                                                                  1. 1

                                                                                                                    Aha; now that you mention it I do remember using a VNC viewer that was done as an applet, and also an SSH client. So I don’t think I ever used a Java applet on my own computer, but I did use a couple in university when I was stuck on a Windows machine and didn’t have anything better around.

                                                                                                                  2. 3

                                                                                                                    Runescape Classic xD

                                                                                                                2. 9

                                                                                                                  I have to agree with ianloic. Applets just didn’t work very well. They weren’t part of the web; they were a (poorly designed) GUI platform shoehorned into a web page with very limited interconnection. And they were annoyingly slow at the time.

                                                                                                                  Flash was a much more capable GUI but still badly integrated and not web-like.

                                                                                                                  With HTML5 we finally got it right, absorbing the lessons learned.

                                                                                                                  1. 8

                                                                                                                    With HTML5 we finally got it right, absorbing the lessons learned.

                                                                                                                    Now, instead of a modular design with optional complexity (the user installs/loads a given module only when needed), we have bloated web browsers consisting of 20+ million lines of code, with complexity that is often mandatory even for simple tasks (e.g. submitting a form with some data, reading an article, or placing an order in an e-shop).

                                                                                                                    1. 6

                                                                                                                      Very strongly agree.

                                                                                                                      Back when Flash was widespread, it didn’t seem that popular: it was a delivery mechanism for overzealous advertising that jumped all over content. People were happy to embrace the demise of Flash because Flash was a liability for users.

                                                                                                                      What we have today are synchronous dialog boxes that jump all over content, which are very difficult to remove because they’re indistinguishable from the content itself. The “integration” has meant it can no longer be sandboxed or limited in scope. The things people hated about Flash have become endemic.

                                                                                                                      The web ecosystem is not doing a good job of serving users today. I don’t know the mechanism, but it is ripe for disruption.

                                                                                                                      1. 3

                                                                                                                        Flash was also a delivery mechanism for games and videos that entertained millions, and educational software that probably taught more than a few people. If you think games, videos, and education beyond what flat HTML can provide are not “valid” that’s fine, but Flash filled a role and it served users.

                                                                                                                        1. 3

                                                                                                                          I didn’t mean to suggest that all uses of Flash are not “valid”; if there were no valid uses, nobody would intentionally install it. I am suggesting that it became misused over time, which is why Steve Jobs didn’t encounter too much resistance in dropping it.

                                                                                                                          But the real point from franta which I strongly agree with is being a plugin model it was relatively easy for users to enable when the content really needed it, and leave disabled in other cases. Personally I had two browser installs, one with flash and one without. That type of compartmentalization isn’t possible with HTML5.

                                                                                                                      2. 3

                                                                                                                        Optional complexity is not the right choice in this context. Nobody wants to design an experience where most users are just met with complex plug-in installation instructions. One of the best parts of the HTML5 ecosystem is that it’s largely possible to make websites which work on most of the browsers your users are actually going to use.

                                                                                                                        I agree that the complexity of “HTML5” is a problem. Maybe it would be nice to have two standards, one “simplified” standard which is basically Google’s AMP but good and standardized, and one heavy-weight standard. Simpler websites like news websites and blogs could aim to conform to the simplified standard, and simple document viewer browsers could implement only the simplified standard. But it definitely 100% wasn’t better when the “web” relied on dozens of random proprietary closed-source non-standard plug-ins controlled by single entities with a profit motive.

                                                                                                                      3. 2

                                                                                                                        I think that’s an overstatement. We haven’t gotten it right yet. Browser APIs are getting decent, but HTML+CSS is not a felicitous way to represent a UI. It’s a hack. Most everything to do with JavaScript is also a hack, although on that front we’ve finally started to break the “well, you have to write JS, or transpile to JS, because JS is the thing browsers have” deadlock with WASM, which finally offers what Java and Flash had a quarter century ago: compact bytecode for a fairly sensible VM.

                                                                                                                      4. 4

                                                                                                                        The biggest problem was that Java wasn’t integrated with the DOM. The applet interface was too impoverished.

                                                                                                                        jQuery had a nice, tight integration with the DOM that was eventually folded into the browser itself (document.querySelector). And if you look at modern frameworks and languages like React/Preact, Elm, etc., you’ll see why that lack of integration would have continued to be a problem.

                                                                                                                        They use the DOM extensively. Interestingly, though, the virtual DOM might have provided the shim, a level of indirection, that could have let Java become more capable in the browser.

                                                                                                                        The recent Brendan Eich interview has a bunch of history on this, i.e. the relationship between Java, JavaScript, and the browser:

                                                                                                                        https://lobste.rs/s/j82tce/brendan_eich_javascript_firefox_mozilla

                                                                                                                        1. 3

                                                                                                                          It was in fact perfectly possible to manipulate the DOM from an applet (although at some level you did still need to have the applet visible as a box somewhere; I don’t think it was possible or at least frictionless to have “invisible applets”).

                                                                                                                          I would instead say the biggest problem was the loading/startup time; the JVM was always too heavy-weight, and there was a noticeable lag while applets started up; early on it would even freeze the whole browser. There were also a lot of security issues; the Java security model wasn’t great (it was fine in principle, but very difficult to get right in practice).

                                                                                                                          Now, funnily enough, the JVM can be much more light-weight (the “modules” effort helps, along with a raft of other improvements that have been made in recent JDKs) and the startup time is much improved, but it’s too late: applets are gone.

                                                                                                                          1. 2

                                                                                                                            I don’t think it was possible or at least frictionless to have “invisible applets”

                                                                                                                            It totally was: make them 1×1 pixel and use CSS to position them off-screen. I used that multiple times to give the webpage access to additional functionality via scripting (applets could be made accessible to JS).

                                                                                                                            Worse: the applets could be signed with a code-signing cert, which gave them full system access, including using JNA to make FFI calls into OS libraries.

                                                                                                                            Here is an old blog post of mine to scare you: https://blog.pilif.me/2011/12/22/grave-digging/
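                                                                                                                            For the curious, the shape of the trick, from memory (the names here are made up, and details are approximate): the page embedded something like <applet id="bridge" code="Bridge" width="1" height="1" mayscript>, shoved it off-screen with CSS, and then page JS could call any public method on the applet.

                                                                                                                            ```java
                                                                                                                            import java.applet.Applet;

                                                                                                                            // Hypothetical sketch of the bridge trick described above, not real
                                                                                                                            // shipped code. Via LiveConnect, page JavaScript could invoke any public
                                                                                                                            // method, e.g. document.getElementById("bridge").readSerialPort().
                                                                                                                            public class Bridge extends Applet {
                                                                                                                                // Unsigned, this stays sandboxed; signed, a method like this could
                                                                                                                                // reach the OS (JNA/FFI and all).
                                                                                                                                public String readSerialPort() {
                                                                                                                                    return "whatever the sandbox, or the signature, allows";
                                                                                                                                }
                                                                                                                            }
                                                                                                                            ```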

                                                                                                                            1. 1

                                                                                                                              It was in fact perfectly possible to manipulate the DOM from an applet

                                                                                                                              How? I don’t recall any such thing. All the applets I used started their own windows and drew in them.

                                                                                                                                1. 1

                                                                                                                                  OK interesting. It looks like this work was done in the early 2000s. I think it must have lagged behind the JS implementations, but I’m not sure. In any case, jQuery looks a lot nicer than that code! :)
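
                                                                                                                                  From what I dug up, the applet side looked roughly like this — the old LiveConnect JSObject class from the Java plugin’s plugin.jar. I’ve reconstructed it from memory, so treat it as a sketch:

                                                                                                                                  ```java
                                                                                                                                  import java.applet.Applet;
                                                                                                                                  import netscape.javascript.JSObject; // shipped with the Java browser plugin

                                                                                                                                  public class DomApplet extends Applet {
                                                                                                                                      @Override
                                                                                                                                      public void start() {
                                                                                                                                          // Grab the page's window object, then reach into the DOM through it.
                                                                                                                                          JSObject window = JSObject.getWindow(this);
                                                                                                                                          JSObject document = (JSObject) window.getMember("document");
                                                                                                                                          document.call("write", new Object[] { "<p>written from Java</p>" });
                                                                                                                                          window.eval("document.title = 'set from an applet'");
                                                                                                                                      }
                                                                                                                                  }
                                                                                                                                  ```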

                                                                                                                            2. 2

                                                                                                                              In that interview, Brendan noted that JavaScript was async, which helped adoption in a UI world. It’s true: it made it nearly impossible to block a UI on a web request.

                                                                                                                              1. 3

                                                                                                                                Yes, good point. IIRC he talks about how JavaScript was embedded directly in Netscape’s event loop. But you can’t do that with Java – at least not easily, and not with idiomatic Java, which uses threads. As far as I remember, Java didn’t get async I/O until after the 2000s, long after JavaScript was embedded in the browser (and long after Python).

                                                                                                                                So yeah I would say those are two absolutely huge architectural differences between JavaScript and Java: integration with the DOM and the concurrency model.


                                                                                                                                This reminds me of this subthread with @gpm

                                                                                                                                https://lobste.rs/s/bl7sla/what_are_you_doing_this_weekend#c_f62nl3

                                                                                                                                which led to this cool experiment:

                                                                                                                                https://github.com/gmorenz/async-transpiled-xv6-shell

                                                                                                                                The question is: who has the main loop? Who can block? A traditional Unix shell wants to block, because wait()-ing for a process is a blocking operation. But that conflicts with GUIs, which want to own the main loop.

                                                                                                                                Likewise, Java wants the main loop, but so does the browser. JavaScript cooperates better by allowing callbacks.

                                                                                                                                When you have multiple threads or processes you can have two main loops, but then you have the problem of state synchronization too.
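
                                                                                                                                A toy sketch of what I mean (mine, not anything from the interview): a single-threaded loop dispatches queued callbacks, and any callback that blocks starves everything behind it.

                                                                                                                                ```java
                                                                                                                                import java.util.ArrayDeque;
                                                                                                                                import java.util.Queue;

                                                                                                                                // Minimal illustration of "who has the main loop": callbacks cooperate,
                                                                                                                                // blocking calls do not.
                                                                                                                                public class ToyEventLoop {
                                                                                                                                    private final Queue<Runnable> tasks = new ArrayDeque<>();

                                                                                                                                    void post(Runnable task) { tasks.add(task); }

                                                                                                                                    void run() {
                                                                                                                                        while (!tasks.isEmpty()) {
                                                                                                                                            // If this callback blocks -- a shell's wait(), a synchronous
                                                                                                                                            // HTTP request -- nothing queued behind it ever runs.
                                                                                                                                            tasks.poll().run();
                                                                                                                                        }
                                                                                                                                    }

                                                                                                                                    public static void main(String[] args) {
                                                                                                                                        ToyEventLoop loop = new ToyEventLoop();
                                                                                                                                        loop.post(() -> System.out.println("paint"));
                                                                                                                                        loop.post(() -> System.out.println("handle click")); // starved if "paint" blocks
                                                                                                                                        loop.run();
                                                                                                                                    }
                                                                                                                                }
                                                                                                                                ```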

                                                                                                                          1. 4

                                                                                                                            TL;DR

                                                                                                                            Magic is different. It feels different.

                                                                                                                            1. 1

                                                                                                                              re: conway’s law

                                                                                                                              i think you can work around this to some extent:

                                                                                                                              • have more components than people (an order of magnitude or so?)
                                                                                                                              • avoid “utils” and other catch-all dumping grounds
                                                                                                                              • have your components be searchable

                                                                                                                              you’re still going to get libraries/tools/etc. that follow org structure, but you can get a lot outside of that structure too, and that’ll be more reusable

                                                                                                                              1. 2

                                                                                                                                Why work around it? Isn’t the purpose of Conway’s Law to accept the inevitable rather than fight it?

                                                                                                                                FWIW I’ve worked in an environment that follows your suggestions and it still followed Conway’s Law.

                                                                                                                                1. 1

                                                                                                                                  Yes. This goes beyond software, too. The way to exploit Conway’s law is to shape the organisation after the system you desire to build. This implies things like smaller, cross-functional teams* with greater responsibility (in order to get less coupling between system components). That way you maximise communication efficiency.

                                                                                                                                  * Lean people would advocate that the team should be co-located too. The idealist in me still clings to the Microsoft research that showed org chart distance mattered orders of magnitude more than physical distance.

                                                                                                                              1. 3

                                                                                                                                Why support both multi-line and single-line comments? Seems like unnecessary complexity for a standard that’s explicitly avoiding complexity. If forced to choose, they should retain multi-line comments because they play better with compact JSON.
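
                                                                                                                                To illustrate, assuming C-style comment syntax (I’m guessing at the standard’s exact grammar here): a single-line comment runs to the end of the line, so it can’t appear inside compact, single-line output, whereas a block comment can:

                                                                                                                                ```
                                                                                                                                {"a": 1}  // a line comment swallows everything after it, so nothing
                                                                                                                                          // else can follow it on a one-line document

                                                                                                                                {"a": 1, /* block comments can sit anywhere */ "b": 2}
                                                                                                                                ```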

                                                                                                                                1. 3

                                                                                                                                  git doesn’t have actual command to un-stage a file(s), though. To get around this limitation…

                                                                                                                                  Limitation, or poor UI decision? I’m guessing the latter.

                                                                                                                                  1. 10

                                                                                                                                    Newer versions of git have git restore, so I think that counts.
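
                                                                                                                                    That is, since git 2.23:

                                                                                                                                    ```sh
                                                                                                                                    git restore --staged path/to/file   # unstage: copy the index entry back from HEAD
                                                                                                                                    git restore path/to/file            # discard unstaged changes in the working tree
                                                                                                                                    ```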

                                                                                                                                    1. 5

                                                                                                                                      git reset -- filename or git reset HEAD filename do the same, tho, right? And that’s been in git for ages.

                                                                                                                                      1. 5

                                                                                                                                        I know, just wanted to say there is now an actual command. The article claimed there wasn’t one.

                                                                                                                                        1. 1

                                                                                                                                          Sometimes. If the file is already in HEAD then this works, but I don’t think it does for a newly created file.

                                                                                                                                          1. 2

                                                                                                                                            It definitely works with newly created files.
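
                                                                                                                                            Quick check:

                                                                                                                                            ```sh
                                                                                                                                            $ touch brand-new && git add brand-new
                                                                                                                                            $ git reset -- brand-new     # drops the new file from the index...
                                                                                                                                            $ git status --short
                                                                                                                                            ?? brand-new                 # ...leaving it untracked in the working tree
                                                                                                                                            ```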

                                                                                                                                      2. 4

                                                                                                                                        The naming problem. There is, and always has been, git reset, which does what the OP wanted; however, the feeling that this one command does “a lot of different things” (resetting the staging area or the working tree, depending on the flags) is what makes people say git doesn’t have such a command.
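
                                                                                                                                        The flag-dependent behaviours in question:

                                                                                                                                        ```sh
                                                                                                                                        git reset -- file.txt      # path form: unstage file.txt, working tree untouched
                                                                                                                                        git reset --soft HEAD~1    # move HEAD only; index and working tree kept
                                                                                                                                        git reset --mixed HEAD~1   # move HEAD and reset the index (the default)
                                                                                                                                        git reset --hard HEAD~1    # move HEAD and overwrite index AND working tree
                                                                                                                                        ```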

                                                                                                                                        1. 3

                                                                                                                                          I use tig, which makes unstaging single files easy and natural, among other things.