1. 6

    This needs no fundamental changes to git.

    If you want to implement your own diff/merge tool that is syntax aware, this is a git configuration away.

    The fact that there are no great tools for this speaks to the sufficiency of text, and the difficulty of implementing an acceptable syntax aware tool.

    1.  

      the sufficiency of text

      Worse really does seem better, doesn’t it? :/

      1.  

        Plastic SCM has a tool they call SemanticMerge that supposedly is more intelligent than plain diff, but I haven’t used it (yet) so cannot comment on how useful it is.

        https://semanticmerge.com/

        1.  

          Oh, those crafty git configuration options. There’s so many of them though! Which one?

        1. 1

          Going to continue adapting Runtype for Pydantic’s benchmark, and making sure there is full parity. Because so far, all indications are that Runtype is almost twice as fast, while supporting the same features.

          1. 11

            My summary: PEP 582 specifies a directory named __pypackages__ that works like node_modules for nodejs: the interpreter prefers to import from any __pypackages__ in any directory ancestor before checking user home or system-wide packages.
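
            A rough sketch of that lookup in Python, purely as illustration (the helper name is made up, and the 3.9/lib layout is my reading of the draft PEP, not PDM’s actual code):

                import sys
                from pathlib import Path

                def prefer_pypackages(start: Path) -> None:
                    """Toy lookup: the closest __pypackages__ (current dir, then ancestors) wins."""
                    for directory in (start, *start.parents):
                        candidate = directory / "__pypackages__" / "3.9" / "lib"
                        if candidate.is_dir():
                            # Put it ahead of user-home and system-wide packages.
                            sys.path.insert(0, str(candidate))
                            return

                prefer_pypackages(Path.cwd())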

            So, PDM is a package manager that much more closely follows the npm model than previous python package managers that use virtual environments (eg pipenv, poetry) or just install to the system (eg pip).

            If I could give feedback to the authors, I’d remind them that very few programmers memorize the meanings of PEP numbers, so it’s much more helpful to discuss the benefits of your software than what PEPs it respects.

            1. 9

              In addition, PEP 582 is still listed as a “Draft” (not Final, not even Accepted) from 2018. The listed motivation is that activating a virtual-environment is difficult and confusing (which is true) but I feel like there’s easier solutions to that problem:

              1. Just don’t activate virtualenvs, and don’t teach people that it’s a thing they’re expected to do. Instead of activating a virtualenv, you can just run the wrapper scripts in venv/bin/ directly and they’ll automatically set everything up before invoking the wrapped script.
              2. A lot of the complexity of activation comes from the design decision to reconfigure the current shell environment, and to be able to deconfigure it again later. Unix natively supports (and MS-DOS/Windows copied) the idea of a modified execution environment - if the activation script just set the right environment variables and executed $SHELL (or the equivalent on Windows) then it would be an ordinary command (not an unfamiliar source command), and it would be consistent across platforms instead of being weirdly different depending on the shell you’re using.
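
              A minimal sketch of that second point, written in Python rather than shell (enter_env is a hypothetical helper, not an existing tool): set the variables, then replace the current process with a fresh shell.

                  import os
                  import sys

                  def enter_env(venv: str) -> None:
                      """Start a child shell with the venv's bin/ on PATH; exit the shell to 'deactivate'."""
                      env = dict(os.environ)
                      env["VIRTUAL_ENV"] = venv
                      env["PATH"] = os.path.join(venv, "bin") + os.pathsep + env["PATH"]
                      shell = env.get("SHELL", "/bin/sh")
                      os.execvpe(shell, [shell], env)  # an ordinary command, no `source` required

                  enter_env(sys.argv[1])
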
              1. 3

                direnv makes it easy to automatically activate a venv when you enter some directory in your shell and deactivate the venv when you leave it. It’s also able to set/unset variables, deal with Go/Ruby stuff, etc.

              2. 2

                While I understand what it’s trying to solve, npm has its own disadvantages, like a huge hidden directory for every tiny project I clone and try. Also, a long install time, even when you have all the packages in a nearby directory.

                In my ideal world, packages would be global by default, but then whenever there is a deviation from the package.lock, the user gets the choice to upgrade, or have the specific package installed locally.

              1. 41

                That’s my general impression of open-source projects vs industry work -

                x1000 times the impact and benefiting mankind

                x0.0001 the personal payoff

                1. 27

                  I’ve had the opposite experience, to be honest. I’ve published a bunch of open-source projects over the years and with just two exceptions back in the mid-1990s, nobody but me ever gave a damn about any of them. Meanwhile, especially during my stint at a big-name tech company, I worked on professional projects that benefited literally hundreds of millions of people.

                  Don’t get me wrong, I love and continue to contribute to open source but I think open-source project impact follows the same power-law curve as anything else. Probably 90% of projects could vanish overnight and only their authors would notice or care.

                  1. 6

                    In the same vein, I imagine 99.9% of industry code could vanish overnight and no one would care. In fact it does all the time, since maybe 5% of new companies end up staying alive for longer than a few years, and within those that stay, internal projects constantly get thrown away and replaced.

                    I worked on professional projects that benefited literally hundreds of millions of people

                    You must be one of the lucky few!

                  2. 1

                    Would not necessarily agree on the second point. I don’t have any significantly popular open source projects, but hiring managers still usually make positive comments on my GitHub, and explicitly say that it makes me more attractive than other candidates.

                  1. 3

                    I feel like you could make a language that’s just SQL with the SELECT statements last and it would be so much easier to use and learn, except it will never catch on because it’s just SQL with the SELECT statements last.

                    1. 3

                      There was such a query language - QUEL. Now there are query-builder projects (like Ecto or jOOQ) that support more “free form” ordering of the statements in a query and make them feel much more “natural”; for example, in Ecto you write:

                      from s in "stories",
                        as: :story,
                        left_join: votes in subquery(count_query("votes")),
                        on: votes.story_id == s.id,
                        left_join: flags in subquery(count_query("flags")),
                        on: flags.story_id == s.id,
                        left_join: comments in subquery(count_query("comments")),
                        on: comments.story_id == s.id,
                        inner_join: tags in subquery(tags_query),
                        on: tags.story_id == s.id,
                        order_by: [desc: fragment("hotness"), desc: :inserted_at],
                        select: %{
                          story_id: s.id,
                          hotness:
                            (1 + tags.mod) * (coalesce(votes.count, 0) - coalesce(flags.count, 0)) * 3600 /
                              fragment("EXTRACT(epoch FROM (now() - ?))", s.inserted_at),
                          inserted_at: s.inserted_at
                        }
                      

                      Which is roughly equivalent to:

                      SELECT
                        story.id AS id,
                        (1 + tags.mod) * (coalesce(votes.count, 0) - coalesce(flags.count, 0)) * 3600 / EXTRACT(epoch FROM (NOW() - story.inserted_at)) AS hotness,
                        story.inserted_at AS inserted_at
                      FROM stories AS story
                      LEFT JOIN (
                        SELECT
                          story_id,
                          COUNT(*) AS count
                        FROM votes
                        GROUP BY story_id
                      ) AS votes
                      -- rest of the joins omitted for brevity
                      ORDER BY hotness DESC, inserted_at DESC
                      
                      1. 1

                        Or the same query in Preql -

                            leftjoin(story: stories, votes: votes{.story_id => count()}, flags: flags_count)
                            {
                              id: .story.id
                              hotness: (1 + .tags.mod) * ((.votes.count or 0) - (.flags.count or 0)) * 3600 / (now() - .story.inserted_at).epoch
                              inserted_at: .story.inserted_at
                            } 
                            order {^hotness, ^inserted_at}
                        
                    1. 3

                      Flying to Lisbon, Portugal.

                      1. 2

                        Lark work: Finishing up the Javascript port, and improving the new online IDE.

                        I also need to figure out by the end of the week where to travel next. Maybe Amsterdam?

                        1. 2

                          It seems we’re allowed to submit our own projects, so I’d like to propose Lark. It’s written in Python, it’s around 7500 LOC in total, and while it has a few functional elements, it is primarily object-oriented, and all the better for it. I’m proud of it, of course, but I welcome all scrutiny.

                          1. 2

                            OT-ish, but I tried out Lark in 2018 (I was stubbing out some ideas that are stuck in not-enough-time-in-the-year limbo) and really enjoyed it. Nice to see you here.

                            1. 1

                              Thanks! It’s actually improved a lot since then.

                          1. 2

                            It seems that good software developers hardly work on easy things, but that’s just because the hard things take months, and the easy things get hammered down in a matter of days.

                            1. 2

                              Finishing up the porting of Lark from Python to Javascript. I already have it parsing Python (using the example grammar), but I still need to make sure all the different features and options are working as expected. Then I need to figure out how to adapt the interface to JS idioms, and a few other things to tidy up.

                              1. 40

                                Graph database author here.

                                It’s a very interesting question, and you’ve hit the nerve of it. The long and the short of it is that, much like lambda calculus can represent any program, relational algebra can represent pretty much all database queries. The question comes to what you optimize for.

                                And so, unlike a table-centric view, which has benefits that are much better-known, what happens when you optimize for joins? Because that’s the graph game – deep joins. A graph is one huge set of join tables. So it’s really hard to shard a graph – this is connected to why NoSQL folks are always anti-join. It’s a pain in the rear. Similarly, it’s really easy to write a very expensive graph query, where starting from the other side is much cheaper.

                                So then we get to the bigger point; in a world where joins are the norm, what the heck are your join tables, ie, schema? And it’s super fluid – which has benefits! – but it’s also very particular. That tends to be the first major hurdle against graph databases: defining your schema/predicates/edge-types and what they mean to you. You’re given a paintbrush and a blank canvas and have to define the world, one edge at a time. And $DEITY help you if you want to share a common schema with others! This is what schema.org is chasing, choosing some bare minimum.

                                This is followed on by the fact that most of the standards in the graph database world are academic in nature. If I have one regret, it’s trying to follow the W3C with RDF. RDF is fine for import/export but it’s not a great data model. I wanted to standardize. I wanted to do right by the community. But, jeez, it’s just so abstract as to be useless. OWL goes another meta-level and defines properties about properties, and there’s simpler versions of OWL, and there’s RDFS/RDF* which is RDF about RDF and on and on…. it’s super cool that triples alone can represent pretty much anything, but that doesn’t help you much when you’re trying to be efficient or define your schema. Example: There’s a direct connection to the difference between a vector and a linked list – they both represent an ordered set. You can’t do a vector in triples, but you can do a linked list.
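
                                To make that last point concrete, here is roughly the shape an ordered collection takes once it is flattened into triples (a toy Python sketch with made-up names, in the spirit of RDF’s rdf:first/rdf:rest lists):

                                    # Triples can only say "X predicate Y", so an ordered list becomes cons cells.
                                    triples = [
                                        ("playlist", "first", "song_a"), ("playlist", "rest", "node1"),
                                        ("node1",    "first", "song_b"), ("node1",    "rest", "node2"),
                                        ("node2",    "first", "song_c"), ("node2",    "rest", "nil"),
                                    ]

                                    def nth(subject: str, n: int) -> str:
                                        """No random access: reaching element n means chasing n 'rest' links."""
                                        facts = {(s, p): o for s, p, o in triples}
                                        for _ in range(n):
                                            subject = facts[(subject, "rest")]
                                        return facts[(subject, "first")]

                                    print(nth("playlist", 2))  # song_c, after walking the chain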

                                I know I’m rambling a little, but now I’ll get to the turn; I still think there’s gold in them hills. The reason it’s not popular is all of the above and more, but it can be really useful! Especially when your problem is graph-shaped! I’ve implemented this a few times, and things like mapping, and social networks, and data networks, and document-origin-tracing – generally anything that would take a lot of joins – turn out swimmingly. Things that look more like tables (my example is always the back of a baseball card) look kind of nuts in the graph world, and things that look like a graph are wild in third normal form.

                                So I think there’s a time and a place for graph databases. I just think that a combination of the above counter-arguments and the underlying needs are few enough that it’s under-explored and over-politicized. They work great in isolation, ironically enough.

                                I’m happy to chat more, but that’s my quick take. Right tool, right job. It’s a shame about how that part of database theory has gone.

                                1. 10

                                  Full disclosure: I work for Neo4j, a graph database vendor.

                                  Very well said.

                                  I’d add that most of the conversation in responses to OP assumes “transactional” workloads. Graph databases for analytic workloads are a whole other topic to explore. Folks should check out Stanford Prof. Jure Leskovec’s research in the space…and a lot of his lectures about graphs for machine learning are online.

                                  1. 2

                                    The long and the short of it is that, much like lambda calculus can represent any program, relational algebra can represent pretty much all database queries.

                                    When faced with an unknown data problem, I always choose an RDBMS. It is a known quantity. I suspect I’d choose differently if I understood graph dbs better.

                                    I would love to see more articles here on practical uses for graph dbs. In particular, I’d love to know if they are best deployed as the primary datastore, or maybe just for the subset of data that you’re interested in querying (e.g., perhaps just the products table in an ecommerce app).

                                    this is connected to why NoSQL folks are always anti-join. It’s a pain in the rear.

                                    Interesting. People use NoSQL a lot. They simply do joins in the application. Maybe that’s the practical solution when it comes to graph dbs? Then again, the point of graph solutions is generally to search for connections (joins). I’d love to hear more on this aspect.

                                    Thank you and the OP. I wish I could upvote this more. :)

                                    1. 1

                                      Yeah, you’re entirely right that the joins happen in the application as a result. The reason they’re a pain is that they represent a coordination point — a sort of for-all versus for-each. Think of how you’d do a join in a traditional MapReduce setting; it requires a shuffle! That’s not a coincidence. A lot of the CALM stuff from Cal in ~2011 is related here and def. worth a read. That’s what I meant by a pain. It’s also why it’s really hard to shard a graph.
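
                                      A toy Python version of that MapReduce-style join, just to show where the shuffle sneaks in (the data and names are invented for the example):

                                          from collections import defaultdict

                                          users = [(1, "ada"), (2, "bob")]
                                          orders = [(1, "book"), (1, "pen"), (2, "mug")]

                                          # Map: tag each record with its side, keyed by the join key.
                                          tagged = [(uid, ("user", name)) for uid, name in users]
                                          tagged += [(uid, ("order", item)) for uid, item in orders]

                                          # Shuffle: every record with the same key must land on the same reducer.
                                          # That all-to-all coordination point is what makes joins (and therefore
                                          # graphs, which are one huge set of joins) painful to shard.
                                          buckets = defaultdict(list)
                                          for key, value in tagged:
                                              buckets[key].append(value)

                                          # Reduce: emit the joined rows per key.
                                          for key, values in sorted(buckets.items()):
                                              names = [v for tag, v in values if tag == "user"]
                                              items = [v for tag, v in values if tag == "order"]
                                              print(key, names, items)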

                                      I think there’s something to having a graph be a secondary, problem-space-only engine, at least for OLTP. But again, lack of well-known engines, guides, schema, etc — it’d be lovely to have more resources and folks to explore various architectures further.

                                    2. 2

                                      You’re given a paintbrush and a blank canvas and have to define the world, one edge at a time.

                                      That’s such a great way to put it :)

                                      Especially when your problem is graph-shaped!

                                      I think we need collective experience and training in the industry to recognize problem shapes. We’re often barely able to precisely formulate our problems/requirements in the first place.

                                      Which database have you authored?

                                      1. 5

                                        Cayley. Happy to see it already mentioned, though I handed off maintainership a long while ago.

                                        (Burnout is real, kids)

                                      2. 2

                                        Thanks for Cayley! It’s refreshing to have such a direct and clean implementation of the concept. I too think there’s a lot of promise in the area.

                                        Since you’re here, I was wondering (no obligation!) if you had any ideas around enforcing schemas at the actual database level? As you mentioned, things can grow hairy really quickly, and once they are in such a state, the exploration to know what needs to be fixed and the requisite migrations are daunting.

                                        Lately I’ve been playing with an idea for a graph db that is by default a triplestore under the hood, but with a (required!) schema that would look something like a commutative diagram. This would allow for discipline and validation of data, but also let you recognize multi-edge hops that are always there, so for some things you could move them out of the triplestore into a quad- or 5-store, producing more compact disk representations, faster scans with fewer indexes, and a bit of extra choice for the query planner. I haven’t thought it through too much, so I might be missing something or it might just not be worth it.

                                        Anyway, restriction and grokkability of the underlying schema/ontology does seem like the fundamental limiter to me in a lot of cases, and I was curious whether, as someone with a lot of experience in the area, you have thoughts on how to improve the situation?

                                        1. 1

                                          If you don’t mind me joining in, have you heard of https://age.incubator.apache.org/ ? I’m curious to hear your opinion about whether it can be an effective solution to this problem.

                                          1. 1

                                            If I have one regret, it’s trying to follow the W3C with RDF. RDF is fine for import/export but it’s not a great data model. […] it’s super cool that triples alone can represent pretty much anything, but that doesn’t help you much when you’re trying to be efficient

                                            I’ve been using SPARQL a little recently to get things out of Wikidata, and it definitely seems to have pain points around that. I’m not sure at exactly what level the fault lies (SPARQL as a query language, Wikidata’s engine, etc.), but things that seem to me like they should be particularly easy in a graph DB, like “is there a path from ?x and ?y to a common node, and if yes, give me the path?” end up both hard to write and especially hard to write efficiently.

                                            1. 2

                                              This goes a bit to the one reply separating graphs-as-analytics and graphs-as-real-time-query-stores.

                                              SPARQL is the standard (once again, looking at you W3C) but it’s trying to be SQL’s kid brother — and SQL has its own baggage IMO — instead of trying to build for the problem space. Say what you will about Cypher, Dude, at least it’s an ethos. Similarly Gremlin, which I liked because it’s easy to embed in other languages. I think there’s something in the spectrum between PathQuery (recently from a Google paper — I remember the early versions of it and some various arguments) and SPARQL that would target writing more functional paths — but that’s another ramble entirely :)

                                          1. 19

                                            This was much better than I thought, and worth reading for sql fans.

                                            My main disagreement is that this conflates two things:

                                            1. Sql being seriously suboptimal at the thing it’s designed for; and distinctly
                                            2. Sql being bad at things general purpose programming languages are good at.

                                            There’s value in a restricted language with a clearly defined conceptual model that meets well defined design goals. Despite serious flaws sql is quite good at its core mission of declarative relational querying.

                                            In many ways the porosity story is not bad - for example Postgres lets you embed lots of languages. I think a lot of the criticisms here really mean that more than one language is needed, and integrating them smoothly is the issue.

                                            For me, better explicit declaration of what extensions are required for a query to run would make things more maintainable. I think the criticisms in the article around compositionality are in the right area at least - much more clarity would be better here.

                                            In terms of an upgrade path - if we accept that basically sql is pretty sound but too ad hoc - then this is a very similar problem to that of shell programming. I find the “smooth upgrade path” theory of oil shell plausible (and I’d add that Perl in many ways was a smooth upgrade from shell) although many more people have attempted smooth upgrade paths than have succeeded.

                                            My best guess as to how to do it would be to implement an alternative but similar and principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL - that accommodates the different engines being different and allows their differences to be exposed in a convenient way. If you can get the better query language into at least two of those, you’ll be reaching a large audience who are actually trying to do real work. All easier said than done, of course.

                                            1. 15

                                              Sql being bad at things general purpose programming languages are good at.

                                              I think this (and what follows) is a misinterpretation.

                                               The core idea is not to change things such that SQL is suddenly good at GP tasks, but to adopt the things from GP languages that worked well there, and will also work well in the SQL context; for instance:

                                              • Sane scoping rules.
                                              • Namespaces.
                                              • Imports.
                                              • Some kind of generic programming.

                                              These things alone would enable people to write “cross-database SQL standard libraries” that would make it easier to write portable SQL (which the database vendors are obviously not interested in).

                                              Which would then free up resources from people who want to improve communication with databases in other ways¹ – because having to write different translation code for 20 different databases and their individual accumulation of 20 years of “quirks” is a grueling task.

                                              principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL - that accommodates the different engines being different and allows their differences to be exposed in a convenient way

                                               I think most of the ecosystem’s weakness comes from any non-trivial SQL code being non-portable. I would neither want “differences exposed in a convenient way”, nor would I call a language that did that “principled”.


                                              ¹ E. g. why does shepherding some data from a database into a language’s runtime require half a dozen copies and conversions?

                                              1. 2

                                                I guess maybe I just disagree on the problem. I don’t think portability is a very important goal, and I would give it up before pretty much anything else.

                                                1. 5

                                                  Portability is not the important goal, it’s simply the requisite to get anything done, including things you may consider an important goal.

                                                  Because without it, everyone trying to improve things is isolated into their own database-specific silo, and you have seen the effect of this for the last decades: Little to no fundamental improvements in how we use or interact with databases.

                                                  1. -2

                                                    No I don’t think so.

                                              2. 7

                                                My best guess as to how to do it would be to implement an alternative but similar and principled language on top of at least two popular engines - probably drawn from the set of SQLite, Postgres, and MySQL

                                                 That’s exactly what I did with Preql, by making it compile to SQL (https://github.com/erezsh/Preql)

                                                Still waiting for the audience :P

                                                1. 3

                                                   Yeah but (a) it’s not available out of the box (b) it’s not obvious there’s a smooth upgrade path here or even that this is the language people want. Which is only somewhat of a criticism - lots of new things are going to have to be tried before one sticks.

                                                2. 2

                                                  Sql being seriously suboptimal at the thing it’s designed for; and distinctly

                                                  Sql being bad at things general purpose programming languages are good at.

                                                   Excellent point. Bucketing those concerns would make this “rant” even better! I do think that stuff falls into both buckets (though what falls into the first bucket is trivially solvable, especially with frameworks like linq or ecto, or gql overlays). The second category, though, does reflect that people do want optimization for some of those things, and it’s worth thinking about how a “replacement for sql” might want to approach them.

                                                1. 18

                                                  Does anyone else see this as a sign that the languages we use are not expressive enough? The fact that you need an AI to help automate boilerplate points to a failure in the adoption of powerful enough macro systems to eliminate the boilerplate.

                                                  1. 1

                                                    Why should that system be based upon macros and not an AI?

                                                    1. 13

                                                      Because you want deterministic and predictable output. An AI is ever evolving and therefore might give different outputs for a given input over time. Also, I realise that this is becoming an increasingly unpopular opinion, but not sending everything you’re doing to a third party to snoop on you seems like a good idea to me.

                                                      1. 3

                                                        Because you want deterministic and predictable output. An AI is ever evolving and therefore might give different outputs for given input over time.

                                                        Deep learning models don’t change their weights if you don’t purposefully update them. I can foresee an implementation where weights are kept static or updated on a given cadence. That said, I understand that for a language macro system you would probably want something more explainable than a deep learning model.

                                                        Also, I realise that this is becoming an increasingly unpopular opinion, but not sending everything you’re doing to a third party to snoop on you seems like a good idea to me.

                                                        There is nothing unpopular about that opinion on this site and most tech sites on the internet. I’m pretty sure a full third of posts here are about third party surveillance.

                                                        1. 2

                                                          Deep learning models don’t change their weights if you don’t purposefully update them.

                                                          If you’re sending data to their servers for copilot to process (my impression is that you are, but i’m not in the alpha and haven’t seen anything concrete on it), then you have no control over whether the weights change.

                                                          1. 2

                                                            Deep learning models don’t change their weights if you don’t purposefully update them.

                                                            Given the high rate of commits on GitHub across all repos, it’s likely that they’ll be updating the model a lot (probably at least once a day). Otherwise, all that new code isn’t going to be taken into account by copilot and it’s effectively operating on an old snapshot of GitHub.

                                                            There is nothing unpopular about that opinion on this site and most tech sites on the internet. I’m pretty sure a full third of posts here are about third party surveillance.

                                                            As far as I can tell, the majority of people (even tech people) are still using software that snoops on them. Just look at the popularity of, for example, VSCode, Apple and Google products.

                                                        2. 2

                                                          I wouldn’t have an issue with using a perfect boilerplate generating AI (well, beyond the lack of brevity), I was more commenting on the fact that this had to be developed at all and how it reflects on the state of coding

                                                          1. 1

                                                            Indeed it’s certainly good food for thought.

                                                          2. 1

                                                            Because programmers are still going to have to program, but instead of being able to deterministically produce the results they want, they’ll have to do some fuzzy NLP incantation to get what they want.

                                                          3. 1

                                                            I don’t agree on the macro systems point, but I do see it the same. As a recent student of BQN, I don’t see any use for a tool like this in APL-like languages. What, and from what, would you generate, when every character carries significant meaning?

                                                            1. 1

                                                              I think it’s true. The whole point of programming is abstracting away as many details as you can, so that every word you write is meaningful. That would mean that it’s something that the compiler wouldn’t be able to guess on its own, without itself understanding the problem and domain you’re trying to solve.

                                                              At the same time, I can’t deny that a large part of “programming” doesn’t work that way. Many frameworks require long repetitive boilerplate. Often types have to be specified again and again. Decorators are still considered a novel feature.

                                                              It’s sad, but at least, I think it means good programmers will have job security for a long time.

                                                              1. 1

                                                                 I firmly disagree. Programming, at least as evolved from computer science, is about describing what you want using primitive operations that the computer can execute. For as long as you’re writing from this direction, code-generating tools will be useful.

                                                                On the other hand, programming as evolved from mathematics and programming language theory fits much closer to your definition, defining what you want to do without stating how it should be done. It is the job of the compiler to generate the boilerplate after all.

                                                                1. 1

                                                                  We both agree that we should use the computer to generate code. But I want that generation to be automatic, and never involve me (unless I’m the toolmaker), rather than something that I have to do by hand.

                                                                  I don’t think of it as “writing math”. We are writing in a language in order to communicate. We do the same thing when we speak English to each other. The difference is that it’s a very different sort of language, and unfortunately it’s much more primitive, by the nature of the cognition of the listener. But if we can improve its cognition to understand a richer language, it will do software nothing but good.

                                                            1. 20

                                                              I can’t believe he didn’t mention how all the code must first be written on the whiteboard, before being pushed onto git as a jpeg.

                                                              1. 2

                                                                I’ll try to get a demo working of my idea for cast-oriented programming. I hope it will be easier to explain if I can show it working.

                                                                1. 26

                                                                  The article treats Go and Rust as on equal footing when it comes to safety. This is quite wrong, unless “safety” refers only to memory safety: Rust is far safer than Go in general, for a number of reasons: algebraic data types with exhaustive pattern matching, the borrow checker for statically preventing data races, the ability to use RAII-like patterns to free resources automatically when they go out of scope, better type safety due to generics, etc.

                                                                  Of course, both languages are safer than using a dynamically typed language. So from the perspective of a Python programmer, it makes sense to think of Go as a safer alternative.

                                                                  There may be certain dimensions along which it makes sense to prefer Go over Rust, such as the learning curve, the convenience of a garbage collector, etc. But it’s important to be honest about each language’s strengths and weaknesses.

                                                                  1. 22

                                                                    On the other hand, if you do think about memory safety, it’s not quite so clear cut. In Go, I can create a cyclic data structure without using the Unsafe package and the GC will correctly collect it for me. In Rust, I have to write custom unsafe code to both create it and clean it up. In Go I can create a DAG and the GC will correctly clean it up. In Rust, I must use unsafe (or a standard-library type such as Rc that uses unsafe internally) to create it, and clean it up.

                                                                    However, in Go I can create an object with a slice field and share it between two goroutines, then have one goroutine update that field in a loop, alternating between a small slice and a large slice, while the other goroutine reads until it observes the base pointer of the small slice paired with the length of the large slice. I now have a slice whose bounds are larger than the underlying object, and I can violate all of the memory-safety invariants without writing a single line of code using the Unsafe package. In Rust, I could not do this without incorrectly implementing the Sync trait, and you cannot implement the Sync trait without unsafe code.

                                                                    Go loses its memory safety guarantees if you write concurrent software. Rust loses its memory safety guarantees if you use non-trivial data structures. C++ loses its memory safety guarantees if you use pointers (or references).

                                                                    1. 12

                                                                      Go loses its memory safety guarantees if you write concurrent software. Rust loses its memory safety guarantees if you use non-trivial data structures. C++ loses its memory safety guarantees if you use pointers (or references).

                                                                      This is fantastically succinct. Thanks, I might use this myself ;)

                                                                      1. 6

                                                                        In Rust, I have to write custom unsafe code to both create it and clean it up

                                                                        No, you really don’t.

                                                                        You can create it with no unsafe code (outside of the standard library) and no extra tracking by using Box::leak, it will just never be cleaned up.

                                                                        You can create it with no unsafe code (outside of the standard library) and reference counted pointers by using Rc::new for forward pointers and Rc::downgrade for back pointers, and it will be automatically cleaned up (at the expense of adding reference counting).

                                                                        You can make use of various GC and GC like schemes with no unsafe code (outside of well known libraries), the most famous of which is probably crossbeam::epoch.

                                                                         You can make use of various arena data structures to do so with no unsafe code (outside of well known libraries), provided that that form of “GC all at once at the end” fits your use case, e.g. typed-arena.

                                                                        1. 3

                                                                          You can create it with no unsafe code (outside of the standard library)

                                                                           The parenthetical here is the key part. The standard-library implementations of all the things you describe use unsafe.

                                                                          1. 6

                                                                            No, it isn’t. That’s how the entire language works, encapsulate and abstract unsafe things until they are safe. To argue otherwise is to argue that every allocation is bad, because implementing an allocator requires unsafe (and the standard library uses unsafe to do so)…

                                                                            Unsafe code is not viral.

                                                                            1. 1

                                                                              Note also that the Rust Standard library has special dispensation for unsafe and unstable features – it can assume a particular compiler version, it can use unsafe code that would be unsound without special knowledge of the compiler, and it can compel the compiler to change in order to support what the stdlib wants to do.

                                                                        2. 13

                                                                          Of course, both languages are safer than using a dynamically typed language.

                                                                          I wish people would stop saying that. Especially with “of course”. We can believe all we want, but there is no data supporting the idea that dynamically typed languages are inherently less safe. Again, I don’t care why you think it should be the case. First show me that it actually is, then try to hypothesize as to why.

                                                                          1. 21

                                                                             I often find production bugs in dynamically typed systems I work on which are due to issues that would be caught by a type checker for a modern type system (e.g., null reference errors, not handling a case of an algebraic data type when a new one is added, not handling new types of errors as the failure modes evolve over time, etc.). That’s an existence proof that having a type checker would have helped with safety. And this is in a codebase with hundreds of thousands of tests and a strict code review policy with code coverage requirements, so it wouldn’t be reasonable to attribute this to an insufficient test suite.
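
                                                                             As a small illustration of the “forgotten case” kind of bug, here is what it looks like in Python with mypy standing in for a modern type checker (the types are invented for the example; assert_never needs Python 3.11 or typing_extensions):

                                                                                 from dataclasses import dataclass
                                                                                 from typing import assert_never  # Python 3.11+, or typing_extensions

                                                                                 @dataclass
                                                                                 class Card:
                                                                                     last4: str

                                                                                 @dataclass
                                                                                 class Invoice:
                                                                                     terms_days: int

                                                                                 Payment = Card | Invoice  # add a Wire variant here and...

                                                                                 def describe(p: Payment) -> str:
                                                                                     match p:
                                                                                         case Card(last4=digits):
                                                                                             return f"card ending in {digits}"
                                                                                         case Invoice(terms_days=days):
                                                                                             return f"invoice, net {days} days"
                                                                                         case other:
                                                                                             assert_never(other)  # ...mypy flags every match that forgot to handle it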

                                                                             Very large companies are migrating their JavaScript codebases to TypeScript precisely because they want the safety of static types. Having been privy to some of those discussions, I can assure you those decisions were made with a lot of consideration, given the enormous cost of doing so.

                                                                            Going down the academic route, dependent types let you prove things that tests cannot guarantee, such as the fact that list concatenation is associative for all inputs. I personally have used Coq to prove the absence of bugs in various algorithms. As Dijkstra said, “program testing can be used to show the presence of bugs, but never to show their absence”. Types, on the other hand, actually can show the absence of bugs if you go deep enough down the rabbit hole (and know how to formally express your correctness criteria).

                                                                            You don’t have to believe me, and perhaps you shouldn’t since I haven’t given you any specific numbers. There are plenty of studies, if you care to look (for example, “A Large Scale Study of Programming Languages and Code Quality in Github; Ray, B; Posnett, D; Filkov, V; Devanbu, P”). But at the same time, there are studies that claim the opposite, so for this topic I trust my personal experience more than the kind of data you’re looking for.

                                                                            1. 2

                                                                               Yet Go lacks many of those features, such as algebraic data types and exhaustive switches, and it has both nil and default values.

                                                                              1. 3

                                                                                Yes, hence my original claim that Rust is far safer than Go. But Go still does have a rudimentary type system, which at least enforces a basic sense of well-formedness on programs (function arguments being the right type, references to variables/functions/fields are not typos, etc.) that might otherwise go undetected without static type checking. Since these particular types of bugs are also fairly easy to catch with tests, and since Go programs often rely on unsafe dynamic type casting (e.g., due to lack of generics), Go is not much safer than a dynamically typed language—in stark contrast to Rust. I think one could reasonably argue that Go’s type system provides negligible benefit over dynamic typing (though I might not go quite that far), but I do not consider it reasonable to claim that static types in general are not capable of adding value, based on my experience with dependent types and more sophisticated type systems in general.

                                                                                1. 2

                                                                                  Since these particular types of bugs are also fairly easy to catch with tests, and since Go programs often rely on unsafe dynamic type casting (e.g., due to lack of generics), Go is not much safer than a dynamically typed language—in stark contrast to Rust.

                                                                                  But only <1% of Go code is dynamically typed, so why would you argue that it’s not much safer than a language in which 100% of code is dynamically typed? Would you equally argue that because some small amount of Rust code uses unsafe that Rust is no safer than C? These seem like pretty silly arguments to make.

                                                                                  In my experience writing Go and Rust (and a whole lot of Python and other languages), Go hits a sweet spot–you have significant type safety beyond which returns diminish quickly (with respect to safety, anyway). I like Rust, but I think your claims are wildly overstated.

                                                                              2. 2

                                                                                so for this topic I trust my personal experience more than the kind of data you’re looking for.

                                                                                I wonder how much of this is us as individual programmers falling into Simpson’s Paradox. My intuition says that for large, complex systems that change infrequently, static typing is a huge boon. But that’s only some portion of the total programs being written. Scientific programmers write code that’s more about transmitting mathematical/scientific knowledge and easily changeable for experimentation. Game scripters are looking for simple on-ramps for them to change game logic. I suspect the “intuitive answer” here highly depends on the class of applications that a programmer finds themselves working on. I do think there’s an aspect of personality here, where some folks who enjoy abstraction-heavy thinking will gravitate more toward static typing and folks who enjoy more “nuts-and-bolts” thinking may gravitate toward dynamic typing. Though newer languages like Nim and Julia are really blurring the line between dynamic and static.

                                                                                1. 2

                                                                                    Have you used Coq to prove the correctness of anything that wasn’t easy by inspection? I’ve looked at it and I’m definitely interested in the ideas (I’m working through a textbook in my spare time), but I’ve never used it to prove anything more complicated than, say, linked list concatenation or reversing.

                                                                                  And how do you generate the program from your code? Do you use the built-in extraction, or something else?

                                                                                  1. 2

                                                                                     I often find production bugs in dynamically typed systems I work on that are due to things that would be caught by a type checker

                                                                                     I can offer equally useless anecdotal evidence from my own practice, where bugs that would be caught by a type checker happen at a rate of about 1/50 compared to those caused by misshapen data, misunderstanding of domain complexity, and plain poor testing - and when they do occur, they’re usually trivial to detect and fix. The only thing that tells me is that software development is complex and we are far from making sweeping statements that start with “of course”.

                                                                                     Very large companies are migrating their JavaScript code bases to TypeScript exactly for that reason.

                                                                                     Sorry, the “thousand lemmings” defense won’t work here. Our whole industry has been investing countless engineer-years in OO abstractions, but then people started doing things without it and it turned out OO wasn’t a requirement for building working systems. Software development is prone to fads and over-estimations.

                                                                                    Types, on the other hand, actually can show the absence of bugs

                                                                                    That’s just plain wrong. Unless you mean some very specific field of software where you can write a formal specification for a program, but to this day it’s just not practical for anything that’s useful.

                                                                                    1. 5

                                                                                      It’s clear that many people find value in static types even if you don’t. Maybe you make fewer mistakes than the rest of us, or maybe you’re working in a domain where types don’t add as much value compared to others. But you shouldn’t try to invalidate other people’s experiences of benefitting from static types.

                                                                                      they’re usually trivial to detect and fix

                                                                                      I prefer to eliminate entire categories of errors without having to detect and fix them down the line when they’ve already impacted a user.

                                                                                      That’s just plain wrong.

                                                                                      Maybe you haven’t used formal verification before, but that doesn’t mean it isn’t used in the real world. There’s a great book series on this topic if you are interested in having a more open mind. I’ve used these kind of techniques to implement parts of a compiler that are guaranteed correct. Amazon also uses deductive techniques in multiple AWS teams (example), and there’s a decent chance you have indirectly benefitted from some of those efforts. So, my claim is not “just plain wrong”. As you alluded to, it usually doesn’t make sense to invest that much in those kinds of formal guarantees, but it’s nice that types can do that for you when you need them to.

                                                                                      At this point, it seems like you aren’t interested in having a good faith discussion, with your abrasive comments like “I don’t care why you think it should be the case”, “equally useless anecdotal evidence”, and dismissing a demonstrable claim as “just plain wrong”. I think you have some good points (e.g., I completely agree about your position on OO) and could be more effective at delivering them if you didn’t seem so invested in discounting other people’s experiences.

                                                                                      I respect your opinion that I should not have stated that Rust and Go are “of course” safer than dynamically typed languages. In particular, Go’s type system is so underpowered that I can see a reasonable argument that the ceremony of appeasing it without reaping the guarantees that a better type system would give makes it more difficult to build robust software than not having types at all. I certainly wouldn’t say the same for Rust, though. Rust often forces me to handle error cases that I didn’t even know were possible and would never think to test.

                                                                                      1. 1

                                                                                        But you shouldn’t try to invalidate other people’s experiences of benefitting from static types.

                                                                                        Go’s type system is so underpowered that I can see a reasonable argument that the ceremony of appeasing it without reaping the guarantees that a better type system would give makes it more difficult to build robust software than not having types at all.

                                                                                        Do you not think this is invalidating the experience of Go users who benefit from usage of the language?

                                                                                        1. 1

                                                                                          I said I could see it as a reasonable argument, not that I personally agree with it. I’m trying to give some validity to what isagalaev is saying and potentially meet them in the middle by acknowledging that not all type systems provide a clear benefit over dynamic typing. But I already stated my stance in my original top-level comment: that Go’s type system is still better than no type system when it comes to safety (not memory safety, but a more broad notion of safety).

                                                                                          It is true, though, that Go’s type system is quite weak compared to other type systems. That’s not the same as saying that people don’t benefit from it. On the contrary, I’ve claimed the opposite—which is what started this whole discussion in the first place.

                                                                                        2. 1

                                                                                          abrasive comments

                                                                                          Apologies on that, fwiw.

                                                                                        3. 2

                                                                                           […] caused by misshapen data, […]

                                                                                           Statements like these always make me suspect the author doesn’t appreciate just how much can in fact be captured by even a relatively simple type system. “Make illegal states unrepresentable” has become something of a mantra in the Elm community in particular, and I can’t remember the last time I saw a bug in a typed FP language that was due to “misshapen data” get past the type checker.

                                                                                           I think there’s a tendency to try to compare languages by lifting code wholesale from one language into the other, assuming it would be written more or less the same way, which is often not the case. So, if you see something that throws TypeError, of course you assume that would get caught by a static type system. Folks who have only worked with type systems like those in Java/Go/C generally look at null/nil bugs and assume that those wouldn’t get caught, even though they’re impossible in other systems. It’s easy to point out “null isn’t a thing in this language,” but what’s a bit harder to capture is that a lot of things that aren’t obviously type errors that crop up at runtime in a dynamically typed language would likely be captured by types in a program written with the benefit of a modern type system. Obviously it won’t if you just write a stringly-typed program, but… don’t do that, use your tools.
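
                                                                                           For what it’s worth, the “make illegal states unrepresentable” idea carries over even to gradually typed Python; a tiny sketch with invented types:

                                                                                               from dataclasses import dataclass

                                                                                               # One record with optional fields lets a "pending" order carry a tracking
                                                                                               # number; separate types per state make that combination unconstructible.

                                                                                               @dataclass
                                                                                               class Pending:
                                                                                                   placed_by: str

                                                                                               @dataclass
                                                                                               class Shipped:
                                                                                                   tracking_number: str

                                                                                               @dataclass
                                                                                               class Cancelled:
                                                                                                   reason: str

                                                                                               Order = Pending | Shipped | Cancelled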

                                                                                          1. 1

“Make illegal states unrepresentable” has become something of a mantra in the Elm community in particular, and I can’t remember the last time I saw a bug in a typed FP language that was due to “mis-shapen data” get past the type checker.

It’s not about illegal states. I mean code expecting a response from an HTTP call in a particular schema and getting a different one. I don’t see how this problem can be prevented at compile time. Or, more subtly, getting data in a correct shape (say, an ISO-formatted time string), successfully parsing it (into an internal datetime type), but then producing a result the user doesn’t expect because the code assumes the time is in a different time zone.

                                                                                            (Also, I don’t see how making illegal states unrepresentable is in any way endemic to type-checked languages. It’s just a good architectural pattern valid everywhere.)

                                                                                            1. 1

                                                                                              I mean code expecting a response from an HTTP call in a particular schema and getting a different one.

                                                                                              Ah, you are talking about something somewhat different then. Yes, obviously types can’t statically prove that inputs that come in at runtime will be well-formed. However, they can force you to deal with the case of a parse failure – which many languages make it really easy to forget. “Forgetting a case” is another one of those things that I think people often incorrectly assume aren’t (or can’t easily be made) type errors. It’s hard to say what you should do in that case without more context, but it makes it hard to introduce a bug by omission.
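As a small invented mypy sketch (not anyone’s real code): a parser that can fail returns an Optional, and the checker refuses to let you use the result until the None case has been handled.

from typing import Optional

def parse_port(raw: str) -> Optional[int]:
    return int(raw) if raw.isdigit() else None

def connect(raw_port: str) -> int:
    port = parse_port(raw_port)
    # Using `port` as an int here without a check is rejected by mypy,
    # so "forgot the failure case" can't slip through silently.
    if port is None:
        raise ValueError(f"not a valid port: {raw_port!r}")
    return port  # narrowed to int from here on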

                                                                                              If the bug is just that the programmer was mistaken about what the endpoint’s schema was (or the server operator changed it inappropriately), I’ll agree that just having static types in your client program does not really help that much, though it might be a bit easier to track down since the error will occur right at the point of parsing, rather than some time later when somebody tries to use the value.

                                                                                              That said, I’ll point to stuff like protobuf, capnproto, and even swagger as things that are trying to bridge this gap to some degree – there is still an opportunity to just assign the entirely wrong schema to an endpoint, but they narrow the space over which that can happen substantially; once you’ve got the right schema rigged up the programmer is unlikely to get the shape of the data wrong, as that’s just defined in the schema.

                                                                                              Or more subtly, getting data in a correct shape (say, an ISO-formatted time string), successfully parsing it (into an internal datetime type), but then producing the result that the user doesn’t expect because it assumes the time to be in a different time zone.

                                                                                              Dealing with fiddly distinctions like this is something types are really great at. I have some good points of comparison with date/time stuff, as I’ve worked on projects where this stuff is core to the business logic in both Python and Haskell. Having a type system for it in Haskell has been a godsend, and I wish I’d had one available when doing the prior project.

Somewhat simplified for presentation (but with the essentials intact), the Haskell codebase has some types:

                                                                                              • A date & time with an attached (possibly arbitrary) time zone, “ZonedDateTime”
                                                                                              • A date & time in “local time,” where the timezone is implied by context somehow. “LocalDateTime”

As an example of where this is useful: the code that renders the user’s feed of events in order expects a list of LocalDateTime values, so if you try to pass it a datetime with some arbitrary timezone, you’ll get a type error. Instead, there’s a function timeInZone which takes a ZonedDateTime and a TimeZone, and translates it to a LocalDateTime with the provided timezone implied. So in order to get an event into a user’s feed, you need to run it through this conversion function, or it won’t type check.
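A rough Python analogue of that shape (my own sketch, not the Haskell code being described) would be two distinct wrapper types plus a single conversion function, so a feed renderer annotated to accept only the local form rejects zoned values under mypy:

from dataclasses import dataclass
from datetime import datetime, tzinfo
from typing import List

@dataclass(frozen=True)
class ZonedDateTime:
    instant: datetime  # an aware datetime carrying its own zone

@dataclass(frozen=True)
class LocalDateTime:
    wall_time: datetime  # naive; the zone is implied by surrounding context

def time_in_zone(t: ZonedDateTime, zone: tzinfo) -> LocalDateTime:
    # Convert into the target zone, then drop the explicit tag.
    return LocalDateTime(t.instant.astimezone(zone).replace(tzinfo=None))

def render_feed(events: List[LocalDateTime]) -> None:
    ...  # passing ZonedDateTime values here is a type error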

                                                                                              (Also, I don’t see how making illegal states unrepresentable is in any way endemic to type-checked languages. It’s just a good architectural pattern valid everywhere.)

                                                                                              It’s a lot easier to do it when you can actually dictate what the possible values of something are; if in the event of a bug a variable could have any arbitrary value, then your options for enforcing invariants on the data are much more limited. You can put asserts everywhere, but having static types is much much nicer.

                                                                                      2. 6

                                                                                        Also, people talk about “dynamic languages” like they’re all the same, while they are often as different as C and Rust, if not more.

Writing safe JavaScript is an impossible nightmare.

Writing safe Python is easy and fun. I do think it would be safer with a good type system, but at the same time, shorter and more idiomatic code (where you don’t have to fight the compiler) brings its own sort of safety and comfort.

                                                                                        1. 4

                                                                                          Writing safe Python is easy and fun.

                                                                                          Python has its own share of footguns, such as passing strings somewhere where bytes are expected, or new, unhandled exceptions being added to libraries you’re calling.
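The str/bytes seam is a good example; a line like the following only blows up at runtime in plain Python, while an annotated version is flagged statically (illustration only):

payload: bytes = b"GET / HTTP/1.1\r\n"
header = "Host: example.com\r\n"  # a str, not bytes
request = payload + header  # runtime: TypeError: can't concat str to bytes; mypy flags it before the code runs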

                                                                                          Mypy doesn’t completely protect you. IME, many errors occur at the seams of typed/untyped contexts. Which is not surprising, but it is a downside of an after-the-fact, optional type checker.

                                                                                          1. 1

                                                                                            Yeah, and mypy has far less coverage for third-party packages than TypeScript. IME when I use TS I’m surprised if I don’t find a type package, whereas with mypy I’m surprised if I do.

                                                                                            1. 1

About one in twenty times, the types package I find in DefinitelyTyped is very incorrect, almost always for libraries with small user bases.

                                                                                              1. 1

                                                                                                Incorrect as in “this parameter is a number but is typed as a string”, or too loose/too restrictive?

                                                                                                1. 1

                                                                                                  Varies. I’ve seen symbols typed as strings, core functionality missing, functions that couldn’t be called without “as any”.

                                                                                            2. 1

Python isn’t perfect, but unhandled exceptions can happen in C++ too.

                                                                                              It doesn’t have to be after the fact. You can check types at run-time, and there are libraries that will help you with that. It comes at a slight performance cost, of course (but if that matters, why are you using Python?), but then you gain the ability to implement much more sophisticated checks, like contracts, or dependent types.
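A trivial illustration of what run-time checking can look like (a hand-rolled sketch of the general idea, not any particular library’s API):

import inspect
from functools import wraps

def check_types(func):
    """Check annotated arguments against plain class annotations at call time."""
    sig = inspect.signature(func)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            annotation = func.__annotations__.get(name)
            if isinstance(annotation, type) and not isinstance(value, annotation):
                raise TypeError(
                    f"{name} must be {annotation.__name__}, got {type(value).__name__}"
                )
        return func(*args, **kwargs)

    return wrapper

@check_types
def pay(amount: int, currency: str) -> None:
    ...

pay(100, "USD")    # fine
pay("100", "USD")  # TypeError raised at the call site, not deep inside the function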

                                                                                              Anyway, at least personally, type errors are rarely the real challenge when writing software.

                                                                                          2. 3

                                                                                            there is no data supporting the idea that dynamically typed languages are inherently less safe

If we use Gary’s Types document for the definition of a dynamically typed language, I am able to find some research:

I do agree that this amount of research is far from enough for us to be able to say “of course”.

                                                                                            1. 3

                                                                                              Not gonna take a position here on the actual static vs dynamic debate, but the second paper you linked is deeply flawed. I wrote a bit about the paper, and the drama around it, here: https://www.hillelwayne.com/post/this-is-how-science-happens/

                                                                                              1. 1

                                                                                                Awesome! Thank you for sharing.

                                                                                            2. 2

I think it stands to reason that statically typed languages are safer than dynamically typed languages. I’m vaguely aware of some studies from <2005 that compared C++ to Python or some such and found no significant difference in bugs, but I can’t imagine those findings would hold for modern mainstream statically typed languages (perhaps not even C++>=11). Personally I have extensive experience with a wide array of languages, and my experience suggests that static typing is unambiguously better than dynamic typing; I hear the same thing from so many other people, and indeed even the Python maintainers, including GVR himself, find it compelling. Experience aside, it also stands to reason that languages in which entire classes of errors are impossible would have fewer errors in total.

                                                                                              So while perhaps there isn’t conclusive data one way or the other, experience and reason seem to suggest that one is better than the other.

                                                                                              1. 1

                                                                                                Experience aside, it also stands to reason that languages in which entire classes of errors are impossible would have fewer errors in total.

What this (very common) argument is missing is that a) those errors are not as frequent as the words “entire classes of errors” may lead one to believe, that b) testing covers many of those errors better (by testing for correct values you’re getting correct types for free), and that c) instrumenting your code with types isn’t free: you get bloated, more rigid code that may lead to more serious errors in modeling your problem domain.

                                                                                                1. 4

                                                                                                  I’ve personally found that relying on a decent type system makes my code more flexible, as in easier to refactor, because I can rely on the type system to enforce that all changes are valid.

                                                                                                  1. 1

                                                                                                    I’ve developed and operated Python services in production for about a decade. Type errors were the most common kind of errors we would encounter by a wide margin. We were very diligent about writing tests, but inevitably we would miss cases. Some of these were “this function is literally just an entry point that unmarshals JSON and passes it into a library function… I don’t need a test” but they would forget to await the library function.
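To make the “forgot to await” failure mode concrete (an invented example, not the actual code in question):

import asyncio

async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0)  # stands in for real I/O
    return {"id": user_id}

async def handler(user_id: int) -> str:
    user = fetch_user(user_id)  # BUG: missing `await`
    # At runtime: TypeError ('coroutine' object is not subscriptable) plus a
    # "coroutine was never awaited" warning. Mypy flags it statically,
    # because `user` is a Coroutine, not a dict.
    return str(user["id"])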

Moreover, how do you reconcile “annotating your code takes too much time and makes your code too inflexible, but writing tests is good value for money”? Am I misunderstanding your argument? Note also that types are useful for lots of other things, like documentation, optimizations, refactoring, IDE autocomplete, etc.

                                                                                                    1. 1

                                                                                                      Moreover, how do you reconcile “annotating your code takes too much time and makes your code too inflexible, but writing tests is good value for money”?

Types are not a replacement for tests; you have to write tests anyway. And tests are good value for money, which is one of the few things we actually know about software development. So the proposition I’m making is that if you write good tests they should cover everything a type checker would. Because, essentially, if you check that all your values are correct, then it necessarily implies that the types of those values are correct too (or at least, they work in the same way).

                                                                                                      Now, to your point about “but inevitably we would miss cases” — I’m very much aware of this problem. I blame the fact that people write tests in horribly complicated ways, get burnt out, never achieve full test coverage and then turn to type checking to have at least some basic guarantees for the code that’s not covered. I’m not happy with this, I wish people would write better tests.

Your example with a test for a trivial endpoint is very telling in this regard. If the code is trivial, then writing a test for it should be trivial too, so why not?

                                                                                                      1. 1

                                                                                                        I disagree. In my extensive experience with Python and Go (15 and 10 years, respectively), Go’s type system grants me a lot more confidence even with a fraction of the tests of a Python code base. In other words, a type system absolutely is a replacement for a whole lot of tests.

                                                                                                        I specifically disagree that checking for a value guarantees that the logic is correct (type systems aren’t about making sure the type of the value is correct, but that the logic for getting the value is sound).

While 100% test coverage would make me pretty confident in a code base, why burn out your engineers in pursuit of it when a type system would reduce the testing load significantly?

                                                                                                        With respect to “trivial code thus trivial tests”, I think this is untrue. Writing “await foo(bar, baz)” is trivial. Figuring out how to test that is still cognitively burdensome, and cognition aside it’s many times the boilerplate.

                                                                                                        Lastly, people rarely discuss how static type systems make it harder to write certain kinds of bad code than dynamic type systems. For example, a function whose return type varies depending on the value of some parameter. The typical response from dynamic typing enthusiasts is that this is just bad code and bad Go code exists too, which is true but these kinds of bad code basically can’t exist in idiomatic Go code and they are absolutely pedestrian in Python and JavaScript.

                                                                                                        At a certain point, you just have to get a lot of experience working with both systems in order to realize that the difference is really quite stark (even just the amount of documentation you get out of the box from type annotations, and the assurance that that documentation is correct and up to date).

                                                                                                        1. 1

                                                                                                          I specifically disagree that checking for a value guarantees that the logic is correct

                                                                                                          I didn’t say anything about the whole logic. I said if your values are correct then it necessarily implies your types are correct. Specifically, if you have a test that does:

                                                                                                          config = parse_config(filename)
                                                                                                          assert config['key'] == 'value'
                                                                                                          

Then it means that parse_config got a correct value in filename that it could use to open and parse the config file. In which case it also means filename was of the correct type: a string, a Path, or whatever the language’s stdlib can use in open(). That’s it, nothing philosophical here.

                                                                                                          While 100% test coverage would make me pretty confident in a code base, why burn out your engineers in pursuit of it

Let me reiterate: if achieving 100% test coverage feels like burnout, you’re doing it wrong. It’s actually not even hard, especially in a dynamic language where you don’t have to dependency-inject everything. I’m not just fantasizing here; that’s what I did in my last three or four code bases whenever I was able to sell people on the idea. There’s this whole ethos of it being somehow impossibly hard which in many applications just doesn’t hold up.

                                                                                                          1. 1

                                                                                                            Some more :-)

                                                                                                            Writing “await foo(bar, baz)” is trivial. Figuring out how to test that is still cognitively burdensome, and cognition aside it’s many times the boilerplate.

                                                                                                            Huh? The tooling is out there, you don’t have to invent anything: https://pypi.org/project/pytest-asyncio/

import pytest
import library  # the async module under test

@pytest.mark.asyncio
                                                                                                            async def test_some_asyncio_code():
                                                                                                                res = await library.do_something()
                                                                                                                assert b'expected result' == res
                                                                                                            

                                                                                                            At a certain point, you just have to get a lot of experience working with both systems in order to realize that the difference is really quite stark

Just to clarify here, I mostly work in Python, but I also work extensively in JavaScript, TypeScript, Kotlin and Rust (much less in the latter than I would like). And my experience tells me that types are not the most significant feature that makes a language safe (for whatever value of “safe”). It is also subjective. I do absolutely trust you that you find working in Go more comfortable, but it’s important to understand that the feeling doesn’t have to be universal. I would hate to have to program in Go, even though it’s a simple language.

                                                                                                  2. 1

Genuine question: how could it be shown? You would need at least two projects of similar scope, in similar areas, written by programmers of similar skill (which is hard to evaluate on its own) and with a similar level of understanding of the problem area (which means that rewrites of the same code base can’t count), differing only in their choice of static/dynamic typing. How could such research be possible?

More generally: is there any solid research about which languages are better? Is there any data showing that programs written in assembly language are more or less error-prone than those written in Python? This should be intuitively obvious, but is there data? I tried to find anything at all, but only discovered half-baked “studies” that don’t control for either programmer experience or the complexity of the problem area.

                                                                                                    My point is, how can we do better than a bunch of anecdotes here?

                                                                                                    1. 3

                                                                                                      Right, I think one can admit that, as a field, we’re not in amazing shape epistemologically, but we are still left having to actually make decisions in our day to day, so not having opinions isn’t really an option – we’re stuck going on experience and anecdote, but that’s not quite the same as having no information. It’d be nice to have conclusive studies. Unfortunately, all I’ve got is a career’s worth of anecdata.

                                                                                                      I have no idea how to study this kind of thing.

                                                                                                      1. 3

Yep, this was exactly my point: when someone asks for studies that conclusively show that X is better than Y in this context, I think that they are asking for too much, and people are justified in saying “of course” even in the absence of rock-solid evidence.

                                                                                                      2. 2

                                                                                                        I would go so far as to argue that, often, when people insist on data for something like this, they are actually being disingenuous. If you insist on (quantitative) data when you know that none exist or are likely to exist in the future, then you are actually just saying that you want to unquestioningly maintain the status quo.

                                                                                                        1. 2

                                                                                                          I would go so far as to argue that, often, when people insist on data for something like this, they are actually being disingenuous.

                                                                                                          … Why would it be disingenuous to ask for data? This just sounds absurd. Moreover there really isn’t consensus around this topic. Take a look at Static v. dynamic languages literature review, this has been an ongoing topic of discussion and there still isn’t a conclusion either way. Regardless this perspective frightens me. It sounds a lot like “I have an opinion and data is hard so I’m going to call you disingenuous for disagreeing with me.” This isn’t the way to make good decisions or to tolerate nuanced opinions.

“The problem with any ideology is it gives the answer before you look at the evidence. So you have to mold the evidence to get the answer you’ve already decided you’ve got to have.” – Bill Clinton

                                                                                                          1. 2

                                                                                                            Why would it be disingenuous to ask for data?

                                                                                                            It isn’t, and I didn’t say that it was.

                                                                                                            I said it was potentially disingenuous to insist on (quantitative) data as a prerequisite for having a discussion. If good data don’t exist, then refusing to talk about something until good data do exist is just another way of defending whatever the status quo is. What it basically says is that regardless of how we made the current decision (presumably without data, since data don’t exist), the decision cannot be changed without data.

                                                                                                            I’m honestly not sure how you came up with that interpretation based on what I wrote. I didn’t even say which side of the issue I was on.

                                                                                                            Edit: you can also substitute “inconclusive data” for “no data”.

                                                                                                            1. 2

                                                                                                              I’m honestly not sure how you came up with that interpretation based on what I wrote. I didn’t even say which side of the issue I was on.

I think this is a difference between our interpretations of “insist”. I tend to read “insist” as an earnest suggestion, not a hard prerequisite, so that’s where my disagreement came from. I didn’t mean to imply anything about which side you were on since that’s immaterial to my point really. I agree that categorically refusing to discuss without sufficient data is a bit irresponsible since in real life humans are often forced to make decisions without appropriate evidence.

                                                                                                              If good data don’t exist, then refusing to talk about something until good data do exist is just another way of defending whatever the status quo is.

                                                                                                              My thinking here is, at what point does this become a useless thought exercise? Static typing isn’t new and is gaining ground in several languages. There’s already a “programmer personality” identity based around static typing and “healthy” Twitter communities of folks who bemoan static or dynamic languages. At some point, the programming community at large gains nothing by having more talking heads philosophizing about where and why they see bugs. You can take a cursory search on the internet and see folks advocating for pretty much any point on the spectrum of this debate. To me this discussion (not this Lobsters thread, but the greater discussion as a whole) seems to have reached the point where it’s useless to proceed without data because there’s no consensus around which point on the spectrum of static/dynamic does actually lead to fewer (if any point does at all) bugs. And if more philosophizing doesn’t help us arrive at a conclusion, it really boils down to the same thing: your personal feelings and experiences, in which case the discussion is more of a form of socializing than a form of actual discussion. In other words, without data, this discussion trends more toward bikeshedding than actually answering the question under discussion.

                                                                                                              1. 2

                                                                                                                In other words, without data, this discussion trends more toward bikeshedding than actually answering the question under discussion.

                                                                                                                That’s fair. I agree that this particular topic has been pretty well discussed to death.

                                                                                                                1. 1

                                                                                                                  there’s no consensus around which point on the spectrum of static/dynamic does actually lead to fewer (if any point does at all) bugs

                                                                                                                  I think consensus is emerging slowly—static languages seem to have grown more popular in the last decade to the extent that many JavaScript developers are converting to TypeScript, Python developers are embracing Mypy, most (all?) of the most popular new languages of the last 10-15 years have been statically typed (the Go community in particular seems to consist of a lot of former Python and Ruby devs), etc. On the other hand, I scarcely if ever hear about people switching to dynamically typed languages (once upon a time this was common, when the popular static offerings were C, C++, and Java). It’s possible that this emerging consensus is just a fad, but things do seem to be converging.

                                                                                                              2. 1

                                                                                                                I suspect the problem is that this question is enormously multivariate (skill of developers, development methodology, testing effort to find bugs, different language features, readability of the language, etc).

It’s entirely likely that we have been studying this for a long time and yet the variable space is so large that we’ve hardly scratched it at all. And then some people come along and interpret this lack of conclusive data as “well, static and dynamic must be roughly equal”, which seems strictly more perilous than formulating one’s own opinion based on extensive experience with both type systems.

                                                                                                                I don’t think your Clinton quote applies because we’re talking about forming an opinion based on “a career’s worth of anecdata”, not letting an ideology influence one’s opinion. Everyone admits that anecdata is not as nice as conclusive empirical data, but we don’t have any of the latter and we have lots of the former and it seems to point in a single direction. In this case the ideological take would be someone who forms an opinion based on lack of evidence and lack of subjective experience with both systems.

                                                                                                            2. 1

                                                                                                              Genuine question: how could it be shown? You would need at least two projects of similar scope, in similar areas, written by programmers of similar skill (which is hard to evaluate on its own) and similar level of understanding of the problem area (which means that rewrites of the same code base can’t count), differing only in their choice of static/dynamic typing. How could such a research be possible?

Remember that “project scope”, “project area” (systems, web, database, etc.), and “license status” (GPLv3, proprietary, etc.) are all dimensions of a project that can be recorded and analyzed. There is a rich literature of statistical methods for compensating for certain dimensions being over- or underrepresented, and for incomplete data. If we can come up with a metric for being error-prone (which is difficult, so perhaps we need multiple metrics; code complexity metrics are a good place to look to see the challenges), and we can faithfully record the other dimensions of projects, we can try to rigorously answer this question. The big barriers here usually involve data siloing (proprietary projects rarely share relevant data about their projects) and just manpower (how many developers, especially of open source projects, really have the time to also gather stats about their contributors and their bugs when they can barely hold the project itself together in their generous free time, or can get approval to do so in a proprietary project?).

                                                                                                              That said there’s this stubborn “philosopher-programmer” culture in programming circles that doesn’t seem particularly interested in epistemological work which also often muddles the conversation, especially if the topic under discussion has a lot of strong opinions and zealotry involved.

                                                                                                              1. 1

The short answer is, I don’t know. I had some ideas though. Like an experiment where you solicit public participation from a bunch of people to write from scratch something not complicated, and yet not trivial. Like, I don’t know, a simple working chat client for an existing reference server implementation. You don’t impose any restrictions: people could work in any language, use libraries, etc. Time capped at, say, a couple of weeks. And then you independently verify the results and determine defects of any sort. And then you look at correlations: does the number of defects correlate with dynamic/static nature? Programmer’s experience? Programmer’s experience with a language? Amount of active working hours? Something else?

My hypothesis is that we can’t actually evaluate a language in a vacuum. Instead, a language+programmer pair is actually the atomic unit of evaluation. Like in auto racing, you can’t say which driver is the best (although it’s people’s favorite pastime); you can only talk about a driver in a particular car driving on particular tracks.

                                                                                                                1. 2

                                                                                                                  There is a 1994 paper called “Haskell vs. Ada vs. C++ an Experiment in Software Prototyping Productivity” which doesn’t touch on static vs dynamic typing, but, I think, follows about the same format as what you’re proposing, right? That paper is widely disliked due to its methodological problems. An example of its discussion: https://news.ycombinator.com/item?id=14267882

                                                                                                              2. 1

                                                                                                                We can believe all we want, but there is no data supporting the idea that dynamically typed languages are inherently less safe

                                                                                                                I think that some people tend to use the term “safe” to mean both “memory-safe” and “bug-resistant”, whereas others would use the term “safe” to refer to “memory-safe” only.

                                                                                                                I can quite believe that applications written in dynamically-typed languages might be vulnerable to a whole class of bugs that aren’t present in their statically-typed equivalents because, unless you’re very careful, type coercion can silently get you in ways you don’t expect. You could write an entire book on the many mysterious ways of the JavaScript type system, for example.

                                                                                                                That said, these aren’t really bugs that make the program unsafe if you’re talking about memory safety. It’s not terribly likely that these sorts of bugs are going to allow unsafe memory accesses or cause buffer overflows or anything of the sort.

                                                                                                                1. 2

                                                                                                                  applications written in dynamically-typed languages might be vulnerable to a whole class of bugs [ … ] , type coercion can silently get you in ways you don’t expect.

Dynamic languages are not just JavaScript. For example, Python and Clojure don’t do type coercion, nor do they silently swallow access to non-existing names and attributes.
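For instance, in Python (a quick illustration):

"1" + 1          # TypeError: can only concatenate str (not "int") to str
{}["missing"]    # KeyError, not a silent undefined
object().nope    # AttributeError: 'object' object has no attribute 'nope'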

                                                                                                              3. 4

                                                                                                                Of course, both languages are safer than using a dynamically typed language. So from the perspective of a Python programmer, it makes sense to think of Go as a safer alternative.

                                                                                                                Returns also diminish. Using Go rather than Python will probably reduce type errors by 95% while Rust would reduce them by 99%. And that additional 4% of type errors may not be worth the hit to productivity (yes, I assert that Go is quite a lot more productive than Rust for most applications, even though I am a fan of both languages). Note also that these are just type errors; there are lots of errors which neither Rust nor Go can protect against.

                                                                                                                1. 5

                                                                                                                  there are lots of errors which neither Rust nor Go can protect against.

                                                                                                                  “How does your language handle null references?”

                                                                                                                  Prohibited by the type system at compile time.

                                                                                                                  “Nice, nice. What about out-of-bounds array accesses?”

                                                                                                                  Sometimes detectable even at compile time and at any rate detected and safely handled at runtime.

                                                                                                                  “Wonderful. So obviously then if your allocator reports that you’ve run out of memory…”

                                                                                                                  Instant, unrecoverable crash, yes.

                                                                                                                  1. 2

                                                                                                                    I’m not sure how that relates to my “there are lots of errors which neither Rust nor Go can protect against.” statement that you’re quoting. Yes, those are categories of errors that Rust protects against and Go does not, but there are still lots of other errors that neither language protect against.

                                                                                                                    1. 1

                                                                                                                      My point is that one of the errors that neither protect you against is out-of-memory errors, which has always baffled me. Rust doesn’t even panic (which could be recovered), but aborts.

                                                                                                                      OOM is much more often treated as a situation where it’s seemingly okay to absolutely crash compared to other resource-exhaustion situations (nobody would be like “oh the disk is full, let’s just crash and not even attempt to let the programmer deal with it”).

                                                                                                                      1. 2

                                                                                                                        I don’t know the rationale for this in Rust, but I’m aware that there’s been some discussion of this in the C++ standards committee. Gracefully handling out-of-memory conditions sounds really useful, but there are two problems:

• In several decades of the C++ specification defining exception behaviour for operator new when it exhausts memory, and even longer of C defining malloc as returning NULL when allocation fails, there are no examples of large-scale systems outside of the embedded / kernel space that gracefully handle memory exhaustion in all places where it can occur. Kernels generally don’t use the standard may-fail APIs and instead use two kinds of allocations, those that may block and those that may fail, with the vast majority of uses being the ones that can block.
                                                                                                                        • Most *NIX systems deterministically report errors if you exhaust your address space (which is not easy on a 64-bit system) but don’t fail on out-of-memory conditions at the allocation point. They will happily report that they’ve allocated memory but then fault when you try to write to it.

                                                                                                                        If you do get an out-of-memory condition, what do you do? If you’re disciplined and writing a very low-level system, then you do all of your allocation up-front and report failure before you’ve tried any processing. For anything in userspace, you typically need to do a load of cleanup, which may itself trigger allocation.

In general, the set of things for which it is possible to gracefully handle allocation failure is so distinct from everyday programming that it’s difficult to provide a generic mechanism that works for both. This is why malloc(3) and malloc(9) are such different APIs.

                                                                                                                        1. 2

                                                                                                                          Part of the trouble is that in most languages, it’s really hard to actually do much of anything without further allocation. Especially in languages where allocation can happen implicitly, this really does seem like an “above the program’s pay grade” kind of thing.

                                                                                                                          That said, Rust is decidedly not one of those languages; this is an API design choice, and it has indeed often felt like an odd one to me.

                                                                                                                1. 2

If actor models aren’t OO, then what is? Naming an abstraction and sending its instances polymorphic messages? It doesn’t get more OO than that.

                                                                                                                  1. 3

                                                                                                                    Actor models are “OO done right” as opposed to “popular OO” seen in C++ and Java.

                                                                                                                    1. 1

I read “actors are OO, but people don’t call them that” as proof that OOP as a phrase is in decline (paraphrasing).

                                                                                                                      1. 1

                                                                                                                        I feel like most of the time, when people say OO they really just mean Java.

                                                                                                                        Python is OO and has had lambdas and closures for a long time, and that doesn’t make it any less OO. It’s good that we have different paradigms working together, instead of trying to beat each other.

                                                                                                                        1. 2

                                                                                                                          Python is OO and has had lambdas and closures for a long time

Smalltalk was the language created by the person who coined the term ‘object oriented’ to embody that style of programming. Closures and message passing (for method invocation) were the only forms of control flow that it had. An if statement in Smalltalk is implemented by sending an ifTrue: or ifFalse: message with a closure as its argument to an object (typically a boolean object), which then either invokes the closure or doesn’t, depending on its internal value. Loops are built in a similar way.

                                                                                                                      1. 2

Still working on the eloquent Python to JavaScript transpiler. I integrated it with pyanalyze so I can follow the call graph and fix the calling convention to use object packing/unpacking where keyword arguments are being used (as JavaScript doesn’t support keyword arguments). This week I’ll probably keep adding to the integration, so I can translate operations correctly based on type (for example, += needs to be translated differently depending on whether the operand is an int or a list).
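For instance (my own illustration of why the type matters, not the transpiler’s actual output): for lists, Python’s += mutates in place, while for ints it rebinds the name, so the generated JavaScript has to differ:

a = [1, 2]
alias = a
a += [3]   # in-place extend: alias is now [1, 2, 3] too

n = 1
m = n
n += 1     # rebinding: m is still 1

So a faithful translation needs something like an in-place push for the list case, but a plain reassignment for the numeric case.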

                                                                                                                        1. 4

Working on converting Lark from Python to JavaScript by writing my own transpiler.

                                                                                                                          Just as an anecdote, it’s able to automatically convert this:

                                                                                                                          x = [(x,y) 
                                                                                                                            for x in range(10)
                                                                                                                            for y in range(x)
                                                                                                                            if x+y==5
                                                                                                                          ]
                                                                                                                          

                                                                                                                          To this:

                                                                                                                          let x = [].concat(
                                                                                                                            ...range(10).map((x) =>
                                                                                                                              range(x)
                                                                                                                                .filter((y) => x + y == 5)
                                                                                                                                .map((y) => [x, y])
                                                                                                                            )
                                                                                                                          );
                                                                                                                          
                                                                                                                          1. 3

Do you walk the ast? Which language is the transpiler implemented in? Looks like a really fun project! If you’re willing to bother with a basic test suite and a license, I’d be interested in helping. Just for fun. Totally understand if you don’t :-)

                                                                                                                            1. 2

Hi! I’m writing it in Python, using Lark to parse the Python code in a way that preserves the comments and keeps track of line ranges. I walk the resulting ast once to make structural transformations, and again to convert the code. I’ll probably add an analysis layer soon, because the code needs to change according to the types.

                                                                                                                              I put it here on github: https://github.com/erezsh/py2js

But because I was writing it for myself, the code is a little hacky. If needed, I’m willing to refactor it a little to make it easier to work with (for example, to use AST classes instead of Tree instances).

                                                                                                                              Let me know if you’re still interested in helping :)

                                                                                                                              1. 1

Are you not using the Python ast module? Do you emit a JS ast or source code directly?

                                                                                                                                1. 3

                                                                                                                                  I started using the ast module, but decided against it because it throws away the comments.

                                                                                                                                  The code currently emits JS source code directly.

                                                                                                                          1. 1

                                                                                                                            I made Preql thread-safe, and I’m going to add syntax for column indexing.