Threads for tel


    until I can feed it my entire monorepo codebase as context it’s not going to be able to replace me. that said, that could happen a lot sooner than I expect.


      I don’t think humans grok the entire monorepo either. We typically work with a set of interfaces from the monorepo and the standard library. If GPT’s context window is large enough for the interfaces from the monorepo, that’s probably enough.

      EDIT: a big corporation might just fine-tune the model on their monorepo. I wonder if OpenAI has already done that with their own code.


        This has already happened. There are multiple ways you could load your code base into an LLM today if you wanted.


          I’m curious, how would you do that?


            Depends on what you want to do. If you want to generate code from scratch, then you can take a curated selection of your best procedures/classes and use them as context. If you want to search your code base (say, to curate a selection of good procedures/classes) then you can parallelize the search over each module and then summarize the search results. This model is a decent starting point for code completion.


            Yeah, you can probably train the LLM over all the code, the documentation, all the dependencies (including the tooling, the runtime and the APIs) and their documentations as well as all the existing issues and Slack messages, customer emails and support tickets and see what comes out of all that…

            I think it would still require a lot of supervision to write code that will go to production, but it could really really speed up feature development.


              Our codebase is 15 years old and we still need to change things in the oldest code, but we definitely don’t want to write code like that anymore. Bits of code are updated boy-scout-rule style, one merge request at a time. LLMs have no sense of time in that sense. Even if all the code they write is correct, it’ll be eclectic.

        1. 10

          About AI and fake news, my astrological prediction is that we will see a burst of these, and then they die down. Because let’s face it: we were always able to fake a lot of things that are taken for fact in our daily lives, like IDs, signatures, or voices, without any AI. It just seemed much more unlikely, or people didn’t care as much. And faking news isn’t new either*. It was just less likely to be made by your friendly basement neighbor. If anything, I hope it will create a little more awareness that you shouldn’t start a hate mob just because someone somewhere said that person X did Y.

          ( I always forget that I can’t post more than once, so I end up editing my posts to add more content about different topics )

          *In Germany a news outlet called “Bild” (EN: Picture) is well known for twisting headlines and framing stories so drastically that it is little more than a propaganda machine. In the same way, Facebook was accused of showing people more and more inflammatory posts, because that is what creates views and clicks. Or various documented things, like the iPhone rumors about “features” such as “wireless charging via microwaving”, which people believed. (For more political examples, look up Wikipedia’s lists of (known) false claims made by different states during various wars.)

          “No Way To Prevent This,” Says Only Nation Where This Regularly Happens

          I see what you did there

          1. 2

            I wonder about it in terms of an SNR problem. I agree with everything you’re saying, but wonder about the particular mechanics of these things “dying down”. Generally, they do so because the cost to produce significant, impactful noise grows high enough to make it no longer worth it.

            I think we’ve already seen that this cost-benefit argument has been moving in the favor of disinformation since the start of the internet age and I suspect AI will “improve” it again.

            So then we are looking for tooling to more quickly discover the signal amidst growing noise that is cheap to produce and difficult to filter. So we turn to AI to do that as well.

            People have already discussed the realistic scenario where you use AI to generate a sufficiently “business polite” email which is then consolidated into practical bullet points via AI on the reader’s end.

            I don’t really know what to imagine about the world if we enter into that scenario, but it’s still pretty concerning. Especially if there are winner-takes-all dynamics in AI and a single provider controls both the encoder and the decoder.

            1. 3

              I can see the problem, but at the same time we have had this problem of totally misinformed people for ages. What I mean by that is that there are, and always will be, groups of people growing up with a completely different view of the world, up to the point that you may discover they believe in very weird stuff.

              Random examples include various religions, sports, technology, customs, privacy, weight loss methods, fear of electromagnetic waves and homeopathic medicine.

              And “homeopathic medicine” is already a very strong contender: virtually everyone will have learned household remedies which they use for illnesses such as the flu. Now the question is which of them are true and which are just plain useless. You can apply the same to various cooking habits. It all boils down to culture: your parents will have taught you various things, and those may set you up for beliefs and a perception of the environment that many others probably won’t share.

              Edit: Other big examples are the global-warming and covid fake-facts. Both have papers and studies behind them, and a ton of people believing otherwise. I personally know people who simply believe their “physics buddy” who tells them why global warming is a hoax or why double-glazed windows are just an idea of the “construction mafia”. I’m talking about fully educated people in tech.

              To make it short: without modern communication, every village has its own facts. And if you add newspapers and something like telegraphs, then you still have one “source of truth”. One oracle for the artificial intelligence in everyone to decide with.

              I’m missing the piece of the puzzle that explains why we aren’t drowning in false news already. Most of the stories we read come from giant news agencies and are simply pasted into every outlet. I doubt anyone really fact-checks this, or that anyone really could. We clearly don’t need AI to get overrun by false news, and people don’t seem to need AI fakes to believe random people’s stories over anything else.

              1. 5

                I’m missing the piece of the puzzle that explains why we aren’t drowning in false news already.

                A lot of the stories from giant news agencies are false — consider the consensus in the high-reputation newspapers and cable news channels in 2003 that Iraq possessed weapons of mass destruction and was imminently likely to use them on the US. And while this false news can cause tremendous harm (one million dead Iraqis), it doesn’t cause internal social discord, because the messaging is all consistent; every village gets the same set of (false) facts. Today, there are more news sources, all of them with their own distinct reality tunnels. But the giant news agencies are still mostly on the same page because the social/economic structures underpinning them haven’t really changed.

                1. 4

                  every village gets the same set of (false) facts

                  And that’s one of the points I wanted to make: The difference between true and false facts is, to a good extent, just a global consensus. Unfortunately, the fear of misinformation does not answer where the “true facts” come from. To quote a German politician:

                  Part of these answers would unsettle the population

                  Your example is a great window into human psychology. We may not want the truth, because we couldn’t live with it.

                  1. 1

                    We may not want the truth, because we couldn’t live with it.

                    Then we should die, no? Otherwise, what’s the point?

                    1. 1

                      what’s the point

                      Make your own decision how much suffering in the world you are ready to experience, and where you buy the cheap product from children’s hands.

          1. 1

            Nice write-up. I’m confused why Infallible necessitates special syntax (!) though?

            1. 3

              The special syntax, !, is experimental and gated behind a feature flag in nightly. I believe the reasoning is mostly around allowing a syntactic special case. Today, Infallible is available in stable and has identical type semantics.

              Infallible is useful in order to, for instance, declare that a Result will never error by giving it a type like Result<T, Infallible>. This also gives reason to its name.

              Unfortunately, its name is somewhat more specific than its semantics, as the following type-checks but reads weirdly:

              fn never_returns() -> Infallible {
                  loop { }
              }
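
              A concrete illustration (my sketch, not from the comment above): in std, String’s FromStr impl really does use Infallible as its error type, so the Err arm can be dispatched with an empty match and no real failure path exists.

              ```rust
              use std::convert::Infallible;
              use std::str::FromStr;

              fn main() {
                  // String::from_str's associated Err type is Infallible,
                  // so this Result can be unwrapped without a panic path.
                  let r: Result<String, Infallible> = String::from_str("hello");
                  let s = match r {
                      Ok(v) => v,
                      Err(never) => match never {}, // this arm can never run
                  };
                  assert_eq!(s, "hello");
              }
              ```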
              1. 2

                Ah! Makes sense, also explains why Swift’s Never type pops up when using Combine, which is their reactive stream library.

                1. 2

                  for exactly that reason, I think the name Infallible would have been better for the whole thing – like:

                   type Infallible<T> = Result<T, Void>;

                   // Values of type Void cannot be constructed
                   enum Void {}

                   fn never_returns() -> Void {
                       loop {}
                   }

                   // using Infallible to implement some error-aware trait
                   impl FromStr for () {
                       type Err = Void;
                       // from_str() is Infallible. Infallibly, this function returns a `()`
                       fn from_str(s: &str) -> Infallible<()> {
                           Ok(())
                       }
                   }
                  1. 1

                    I definitely see what you’re saying. I think Infallible is a decent historical compromise, though I also wish they just called it Void or Nothing.

              1. 7

                The interesting thing about Infallible/! is what happens when you are given a variable which appears to take that type. Definitionally, no such value exists and therefore we can be guaranteed that any code which ever appears to hold such a value can never be run. That said, we can still write it counterfactually.

                Perhaps the most practically useful case is using it to completely disable certain branches of an enum

                enum Three<A, B, C> {
                    One(A),
                    Two(B),
                    Three(C),
                }
                fn handle(v: Three<!, !, ()>) {
                    // we can ignore One and Two, they cannot ever exist;
                    // rust knows this as well and won't get upset that we don't handle those branches
                    match v {
                        Three::Three(()) => {}
                    }
                }
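
                The same trick can be sketched on stable Rust today, with Infallible standing in for the gated `!` (names here are mine, just to show it compiles and runs):

                ```rust
                use std::convert::Infallible;

                enum Three<A, B, C> {
                    One(A),
                    Two(B),
                    Three(C),
                }

                // Stable-Rust sketch: Infallible stands in for `!`.
                fn handle(v: Three<Infallible, Infallible, i32>) -> i32 {
                    match v {
                        Three::Three(n) => n,
                        // These variants hold uninhabited values; dispatch them away.
                        Three::One(never) | Three::Two(never) => match never {},
                    }
                }

                fn main() {
                    assert_eq!(handle(Three::Three(5)), 5);
                }
                ```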

                This leads to one of the most useless and marvelous pieces of code available in programming languages which support a type like this, the safe, unconditional coercion:

                fn coerce<A>(value: Infallible) -> A {
                    match value {}
                }

                This code appears to be wildly unsafe, able to manufacture a value of any type whatsoever at any time.

                But, it requires access to a value which is certain to never exist. Therefore, we can guarantee that coerce is never actually called.

                1. 4

                  This insight is part of how type systems are used as theorem provers! It underlies dependent types and homotopy type theory, among others; there’s a whole field of study. There’s a lot to cover, but maybe start here if you’re interested.

                  According to type theories, types are propositions and implementations are proofs.

                  That is, a type Cat -> Meow is really the proposition that “Given a cat, I can produce a meow”. Any implementation of a function of that type is a proof of it and instances of a type are evidence. Making up some syntax:

                  do_cats_meow: Cat -> Meow
                  do_cats_meow cat =
                      cat <- pet  # produces and returns a result

                  This function is a proof of that proposition: a proof that a Meow can be derived from a Cat. More complicated versions might say “given any number and evidence that it is prime I can produce a higher number and evidence that it is also prime” as a proof that there is no largest prime.

                  This is of course a simple example but it’s generalisable to reproduce most of set theory and logic.

                  Your example coerce Infallible -> A says something like “if you can give me evidence that 0=1, I can use that to prove any statement at all”

                  1. 1

                    we can satisfy the borrow checker by wrapping the parent reference in something called a Weak pointer. This type tells Rust that a node going away, or its children going away, shouldn’t mean that its parent should also be dropped.

                    I read this a few times and am feeling nitpicky, but I feel Weak is more the opposite. It says that a node or its children sticking around isn’t sufficient reason to not drop the parent. Or, more to the point, the T in an Rc<T> can be dropped if only Weak<T> pointers remain.

                    From the docs

                    Since a Weak reference does not count towards ownership, it will not prevent the value stored in the allocation from being dropped, and Weak itself makes no guarantees about the value still being present.

                    Interestingly, Weak<T> pointers will allow the T to drop, cleaning up cycles if T contains Rcs, but will keep hold of the allocation itself. This surprised me at first, but makes sense once you realize that the counters are themselves stored in the allocation. If dropping all the Rcs freed the allocation too, then the Weaks would dereference dangling pointers when trying to access those counters.
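
                    A small sketch of that behavior (my example, not from the docs):

                    ```rust
                    use std::rc::{Rc, Weak};

                    fn main() {
                        let strong = Rc::new(42);
                        let weak: Weak<i32> = Rc::downgrade(&strong);
                        assert_eq!(weak.upgrade().as_deref(), Some(&42));

                        // Dropping the last Rc drops the 42 itself...
                        drop(strong);
                        assert!(weak.upgrade().is_none());
                        // ...but the allocation holding the strong/weak counts stays
                        // alive until `weak` is dropped too, so upgrade() can read them.
                    }
                    ```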

                    1. 2

                      The Weak keeping allocations alive might be more obvious to people coming from a C++ background - there is the same issue with shared_ptr and weak_ptr.

                    1. 3

                      I think Copilot will fail as a useful tool. It will teach us, though, what we’ve already known: that the code artifact itself is not particularly meaningful and that the truly meaningful artifacts exist somewhere between the code, our own minds, and the minds of others on our teams.

                      Code is biased toward execution, so we can sometimes document our intentions and semantic desires as tests. Sometimes we pull away from execution and write long-form specifications or acceptance criteria. The truly mad write formal specifications. These artifacts are, at times, closer to what we want but it’s my experience that they’re all too far still from what’s actually meaningful.

                      Without being able to really write down what we want then the best Copilot can do is try to guess. It’ll probably be surprisingly good at getting in the ballpark whenever we ask for something that’s fairly common in fairly common language. The closer our desires are to the zeitgeist, the better Copilot can behave.

                      But this is silly. The most successful thing in computer science is the actual reuse of common ideas. I’d rather take a dependency than an additional 100 lines of code. Copilot delivering me something that rhymes heavily with a dependency is worse than useless.

                      So the last bastion of hope is that I can write a loose, barely even substantial enough to be wrong “specification” somewhere between a function name, some documentation, and maybe a few initial lines of code and Copilot can take the average between that and the closest thing it’s seen in public code and deliver to me something useful. This feels impossible. At least, without dialogue.

                      I’ll be impressed when you can converse with Copilot, go back and forth, hammer out an idea in specification, test, code, formal spec, types, etc. That iterative refinement process is really nice. I know this because we already have it in languages like Coq and Agda. Copilot takes a tremendously different technical approach, but I think it’d be wise to converge in that direction.

                      Or, if not, I’d rather it just be a smart way to search open source code for a library I can import.

                      1. 2

                        Last up, unless it gets into dialogue mode quickly I think it’s going to have a negative impact on learning to code and being a junior developer. I don’t want to say I’d go as far as banning it from my team, if I even could, but I’d suspect that the student who uses Copilot may learn and grow much more slowly.

                        To be clear, my prediction is that it can’t be like a calculator was for math, helping the user to focus on higher level details by ignoring the repetitive bits and thus accelerating learning. I don’t think the state of CS is yet to the point where we can separate the code from the higher level ideas. Comparatively speaking, arithmetic is an extremely simple programming language.

                      1. 4

                        Love to see it. I use Elixir every day at work, and am a big fan of static types, so in principle I’m the target audience. However, I’m reluctant to give up the Elixir ecosystem I love so much; e.g. I’m really excited about LiveView, Livebook, Nx, etc.

                        What are the benefits / necessity of a whole new language as opposed to an improved dialyzer, or maybe a TypeScript style Elixir superset with inline type annotations?

                        1. 11

                          One issue is that existing Elixir code will be hard to adapt to a sound type system, more specifically in how pattern matching is used. For example, consider this common idiom:

                          {:ok, any} = my_function()

                          (where my_function may return {:ok, any} or {:error, error} depending on whether the function succeeded)

                          Implicitly, this means “crash, via a badmatch error, if we didn’t get the expected result”. However this is basically incompatible with a sound type system as the left-hand side of the assignment has the type {:ok, T} and the function has the return type {:ok, T} | {:error, Error}.

                          Of course we could add some kind of annotation that says “I mismatched the types on purpose”, but then we’d have to sprinkle these all over existing code.

                          This is also the reason why Dialyzer is based on success typing rather than more “traditional” type checking. A consequence of this is that Dialyzer, by design, doesn’t catch all potential type errors; as long as one code path can be shown to be successful Dialyzer is happy, which reflects how Erlang / Elixir code is written.
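
                          As a point of comparison (my sketch, in Rust rather than Elixir): Rust makes the same trade-off explicit. A refutable pattern in a plain `let` is a compile error, and `let ... else` is exactly the opt-in “crash if it didn’t match” annotation described above. `my_function` here is a hypothetical stand-in for the Elixir function:

                          ```rust
                          // Hypothetical stand-in for the Elixir my_function above.
                          fn my_function() -> Result<i32, String> {
                              Ok(7)
                          }

                          fn main() {
                              // `let Ok(v) = my_function();` would be rejected: the
                              // pattern is refutable. The `else` branch is the
                              // explicit equivalent of Erlang's badmatch crash.
                              let Ok(v) = my_function() else {
                                  panic!("badmatch");
                              };
                              assert_eq!(v, 7);
                          }
                          ```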

                          1. 5

                            Of course we could add some kind of annotation that says “I mismatched the types on purpose”, but then we’d have to sprinkle these all over existing code.

                            This is what Gleam does. Pattern matching is to be total unless the assert keyword is used instead of let.

                            assert Ok(result) = do_something()

                            It’s considered best practice to use assert only in tests and in prototypes

                            1. 1

                              What do you do when the use case is “no, really, I don’t care, have the supervisor retry because I can’t be bothered to handle the error and selectively reconcile all of this state I’ve built up, I’d rather just refetch it”?

                              1. 1

                                Maybe we need another keyword:

                                assume Ok(result) = do_something()
                                1. 1

                                  That is what assert is for.

                                2. 1

                                  That’s what assert is. If the pattern doesn’t match then it crashes the process.

                                  1. 1

                                    So why not use it in production?

                                    1. 2

                                      You can, for sure. I was a bit too simplistic there; I should have said “It is best to only use assert with expected non-exceptional errors in prototypes”. There’s a place for Erlang-style supervision in Gleam in production.

                              2. 3

                                Great, succinct explanation!

                                1. 2

                                  This is an interesting example. I’m still not sure I understand how it’s “basically incompatible”, though. Is it not possible to annotate the function with the possibility that it raises the MatchError? It feels a bit like Java’s unchecked exceptions. Java doesn’t have the greatest type system, but it has a type system. I would think you could have a type system here that works with Elixir’s semantics by bubbling certain kinds of errors.

                                  Are you assuming Hindley Milner type inference or something? Like, what if the system were rust-style and required type specifications at the function level. This is how Elixir developers tend to operate already, anyway, with dialyzer.

                                  1. 1

                                    I don’t see how that’s a problem offhand. I’m not sure how gleam does it, but you can show that the pattern accommodates a subtype of the union and fail when it doesn’t match the :ok.

                                    1. 4

                                      The problem is the distinction between failing (type checking) and crashing (at runtime). The erlang pattern described here is designed to crash if it encounters an error, which would require that type checking passes. But type checking would never pass since my_function() has other return cases and the pattern match is (intentionally) not exhaustive.

                                      1. 1

                                        Ah, exhaustive pattern matching makes more sense. But also feels a little odd in Erlang. I’ll have to play with Gleam some and get an understanding of how it works out.

                                  2. 4

                                    One thing is that TypeScript is currently bursting at the seams as developers aspirationally use it as a pure functional statically-typed dependently-typed language. The TypeScript developers are bound by their promise not to change JavaScript semantics, even in seemingly minor ways (and I understand why this is so), but it really holds back TS from becoming what many users hope for it to be. There’s clearly demand for something more, and eventually a language like PureScript / Grain / etc will carve out a sizable niche.

                                    So, I think starting over from scratch with a new language can be advantageous, as long as you have sufficient interoperability with the existing ecosystem.

                                    1. 2

                                      I won’t go too much into Dialyzer as I’ve never found it reliable or fast enough to be useful in development, so I don’t think I’m in a great place to make comparisons. For me a type system is a writing assistant tool first and foremost, so developer UX is the name of the game.

                                      I think the TypeScript question is a really good one! There’s a few aspects to this.

                                      Gradual typing (TypeScript style) offers different guarantees to the HM typing of Gleam. Gleam’s type system is sound by default, while with gradual typing you opt in to safety by providing annotations which the checker can then verify. In practice this ends up being quite a different developer experience: the gradual typer requires more programmer input and the will to resist the temptation to leave sections of the codebase untyped. The benefit is that it is easier to apply gradual types to an already existing codebase, but that’s not any advantage to me - I want the fresh developer experience that is more to my tastes and easier for me to work with.

                                      Another aspect is just that it’s incredibly hard to do gradual typing well. TypeScript is a marvel, but I can think of many similar projects that have failed. In the BEAM world alone I can think of 4 attempts to add a type checker to the existing Elixir or Erlang languages, and all have failed. Two of these projects were from Facebook and from the Elixir core team, so it’s not like they were short on expertise either.

                                      Lastly, a new language is an opportunity to try to improve on Elixir and Erlang. There are lots of little things in Gleam that I personally am very fond of which are not possible in them.

                                      One silly small example is that we don’t need a special .() to call an anonymous function like Elixir does.

                                      let f = fn() { 1 }

                                      And we can pipe into any position

                                      value
                                      |> first_position(2)
                                      |> second_position(1, _)
                                      |> curried_last_position

                                      And we have type safe labelled arguments, without any runtime cost. No keyword lists here

                                      replace(each: ",", with: " ", in: "A,B,C")

                                      Thanks for the questions

                                      edit: Oh! And RE the existing ecosystem, you can use Gleam and Elixir or Erlang together! That’s certainly something Gleam has been built around.

                                      1. 1

                                        Two of these projects were from Facebook and from the Elixir core team, so it’s not like they were short on expertise either.

                                        Oh, wow, I don’t think I’ve heard of these! Do you have any more info? And why was Facebook writing a typechecker for Elixir? Are you talking about Flow?

                                        1. 1

                                          Facebook were writing a type checker for Erlang for use in WhatsApp. It’s not Flow, but it is inspired by it. I’m afraid not much info about it is public.

                                    1. 2

                                      What size would the sqlite file be if it was also gzipped?

                                      1. 7

                                        Parquet can read directly from the compressed file and decompress on the fly, though.

                                        1. 3

                                          471MB zipped.

                                          1. 1

                                             Or has anyone seriously investigated this?

                                            1. 1

                                               It’s not very popular due to the license and fees. Here is an interesting project that should work really well with zstd’s dict compression. Since there are tons of repeated values in some columns, or JSON text, zstd’s dict-based compression works really well.

                                              [2020-12-23T21:14:06Z INFO sqlite_zstd::transparent] Compressed 5228 rows with dictid=111. Total size of entries before: 91.97MB, afterwards: 1.41MB, (average: before=17.59kB, after=268B)

                                              1. 1

                                                I would want to investigate using ZSTD (with a dictionary) and Parquet as well. Columnar formats are better for OLAP.

                                              2. 1

                                                 Why would anybody investigate this when you have Parquet and ORC, which are specifically designed for OLAP use cases? What does SQLite add for a data scientist who wants to process the GitHub export data?

                                              3. 1

                                                Apples to oranges.

                                              1. 2

                                                SOLID always feels like it’s advice about how to handle “promises” properly in a typed system. If I offer you some value of type T then that comes attached with a number of “promises” in terms of what you may do with values of type T, what you may expect from them.

                                                SOLID, or S_LID at least, implores me to (a) take those promises seriously, (b) offer only promises I am confident I can keep, and (c) offer as many promises as I can while demanding as few as I can get away with.

                                                • Single Responsibility drives for the separation of promises into groups, allowing enough richness in the subtyping hierarchy to even discuss “offering a lot” or “asking for a little”.
                                                • Liskov Substitution is a mandate on treating the subtyping relationship as the arbiter of these promises and, I feel, essentially protects against “subclassing” hierarchies which only carry convenience in construction, not consistency of properties
                                                • Interface Segregation essentially says, alongside S, that our functions should only ask for exactly what they need
                                                • Dependency Inversion is again a practice of demanding as little as possible (thus, abstract interfaces are more durable).

                                                From this perspective, the thing that is missing to my eye is that the types you generate should be the aggregations of all of the promises you can afford to make about them. Your clients can always choose to use less than you offer, but once a promise is made it must be kept (for backwards compatibility purposes).

                                                Another way to look at it is a given function A1 => B1 can also be treated as a function A0 => B2 for A0 < A1 and B1 < B2. So to make the highest utility interface, we should ask for the most abstract/least demanding input A and offer the most concrete/least abstract output B.

                                                The limitation is that every thing I refuse to ask for on the input side I must be confident I will never need and every thing I offer on the output side I must be willing to support “forever”.

                                                This idea, plus Liskov Substitution which gives teeth to the idea of “promises” all together, feel like it generalizes most of what S_LID is talking about.
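
                                                 One way to sketch the “ask for the least, offer the most” direction in code (my example; Rust lacks general subtyping, so trait objects stand in for the A0 < A1 relation, and `describe` is a made-up name):

                                                 ```rust
                                                 use std::fmt::Display;

                                                 // Demand the abstract input (anything Display)...
                                                 fn describe(x: &dyn Display) -> String {
                                                     // ...and offer the concrete output (a String).
                                                     format!("value: {}", x)
                                                 }

                                                 fn main() {
                                                     // A more specific input (i32) satisfies the abstract demand,
                                                     let s = describe(&42);
                                                     // and the concrete String can always be treated as a mere Display.
                                                     let _as_abstract: &dyn Display = &s;
                                                     assert_eq!(s, "value: 42");
                                                 }
                                                 ```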

                                                1. 4

                                                  This is a scan, or a stream transducer, or a state machine. Scala’s stdlib scans don’t have quite the right design to make this particular problem elegant, but it’s not too bad

                                                   sealed trait St {
                                                     def current: (Char, Int)
                                                     def update(c: Char): St = this match {
                                                       case Emit(_, current)                  => Count(current).update(c)
                                                       case Count((char, count)) if c == char => Count((char, count + 1))
                                                       case Count(current)                    => Emit(current, (c, 1))
                                                     }
                                                   }
                                                   case class Emit(prev: (Char, Int), current: (Char, Int)) extends St
                                                   case class Count(current: (Char, Int)) extends St

                                                   def soln(s: String): List[(Char, Int)] = s.toList match {
                                                     case Nil => List.empty
                                                     case c :: cs =>
                                                       val states = cs.scanLeft[St](Count((c, 1)))(_.update(_))
                                                       states.collect { case Emit(prev, _) => prev } :+ states.last.current
                                                   }

                                                  I could code golf this a lot more, but this should communicate the idea a bit better.

                                                  1. 17

                                                    You can pass a List<int> to a function that wants a List<int?>.

                                                    It’s worth pointing out that this is not always sound. For example, this results in a runtime type error:

                                                    addNull(List<int?> xs) {
                                                      xs.add(null);
                                                    }

                                                    main() {
                                                      List<int> blah = [];
                                                      addNull(blah); // runtime error: blah's actual type only accepts int
                                                    }
                                                    Of course, it’s valid to decide that this is an acceptable price to pay. TypeScript does this (with no runtime errors since the types basically just disappear), and I guess Dart does it as well.
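For illustration, here is roughly what that looks like in TypeScript (names are made up), where the covariant array assignment is accepted and, since types are erased, the damage surfaces as a wrong value rather than a runtime type error:

```typescript
// TS allows widening number[] to (number | null)[] (covariant array
// assignment), which is unsound: the callee can insert null into an array
// the caller still believes holds only numbers.
function addNull(xs: (number | null)[]): void {
  xs.push(null);
}

const nums: number[] = [1, 2];
addNull(nums);                               // type-checks fine
const last: number = nums[nums.length - 1];  // statically a number...
console.log(last);                           // ...but actually null at runtime
```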

                                                    1. 8

                                                      Isn’t this basically type covariance vs type contravariance? Some languages (notably Scala) let you specify explicitly which one you want.

                                                      1. 4

                                                        If the List type constructor was contravariant instead (i.e. List<int?> was a subtype of List<int>), then you would have analogous problem with the types swapped:

                                                        oops(List<int> xs) {
                                                          int x = xs[0];
                                                          // do something with x
                                                        }

                                                        main() {
                                                          List<int?> blah = [null];
                                                          oops(blah); // x would be null, which is not an int
                                                        }

                                                        So it doesn’t seem that this dichotomy is relevant here.

                                                        1. 14

                                                          An immutable list is covariant in its type parameter. A mutable list is invariant: neither covariant nor contravariant. The proper solution is for List<A> and List<B> to be unrelated types in the subtyping relation.
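A sketch of the safe half of that in TypeScript (which spells an immutable list `ReadonlyArray`): a read-only view can be covariant without risk, since no caller can insert anything through it.

```typescript
// With no way to mutate, handing a ReadonlyArray<number> to code expecting
// ReadonlyArray<number | null> cannot corrupt the original array.
function firstOrZero(xs: ReadonlyArray<number | null>): number {
  const x = xs[0];
  return x === null || x === undefined ? 0 : x;
}

const nums: number[] = [7, 8];
console.log(firstOrZero(nums)); // prints 7 — safe: firstOrZero can only read
```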

                                                          1. 3

                                                            It is relevant. The covariance/contravariance of the type depends on whether it’s in the “input” position or the “output” position. But generally speaking, List is neither contravariant nor covariant.

                                                            Kotlin gets this correct with its in and out qualifiers on generic parameters.

                                                        2. 1

                                                          “Valid” here is a measured tradeoff between convenience and potential for errors, because it can cause runtime errors.

                                                          function addString<a>(lst: (a | string)[]) {
                                                              lst.push("oh no")
                                                          }

                                                          let xs: number[] = [1, 2, 3]
                                                          addString(xs)

                                                          // straightforward function contract, returns a number
                                                          function sum(lst: number[]): number {
                                                              return lst.reduce((a, b) => a + b)
                                                          }

                                                          console.log(sum([2, "bar"]))  // <- this produces a type error
                                                          console.log(sum(xs))          // <- this doesn't, but it doesn't return a number

                                                          // will this behave as it looks like it will?
                                                          for (let i = 0; i <= sum(xs); i++) {
                                                              // ...
                                                          }

                                                          The alternative is explicit variance specification, or seriously weakening subtyping relations.

                                                        1. 16

                                                          People like me have been saying this for quite some time. You could use traditional non-linear optimization techniques here to do even better than what the author’s simple random search does, for example gradient descent.

                                                          My old boss at uni used to point out that neural networks are just another form of interpolation, but far harder to reason about. People get wowed by metaphors like “neural networks” and “genetic algorithms” and waste lots of time on methods that are often outperformed by polynomial regression.

                                                          1. 12

                                                            Most ML techniques boil down to gradient descent at some point, even neural networks.

                                                            Youtuber 3blue1brown has an excellent video on that.

                                                            1. 3

                                                              Yep, any decent NN training algorithm will seek a minimum. And GAs are just very terrible optimization algorithms.

                                                              1. 1

                                                                I’d say that only a few ML algorithms ultimately pan out as something like gradient descent. Scalable gradient descent is a new thing thanks to the advent of differentiable programming. Previously, you’d have to hand-write the gradients which often would involve investment into alternative methods of optimization. Cheap, fast, scalable gradients are often “good enough” to curtail some of the other effort.

                                                                An additional issue is that often times the gradients just aren’t available, even with autodiff. In this circumstance, you have to do something else more creative and end up with other kinds of iterative algorithms.

                                                                It’s all optimization somehow or another under the hood, but gradients are a real special case that just happens to have discovered a big boost in scalability lately.

                                                              2. 6

                                                                A large part of ML engineering is about evaluating model fit. Given that linear models and generalized linear models can be constructed in a few lines of code using most popular statistical frameworks [1], I see no reason for ML engineers not to reach for a few lines of a GLM, evaluate fit, and conclude that the fit is fine and move on. In practice for more complicated situations, decision trees and random forests are also quite popular. DL methods also take quite a bit of compute and engineer time to train, so in reality most folks I know reach for DL methods only after exhausting other options.

                                                                [1]: is one I tend to reach for when I’m not in the mood for a Bayesian model.

                                                                1. 1

                                                                  Didn’t know about generalized linear models, thanks for the tip

                                                                2. 5

                                                                  For a two parameter model being optimized over a pretty nonlinear space like a hand-drawn track I think random search is a great choice. It’s probably close to optimal and very trivial to implement whereas gradient descent would require at least a few more steps.
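As a sketch of how trivial the method is, here is a minimal random search over a two-parameter model in TypeScript; the quadratic `loss` is a stand-in objective, not the article's actual track-fitting code.

```typescript
// Hypothetical objective, minimized at a = 3, b = -2.
function loss(a: number, b: number): number {
  return (a - 3) ** 2 + (b + 2) ** 2;
}

// Random search: sample uniformly in the box [-10, 10]^2, keep the best.
function randomSearch(iters: number): [number, number] {
  let best: [number, number] = [0, 0];
  let bestLoss = loss(...best);
  for (let i = 0; i < iters; i++) {
    const cand: [number, number] =
      [Math.random() * 20 - 10, Math.random() * 20 - 10];
    const l = loss(...cand);
    if (l < bestLoss) { bestLoss = l; best = cand; }
  }
  return best;
}

const [a, b] = randomSearch(20000);
console.log(a, b); // close to 3 and -2
```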

                                                                  1. 3

                                                                    Hill climbing with random restart would likely outperform it. But not a bad method for this problem, no.

                                                                  2. 1

                                                                    I suppose people typically use neural networks for their huge model capacity, instead of for the efficiency of the optimization method (i.e. backward propagation). While neural networks are just another form of interpolation, they allow us to express much more detailed structures than (low-order) polynomials.

                                                                    1. 4

                                                                      There is some evidence that this overparameterisation in neural network models actually gets you something that looks like fancier optimisation methods[1], as well as acting as a form of regularisation[2].

                                                                      1. 2

                                                                        The linked works are really interesting. Here is a previous article with a similar view:

                                                                      2. 1

                                                                        neural networks […] allow us to express much more detailed structures than (low-order) polynomials

                                                                        Not really. A neural network and a polynomial regression using the same number of parameters should perform roughly as well. There is some “wiggle room” for NNs to be better or PR to be better depending on the problem domain. Signal compression has notably used sinusoidal regression since forever.

                                                                        1. 2

                                                                          A neural network and a polynomial regression using the same number of parameters should perform roughly as well.

                                                                          That’s interesting. I have rarely seen polynomial models with more than 5 parameters in the wild, but neural networks easily contain millions of parameters. Do you have any reading material and/or war stories about such high-order polynomial regressions to share?

                                                                          1. 3

                                                                            This post and the associated paper made the rounds a while ago. For a linear model of a system with 1,000 variables, you’re looking at 1,002,001 parameters. Most of these can likely be zero while still providing a decent fit. NNs can’t really do that sort of stuff.

                                                                    1. 2

                                                                      I still don’t get it. What’s the point of associativity? How does it relate to programming? Is it just fancy syntactic sugar?

                                                                      And monoids: what’s the purpose of returning Nothing? The pipeline isn’t broken, great, but what’s the point? With exceptions you at least get a stack trace.

                                                                      1. 9

                                                                        Associativity turns trees into sequences. Monoids are essentially things that can be modeled as being “in sequence” where composition is putting one sequence in front of the other.

                                                                        Without associativity, what you’re left with can have arbitrary branching. That’s very useful when you want it, but it’s much more complex and it’s much harder to say anything about “all branching things” because they’re so much more variable.
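A small TypeScript sketch of that: associativity is exactly what makes a left-to-right fold and a balanced tree-shaped reduction agree, which is what licenses chunked or parallel reduction over any monoid.

```typescript
type Monoid<T> = { empty: T; combine: (a: T, b: T) => T };

// String concatenation is a monoid: "" is the identity, + is associative.
const concat: Monoid<string> = { empty: "", combine: (a, b) => a + b };

// Sequential left fold: ((((e . x0) . x1) . x2) . ...)
function foldLeft<T>(m: Monoid<T>, xs: T[]): T {
  return xs.reduce(m.combine, m.empty);
}

// Balanced tree fold: combine the two halves recursively.
function foldTree<T>(m: Monoid<T>, xs: T[]): T {
  if (xs.length === 0) return m.empty;
  if (xs.length === 1) return xs[0];
  const mid = Math.floor(xs.length / 2);
  return m.combine(foldTree(m, xs.slice(0, mid)), foldTree(m, xs.slice(mid)));
}

const words = ["mono", "ids ", "are ", "sequences"];
// Associativity guarantees both groupings give the same answer.
console.log(foldLeft(concat, words) === foldTree(concat, words)); // true
```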

                                                                        1. 1

                                                                          Not applicable to real life.

                                                                          1. 3

                                                                            I disagree. It’s just not a “thing”. It’s a shape. In programming awareness of common shapes helps you to write better code. It’s hard to say much more than that without diving into a larger example.

                                                                            I recommend Brent Yorgey’s “Monoids: Themes and Variations” as a set of examples.

                                                                      1. 4

                                                                        I’m pretty sure most programmers who are new to GIS etc. will make an effort to study the intricacies involved. Unlike the previous popular “falsehoods programmers believe” areas, namely names and dates, map coordinates simply don’t carry the ingrained assumptions people have in those areas.

                                                                        Edit to expand: what I’m trying to say is that it’s vastly more common for developers to deal with names and dates than it is to deal with coordinate systems - especially for mission-critical applications such as surveying.

                                                                        1. 5

                                                                          While that’s true, the inherent complexity of geospatial issues feels a bit higher. In practice, there are geospatial experts who do this all the time and know the ins and outs, geospatial tourists who just need to touch some well-behaved corner of this, and then the mixed teams where the experts have to deal carefully with the tourists learning their way through the minefield.

                                                                          1. 2

                                                                            It seemed a very strange list. I don’t think I believed any of them. Some were things I knew were obviously wrong (e.g. one degree is a fixed distance), some of the others were things that I didn’t believe because I didn’t know enough about GIS to get to the basic level of misunderstanding (e.g. I have never heard of web mercator, though I did know non-mercator projections were important). It seems like you have to go down a particularly odd learning path to get to these beliefs: you need a lot of domain-specific knowledge with no context.

                                                                            1. 1

                                                                              I realize the author wanted to riff off the famous “Falsehoods programmers believe…” about names and dates, but a better title would have been “here are eight surprising facts about map coordinates you probably didn’t know”.

                                                                            2. 2

                                                                              With GPS and Google Maps being ubiquitous, I think many people don’t think about these topics, but take the patterns seen there for granted.

                                                                              People don’t know about the intricacies of timekeeping and calendaring, because watches (now phones) are everywhere. I have learnt quite a bit about them, so I don’t assume anything about them is simple. I think the situation is similar with navigation/mapping.

                                                                              Btw, developers deal with GIS quite often in business applications/logistics. Coordinate systems are often not considered in those cases, until they pop out of a hidden corner and are then used incorrectly :)

                                                                            1. 10

                                                                              Do programmers actually believe these falsehoods? They seem specific enough that anybody working in the domain would know these.

                                                                              1. 6

                                                                                Geodesy and mapping are both quite complicated and there’s a lot of stuff to keep in mind. When I started a few years ago I definitely was not aware of all the complexities I know about now, and I’m sure I remain ignorant of yet more.

                                                                                An example might be that data in Open Street Map usually doesn’t have an associated reference frame, it’s just presented as WGS84 lat/lon, but anything with sub-meter accuracy must be relatable to a reference frame, usually one of the ITRF frames, or you can’t meaningfully interpret it.

                                                                                Another example that I suspect most people doing data-science with geospatial data are ignorant of is that heights are preserved very badly in much of the world by the WGS84 geoid (tho with the EGM96 it is better). But the fit is good across North America, so it would be easy for yanks to miss. This can have practical effects (as I recall sea level in Cornwall is about 100m wrong in WGS84, so you need to adjust for that). The locally poor height fit of WGS84 is why the UK used to use its own geoid (we still sort of do, but it’s defined in relation to a version of ITRF now) and is probably part of why China and Russia still use their own.

                                                                                Lots of these issues can bite you hard, but often don’t, which is why they are easy to miss. You can get away with using an inappropriate UTM grid; or calculating distances naively with euclid and lat/lon; or ignoring the issues with high-precision data. Things will mostly work.
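To make the “calculating distances naively with euclid and lat/lon” point concrete, here is a TypeScript comparison of a naive fixed-metres-per-degree distance against the spherical haversine formula (itself only an approximation; real work would also fix a reference frame and an ellipsoid):

```typescript
const R = 6371e3; // mean Earth radius in metres (spherical approximation)
const rad = (deg: number) => (deg * Math.PI) / 180;

// Great-circle distance via the haversine formula.
function haversineMetres(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const dLat = rad(lat2 - lat1);
  const dLon = rad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(rad(lat1)) * Math.cos(rad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}

// "Euclidean" distance that pretends one degree is a fixed length everywhere.
function naiveMetres(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const metresPerDegree = 111_320; // roughly true for latitude / at the equator only
  return Math.hypot(lat2 - lat1, lon2 - lon1) * metresPerDegree;
}

// One degree of longitude at 60°N is about half its equatorial length:
console.log(haversineMetres(60, 0, 60, 1)); // ~55,600 m
console.log(naiveMetres(60, 0, 60, 1));     // 111,320 m — nearly 2x off
```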

                                                                                1. 4

                                                                                  Most “programmers” don’t think about these, so they likely hold some or most of these assumptions. When I was working in a field that involved mapping, these were covered in the project onboarding; presumably prior experience had shown that most programmers don’t know them, so it’s better to state them upfront. (I did have prior knowledge of the topic from astronomy, though.)

                                                                                  1. 2

                                                                                    No, but it still seems like a good list.

                                                                                    I’ve tangentially worked with map data in the last few years and half of those were new to me, because they didn’t exactly apply to our problems.

                                                                                    1. 2

                                                                                      I worked in geospatial stuff for a while. You run into this stuff, these vocabulary terms, constantly, but they are complex enough that in a sufficiently large team you will still constantly see mistakes. What’s worse is that some fraction of the time the mistakes lead only to very subtle data errors. You end up with a class of geospatial professionals who are implicitly qualified enough to review all geospatial code and check up on subtle data distortion.

                                                                                    1. 17

                                                                                      I’m working on some ‘pretty big’ (several kilolines) project on rust, and two things that frustrate me to no end:

                                                                                      • All the stuff around strings. Especially in non-systems programming there’s so much with string literals and the like, and Rust requires a lot of fidgeting. Let’s not even get into returning heap-allocated strings cleanly from local functions. I (think) I get why it’s all like this, but it’s still annoying, despite all the aids involved

                                                                                      • Refactoring is a massive pain! It’s super hard to “test” different data structures, especially when it comes to stuff involving lifetimes. You have to basically rewrite everything. It doesn’t help that you can’t have “placeholder” lifetimes, so when you try removing a thing you gotta rewrite a bunch of code.

                                                                                      The refactoring point is really important, I think, for people not super proficient in systems design. When you realize you’ve got to re-work your structure, especially when you have a bunch of pattern matching, you’re giving yourself a lot of busywork. For me this is a very similar problem to the one other ADT-based languages (Haskell and the like) face. Sure, you’re going to check all usages, but sometimes I just want to add a field without changing 3000 lines.

                                                                                      I still am really up for using it for systems stuff but it’s super painful, and makes me miss Python a lot. When I finally get a thing working I’m really happy though :)

                                                                                      1. 4

                                                                                        I would definitely like to learn more about the struggles around refactoring.

                                                                                        1. 4

                                                                                          Your pain points sound similar to what I disliked about Rust when I was starting. In my case these were symptoms of not “getting” ownership.

                                                                                          The difference between &str and String/Box<str> is easy once you know it. If it’s not obvious to you, you will be unable to use Rust productively. The borrow checker will get in your way when you “just” want to return something from a function. A lot of people intuitively equate Rust’s references with returning and storing “by reference” in other languages. That’s totally wrong! They’re almost the opposite of that. Rust references aren’t for “not copying” (there are other types that do that too). They’re for “not owning”, and that has specific uses and serious consequences you have to internalize.

                                                                                          Similarly, if you add a reference (i.e. a temporary scope-limited borrow) to a struct, it blows up the whole program with lifetime annotations. It’s hell. <'a> everywhere. That’s not because Rust has such crappy syntax, but because it’s basically a mistake of using the wrong semantics. It means data of the struct is stored outside of the struct, on the stack in some random place. There’s a valid use-case for such stack-bound-temp-struct-wrappers, but they’re not nearly as common as when it’s done by mistake. Use Box or other owning types in structs to store by reference.

                                                                                          And these aren’t actually Rust-specific problems. In C the difference between &str and Box<str> is whether you must call free() on it, or must not. The <'a> is “be careful, don’t use it after freeing that other thing”. Sometimes C allows both ways, and structs have bool should_free_that_pointer;. That’s Cow<str> in Rust.

                                                                                          1. 4

                                                                                            Indeed, but I think this proves the “Complexity” section of TFA. There are several ways to do things including:

                                                                                            • References
                                                                                            • Boxed pointers
                                                                                            • RC pointers
                                                                                            • ARC pointers
                                                                                            • COW pointers
                                                                                            • Cells
                                                                                            • RefCells

                                                                                            There’s a lot of expressive power there, and these certainly help in allowing memory-safe low-level programming. But it’s a lot of choice. More so than C++.

                                                                                            1. 2

                                                                                              Absolutely — with a GC all these are the same thing. C++ has all of them, just under different names, or as design patterns (e.g. you’ll need to roll your own Rc, because std::shared_ptr will need to use atomics in threaded programs).

                                                                                              There are choices, but none of them are Rust-specific. They’re emergent from what is necessary to handle memory management and thread safety at the low level. Even if C or C++ compiler doesn’t force you to choose, you will still need to choose yourself. If you mix up pointers that are like references, with pointers that are like boxes, then you’ll have double-free or use-after-free bugs.

                                                                                              1. 2

                                                                                                There are choices, but none of them are Rust-specific. They’re emergent from what is necessary to handle memory management and thread safety at the low level.

                                                                                                I disagree. ATS and Ada offer a different set of primitives to work with memory safe code. Moreover, some of these pointer types (like Cow) are used a lot less frequently than others. Rust frequently has multiple ways and multiple paradigms to do the same thing. There’s nothing wrong with this approach, of course, but it needs to be acknowledged as a deliberate design decision.

                                                                                                1. 1

                                                                                                  I’d honestly like to know what Ada brings to the table here. AFAIK Ada doesn’t protect from use-after-free in implementations without a GC, and a typical approach is to just stay away from dynamic memory allocation. I see arenas are common, but that’s not unique to Ada. I can’t find info what it does about mutable aliasing or iterator invalidation problems.

                                                                                                2. 2

                                                                                                  The set of Boost smart pointers demonstrates some of the inherent complexity in efficient object ownership:

                                                                                            2. 1

                                                                                              It doesn’t help that you can’t have “placeholder” lifetimes

                                                                                              I’m not sure what you mean, but maybe this can help?

                                                                                            1. 9

                                                                                              A good list.

                                                                                              Alas, it’s often not reasonable to “Make Invalid States Unrepresentable”. It’s an excellent aim, but not always feasible/practical.

                                                                                              However, its close relative: “Make it impossible to create an instance of a type in, or mutate it into, an invalid state, by strict encapsulation and by only providing public interfaces that do not do so.”

                                                                                              That’s always possible and always practical and should be rigidly strict standard programming practice.
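A minimal TypeScript sketch of that encapsulation discipline (the `NonEmptyName` type is invented for illustration): the constructor is private, so every reachable instance has passed validation and no public interface can break it.

```typescript
class NonEmptyName {
  // Private constructor: the only way in is through the validating factory.
  private constructor(private readonly value: string) {}

  static parse(raw: string): NonEmptyName | null {
    const trimmed = raw.trim();
    return trimmed.length > 0 ? new NonEmptyName(trimmed) : null;
  }

  toString(): string {
    return this.value;
  }
}

console.log(NonEmptyName.parse("  Ada ")?.toString()); // prints Ada
console.log(NonEmptyName.parse("   "));                // prints null
```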

                                                                                              “Data Consistency Makes Systems Simpler”

                                                                                              On one project using sqlite, I decided to take CJ Date at his word.

                                                                                              I created a data dictionary table. Every field name, its type, and its meaning were in there.

                                                                                              I rigidly stuck to that.

                                                                                              The tables were fully normalised.

                                                                                              There were no nulls anywhere ever.

                                                                                              I only used natural joins.

                                                                                              The resulting SQL was orders of magnitude simpler, faster, and more understandable, and I haven’t had issues with it for years.

                                                                                              1. 3

                                                                                                I’m happy to hear success stories taking Date seriously. I would love to hear more.

                                                                                                1. 2

                                                                                                  I’m unfamiliar with the technique you describe for SQL. Is there a write up?

                                                                                                  I came to a similar conclusion as you when I’m writing Clojure. A hash map can have anything, but if you have a schema for it, and your tools only can ever create valid maps, then you get much of the benefit. And obviously if you need exactly one of something, use a set or map, etc.

                                                                                                  1. 2


                                                                                                    Basically I wrote SQL as if everything he said about Relational Algebra was gospel. Oh yes, I nearly forgot: every projection “select” was a “select distinct”.

                                                                                                  2. 1

                                                                                                    Shoot for the moon, because even if you miss you…

                                                                                                    Cheesy cliches aside, I agree with you that with “make invalid states unrepresentable” there is a tipping point into complexity. When you start using Peano numbers you might have gone too far. However, I intended it in a much broader sense than types. If you squint, normalisation is just a way of making invalid states unrepresentable.

                                                                                                    1. 2

                                                                                                      Actually, you don’t have to squint hard at all. DB normalization is all about that.

                                                                                                    2. 1

                                                                                                      On one project using sqlite, I decided to take CJ Date at his word.

                                                                                                      I created a data dictionary table. Every field name, its type, and its meaning was in there.

                                                                                                      I rigidly stuck to that.

                                                                                                      The tables were fully normalised.

                                                                                                      There were no nulls anywhere ever.

                                                                                                      I only used natural joins.

                                                                                                      The resulting SQL was orders of magnitude simpler, faster, and more understandable, and I haven’t had issues with it for years.

                                                                                                      I’d love to see this code.

                                                                                                      1. 1

                                                                                                        Unfortunately it doesn’t stand alone, but is a debug and analysis tool for the obscurer innards of a much much larger proprietary system.

                                                                                                        Another “oddity” which isn’t an oddity from a relational point of view…

                                                                                                        No auto increment keys anywhere. If the table doesn’t have a clear primary key, you haven’t thought your data model through.

                                                                                                    1. 4

                                                                                                      Using HOFs amidst business logic needlessly complicates the code. It’s exactly like inheritance, makes your code hard to follow, hard to reason about, hard to debug, hard to experiment with.

                                                                                                      That’s the only mention of the word inheritance in the body of the text. It’s also characterized as “extraneous openness and indirection”. Generally, these are just negative qualities that might be ascribed to inheritance, not the concept itself.

                                                                                                      I happen to agree inheritance is a bit of a misfeature, but the reasoning should be clear and specific. This article seems mostly to say “HOFs can be used in a confusing way”.

                                                                                                      I agree with that, too. HOFs operate at a higher level of abstraction than 1st order functions. It can be harder to reason at higher orders. This is one of the more direct places where types are really quite useful.

                                                                                                      But this isn’t an argument that HOFs are like inheritance. It’s not even an argument that either inheritance or HOFs are necessarily causal of “hard to follow, hard to reason about, hard to debug, hard to experiment with”. The only place any assertion like that shows up is in a single specific example and one generalization:

                                                                                                      usually code ends up structured in a way where [the HOF] makes a lot of setup before calling [its argument function], and recreating all that setup takes effort

                                                                                                      So all this criticism aside, I think there’s some truth to the assertion. And there’s real pain in inheritance and HOFs both being capable of causing difficulty. It’s even, as far as I’m concerned, true that you can model inheritance by taking classes as defining factory functions and inheritance as having those factory functions have access to super-class factory functions: inheritance is a HOF!

                                                                                                      But there’s a lot going on behind the scenes to get to this point.
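                                                                                                      One hypothetical way to sketch that modeling in Python: treat a “class” as a factory returning a table of methods, and inheritance as a higher-order function over the parent factory.

```python
# Hypothetical sketch: a "class" is a factory returning a method table,
# and inheritance is a higher-order function over the parent factory.
def base_class():
    def greet():
        return "hello"
    return {"greet": greet}

def inherit(parent_factory):
    parent = parent_factory()  # the child closes over the parent's methods

    def greet():
        return parent["greet"]() + ", world"  # a "super" call

    return {**parent, "greet": greet}

obj = inherit(base_class)
print(obj["greet"]())  # hello, world
```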

                                                                                                      1. 16

                                                                                                        Not sure who told the author of this piece that security by obscurity is bad, but what I have always heard is that security through obscurity is simply not to be relied upon. It’s not that you shouldn’t do it, but you should assume it will be defeated.

                                                                                                        So if you want to change your SSH port, fine, but don’t leave password authentication enabled and go on thinking you’re safe.

                                                                                                        1. 5

                                                                                                          “Security by obscurity is bad” is the line that is parroted by many who don’t understand.

                                                                                                          1. 1

                                                                                                            There seems to be a consensus that “6!x8GWqufk-EL6tv_A4.E” is a stronger password than “letmein”. The only significant difference I see between these passwords is obscurity.

                                                                                                            I wonder if this can be considered an example of “security by obscurity” that is widely considered neither “bad” nor likely to be defeated?
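                                                                                                            One way to make the difference precise is entropy rather than obscurity: assume the attacker already knows the alphabet and the length, so only the choice of string is secret. A rough sketch (the alphabet sizes are illustrative assumptions):

```python
import math

# Rough search-space estimate: length * log2(alphabet size). Nothing about
# the *system* is obscure here -- only which string was chosen.
def entropy_bits(length: int, alphabet_size: int) -> float:
    return length * math.log2(alphabet_size)

weak = entropy_bits(len("letmein"), 26)                   # lowercase letters
strong = entropy_bits(len("6!x8GWqufk-EL6tv_A4.E"), 94)   # printable ASCII
print(round(weak, 1), round(strong, 1))  # 32.9 137.6
```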

                                                                                                            1. 20

                                                                                                              There’s a long history of distinguishing obscure information like passwords or cryptographic keys from obscure methods like encryption algorithms. The key difference, I think, is that the only purpose of the secret information is to be secret, and you can measure its properties in that respect; that’s not true of code that’s meant to be secret, and competing requirements like “needs to run on someone else’s machine” make obscurity an unreliable crutch in many situations.

                                                                                                              EDIT: Another key difference is that “obscurity” can be taken as “the information is still present in whatever the adversary can access, it’s just harder to read”, e.g. obfuscated source code in a JavaScript file. That’s also different from a secret like a password, which should be protected by not exposing it at all.

                                                                                                              Like most maxims, “Security through obscurity is bad” is an oversimplification, but in my opinion it’s a good rule of thumb to be disregarded only when you know what you’re doing.

                                                                                                              1. 3

                                                                                                                I think the “security by obscurity is bad” aphorism is quite a bit narrower than the original meaning: security by algorithmic obscurity is bad because one has to presume that a motivated attacker will be able to identify or acquire the algorithm. Therefore, any additional security from algorithmic obscurity is ephemeral, and sacrifices the very real benefit of allowing the cryptographic community to examine the algorithm for weakness (since weaknesses are often non-obvious, especially to the creator). As such, one could say that it’s a corollary to one of Kerckhoffs’s principles (rephrased by Shannon as simply [assume that] “the enemy knows the system”).

                                                                                                                The aphorism has been adopted by those lacking the technical knowledge to understand the full meaning and generalized further than it should be.

                                                                                                                The artificial distinction between “secrecy” (which is necessary to protect the key) and “obscurity” (which is generally used to apply to the system) is most important to understanding the aphorism and unfortunately the distinction appears non-obvious to the layman and leads to confusion.

                                                                                                                Edit: Ugh, just realized that this is essentially paraphrasing an old Robert Graham blog post. Also corrected a sentence in which I nonsensically used “security” in place of “obscurity.”

                                                                                                                1. 2

                                                                                                                  That definition makes sense and clears up something I had been wondering about for a long time. Thanks!

                                                                                                                2. 3

                                                                                                                  I think to rectify these definitions you need an idea of the system under test. The system takes in inputs, comments on their quality, and is required, when its assumptions are satisfied, to produce trusted output.

                                                                                                                  Security by obscurity says that the system is more difficult to break if the adversary doesn’t know what it is. This is generally true, it at least adds research costs to the adversary and may even substantially increase the effort required to make an attack.

                                                                                                                  The general maxim is that security by obscurity should not be relied upon. In other words, you should have confidence that your system is still reliable even in the circumstance where your adversary knows everything about it.

                                                                                                                  So, ultimately, the quality of the password isn’t really about the system. The system could, for instance, choose to reject bad passwords and improve its quality. The adversary knowing about the system now knows not to test a certain subset of weak passwords (no chance of success) but the system is still defensible.

                                                                                                                  1. 2

                                                                                                                    The difference is not only obscurity; it’s (quantifiable) cryptographic strength.

                                                                                                                    Your website uses 256-bit AES because it’s impossible to brute-force without using more energy than is contained in our solar system. You wouldn’t use a 64-bit key, though. Is the difference that the former algorithm’s key is more obscure?

                                                                                                                    1. 1

                                                                                                                      An obscure system will be understood, and therefore cracked if its only advantage was obscurity. Passphrase-protected crypto systems are not obscure. Their operation is laid open for all to see, including what they do with passwords. If you can go from that to cracking specific cryptexts, that’s a flaw everyone will admit. However, if you must skip the system entirely and beat a passphrase out of someone in order to break the cryptext, that’s no flaw of the system under discussion. It might be a flaw of some larger system, but I believe it is universally acknowledged that, if you’re beating a passphrase out of someone and will only stop when you get the information you’re looking for or you kill the person you’re beating, the person will almost certainly give the passphrase before they die.

                                                                                                                  1. 5

                                                                                                                    Incomplete list of things that are not strings:

                                                                                                                    • Password

                                                                                                                    This is the least obvious one to me, and I notice it’s the only one for which you didn’t give examples of typed representations. Do you know of any?

                                                                                                                    1. 6

                                                                                                                      I don’t quite agree that they’re not strings. They are strings, at least from the user’s perspective. However, they would benefit from a type that isn’t a generic string:

                                                                                                                      • hashing uses bytes, and bytes depend on encoding, so you should be consistent with that (e.g. always hash NFC-normalized UTF-8 bytes)
                                                                                                                      • you don’t want passwords to get printed in logs or data dumps. A non-printable container could help with that.
                                                                                                                      • for an extra level of paranoia, you may want to zero out memory when the password object is freed.
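                                                                                                                      Those bullets could be sketched as a minimal wrapper (hypothetical class; note that reliably zeroing memory, per the last bullet, isn’t really possible for Python strings, so this sketch covers only the first two):

```python
import hashlib
import unicodedata

class Password:
    """Hypothetical non-printable container for a password."""

    def __init__(self, text: str):
        # NFC-normalize before encoding so hashing sees stable bytes
        self._bytes = unicodedata.normalize("NFC", text).encode("utf-8")

    def __repr__(self) -> str:
        # keep the secret out of logs and data dumps
        return "Password(<redacted>)"

    __str__ = __repr__

    def digest(self) -> str:
        # stand-in for a real password KDF such as argon2 or bcrypt
        return hashlib.sha256(self._bytes).hexdigest()

p = Password("hunter2")
print(p)  # Password(<redacted>)
```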
                                                                                                                      1. 5

                                                                                                                        In Haskell

                                                                                                                        newtype Password = Password String

                                                                                                                        in other words, it’s simply a different type with an identical representation, String.

                                                                                                                        Why does that matter? In my opinion, you should treat passwords as mostly opaque identifiers. One possible design thought experiment is “Should Password support length operations?”

                                                                                                                        • Pro: yes, it should, because we must validate the length of passwords
                                                                                                                        • Con: no, it shouldn’t, because we should validate the length of a string representation of the password prior to legally converting it into a password which is now opaque

                                                                                                                        Both feel reasonable, slightly different styles. There are other possible paths here too “No, Password should only support entropy evaluations”. But in any case, we can discuss how String and Password differ.

                                                                                                                        1. 1

                                                                                                                          Yeah, I was slightly confused by this one too. My best guess is that passwords are subject to restrictions (length, requiring non-alphanumeric characters, etc.) that a plain string isn’t.

                                                                                                                          1. 7

                                                                                                                            Passwords cannot be safely compared for equality using string functions; you can run into timing attacks if you do.
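                                                                                                                            In Python, for instance, the standard library’s hmac.compare_digest gives a constant-time check; a sketch comparing stored digests (not plaintext, per the reply below this one):

```python
import hmac

# hmac.compare_digest takes time independent of where the inputs differ,
# unlike ==, which can leak how many leading bytes matched via timing.
def digests_match(stored: bytes, candidate: bytes) -> bool:
    return hmac.compare_digest(stored, candidate)

print(digests_match(b"abc123", b"abc123"))  # True
print(digests_match(b"abc123", b"abc124"))  # False
```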

                                                                                                                            1. 6

                                                                                                                              Not that you should ever have to compare the plaintext of a password…