1. 8

    Wouldn’t this be pretty easy to do without mutability?

      1.  

        Which of those list implementations is immutable?


        I don’t remember reading the PSA about how bad lists are the first time I went through this. To me it stinks of rationalization. There’s a large space of possible data structures out there, and Rust’s borrow checking can’t represent a big chunk of them – including such examples as binary trees with parent pointers. Are they all things we shouldn’t be teaching students of programming?

        1. 12

          The decision procedure seems straightforward to me:

          1. If you want a tree structure with parent pointers in Rust, see if someone has written a generic data structure that will suit your needs. If not, continue.
          2. Implement the tree structure yourself. If you need to express shared ownership, then use a reference-counted type (see the sketch after this list). If this is too annoying or has too much performance overhead, continue.
          3. Use raw pointers and unsafe to express your parent pointers, just like you’d do in C. Spend the time to make sure you’ve gotten it right and expose a safe interface at a module boundary. Enjoy the benefits of using safe code elsewhere. If this doesn’t work for whatever reason, continue.
          4. Compile time borrow checking might not be suitable for your use case. Don’t use Rust.
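
          To illustrate step 2, here’s a minimal sketch of parent pointers done with reference counting (it’s essentially the Rc/Weak pattern from the Rust book): the parent link is a Weak reference, so the parent/child cycle can’t keep itself alive.

              use std::cell::RefCell;
              use std::rc::{Rc, Weak};

              // A tree node: strong references point down, weak ones point up.
              struct Node {
                  value: i32,
                  parent: RefCell<Weak<Node>>,
                  children: RefCell<Vec<Rc<Node>>>,
              }

              fn main() {
                  let root = Rc::new(Node {
                      value: 1,
                      parent: RefCell::new(Weak::new()),
                      children: RefCell::new(Vec::new()),
                  });
                  let child = Rc::new(Node {
                      value: 2,
                      parent: RefCell::new(Rc::downgrade(&root)),
                      children: RefCell::new(Vec::new()),
                  });
                  root.children.borrow_mut().push(Rc::clone(&child));

                  // Walk back up through the parent pointer.
                  if let Some(parent) = child.parent.borrow().upgrade() {
                      println!("{}'s parent is {}", child.value, parent.value);
                  }
              }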

          I don’t know exactly what would cause someone to go from 3 to 4, although I imagine there are legitimate reasons. Annoyance could be one, though I don’t really understand it myself, unless you for some reason can’t find a way to hide unsafe behind a safe API (which you absolutely should be able to do with a binary tree).

          Maybe the future holds more sophisticated borrow checking. But this is what we have for now. If you think expounding on trade-offs is equivalent to “stinking of rationalization,” then so be it. But it seems entirely appropriate to me.

          There is of course a separate issue where many Rust programmers will suggest that one should not implement these data structures themselves. My “rationalization” for this is a conflation of target audiences. If you just want to get a taste of Rust and try it out by implementing a doubly linked list, then people are going to rightfully steer you away from that, because it might not be the best way to dip your toes in. If, however, you have a specific use case, know that’s what you need, and existing generic data structures aren’t suitable for legitimate reasons, then receiving the aforementioned advice sounds downright patronizing. But that doesn’t make it wrong in every case.

          Common questions have common answers, and not everyone is discerning enough to know when the common answer is inappropriate. We shouldn’t hold that against them. Use it as an opportunity to educate (or, sometimes, delightfully, be educated, because everyone is wrong once in a while).

          1.  

            I’ll add to your excellent analysis that, at 4, one might just use an external tool to verify the unsafe Rust. There are quite a few tools, especially for C, that can check that either partly or totally. They require their own expertise, can be hard to use, and have serious limitations. One must also be careful about mismatches between the meaning of the Rust code and the “equivalent” version. However, they might be a better option than no mechanical checking on the unsafe part, or no use of Rust at all in a given project.

            “Maybe the future holds more sophisticated borrow checking.”

            Definitely. The CompSci folks are working on building more flexible models all the time. Most I’ve seen build on things like linear types rather than affine types, or use functional languages. There’s at least potential if these people start doing similar things to Rust’s model. Even if the main language doesn’t adopt it, an extension for an unsafe piece of Rust, or something Rust can call over FFI, would still be beneficial.

            1.  

              I think that’s totally reasonable. In the future, programmers might figure out how stuff that can currently only be done in unsafe Rust can actually be done in safe Rust. I don’t think the full implications of borrow checkers have been worked out. As they’re better understood, some of these data structures will be revisited. Rust is still useful in the interim.

              1.  

                Thanks, I hadn’t really considered using unsafe, under the assumption that if I need to use unsafe I might as well use C. But it sounds like there are benefits here. I’ll dig deeper into this.

                I think “this is a limitation of borrow checking” is a perfectly great statement. Where it edges into rationalization is when it says “you shouldn’t need this” as the above link does. Or “if you need this you’re stupid or doing something wrong”, as the above link kinda implies. I see this in the Go community as well, where conversation often jumps from “we don’t know how to do generics well” to “you shouldn’t need generics”. This isn’t from the leaders, but it does get said a lot. (As an ex-Lisp hand I’m reminded of all the times we said “you shouldn’t need this” to noobs, and how it affected Lisp’s adoption.)

                The worldview I’m approaching Rust from is to determine if it can replace C or Java (or other system languages that expose pointers) to the point where someone can go through their career without needing to use those older languages (modulo interop scenarios). That makes me care more about learning situations even if they are rarely relevant in “the real world”.

                1. 5

                  Sure. It’s a classic hedge. If you’re writing material and you want to address common mistakes, then these sorts of “don’t do these things” rules make sense, especially if they are prone to overuse. For example, if a lot of beginners are picking up Rust and struggling because they’re trying to write linked lists, then we can either:

                  1. Make the process of writing linked lists easier.
                  2. Nudge them in another direction.

                  The presumption here being that those of us who aren’t beginners in Rust will know when to break these rules. We could probably be more explicit about this in more areas, but it’s hard to be 100% precise all of the time. And you certainly can’t control what others say either. There’s likely a phenomenon at play here too, one that I’ve heard described as “there is nothing like the zeal of the newly converted.” :-)

                  Thanks, I hadn’t really considered using unsafe, under the assumption that if I need to use unsafe I might as well use C. But it sounds like there are benefits here. I’ll dig deeper into this.

                  To clarify here, because I think this is important: a key part of Rust’s value proposition isn’t necessarily that you never need to use unsafe, but rather that, when you do, it’s usually possible to hide it behind a safe abstraction. That means that when a memory violation bug occurs, you know exactly where to look.

                  Of course, that isn’t the complete story. If you legitimately ended up needing to use unsafe in a ton of places, then that would weaken the aforementioned value proposition. The key is striking a balance, and one part of learning Rust is learning when to use unsafe and when not to. It can be tricky, but that’s why a good first approximation is “don’t use unsafe.” :-) The classic uses of unsafe in my experience are:

                  1. FFI.
                  2. In the implementation of generic data structures that must have as little overhead as possible. These uses are probably the most difficult form of unsafe to get right, partially because of generics. (The Nomicon is instructive here.)
                  3. Getting around checks like bounds checks, doing unaligned loads/stores, avoiding redundant UTF-8 checks, etc. These are as hard or as easy to get right as in C (there’s a sketch of this case just below).
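
                  To make “expose a safe interface at a module boundary” concrete, here’s a minimal sketch of the third case: the only unsafe block lives inside the module, and the public API re-establishes the invariant (index in bounds) that the unsafe code relies on.

                      mod buffer {
                          pub struct Buffer {
                              data: Vec<u8>,
                          }

                          impl Buffer {
                              pub fn new(data: Vec<u8>) -> Buffer {
                                  Buffer { data }
                              }

                              // Validate the index once, then skip the redundant
                              // bounds check on the access itself.
                              pub fn get(&self, i: usize) -> Option<u8> {
                                  if i < self.data.len() {
                                      // SAFETY: `i` was just checked against `len`.
                                      Some(unsafe { *self.data.get_unchecked(i) })
                                  } else {
                                      None
                                  }
                              }
                          }
                      }

                      fn main() {
                          let buf = buffer::Buffer::new(vec![1, 2, 3]);
                          assert_eq!(buf.get(1), Some(2));
                          assert_eq!(buf.get(9), None);
                      }
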
                  1.  

                    The tragic thing here is that that page is awesome from an information content perspective. It’s just a matter of how the parts of the argument are phrased. Compare:

                    “Here’s how you do linked lists in Rust. Be warned, it’s going to be a little klunky compared to other languages, but fortunately you’ll rarely need them in the real world, and there are often better alternatives.”

                    with:

                    “Linked lists are awful, but you’re twisting my arm so here’s how you do them in Rust.”

                    What’s nice about the first approach is that it shortcuts the whole cycle of mutual accusations of defensiveness that often happen in these situations. Because really nobody is being defensive, we’re just wondering internally if the other side is being defensive. I learned programming using linked lists, and if you tell me they suck it doesn’t trigger defensiveness. My identity isn’t wrapped up in using linked lists. What it triggers is skepticism. And it’s totally unnecessary and distracting.

                  2.  

                    There are plenty of benefits to writing Rust instead of C even if you typed every single line of code within an unsafe { } block. Rust has a lot of ergonomic improvements compared to C and C++ (my personal favorite is a type system which supports good algebraic data types).

                    It’s also worth mentioning that the reason the unsafe keyword exists in Rust is precisely because there are lots of useful things you might want to do in a program that will violate the borrow checking rules, and you need some way to sidestep it on occasion. The fact that the keyword is named “unsafe” gives people pause when using it - which is normally good, because you should think carefully about writing code that you know can’t be automatically checked for memory safety - but that doesn’t mean that it’s wrong to write a Rust program that uses unsafe blocks, even if you are a beginner.

                    If I want a doubly-linked list in Rust, I can in a dozen lines of code do:

                        struct List<T> {
                            item: T,
                            next: Option<*mut List<T>>,
                            prev: Option<*mut List<T>>,
                        }

                        fn main() {
                            let mut first: List<i32> = List { item: 1, next: None, prev: None };
                            let mut second: List<i32> = List { item: 5000, next: None, prev: Some(&mut first) };
                            first.next = Some(&mut second);
                            println!("first item: {}", first.item);
                            println!("second item: {}", unsafe { (*first.next.unwrap()).item });
                            println!("first item again: {}", unsafe { (*second.prev.unwrap()).item });
                        }

                    and this will compile with no errors and have exactly the same behavior as the equivalent C program. It’s unsafe of course - and the fact that you have to use unsafe blocks is a good sign that you should think about whether this is a good way to write this program, or at least that you should be very, very careful when writing it. But you can do it. Even if you are a beginner to Rust.

                    1.  

                      Thanks, I hadn’t really considered using unsafe, under the assumption that if I need to use unsafe I might as well use C. But it sounds like there are benefits here. I’ll dig deeper into this.

                      It’s a long-time practice in safe systems languages to support escape hatches to do what the safety mechanisms won’t allow. This included writing OS kernels in Pascal, PL/S, Oberon, and Ada. The developers of such languages realized that most of an app doesn’t have to be done unsafely, so they’re usually safe by default. Then, if you need unsafety, you can put it in a specific module that turns off one or more safety features just for that module. The compiler still automatically does static or dynamic checks for everything else. This focuses the mind on the most dangerous modules when looking for low-level problems. Finally, the common practice was to wrap the unsafe code in function calls that (a) may do input validation to ensure the unsafe code receives sane input and/or (b) have their own checkable rules for being called safely from the rest of the safe code. Some went further and had formal specifications for correct use of all code, with the unsafe code’s logical correctness checked there and language-level correctness checked by eye and code-analysis tools.

                      So, there’s the big picture of how this has been done going back to the 1960’s, with Burroughs doing an ALGOL CPU/OS combo. It consistently worked, too, if you look at what caused most crashes and security bulletins. At one point, when hardware was expensive and slow, one source I read said Burroughs even offered to turn off the safety checks for their customers for a performance boost. The customers said no: they didn’t want the headaches. Software is a lot more complex today on much faster machines. Might as well keep the good practice. :)

            1. 2

              One possible strategy is to design it to fail during testing on anything not in the contract. Obviously one wants to be cautious about doing this at all, or about how one does it; I just remember some people using it. Netflix’s Simian Army and other fault injection of distributed systems are examples where reliable, ordered messaging might be assumed in code despite no contract or implementation for it.

              1. 4

                It’s hard to test for stuff that’s not in the contract but is in the code. Let me tell you a story of something that happened to me yesterday.

                I have a script that updates Mu on my students’ servers. However, my students sometimes modify Mu themselves (they are encouraged to do so). So the way my script works is that it runs rsync -n (dry run), shows me what would happen, then asks me for confirmation that all is well before performing a real rsync. If I see changes from my students, I cancel the operation and perform a more manual merge.

                This workflow has been pretty much unchanged for a couple of years. The only change was that I switched the script to use /bin/sh a year or so ago, as part of a recent kick to minimize dependencies. And everything continued to work fine. A month ago I upgraded Ubuntu on my machine to 16.04, and yesterday (teaching has been slow recently because life) I ran my script for the first time after the upgrade – and it ran the rsync without prompting me at all. A little digging showed that my approach of waiting for a prompt by running just read was a violation of Posix, and on Ubuntu Xenial /bin/sh now hews closer to the letter of Posix, raising this error:

                read: arg count
                

                read needs to be passed a variable to save the input in; use read _ to discard the input.

                I’m not sure what the lesson is here. My bias is to think contracts are shit, because people don’t read contracts. But maybe this is a learning experience to change my mind.

                1. 1

                  Remember that contracts à la DbC, in a language with good tooling, can be runtime checks or can generate tests to ensure they’re being honored. Callers can ignore the contracts, but the check you leave in won’t ignore the bad input.

                  Some things, especially in build systems, need human review to catch, though. What you described seems to be a side effect of the UNIX style of composing programs without contracts, in a mix of unsafe and informal languages. The problems that came from that are why things like contracts were invented and deployed.

                  1. 1

                    Oh I see, by “contract” you mean “formal contract”. I think the point of OP is that it’s impossible to enumerate the “intended contract” in all particulars as a “formal contract”. Because if you could, you’d just make those scenarios well-defined.

                    More rigorous and formal languages will make these corner cases rarer, but they don’t actually obsolete OP. They just push up the “sufficient number of users”.

                    1. 1

                      Hmm, on reflection my comment is bad. OP conflates Hyrum’s law with XKCD 1172 and I was mindlessly following along, but really they’re separate scenarios. The XKCD is about unintended uses of a piece of code. Hyrum’s law is about intended divergence between code and some spec. Between the two Hyrum’s law is actually easier. If you have a spec it is possible in principle to catch violations as you surmise. There’s just the question of what the costs are. The XKCD is however about violations you didn’t even know you cared about. Hyrum’s law is about known unknowns. XKCD 1172 is about unknown unknowns.

                      1. 1

                        “Oh I see, by “contract” you mean “formal contract”.”

                        I mean both, depending on what we’re talking about. Originally it was the API, but it can also be formal contracts. The thing is that unspecified or poorly-specified things won’t be checked by default. So, you’ve got to mandate they get checked, make them fail in ways nobody can come to rely on over correct behavior, or deal with them, plus whatever people are doing with them, yourself. Those are a few possibilities that come to mind.

                        1. 1

                          I wrote a DNS packet decoder library and to ensure safety, I check every bit of the incoming packet. There’s one bit left undefined (no RFC defines it as far as I know), and if it’s not 0, I reject the packet. Am I too intolerant of the DNS packet contract?
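
                          For a concrete picture, the check amounts to something like this (a sketch, assuming the undefined bit in question is the reserved “Z” bit, mask 0x40 in the fourth byte of the header):

                              // Sketch: reject a packet whose reserved header bit is set.
                              // The DNS header is 12 bytes; flags live in bytes 2-3, and
                              // the remaining reserved bit is bit 6 of byte 3 (mask 0x40).
                              fn check_reserved_bit(packet: &[u8]) -> Result<(), &'static str> {
                                  if packet.len() < 12 {
                                      return Err("truncated DNS header");
                                  }
                                  if packet[3] & 0x40 != 0 {
                                      return Err("reserved bit is set");
                                  }
                                  Ok(())
                              }

                              fn main() {
                                  let mut packet = [0u8; 12];
                                  assert!(check_reserved_bit(&packet).is_ok());
                                  packet[3] |= 0x40;
                                  assert!(check_reserved_bit(&packet).is_err());
                              }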

                          1. 1

                            I can’t give you a right answer on how to handle stuff in Internet protocols, given that what users will demand and what implementations will do is so far out of most of our control. Internet standards are a much bigger problem than an API for your personal project or commercial product. I will tell you some things that came to mind reading the question:

                            1. The stance I have on APIs is pro formal specification. That’s because just formally specifying things has caught problems. Comparing output of random implementations against executable, formal specifications has also caught problems. In your article, you report that your code that follows the spec… which leans toward an executable specification… caught problems in other implementations. That’s a valuable thing that matches the prediction.

                            2. There’s a middle ground that says you can accept something but log it to analyze the situation further. What you find might lead to changes in your spec, if not the true one. Also, maybe run it with more checks on, or in isolation, as your protocol engine goes through it. In micro- and separation-kernel schemes, there’s often simple functionality that’s trusted due to its well-vetted spec/implementation. Everything else gets re-routed through user-mode components whose output is validated for sanity. Any explosions the unusual features cause will be contained.

                            3. Postel’s law. If your app needs to integrate well with 3rd parties’ apps, then you might need to be able to accept any crap the products around you emit, just to succeed financially, or with uptake if FOSS. Alternatively, your solution might be replacing another that the users built a lot of code on, code that expects a bit of the original’s out-of-spec behavior. For FOSS, the struggles of OpenOffice to make headway in a Microsoft world designed for lock-in to existing doc files are a perfect example. On the protocol side, another is comparing the behavior of your protocol implementation against many others, to either create a superset spec containing all of it or a series of profiles the user can select based on what their internal environment expects. People without legacy baggage get your most robust version, with others taking on as much risk as they already chose to.

                            So, those are the three things that come to mind reading your question followed by your article. On the DNS side, I think people commonly try to copy whatever behavior is accepted by the most popular clients with the least trouble from middleboxes. I have no idea what that set is, though. I’d imagine it changes over time, too, so you’d have to constantly run tests, probably with the help of other vendors or their customers.

                1. 8

                  Personally I trust Russ Cox’s judgement… though I could see how people who worked on ‘dep’ would be furious. Go has a reputation for taking community direction with a grain of salt. The Go team certainly is not afraid to do unpopular things in the name of simplicity.

                  1. 7

                    I am reminded of this comment from Russ, which I think explains rather a lot: https://news.ycombinator.com/item?id=4535977

                      1. 1

                        vgo is certainly different than dep, and in some ways it’s simpler, but in other ways it pushes a lot more complexity on the user. I think on balance it’s got to be a wash, at least for now.

                        1.  

                          What are the complexities pushed onto the users?

                      1. 2

                        Maybe a dumb question, but in semver, what is the point of the third digit? A change is either backwards compatible or it is not. To me that means only the first two digits do anything useful. What am I missing?

                        It seems like the openbsd libc is versioned as major.minor for the same reason.

                        1. 9

                          Minor version is backwards compatible. Patch level is both forwards and backwards compatible.
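
                          Spelled out as code, the rule reads roughly like this (a sketch of the rule itself, not any particular tool’s logic):

                              // A move between versions is compatible for callers when the
                              // majors match and the minor doesn't decrease; the patch
                              // number never affects compatibility in either direction.
                              fn compatible(from: (u32, u32, u32), to: (u32, u32, u32)) -> bool {
                                  let ((fmaj, fmin, _), (tmaj, tmin, _)) = (from, to);
                                  fmaj == tmaj && tmin >= fmin
                              }

                              fn main() {
                                  assert!(compatible((1, 2, 3), (1, 2, 9))); // patch bump
                                  assert!(compatible((1, 2, 9), (1, 2, 3))); // patch is forwards compatible too
                                  assert!(compatible((1, 2, 3), (1, 3, 0))); // minor bump: backwards compatible
                                  assert!(!compatible((1, 3, 0), (1, 2, 3))); // minor downgrade can lose features
                                  assert!(!compatible((1, 2, 3), (2, 0, 0))); // major bump: breaking
                              }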

                          1. 2

                            Thanks! I somehow didn’t know this for years until I wrote a blog post airing my ignorance.

                          2. 1

                            “PATCH version when you make backwards-compatible bug fixes.” See: https://semver.org

                            1. 1

                              I still don’t understand what the purpose of the PATCH version is. If minor versions are backwards compatible, what is the point of adding a third version number?

                              1. 3

                                They want a difference between new functionality (that doesn’t break anything) and a bug fix.

                                I.e., if it were only X.Y, then when you add a new function but don’t break anything, do you change Y or do you change X? If you change X, then you are saying you broke stuff, so clearly changing X for a new feature is a bad idea. So you change Y. But if you look at just the Y change, you don’t know whether it was a bug fix or some new function/feature they added. You have to go read the changelog/release notes, etc. to find out.

                                With the 3 levels, you know whether a new feature was added or it was only a bug fix.

                                For compatibility purposes alone, just X.Y is enough. But the semver people clearly wanted that differentiation: they wanted to be able, by looking only at the version number, to know whether a new feature was added or not.

                                1. 1

                                  To show that there was any change at all.

                                  Imagine you don’t use SHA-1s or git; this would show that there was a new release.

                                  1. 1

                                    But why can’t you just increment the minor version in that case? A bug fix is also backwards compatible.

                                    1. 5

                                      Imagine you have authored a library, and have released two versions of it, 1.2.0 and 1.3.0. You find out there’s a security vulnerability. What do you do?

                                      You could release 1.4.0 to fix it. But, maybe you haven’t finished what you planned to be in 1.4.0 yet. Maybe that’s acceptable, maybe not.

                                      Some users using 1.2.0 may want the security fix, but also do not want to upgrade to 1.3.0 yet for various reasons. Maybe they only upgrade so often. Maybe they have another library that requires 1.2.0 explicitly, through poor constraints or for some other reason.

                                      In this scenario, releasing a 1.2.1 and a 1.3.1, containing the fixes for each release, is an option.

                                      1. 2

                                        It sort of makes sense, but if minor versions were truly backwards compatible, I can’t see a reason why you would ever want to hold back. Minor and patch seem to me to be the same concept; one just has a higher risk level.

                                        1. 4

                                          Perhaps a better definition is: a library’s minor version changes may expose functionality to end users that you, as the application author, did not intend.

                                          1. 2

                                            I think it’s exactly a risk management decision. More change means more risk, even if it was intended to be benign.

                                            1. 2

                                              Without the patch version it makes it much harder to plan future versions and the features included in those versions. For example, if I define a milestone saying that 1.4.0 will have new feature X, but I have to put a bug fix release out for 1.3.0, it makes more sense that the bug fix is 1.3.1 rather than 1.4.0 so I can continue to refer to the planned version as 1.4.0 and don’t have to change everything which refers to that version.

                                    2.  

                                      I remember seeing a talk by Rich Hickey where he criticized the use of semantic versioning as fundamentally flawed. I don’t remember his exact arguments, but have sem ver proponents grappled effectively with them? Should the Go team be wary of adopting sem ver? Have they considered alternatives?

                                      1.  

                                        I didn’t watch the talk yet, but my understanding of his argument was “never break backwards compatibility.” This is basically the same as new major versions, except it requires you to give the thing a new name instead of a new major version. I don’t inherently disagree, but it doesn’t really seem like some grand deathblow to the idea of semver to me.

                                        1.  

                                          IME, semver itself is fundamentally flawed because humans are the deciders of the new version number, and we are bad at it. I don’t know how many times I’ve gotten into a discussion with someone where they didn’t want to increase the major because they thought high majors looked bad. Maybe at some point it can be automated, but I’ve had plenty of minor version updates that were not backwards compatible, and the same for patch versions. Or, as has happened to me in Rust multiple times, the minor version of a package increments but the new feature depends on a newer version of the compiler, so it is backwards-breaking in terms of compiling. I like the idea of a versioning scheme that lets you tell the chronology of versions, but I’ve found semver to work right up until it doesn’t, and it’s always a pain. I advocate pinning all deps in a project.

                                          1.  

                                            It’s impossible for computers to automate. For one, semver doesn’t define what “breaking” means. For two, the only way that a computer could fully understand if something is breaking or not would be to encode all behavior in the type system. Most languages aren’t equipped to do that.

                                            Elm has tools to do at least a minimal kind of check here. Rust has one too, though it’s not as widely used.

                                            “I advocate pinning all deps in a project.”

                                            That’s what lockfiles give you, without the downsides of doing it manually.

                                  1. 11

                                    Dijkstra may have seen this too. Look at the tail end of this quote.

                                    “APL is a mistake, carried through to perfection. It is the language of the future for the programming techniques of the past: it creates a new generation of coding bums.” - Edsger W.Dijkstra

                                      1. 10

                                        Dijkstra is the battle rapper of computer scientists

                                        1. 2

                                          Imagining him battling Alan Perlis.

                                    1. 6

                                      Important for anyone considering trying this out on Linux: you’ll have serious issues if you install this.

                                      I think it’s irresponsible that an issue this serious has been open since November, without the author highlighting the danger prominently in the Readme or somewhere.

                                      1. 1

                                        Looks like a fix-ish went in 10 hours ago: https://github.com/cknadler/vim-anywhere/pull/68

                                      1. 5

                                        It seems odd to spend so much space discussing the complexities of software development, only to conclude that the answer is empiricism. Surely the number of variables and their non-linear effects make experimentation difficult. I think $10M is an extremely tiny fraction of what a reproducible experiment would cost, and it would take a long time to run. You need a huge scale in number of projects and also lengthy longitudinal studies of their long-term impacts. And after you did all that the gigantic experiment would be practically certain to perturb what it’s trying to measure: programmer behavior. Because no two projects I’m on get the same me. I change, mostly by accumulating scar tissue.

                                        Empiricism works in a tiny subset of situations where the variables have been first cleaned up into orthogonal components. I think in this case we have to wait for the right perspective to find us. We can’t just throw money at the problem.

                                        1. 6

                                          What else can we do? Our reason is fallible, our experiences are deceitful, and we can’t just throw our hands up and say “we’ll never know”. Empiricism is hard and expensive, but at least we know it works. It’s gotten us results about things like n-version programming and COCOMO and TDD and formal methods. What would you propose we do instead?

                                          1. 4

                                            Empiricism is by no means the only thing that works. Other things that work: case studies, taxonomies, trial and error with motivation and perseverance. Other things, I’m sure. All of these things work some of the time – including empiricism. It’s not like there’s some excluded middle between alchemy and science. Seriously, check out that link in my comment above.

                                            I’m skeptical that we have empirically sound results about any of the things you mentioned, particularly TDD and formal methods. Pointers? For formal methods I find some of @nickpsecurity’s links kinda persuasive. On some mornings. But those are usually case studies.

                                            Questions like “static or dynamic typing” are deep in not-even-wrong territory. Using empiricism to try to answer them is like a blind man in a dark room looking for a black cat that isn’t there.

                                            Even “programming” as a field of endeavor strikes me as a false category. Try to understand that domain you’re interested in well enough to automate it. Try a few times and you’ll get better at it – in this one domain. Leave the task of generalization across domains to future generations. Maybe we’ll eventually find that some orthogonal axis of generalization works much better. Programming in domain X is like ___ in domain X more than it is like programming in domain Y.

                                            You ask “what else is there?” I respond in the spirit of Sherlock Holmes: “when you have excluded the impossible, whatever is left, however unlikely, is closer to the answer.” So focus on your core idea that the number of variables is huge, and loosen your grip on empiricism. See where that leads you.

                                            1. 5

                                              I think we’re actually on the same page here. I consider taxonomies, ethnographies, case studies, histories, and even surveys as empirical. It’s not just double blind clinical studies: as Making Software put it, qualitative findings are just as important.

                                              I reject the idea that these kinds of questions are “not even wrong”, though. There’s no reason to think programming is any more special than the rest of human knowledge.

                                              1. 2

                                                Ah ok. If by empiricism you mean, “try to observe what works and do more of that”, sure. But does that really seem worth saying?

                                                It can be hard psychologically to describe a problem well in an article and then not suggest a solution. But sometimes that may be the best we can do.

                                                I agree that programming is not any more special than the rest of human knowledge. That’s why I claim these questions are not even wrong. Future generations will say, “sure static typing is better than dynamic typing by about 0.0001% on average, but why did the ancients spend so much time on that?” Consider how we regard ancient philosophers who worried over silly questions like whether truth comes from reason or the senses.

                                                Basically no field of human endeavor had discovered the important questions to ask in its first century of existence. We should spend more time doing and finding new questions to ask, and less time trying to generalize the narrow answers we discover.

                                                1. 3

                                                  Ah ok. If by empiricism you mean, “try to observe what works and do more of that”, sure. But does that really seem worth saying?

                                                  It’s not quite that. It’s all about learning the values and limitations of all the forms of knowledge-collection. What it means to do a case study and how that differs from a controlled study, where ethnographies are useful, etc. It’s not “observe what works and do more of that”, it’s “systematically collect information on what works and understand how we collect and interpret it.”

                                                  A critical piece of that is that the information we collect by using “reason” alone is minimal and often faulty, but that’s how almost everybody interprets software. That, and appealing to authority, really.

                                                  Basically no field of human endeavor had discovered the important questions to ask in its first century of existence. We should spend more time doing and finding new questions to ask, and less time trying to generalize the narrow answers we discover.

                                                  The difference is that we’ve already given software control over the whole world. Everything is managed with software. It guides our flights and runs our power grid. Algorithms decide whether people go to jail or go free. Sure, maybe code will look radically different in a hundred years, but right now it’s here and present and we have to understand it.

                                                  1. 2

                                                    It is fascinating that we care about the same long-term problem but prioritize sub-goals so differently. Can you give an example of a more important question than static vs dynamic typing that you want to help answer by systematically collecting more information?

                                                    Yes, we have to deal with the code that’s here and present. My answer is to reduce scale rather than increase it. Don’t try to get better at running large software projects. Run more small projects; those are non-linearly more tractable. Gradually reduce the amount of code we rely on, and encourage more people to understand the code that’s left. A great example is the OpenBSD team’s response to Heartbleed. That seems far more direct an attack on existing problems than any experiments I can think of. Experiments seem insufficiently urgent, because they grow non-linearly more intractable with scale, while small-scale experiments don’t buy you much: if you don’t control for all variables you’re still stuck using “reason”.

                                                    1. 2

                                                      Can you give an example of a more important question than static vs dynamic typing that you want to help answer by systematically collecting more information?

                                                      Sure. Just off the top of my head:

                                                      • How much does planning ahead improve error rate? What impacts, if any, does agile have?
                                                      • What are the main causes of cascading critical failures in systems, and what can we do about them?
                                                      • When it comes to maximizing correctness, how much do intrinsic language features matter vs processes?
                                                      • I don’t like pair programming. Should I be doing it anyway?
                                                      • How do we audit ML code?
                                                      • How much do comments help? How much does documentation help?
                                                      • Is goto actually harmful?

                                                      Obviously each of these have ambiguity and plenty of subquestions in them. The important thing is to consider them things we can investigate, and that investigating them is important.

                                                      1. 0

                                                        Faced with Cthulhu, you’re trying to measure how the tips of His tentacles move. But sometimes you’re conflating multiple tentacles! Fthagn!

                                                        1. 1

                                                          As if you can measure tentacles in non-Euclidean space without going mad…

                                                          1. 1

                                                            Now you’re just being rude. I made a good faith effort to answer all of your questions and you keep condescending to me and insulting me. I respect that you disagree with me, but you don’t have to be an asshole about it.

                                                            1. 2

                                                              Not my intention at all! I’ll have to think about why this allegory came across as rude. (I was more worried about skirting the edge when I said, “is that really worth talking about?”) I think you’re misguided, but I’m also aware that I’m pushing the less likely theory. It’s been fun chatting with you precisely because you’re trying to steelman conventional wisdom (relative to my more outre idea), and I find a lot to agree with. Under it all I’ve been hoping that somebody will convince me to return to the herd so I can stop wasting my life. Anyway, I’ll stop bothering you. Thanks for the post and the stimulating conversation.

                                                    2. 2

                                                      “try to observe what works and do more of that”

                                                      That is worth saying, because it can easily get lost when you’re in the trenches at your job, and can be easy to forget.

                                                2. 2

                                                  What else can we do?

                                                  If describing it in terms of philosophies, then there’s also reductionism and logic. The hardware field turning analog into digital Legos, and Oberon/Forth/Simula for software, come to mind for that. Maybe model-driven engineering. They break software into fundamental primitives that are well understood, which then compose into more complex things. This knocks out tons of problems, but not all.

                                                  Then, there’s the logical school that I’m always posting about, as akkartik said, where you encode what you want, the success/failure conditions, and how you’re achieving them, and prove that you do. Memory safety, basic forms of concurrency safety, and type systems in general can be done this way. Two of those have eliminated entire classes of defects in enterprise and FOSS software using such languages. The CVE list indicates the trial-and-error approach didn’t work as well. ;) Failure detection/recovery algorithms, done as protocols, can be used to maintain reliability in all kinds of problematic systems. Model-checking and proof have been most cost-effective in finding protocol errors, especially deep ones. Everything being done with formal methods also falls into this category. Just highlighting high-impact stuff. Meyer’s Eiffel Method might be said to combine reductionism (language design/style) and logic (contracts). Cleanroom, too. Experimental evidence from case studies showed Cleanroom was very low defect, even on first use by amateurs.

                                                  Googled a list of philosophies. Let’s see. There’s the capitalism school that says the bugs are OK if profitable. The existentialists say it only matters if you think it does. The phenomenologists say it’s more about how you perceived the failure from the color of the screen to the smell of the fire in the datacenter. The emergentists say throw college grads at the problem until something comes out of it. The theologists might say God blessed their OS to be perfect with criticism not allowed. The skeptics are increasingly skeptical of the value of this comment. The… I wonder if it’s useful to look at it in light of philosophy at all given where this is going so far. ;)

                                                  I look at it like this. We have most of what we want out of a combo of intuition, trial-and-error, logic, and peer review. This is a combo of individuals’ irrational activities with rational activity on generating and review side of ideas. I say apply it all with empirical techniques used to basically just catch nonsense from errors, bias, deception, etc. The important thing for me is whether something is working for what problems at what effort. If it works, I don’t care at all whether there’s studies about it with enough statistical algorithms or jargon used in them. However, at least how they’re tested and vetted… the evidence they work… should have rigor of some kind. I also prefer ideological diversity and financial independence in reviewers to reduce the collusion problem science doesn’t address enough. A perfectly-empirical study with 100,000+ data points refuting my logic that Windows is insecure is less trustworthy when the people who wrote it are Microsoft employees wanting an NDA for the data they used, eh?

                                                  I’ll throw out another example that illustrates it nicely: CompCert. Most empiricists might tell you that compiler is an outlier that proves little to nothing about formal verification in general. Partly true. Skepticism’s followers might add we can’t prove that this problem, and only this problem, was the one they could express correctly with logic, if they weren’t misguided or lying to begin with. ;) Well, they use the logical school’s approach of specifying stuff they prove is true. We know from comparisons of testing and formal verification that testing or trial-and-error can’t ensure the invariants, due to state-space explosion. Even that is a mathematical/logical claim, because otherwise you gotta test it haha. The prior work with many formal methods indicates they reduce defects a lot in a wide range of software, at high cost, with simplicity of software required. Those generalizations have evidence. The logical methods seem to work within some constraints. CompCert pushes those methods into new territory in specification but reuses a logical system that worked before. Can we trust the claim? Csmith throws CPU-years of testing against it and other compilers. Its defect rate bottoms out (mainly spec errors), unlike just about any compiler ever tested that I’ve seen in the literature. That matches the prediction of the logical side, where errors in proven components, about what’s proven, should be rare to nonexistent.

                                                  So, the empirical methods prove certain logical systems work in specific ways, like ensuring the proof is at least as good as the specs. We should be able to reuse logical systems proven to work to do what they’re proven to be good at. We can put less testing into components developed that way when resources are constrained. Each time something truly new is done like that, we review and test the heck out of it. Otherwise, we leverage it, since things that logically work for all inputs to do specific things will work for the next input with high confidence, because we vetted the logical system itself already. Logically or empirically, we can therefore trust methods grounded in logic as another tool. Composable black boxes connected in logical ways, plus rigorous testing/analysis of the boxes and composition methods, are the main ways I advocate doing both programming and verification. You can keep applying those concepts over and over regardless of the tools or paradigms you’re using. Well, so far in what I’ve seen anyway…

                                                  @derek-jones, tag you’re it! Or, I figure you might have some input on this topic as a devout empiricist. :)

                                                  1. 2

                                                    Googled a list of philosophies. Let’s see. There’s the capitalism school that says the bugs are OK if profitable. The existentialists say it only matters if you think it does. The phenomenologists say it’s more about how you perceived the failure from the color of the screen to the smell of the fire in the datacenter. The emergentists say throw college grads at the problem until something comes out of it. The theologists might say God blessed their OS to be perfect with criticism not allowed. The skeptics are increasingly skeptical of the value of this comment. The… I wonder if it’s useful to look at it in light of philosophy at all given where this is going so far. ;)

                                                    Awesome, hilarious paragraph.

                                                    We have most of what we want out of a combo of intuition, trial-and-error, logic, and peer review. This is a combo of individuals’ irrational activities with rational activity on generating and review side of ideas. I say apply it all with empirical techniques used to basically just catch nonsense from errors, bias, deception, etc. The important thing for me is whether something is working for what problems at what effort. If it works, I don’t care at all whether there’s studies about it with enough statistical algorithms or jargon used in them.

                                                    Yes, totally agreed.

                                                  2. 2

                                                    What else can we do? Our reason is fallible, our experiences are deceitful, and we can’t just throw our hands up and say “we’ll never know”.

                                                    Why not? The only place these arguments are questioned or really matter is when it comes to making money, and software has ridiculous margins, so maybe it’s just fine not knowing. I know high-risk activities like writing airplane and spaceship code matter, but those folks seem to not have much contention about whether their methods work. It’s us folks writing irrelevant web services that get all uppity about these things.

                                                1. 13

                                                  Fails to deliver on the promise of an unthinkable thought.

                                                  1. 4

                                                    The author seems to be thinking that the Smalltalk programme had some merits that the current PLT programme doesn’t, which I find unthinkable. So it delivered as far as I’m concerned.

                                                    1. 3

                                                      Most tantalizing question was:

                                                      …when and why did we start calling programming languages “languages”?

                                                      1. 3

                                                        That seems neither unthinkable nor unanswerable. I mean, I don’t know the answer off hand, but there’s a finite number of papers one can read to find out.

                                                        1. 1

                                                          Yup. I wasn’t disagreeing with you.

                                                        2. 1

                                                          The same reason earlier formal languages like predicate logic are called languages.

                                                          1. 1

                                                            Uhh, citation required?

                                                            1. 2

                                                              Do you not think computer languages are formal languages? Do I need a citation if I say English is a natural language?

                                                              1. 1

                                                                Ah, that link/term is helpful. Thanks!

                                                                I’m sorry I’m annoying you.

                                                                Do I need a citation if I say English is a natural language?

                                                                No, but the whole point under discussion is why our terminology connects formal languages with natural languages. When did the term “formal language” come to be? The history section in your Wikipedia link above mentions what the term can be applied to, but not when the term was coined. Was it coined by Chomsky at the dawn of the field of mathematical linguistics? That’s not before computers, in which case the causality isn’t quite as clear and obvious as you make it sound.

                                                                I’ll stop responding now, assuming you don’t find this as interesting as I do.

                                                                Edit: wait, clicking out from your profile I learn that you are in fact a linguist! In which case I take it back, I’m curious to hear what you know about the history of Mathematical Linguistics.

                                                                1. 3

                                                                  Was it coined by Chomsky at the dawn the field of mathematical linguistics?

                                                                  It’s at least older than that. The term “formal language theory” in the sense of regular languages, context-free grammars etc. does date to Chomsky. But the idea that one might want to invent a kind of “formal” language for expressing propositions that’s more precise than natural languages is older. One important figure making that argument was Gottlob Frege, who was also an early user of the term (I’m not sure if he actually coined it). He wrote an 1879 book entitled Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens, which you could translate as something like, Concept-Script, a formal language modeled on that of arithmetic, for pure thought.

                                                                  1. 1

                                                                    Thanks a lot for that elaboration on the Frege link!

                                                                  2. 2

                                                                    In general they’re all languages because they have a syntax (certain combinations are ‘ungrammatical’ or produce interpreter/compiler errors) and a (combinatorial) semantics (the basic symbols have meaning and there are rules for deriving the meaning of [syntactic] combinations of symbols).

                                                                    Formal languages go back at least to Frege’s Begriffsschrift of 1879, which isn’t before Babbage described the Analytical Engine (1837) but is certainly before digital computers. And there are precursors like Boole’s logic; Leibniz also worked on something of the same sort, and there are yet earlier things like John Wilkins’ “philosophical language” and other notions of a similar kind.

                                                                    For modern linguistic work on semantics, the work of Richard Montague is perhaps the most important, and there are connections to computer science from very early on - Montague employs Church’s lambda calculus (from the 1930s) which also underlies Lisp.

                                                          2. 1

                                                            Nothing that extreme, no, but I’d say it delivers on questioning rarely-questioned assumptions.

                                                            1. 1

                                                              How would you know? Maybe you simply failed to think it.

                                                            1. 2

                                                              It’s been running for a couple of minutes now, but I guess it’s working.

                                                              Very interesting language!

                                                              Edit: Took ~15 minutes to get to a REPL?!

                                                              1. 2

                                                                Thanks for trying this out and reporting back! Which file in precompiled/ did you pass it?

                                                                If you just want its (FlpcForth) REPL, then ./flpc with no arguments should work. Nothing is defined in that case, of course. You can try to paste in some source from precompiled/*.

                                                                The sample sources in precompiled/ are all for parsing (some of) its own source and/or generating its own source. That’s why it takes so long (although even that shouldn’t…).

                                                                There’s no FlpcPython REPL yet (it can parse FlpcPython but doesn’t know what to do with the resulting AST). So right now, to write FlpcPython, I suggest writing to a file, say foo.flpc, and running

                                                                python compiler.py lib/stage{0,1{a,c,d},3{a,b}}.flpc foo.flpc > foo.f
                                                                ./flpc foo.f
                                                                

                                                                This uses the external Python compiler which will eventually be removed.

                                                                1. 1

                                                                  Thanks. I ran precompiled/flpc-all.f. Oh, I just noticed that you recommended flpc-gen.f above :/

                                                              1. 4

                                                                This is very clever.

However, this seems like treating the symptom rather than the cause: if the command and/or history of commands was important enough, shouldn’t a more rigorous approach towards provisioning be adopted? Or even command aliases?

                                                                Almost every time I’ve had to go looking through my (ba|z)sh history, it’s been indicative of a failure in my own processes, whether they be for remotely administering servers, or even my own personal machine.

                                                                1. 2

It depends on your usage patterns. If you’re doing the same workflows over and over again, more process can help. But a complex command you ran once six months ago is best captured in command history.

                                                                  Command history is the place where all automation should begin. In the spirit of YAGNI, don’t create a script until you run the commands manually three times.

                                                                  Command history is actually a good source of things to automate, if you periodically try to look for patterns.

                                                                1. 3

                                                                  Here’s what I’ve been doing for 15 years. In .bashrc:

# No shared history file, so concurrent windows can't clobber each other.
unset HISTFILE HISTFILESIZE
export HISTSIZE=1000000
# One private log per session: year/month-day-time-host-tty.
export HIST_LOGFILE="$HOME/.log/shell/$(date +"%Y")/$(date +"%m-%d-%H-%M-%S")-$(hostname)-$(tty |perl -pwe 's,/,_,g')"
mkdir -p $(dirname $HIST_LOGFILE)
export HISTTIMEFORMAT="%Y-%m-%d-%H-%M-%S "
# Dump this session's history to its own log on exit.
trap "history -w $HIST_LOGFILE" EXIT
                                                                  

                                                                  Sometime in the past I switched to Zsh and started saving complete timestamps for each command. In .zshrc:

setopt extended_history hist_find_no_dups  # timestamps/durations; skip dups when searching
unset HISTFILE                             # no shared history file; each session gets its own log
export SAVEHIST=1000000
export HISTSIZE=$SAVEHIST
export HIST_LOGFILE=$HOME/.log/shell/`date +"%Y"`/$(date +"%m-%d-%H-%M-%S")-$(hostname)-$(test $TMUX && echo "tmux-")$(tty |perl -pwe 's,/,_,g')
mkdir -p $(dirname $HIST_LOGFILE)
# Rewrite this session's log before every prompt.
save_history() {
  fc -l -n -t "%Y-%m-%d %H:%M:%S" -D -i 1 >! $HIST_LOGFILE.tmp && mv $HIST_LOGFILE.tmp $HIST_LOGFILE  # -d doesn't include seconds
}
precmd_functions+=(save_history)
# Make `history` list the whole session, not just recent entries.
history() {
  fc -l -n $* 1
}
                                                                  

                                                                  Basically I turn off the default location for history to keep windows from clobbering each other. Each session gets its own private history file. To search for commands I use grep. To combine history from servers I periodically rsync. Everything is always private to guard against the password issue.

                                                                  My priority is maintaining an audit trail of what I did in experiments. So YMMV. But if this works for you, it’s a lot fewer moving parts than OP.

                                                                  1. 2

I just do this with zsh; it handles merging multiple histories in parallel with a few options set. I’ve never really wanted to combine history between servers, though, just to have common history amongst shells. I prefer sharing the history, as I end up in one shell and want to use a command I just typed in another.

export HISTSIZE=5000
export SAVEHIST=${HISTSIZE}
export HISTFILE=~/.zsh_history
setopt append_history       # append to the history file rather than overwrite it
setopt extended_history     # record timestamp and duration for each command
setopt hist_reduce_blanks   # strip superfluous whitespace
setopt hist_no_store        # don't record invocations of `history` itself
setopt hist_ignore_dups     # skip a command that duplicates the previous one
setopt hist_ignore_space    # skip commands that start with a space
setopt share_history        # share history between running sessions
setopt inc_append_history   # write entries as they're issued, not at exit
                                                                    

I could probably sort out saving a history file per day, but I really can’t be arsed to bother; I don’t see much value in that.

                                                                  1. 2

                                                                    Huh, I always assumed the initial 0 was to indicate an octal base.

                                                                    1. 3

I’m still looking for a test harness that doesn’t need me to explicitly call each test/suite in main. My current approach is simple-minded code generation. Is there a way to do this that avoids autogenerating files and whatnot?

                                                                      1. 3

There’s a couple of ways I can imagine that would be possible. Currently, each top-level describe generates a function; I could have a global array of function pointers, and use the __COUNTER__ macro to automatically insert describe’s functions into that array. However, that would mean that the length of the array would have to be static. That probably wouldn’t be too bad, though, if it were configurable by defining a macro before including the library, defaulting to something like 1024.

                                                                        Another solution would be to not have these top-level describes, and instead have a macro called testsuite or something, which generates a main function. This would mean that, if your test suite is in multiple files, you’d have to be very careful what you have in those files, because they would be included from a function body, but it would be doable.

                                                                        I think the first approach would be the best. You could then also have a runtests() macro which loops from 0 through __COUNTER__ - 2 and runs all the tests.

                                                                        1. 1

                                                                          That’s a great idea. Thanks!

                                                                          1. 2

An update: the first solution will be much harder than I expected, because in C you can’t do things like foo[0] = bar outside of a function. That means the describe macro can’t assign the function pointer into the array. If you could append to a macro from within a macro, describe could append to a macro which, when invoked, just calls all the functions created by describe; but there doesn’t seem to be any way to append to a macro from within another macro (we can get close: using push_macro and pop_macro in _Pragma, it’s possible to append to a macro, just not from within another macro).

                                                                            It would still be possible to call the functions something deterministic (say test_##__COUNTER__), and then, in the main function, use dlopen on argv[0], and then loop from i=0 to i=__COUNTER__-2 and use dlsym to find the symbol named "_test_$i" and call it… but that’s not something I want to do in Snow, because that sounds a little too crazy :P
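For the record, here’s roughly what that would look like, as a hypothetical sketch assuming GCC or Clang (for __COUNTER__) on Linux. Note that on ELF the symbol has no leading underscore, and you have to link with -rdynamic (plus -ldl on older glibc) for dlsym to see the test functions; DESCRIBE and the file name are invented for illustration:

    /* A sketch of the dlopen/dlsym variant, purely for illustration.
       Build with: cc -rdynamic crazy.c -ldl */
    #include <dlfcn.h>
    #include <stdio.h>

    #define CONCAT_(a, b) a##b
    #define CONCAT(a, b) CONCAT_(a, b)
    /* Stand-in for the real describe macro: names each test test_<n>. */
    #define DESCRIBE void CONCAT(test_, __COUNTER__)(void)

    DESCRIBE { puts("running test 0"); }
    DESCRIBE { puts("running test 1"); }

    int main(int argc, char **argv) {
        (void)argc;
        void *self = dlopen(argv[0], RTLD_NOW); /* handle to our own executable */
        int ntests = __COUNTER__;               /* counts the DESCRIBE expansions above */
        for (int i = 0; i < ntests; i++) {
            char name[32];
            snprintf(name, sizeof name, "test_%d", i);
            void (*fn)(void) = (void (*)(void))dlsym(self, name);
            if (fn) fn();
        }
        dlclose(self);
        return 0;
    }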

                                                                            1. 1

                                                                              I appreciate the update. Yes, that would be too crazy for my taste as well. (As is your second idea above.)

                                                                              1. 1

FWIW, you can do this by placing the function pointer in a custom linker section with linker-inserted begin/end symbols. On GNU toolchains targeting ELF, the linker emits those begin/end symbols automatically for sections named like C identifiers; elsewhere, unfortunately, it requires your user to use a custom linker script, which will be annoying for them.
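A minimal sketch of that approach, assuming GCC or Clang targeting ELF, where GNU ld synthesizes __start_SECNAME/__stop_SECNAME for any section whose name is a valid C identifier (REGISTER_TEST and runtests are invented names, not part of Snow):

    #include <stdio.h>

    typedef void (*test_fn)(void);

    /* Place a pointer to each test in the "test_fns" section at compile time. */
    #define REGISTER_TEST(fn) \
        static const test_fn fn##_entry \
            __attribute__((used, section("test_fns"))) = fn

    /* Begin/end symbols the linker provides for the section. */
    extern const test_fn __start_test_fns[];
    extern const test_fn __stop_test_fns[];

    static void runtests(void) {
        for (const test_fn *t = __start_test_fns; t != __stop_test_fns; t++)
            (*t)();
    }

    static void test_addition(void) { printf("1 + 1 == %d\n", 1 + 1); }
    REGISTER_TEST(test_addition);

    int main(void) {
        runtests();
        return 0;
    }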

                                                                          1. 6

                                                                            This is a pretty nice justification for the whole conventional belief system that leads to “abstraction”, “coupling vs cohesion”, “SOLID”, etc. But that whole memeplex is dangerously misguided. It has a kernel of truth, but it’s incomplete, and it’s dangerous because the vast majority of programmers seem to get brainwashed by it until they’re blind to anything else.


                                                                            To see how it’s misleading, let’s focus on this sentence:

…a triangle is actually very highly interconnected for the number of nodes it has…

This is a telling sentence, because it highlights that this entire notion of ‘complexity’ is really about density of complexity. Which means that if you just took your triangle and started splitting the vertices into lots of nodes, so that it has lots of nodes along each edge of the triangle, the whole would seem a lot less complex. But you haven’t really made the whole any less complex; you’ve just spread the complexity out into a smooth thin layer. The inherent complexity of the system remains. It’s just harder to find: randomly selected nodes/modules seem simple, and you have to jump through a lot of hoops to find the ‘complex node’ that invariably gets the lion’s share of updates.

                                                                            The whole thing takes me back to my childhood years, when I would shove my stuff into my closet to try to convince my mom I’d cleaned up my room.

                                                                            Abstraction is useful, and it’s useful to think about the right places to draw module boundaries. But as a rule of thumb, mistrust anybody who tries to pontificate about the difference between simplicity and complexity merely by making reference to module/interface boundaries, without any attention to what the system does.


                                                                            So much for tearing down somebody else who in fairness wrote very elegantly indeed. Can I be constructive instead? I’ll suggest an alternative memeplex for thinking about complexity that gets a lot less attention than it should. Complexity comes from the state space of inputs a system needs to handle, and the number of regimes this state space gets broken down into. (Imagine something like the characteristics of a transistor.) The more regimes a state space has, the more intrinsically complex the domain is.

If we could demarcate this state space, and make the regimes in the state space explicit in our representation of the system – rather than implicit in lines of code as happens today – I think we’d make far larger strides in controlling complexity than any number of attempts at redrawing internal boundaries between sub-systems. In particular, we’d be able to detect over-engineering and architecture astronomy: they would be situations where the code has significantly higher complexity than the domain it’s trying to address.

                                                                            I think such a representation has to start at (no surprises for people here who’ve heard me ranting before) tests. Tests are great because they’re like a Fourier transform applied to the conventional representation of a program as code. Instead of focusing on what to do at each point in time for a program, looking at a program as a collection of tests tells you how the entire trajectory of processing needs to go for the major regimes in the input space. Tests allow you to get past what your current program does, and think about the essential things any program has to do to achieve the same ends.

                                                                            (I’ve written about this idea before. Hopefully this latest attempt is clearer.)

                                                                            1. 2

I’ll add that the state space is my favorite way of looking at complexity because it reflects the actual values/actions of your software. Different implementations will have different state spaces to assess for complexity. And different techniques for reducing complexity can demonstrate that they do so through state-space reductions: either actual reductions, or in what they let us ignore in an analysis for correctness.

                                                                              1. 2

                                                                                I like the idea of thinking in terms of a state space! I have a couple of questions:

                                                                                • Aren’t there really three spaces/sets involved: the set of inputs, the set of possible programs that handle such inputs, and the set of internal program states?
                                                                                • I noticed you say in the linked post that tests are the only way to delineate boundaries in the state of programs. What about formal verification methods?
                                                                                1. 1

                                                                                  Sorry I thought I’d responded to this. I’m glad to see more thinking about state spaces of different stripes.

                                                                                  I overstated my case. Formal methods may well be an alternative. I just don’t know much about them.

                                                                                2. 2

                                                                                  Can you define what you mean by “regime” here?

                                                                                  1. 1

It’s meaning 1b at Merriam-Webster:

                                                                                    a regular pattern of occurrence or action (as of seasonal rainfall)

                                                                                    I see now that the figure of transistor characteristics I linked to above refers to active and saturation “regions”. I seem to recall learning them as regimes in undergrad in India.

                                                                                    Basically it’s a subset of the input space for which the program behavior is the same. How you define “same” is flexible. One definition would be identical control flow paths through the program. Alternatively you can think of timing attacks on encryption schemes as exposing regimes in the underlying algorithm.
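To make that concrete, here’s a tiny invented C example (clamp and its tests are mine, just for illustration): clamp’s input space splits into three regimes, one per control-flow path, and one test per regime pins down the whole behavior.

    #include <assert.h>

    /* Each regime is a subset of the input space with identical control flow. */
    static int clamp(int x, int lo, int hi) {
        if (x < lo) return lo;   /* regime 1: x below the range  */
        if (x > hi) return hi;   /* regime 2: x above the range  */
        return x;                /* regime 3: x within the range */
    }

    int main(void) {
        /* One test per regime characterizes the whole input space. */
        assert(clamp(-5, 0, 10) == 0);
        assert(clamp(50, 0, 10) == 10);
        assert(clamp(7, 0, 10) == 7);
        return 0;
    }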

                                                                                  2. 1

                                                                                    Whether the triangle has three nodes, or many along each edge, it still has the same number of loops: 1. That’s perhaps a more important observation. A graph that’s easy to color is probably a graph with few loops.

                                                                                    1. 1

                                                                                      Loops might be part of it, but imagine a ‘necklace’ of loops. Each loop can be 2-coloured, and you can join the links at points with the same colour, so you can have as many loops as you like and still only need 2 colours.

                                                                                    1. 3

                                                                                      Also in response to the article, two different people wrote color-identifiers-mode and rainbow-identifiers-mode for emacs.

                                                                                    1. 2

                                                                                      Has anybody else seen issues with Duplicity running out of memory? I’ve been backing up ~100GB without issues on a machine with 8GB RAM.

                                                                                      1. 2

Just a note – your scenario is nearly two orders of magnitude off of what I was observing (100GB vs 600GB of data, and 8GB vs 1GB of RAM), and I didn’t see the error at first (did a full backup, and several incremental additions before it ran into trouble). If it’s linear (for software that’s been around so long, I would be surprised if it were worse than that), then for you to run into the same problem you would need an archive that is about 4.8TB.

                                                                                        1. 1

                                                                                          Thanks!

                                                                                          Do you happen to remember what block size you used? The default of 25MB is way too small, and I believe signatures for all blocks are required to reside in memory.

                                                                                          1. 2

I just looked at the config file I had (for the duply frontend), and there wasn’t anything for block size, so presumably it was set to the default. That could explain it, though I think there is a little more going on, unless signatures for all individual files are also in memory: if it were just one signature per 25MB block, 600GB is only about 25,000 blocks, and each signature would have to take up about 36K of memory to account for the 900+MB that I was seeing.

                                                                                            1. 1

                                                                                              I just downloaded the duply script from http://duply.net, and it does look like the default is 25MB. One correction to my comment: the terminology is “volume” (--volsize) not “block”.

                                                                                              You’re right that it would have to be saving more metadata per volume for my hypothesis to bear out.

                                                                                          2. 1

                                                                                            Can you elaborate on what operations were running out of memory?

                                                                                            1. 2

The actual operation that ran out of memory was unpacking the signatures file from a previous backup (it retried, I think, 4 times, each time running for a while and gradually using up all available memory before giving up). I had made one full backup and several incremental backups, had just added a bunch of new files, and was trying to make another incremental backup.

                                                                                              1. 1

                                                                                                Was it on a new machine or anything like that? I’m wondering if I should retry a backup after blowing away my signature cache.

                                                                                                Thanks a lot for answering these questions! A potential issue with my backups is extremely worrying.

                                                                                                1. 2

                                                                                                  Nope, on the same machine, but it had been wiped and reinstalled at least once (so it’s possible that library versions had changed, and perhaps the memory efficiency of some of them got slightly worse). It’s pretty confusing, because previous incremental backups had worked. The upside with duplicity is that in a catastrophe, you pretty much don’t even need duplicity itself to restore (tar and gpg and a bunch of tedium should do it). :)

                                                                                        1. 20

The GDPR will probably cause a lot of headaches in the business, but I’m sure it’ll help EU startups flourish compared to US competition from the outside, especially since it’s easier to start with compliance than to retrofit it.

                                                                                          1. 2

That’s a useful reframing: viewing it as being about protectionism rather than privacy. (And I don’t mean that negatively. I’m influenced by Andy Grove that protectionism is sometimes necessary.)

                                                                                            1. 2

                                                                                              Especially in the current world with national branch companies and international tax evasion schemes, I often feel the world could do with a little more protectionism.

                                                                                          1. 2

My archives show long stretches of using Firefox’s Scrapbook Autosave, which would save every page I visited. But there are also multiple interruptions caused by the extension breaking, or, the last time, by Firefox breaking it.

                                                                                            1. 1

                                                                                              Thanks for mentioning this Firefox add-on.

                                                                                            1. 2

                                                                                              I can’t decide if a press release is spam or just offtopic.

                                                                                              1. 1

                                                                                                I can’t decide if it’s a press release for Etsy or Google Cloud.

                                                                                                1. 0

                                                                                                  Yes.