1. 5

    I don’t have much experience in this area, but I’d be interested in an overview of how different pieces of software handle the concurrent / multiplayer editing problem, like:

    So is the gist of it that OT relies on central servers and they all use OT rather than CRDT? That was not entirely clear to me.

    Looking at the xi analysis, it sounds like “IME” is specific to a desktop application using X (?), so it doesn’t apply to any of this web software.

    The rest of them seem like they do apply?

    Is the problem that you have to pay a “CRDT tax” for every piece of state in the application? I thought the same was true of OT. Don’t you have to express every piece of state within those constraints too?

    repl.it doesn’t seem to have a problem with syntax highlighting or brace matching (try it, it’s pretty slick). So is it just that they paid that tax with a lot of code, or is there something else fundamentally different about xi vs. repl.it? Or maybe xi is going for a lot lower latency than web-based editors?

    recent thread about OT vs. CRDT that might be interesting: https://news.ycombinator.com/item?id=18191867

    (copy of HN comment)

    1. 6

      I responded on HN also, but the short answer is that you care about Input Method Editing in some form any time you’re not dealing with English.

      With brace matching, there’s no real reason for it to be slow, especially if you limit yourself to small documents. It’s much simpler to do it synchronously rather than deal with the potential for edits to conflict.

      1. 3

        OK, thanks for the clarification!

        On the second point, you’re saying that you can “cheat” for brace matching and do it synchronously instead of asynchronously? I guess you can cheat for any feature, but it increases latency and hurts user experience?

        1. 7

          I’m not sure I would consider it cheating; in fact, this is a complex tradeoff.

          I would summarize the lesson learned as: asynchronous might be worth it for things that are actually slow. Otherwise, probably not. Brace matching is not an inherently slow task, though doing it precisely depends on having an accurate parser for at least a subset of the language. It’s also unbounded, which was my motivation for making it async. If you edit a 100MB file with brackets around the contents and need to find the match, that might take a while.
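
          Roughly, the scan is just a depth counter. A toy sketch in Python (ignoring strings and comments, which a precise matcher would have to handle):

              def find_matching_brace(text, open_index):
                  """Return the index of the } matching the { at open_index, or None."""
                  depth = 0
                  for i in range(open_index, len(text)):
                      if text[i] == "{":
                          depth += 1
                      elif text[i] == "}":
                          depth -= 1
                          if depth == 0:
                              return i
                  return None  # unbalanced

          Almost always cheap, but the cost is proportional to the distance to the match, hence unbounded in the worst case.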

          Syntax highlighting is somewhere in the middle. I think regex-based highlighting is somewhere in the ballpark of a couple thousand lines per second. This means, on a largish document, if you synchronously wait for re-highlighting, you will be facing noticeable lag. I consider that a problem that needs to be solved. But regex-based is almost certainly not the global best; I think parser combinator based approaches are promising and there’s an experiment on that in the xi-editor tree. But the advantage of regex-based is that there’s a huge library of syntax definitions for hundreds of different languages.
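
          To put rough numbers on that: at a couple thousand lines per second, a 20,000-line file is on the order of ten seconds to re-highlight from scratch, against a frame budget of roughly 16ms at 60fps, so waiting synchronously is not an option.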

          Now, something that is actually slow is semantic analysis, like the various Language Server implementations. Those can and should be made to go as fast as possible (the Roslyn work is excellent), but at the end of the day you can’t guarantee that they can keep up with typing speed. So it is an absolute requirement for a code editor that it be able to handle language server annotations asynchronously, ideally without artifacts like showing the annotation on the wrong span of text when it is being concurrently edited.
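
          A toy sketch of the bookkeeping that avoids that artifact (made-up names, not xi-editor’s actual code): when an edit lands before a language server reply has been displayed, shift or invalidate the annotated span:

              def transform_span(span, edit):
                  """span = (start, end) offsets; edit = (pos, deleted_len, inserted_len)."""
                  start, end = span
                  pos, deleted, inserted = edit
                  if pos + deleted <= start:       # edit entirely before the span: shift it
                      delta = inserted - deleted
                      return (start + delta, end + delta)
                  if pos >= end:                   # edit entirely after the span: untouched
                      return (start, end)
                  return None                      # edit overlaps the span: drop it and re-request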

          Hope that helps.

          1. 5

            Yes thanks, that framing helps a lot! I can see that an editor performs many computations on your constantly updated buffer, and some of them are a lot more expensive than others.

            Is it correct to say that “synchronous” in this context means: do the whole computation on a single editor buffer state, disregarding any updates you received while the computation is being done? Only after the computation finishes can you update the screen with the text changes, which leads to user-perceived lag. And then you potentially need to redo the whole computation synchronously all over again?

            So in the brace match case, once you type {, you can’t get any updates until you’ve potentially parsed the whole file and found the closing }?

            On the other hand, asynchronous means that if you get an update while doing the computation, you can somehow take it into account? I am not clear on how that happens. I guess that is why you have to rewrite every single computation to work on the CRDT, because you don’t want to redo the entire thing from scratch if you receive an update in the middle, and you don’t want to cause lag?


            Though I’m still confused by the relationship between these two issues:

            1. Asynchronous vs. synchronous (if I have the definitions above right)

            2. Whole-file batch computation vs. incremental recomputation (of, say, syntax highlighting)

            I guess the question is: can’t you synchronously wait for an incremental computation rather than a whole-file computation?

            For example, I agree regex-based highlighting isn’t ideal. Have you seen Treesitter? It’s based on GLR parsing and does incremental syntax highlighting. GitHub is using it in the Atom editor and for a few server-side things.

            This video is very well motivated:

            “Tree-sitter - a new parsing system for programming tools” by Max Brunsfeld

            One of the complaints, which I agree with, is that lexical syntax highlighting is not all that useful. For example, it would be nice if types and variables were consistently highlighted, but that requires parsing, not just lexing. He gives some good examples.

            So even though Treesitter uses a more powerful model (GLR, along with some “lexer hacks”), it can still do things incrementally – i.e. not re-highlighting the entire file when you insert a few characters. I don’t recall exactly how the algorithm works at the moment, but it’s well explained in the video. He gives a citation to someone’s Ph.D. thesis.

            Some more comments on treesitter [1].

            Basically, doesn’t having an incremental algorithm for syntax highlighting “offload” some of the work so you don’t have to do it on the CRDT to be fast? Or am I misunderstanding?

            It’s not exactly clear to me how you would do incremental brace matching, but it seems like Treesitter solves a “superset” of that problem.


            [1] I’m pretty impressed with Treesitter because it sounds like they have worked through many of the problems you hit in “real” languages. A lot of grammar-based approaches work in theory but not in practice; it sounds like they did a lot of work to handle real-world situations.

            I have written a bunch on my blog about the limitations of grammar-based approaches, many of them linked from How to Parse Shell Like a Programming Language

            [2] Other speculation: In the limit, I wonder if you could use something like the Skip language to, say, write a semantic analyzer?

            http://skiplang.com/

            It’s basically a programming language that enforces that every algorithm you write can be made incremental. As I understand it, while most of these kinds of things are limited functional languages, Skip goes to some lengths to give you an imperative style of programming, so you can express more things naturally. I haven’t used it but I peeked at the source code.

            1. 7

              Lemme try to answer; there isn’t really one question in there.

              Yes, of course we’ve seen Treesitter. It’s interesting work for sure. There are other interesting incremental computing frameworks too, like Salsa from Rust-land.

              Incremental and asynchronous are two complementary approaches to the same problem of reducing latency. The syntax highlighting in xi-editor (see rope science 11) is both. To say it briefly, an incremental approach reduces the amount of work by avoiding recomputation of already known quantities, while async by itself doesn’t reduce work, but lets you get back to typing or scrolling or whatever before the work is actually done.
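
              As a toy illustration of the incremental half (made-up names, not the real engine described in rope science 11): cache per-line tokens plus the lexer state at the end of each line, and after an edit only re-run lines until the carried-over state matches what was cached:

                  INITIAL_STATE = "default"  # e.g. "default" vs. "in_block_comment" for a toy lexer

                  def rehighlight(lines, cache, first_dirty_line, highlight_line):
                      """cache[i] = (tokens, end_state); highlight_line(text, entry_state) -> (tokens, end_state).
                      Assumes the edit changed line contents but not the line count; a real
                      engine also splices the cache when lines are inserted or deleted."""
                      state = cache[first_dirty_line - 1][1] if first_dirty_line > 0 else INITIAL_STATE
                      for i in range(first_dirty_line, len(lines)):
                          old_end_state = cache[i][1]
                          tokens, end_state = highlight_line(lines[i], state)
                          cache[i] = (tokens, end_state)
                          if end_state == old_end_state:
                              break  # downstream lines see the same entry state, so their cached results still hold
                          state = end_state
                      return cache

              Running that loop off the main thread is the async half; the two combine so typing never waits on the tail of the loop.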

              There’s a third approach, which is making the work so fast you don’t care. That’s what I’ve done for markdown parsing, and there’s no reason you couldn’t do a similar approach for syntax highlighting, autoindent, and all those other code-related tasks.

      2. 4

        So is the gist of it that OT relies on central servers and they all use OT rather than CRDT? That was not entirely clear to me.

        You can do OT p2p. In another lifetime I used to work on Groove, which was p2p and based on OT, but collaborative document editing was not its main focus. The article actually makes the less ambitious claim:

        To do OT well, you need to elect a centralized server, …

        [emphasis added]

        1. 1

          That’s correct, and there’s a much more nuanced discussion of exactly that point in the follow-up discussion in the GitHub issue.

        2. 3

          From what I’ve seen, Etherpad (or at least, etherpad-lite) has a central server and it just takes changes from each person and diffs them together. I think occasionally it decides there’s an authoritative revision and future revisions go based off that. It’s not actually terribly complicated. Of course, it also doesn’t scale that well and needs a central server, but that makes life a lot simpler than dealing with P2P stuff.

        1. 11

          No mention of the xpi direct link that was posted everywhere? That worked on e.g. Firefox for Android even when studies didn’t? Is there a reason we’re still not talking about this method? Why is it only studies or dot release upgrade?

          1. 3

            I agree that a mention of it would have been nice, but when working at Mozilla’s scale I think it makes sense to focus only on solutions that can be deployed automatically. I doubt even 1% of users clicked on that direct link, or read any of the Mozilla communications about the bug.

            1. 10

              The initial communications included “if you’ve turned studies off, turn them back on and wait six hours”, which would have been a great place to say “or click this link to install the intermediate cert xpi now”. There’s a peculiar fight-club-style silence bubble around it.

              1. 2

                I assume they don’t want to teach people to install XPIs from websites other than AMO?

                1. 1

                  Although I did read that that xpi was signed by AMO; otherwise Firefox would have rejected it.

                2. 1

                  That, and how it happened in general, as you said on HN. This is really weird for a web company with a recent focus on privacy that has hundreds of millions of dollars depending on screw-ups like this not happening. I previously speculated that their management barely invests in security vs. what they could be doing. They don’t care enough. Adding this to the list.

              2. 1

                I read that that solution wouldn’t work on vanilla Firefox, just on the unstable nightly builds or something, so I didn’t even bother trying it. Did it work on vanilla Firefox for you?

                1. 1

                  This is different from the about:config setting change to disable checking. This is the xpi that the study installed with the updated certs. (Yes, it worked fine.)

              1. 4

                One challenge is having good detection for when your model is starting to fail. Incidentally, if it’s not too onerous to be relabeling data regularly, you can train a random-walk model that tries to learn this decay rate jointly with your model parameters, which now become a function over time. https://twiecki.io/blog/2017/03/14/random-walk-deep-net/
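
                A minimal sketch of that idea, assuming PyMC3 (variable names are made up; X, y, and period are assumed to be your features, 0/1 labels, and an integer time-bucket index). Here only an intercept drifts, whereas the linked post puts the random walk on the weights themselves:

                    import pymc3 as pm

                    n_feat, n_periods = X.shape[1], period.max() + 1

                    with pm.Model():
                        w = pm.Normal("w", 0.0, 1.0, shape=n_feat)        # static weights
                        step = pm.HalfNormal("step", 0.1)                 # how quickly things drift
                        drift = pm.GaussianRandomWalk("drift", sigma=step, shape=n_periods)
                        logits = pm.math.dot(X, w) + drift[period]
                        pm.Bernoulli("obs", logit_p=logits, observed=y)
                        trace = pm.sample(1000, tune=1000)

                The posterior over drift and step is the learned decay.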

                1. 1

                  That’s a solid idea. I’ll take a read of that soon :)

                  1. 1

                    Is it a challenge, though? If you can’t already measure the downstream success of your model (and therefore easily measure a regression in that metric) why would you bother?

                    For example, at my job we used ML to predict when a person would be likely to click a button, and we preload the data. This makes the performance of the UI better but doesn’t cause too much of a resource regression, because we’re not generating and throwing away the UI too much. A way of detecting that the model was regressing would be to monitor wasted resources (caused by an increase in false positive predictions) or a regression in UI loading time (caused by an increase in false negative predictions).
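
                    A sketch of what that monitoring could look like (names and thresholds are made up): track the rolling false-positive rate of preload predictions and flag when it drifts above the baseline measured at launch:

                        from collections import deque

                        class PreloadDriftMonitor:
                            def __init__(self, window=10000, baseline_fp_rate=0.15, tolerance=0.05):
                                self.outcomes = deque(maxlen=window)  # 1 = wasted preload (false positive)
                                self.baseline = baseline_fp_rate      # measured when the model shipped
                                self.tolerance = tolerance

                            def record(self, predicted_click, actually_clicked):
                                if predicted_click:
                                    self.outcomes.append(0 if actually_clicked else 1)

                            def drifting(self):
                                if len(self.outcomes) < self.outcomes.maxlen:
                                    return False  # not enough data yet
                                fp_rate = sum(self.outcomes) / len(self.outcomes)
                                return fp_rate > self.baseline + self.tolerance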

                    1. 1

                      That’s good for detection, but estimation of the decay would help in prediction.

                      1. 1

                        Makes sense!

                      2. 1

                        I’d argue detecting false positives in an automated way is equivalent to the original prediction task. Likely that detection is not automated, or is at least somewhat expensive to perform at the same frequency as the prediction task you are monitoring.

                        1. 1

                          I’m not sure what you mean, can you elaborate? Detecting a false positive in my case is as simple as knowing a positive prediction did not lead to a button click. Tracking the aggregate number of these and detecting a deviation is trivial.

                          1. 1

                            Sure. I was thinking about settings like spam classification and, say, recommending someone for a loan: settings where having a bad classifier can’t be measured immediately at the decision point but only by spikes in certain actions later (spikes in emails being deleted unread / rate of loan default).

                    1. 1

                      Excellent write-up. Trimming allocations in the middle of the heap would benefit certain applications and hurt others, though. By fragmenting the heap, you’re possibly making it less likely that you can get contiguous memory blocks in the future, and it only benefits applications where you’re (a) running lots of long-lived processes that have highly variable working sets and (b) actually memory constrained as a result. If you don’t have both of these conditions, the existing behavior, where you asymptotically approach allocating slightly more than your peak working set, will probably perform better.

                      1. 7

                        Did we really call the accessibility tag a11y?

                        1. 16

                          FWIW, a11y is a fairly industry-standard term for accessibility, since it both serves as a short moniker (a la i18n for internationalization), and evokes being an “ally” for issues relating to accessibility.

                          1. 3

                            Accessibility is often abbreviated as the numeronym a11y, where the number 11 refers to the number of letters omitted. This parallels the abbreviations of internationalization and localization as i18n and l10n respectively.

                            https://en.wikipedia.org/wiki/Computer_accessibility
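
                            The rule is mechanical enough to fit in one line (just an illustration):

                                def numeronym(word):
                                    # first letter + count of omitted letters + last letter
                                    return word if len(word) <= 3 else f"{word[0]}{len(word) - 2}{word[-1]}"

                                assert numeronym("accessibility") == "a11y"
                                assert numeronym("internationalization") == "i18n"
                                assert numeronym("localization") == "l10n"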

                            1. 3

                              The number of times I have to type I18n in our app makes me glad this happened.

                              1. 2

                                Programmers really are the worst. :p

                                1. 4

                                  Alternatively, it is nice to think of it as being an “ally” to people who use the web differently than we do. It’s cute, easy to type, and overall I don’t think it’s that hard to understand. I do see the concern that to people outside the industry it may not make sense; I just feel like on a highly technical site like lobsters it’s not as big of a concern. The tag description does say “accessibility” for what it’s worth.

                              2. 2

                                You know, I never questioned whether that was appropriate until now…

                                1. 4

                                  Wait, I’m confused (or just ignorant) here - why wouldn’t it be? Is it different from shortening internationalization to i18n?

                                  1. 5

                                    It’s the same idea, but it’s kind of not very accessible in that it excludes people because it needs explanation.

                                    1. 4

                                      Every field has jargon, and the cost of not having jargon is being unable to talk about that field’s ideas at all. I support Ed Kmett’s approach: use only jargon you understand and always be prepared to explain things to onboard newbies.

                                      1. 2

                                        Yeah, I mean, I realize that this term is fairly well-known. It’s a field that’s about people’s lives, so it does have a higher standard to meet than, say, type theory. I suspect that in practice, this abbreviation isn’t a problem. I was just a bit surprised at myself that I never questioned it before. I still don’t even know how it sounds in a screen reader, since I don’t have one set up to test with.

                                        1. 3

                                          I still don’t even know how it sounds in a screen reader, since I don’t have one set up to test with.

                                          Just tested with Windows Narrator, it pronounces it “ay-eleven-why”.

                              1. 2

                                Thanks for the excellent write-up. In case anyone else needs it, GitLab offers a free, easy-to-use Docker container registry that I have found convenient for a similar application.

                                1. 1

                                  How did you make use of this?

                                  1. 1

                                    I built alpine-based docker images for a few websites, then pushed to the authenticated Gitlab docker registry. On the target host I had simple automation to docker pull the images & start the containers. Very much like the main article.

                                    1. 1

                                      Ah neat. Do you know about Docker Machine? I’m curious how people who are pushing to multiple servers handle that and if there are reasons to prefer one approach over the other. I’m still just playing with this at home on one server.

                                1. 2

                                  Is there a way to push Docker images directly to a remote host thus avoiding an external registry?

                                  1. 5

                                    docker save myimage > myimage.tar and then scp or whatever

                                    1. 3

                                      You can create a local Dockerfile and then build and use that Dockerfile’s image without a registry at all. The built image gets added to your local machine’s image pool. See https://docs.docker.com/compose/reference/build/

                                      This tripped me up at first when using docker-compose as well (unnecessarily publishing images to Docker Hub).

                                      1. 1

                                        There might be. As I was writing up the section on the registry, I remembered that you can point the docker tools at remote machines, so maybe it’s possible to push from one to the other.

                                      1. 2

                                        The biggest lessons I’ve learned in all my years of software development:

                                        1. Test the shit out of everything. Experimentation-heavy development is the only way to develop good software. Invest heavily in experimentation tooling. ALWAYS EXPERIMENT and when you think you shouldn’t you’re almost certainly wrong.
                                        2. Experimentation will sometimes drive you to a local maximum† and you need to just put the experiments away and blow everything up.

                                        † For example, because:

                                        1. you didn’t control for good performance and let performance die from 1000 engagement-improving cuts or
                                        2. the industry leapfrogged you in some way that you can’t incrementally get to fast enough, etc
                                        1. 9

                                          If you’d asked me five minutes ago what network filesystem would be the best way to share files between Windows and Linux, 9P would not have been in my top five.

                                          Can other 9P clients access WSL environments this way? Can Windows Explorer connect to other 9P hosts?

                                          1. 2

                                            I’m curious to hear your top 5 (or whatever).

                                            1. 1
                                              1. SMB (obviously)
                                              2. WebDAV
                                              3. FTP
                                              4. regular HTTP
                                              5. SFTP

                                              SFTP goes last because unlike the others, I don’t think Windows comes with a client by default… although in this crazy WSL era, who knows?

                                              1. 5
                                                C:\Users\Calvin>sftp
                                                usage: sftp [-46aCfpqrv] [-B buffer_size] [-b batchfile] [-c cipher]
                                                          [-D sftp_server_path] [-F ssh_config] [-i identity_file] [-l limit]
                                                          [-o ssh_option] [-P port] [-R num_requests] [-S program]
                                                          [-s subsystem | sftp_server] destination
                                                
                                                 Directory of C:\Windows\system32\openssh
                                                
                                                2018-09-15  05:09 AM    <DIR>          .
                                                2018-09-15  05:09 AM    <DIR>          ..
                                                2018-09-15  05:09 AM           322,560 scp.exe
                                                2018-09-15  05:09 AM           390,144 sftp.exe
                                                2018-09-15  05:09 AM           491,520 ssh-add.exe
                                                2018-09-15  05:09 AM           384,512 ssh-agent.exe
                                                2018-09-15  05:09 AM           637,952 ssh-keygen.exe
                                                2018-09-15  05:09 AM           530,432 ssh-keyscan.exe
                                                2018-09-15  05:09 AM           882,688 ssh.exe
                                                
                                            2. 2

                                              I thought you were joking, until I read the link for myself and did a spit-take.

                                              Based on the wording in the article and the linked article about socket use, I suspect that unmodified 9p clients won’t be able to directly use this. They might have just added some 9p extensions (like Erlang on Xen) to support WSL, or they might have more drastic (but “legal 9p”) modifications like Styx on a Brick.

                                              Still though, this made my day!

                                            1. 17

                                              if only there was a browser that was committed to the open web

                                              1. 5

                                                I use Firefox on GNU and Android, which puts me in a tiny minority. See my comment at top level for why I brought up this post.

                                                1. 10

                                                  My frustration with Firefox is that their support of DRM/EME is a serious violation of the idea of an open web

                                                  1. 16

                                                    I agree that the EME is a horrible thing for the open web. However, I think a strong Firefox is one of the most important things for the open web, and people would’ve been switching even faster from Firefox to Chrome if Chrome was the only way to play Netflix/HBO/whatever.

                                                    At least they implemented it in the best way possible; AFAIK, Firefox ships with no closed-source EME blobs, just the infrastructure, and the blobs aren’t downloaded until the user requests them.

                                                    1. 7

                                                      I agree, but if this is our biggest frustration with Firefox, it is in good shape.

                                                      1. 2

                                                        there’s also the SSL pushing, and deferring to Google to say which sites are dangerous

                                                      2. 4

                                                        You have to pick your battles; DRM wasn’t one that could be won right away. Firefox held out for a long time after all the other browsers added DRM. It did not cause DRM to be stopped; it just caused everyone to leave Firefox to watch videos.

                                                        If Firefox keeps its users, it can use its power to win other battles; right now they are adding tracker blocking and stopping autoplay. If they had stuck to having no DRM, they probably wouldn’t even exist anymore.

                                                    2. 2

                                                      there’s not.

                                                    1. 15

                                                      “The reason for this high percentage is because Windows has been written mostly in C and C++, “

                                                      That’s part of the reason. The other part is that Microsoft is intentionally not spending enough money applying the tools for finding bugs or proving their absence in C/C++ code. Tools that Microsoft can afford to not only license: they could acquire the entire company, apply it to their code base, do all the fixes with the false positives identified, and still have billions of dollars. They might have even cornered the market on C/C++ analyzers by buying all of them at some point. They just don’t care.

                                                      Corporate apathy from the product divisions of Microsoft is the major cause here. MS Research should be highlighted for steadily inventing awesome, more-practical-than-usual tools for these things. One knocked out most of their driver issues (aka blue screens). The product division doesn’t adopt more, even the automated stuff. Even Midori, with all its advances, was only fielded in one product that I know of.

                                                      They also use C/C++. The empirical evidence tells us that teams with little focus on or time to address QA who use those languages will produce more vulnerable code than teams using memory-safe languages. They should probably get off C/C++ slowly over time if they are minimizing QA and not throwing all kinds of verification tooling at what their teams produce. They need a safe-by-default language with the ability to selectively tune modules so we at least know where the dragons will be. The rest might be fine… by default.

                                                      That’s why dodging C/C++ is the best fit for these kinds of companies and projects. Which is most of them, as far as I can tell.

                                                      1. 27

                                                        That’s part of the reason.

                                                        It’s actually the entire reason.

                                                        The other part is that Microsoft is intentionally not spending enough money applying the tools for finding bugs or proving their absence in C/C++ code. Tools that Microsoft can afford to not only license: they could acquire the entire company, apply it to their code base, do all the fixes w/ false positives identified, and still have billions of dollars. They might have even cornered the market on C/C++ analyzers buying all of them at some point. They just don’t care.

                                                        This is an extremely misinformed and baseless opinion. I used to work on MS Office and I can’t imagine a company spending more time and effort on securing C/C++ codebases with amazing static analysis tooling, insanely smart people working on teams entirely dedicated to security, the SDL, etc. It’s a herculean effort.

                                                        1. 8

                                                          Microsoft spent a long time not doing security. This let them acquire a legacy codebase which was full of problems, especially design problems. They also intentionally used obfuscated formats to block competition. While achieving that, it made long-term maintainability and security worse. While their security features were basic, smaller companies were designing security kernels into their OS’s, building them in safer languages such as Pascal, Ada, and Gypsy. Some of this work predates the existence of Microsoft. Some of them tried to sell to Microsoft, who aren’t interested in those or other such products to this day. They build their own stuff with less assurance than projects like this. See the Layered Design and Assurance sections, which deserve the phrase “herculean effort.”

                                                          Note: That project, using a basic analysis for timing channels, found side channels in caches using relatively low-cost labor. I heard Microsoft is offering six digits for anyone that finds those with critical impact. Maybe they should’ve just applied public methods that worked. Data61 did, finding lots of new components to hit.

                                                          So, Microsoft ignored high-assurance security despite being one of few companies that could afford to apply it, at least on new code. They got hit by every preventable defect you could think of. Eventually, they caved in to pressure to save their image by hiring Steve Lipner from VAX VMM. Microsoft’s priority was quickly shipping massive piles of features not designed with security in mind, connected to legacy code with the same problem. So, Lipner’s philosophy there was to dramatically water down the TCSEC into the SDL to add some review, testing, and other steps to each phase of their lifecycle. This made huge improvements to the security of their codebase but fell totally short of what Microsoft was capable of doing with the money on hand.

                                                          Today, we’ve seen both small companies and FOSS projects delivering useful functionality with great security. They often had access to fewer tools than Microsoft. The trick was they used careful design, minimalism in implementation style, and whatever tools they had available. Unlike Microsoft’s code, theirs did way better during pentesting. We also have some suppliers formally verifying their OS’s or core protocols. MS Research has world-class people in that area whom Microsoft’s managers largely don’t use. I remember Hyper-V using VCC, with the company using some of their static analysis tools. They applied Driver Verifier, which eliminated most blue screens. That’s about it. There’s also tools like RV-Match, Trust-in-Soft, and Astree Analyzer that can prove the absence of the kind of errors hitting Microsoft’s codebase. As far as I know, they’re not using them. There’s also about five types of automated test generation that knock out piles of bugs. Their developers didn’t mention Microsoft using them, even though MS Research could’ve clean-slated them in better form in short order if management requested.

                                                          Just piles and piles of methods out there, mostly from 1970’s-1990’s, that do better than what Microsoft is doing now. Altran/Praxis said their Correct-by-Construction methodology cost about 50% extra to do. Cleanroom costs anywhere from negative (lifecycle savings) to that. You’re telling me Microsoft is throwing herculean effort at their codebase but the results consistently have more defects than smaller players’ builds? I doubt that. I think they’re likely throwing herculean effort at inferior design techniques, implementation methods (including C/C++), and inadequate tooling. This is great for checklists, even producing a lot of results. For actual security, their money spent to vulnerabilities found ratio is way worse than what those smaller firms were doing on projects with a handful of developers. A firm with billions a year in profit employing people with similar skill should do better. If they cared and listened to their best experts in security and verification. ;)

                                                          1. 6

                                                            They applied driver verifier, which eliminated most blue screens. That’s about it. There’s also tools like RV-Match, Trust-in-Soft, and Astree Analyzer that can prove absence of the kind of errors hitting Microsoft’s codebase. Far as I know, they’re not using them.

                                                            I don’t know why you’re persisting here but you have absolutely no information. Microsoft has an internal version of SAL/PREfast that does static analysis beyond the compiler and it’s amazing, and they’ve been using it for far longer than most of the industry has cared about static analysis. It can find things like unsafe uses of raw buffers, unsafe array access, return values that go unbound, etc.

                                                            1. 1

                                                              Microsoft has an internal version of SAL/PREfast that does static analysis beyond the compiler and it’s amazing

                                                              I mentioned their use of internal tooling for static analysis in my comment, on top of a few others (SLAM and VCC). Apparently I did have information, since you just corroborated it. Since you’re continuing with unsubstantiated claims of rigour, let’s just go with what others were doing, comparing it to Microsoft. Just tell me which of these methods have been in use in your groups. I’d like the names of the tools, too, so any corrections I get tell me about tools that I can pass on to other developers. I’m leaving off review since I know Lipner had that in every part of the SDL.

                                                              1. Formal specifications. Z, VDM, ASMs… companies like IBM started specifying requirements and/or high-level design in a way where they could capture properties about the data and interfaces. Those tools caught all kinds of errors. VDM and ASM at different points had code generators. Many companies used SPIN for analyzing concurrency in apps and protocols with similar successes. TLA+ is taking over that niche now. Eiffel and Praxis were using Design by Contract for their stuff. Were the requirements and/or design of Microsoft’s components formally specified to prevent errors due to ambiguities, incorrect bounds, and/or interface mistakes?

                                                              2. Static analyzers. The standard practice at places with funding is to use more than one, since each tends to catch errors others miss. They’ll use at least one badass commercial tool they have to pay for, combined with a mix and match of open-source tools with a proven track record. NASA’s teams, with less funding than Microsoft, were using about four. Some folks mix and match the commercial ones with at least one that finds tons of things with false positives, combined with one of the “no-false-positive,” sound tools. Which three or four other static analyzers was Microsoft using in addition to their internal tool?

                                                              3. Dynamic analysis. Did they use any of those tools to supplement the static analysis and test generation through program analysis? Teams loading up on tooling will typically try at least one. It’s usually Daikon.

                                                              4. Testing. Path-based, symbolic, combinatorial, spec/model-contract-based, concolic… lots of phrases and methods in Lobsters submissions for automatically generating tests from software that hit high coverage throughout it. If I was Microsoft, I’d use one of each just alternating them during overnight runs. The rest of the time before the morning would be fuzzing. Which automated, test generators were they using while you were at Microsoft? Did they at least try KLEE on production code? And how much fuzzing did they do?

                                                              5. Concurrency. I mentioned model-checking in SPIN. However, Eiffel had a model called SCOOP that makes concurrent code about as easy to write as sequential code. There’s been a few that automatically put in locks after annotations. IBM had a tool for their Java apps that would automatically find lots of concurrency errors. Was Microsoft following high-assurance practice in using something like Ada Ravenscar or SCOOP to be immune to some of these heisenbugs? Followed by an analysis tool and/or model-checking to catch the dead/live-locks that might slip through?

                                                              6. Aside from OS’s, a number of projects like cryptlib and OP Web Browser (later Chrome, too) were building in forms of security kernels inside the app with isolation mechanisms, enforcement of policies, and/or dropping privileges of whole app or specific components when unneeded. Guttman adopted what was called “assured pipelines” inside his library. You specify exactly what order functions should be called in what states of the program. They do these through the security kernel instead of just jumps. It can be an actual kernel underneath processes or just a module that checks calls. Did Microsoft’s products use such security mechanisms to prevent situations such as opening a Word document from controlling an entire computer?

                                                              7. Compilers had a bad habit of introducing bugs and/or removing security checks. The optimizations also screwed things up a lot. A number of teams in academia started writing their compilers in strongly-typed, safe, functional languages using a combo of intermediate languages and type system to improve quality. A few verified compilers for simple languages with CompCert doing one for C. AdaCore modified their IDE’s to allow developers to see the assembly easily for human inspection. Did you hear about or see the compiler teams at Microsoft use 1-5 on top of the safe, functional, type-driven approaches that were working? Did they license CompCert to protect their C code like companies in safety-critical are doing? Did they develop or contract their own certifying compiler for C and/or C++?

                                                              8. Drivers and protocols. SLAM was a nice, success story. It would be great if they forced all drivers to also undergo at least 2-4 above. They could also acquire AbsInt to get both CompCert and Astree Analyzer. Astree proves absence of all the common problems in code structured in ways similar to how one might write device drivers. That’s on top of probably many other things in Windows ecosystem. At one point, Astree was the only tool that could do this. You hear about Microsoft buying AbsInt or just licensing Astree to use on as many drivers and protocols as it can with passing being mandatory for interoperability certification and branding?

                                                              9. Formal proof. MS Research is awesome at all that stuff. They have tools that make it easier than usual. They even applied some of them to MS products as part of their funded, research activities. Microsoft adopted a few. They have more, though, like the ability to do everything from proving code correctness in Dafny to type-safety in assembly. Did you see them using any of that at least on riskiest and most-used routines? Did any assembly code you see use their type- and/or memory-safe variants? Were they using Frama-C or SPARK Ada on critical routines?

                                                              So, there’s a few just based on what academics, smaller companies, defense contractors, smartcard companies, and a random number of players in other industries have been doing. If they can do it, so can Microsoft. Some of this was done on tech that predates Microsoft. So, it would’ve been easier for Microsoft to do it than the forerunners. These are also a diverse array of things that industries do when they really want to achieve the goal. They throw everything at it that they can afford to do. Most can’t afford to do all of this. Some do, though, with those NASA teams and Praxis having way, way, less money than Microsoft’s operations side. They still put in a herculean effort with some of the lowest-defect, most-reliable software in existence coming out. If Microsoft did likewise, you’ll be able to name a lot of tools and approaches in 1-9.

                                                              I actually hope you do since I’d love to be wrong. Problem is many of these are trick questions: many CVE’s Microsoft got hit by would be impossible if they applied these methods. Empirical data already refutes your point or refutes my belief that using 1-9 creates low-defect software. Most evidence indicates my belief that 1-9 bottoms out the defects is true. So, I defaulted to believing Microsoft wasn’t using these methods… wasn’t putting in herculean effort like other companies, some academics, and some folks FOSSing stuff were. Their employees might not even know much of this stuff exists. That would be more damning given MS Research has bright folks who certainly pushed adoption of some of that. What of 1-9 did Microsoft do in your group and other groups doing production software?

                                                              Note: That excludes Midori even though it was used in a production app. It was mostly just for research. I haven’t heard about any migrations of their services to it.

                                                              1. 3

                                                                This is a really weird conversation. Do you want to know more because you’re curious or are you quizzing me? I honestly can’t tell.

                                                                1. 3

                                                                  I keep track of what methods different companies use to assure software, what results they get, and (if the results are good) their availability. I do comparisons among them. I do it mostly to learn and pass things on. I also use it to objectively assess what they’re doing for software correctness, reliability, and security. I assessed Microsoft based on their simultaneous decline in 0-days and blue screens (success of their methods) plus evidence, in what remains, that they were not using the strongest methods that they certainly could afford. I claimed apathy was the reason that they’d ignore those methods, especially automated ones, in their assurance activities. They could give their teams the training and budget if they wanted.

                                                                  You said they were doing all kinds of things, putting in massive effort, and I had no clue what I was talking about. You kept insisting on that last part without evidence past an argument from authority (“I was there, I know, trust me”). I very rarely have people tell me I’m clueless about verification tech, especially if they’ve seen my Lobsters submissions and comments. Maybe you’re new, I thought, and missing stuff like this.

                                                                  In any case, whether I’m right or wrong, I decided to test your claim by presenting different classes of assurance techniques companies outside Microsoft were using (especially projects/companies with fewer resources), describing some benefits/examples, and then asking what Microsoft was doing there. If I’m wrong, you’ll have examples of most of that, maybe even with descriptions of or links to the tools. If I’m right, you’ll be looking at that stuff saying “we didn’t do almost any of that stuff.” You might even wonder if it was even possible to do that stuff given the budget and time-to-market requirements forced on you. If so, that makes it even more likely I’m right about them intentionally limiting what security you could achieve to maximize their numbers for the execs.

                                                                  So, I’m simply testing your claims in a way that gets you to give us actual data to work with, maybe teaches us about awesome QA methods, and maybe teaches you about some that you weren’t aware of. I try to be helpful even in debate. There are also others reading who might benefit from specific techniques and assessments in these comments. They’ll tune out as the thread gets deeper and more flamish. Hence me just switching right to asking about specific methods already used outside Microsoft, some for decades, to see if Microsoft was matching that effort or doing something less effective.

                                                                  1. 1

                                                                    I’m sorry, but without primary sources, you’re appealing to (self) authority to make your claim. You seem to be self-assured in your position, enough to analyze internal Microsoft decisions.

                                                                    1. 1

                                                                      I’m really not. I’ve endlessly submitted examples of these activities countering problems Microsoft is experiencing. If they’re experiencing them, then it’s a valid inference that Microsoft isn’t using them.

                                                                      The other person is doing argument from authority by saying we should trust a huge claim contradicting empirical observations just because they claim to know what Microsoft was doing. No evidence past that. Weird that you call out my comment rather than theirs, or both of ours, if you wanted provable, primary sources or evidence.

                                                                      1. 1

                                                                        If they’re experiencing them, then it’s a valid inference that Microsoft isn’t using them.

                                                                        No, you’re creating a black-and-white issue where there isn’t one. Formal methods (as you undoubtedly know) have many issues when used in practice. Maybe Microsoft did try to use these tools (especially given how many of them come from MSR), but found their codebases intractable, or the amount of instrumentation too onerous to ship without a major rewrite, or maybe they did run these tools and classes of bugs were still not found, or there were too many false positives to justify using a tool. It feels overly simplistic, and almost disingenuous, to insinuate that, just because they have certain classes of bugs, they didn’t run the formal checkers or implement the formal methods that counteract them.

                                                                        1. 1

                                                                          Although you bring up sensible points, let me illustrate why I’m not using them in a Microsoft assessment by applying your comment pre-SDL. That was back when Microsoft was hit by vulnerabilities left and right. We claimed they weren’t investing in security based on that field evidence. We also cited OS’s and products with fewer vulnerabilities and/or better containment as evidence of what they could’ve been doing. You could’ve come in with the same counter:

                                                                          “No, you’re creating a black-and-white issue where there isn’t one. Security engineering (as you undoubtedly know) has many issues when used in commercial products. Maybe Microsoft did try to use such methods (especially given how many in MSR know about them), but found their codebases intractable, the amount of instrumentation to be too onerous to ship without a major rewrite, or maybe they did do reviews and run security tools and classes of bugs were still not found, or there were too many constraints or false positives to justify attempting to secure their legacy products or use security tools. It feels overly simplistic, and almost disingenuous, to insinuate that, just because they have certain classes of bugs, they didn’t have a security team in place or use the techniques that counter them.”

                                                                          Yet, we’d have been right because it wasn’t simplistic: methods and tools that consistently improve things in specific ways would’ve likely done it for them. Then, they adopted a few of them, new results came in, and they matched our outcome-based predictions (i.e. corroboration). Piles of companies, including IBM, were using some on my list. Some of these were on big projects, new and legacy. Hell, fuzzers are a low-effort tool that work on about everything. Like shooting fish in a barrel I’m told.

                                                                          I doubt Microsoft is doing something so unique that no extra techniques could’ve helped them. If anything, you’re introducing a new claim against decades of field evidence, saying Microsoft’s code is so unique that universal QA methods all fail on it except a few specific tools they’re already using. Since mine is the status quo about QA results, the burden of proof is on you to prove Microsoft’s code is the exception to the rule.

                                                                          That brings me to another problem with your counter: you exclusively focus on formal methods, the smallest part of my comment. If that’s all I said, I could about half-concede it to you, at least on the legacy C/C++ code they had. However, my assessment had a range of techniques which included push-button tools like source-based test generators, fuzzers, and 3rd-party static analyzers. We’re talking low- to no-effort tooling that works on fairly arbitrary code. You’re telling me Microsoft is about the only company that couldn’t benefit from using extra tools like that? And with what evidence?

                                                                          The simplest test of your counter is quickly DuckDuckGoing whether anyone used a fuzzer on Microsoft products and got results with relatively little effort. Yup. I stand by my method of extrapolating likely internal investment in techniques from the results those suppliers achieved, as assessed by independent parties hacking the software. If one is low and the other is high, then they probably aren’t doing enough of the good things.

                                                          2. 3

                                                            The point is that they could do more. I believe Microsoft can invest more in security, but probably can’t do so economically now. Which is, as you said, herculean, since I believe other companies can invest more in security, even in an economically sound manner.

                                                            1. 2

                                                              The Office team is like a completely separate company compared to how other MS teams work, so maybe Office was better than the others at this?

                                                              1. 1

                                                                Could very well be!

                                                          1. 10

                                                            This is what I posted to a similar topic over on reddit r/selfhosted recently:

                                                            Data Center

                                                            Dedicated FreeBSD 11 server on a ZFS mirror, from OVH. The host is very slim and really just runs ezjail, and unbound as a local resolver. All the action happens inside jails.

                                                            • MySQL jail - provides database in a “container” for the other jails
                                                            • PowerDNS jail - Authoritative DNS server, backed by MySQL
                                                            • LAMP stack jail - a place to host web pages with apache and PHP, for the most part. Using PHP-FPM with per-vhost chroots and UIDs. Containers within containers! Very happy with this setup. Notably hosts:
                                                              • Ampache - which the whole family uses for mobile-friendly music streaming.
                                                              • Chevereto - image hosting
                                                              • NSEdit - Web app for managing our DNS server.
                                                              • WP Ultimate Recipe - recipe database built on wordpress
                                                              • Wallabag - read-it-later article saver like Pocket
                                                              • Lots of WordPress sites for friends and family and assorted custom scratch-built things and experiments
                                                            • NextCloud jail - NextCloud on nginx + php-fpm. In its own jail to keep it completely separated. The whole family uses it for files, calendars and contacts.
                                                            • Minecraft server jail
                                                            • Email jail - Custom email hosting setup built on Postfix, Courier-IMAP, Maildrop, MySQL, Apache, PHP. I’ve been hosting my own email since the 90s.
                                                            • Legacy jail - Really just the old server, that went P2V two or three server moves ago - so easy to do when everything lives in ZFS jails. This is deprecated and I have been moving things off it (very slowly).

                                                            Home Network

                                                            PoS old FreeBSD 11 server with a ragtag collection of hard drives on ZFS. It’s mainly our home NAS, storing our media, but also hosts:

                                                            • Nagios jail - Network monitoring and notification
                                                            • Asterisk jail - Home and office VoIP
                                                            • Assorted experiments and goofs

                                                            Raspberry Pi 3A - Kodi in the living room, playing media from the NAS

                                                            Raspberry Pi 2A - Custom dashboard app showing server/network status and weather and stuff.

                                                            Raspberry Pi 1A - Running sprinklers_pi to control our lawn sprinklers.

                                                            Remaining Pain Points

                                                            Still getting a decent KeePass workflow established

                                                            Need to set up a VPN at home

                                                            Still don’t have Ampache sharing working. It should be easy for me to tell a friend, “listen to this song” or “this album”. Need to get a good beets workflow going to clean things up.

                                                            Need to pick a wiki for a family knowledge base.

                                                            Asterisk is crusty and no one is developing personal-scale VoIP managers, because everyone just uses cell phones these days.

                                                            Need more hard drives.

                                                            1. 4

                                                              Would you be willing to move off KeePass to Bitwarden? Did it myself a while back using bitwarden_rs. Super easy to host and everything Just Works™. Also would allow groups for shared passwords between the family.
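
                                                              For a sense of how easy: the usual bitwarden_rs quick start is a single container, something along these lines (image name, port and data path are from memory of the project’s docs, so double-check before relying on them):

                                                                docker run -d --name bitwarden \
                                                                  -v /bw-data/:/data/ \
                                                                  -p 80:80 \
                                                                  bitwardenrs/server:latest

                                                              Put it behind an existing reverse proxy with TLS and point the apps and browser extensions at that URL.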

                                                              1. 2

                                                                What kind of KeePass workflow are you looking for? I have personal and shared key databases in KeePassXC, and share the shared ones with SyncThing – I assume NextCloud could do that level of file sharing for you. I’m very happy with it so far, but it’s also so trivial I suspect you’re looking for something beyond that, no?

                                                                1. 1

                                                                  So, I have KeeWeb on OS X, and pair it with ChromeIPass in Chrome. I save my DB inside a NextCloud folder so that it is available on my other devices/platforms. I like it generally, but it always seems to be locked when I want to add something to it, so I have to type in a high-entropy password and select the key file, and by that time ChromeIPass has forgotten that I was trying to save a password and given up. So I log out and back in and save the password, “now that I’m ready”. It’s not as integrated or smooth in Chrome as the built-in password DB, so it’s easy to forget it, and I always have a sense of “but do I really have it saved?” on new additions.

                                                                  I don’t actually have an Android app yet. What do people use there?

                                                                2. 1

                                                                  Asterisk

                                                                  Yep, personal-scale VoIP doesn’t quite make sense when most folks have unlimited calling and internet calls. The only personal VoIP I’ve seen is at a friend’s family: he has parents living abroad, and it was easier to deploy VoIP than to teach them how to use apps.

                                                                  1. 1

                                                                    How’s your experience with Kodi been? My annual Plex sub has just renewed, so I’ve got plenty of time to look for a replacement; they’re adding a bunch of crap I don’t want and don’t seem to be fixing the things that annoy me, so I’d like this to be my last year at that particular teat.

                                                                    1. 4

                                                                      Kodi, like Plex, has a garbage UI full of extremely frustrating UX quirks. But in my house it’s still my main way of consuming my library (with an Nvidia Shield as the client). They also serve different audiences: Kodi is hard to serve externally and is mostly just a client, while Plex is good at remote, shared access and solving the server side.

                                                                      1. 2

                                                                        It works well. It’s not a great UI. The family is comfortable with the Roku UI, even though it is terrible and they complain about it. If there’s something on both, they’ll play the one on the Roku first every time. Searching is meh. Playlist building is straight-up user-surly. Indexing is hit-or-miss and needs a lot of intervention. Actual video playback is great. Playing music through it is not fun.

                                                                        1. 2

                                                                          I’ve looked into plex alternatives as well. Emby was kind of interesting, but they recently closed some of the source, and then there was a fork of it by some members of the community. Going to wait and see how that shakes out.

                                                                          Universal Media Server (UMS) paired with Infuse (Apple TV/iOS) is kind of interesting – the main drawback is how large the local storage usage in Infuse gets, and how slowly it builds the first time. If only it pulled more metadata from the server at runtime. I tried pairing Infuse with Plex (a recently supported configuration), and it had the same issue with local storage size and slow initial build time. It’s unfortunate, because otherwise I found it fairly decent (UI/UX).

                                                                        2. 1

                                                                          What’s your experience been like with Chevereto? I’m in the market for something very much like it, and I see it mentioned a fair bit, but I don’t run any other PHP/MySQL things so I’m a bit wary.

                                                                          1. 2

                                                                            Minimal, honestly. I set it up and it runs nicely, but I haven’t really used it heavily.

                                                                          2. 1

                                                                            Nice list. Have you considered running a database in each jail instead of having a dedicated MySQL jail? I have been looking for a discussion of the pros and cons of both approaches.

                                                                            1. 2

                                                                              Yes. I mean, 15 years ago I had one server with no jails/containers/etc. and everything was just stacked in one big messy pile, and we all know what happens with that over time and upgrades. I moved that whole thing into its own jail, just to draw a line around the mess, and started a new server layout with pieces in separate jails.

                                                                              I love having stuff compartmentalized into its own container so that upgrades are never an issue. I never have to, “upgrade PHP because upgrading MySQL upgraded gettext which now conflicts with, bah! no!” If anything, I am moving (carefully) towards further containerization right now. For instance, I’d like to have PHP in its own jail, separate from the web server, so that I can run several versions and just pick which socket to connect a site to in the config.

                                                                              But as you guessed, it is a balance. I never want to get into installing simple Docker web apps that each install a web server and a DB server and duplicate each other in stupid and wasteful ways. On the other hand, for some things it is nice to have a self-contained “package”, where if something got busy enough to need its own server, I could just move it to bigger hardware in one shot.
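
                                                                              “Pick which socket” can be as simple as each FPM version listening on its own socket and each vhost pointing at the one it wants. A rough sketch with Apache’s mod_proxy_fcgi - hostnames, socket paths and PHP versions are invented, and the sockets would have to be visible to the web-server jail (nullfs mount) or swapped for 127.0.0.1 TCP sockets:

                                                                                # hypothetical vhost pinned to an older PHP (assumes mod_proxy + mod_proxy_fcgi are loaded)
                                                                                <VirtualHost *:80>
                                                                                    ServerName legacy.example.org
                                                                                    DocumentRoot /usr/local/www/legacy
                                                                                    <FilesMatch "\.php$">
                                                                                        SetHandler "proxy:unix:/var/run/php72-fpm.sock|fcgi://localhost/"
                                                                                    </FilesMatch>
                                                                                </VirtualHost>

                                                                                # hypothetical vhost on a newer PHP - only the socket changes
                                                                                <VirtualHost *:80>
                                                                                    ServerName new.example.org
                                                                                    DocumentRoot /usr/local/www/new
                                                                                    <FilesMatch "\.php$">
                                                                                        SetHandler "proxy:unix:/var/run/php80-fpm.sock|fcgi://localhost/"
                                                                                    </FilesMatch>
                                                                                </VirtualHost>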

                                                                          1. 1

                                                                            Already self-hosted:

                                                                            • A ZFS storage array for network storage, PC backup, VM storage, digital packratting, etc. Hosted on an OmniOS VM, exposed as NFS and SMB shares to the rest of the network (see the sketch after this list).
                                                                            • Plex for movies and music (Kodi when internal)
                                                                            • Minecraft servers for playing with friends (Docker on Ubuntu)
                                                                            • Some scratch Windows VMs for messing around
                                                                            • DNS server, network analytics, and other goodies in pfSense
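
                                                                            The NFS/SMB sharing mentioned in the first bullet is mostly just ZFS properties on illumos-family systems like OmniOS. A rough sketch (pool and dataset names are made up, and the SMB service needs enabling once):

                                                                              # share a dataset read/write over NFS
                                                                              zfs set sharenfs=rw tank/media
                                                                              # enable the in-kernel SMB server once, then share with a name Windows clients will see
                                                                              svcadm enable smb/server
                                                                              zfs set sharesmb=name=media tank/media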

                                                                            My self-hosting todo list:

                                                                            • VPN server
                                                                            • Star Trek GIF repository
                                                                            • Apache Guacamole for an easy-to-access farm of VMs for testing and development purposes
                                                                            • Dockerizing my Plex server
                                                                            • Personal document storage
                                                                            • Dropbox-like storage for my friends (Seafile?)
                                                                            • Mercurial code repos
                                                                            • My static websites

                                                                            And of course one of the joys of self-hosting is getting to play with lots of cool hardware you otherwise wouldn’t justify: https://imgur.com/gallery/eO1XDbH

                                                                            1. 2

                                                                              Using Mercurial is such a delight after having had to work with git every day. I never think about version control and all the arcane commands; I just do my work. I hope I never have to go back.

                                                                              1. 2

                                                                                When diving into the ZFS deep end I evaluated FreeNAS and OmniOS + napp-it and settled on the latter. Solaris derivatives natively play nicely with Windows permissions, and napp-it was ultimately easier to configure despite being much less polished.

                                                                                I’m surprised more people don’t run this setup.