Threads for cript0nauta

  1. 4

    Although I’m not a frequent reader of distributed systems topics, I can recommend a few resources as a starting point:

    • The Morning Paper blog gives a review of lots of useful papers. Recently, the author focused on cloud architecture and distributed systems research
    • The Papers We Love repo organizes great papers in different categories, such as distributed systems. Its YouTube channel also has talks that give insight into those papers
    1. 2

      The Morning Paper has unfortunately ended, but the archives are still full of great content.

    1. 13

      I think it’s somewhat irresponsible for this article not to mention that the story with security updates for all the “real privacy respecting browsers” is very poor. Only the big leagues can afford to have full-time security teams on staff. In fact, Debian goes so far as to state in their release notes: (emphasis mine)

      Debian 10 includes several browser engines which are affected by a steady stream of security vulnerabilities. The high rate of vulnerabilities and partial lack of upstream support in the form of long term branches make it very difficult to support these browsers and engines with backported security fixes. Additionally, library interdependencies make it extremely difficult to update to newer upstream releases. Therefore, browsers built upon e.g. the webkit and khtml engines[6] are included in buster, but not covered by security support. These browsers should not be used against untrusted websites. The webkit2gtk source package is covered by security support.

      Of course, if you disable scripting in an obscure browser it greatly reduces the attack surface, but I don’t feel like this is a well-understood problem, and it’s very unfortunate that the article neglects to mention this very depressing fact.

      1. 3

        I agree with this. In the case of Firefox, they fix security issues in every release, and they had previously reported 0day attacks exploiting these issues. Due to the complexity of browsers, using an out-of-date browser can be very dangerous.

        Disabling JS on untrusted sites seems to mitigate most vulnerabilities. But there also exist ways to bypass this protection, so it isn’t enough if you want maximum security.

        1. 2

          I think security updates are overrated. If you want security, use Qubes or run your browser in a VM. As your own link demonstrates, even the latest and greatest won’t protect you from newly discovered vulnerabilities which are immediately exploited.

          If you only browse trusted sites without third-party ads (and/or block tpa) there’s little opportunity for exploit code to find a way into your browser.

          If you browse with JS disabled by default, odds are even lower, as that’s where most of the attack surface is.

          If you are writing a browser based on a major engine, you can get the security updates rolled into your project pretty quickly.

          This is how I do web security today. It will probably not protect you from anyone with a budget over 100K, but neither will Chrome or Firefox. Embrace being owned by our 8enevolent 0wners.

          This is only for protection against random drive-by Web exploits. None of this will help you against privacy attacks, spearphishing, downloading and opening random shit, enabling javascript on porn sites, and so on.

          I’m afraid this means not clicking random links in search results. Whenever you’re searching for e.g. a how-to or advice on the best X to use for Y, make sure you gate your searches with or

          1. 1

            You make some decent arguments, but they have so many caveats attached.

            “If you always do X and never forget to do Y, it should be safe to use Z.” There is no reason to believe the average reader of the article would know to do X, so failing to call that out is a big mistake.

            1. 2

              The intended audience for this article seems to be techies, and that’s who my comment was for.

              I agree that this type of thinking shouldn’t be a requirement for the average user, but here we are, and there’s no easy solution for them.

        1. 10

          Yesterday I wanted to use my laptop I just recently put NixOS on to resume working on one of my projects that uses pip/requirements.txt to set up dependencies. After realizing that ‘pip install’ doesn’t work in here, I determined that the amount of work required to accomplish this was too great and ended up putting my HDD with Arch back in so I could do some actual work rather than re-solve the already solved (many times over) problem of locally managing python dependencies. The complexity of doing this on NixOS effectively kills using virtualenvs as a quick way to get a special environment up and running.

          Since I’m still very new to NixOS/Nix, deducing the steps in this post from the Nix manual on python was too much for me. This post might encourage me to give it a try though.. but my point of “why is this so ridiculously/needlessly complicated here?” still stands…

          1. 3

            Addressing your point, I like to bring up the difference between complicated and difficult: Nix is difficult, but not very complicated.

            I think it is difficult because we are used to working with package managers that treat the filesystem as a global shared space and that results in a lot of implicit dependencies (and all the problems that come with it). Nix on the other hand makes dependencies explicit which I think is simpler and lets you do things such as having 10 different variations of the Python runtime without resorting to external tools and shell hacks.

            It takes effort plus the official manuals aren’t very welcoming, but it is very rewarding to me.

            1. 1

              external tools and shell hacks

              Thing is, the Python world already has tools like tox which solve this well-known problem, are comparatively well documented, and generally just more stable and better supported for Python than Nix’s “final solution”. For those of us who need (or just prefer) to get some development work done rather than tinker with their system package manager, it’s a significant barrier to adoption.

              1. 1

                I agree with you, and I think the best case scenario is the end user (the persona you described) using Nix through a wrapper so they don’t even know Nix is involved, which makes it more interesting to people working on tooling. I have used Nix to reduce the onboarding of new developers on a project to “install Nix and run nix-shell”. That included multiple programming languages (with their packages), databases, web servers, etc.

                If you think about it (and I will exaggerate a bit to make my point), it’s weird that you need to install Python (probably using some package manager), then use pip (another package manager) to install Tox which manages virtualenv instances containing Python installations. Now take that and replicate it for every other programming language/environment out there.

                In the end everything is just files depending on other files.

                1. 2

                  In a polyglot project or environment, language-specific dependency management may well be inadequate. Reproducible builds are a good thing, no matter how you get there. But Nix is a pretty big hammer, and it’s probably not a good idea to try to force everyone up that learning curve. If you’re going to make a wrapper around it for a project, there should be someone who can maintain it and respond quickly when that abstraction barrier is breached.

                  I was responsible for introducing Nix on a big, ambitious, polyglot project myself. Can’t say for sure that it solved more problems than it created, but I do believe it was a net win. Most of the devs still won’t touch it except to run nix-shell, but that’s OK. But I do have mixed feelings; mostly along the lines of, Nix can make it a little too easy and “safe” to add complexity which will end up biting you anyway, and probably worse than if you’d been forced to deal with it at an earlier stage. YMMV; Nix is good stuff, just no substitute for discretion and other forms of sound engineering judgement. Beware silver bullets.

            2. 2

              The nixpkgs manual has the section “How to consume python modules using pip in a virtual environment like I am used to on other Operating Systems?” that explains how to use pip and virtualenv inside Nix. I previously used that approach to work on an existing project with lots of dependencies. It’s not elegant but can be useful in some cases.

              Now I’m using a custom solution (not released yet) to build Nix expressions from Python projects. It’s similar to mach-nix, the one mentioned in the article. Actually, I didn’t know mach-nix existed before reading the article :)

              My recommendation is to start with the pip/virtualenv approach mentioned earlier in order to have a working, good enough development environment. Then you can move to a pure Nix solution in order to guarantee reproducibility.

              1. 2

                I had a similar experience. I was working on a project that required a lot of other new technologies and I thought I would do it on my NixOS install. Only to realize that on top of all of the other technologies I was trying to learn, I would also have to learn about how to use pip in Nix. I spent a while trying the tips on the guide, only to have one required package not be in the Nix package collection. With this guide I may try to get that project working in Nix again.

                I love the idea of NixOS, but not allowing so many small things to work as expected puts a real damper on productivity. And as a beginner to Nix, it is daunting not knowing if I will be able to do a project without any hitches or if I will hit some multi-hour speed bump that leads me down a rabbit hole with 5 different imperfect solutions to a problem.

              1. 18

                What has been the problem with Python/Flask/SQLA?

                1. 28

                  Python: the size of the codebase and number of moving parts has reached a point where the lack of static typing has become the main source of programmer errors in the code. There are type annotations now, but they don’t work very well IMO, are not used by most of our dependencies, and would be almost as much work to retrofit onto our codebase as switching to a type-safe language would be. The performance of the Python VM is also noticeably bad. We could try PyPy, but again… we’re investing a lot of effort just to stick to a language which has repeatedly proven itself poorly suited to our problem. The asyncio ecosystem helps but it’s still in its infancy and we’d have to rewrite almost everything to take advantage of it. And again, if we’re going to rewrite it… might as well re-evaluate our other choices while we’re at it.

                  Flask: it’s pretty decent, and not the main source of our grief (though it is somewhat annoying). My main feedback for Flask would be that it tries to do just a little bit too much. I wish it was a little bit more toolkit-oriented in its design and a more faithful expression of HTTP as a library.

                  SQLAlchemy: this is now my least favorite dependency in our entire stack. It’s… so bad. I just want to write SQL queries now. The database is the primary bottleneck in our application, and hand-optimizing our SQL queries is always the best route to performance improvements. Some basic stuff is possible with SQLAlchemy, simple shit like being smart about your joins and indices, but taking advantage of PostgreSQL features is a pain. It’s a bad ORM - I’m constantly fighting with it to just do the shit I want it to and stop dicking around - and it’s a bad database abstraction layer - it’s too far removed from Postgres to get anything more than the basics done without a significant amount of grief and misery. Alembic is also constantly annoying. Many of the important improvements I want to do for performance and reliability are blocked by ditching these two dependencies.

                  Another problem child that I want to move away from is Celery. It just isn’t flexible enough to handle most of the things I want to do, and we have to use it for anything which needs to be done asynchronously from the main request handling flow. In Go it’s a lot easier to deal with such things. Go also allows me to get a bit closer to the underlying system, with direct access to syscalls and such*, which is something that I’ve desired on a few occasions.

                  For the record, the new system is not without its flaws and trade-offs. Go is not a perfect tool, nor GraphQL. But, they fit better into the design I want. This was almost a year of research in the making. The Python codebase has served us well, and will continue to be useful for some time to come, in that it (1) helped us understand the scope necessary to accomplish our goals, and (2) provided a usable platform quickly. Nothing quite beats Python for quickly and easily building a working prototype, and it generally does what you tell it to, in very few lines of code. But, its weaknesses have become more and more apparent over time.

                  * Almost. The runtime still gets on my nerves all the time and is still frustratingly limiting in this respect.

                  1. 9

                    Thanks for responding. I think static typing in Python works really well once configured so I’m surprised to hear you say that. I think it’s better than the static typing in most other languages because generics are decent and the inference is pretty reasonable. For example it seems better thought out than Java, C and (in my limited experience) Go. My rough feeling is that 75% of the Python ecosystem either has type annotations or has type stubs in typeshed. Where something particularly important is untyped, I tend to just wrap it and give it an explicit annotation (this is fairly rare). I’ve written some tips on getting mypy working well on bigger projects.
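
                    To make the "wrap it and give it an explicit annotation" tip concrete, here is a minimal sketch. The untyped function `legacy_parse` is a hypothetical stand-in for a third-party dependency without annotations or stubs, and `parse_setting` is an illustrative name, not something from a real library.

```python
from typing import Any, Tuple


def legacy_parse(raw):
    # Hypothetical untyped dependency: a type checker treats its result as Any.
    key, _, value = raw.partition("=")
    return {"key": key.strip(), "value": value.strip()}


def parse_setting(raw: str) -> Tuple[str, str]:
    """Typed wrapper: the rest of the codebase only sees this signature."""
    result: Any = legacy_parse(raw)
    # Convert at the boundary so the untyped value does not leak further.
    return str(result["key"]), str(result["value"])


print(parse_setting("timeout = 30"))
```

                    The idea is that only the wrapper touches the untyped value, so a checker such as mypy can verify every caller against the explicit signature.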

                    I don’t think you have the right intuition that asyncio would help you if your problem is speed. I’m pretty convinced that asyncio is in fact slower than normal Python in most cases (and am currently writing another blogpost about that - UWSGI is for sure the fastest and most robust way to run a python webservice). Asyncio stuff tends to fail in weird ways under load. I also think asyncio is a big problem for correctness - it actually seems quite hard to get asyncio programs right and there are a lot of footguns around.

                    Re: SQLAlchemy - I’m also very surprised. I think SQLAlchemy is a good ORM and I’ve used postgres specific features (arrays, json, user defined functions, etc) from it a great deal. If you want to write SQL-level code there is nothing stopping you from using the “core” layer rather than the “ORM” layer. There’s also nothing stopping you using SQL strings with the parameterisation, i.e. "select col_a from table where col_b = :something" - I do that sometimes too. I have to say I have never had trouble with hand optimising a SQL query in SQLA - ever - because it gives you direct control over the query (this is even true at the ORM level). One problem I have run into is where people decide to use SQLA orm objects as their domain objects and…that doesn’t end happily.
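
                    As a rough illustration of a SQL string with a `:something` parameter, here is a sketch that uses the standard library's sqlite3 module rather than SQLAlchemy, so that it runs without extra dependencies. The table name `items`, its rows, and the helper `select_col_a` are made up for the example; SQLAlchemy's `text()` construct accepts the same `:name` placeholder style.

```python
import sqlite3

# In-memory database with a made-up table, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("create table items (col_a text, col_b integer)")
conn.executemany(
    "insert into items (col_a, col_b) values (?, ?)",
    [("x", 1), ("y", 2), ("z", 2)],
)


def select_col_a(conn, something):
    # The value is bound by the driver, never spliced into the SQL string.
    rows = conn.execute(
        "select col_a from items where col_b = :something order by col_a",
        {"something": something},
    )
    return [row[0] for row in rows]


print(select_col_a(conn, 2))
```

                    Because the value travels separately from the query text, a hostile input is compared as plain data instead of being executed as SQL.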

                    Celery however is something that I do think is quite limited. It’s really just a task queue. I am not sure that firing off background tasks as goroutines is a full replacement though as you typically need to handle errors, retry, record what happened, etc. I think even if you were using go every serious system ends up with a messaging subsystem inside it - at least for background tasks. People do not usually send emails from their webserving processes. Perhaps the libraries for this in go land are better but in Python I don’t think there is a library that gets this kind of thing wholly right. I am working on my own thing but it’s too early to recommend it to anyone (missive). I want to work on it more but childcare responsibilities are getting in the way! :)
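
                    To show what "handle errors, retry, record what happened" adds on top of fire-and-forget, here is a minimal in-process sketch. It is not how Celery or any particular library works; `TaskRecord` and `run_with_retry` are hypothetical names, and a real system would persist the record and run tasks outside the web process.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TaskRecord:
    """What happened to one background task (kept in memory here)."""
    name: str
    attempts: int = 0
    errors: List[str] = field(default_factory=list)
    succeeded: bool = False


def run_with_retry(name: str, task: Callable[[], None],
                   max_attempts: int = 3, delay: float = 0.0) -> TaskRecord:
    record = TaskRecord(name=name)
    for _ in range(max_attempts):
        record.attempts += 1
        try:
            task()
        except Exception as exc:
            # Record the failure, then wait before the next attempt.
            record.errors.append(repr(exc))
            time.sleep(delay)
        else:
            record.succeeded = True
            break
    return record


def send_email() -> None:
    # Hypothetical task body; a real one would talk to a mail server.
    pass


print(run_with_retry("send_email", send_email))
```

                    Even this small version needs a place to store attempts and errors, which is roughly why a messaging or task subsystem tends to appear in larger applications.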

                    Best of luck in your rewrite/rework. I have not been impressed with GraphQL so far but I haven’t used the library you’re planning to use. My problems with GraphQL so far are that a) it isn’t amenable to many of the optimisations I want to do with it b) neither schema first nor code first really work that well and c) its query language is much more limited than it looks - much less expressive than I would like. You may not find that the grass is greener!

                    1. 5

                      I don’t think you have the right intuition that asyncio would help you if your problem is speed.

                      I don’t want asyncio for speed, I want it for a better organizational model of handling the various needs of the application concurrently. With Flask, it’s request in, request out, and that’s all you get. I would hope that asyncio would improve the ability to handle long-running requests while still servicing fast requests, and also somewhat mitigate the need for Celery. But still, I’ve more or less resigned from Python at this point, so it’s a moot point.

                      I am not sure that firing off background tasks as goroutines is a full replacement though as you typically need to handle errors, retry, record what happened, etc.

                      Agreed. This is not completely thought-out yet, and I don’t expect the solution to be as straightforward as fire-and-forget.

                      My problems with GraphQL so far are that a) it isn’t amenable to many of the optimisations I want to do with it b) neither schema first nor code first really work that well and c) its query language is much more limited than it looks - much less expressive than I would like.

                      I have encountered and evaluated all of the same problems, and still decided to use GraphQL. I am satisfied with the solutions to (a) and (b) presented by the library I chose, and I feel comfortable building a good API within the constraints of (c). Cheers!

                    2. 3

                      So do you plan to keep the web UI in Python using Flask, and have it talk to a Go-based GraphQL API server? Or do you plan to eventually rewrite the web UI in Go as well? If the latter, is there a particular Go web framework or set of libraries that you like, or just the standard library?

                      1. 4

                        To be determined. The problems of Python and Flask become much less severe if it’s a frontend for GraphQL, and it will be less work to adapt them as such. I intend to conduct more research to see if this path is wise, and also probably do an experiment with a new Golang-based implementation. I am not sure how that would look, yet, either.

                        It’s also possible that both may happen, that we do a quick overhaul of the Python code to talk to GraphQL instead of SQL, and then over time do another incremental rewrite into another language.

                      2. 3

                        I’m curious about why you consider that Flask does “a little bit too much”. It’s a very lightweight framework, and the only “batteries included” thing I can think of is the usage of Jinja for template rendering. But if I’m not wrong, sourcehut uses it a lot so I don’t think this is what annoys you.

                        Regarding SQLAlchemy, I totally agree with you. It’s a bad database abstraction layer. When you try to make simple queries it becomes cumbersome because of SQLAlchemy’s supposed low level abstractions. But when you want to make a fine-grained query it’s also a real pain and you end up writing raw SQL because it’s easier. In some cases you can embed some raw SQL fragment inside the ORM query, but it is often not the case (for example, here is a crappy piece of code I’m partially responsible for). Not having a decent framework-agnostic ORM is the only thing that makes me miss Django :(

                        1. 8

                          Regarding Flask, I recently saw Daniel Stone give a talk wherein he reflected on the success of wlroots compared to the relative failure of libweston, and chalked it up to the difference between a toolkit and a midlayer, where wlroots is the former. Flask is a midlayer. It does its thing, and provides you a little place to nestle your application into. But, if you want to change any of its behavior - routing, session storage, and so on - you’re plugging into the rails it’s laid down for you. A toolkit approach would instead have the programmer always be in control, and reach for the tools they need - routing, templating, session management, and so on - as they need them.
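
                          One way to picture the toolkit approach is a plain WSGI callable that owns the request flow and reaches for a router only when it wants one. This is a sketch of the idea, not a description of any existing library; `Router`, `hello`, and `app` are hypothetical names.

```python
from typing import Callable, Dict, Optional, Tuple

# A handler takes the WSGI environ and returns (status line, body).
Handler = Callable[[dict], Tuple[str, bytes]]


class Router:
    """A standalone tool: it only maps (method, path) to a handler."""

    def __init__(self) -> None:
        self.routes: Dict[Tuple[str, str], Handler] = {}

    def add(self, method: str, path: str, handler: Handler) -> None:
        self.routes[(method, path)] = handler

    def match(self, method: str, path: str) -> Optional[Handler]:
        return self.routes.get((method, path))


def hello(environ: dict) -> Tuple[str, bytes]:
    return "200 OK", b"hello"


router = Router()
router.add("GET", "/", hello)


def app(environ, start_response):
    # The application, not a framework, decides when and how to route.
    handler = router.match(environ["REQUEST_METHOD"], environ.get("PATH_INFO", "/"))
    if handler is None:
        status, body = "404 Not Found", b"not found"
    else:
        status, body = handler(environ)
    headers = [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))]
    start_response(status, headers)
    return [body]
```

                          Swapping the routing strategy here means replacing one object that the application calls, rather than hooking into extension points that a framework exposes.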

                          1. 1

                            I’ve personally found falcon a bit nicer to work with than flask, as an api/component.
                            That said, as a daily user for some mid-sized codebases (some 56k odd lines of code), I very much agree with what you said about python and sqlalchemy.

                          2. 4

                            I find that linked piece of code perplexing because converting that from string-concat-based dynamic SQL into SQLA core looks straightforward: pull out the subqueries, turn them into python level variables and then join it all up in a single big query at the end. That would also save you from having a switch for sqlite in the middle of it - SQLA core would handle that.

                          3. 1

                            SQLAlchemy: this is now my least favorite dependency in our entire stack. It’s… so bad

                            That’s also the only thing I remember about it from when I used it years ago. Maybe it’s something everyone has to go through once to figure out the extra layer might look tasty, but in the end it only gives you stomach aches.

                          4. 13

                            Yeah, I’d be very interested to hear more about that too. Not that I disagree, but I think his article was light on details. What were the things that “soured” his view of Python for larger projects, and why was he “unsatisfied with the results” of REST?

                            1. 11

                              I found REST difficult to build a consistent representation of our services with, and it does a poor job of representing the relationship between resources. After all, GraphQL describes a graph, but REST describes a tree. GraphQL also benefits a lot from static typing and an explicit schema defined in advance.

                              Also, our new codebase for GraphQL utilizes the database more efficiently, which is the main bottleneck in the previous implementation. We could apply similar techniques, but it would require lots of refactoring and SQLAlchemy only ever gets in the way.

                            2. 1

                              I’ve been using Flask and Gunicorn. I basically do native dev before porting it to a web app. My native apps are heavily decomposed into functions. One thing that’s weird is they break when I use them in a web setup. The functions will be defined before “@app” or whatever it is, like in a native app. Then, Gunicorn or Flask tells me the function is undefined or doesn’t exist.

                              I don’t know why that happens. It made me un-decompose those apps to just dump all the code in the main function. Also, I try to do everything I can outside the web app with it just using a database or something. My Flask apps have stayed tiny and working but probably nearing the limit on that.

                            1. 8

                              git-bug is an issue tracker using git as database. It’s not a general purpose database, but it has a great document explaining how it stores the data without having to deal with merge conflicts. Maybe you’ll find this document useful.

                              1. 4

                                very similar: git-dit:

                                Git-dit stores issues and associated data directly in commits rather than in blobs within the tree. Similar to threads in a mailing list, issues and comments are modeled as a tree of messages. Each message is stored in one commit.