1. 17

    TLDR:

    1. Start with big IDs so you don’t enter in PHP weird behaviors.
    2. Use UTF-8, always.
    3. Allow-list system with deny by default for anything related to security.
    4. Monitor your SQL queries.
    1. 5

      The one time I read the article before the comments…

      1. 2

        Monitor your SQL queries.

        The idea that SQL queries are a special kind of error is really weird. You should be reporting all your errors that aren’t caused by bad user input or network hiccups.

        The allow-list bit is a good call but the rest seems oddly language-specific. Starting with big IDs implies you’re using integers, which is not a great idea to begin with; stick with UUIDs and the whole problem isn’t even possible.

        1. 3

          The idea that SQL queries are a special kind of error is really weird.

          Because SQL syntax errors indicate injection bugs, and SQL injection bugs can exploited to accomplish literally anything. The OP isn’t talking about queries that are supposed to return exactly one result returning more than one, which are also bugs, but probably not exploitable to completely pwn the database.

          1. 1

            The OP isn’t talking about queries that are supposed to return exactly one result returning more than one, which are also bugs […]

            If it’s a bug … why are you ignoring it? You should … fix it instead?

            1. 2

              If a bug exists but nobody discovers it, is it truly a bug?

              1. 2

                If a bug exists and all your users find out about it before you do, are you really a professional?

                1. 1

                  Precisely. I’m sure I don’t need to tell a fellow professional how this kind of disparate-parity bug can quietly develop over time.

              2. 2

                Because I need to fix the highest severity bugs first, and SQL injections are much higher severity.

            2. 2

              What’s wrong with using integers as keys? If you want to make sure that two integers with different semantic meanings are never confused for each other, you should be relying on a type system, not the improbability of UUID collisions. (Unless I’ve mistaken your point?)

              1. 2

                Integers are enumerable, while (some kinds of) UUIDs are not.

                1. 1

                  What’s the problem with enumerability—that someone might be able to guess a valid ID?

                  Which kinds of UUIDs aren’t enumerable?

                  1. 3

                    You can technically enumerate UUIDs, but considering how many there (quite a few) are and how they are generated (randomly for version 4 UUIDs), it may take you some time.

          1. 22

            The article doesn’t attempt to explain why legislation isn’t a solution, which is disappointing. Of course technologists are going to think that the solution is technology.

            1. 8

              Yeah, it also lost all credibility when it started comparing big tech to the Ministry of Truth, a comparison that is quickly as becoming as overused as comparison’s to Hitler. Of course technologists are going to think the answer lies in technology, they can’t see past their own nose to understand that not everyone being affected by these issues can utilize those technologies, though. Not every person has the technical understanding or even the access to the equipment to stand up a Mastodon server to get their daily news. The law is going to have to step in at some point.

              1. 4

                Yeah, it also lost all credibility when it started comparing big tech to the Ministry of Truth, a comparison that is quickly as becoming as overused as comparison’s to Hitler.

                Interesting take. So the article lost credibility by … making a comparison that, in your view, is too popular?

                1. 2

                  You say too popular, I say vague and hyperbolic. You can’t just write a blog post, compare something about contemporary tech to Orwell’s 1984, and expect blanket approval. It’s low effort and I’m calling it out as such.

              2. 3

                I think the problem with a lot of the more dubious content posted on these platforms is that it’s often not strictly illegal; certainly not in the US where free speech tends to extend further than most European countries. Much of the content we’re talking about here is certainly unpleasant or outright lies, but not illegal.

                Things that are outright illegal such as pedophelia, violence, threats, and the like already tend to be removed fairly quickly without much fuss. But stuff like “I don’t like black people”, “black people are of a lower IQ”, or burning a cross are offensive and stupid, but AFAIK not illegal.

                It seems to me that just repealing Section 230 or enacting some other legislation is not really all that effective as such against the more common versions of hate speech, misinformation, or other problematic content one might encounter.

                I don’t really know what a solution to this might be though; I’m not so sure federation is the answer either (actually, I’m pretty sure it’s not).

              1. 1

                I remember seeing this and thinking that “invasive behavioral information” is critical to how the app functions across devices. If you never read kindle on your phone then your kindle or tablet, perhaps this seems like surprising info, but if you do it’s not immediately obvious how any of the syncing would work without it. The whole piece seems to miss this. But if the bookseller knowing where you are in the book on their locked down device bothers you… well… try a different system.

                1. 3

                  The article details a lot of information that is being sent back that is not relevant to the reader’s position in the book. In fact, that information is sent to an entirely different URL than the metrics data.

                1. 16

                  FYI, “дроворуб” is a Ukrainian word, not Russian. Russian word for lumberjack is дровосек.

                  https://uk.wikipedia.org/wiki/%D0%94%D1%80%D0%BE%D0%B2%D0%BE%D1%80%D1%83%D0%B1

                  1. 17

                    I’m not sure if the people who try - and then publish - such attributions care a lot about precision, and “The Russian” is still (or again) a popular boogeyman while “The Ukrainian” is probably a friend to the West or something.

                    I suppose the interesting part is “malware targeting Linux”, everything else is political fluff.

                    1. 16

                      It’s a straw man argument. If anyone cares, first of all, “lumberjack” in Ukrainian is – лісоруб. Here’s the data from Google trends with both terms [1] usage ang geo.

                      Regarding the “drovorub” rootkit’s name. It has nothing to do with wood, per se.

                      The rootkit exploits Linux kernel modules. Modules are hardware drivers (mostly). Drivers are in russian slang – дрова (drova) – wood (engl.). “Drova” (as slang term) is nothing but short native Russian word sounding almost like “drivers”.

                      The second part of “drovo-rub” is derived from verb – рубить – to hack (engl.) ;)

                      Thus, whoever coined that rootkit’s name speaks russian really well and captured the idea of the rootkit’s approach in its name quiet well.

                      [1] https://trends.google.com/trends/explore?date=all&q=%D0%BB%D1%96%D1%81%D0%BE%D1%80%D1%83%D0%B1,%D0%B4%D1%80%D0%BE%D0%B2%D0%BE%D1%80%D1%83%D0%B1

                      1. 5

                        It’s a straw man argument.

                        It’s not an argument at all. I don’t care, really.

                        Proper name for lumberjack in Russian is лесоруб. Дроворуб makes no sense, unless it is a play on дрова. Which, I must say, is quite likely! Good catch!

                      2. 0

                        So what? If a German gives their malware a Polish name, you would argue it wouldn’t be German in origin if the NSA claimed it was German?

                        1. 14

                          I don’t claim anything apart from what I’ve actually stated. Don’t put words in my mouth.

                          Why is even it relevant that this is an Ukrainian word, not Russian? Because of this section in original pdf 0:

                          Why is the malware called “Drovorub”, and what does it mean? The name Drovorub comes from a variety of artifacts discovered in Drovorub files and from operations conducted by the GTsSS using this malware; it is the name used by the GTsSS actors themselves. Drovo [дрово] translates to “firewood”, or “wood”. Rub [руб] translates to “to fell”, or “to chop.” Taken together, they translate to “woodcutter”or “to split wood.”

                          This quote is from “Attribution” section of the document. While the name is not the only (I hope) reason why they attribute this malware to GRU, making a bullshit claim that anyone with dictionary can disprove… it’s just sloppy, sloppy job.

                          EDIT: although, technically, they only say how the word translates to English. They never say from what language.

                          1. 4

                            Actually, they do:

                            The name of the malware means ‘woodcutter’ in Russian

                            1. 4

                              I asked you whether you would make the same argument if different nationalities were involved, because it seemed like you were arguing the malware wasn’t Russian because the name is actually Ukrainian. I don’t think I could know you were merely correcting something buried deep in the PDF linked in a sibling comment.

                              1. 4

                                That would be an assumption in good faith.

                                1. 1

                                  I didn’t assume anything: I asked a question to see if something could be assumed.

                                  As a sibling comment thay has appeared in the mean time shows, it was not a weird thing I only considered a possibly valid assumption.

                                  There are many Russia and China (and US) apologists, so nationalism as a motivation is always something to at least consider.

                                  That Russia is responsible for shooting down a passenger plane of ours and still attempts to blame it on Ukraine doesn’t help for this particular case.

                        1. 4

                          Gary Bernhardt is easily the most charismatic speaker in our industry; his way of presenting is very engaging. I had not seen this one, thank you for posting it.

                          Since this is 5 years old, and he doesn’t posit a “fix” for ideology- simply awareness: is there a way of asking yourself if there is an underlying unknown assumption to your belief?

                          1. 6

                            I don’t think it makes much sense to look for a “fix” for ideology, everyone has an ideology. If you think you don’t have an ideology, that just means you’re not aware of it (there is a great expansion of this idea in “The Pervert’s Guide to Ideology” mentioned above). The key is, as you say, being aware of how it affects your opinions, and that’s a matter of making a habit of questioning your assumptions, just as you would do as a part of the scientific method.

                            1. 2

                              In my experience, some useful ways to “unpack” one’s motivations include:

                              1. retrospective: Think about some particular past decisions and your thought processes behind them.

                              2. active: In the context of a decision you want to make, write down your thinking, motivations, emotions, and reasons about the decision.

                              3. proactive: As a future-oriented thought experiment, visualize a particular decision you need to make. Think about how you will respond. Write down your thinking, rationale, emotions, and so on.

                              This can apply to many kinds of decisions: personal, financial, technical, organizational, and so on.

                              All of these are just a starting point.

                            1. 1

                              Wow, I wasn’t aware of this and it seriously impacts almost every web app I’ve written. That’s rough. It does look like they’re excluding WebSQL though, probably because not many websites use it and it’s not supported across all browsers.

                              1. 5

                                WebSQL is not on the standards track AFAIK and might be removed in the future.

                                1. 1

                                  Correct, it’s been on the chopping block for a long time.

                              1. 0

                                It perhaps is easier for you as a developer but I’d argue that a lot of web applications should be local software that can be used without an internet connection and GB’s of RAM. Let’s not forget what the web is doing with regards to surveillance capitalism (ie messing up democracies because it makes some people money) and climate change.

                                1. 6

                                  Isn’t that less a problem with the web platform and more of a problem with specific web sites? Additionally, surveillance conducted on the user of a web site is much easier to see, simply by dint of the developer tools built into all browsers. The effort you need to put into seeing what a desktop application is phoning home about is much greater.

                                  1. 5

                                    In some cases, that’s completely true. But for some other use cases, i refuse to install a software when the service could be equivalently used as a website (this especially applies to mobile apps). Also, be aware that applications are as capable of surveillance (if not more) as web apps. The app that hosts all others for most people being Windows, is actually a a spying machine.

                                  1. 2

                                    What we really need are better databases.

                                    1. 3

                                      I feel that databases already stepped-up their game, but somehow people are not up to date with all the improvements. A lot of developers I meet have no clue how to optimize database and generally treat it as a black box. A lot of companies would rather hire someone with ReactJS experience, than DBA experience :)

                                      1. 2

                                        I obviously have no idea what you have in mind but I agree and am intrigued. (Even so, it’s interesting and instructive to see how the whole noSQL cycle went down.)

                                        1. 1

                                          What I mean is that I have spent some time with PostgreSQL’s views, triggers and row-level-security to glimpse a future where a lot of business logic gets encoded in a non-imperative way very close to the data. We are not there yet, though.

                                          It would be nice to be able to store the schema in a git repository and be able to statically check that all your views and procedures are compatible with each other. It would also be nice to have a tool to construct the best migration path from the current schema to the new one where you only instruct it on the missing bits (that is, how did the shape of data changed).

                                          I think that a tight type system and some good tooling might be able to combat the complexity much better than service oriented architecture that still needs a lot of attention on coordination and API stability. If a team changed their public views, they should immediately get a type error or a QuickCheck test suite should notify them that they broke something. They could share ownership and modify dependent code themselves more easily.

                                          1. 2

                                            This is indeed the technical platform I introduced at my last job and am using for my current project!

                                            It would be nice to be able to store the schema in a git repository

                                            I’m using the excellent setup pioneered (?) by the PostgREST/Subzero project:

                                            https://github.com/subzerocloud/subzero-cli

                                            It’s very simple actually: build up your schema with idempotent SQL scripts split up into a regular text files according to your taste (you can place them in a hierarchical file structure that fits your software model). Just use \ir path/to/script.sql to run all the scripts in order from a top init.sql file. For example, from init.sql, call one script to set up db users, another to create schemas and set basic permissions on them, then call one script for each schema which in turn calls sub-scripts to set up tables, views, functions… All of this happens in regular text files, under version control. Reloading the entire db strucure + seed data takes about a second, so you can iterate quickly.

                                            Now, the great thing that subzero-cli gives you is a way to turn the resulting schema into a migration (using the Sqitch stand-alone migration tool) by automatically diffing your current schema against the last checked in schema. (This involves a little dance of starting up ephemeral Docker containers and running a diffing tool, but you don’t really notice.) So you get a standard way of deploying this to your production system using simple migrations. (Sqitch is a pretty great tool in itself.)

                                            be able to statically check that all your views and procedures are compatible with each other

                                            Here you’ll have to rely on automated tests, like pgTAP or anything really that you prefer. Python is very well supported as an “in-database” language by Postgres and I’m working on writing tests using the wonderful Hypothesis library and run them directly inside Postgres to thoroughly test functions, views etc.

                                            It would also be nice to have a tool to construct the best migration path from the current schema to the new one

                                            Again, handled very well by subzero-cli, relying on apgdiff (apgdiff.com, yes it’s “old” but subzero maintain their own fork which gets small tweaks from what I’ve seen).

                                            I obviously agree with the rest of what you wrote :) If you put PostgREST, PostGraphile, or Hasura on top of your “smart” postgres system, you can give teams quite a bit of flexibility and autonomy in structuring client-server communication for their use cases, while keeping the core logic locked down in a base schema.

                                            1. 1

                                              It would be nice to be able to store the schema in a git repository and be able to statically check that all your views and procedures are compatible with each other. It would also be nice to have a tool to construct the best migration path from the current schema to the new one where you only instruct it on the missing bits (that is, how did the shape of data changed).

                                              Unless I misunderstand you, these tools already exist, at least for MySQL, PostgreSQL, and MSSQL. The compatibility checking does need to happen by deploying the schema, but the rest is there now.

                                              1. 1

                                                The compatibility checking does need to happen by deploying the schema, but the rest is there now.

                                                I am pretty sure that checking of procedure bodies only happens when you run them.

                                                Can you share links for the tools? I am not aware of them.

                                                1. 2

                                                  Yeah, stored procs are not statically analyzed in any of the tools I know. SQL Server: https://docs.microsoft.com/en-us/sql/relational-databases/data-tier-applications/deploy-a-database-by-using-a-dac?view=sql-server-ver15 MySQL: https://www.skeema.io/ For Postgres I know I’ve seen one or two tools that functioned in this way but I don’t seem to have saved the link.

                                        1. 9

                                          It’s s really wordy article for no real reason: took about 3 paragraphs to just state “developers don’t like reviewing large PRs”.

                                          1. 5

                                            Also, readers don’t like wall of texts that span 25 screens… ^^ This guy must be writing java entreprise code :p

                                            1. 0

                                              I agree it was perhaps too wordy but that TL;DR doesn’t do it justice at all.

                                            1. 3

                                              I’d be interested to see your approach to unit testing your functions.

                                              I’ve worked on large codebases in the past where the majority of business logic was contained in stored procedures (SQL Server) and it was really a nightmare to work with - difficult to test, difficult to understand, and not DRY at all.

                                              I’m sure there are better approaches to the philosophy of business logic in the database than that codebase, though.

                                              1. 6

                                                Thanks. My unit tests are in the Github:

                                                Tests of the hidden functions and triggers:

                                                https://github.com/sivers/store/blob/master/store/test-db.rb?ts=2

                                                Tests for the API:

                                                https://github.com/sivers/store/blob/master/store/test-api.rb?ts=2

                                                I just have a “fixtures.sql” file with some sample data I use for testing, a mini version of real-world data, and before each unit test it dumps and reloads the fixtures file. I’ve been doing it this way for years, and it works great.

                                                To me, it’s all quite DRY. Any outside code becomes more like the “Controller” in MVC, just calling these stored procedures, since all the data model and logic is in the PostgreSQL functions. I love it.

                                                1. 1

                                                  I too have worked where stored procedures proliferated. Unfortunately tests were really ad-hoc. Because risk of breakage was great, any changes were generally implemented as a clone & modify, leading to a profusion of almost identical stored procedures. It was not pretty.

                                                  I don’t know if it was the culture or the tools that made tests so rare at that place. You could test stored procedures with a test suite written in any language that lets you call stored procedures conveniently, and has decent test infrastructure.

                                                  1. 1

                                                    I worked at a place that had 20 million lines of Oracle PL/SQL doing all the business logic. Breakage and debugging was really stressful for the DBAs.

                                                  1. 21

                                                    I worked with a smart data contractor who loved and was good at SQL and who quit over trying to work with a large aggregation query in SQLAlchemy.

                                                    Maybe don’t be dogmatic and try not to fight with your tools. SQLAlchemy can execute literal SQL just fine if that’s what you feel you need to get the job done.

                                                    testing on sqlite when you target postgres is risky

                                                    This is convenient but not really the point of being vendor-agnostic. I’ve used SQLAlchemy a few times to swap vendors, not from testing to production, but from one production instance to another. No problem.

                                                    It makes any type-checking or linting impossible.

                                                    Are you telling me you have type-checking and linting for SQL as literal strings?

                                                    Migration is a hard problem.

                                                    Migrations with Alembic are pleasantly easy.

                                                    1. 8

                                                      Are you telling me you have type-checking and linting for SQL as literal strings?

                                                      If you look at the author’s tool (linked in the article) it seems like he’s going the direction of using the SQL as the source of truth, but not just as strings – the tooling is deriving migrations and object models from that, rather than the other way around. It seems like it could be a viable way of doing things.

                                                      1. 4

                                                        I looked at it and it may be viable at some point in the future, but it’s woefully underpowered in its current state. Checking off all the items on the list of what doesn’t work still won’t reach parity with the current function of Alembic.

                                                        Edit: tying migrations to git history seems fraught, given “anything that messes with the git history (like a rebase) is deeply confusing to this tool and will result in bad migrations.”

                                                        1. 3

                                                          Agreed, it’s far from complete or ready for real use. But the fundamental model seems to hold some promise…particularly for shops where there is already a lot of SQL expertise and the whole “hide the database bits from the programmer” approach of many more standard ORMs doesn’t make as much sense.

                                                          Edit: 100% agreed on the git part…it felt kind of icky even before that comment, and doesn’t sound like it’s going to handle the reality of how repos evolve.

                                                          1. 3

                                                            It would be better to keep the schema history in the source code. I’m not a database expert by any means, but I have the feeling this must have been done already…

                                                            1. 3

                                                              Alembic does this. Revisions to the data object model are in git history, but the migrations are represented as a DAG of scripts all available at the HEAD of the repo.

                                                            2. 2

                                                              SQL Server has support for automatic schema migration using DACPACs, which generate a migration script based on definitions from source code (e.g. a git repo) and comparing that to a given database. There’s a tool called skeema that does a similar thing for MySQL, although it is not nearly as full-featured.

                                                              Since schema drift is a thing, I think this approach makes more sense than generating migrations from git history alone.

                                                              1. 2

                                                                Only problem now is that you’re stuck using SQL Server =^.^=

                                                            3. 2

                                                              It says it’s a beta. If your repo is dedicated to SQL and you don’t allow for history changes then it seems okay? Not very practical once a security key ends up in that repo and needs wiped, but here we are :D

                                                              I am curious how they intend to support reverse migrations, though.

                                                          2. 2

                                                            I’ve used SQLAlchemy a few times to swap vendors, not from testing to production, but from one production instance to another. No problem. [emphasis mine]

                                                            Wow. That surprises me. We sometimes can’t even change versions across the same vendor without usually reading the sqlalchemy docs closely and thinking ahead a little bit.

                                                            Migrations with Alembic are pleasantly easy

                                                            Unless your app uses multiple databases… Maybe it is better now, but man was it it unpleasant last time I tried to do it – sqlalchemy itself wasn’t /too/ bad, but the intersection of it and alembic and a declarative (not reflected) schema was painful.

                                                          1. 2

                                                            surface mounting mechanical components is just asking for trouble. Through-hole FTW.

                                                            1. 1

                                                              But that costs more money, no?

                                                              1. 1

                                                                Yeah, not as easy to machine-assemble. If the rest of the board is surface mount you’d be adding a whole new manufacturing step, though I imagine attaching displays and such isn’t part of the surface mount step either.

                                                                1. 1

                                                                  Perhaps. I’m just a software person that works closely with hardware. I don’t know a lot of the details of hardware manufacturing.

                                                              1. 3

                                                                Aren’t a lot of the suggestions made implemented in Yarn?

                                                                1. 4

                                                                  The last suggestion, no fuzzy dependency versions, is how Yarn’s lockfile (which predates npm’s lockfile implementation) works. As I understand it, the other suggestions would have to be implemented by the registry itself, rather than a client like Yarn.

                                                                  1. 1

                                                                    Ah, that makes sense. For some reason it didn’t click with me that yarn didn’t have its own registry parallel to the npm registry.

                                                                  2. 2

                                                                    The important ones can’t be implemented without getting away from the npm infrastructure and (global) namespace.

                                                                  1. 1
                                                                    The object may not have neither of the two fields.
                                                                    The object may have only field1, and not field2.
                                                                    Only if the object has field1, then it can have field2.
                                                                    

                                                                    It seems to me that the way these are worded is rather confusing. I can figure it out eventually going through each line, and the #2 creates a contradiction with #3. It seems to me that if #3 is true, then #2 cannot be true.

                                                                    1. 1

                                                                      It definitely could be more clearly worded. I think it’s saying:

                                                                      • Both fields are optional
                                                                      • field2 can only be supplied if field1 is supplied as well
                                                                      1. 1

                                                                        Thanks zzing and chenghiz for the kind suggestion! I’ll tune down the poetical approach to writing.

                                                                    1. 3

                                                                      Thanks for making the effort to document this!

                                                                      btw, you’d probably avoid weird breaking changes on minor versions like this by using LTS (even-numbered major) versions of Node.

                                                                      1. 1

                                                                        Sweet, TIL Node has LTS versions.

                                                                      1. 2

                                                                        The white paper linked from this article is worth a read as well. For some reason it never occurred to me that it’s possible to read the contents of memory with an electron microscope.