There’s a lot of browser-specific setup and a whole Node.js server included in this. I wouldn’t qualify this as “pure” CSS keylogging. Neat stuff, though!
The actual keylogging is done in CSS; you’d still need something to exfiltrate the data with.
[Comment removed by author]
It works if you can get style injection on a site already using JS to update the .value, though. (e.g. React)
[Comment removed by author]
All databases suck in their own ways, but historically I’ve found the hype around postgres to be pretty unreasonable. They’ve closed many of the severe operational gaps in the last few years, but you still see a lot of pain when pushed hard. Everyone eventually gets burned by the architecture’s performance limitations, multitenancy deficiencies, vacuum issues, etc. It’s sort of like the redis of the relational world: lots of features, great for developing features against, but gives SREs headaches when pushed into production at orgs with diverse tenant needs / high connection churn / high throughput / high concurrency / etc. Over time it will overcome these, but I don’t trust it with demanding usage for the time being.
Could you expand more, or maybe share a link, about the Redis critique? I’m curious to learn more about that.
I’m a noob when it comes to databases. Could you tell me what alternatives to Postgres there are? I understand that the answer is mainly “it depends”, but maybe you could write a couple of “if this, then that” alternatives?
I imagine that icefall was comparing it to MySQL. The best thing MySQL has going for it is that many operations teams know how to run it at scale and it scales well. Developers don’t need to spend a ton of time to make their SQL performant, they can just USE INDEX. Postgres is more featureful and “correct”, but it isn’t easy to run at the very high end of performance.
That being said, not many use cases require that high end performance and can get along quite well with Postgres.
AFAICT, most software does not run an especially demanding workload.
When the value your software generates is high compared to the (computer-and-therefore-operational) load it generates, correctness and features are in higher demand.
Does anyone using Go in production have any numbers on binary sizes for larger, team-sized projects? My personal projects have never been very large in code, even if the binaries are large by compiled standards, and the idea of having 500MB in dependencies seems like you’d have to be doing an awful lot, or have a crazy set of dependencies, especially since my own use of Go usually produces binaries around 10-20MB.
I have access to one of the oldest continuously developed go code bases in existence outside of google. The optimization server (from https://techcrunch.com/2013/07/24/walmart-labs-scoops-up-website-optimization-startup-torbit-to-help-it-keep-pace-with-amazon/) is a single go binary clocking in at 17MB.
This app handles traffic for the 37th most popular site in the US.
Most of our other “microservices”-style binaries weigh in around 10MB. They generally do something like read from a queue, do some work, and put a result in object storage or another queue.
My code is at 14-15MB as well, pretty consistent across the binaries of two projects entirely created by me.
Looking over my bin folder again, I think that the dividing line is “does the program use net/http? if so, it’s >10MB, otherwise it’s <10MB.”
Some of his comparisons are really apples-to-oranges: obviously downloading a binary build of the JVM – or any runtime! – will be smaller and faster to get moving with than totalling up the source code, compiler, build dependencies, and compile times of another.
All of his comparisons dance around the real issue. Why do people complain that the JVM (particularly wrt clojure) feels heavy? To crib a line from James Carville: it’s the interactive performance, stupid.
irb starts imperceptibly fast. A bundle exec rails c with a spring preloader takes ~5 seconds on a moderately large Rails project (with several large engine dependencies). A simple lein repl outside a project directory, so it’s only loading the bare minimum, takes a slow count of 10 seconds.
The Clojure project has been really open about why bringing up a simple repl is as slow as it is (it boils down to “loading all of the separate .class files needed is slow”), but the impression it leaves people with is, unsurprisingly, that these tools are very heavy and sluggish.
That’s all true. But what’s a bit silly to me is that the JVM doesn’t have to be that way. I have a very small script I use, jn, that is basically nothing but a thin shim to java -Xms32m -Xmx32m -client -noverify that I use for little utility programs.
How well does it work?
benjamin@Reliant ~/s/jcli> time python -c 'print "Hello, world!"'
Hello, world!
0.02 real 0.01 user 0.00 sys
benjamin@Reliant ~/s/jcli> time jn -cp build/classes/main Main
Hello, world!
0.08 real 0.05 user 0.02 sys
benjamin@Reliant ~/s/jcli>
So, slower than Python, but plenty fast enough for me to view it as “instant.” I know that Clojure’s a lot more complicated for tons of reasons, but there’s no reason you can’t have irb-esque startup times on the JVM.
I totally get it, but at the same time this is the big problem I have with Java. It’s crazy to me that you have to do this at all. And every time I deploy a Java package, I know that some time in the near future I’ll have to dig in and learn precisely the correct GC / memory settings to even get the damn thing to run without OOMing every so often. Why is it like this, and why does it seem so accepted as okay?
Depends on your viewpoint, I guess. In Python, Go, and many other languages, you can’t tweak any of these things, so you risk having an OS-level OOM and/or fragmented memory if you have a leak. (Ruby actually now appears to have similar GC/VM flags.) Java instead defaults to crashing if you have a leak, but at least lets you choose not to with the Xmx and Xms flags. And the defaults are such that, while I understand you may’ve had a different experience, most normal Java programs can go with the default. (Which is a gigabyte, incidentally. If your program needs more than that, I think it’s reasonable to learn a bit about the GC. In C, I’d at least consider a custom malloc in a similar situation.)
In other words, for normal operation, the defaults are sane. For extreme situations, the knobs at least exist.
In this particular case: I am turning off bytecode verification (my code, my rules); turning off the JIT (it won’t run long enough to be worth it); and setting both the max and min heap to 32M, effectively avoiding any GC passes. Do you need to do that? No; on a recent JVM, my tweaks now appear to shave off a whopping .02 seconds. But at least those knobs are there if I want them.
setting Xmx == Xms means avoiding any GC passes
My gut reaction was to say “Be careful with that” - but you already are, aren’t you?
This is for command-line programs, not a server program. You don’t have heavy garbage cycles or weakref caches. Minecraft uses caches for loading data from disk, and setting Xmx == Xms is a recipe for disaster - as soon as you get near the max memory, the GC starts to thrash and it appears like the server is hanging (until all the clients time out, at which point it goes back to normal - because it doesn’t need as much memory anymore!).
So, yeah - that’s why they’re tuning flags.
Only allow reset while core is running
I think I’ve been affected by that bug, actually! Nice writing on that section ;)
Interestingly, C and C++ have a mechanism that can often solve this problem: const. You can define arguments as being const, i.e. unchangeable.
In Java, I would spell that as wordcount(ImmutableList.copyOf(words)), and then get a runtime error when the function tries to modify the list.
Not ideal; merely workable.
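The “runtime error instead of compile-time error” trade-off above can be sketched in Python, with a tuple playing the role of ImmutableList (the wordcount function here is a made-up example, not from either thread):

```python
def wordcount(words):
    # A buggy implementation that mutates its argument as a side effect.
    words.append("")          # fine on a list, fails on a tuple
    return len(words)

frozen = ("the", "quick", "fox")   # runtime-immutable, like ImmutableList

try:
    wordcount(frozen)
except AttributeError:
    print("mutation rejected at runtime")  # tuples have no .append
```

As with the Java version, nothing stops you from calling the function with a mutable list; the immutability is enforced only when the mutation is attempted.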
Might be ignorance on my part but I’m not aware of anything that would be thought of as infrastructure code written in rust or go yet - is this changing? Either way, seems like would be a pretty big endorsement for either language.
Go has a pretty strong contender for “infrastructure” with dl.google.com running in Go. This site serves ChromeOS, Android, Chromium & .deb updates. Additionally, I believe Cloudflare has parts of their infrastructure in Go, and they are in front of a lot of web pages.
I’m aware of a fair few things in Go - a few examples:
… and the Hashicorp tools too (pretty much everything other than Vagrant).
AFAIK Digital Ocean uses Go for a large portion of its backend code (this was mentioned on the Gotime podcast, I think in episode 17).
One thing that I do wonder about NTPSec - was it really easier to start by stripping down classic NTP instead of adding the few missing features to the excellent (and secure) OpenNTPD?
I’m pretty sure Dropbox has some amount of their infrastructure written in Rust, but most is still in Go. I’m not sure they’ve written about it in a blog, but there is at least an HN comment.
This suggests it may have been written in Go before. It’s hard to tell though.
Let me see if I understand correctly: SQL is insecure because mainstream programming languages don’t have good interfaces to SQL databases?
Bad programmers will write bad code using any tools, but that’s really beside the point. OP argues that SQL is insecure because:
Raw SQL strings, prepared statements and ORMs are all interfaces between databases and programming languages. Unfortunately, none of them is perfect:
The real problem is types. Most programming languages don’t have sophisticated enough type systems to model the operations of relational algebra. (But some do!) In particular, nominal types don’t help. For example, if you have two classes Customer and Order, there is no type-level operation that can produce a third class corresponding to select * from Customer C join Order O on C.CustomerID = O.CustomerID. This is a very sorry state of affairs, and it absolutely isn’t SQL’s fault.
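A minimal Python sketch of that point (the Customer/Order names come from the comment; the join helper is invented for illustration): the result of a join has the columns of both classes, but there is no type-level operation that derives that combined shape, so we fall back to bare pairs or a hand-maintained third class:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int
    total: float

def join(customers, orders):
    # The result's shape is "Customer's columns + Order's columns", but no
    # type-level operation produces that class for us, so we return bare
    # pairs (or write a CustomerOrder class by hand and keep it in sync).
    return [(c, o)
            for c in customers
            for o in orders
            if c.customer_id == o.customer_id]

rows = join([Customer(1, "alice")], [Order(10, 1, 9.99)])
print(rows[0][1].total)  # 9.99
```

A type checker sees `rows` as a list of (Customer, Order) pairs at best; nothing guarantees the pairing matches what `select * from Customer join Order` would actually return.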
Prepared statements are a chore to use and they don’t buy you that much security, because you’re still manually supplying strings.
How so? The strings you do supply to prepared statements are incapable of changing the pre-prepared parse tree, which is the big insecurity of smashing random strings together.
I agree I’d rather use an ORM with correct-by-design types, though.
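The parse-tree point can be demonstrated with Python’s built-in sqlite3 (the table and values here are made up for illustration): the same malicious string that rewrites a concatenated query is inert when bound as a parameter:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")

evil = "' OR '1'='1"

# String concatenation: the input becomes part of the parse tree.
leaked = conn.execute(
    "SELECT secret FROM users WHERE name = '" + evil + "'"
).fetchall()

# Parameterized: the tree is fixed first; the input is just a value.
safe = conn.execute(
    "SELECT secret FROM users WHERE name = ?", (evil,)
).fetchall()

print(leaked)  # [('s3cret',)] -- injection succeeded
print(safe)    # [] -- no user is literally named "' OR '1'='1"
```

The concatenated version parses as `WHERE name = '' OR '1'='1'`, which matches every row; the bound version looks for a user whose name is the attack string itself.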
If a new maintainer of your software needs to add a field of a weird type to the query, will they learn how to add that flavor of field to the prepared statement, or will they interpolate a string in just this once?
I know what I would do with Go+Postgres: add a cast from string in the SQL ($1::json, etc.), then marshal the data to a string right before the query and give the driver a string in Exec() or Query().
I’m actually not sure what the “correct” way to do that would be. One that popped out from the documentation is to implement sql/driver.Valuer on a local typedef or something like that. But that’s a massive pain in the behind and also depends on driver internals.
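That marshal-then-bind pattern translates to other stacks too; here is a sketch using Python’s standard json and sqlite3 modules (sqlite has no $1::json cast, so the column is plain TEXT here):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (payload TEXT)")

doc = {"user": "alice", "clicks": 3}   # the "weird type" to store

# Marshal right before the query, then hand the driver a plain string
# through the usual placeholder mechanism.
conn.execute("INSERT INTO events VALUES (?)", (json.dumps(doc),))

row = conn.execute("SELECT payload FROM events").fetchone()
print(json.loads(row[0]) == doc)  # True: the value round-trips
```

The point is that the driver only ever sees a string parameter, so no driver-internal type machinery is involved.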
That isn’t really true. Ur/Web rules out invalid queries at compile time. But this requires two things:
Oh you’re talking about the strings submitted for the prepared statements, not the user input filling in the ?s. I misunderstood and was talking about runtime input.
Yes, I was primarily talking about the strings submitted for the prepared statements. However, even the user input filling in the ?s is often less statically checked than it could be. Will Java’s type checker complain if you attempt to supply an int where the database would expect a varchar? Ur/Web’s will.
How exactly can you argue in quantitative terms the difficulty of using prepared statements?
Isn’t the difficulty of a thing somewhat subjective?
This whole post seems like… satire
Nobody guarantees that the result of preparing a statement will be meaningful according to the database schema. That’s the difficulty.
Contrast with Ur/Web, where the type-checker makes sure that your SQL statements make sense.
You beat me to it. I was going to add Opa language, too, as it raises the bar vs common options. One could throw in memory-safe languages like Component Pascal or concurrency-safe languages like Eiffel or Rust. Like the web languages, these simply don’t allow specific classes of problems to occur unless the developer goes out of their way to make it happen. Always good to design languages to knock out entire classes of common problems without negative impact on usability if possible.
The real problem is types. Most programming languages don’t have sophisticated enough type systems to model the operations of relational algebra.
Types, yes. Type systems, no.
Q has tables, and operations that work on tables. There’s no reason a lesser language like PHP couldn’t do this, it’s just that PHP programmers don’t do this.
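What “types, not type systems” might look like in a dynamic language — a sketch (the table shape and helper names are invented here) of tables as plain first-class values with ordinary functions standing in for relational operations:

```python
# Tables as plain values: lists of dicts, no static types required.
customers = [{"id": 1, "name": "alice"},
             {"id": 2, "name": "bob"}]

def select(table, pred):
    """Restrict: keep the rows matching a predicate."""
    return [row for row in table if pred(row)]

def project(table, cols):
    """Project: keep only the named columns."""
    return [{c: row[c] for c in cols} for row in table]

result = project(select(customers, lambda r: r["id"] == 1), ["name"])
print(result)  # [{'name': 'alice'}]
```

Nothing here is checked before runtime, but the table values and the operations over them exist as first-class things in the language, which is the Q-style point being made.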
Prepared statements are a chore to use and they don’t buy you that much security, because you’re still manually supplying strings.
If you move your authentication into the database (like with row-level security) then your web-layer can simply authenticate against the database and run the prepared queries like an RPC. The biggest problem I see people have with prepared statements is if they have inadequate tooling and don’t invest in it. (Migrations are a dumb and painful way to program, and while commercial offerings are much better, open source is very popular)
My day job is to maintain a rather large ERP system. You know, the kind where the typical table has 40-50 fields and the typical primary key has 5 fields. The kind where people are afraid of altering existing tables, because who knows what queries might be affected, so they create another table with the same exact primary key, whose rows are intended to be in 1-to-1 correspondence with the original table, even though that will only make things harder in the long run and we know it.
This tremendous amount of pain is the price of the lack of coordination between language and database. If there were an automatic, convenient way to determine what parts of our application have to be changed in response to a given change in the database, I estimate that we could be twice as productive, while at the same time creating less technical debt. This is precisely the problem type systems solve.
This tremendous amount of pain is the price of the lack of coordination between language and database.
I’m not disagreeing with that: Having the business logic in the same language as the database is another way to obtain that coordination, and it offers far more benefits:
A large amount of pain is had in synchronising the continuous single history of “the business database” with the many-branches of modern software development. Building directly on top of the database, and solving the problems that you need in order to do that eliminates pain that you never thought possible, like writing migrations or having to maintain test databases. A type system doesn’t help me get there.
I estimate that we could be twice as productive
Using the same language for your database and your application wins much more than 2x. I would say it wins 10x or even 100x.
The kind where people are afraid of altering existing tables, because who knows what queries might be affected, so they create another table with the same exact primary key
Really the goal should be to have the data in the correct shape. KDB is column-based, and column-based data stores are useful here because you don’t usually want to alter the existing table. You want to hang another column on there, or you want another rollup/index somewhere. That’s cheap (microseconds) in KDB.
Having the database contain your program also means you can easily do analytics on which queries touch which columns, which increases bravery significantly (and safely!).
My day job is to maintain a rather large ERP system.
I have a similar database, although in addition to those fat business data tables that is ingested from a bunch of Oracle/Siebel databases, it also contains very tall analytics data growing at a rate of around 300m web events per day and around 60k call records per day.
KDB also has the advantage of being quite a bit faster than other database engines, so it wouldn’t surprise me if I’m dealing with more data than you.
If you don’t know KDB/Q, you should look into it. Ur/Web+PostgreSQL is great, but it has nothing on commercial offerings.
A large amount of pain is had in synchronising the continuous single history of “the business database” with the many-branches of modern software development.
Right. We need a notion of “time-evolving schema”, allowing new data to have a different structure from old data, while at the same time allowing queries to be meaningful across schema versions. As far as I know, that problem hasn’t been satisfactorily solved yet.
Building directly on top of the database, and solving the problems that you need in order to do that eliminates pain that you never thought possible, like writing migrations or having to maintain test databases. A type system doesn’t help me get there.
You piqued my curiosity. Let’s say you have a language where tables are first-class values. Altering the structure of a table amounts to changing its type. (As opposed to inserting, updating or deleting rows, which amounts to constructing a different value of the same type.) How do you validate that every part of your application that depends on this table is compatible with the new version, without type checking?
KDB is column-based, and column-based data stores are useful here because you don’t usually want to alter the existing table.
This is a physical implementation detail. I don’t want to worry about that.
KDB also has the advantage of being quite a bit faster than other database engines, so it wouldn’t surprise me if I’m dealing with more data than you.
I’m not too worried about the amount of data I need to process. I’m worried about the complexity of the logical constraints the data must satisfy in order to make sense. Logical errors can manifest themselves even with modest amounts of data.
If you don’t know KDB/Q, you should look into it.
I will.
Right. We need a notion of “time-evolving schema”, allowing new data to have a different structure from old data, while at the same time allowing queries to be meaningful across schema versions. As far as I know, that problem hasn’t been satisfactorily solved yet.
Tooling can help a lot, though, and may be good enough. There is commercial tooling (like Control for Kx) which is basically an IDE for your database, complete with multi-user version control. It has the disadvantage of being an online tool, but it provides hints of what the correct solution might look like to me.
This is something I’ve been thinking about for a while.
Let’s say you have a language where tables are first-class values. Altering the structure of a table amounts to changing its type.
However, adding a column doesn’t affect code that doesn’t use the column.
How do you validate that every part of your application that depends on this table is compatible with the new version, without type checking?
Static analysis remains possible without type systems provided you don’t learn column names from the network (and if you do, your type system would be incomplete anyway).
This is a physical implementation detail. I don’t want to worry about that.
I know you don’t, but removing abstraction reduces program size (and therefore bugs), and increases program speed so much that I think it’s often worth thinking about the fact that we are meat programming metal. Bugs mean fixes, which is programming we didn’t plan for, and slowness generates heat that harms the environment. And so on.
If you want to change the type of a column from a 64-bit unix-seconds value to a 32-bit time and a 32-bit date (KDB has native date types, btw), you have to decide:
And so on. These are real considerations that affect a real system. If we could only sit in our purely-software universe and have enough abstraction, we could make our decisions on what makes better software (asking for a date and getting a date is probably better than doing arithmetic on seconds – and what happens when the calendar changes, anyway) but someone has to solve them, and unfortunately a type system doesn’t actually solve these problems.
A type system only helps with the same part of the problem that tooling solves: static analysis can find the code, and having a real table “type” means you just use a couple of in-memory copies of some of the rows you the programmer believe are representative, which then form your tests for regression tracking.
However having views and a real table type (i.e. doing the database in your programming language) means (performance) testing is easier, there’s a migration path for the data, and you’ll have a good handle on what the real user-impact is.
I will.
Awesome. It is not easy to get into without a commercial need, but the #kq channel on freenode contains people willing to help answer questions. It’s not as high-volume as #ocaml so you might have to wait for the earth to turn and someone in the right timezone to wake up :)
That’s a good statement of an important point. If the simplest, most obvious way to use a tool isn’t secure, we must consider the system fundamentally insecure, because that’s what will happen in practice. The programmer’s UX of security concerns is vital.
That seems mostly reasonable to me. Using SQL in PLs where the default way to use it is by passing in ordinary strings that contain code is indeed insecure. Imagine if mainstream PLs had us defining and calling functions by calling eval() on strings all over the place: I would expect that to lead to terrific quantities of horrid security problems too. I’d argue that passing a string to sqlite3_exec(), mysql_query() or PQexec() is just as scary as passing a string to eval(), because RDBMS query languages are either powerful enough to execute arbitrary code or complicated enough to inevitably have bugs that can be leveraged into arbitrary execution.
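The eval() analogy can be made concrete in a few lines of Python (the template and input here are invented): splicing user input into an expression lets the input rewrite the expression’s structure, exactly like ' OR '1'='1' does in SQL:

```python
template = "answer == {}"     # intends to compare against a number
user_input = "0 or True"      # attacker sends an expression instead
answer = 42

# The spliced-together string parses as (answer == 0) or True.
result = eval(template.format(user_input))
print(result)  # True, regardless of the value of answer
```

The fix is the same in both worlds: keep the code fixed and pass the data through a separate channel, rather than assembling code from strings.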
I’ve seen an interesting alternative in one of C J Date’s older books, “An Introduction to Database Systems”. He has examples of relational queries embedded directly into a language that looks like PL/1, where the queries are actually fully parsed at compile time. I think they had all the niceties, like references to ordinary lexical scoped variables in the queries turning into code that does all the correct binding at runtime and everything.
I’m thinking that one could make a much safer language be just as convenient as doing broken string concatenation is in current PHP, by using quasiquoting, reader macros or just straight up embedding SQL’s entire grammar into the PL’s own grammar in an expression context. I’d identify “PHP with mysql_query() replaced by quasiquoting” as a safer PL than “current PHP”.
Another strategy for making SQL injection harder to write by accident that I’ve seen is in the postgresql-simple library for Haskell. The query execution functions accept a string-like type called Query for which there is an IsString instance, so you can switch on the OverloadedStrings pragma and write code like execute connection "INSERT INTO dogs VALUES (?, ?);" (name, cuteness) — so the correct, parameterised-query pattern is easy and convenient to write. At the same time, the incorrect string-concatenation code is still possible but much less convenient, so you’re much less likely to write it. While you can build Query objects from strings, the syntax to actually do that is longer and involves looking up more stuff than the syntax for putting parameters in your queries.
IIRC there are also quasiquoters that let you write that as something looking like [sql|INSERT INTO foo VALUES (${name}, ${cuteness});|] as an expression and automatically turn that into the above parameterised query.
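The design idea — make the safe path convenient and the unsafe path awkward — can be sketched in any language; here is a hypothetical Python version loosely mirroring postgresql-simple’s Query type (none of these names come from a real library):

```python
class Query:
    """A query you can construct from a literal but not concatenate."""
    def __init__(self, text):
        self.text = text
    def __add__(self, other):
        raise TypeError("bind parameters instead of concatenating strings")

def execute(query, params):
    # Stand-in for the real driver call: only Query + bound params accepted.
    assert isinstance(query, Query)
    return (query.text, params)

# The convenient, safe path:
execute(Query("INSERT INTO dogs VALUES (?, ?)"), ("rex", 11))

# The tempting-but-wrong path fails loudly:
try:
    Query("SELECT * FROM dogs WHERE name = '") + "rex'"
except TypeError:
    print("concatenation rejected")
```

The safe pattern costs one wrapper; the unsafe one requires deliberately working around the type, which inverts the usual incentive.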
In all of the above, anywhere I refer to “PHP” you may instead read “any PL in which you use SQL by passing an ordinary string to a function or method with query or execute in the name”, i.e. very nearly all of them. PHP only does slightly worse than average here because mysql_query() comes bundled with the runtime but you have to install an ORM on purpose, whereas plenty of other PLs come with neither SQL bindings nor an ORM so it’s almost equally difficult to install the ORM or the raw SQL binding.
Imagine if mainstream PLs had us defining and calling functions by calling eval() on strings all over the place: I would expect that to lead to terrific quantities of horrid security problems too.
This gets to the heart of my position. Very well said.
Honest question, if not Stack Overflow, where to get help from? Sometimes I don’t have to post anything, the existing questions already solve my problem. I can’t think of any other community where I can get help from. Reddit works sometimes, but not always. Related IRCs work, but get lost in other noise. So, where?
That email is from the openbsd-misc mailing list. Lots of open source projects have their own lists where you can get help. C++ people still use usenet (comp.lang.c++).
From my experience (not talking about OpenBSD), a lot of those mailing lists don’t provide a user/developer support role, and are often far more toxic than StackOverflow.
and in a lot of cases, mails or posts simply go unanswered in dedicated project support channels.
You can say what you want about StackOverflow, and a lot of the problems mentioned here and in other discussions are real and serious problems, but they still have a huge body of useful information for a lot of problems people encounter.
Usually a busy-ish open source project will have several mailing list channels. One is typically dev’s only chatting about patches and the like, one is for announcements only and one is for users to chit chat.
Make sure you choose the right one. If it’s “How do you do this?” type questions always go to the user one.
If it’s a “I think there is a bug in…” make sure you have a good repeatable shortest possible test case in hand and then try the devs list.
Even better than a nice neat repeatable test case, is a nice neat repeatable test case and a (small) patch off the mainline that fixes it.
If you say something like, “Your code is crap. It doesn’t work in my company’s million lines of proprietary spaghetti which you can’t look at”… Yup. Count yourself lucky if your question goes unanswered. Sometimes the toxins are there to kill stupid.
Always show some signs that you have, indeed, Read The Fine Manual, such as it exists, and maybe the unit tests for the functionality you’re using.
I have a success rate of pretty much 99% in getting excellent answers from every open source mailing list I have interacted with.
Be prepared to read code, some of the best answers come in the form, “Ah, I think that’s handled somewhere such and such a file… Have a look at the comments and the test cases for function …”
Be prepared for the answer to be, yup, it’s fixed in version x.y
That all sounds like a lot of mental load to get a quick answer that’s blocking my work.
He who asks low (or no) effort questions should expect low (or no) effort answers.
However, friction and entropy exist in everything so that should be…
He who asks low (or no) effort questions should expect very low effort answers if they’re lucky, snarks if they aren’t.
Stack Overflow is often referenced more than official documentation, and it’s way, way better than what we previously had: Experts Exchange (which had the answers at the bottom, but was set up so it looked like you had to pay to see them).
They might have their issues, but I still have found the Stack Exchange sites really useful. Until I read this post, I wasn’t even aware of the massive deletion problem. I don’t think any of my posts have been deleted, but there’s no way to know for sure.