I’m prepping for a work meeting in an hour I can’t miss, but I’ve taken the afternoon off to rush finishing the fix for this. @355e3b has free time before then and has jumped in to help. Sorry to be vague about the in-progress security issue, but I trust y’all understand how that goes. I’ll leave a response to this comment in a few hours with full info.
In the meantime I’ve made the rate limiting much stricter. I’m sorry for the inconvenience and will revert as part of the fix.
So, the thing I was talking around is that this potential vulnerability might have been much worse than reported, and we were trying to be thorough about it.
Hunter (@355e3b) and I dropped as much as we could to finish that work today.
To start, there was some kibitzing in chat, so to be explicit: we do think this was potentially a viable attack.
An attacker could register a VPS at Digital Ocean to minimize network jitter, the dominant factor in this attack.
(Other things that might introduce jitter can multiply the cost of the attack but don’t fix it.)
We already see regular probing from other DO VPSs.
A few weeks before soatok’s report, some of that probing prompted me to implement rate limiting to deal with unrelated abuse; as a side effect, that rate limiting also prevented effective exploitation of this potential vulnerability.
This was easy to extend to password resets.
An attacker who can only test four tokens per minute isn’t going to get very far.
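For illustration, a minimal sketch of this kind of per-IP throttle using the Rack::Attack gem; the gem choice, route, and numbers are assumptions for the example, not necessarily what the site actually runs:

# Hypothetical sketch, not Lobsters' actual config.
require "rack/attack"

# Allow at most 4 password-reset attempts per source IP per minute.
Rack::Attack.throttle("password_resets/ip", limit: 4, period: 60) do |req|
  # Returning a non-nil value counts this request against the limit.
  req.ip if req.post? && req.path == "/login/reset_password"
end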
Since this was reported, I was concerned that this kind of network timing attack could be used against other tokens.
The most important of these is called session_token and is used to authenticate logged-in users on every request.
Because this token is sent to every possible endpoint and not just the password reset flow, it would not be subject to the rate limit of 4 req/min, and a successful attack could lead to account takeover.
We assumed this was a viable attack and Hunter wrote an in-depth mitigation for it.
This afternoon I worked on finishing the mitigation and testing the attack, which required a bigger block of time than I’d previously been able to devote to it.
I was unable to get the timing attack to work against session_token (usually pretty straightforward on localhost!) and pulled my hair out for a bit.
Eventually I recognized that the attacker can’t execute the attack against session_token because Rails stores the session as an encrypted, serialized value in the cookie.
The session cookie isn’t the value of the session_token, it’s an encrypted blob that can’t be edited by an attacker without being invalidated.
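A small sketch of why tampering fails, using ActiveSupport::MessageEncryptor (the primitive underneath Rails’ encrypted cookies); the key handling here is simplified for illustration:

require "active_support"
require "active_support/message_encryptor"
require "securerandom"

key   = SecureRandom.random_bytes(ActiveSupport::MessageEncryptor.key_len)
crypt = ActiveSupport::MessageEncryptor.new(key)

blob = crypt.encrypt_and_sign({ session_token: "abc123" })
crypt.decrypt_and_verify(blob)  # => {:session_token=>"abc123"}

# Changing even one character invalidates the whole blob instead of
# leaking anything byte-by-byte.
tampered = blob.dup
tampered[0] = tampered[0] == "x" ? "y" : "x"
begin
  crypt.decrypt_and_verify(tampered)
rescue ActiveSupport::MessageEncryptor::InvalidMessage, ActiveSupport::MessageVerifier::InvalidSignature
  puts "rejected"
end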
It’s a little embarrassing that I conflated the two, but I don’t much mind making an error in the direction of being over-cautious.
In short, I was uncommunicative because I’d been proceeding the last couple weeks on a much more paranoid footing than was needed.
My thanks and apologies to Hunter for his wasted effort; I’m sorry I didn’t have the time to write a proof-of-concept to test this first.
The other thing I’ve been doing the last few weeks has to do with the “sister sites” that have adopted the Lobsters codebase.
Lobsters is a site that’s open source for transparency, not to help start and run other sites.
It’s really rewarding to see the code used this way, but nobody has the spare attention to run that potential project.
Mostly this means we don’t take feature requests, but in this case it also means we don’t have a way to contact the site admins.
There is no mailing list or forum, even for security issues.
I’ve been in the process of finding admin emails and collecting translators (many sister sites aren’t in English) so we could notify them before the vulnerability was published.
If you recently got a DM from me asking you to play translator, you can ignore it; I’m proceeding from here in English for urgency’s sake.
Now that I’m confident there isn’t a worse vulnerability, I’ve reverted the temporary stricter rate limiting. I’m sorry to anyone I inconvenienced; I know if you read the site by opening several stories as tabs you probably hit this.
I think all the exciting bits are over, but am happy to be corrected if any volunteer spots something I missed, or any other issue.
My email is on my profile.
My thanks to soatok for alerting us to the potential vulnerability.
Thanks as always for the transparency and also the significant effort you continue to put into the site.
To add to this, the encryption is AES-256-GCM and the underlying implementation is OpenSSL, so there’s no timing attack on the encryption either.
As someone who runs an (intentionally single-threaded, but multithreaded in prefetching upstream stories) fetch loop before skimming everything in a text editor, a separate thanks for the transparent and friendly throttling implementation. To say the least, not all emergency-throttling implementations are so clear about the exact changes one needs to make in the fetcher. Thanks.
I know if you read the site by opening several stories as tabs you probably hit this.
I did indeed hit this! I thought it was strange, and just figured it must have been my fault and I must have had a VPN going that was using a popular IP. Didn’t think about it much but glad you explained it.
Did you implement the timing attack described in the article and have success with it?
No. I took it as plausible but easily addressed with rate limiting. I only tried to repro the more serious attack I feared against session_token.
Neat writeup, and the conclusion of “hey, use split tokens and throttle reset attempts” seems reasonable.
I’m a bit skeptical of some of the assumptions, though. The feasibility of the timing attack seems…not high to me.
Relevant weakness from the cited “Opportunities and Limits of Remote Timing Attacks”, section 6.6 on application-induced jitter:
Our analysis of network jitter used a custom application on an unloaded server. Servers, like the network, can introduce latency when under load. A realistic application might introduce its own jitter into the response time because of cache misses, page faults, TLB faults, lock contention and other sources. To characterize such jitter and the ability to filter it, we analyzed the Apache web server, version 1.3.33, both under no load and under two trace-driven loads with a 1.1GB working set (larger than system RAM, guaranteeing pressure on the disk system). For simplicity, the load generation clients ran on the same computer as the HTTP server. One run was performed with sequential requests and another run was performed with 30 concurrent requests. The CPU was saturated during both runs.
Measurements were taken by a custom HTTP client, on a separate system, requesting a small file over an unloaded LAN. We collected 100,000 measurements over the three different workloads. As an attacker would do, our client avoids connection startup jitter by first sending only part of the HTTP request, then deliberately delaying to ensure that it is received and processed by the server before it sends the remaining part of the request.
I don’t mean to disparage the authors (Wallach is a cool bro), but a couple important things are different twelve years later:
The research was initially done with a custom UDP setup; the follow-up work in the quoted section was TCP. It’s unclear how, say, SSL would impact this timing. It’s also unclear if newer variants of HTTP (especially ones based on QUIC/UDP) might be more or less susceptible to this timing.
For clients outside the data center, the network is totally bonkers–the path goes through many intermediaries and carriers, and congestion from other traffic adds its own jitter.
For clients inside the data center (co-locating as @soatok suggests), there’s still the question of noisy neighbors on the same or adjacent node, of software-defined networking shenanigans, of VPNs connecting backplanes, and so forth.
The tests were done on real hardware (Pentium IIIs and 4s, and some Athlons, IIRC), but modern server hosting (even at our previous host, prgmr) tends to be very virtualized and multi-tenant, which again has the problem of noisy neighbors.
A lot of modern applications have the database hosted on a different machine (or by a third-party entirely, say AWS), and again that machine may have noisy neighbors or flakey connections or what have you.
I’m not saying that this attack is not feasible, but it’d be really interesting to see a refresh of that research done today with modern application architectures and systems.
I also disagree with the assertions about Nginx; reading the linked source, the figure used in the blog post is 10-20x higher than what the link supports:
Generally, properly configured nginx can handle up to 400K to 500K requests per second (clustered). Most what I saw is 50K to 80K (non-clustered) requests per second and 30% CPU load, of course, this was 2 x Intel Xeon with HyperThreading enabled, but it can work without problem on slower machines.
I have similar concerns about the Rails performance numbers mentioned, 80K/sec with performance tuning. That means 80 requests every millisecond, which even as somebody using Go or Erlang I’d raise my eyebrows at. The linked talk, it turns out, suggests that that throughput was achieved not with mere “performance tuning”, but with multiple datacenters, application sharding, and other tricks–which, if you dig into it, suggests all manner of complications for the proposed timing attack!
The suggested mitigations seem good, but it’d be nice if the numbers cited as motivating them were a little more in touch with reality.
And as always, a functioning proof-of-concept goes a long way. :)
which even as somebody using Go or Erlang I’d raise my eyebrows at
If you look at the data for Go frameworks on https://www.techempower.com/benchmarks/ you can see people reaching over 80k/sec with Go. I think the author is hand-wavy enough at that point to tell you it’s more a hypothetical idea of how hard you can hammer the system if you want and what you could get out of that, even without having Google as the target (which would definitely scale better than your attack, so if captchas weren’t a thing you could test even faster…).
And as always, a functioning proof-of-concept goes a long way
The issue is currently not fixed, the idea is very basic, and any person who actually needs a script for that might as well just be DDoS-ing other people, because they found some script that doesn’t actually work for any other site (or just tests out the rate limits of lobsters).
If anything I think the blog post is a little bit too catchy and could’ve been toned down to something like “showing hypothetical timing attacks at lobste.rs”, which would’ve probably taken some of the pressure off pushcx, as this attack is just not that realistic.
If anything I think the blog post is a little bit too catchy and could’ve been toned down to something like “showing hypothetical timing attacks at lobste.rs”, which would’ve probably taken some of the pressure off pushcx, as this attack is just not that realistic.
Noted. I hadn’t gotten a response since the reply to my first email so I thought it was considered “too low severity” for them to even respond to emails about, and that’s why I published when I did. I also figured yet another follow-up email seeking clarification would be viewed as annoying.
(This post has sat in my drafts folder longer than anything else I’ve ever written.)
EDIT: I changed the title to be a bit less bombastic for Lobste.rs readers.
Totally on your side; I just thought you’re better off not riding the “look at me, I’ve got an exploit” train (which the title can totally read as), looking at some of the comments here, and better off underlining the idea of showing how a practical attack could look, for something the reader may just be using.
On a different note: would you like to tell me how good my idea here is for implementing this, without adding more randomness? Just very curious.
IRC is also a useful additional channel for reaching those folks.
I don’t use IRC.
Neat link re: benchmarks, though it is also useful to consider their testing environment. And Ruby is nowhere near that good at using all those cores. :P
If anything I think the blog post is a little bit too catchy and could’ve been toned down to something like “showing hypothetical timing attacks at lobste.rs”, which would’ve probably taken some of the pressure off pushcx, as this attack is just not that realistic.
Indeed. The smugness of it was a real turn-off, even setting aside the silly pictures. That’s why I’m so grumpy about the hand-waving and the misapplication of prior art–if you’re gonna be that smug, even to the point of bragging about “doing the math”…then you’d better deliver an air-tight case.
Like, this is why a lot of folks view infosec amateurs (not novices, there is a difference) as goobers.
I’m trying to parse what you mean by smugness. That’s not, at all, the tone I was going for.
I’m super interested in these topics, and tend to get excited and enthusiastic about them. At the same time, this is largely not a practical risk. (Hence, the rating of sev:low.)
Maybe, in trying to write about something I think is cool while balancing it out with a calm, realist tone, I didn’t stick the landing. I’m not sure.
It’s a hazard of being excited about the topic, I guess. :)
EDIT: I had a bunch of examples from the article, but maybe the two biggest-ticket things to consider are:
You might be interleaving your fursona all through there for comedic effect, but you might not catch when it comes off poorly in context (e.g., the one right after the “this is just for educational purposes” bit).
Watch out for too many snide comments.
If you want to write that way on purpose, that’s your business–Lord knows I do it myself sometimes–but make sure it is, in fact, on purpose.
(And again, I want to say–the technical content in the article was very good and an enjoyable read!)
Nobody breaks into computers with cryptographic attacks like this.
EDIT: BUY BITCOIN #BTC totallynotmalicious.example.com #MILLIONARIE
nervous laughter
I… don’t quite get the angle you were going with here, but I presume it’s meant to be humorous.
They’re implying that someone hacked their account via the currently-discussed method and edited their comment to insert some Bitcoin spam, I think.
Yup, that’s p much it. OP was a good post!
Oh, thanks. That makes sense now! :D
Interesting, but the suggested remediation strategy seems overly complex? (Or I very well may be missing something…)
When a password reset is initiated, generate timestamp-random-additionalrandom, but only store the first two parts in the database column. The additionalrandom is new (and sent to the user) but never directly touches the database.
….
When verifying password reset token, use the first two parts in a SELECT query (as currently implemented), but also re-calculate the SHA256 hash of the entire user-provided string and compare it (using a constant-time compare function) against the value stored in the database.
If it doesn’t match, first invalidate the stored token then redirect the user to the login page. This disincentivizes attacks, because you only get one bite at the apple: As soon as you get the correct prefix from a timing leak, unless you guessed additionalrandom correctly, you have to start your attack all over again.
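For concreteness, a sketch of that split-token flow in Rails-ish Ruby; the column and helper names are assumptions for illustration, not Lobsters’ actual code:

require "securerandom"
require "digest"
require "rack/utils"

def issue_reset_token(user)
  prefix = "#{Time.now.to_i}-#{SecureRandom.hex(16)}"  # timestamp-random: stored, used for the SELECT
  suffix = SecureRandom.hex(16)                        # additionalrandom: never stored directly
  full   = "#{prefix}-#{suffix}"
  user.update!(reset_token: prefix,
               reset_token_digest: Digest::SHA256.hexdigest(full))
  full  # the whole string goes into the email
end

def redeem_reset_token(full)
  prefix = full.split("-").first(2).join("-")
  user = User.find_by(reset_token: prefix)  # an index lookup here can only leak timing about the prefix
  return nil unless user
  if Rack::Utils.secure_compare(Digest::SHA256.hexdigest(full), user.reset_token_digest)
    user
  else
    user.update!(reset_token: nil, reset_token_digest: nil)  # one bite at the apple
    nil
  end
end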
Could you not just… add the user id to the reset link (which is pretty typical ime) & add an expiry time column? Set expiry to Time.now + 15.minutes when generating the token, look up the account by the user ID, and then hard fail the comparison immediately if you’re outside the 15 minute window?
Then you don’t have to worry about how well your database (or in this case, Ruby) implements constant-time comparison functions, because you’ve scoped the attack to a particular user and made the number of requests needed to accomplish it in the timeframe provided essentially guaranteed to melt the server before it could possibly succeed?
You can borrow the idea of wiping the token immediately on a non-match and eliminate the timestamp window, even.
Could you not just… add the user id to the reset link (which is pretty typical ime) & add an expiry time column? Set expiry to Time.now + 15.minutes when generating the token, look up the account by the user ID, and then hard fail the comparison immediately if you’re outside the 15 minute window?
There’s already an expiry window, but it’s set to 24 hours not 15 minutes.
A lot of people and companies don’t like leaking database primary keys to the public (for various reasons), so using the user ID in a password reset email would violate their tenets.
A lot of people and companies don’t like leaking database primary keys to the public (for various reasons), so using the user ID in a password reset email would violate their tenets.
I’m talking about Lobsters specifically here, and Lobsters is obviously fine with “leaking” the username, so User.find_by_username('soatok') would work just fine. Expire the key if the comparison doesn’t match what’s on the user instance, and you don’t have to mess around with chunking SHA-256 hashes and finding constant-time comparison functions, no?
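A sketch of that simpler flow; again, the column names are assumptions for illustration:

def redeem_reset(username, token)
  user = User.find_by_username(username)  # scopes the attack to a single account
  return nil unless user&.reset_token && user.reset_token_expires_at > Time.now
  if user.reset_token == token  # plain compare: the wipe below is what makes the timing oracle worthless
    user
  else
    user.update!(reset_token: nil)  # first mismatch burns the token
    nil
  end
end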
I’m really skeptical, for two reasons that were already mentioned in the article.
First, the article links to a 2009 study about using timing attacks to obtain secrets over the Internet. But, most of that study is trying to determine a secret with tens of microseconds of processing time. Here, since the string fits on one cache line, we’re measuring CPU performance (not memory performance) and need to observe a difference in tens of nanoseconds of processing time.
Pinging lobste.rs shows that I get milliseconds of difference each time. At a minimum, this implies that there need to be many attempts per comparison to just establish what the cost of that particular comparison is.
64 bytes from 67.205.128.5: icmp_seq=1 ttl=53 time=85.6 ms
64 bytes from 67.205.128.5: icmp_seq=2 ttl=53 time=84.8 ms
64 bytes from 67.205.128.5: icmp_seq=3 ttl=53 time=91.3 ms
64 bytes from 67.205.128.5: icmp_seq=4 ttl=53 time=82.4 ms
64 bytes from 67.205.128.5: icmp_seq=5 ttl=53 time=85.3 ms
64 bytes from 67.205.128.5: icmp_seq=6 ttl=53 time=83.4 ms
Second, the article correctly points out that the implementation of memcmp matters a lot. I’m really shocked (disappointed?) at the glibc implementation. Microsoft’s is using native machine word size (so 64 bit) but is also unrolling in order to compare multiple words per loop iteration. This means the search space per compare goes up, and the timing difference at a compare boundary goes way down. I really hope the compiler is smart enough to unroll some of glibc’s loop here. The most visible performance difference shouldn’t be about the result of the compare - it should be the memory alignment of variables and where the cache line breaks land.
I think what I got from this article is if I wanted to attack a cloud hosted service, the first step is to create a pile of VMs with the same provider in the same region, and test ping times between my VM and the victim, until I can get as close to the target as possible. The closer it’s possible to place, the more attempts can be applied in any given time and the greater potential to detect variance. For that reason, code running on a cloud provider really needs to assume it’s sharing a very fast LAN with hostile actors.
The memcmp linked is—I expect—an architecture-agnostic fallback. The primary memcmp are written in assembly and do use unrolled loops and large word sizes. Here’s one of the amd64 versions, for instance.
Pinging lobste.rs shows that I get milliseconds of difference each time. At a minimum, this implies that there need to be many attempts per comparison to just establish what the cost of that particular comparison is
I think what I got from this article is if I wanted to attack a cloud hosted service, the first step is to create a pile of VMs with the same provider in the same region
Here are ping times from one server to another that’s physically proximal:
round-trip min/avg/max/std-dev = 1.568/4.048/6.924/0.481 ms
Hell, here are ping times between a VM and its host:
round-trip min/avg/max/stddev = 0.117/0.525/2.257/0.188 ms
I’d be the last person to say you can’t carry out this attack, and the protections are definitely worth applying, but it seems highly impractical.
Thanks for the memcmp link, that makes much more sense. Further poking around shows the generic amd64 version is here and it’s using SSE2 (which is always present on amd64) to implement 128 bit compares. So I think that really kills the feasibility of this attack.
Thinking more about it, I mentioned in the earlier post that memory/variable layout would cause more jitter than the result of the compare, but that layout applies to all kinds of logic in processing the request, not just within the compare itself. I think it’s therefore basically guaranteed to have more per-request jitter than this compare could display. Given a large enough number of comparisons it might still be possible to detect, but that puts us in the territory of repeating each 128-bit comparison 10k times in a 24 hour period.
A 128-bit window absolutely mitigates this kind of attack, given sufficient entropy in each byte being compared. (If the strings were encoded as a series of 0 and 1 characters, this would reduce to 16 bits of actual entropy, which is a sample size of 65536.)
Yes, for a timing attack you would generally need to query a large number of times and take an average to smooth out the noise and detect a difference.
Sometimes this isn’t possible, if the noise dominates too hard with respect to the number of queries you can make. Nanoseconds (10^-9) vs milliseconds (10^-3) sounds like one of these situations.
About the mitigation: How much less secure would storing the token in the database as HMAC_SHA256(secret_key, token) be than using the split-token approach? I have no background in this stuff but naively, it seems to me like that’d make the attack more or less impossible and would be less complex to implement.
Yes, that would work.
HMAC with a secret key, whose output is not revealed, functions as a pseudo-random function (PRF), and would blind the calculation sufficiently to make this attack impractical.
This is true for the same reason that a timing attack cannot break password hashes.
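Sketched, with SECRET_KEY standing in for some server-side secret that never leaves the application (an assumed setup, not any particular codebase):

require "openssl"

SECRET_KEY = ENV.fetch("RESET_TOKEN_HMAC_KEY")

def token_digest(token)
  # Without SECRET_KEY an attacker can't compute candidate digests offline,
  # so a byte-by-byte database comparison of digests leaks nothing useful
  # about the token itself.
  OpenSSL::HMAC.hexdigest("SHA256", SECRET_KEY, token)
end

# Store token_digest(token) when issuing; SELECT on token_digest(params[:token])
# when redeeming.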
So when I realized I’ve got kinda the same vulnerability somewhere else, I was just asking myself if something like the following wouldn’t be easier and still secure: generate a reset token like “{user-Mail/-name/id/..}{sha256(secure random)}” and store the first part in a column, the second in another one (and the timeout in a third, allowing a regular cleanup). Then use a SELECT based on the first part, retrieve the second this way, and then secure-compare the hash with the hash part of the provided token. That sounds far easier to me. It allows only one reset token at a time, and based on the response you could evaluate whether somebody has a reset token active (or whether there ever was one created / the account exists*), but I don’t see that as an actual problem?
*Which you can probably also retrieve via the login timing on 99.9% of websites.
Bravo to @soatok and the whole sysop team for their amazing effort!
An IP-based rate-limit might not be perfect if you only do a check per IP and not per block. Especially with IPv6, where everyone usually gets a /48 or a /64.
A /48 means you have 2^(128-48) possible IP addresses to use, or 1208925819614629174706176.
And Lobsters is accessible in IPv6.
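One mitigation is to bucket IPv6 clients by prefix rather than by full address; a minimal sketch (the /64 bucket size is a judgment call, not a recommendation from the thread):

require "ipaddr"

def rate_limit_key(ip_string)
  ip = IPAddr.new(ip_string)
  ip.ipv6? ? ip.mask(64).to_s : ip.to_s  # one bucket per /64, so a single allocation can't dodge the limit
end

rate_limit_key("2001:db8:1:2:aaaa:bbbb:cccc:dddd")  # => "2001:db8:1:2::"
rate_limit_key("203.0.113.7")                       # => "203.0.113.7"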
This post makes me wonder if changing memcmp to 64-bits at a time would meaningfully increase software security…
I’m sure that lobste.rs isn’t the only site doing something like this (by a long shot). A 2^32 sized brute force is practical-ish over the network (as discussed in the post), enabling guessing the secret chunk by chunk; a 2^64 sized brute force, at a billion attempts a second, would take ~585 years. That’s still not great security compared to proper cryptography, but it’s enough to make network attacks impractical in most cases.
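Back-of-the-envelope for those figures:

2**32 / 1.0e9                         # => ~4.3 seconds of raw guessing for a 32-bit space
2**64 / 1.0e9 / (3600 * 24 * 365.25)  # => ~585 years for a 64-bit space at 10^9 guesses/sec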
A little late to the party, but can’t you just sleep(rand() % 100) to introduce a random delay of up to, say, 100ms in any security sensitive context? It doesn’t eliminate the ability to use timing attacks given the uniform distribution but it dramatically increases the number of samples you would need to actually get meaningful results. If we’re talking password resets, you could delay up to 500ms without the user even noticing (especially redirected from their mail client) and I’d imagine that would require more samples than you could afford given the 24 hour password reset window.
Or just reduce the password reset window to 10 minutes or something.
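The suggestion, sketched: a uniform random delay of up to 100ms before any security-sensitive response.

require "securerandom"

def with_jitter
  sleep(SecureRandom.random_number * 0.1)  # uniform in [0, 100) ms
  yield
end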
There are 3 variables in a timing attack: delta (the thing you’re trying to measure), noise, and query rate (how often you can hit the site to take a sample). In theory, noise follows some kind of distribution with a mean around 0, so you can always get your measurement by performing enough queries and averaging to smooth out the noise. (In practice things are not this simple.)
In this situation (lobste.rs) delta is in the range of nanoseconds, noise is probably in the range of 10s of milliseconds, this means you will need to perform an absolutely enormous number of queries to smooth out the noise and perform an accurate measurement of the timing side channel. You’re proposing to increase the noise which will (gigantically) increase the number of queries needed to measure delta. Multiply the number of queries needed by the query rate you are able to hit the site with to find out how long you need. The attack needs to be done in under 24 hours.
So this idea of adding randomness could be a valid solution, but it needs to be quantified to know whether it will work or not. Although we do not know if the attack is possible at all - so it’s very hard to say if adding ~100ms of noise even matters. It’s much less fragile to fix timing side channels in general by using constant time comparisons/algorithms.
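For reference, the usual shape of such a comparison; Rack ships one as Rack::Utils.secure_compare, and this is a minimal sketch of the same idea:

def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  # Accumulate XOR differences across every byte instead of returning at the
  # first mismatch, so run time doesn't depend on where the strings diverge.
  a.each_byte.zip(b.each_byte) { |x, y| diff |= x ^ y }
  diff.zero?
end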
this means you will need to perform an absolutely enormous number of queries to smooth out the noise and perform an accurate measurement of the timing side channel
Yes; that’s the point I was trying to get across - thanks.
It’s much less fragile to fix timing side channels in general by using constant time comparisons/algorithms.
I don’t know. I think it’s actually the more fragile but theoretically optimal approach to resisting timing attacks while my suggestion is the “huge bandaid” overkill approach that makes you want to roll your eyes and burn the resulting code because it’s clearly suboptimal but probably gives better real world results.
By real world results I mean that so-called constant time operations are only constant time on paper and are a research paper or two away from being cracked. They only consider the algorithm in question and a few “known unknowns” but in the real world we have to deal with things like overzealous compilers, speculative execution, cache locality, side channels up and down both the software and hardware stacks, etc any of which might very plausibly one day be shown (if not already privately known) to have a partiality to produce - on some micro level - a linearly converging preference for one codepath over another.
What I mean to say is: in any library I write, I’m definitely going to take the “smart” approach and convince myself that I’ve done everything right, that I’m using all the right algorithms for all the steps, and that I’ve arrived at an optimally impartial validation function/entrypoint to serve as my oracle without sacrificing one nanosecond more than I had to in order to be resistant to timing attacks. But if tomorrow someone were stupid enough to hold a gun to my head and put me in charge of creating an (online or offline) frontend to some country’s nuclear controls, theoretically optimum code is going out the window: I’m using constant time algorithms, but you bet I’m adding a random delay to AuthenticateIcbmCredentials().
@soatok, I’m curious about the 80 bits of security part of the article. After the ‘3 words’ thing from a few weeks ago, I’ve been considering using randomly generated 5-word strings for the input to the root key for a backup tool that I’m working on in my copious free time. I took two word lists, and subtracted everything in the small one from the large one so that I end up with 111K uncommon words (on the basis that no string that I generate can be brute-forced with the small 27K word list, some may be possible to find with a word list that’s less than 111K, but then any can be found with a five-word word list if it’s the right five and a back-of-the-envelope calculation suggests that the attacker can’t do much better with a smaller word list). For a five-word string, this gives me 84 bits of entropy, unless I’ve done something wrong (I’m using libsodium’s uniform random number generator to select words, but it’s entirely possible that I’ve missed something in my estimation of the word list’s entropy).
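A quick check of that estimate, assuming uniform selection from the list:

Math.log2(111_000) * 5  # => ~83.8 bits for five independent draws from a 111K-word list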
I’m then using Argon2id (again from libsodium) to generate a hash that’s used as the input to the KDF that generates the keys that I use. The current parameters to Argon2id take about 3 seconds of CPU time and 1 GiB of RAM (I only ever need to do this when I restore from backups, so I don’t care if it takes a few seconds because it’s probably going to take hours for the complete restore and a few seconds is in the noise). An attacker trying to brute force this is assumed to have access to the public key that’s generated by the KDF from the Argon2 hash of the passphrase, the salt, the context for the KDF used to derive the public key, and the context of the KDF used to derive the private key, and wants to derive the private key. I think your statements about hash rates were specifically for SHA hashes and so don’t mean I should be worried here but I wonder if I’m missing something?
I also implemented random keys, but there’s a usability tradeoff between a user being able to keep a 32-byte random number somewhere safe from attackers and being able to keep five words safe and secure. The latter is something that it’s plausible for them to keep in their head, and failing that on a piece of paper hidden somewhere (and definitely not vulnerable to an attacker who doesn’t have an extended period of physical access to wherever they hid it). The former basically requires storage on a computer and if you want it to last several years (hopefully I never need my off-site backups!) then that probably means multiple USB drives each one of which is plugged into a computer and checked periodically (unpowered flash can lose its contents over time).
For a five-word string, this gives me 84 bits of entropy, unless I’ve done something wrong (I’m using libsodium’s uniform random number generator to select words, but it’s entirely possible that I’ve missed something in my estimation of the word list’s entropy).
84 bits of entropy for a specific user’s passphrase is excellent. The kind of targeted attacks that will hit the 2^80 attack rate are almost always going to be directed at, say, the DNS root keys, or the TLS private key for a bank or government network.
As far as I know, nobody does a 2^80 brute force attack on random users’ passwords. There’s lower hanging fruit to be found everywhere.
I’m then using Argon2id (again from libsodium) to generate a hash that’s used as the input to the KDF that generates the keys that I use. The current parameters to Argon2id take about 3 seconds of CPU time and 1 GiB of RAM (I only ever need to do this when I restore from backups, so I don’t care if it takes a few seconds because it’s probably going to take hours for the complete restore and a few seconds is in the noise).
I forget the exact equation for how password hashing algorithms affect attack cost, but that’s exactly what they’re meant to do. An Argon2id-stretched 80-bit passphrase is probably equivalent to at least a 100-bit passphrase used with a fast cryptographic hash, from an attacker’s perspective.
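A rough way to quantify it: the extra effective bits are about log2 of the attacker’s slowdown. The guess rates below are illustrative assumptions, not measurements:

fast_guesses_per_sec = 1.0e9  # assumed: fast-hash cracking rig
slow_guesses_per_sec = 1.0    # assumed: ~3s CPU and 1 GiB RAM per Argon2id guess, some parallelism
Math.log2(fast_guesses_per_sec / slow_guesses_per_sec)  # => ~29.9 extra bits, so an 80-bit passphrase behaves like ~110 bits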
I also implemented random keys, but there’s a usability tradeoff between a user being able to keep a 32-byte random number somewhere safe from attackers and being able to keep five words safe and secure.
Keys are for machines to remember, not humans. I think five words is good.
Thanks. Now I need to fix my code as well.
By the way, I believe that Redmine uses just the token to both identify and authenticate API users. Does that mean what I think it means?
So, uh, what does the disown feature do?
It changes the owner of the comment or story to @inactive-user.
Since the comparison itself is in the database and the field is indexed, this will have different characteristics than just looking at the memcmp performance. To be clear I haven’t looked into how index comparisons work for tiny tables… so I’m not sure if that makes the attack more or less practical. What back-end does lobsters use?
The reason I’m not clear on the impact is that you may get lots of noise in the results just because you take a different btree branch. On the one hand, that’s a bigger indicator of which range you’re in; on the other, depending on the implementation you may hit “yes, you’re in this range” then “you’re out of the narrower range” for chunks larger than 4 bytes, taking the scan back to smaller brute-force territory.
Also, in this case you’re not hitting a check-specific location like you would with many apps. Instead you’re hitting a generic comparison on index search path on a busy app. Not sure what to expect from a branch predictor here.
I keep forgetting about Lobste.rs even though it’s a swell place. Thanks for reminding me it exists with your post-patch report.
On-topic: The disown link has been there, though it might be an account age thing. My original non-furry account from The Beforetimes is about 8 years old!
poc?
edit: title was edited to state that the attack was hypothetical.
From the article:
Note: I’m not going to implement this attack publicly, because I don’t want to arm script kiddies with another exploit tool that they don’t deserve and will only use to cause harm to the Internet. (Or, more likely, DoS servers while failing to actually leak anything.)
so it was never checked if this vulnerability can be exploited in practice?
It’s pretty hard to do. The amount of traffic that you’d need to send to lobste.rs to do the attack is well into the ‘not polite’ level, even if you’re trying to attack your own account. If you try to recreate the environment locally and attack it then you’re probably going to end up with a slightly easier environment to attack than the real deployment.
That said, reading the disclosure, there’s nothing in there that hasn’t been exploited in the wild in other contexts so I have no trouble believing @soatok’s assessment.
If you try to recreate the environment locally and attack it then you’re probably going to end up with a slightly easier environment to attack than the real deployment.
Yes, you would generally do this to validate the finding.
And this is the point where I think you argue in bad faith: the whole article debates how realistic this is, and we’ve had plenty of timing-based attacks in the last years. Why do you need to do a PoC to prove something which is “trivial to prove”, so to speak, at this point?
No, I’m not arguing in bad faith. What is it that I wrote that makes you think I am?
There are variables involved in whether a timing attack can be mounted. You use a proof of concept to prove that it is actually exploitable.
I read it less as “zomg lobsters is insecure!!!” and more as “using the vulnerability as a springboard to discuss the mechanics of side channel attacks and split tokens.”
40% of the article discusses the practicality of timing attack exploitation in this context.