Threads for bitshift

  1. 5

    Interesting thoughts on abstraction. Doesn’t quite mesh with my experience: I really enjoy being able to call up, e.g., the Python standard library, even if I don’t know how their JSON parser works, or I don’t need my hash tables to keep track of insertion order. Most of the time (all of the time?) I don’t need to know in order to solve the problem at hand.

    The analogy that came to my mind is cache misses. With Forth, you write more code from scratch, so almost every function you need is a “cache miss” in the sense that you’re looking for something that doesn’t exist yet. It’s like you don’t even have a cache, and you move consistently but slowly – up to the point where you’ve written everything you need (“the cache has been warmed”), and you have the sensation of everything falling into place. Contrast with a language that has lots of abstractions: you have more “cache hits” because when you need some functionality, you can usually find a library that meets your needs, and you move faster as a result. But the tradeoff is that the occasional cache miss is more painful: you think library Foo meets your needs, but there’s some corner case it handles badly, and you won’t discover that until it burns you. Maybe you never even discover why – all you know is things are suddenly on fire all the time.

    It’s not a perfect analogy, but it gave me a way to think about the issue. As is often the case, the “real” answer is likely at neither extreme.

    People in the upper echelons of Forth programming (like Chuck Moore) design their own hardware and make sure that the interfaces are easy to use from Forth, but that’s not for everybody

    That’s a bit of an understatement.

    1. 7

      Then theoretical limit a server can support on a single port is 2⁴⁸ which is about 1 quadrillion because:

      Over IPv4. This goes to 2¹⁴⁴ over IPv6, which far exceeds the estimated 2⁸⁰ atoms in the entire observable universe.

      1. 10

        According to https://educationblog.oup.com/secondary/maths/numbers-of-atoms-in-the-universe, it’s not 2^80 but on the order of 10^80 – the article’s more precise figure is 2.4 * 10^78, which works out to approx. 2^260 – so IPv6 is still not ready to cover it all. But I agree with the general idea: the IPv6 address space should be sufficient for humankind in the observable future.
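
        Quick check of that conversion, using the article’s 2.4 × 10^78 figure:

        \[
        \log_2\!\left(2.4 \times 10^{78}\right) = \log_2 2.4 + 78 \log_2 10 \approx 1.3 + 259.1 \approx 260,
        \qquad \text{so } 2.4 \times 10^{78} \approx 2^{260} \gg 2^{144}.
        \]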

        1. 4

          I hope we get there one day. For now I’m stuck unsupported: https://test-ipv6.com/ 0/10 :(

          1. 5

            Surprised to see I’m 0/10, too. As far as I know this has never impacted me. Given that the IPv4 scarcity worries turned out like the Peak Oil scare did, can someone remind me why I should still care about IPv6? (I’m only half joking)

            1. 8

              IPv4 scarcity is not the only reason to care about v6 (having millions of IPs per server can be very useful, for just one example) but it’s also not a fake problem. v4 scarcity is getting worse all the time. T-Mobile LTE users don’t even get a NAT’d v4 anymore, just a sort of edge translation to be able to mostly reach v4 only hosts (this breaks a lot of WebRTC stacks if the server is v4 only for example).

              1. 2

                T-Mobile LTE users don’t even get a NAT’d v4 anymore

                Forgive me for being ignorant here, but I thought NAT was pretty much the bandaid for mitigating the symptoms of IPv4 address exhaustion (at least on the client side). Is there some fundamental limit to how many users can be behind a NAT, and is T-Mobile doing a type of translation different from standard NAT in order to get around it?

                1. 5

                  Yes, T-Mobile isn’t using a standard NAT or CGNAT at all. They use 464XLAT if you want to look up the tech.

                  1. 1

                    There are limits to NAT, but it’s mostly either port exhaustion or too much translation happening on a single router. Layered NAT can solve that, but it degrades performance. There is probably a point at which IPv6 becomes cheaper to run than layers and layers of NAT, but I don’t know if that time is coming anytime soon.
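
                    Rough numbers on the port-exhaustion point (illustrative only – a real CGNAT typically tracks mappings per destination and hands out port blocks, which changes the math):

                    \[
                    \frac{\sim 64{,}000 \text{ usable ports per public IPv4}}{1{,}000 \text{ subscribers sharing it}} \approx 64 \text{ simultaneous mappings each.}
                    \]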

                    1. 0

                      CGNAT means you share an IPv4 address; it makes hole punching even worse, but most things can be made to work.

                2. 3

                  10/10 here

                  1. 1

                    Thanks for the link — that’s new to me. I get 10/10 at home; I’m not fond of Comcast/Xfinity the company but I’m happy to see they’re giving me up-to-date service.

                    So does this mean that I could run a publicly-accessible server from home without NAT shenanigans and with a stable IPv6 address?

                    1. 2

                      Yeah. Once enabled, your router will usually (depending on the router) delegate addresses from the assigned /64 to devices on your network. You can live a NAT-free life! Just be careful to firewall.

                1. 8

                  I really like Bitwarden’s (apparent?) commitment to an open service and APIs that you can interact with either through officially supported clients or through other open-source projects. If anything, it makes me trust that I could move a vault’s data fairly easily.

                  Are there similar articles around for services like LastPass or 1Password?

                  1. 4

                    1Password has their security design whitepaper (PDF) which describes their approach in a similar level of detail.

                    1. 1

                      I don’t think the whitepaper says much about open APIs/unofficial clients, but it does go into a great amount of detail regarding how their system works.

                      That whitepaper is a good chunk of the reason why I’m personally using 1Password: I was impressed by the thoughtfulness that went into their security. But I’m operating under the assumption that I’ll only be able to use their official clients – which I’m okay with, but I think Bitwarden is better on that front.

                  1. 1

                    Honest question from a person who is fascinated by cryptography but knows relatively little about it. Why is this interesting/special/newsworthy?

                    1. 1

                      Look closer at the submitters.

                      1. 1

                        …They are North Korean? Is that what’s noteworthy?

                    1. 2

                      Vim is now easier to quit than nano, if nano is launched with --zero or --nohelp:

                      • ctrl-q starts a backwards search
                      • ctrl-c shows a status message at the bottom
                      • ctrl-h shows no help but starts deleting text
                      • ctrl-? just inserts question marks
                      • ctrl-x then actually quits nano, but only after saving the file with the deleted characters and question marks.

                      I wonder what the rationale behind the default keybindings is.

                      1. 7

                        For ctrl-h, that isn’t nano-specific: that’s just the key combination that generates ASCII code 8, the backspace character (H being the 8th letter of the alphabet).
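
                        A quick way to see it (holding Ctrl strips the high bits of the letter's code):

                        ```python
                        # Ctrl+<letter> sends the letter's ASCII code with the top three bits cleared,
                        # which is why ctrl-h comes out as ASCII 8, the backspace character.
                        for letter in "HQW":
                            print(letter, "->", ord(letter) & 0x1F)
                        # H -> 8   (backspace)
                        # Q -> 17  (DC1/XON)
                        # W -> 23  (ETB)
                        ```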

                        ctrl-q and ctrl-w are backward- and forward-search, respectively, and they are right next to each other on the QWERTY keyboard. So that one at least makes geometric sense, although I don’t know why those two keys in particular.

                        nano has an interesting history. It’s a clone of pico, which was originally written to be the built-in editor for the pine email client (which was a fork of the elm email client…), which used a forked version of MicroEMACS before they replaced it with pico. I wouldn’t be surprised if many of the keybindings were decided back then: “We can’t assign the spellchecker to ctrl-BLAH because that’s the keybinding for shuffling the user’s inbox!”

                        1. 2

                          I use micro because it has sane default keybindings, and then ssh into boxes with just nano and end up typing ctrl-s and ctrl-q out of habit, alas.

                        1. 5

                          This is great! I would suggest one way to go even simpler: use some raw file format for the images, and apply generic file compression afterwards. There isn’t really any reason why compression and the image file format need to be coupled.

                          1. 15

                            Many image formats are close to some variation of that theme (header + prefilter + generic compressor). From a quick look at QOI, it seems like a twist on the LZW family. The prefilter stage here updates the dictionary based on complete pixels rather than per byte (the index XORs the channels together). I’d speculate that LZ4 would perform similarly, compression-ratio-wise, on planar image layouts. Both suffer when images start to get wide. Two easy improvements that stay O(n) would be a space-filling curve (Hilbert) over a quadtree or a simple tilemap.

                            There are plenty of reasons why compression and image formats should be coupled, even when staying lossless. Several generic compressors get greatly improved performance by agreeing on pre-trained dictionaries that are representative of the data to send. These dictionaries need to be agreed upon beforehand, and thus become part of the image “format”.

                            1. 7

                              Generic compression can only go so far: to achieve good compression ratios, you need to have some real insight into what structure the data has and how to take advantage of it. I found this post about compressing SPIR-V shaders to be a good example of what those structures can look like.

                              1. 5

                                You may be interested in farbfeld from the Suckless folks, which is pretty much exactly what you are suggesting.

                                1. 5

                                  It depends on the data. Generic compression would certainly do a good job with run-length encoding, as well as with efficient representation of recently-seen values. But method #3, storing colors as deltas, seems like the sort of thing that most generic compression algorithms won’t do. If you were trying to losslessly compress a photo, I would expect QOI to do a better job.

                                  On the other hand, generic compression will also recognize patterns that QOI can’t! If you have lots of dithering, generic compression is probably better.

                                  Either way, it’s remarkable how far simple algorithms can take you.

                                  1. 2

                                    But method #3, storing colors as deltas, seems like the sort of thing that most generic compression algorithms won’t do.

                                    This depends a bit on the image format. For ‘raw’ images, you generally have two possible representations: you store each pixel as a sequence of values, one per colour plane (array of structures), or you store each colour plane separately (structure of arrays). Most GPUs support a variety of minor variations on these two themes.

                                    A lot of general-purpose compression algorithms handle small deltas quite well, because sequences of values where the entropy is in the low bits are very common. This works well in the structure-of-arrays representation, less well in the array-of-structures representation.
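
                                    A toy way to make the layout difference concrete (a synthetic gradient and zlib standing in for the generic compressor – the exact numbers will vary):

                                    ```python
                                    import zlib

                                    # A synthetic 256-pixel RGB gradient: each channel changes slowly from pixel to pixel.
                                    pixels = [(i, (i * 3) % 256, 255 - i) for i in range(256)]

                                    # Array of structures: R,G,B,R,G,B,...
                                    aos = bytes(c for px in pixels for c in px)
                                    # Structure of arrays: all the R's, then all the G's, then all the B's.
                                    soa = bytes(px[ch] for ch in range(3) for px in pixels)

                                    def delta(buf: bytes) -> bytes:
                                        """Byte-wise delta coding (differences mod 256), a crude stand-in for a prefilter."""
                                        return bytes([buf[0]] + [(b - a) % 256 for a, b in zip(buf, buf[1:])])

                                    for name, buf in [("aos", aos), ("soa", soa),
                                                      ("aos+delta", delta(aos)), ("soa+delta", delta(soa))]:
                                        print(name, len(zlib.compress(buf, 9)))
                                    ```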

                                    1. 2

                                      PNG does store colors as deltas (via its scanline filters).

                                    2. 2

                                      There is one big reason: applications want to open and edit files. If all you care about is archiving, then it’s fine to separate these, but you generally want to constrain the set of things that every application that opens images needs to handle. This is one of the problems with TIFF: it’s a family of different compression and encoding schemes, and a lot of things implement different subsets of them.

                                      This approach works well with tar because tar is a tape archiver: the file format is designed for streaming access, not random access, so you can always use an external decompressor and pipe its output into tar. It doesn’t work when I want to open a file in my photo editor and it doesn’t know how to handle the particular decompressor. This is why most successful file formats specify a small set of codecs.

                                    1. 3

                                      I generally agree with the thesis, but the assumption of either perfect stickiness or perfectly random load balancing seems fragile. Anyway, if the latter assumption holds, then you can use the negative binomial distribution to derive confidence bounds on the total request rate from the locally observed rate. (This trick isn’t original, but I don’t recall where I saw it, though it seems obvious in retrospect.)

                                      1. 1

                                        Generally agree. Even without perfect stickiness or load balancing, I’d think that a scheme like this could be “good enough” for most use cases. Or, put another way, I wonder what the worst case would be when using something like this?

                                        Edit: Especially compared with the worst case for the distributed version (complete external store failure?)

                                        1. 3

                                          Not sure if by “worst case” you mean exceeding the global limit or leaving capacity unused (by unnecessary throttling)? Of course you can avoid ever exceeding the global limit by just setting the local rate limit to be 1/N of the global limit, for N nodes. If you have round-robin load balancing, then this conservative approach doesn’t actually leave capacity unused. If you have uniformly random load balancing, then you will have some unused capacity due to one-sided variance, but it shouldn’t be that bad (haven’t worked out the math details).
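
                                          A minimal sketch of that conservative approach – a per-node token bucket sized to 1/N of the global limit (names and numbers are just for illustration):

                                          ```python
                                          import time

                                          class LocalRateLimiter:
                                              """Token bucket enforcing this node's 1/N share of a global limit."""

                                              def __init__(self, global_limit_per_s: float, num_nodes: int):
                                                  self.rate = global_limit_per_s / num_nodes  # this node's tokens per second
                                                  self.capacity = self.rate                   # allow roughly one second of burst
                                                  self.tokens = self.capacity
                                                  self.last = time.monotonic()

                                              def allow(self) -> bool:
                                                  now = time.monotonic()
                                                  self.tokens = min(self.capacity,
                                                                    self.tokens + (now - self.last) * self.rate)
                                                  self.last = now
                                                  if self.tokens >= 1.0:
                                                      self.tokens -= 1.0
                                                      return True
                                                  return False  # caller throttles, e.g. responds 429

                                          # A 1000 req/s global limit spread over 10 nodes -> 100 req/s per node.
                                          limiter = LocalRateLimiter(global_limit_per_s=1000, num_nodes=10)
                                          ```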

                                          A more interesting question IMO is how to set the rate limit in the first place. I did some work on that when I was at AWS around 7 years ago, but never got any interest from managers, and my private implementation never got beyond a trivial Lua prototype in nginx. I have many dozens of pages of notes on that project, though (and looked at maybe a couple hundred control theory papers for it), so I’d like to validate those ideas at some point. Briefly, the idea is to dynamically calculate a target “bottleneck rate” based on latency or other SLOs. Each client’s request rate is monitored independently, but the same bottleneck rate applies to all of them (except for a weighted variant that allows prioritizing clients differently). The bottleneck approach naturally leads to a max-min fair allocation of capacity, much like bitwise round-robin packet queuing. But for all I know, AWS is still hardcoding rate limits for their services, unnecessarily throttling customers and leaving tons of capacity unused.

                                          1. 1

                                            Here are a couple of bad cases I can imagine:

                                            • Overestimating call volume: You promise 100 calls/minute in your API docs, but you have sticky routing over 5 hosts, and the customer complains because they get throttled at 20 per minute.
                                            • Underestimating call volume: You’re in the middle of resizing the pool, and the currently-active hosts mistakenly think the pool is half the size it actually is. So they mistakenly accept a total of 200 requests/minute, and this kills some shared dependency that (for whatever reason) can’t throttle requests itself.

                                            In general, enforcing limits locally actually seems more correct to me: if an individual host is underloaded, it’s okay to serve more requests than normal. And if it’s overloaded, formally rejecting a request seems better than a worst-case fail (e.g., trying to service the request crashes the host, increasing load on remaining hosts).

                                            1. 1

                                              Yes, I totally agree! The ultimate decision on whether to reject a request should be made by the edge server handling that request. Upstream rate limiting (in load balancers etc.) should only ensure that the edge servers aren’t overwhelmed by traffic that they can’t reject fast enough (so the rate limit there could be say an order of magnitude larger than the rate limit at the servers). One problem I can see though with a dynamic rate limiting strategy is its unpredictability for customers, which is why I think AWS has stuck with their hardcoded limits–because they can be documented and increased on an ad-hoc basis for specific customers. This still leaves a lot of utilization on the table when traffic is slow, though.

                                        1. 2

                                          Unless I’m missing something, this looks like it’s the same as truncated binary encoding? Never heard it under this name before.

                                          1. 2

                                            Shuffling without holding the whole thing in memory is one of my areas of interest!

                                            Some of the ideas the author mentions (Feistel ciphers, iterating until output is in range) show up in format-preserving encryption. Feistel ciphers in particular are magical to me, because they’re a general-purpose method for converting a random function (which likely has collisions) into a function that produces a permutation (each output appears only once).

                                            He mentions that “limitation to lists of length 2^k is a bit of pain” and mentions the possibility of timing attacks. I know of some other algorithms that remove these limitations: the swap-or-not shuffle is one that comes to mind. It can shuffle a list of arbitrary size, and the paper is not too difficult to understand (pseudocode example on the first page!).
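
                                              For anyone else reading along, here’s a toy sketch of swap-or-not as I understand it from the paper (hash-based round function and an arbitrary round count, so illustrative only):

                                              ```python
                                              import hashlib

                                              def swap_or_not(x: int, n: int, key: bytes, rounds: int = 64, reverse: bool = False) -> int:
                                                  """Toy swap-or-not permutation on {0, ..., n-1}. Invert by running rounds in reverse."""
                                                  order = range(rounds - 1, -1, -1) if reverse else range(rounds)
                                                  for r in order:
                                                      # Per-round "key point" K_r in Z_n, derived from the key.
                                                      k = int.from_bytes(hashlib.sha256(key + b"K" + r.to_bytes(2, "big")).digest(), "big") % n
                                                      partner = (k - x) % n
                                                      # The decision bit must depend on the unordered pair {x, partner}, hence max().
                                                      pivot = max(x, partner)
                                                      bit = hashlib.sha256(key + b"F" + r.to_bytes(2, "big")
                                                                           + pivot.to_bytes(8, "big")).digest()[0] & 1
                                                      if bit:
                                                          x = partner
                                                  return x

                                              # Sanity check: it permutes a domain of arbitrary size (here n = 10) and inverts cleanly.
                                              n, key = 10, b"demo key"
                                              shuffled = [swap_or_not(i, n, key) for i in range(n)]
                                              assert sorted(shuffled) == list(range(n))
                                              assert all(swap_or_not(y, n, key, reverse=True) == i for i, y in enumerate(shuffled))
                                              ```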

                                            1. 1

                                              Ah, very cool! I’ll have to check out swap-or-not.

                                              Feistel ciphers really do look pretty magical. They’re so simple! And yet… I don’t actually understand them at all. This kind of math somehow bounces right off my brain. I should probably take some time to work a small example and see what it does with different round keys.

                                              1. 1

                                                What made it click for me is the realization that each step in a Feistel cipher is completely reversible (assuming you know the key):

                                                • Swap left and right: This can be undone by swapping again.
                                                • left ^= round_fn(right, round_key): In this step, right doesn’t change, and round_key never changes. You can calculate the exact same round_fn(...) again and do the exact same XOR again.

                                                If you can run each step in reverse, you can run the whole machine in reverse. Which is just another way of saying that you can decrypt the message. And that means it’s impossible to have two inputs that encrypt to the same X – because if you did, how would you ever decrypt X?
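
                                                  Here’s a tiny toy Feistel to play with (16-bit blocks, a hash as the round function – purely illustrative, not a real cipher):

                                                  ```python
                                                  import hashlib

                                                  def round_fn(right: int, round_key: bytes) -> int:
                                                      # Any deterministic function works here -- it doesn't even need to be invertible.
                                                      return hashlib.sha256(round_key + right.to_bytes(1, "big")).digest()[0]

                                                  def feistel(block: int, keys: list, decrypt: bool = False) -> int:
                                                      left, right = block >> 8, block & 0xFF  # split the 16-bit block into two 8-bit halves
                                                      for k in (reversed(keys) if decrypt else keys):
                                                          left, right = right, left ^ round_fn(right, k)  # swap, then XOR in the round output
                                                      return (right << 8) | left  # undo the final swap so decryption mirrors encryption

                                                  keys = [b"k1", b"k2", b"k3", b"k4"]
                                                  ct = feistel(0x1234, keys)
                                                  assert feistel(ct, keys, decrypt=True) == 0x1234  # same steps, run in reverse
                                                  ```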

                                                1. 1

                                                  Yeah, I think it’s starting to click for me as I stare at it more – you’re layering on these contingent values with XOR, alternating sides. The reason to have two sides is so you can continue to store the “recovery information” for undoing the XOR; the reason for [edit] the reversal [/edit] is just that it makes the algorithm description simpler.

                                                  Swap-or-not… I’m going to have to stare at that one a lot more, haha! Do you know how much scrutiny that one has undergone? I don’t see a ton of information on it, and I don’t really know how to evaluate what I do see.

                                                  1. 2

                                                    There is a security proof in the swap-or-not paper. I’m not skilled enough to tell for myself whether the proof is correct, but Rogaway is a respected cryptographer.

                                                    The gist of the proof is if you have enough rounds, and your round function is secure, then swap-or-not is secure. This is a similar result to what’s been proven about Feistel ciphers (what the abstract calls “a Luby-Rackoff type result”). A specific cipher may still be broken, but the Feistel design is sound – just as a bridge collapse doesn’t imply that we’ve been wrong for millennia about the concept of the arch. Swap-or-not is similar, although the design hasn’t been in use for as many years (steel-cable suspension bridge, maybe?).

                                            1. 3

                                              Stream ciphers that use cryptographic hashes have a risk of running into cycles. This is when calling the transform function on the state will eventually go back to a previous one. In order to mitigate this, we can use a block cipher instead.

                                              The “proper” way is just to use a bigger state. The usual rule of thumb for overcoming the birthday paradox is to double the number of bits, so I would expect 256 bits of capacity would be sufficient for a 128-bit security level. But then you can’t use raw MD5 as your sponge function, which sort of defeats the author’s purpose of “can we make MD5 do everything”.
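
                                                Rough intuition for that rule of thumb: iterating a random n-bit transform is expected to fall into a cycle after on the order of 2^{n/2} steps, so

                                                \[
                                                n = 128 \;\Rightarrow\; \text{a repeat after } \sim 2^{64} \text{ outputs}, \qquad
                                                n = 256 \;\Rightarrow\; \sim 2^{128}.
                                                \]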

                                              The author mentions ChaCha20 as an alternative permutation. I’ve wondered before how dangerous it would be to use that as a sponge function, because it’s relatively simple and has a nice state size of 512 bits: split it in half, and you can absorb/squeeze 32 bytes at a time, with the other 256 bits being your capacity. Being no more than an armchair cryptographer, I’m vaguely aware that you can’t just turn any permutation into a sponge and call it good – but I’m not qualified to say what a permutation needs to have in order to make a good sponge.
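
                                                For what it’s worth, the skeleton itself is short. Here’s a toy version with that 256/256 split, using a made-up (decidedly non-cryptographic) permutation where ChaCha20’s would go:

                                                ```python
                                                RATE, CAPACITY = 32, 32  # bytes: 256-bit rate + 256-bit capacity = 512-bit state

                                                def toy_permutation(state: bytes) -> bytes:
                                                    """Stand-in permutation on the 64-byte state -- NOT cryptographic.
                                                    A few add/rotate/xor passes over 64-bit words, just so the sketch runs."""
                                                    w = [int.from_bytes(state[i:i + 8], "little") for i in range(0, 64, 8)]
                                                    for _ in range(8):
                                                        for i in range(8):
                                                            w[i] = (w[i] + w[(i + 1) % 8]) % 2**64
                                                            w[i] = ((w[i] << 13) | (w[i] >> 51)) % 2**64
                                                            w[i] ^= w[(i + 3) % 8]
                                                    return b"".join(x.to_bytes(8, "little") for x in w)

                                                def sponge_hash(message: bytes, out_len: int = 32) -> bytes:
                                                    state = bytes(RATE + CAPACITY)
                                                    # Toy padding: append 0x80, then zeros up to a multiple of RATE.
                                                    padded = message + b"\x80" + b"\x00" * (-(len(message) + 1) % RATE)
                                                    # Absorb: XOR each block into the rate half only; the capacity is never touched directly.
                                                    for i in range(0, len(padded), RATE):
                                                        block = padded[i:i + RATE]
                                                        state = bytes(a ^ b for a, b in zip(state[:RATE], block)) + state[RATE:]
                                                        state = toy_permutation(state)
                                                    # Squeeze: read from the rate half, permuting between outputs.
                                                    out = b""
                                                    while len(out) < out_len:
                                                        out += state[:RATE]
                                                        state = toy_permutation(state)
                                                    return out[:out_len]

                                                print(sponge_hash(b"hello").hex())
                                                ```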

                                              I see some rough edges in the article (most glaring one is that the described block cipher isn’t really a block cipher). But I’m really happy to see people learning about new fields and “shaking their sillies out” with respect to cryptography.

                                              1. 1

                                                  I believe BLAKE2 builds on the ChaCha permutation as its underlying primitive, so investigating how that team built the algorithm would probably answer your question about how to make it safe as a sponge.

                                                1. 1

                                                  Ah, I found someone to ask questions to. Let me bother you a little now that I’ve caught you.

                                                  I think when performance is not a top priority, you can just use 1 byte or even one bit to absorb and keep the rest as your capacity. This should at least reduce your chances of messing up severely. With ChaCha you can also change the number of rounds to play around with the mixing quality/performance trade-off.

                                                  I’d like to ask you why the thing I called block cipher isn’t really a block cipher. I’d love to improve my terminology and correct any mistakes in my mental model if you have time to elaborate.

                                                  1. 1

                                                    What you have right now is a single function get_block(key, counter) that returns a block of pseudorandom bits. For a block cipher, you need a pair of functions:

                                                    • encrypt(plaintext block, key) that returns the encrypted block
                                                    • decrypt(encrypted block, key) that returns the plaintext block

                                                    What you have mimics a block cipher used in CTR (counter) mode, which is a commonly used mode. But block ciphers can be used in other modes as well – CBC is a common one.
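
                                                      To make that concrete, here’s the same get_block shape built from an actual block cipher in CTR mode (sketch using AES from the third-party pycryptodome package):

                                                      ```python
                                                      # pip install pycryptodome
                                                      from Crypto.Cipher import AES

                                                      key = bytes(16)                      # demo key (all zeros) -- never do this for real
                                                      cipher = AES.new(key, AES.MODE_ECB)  # the raw block interface: 16 bytes in, 16 bytes out

                                                      def get_block(counter: int) -> bytes:
                                                          # CTR mode: encrypt a counter block to get pseudorandom bits --
                                                          # exactly the get_block(key, counter) shape, built on encrypt() only.
                                                          return cipher.encrypt(counter.to_bytes(16, "big"))

                                                      # But the block cipher itself is a *pair* of functions:
                                                      block = b"sixteen byte msg"
                                                      assert cipher.decrypt(cipher.encrypt(block)) == block
                                                      ```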

                                                1. 1

                                                  My gut feeling is that this design is over-complicated – although I’m probably biased, because I’m personally fine with trusting a hosted password manager.

                                                  One of the author’s requirements is that if a device is compromised, the attacker won’t gain anything useful. That’s a pretty tall order. What if a compromised device tricks the other devices into sending it secret shares? There are ways to defend against that (like the suggestion of 2FA) but now you have extra doors to lock, extra parts that could break in weird ways (what if a 2FA token is replayed?). And this is completely ignoring the fact that if your laptop/phone is the device that’s compromised, those devices are already receiving all the shares of the secret in the course of normal operation.

                                                  I think the pragmatic approach here is to just use a longer master password. 9-10 words from a passphrase-generation wordlist gives you 128 bits of entropy, and that’s without any key stretching (meaning your passphrase can be shorter than that without sacrificing security). Memorizing 9 words and running an existing open-source password manager is going to be a lot easier than setting up and maintaining a custom password-management network (albeit maybe less fun).
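
                                                    For reference, the arithmetic behind those numbers, assuming a diceware-style list of 7,776 words:

                                                    \[
                                                    \log_2\!\left(7776^{10}\right) = 10 \log_2 7776 \approx 10 \times 12.9 \approx 129 \text{ bits},
                                                    \qquad \log_2\!\left(7776^{9}\right) \approx 116 \text{ bits}.
                                                    \]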