No more than the article is about Ruby.
Ephemeral port exhaustion only happens when using TCP; if you are proxying to localhost, then UNIX or anonymous sockets are a far better option, and they also have less overhead.
Other than it being a host-local-only socket, not really, though portability to Windows might be important to you. Maybe you are fond of running tcpdump to packet-capture the chit-chat between the front and back ends, which UNIX sockets would prevent; though if you are doing this you are probably just as okay with using strace instead.
From a developer perspective, instead of connecting to a TCP port you just connect to a file on your disk; the listener creates that file when binding to the UNIX socket, and nothing else is different. The only confusing gotcha is that you cannot ‘re-bind’ if the UNIX socket file already exists on the filesystem; for example, when your code bombed out and was unable to mop up. Two ways to handle this:
unlink() (delete) any previous stale UNIX socket file before bind()ing (or starting your code); most do this, as do I
use abstract sockets (Linux-specific), where all the bytes of sun_path contribute to the reference name, not just the bytes up to the NUL termination

Personally, what I have found works with teams (for an HTTP service) is that in development the backend presents as a traditional HTTP server listening over TCP, enabling everyone to just use cURL, their browser directly, or whatever they like. In production, though, a flag is set (well, I just test whether STDIN is a network socket) to switch into UNIX socket/FastCGI mode.
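For the first option, a minimal C sketch of unlink()-before-bind() (the socket path is illustrative):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Create a listening UNIX socket at path, clearing any stale file first. */
int listen_unix(const char *path) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, path, sizeof(addr.sun_path) - 1);

    /* Remove any stale socket file left behind by a previous crash,
     * otherwise bind() fails with EADDRINUSE. */
    unlink(path);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, SOMAXCONN) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```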
As JavaScript/Node.js is effectively a lingua franca around here, this is what that looks like:
$ cat src/server.js | grep --interesting-bits
const http = require('http');
const fcgi = require('node-fastcgi');
const handler = function(req, res){
...
};
const server = fcgi.isService()
? fcgi.createServer(handler).listen()
: http.createServer(handler).listen(8000);
server.on('...', function(){
...
});
$ cat /etc/systemd/system/sockets.target.wants/myapp.socket
[Unit]
Description=MyApp Server Socket
[Socket]
ListenStream=/run/myapp.sock
SocketUser=www-data
SocketGroup=www-data
SocketMode=0660
Accept=false
[Install]
WantedBy=sockets.target
$ cat /etc/systemd/system/myapp.service
[Unit]
Description=MyApp Server
Before=nginx.service
[Service]
WorkingDirectory=/opt/myorg/myapp
ExecStartPre=/bin/sh -c '/usr/bin/touch npm-debug.log && /bin/chown myapp:myapp npm-debug.log'
ExecStart=/usr/bin/multiwatch -f 3 -- /usr/bin/nodejs src/server.js
User=myapp
StandardInput=socket
StandardOutput=null
#StandardError=null
Restart=on-failure
ExecReload=/bin/kill -HUP $MAINPID
ExecStop=/bin/kill -TERM $MAINPID
[Install]
WantedBy=multi-user.target
The reason for multiwatch in production is that you get forking and high-availability reloads. Historically I would have also used runit and spawn-fcgi, but systemd has made this no longer necessary.
Local load balancing is the motivating example, but I wrote it to highlight the general problem of load balancing a large number of connections across a small number of backends (potentially external machines).
UNIX sockets might be a reasonable solution to the particular problem in the post. It’s not something I’ve tried with HAProxy before though, so I’m not sure how practical it would be.
This reminds me of something we figured out when dealing with some ropey multi-threaded C++ code that no one understood and that repeatedly deadlocked. Of course, as these things do, it fell onto the sysadmin team to handle: apparently it was our fault that developer code locked up at 3am, and again our fault that we were the ones who got the pager alerts… but I digress. :)
v0 used strace to figure out whether all the threads were either idle or stuck in ‘D’ state, alongside an internal application-side watchdog that touch’d a file on the filesystem so we could trivially see if the main event loop was still moving. If things had stalled, the shell script pulled out the 9’bore and we left runit to mop up.
v1 popped out, IIRC, when we saw we could sidestep strace and find out whether the application was stuck in a futex syscall via /proc/.../syscall.
Today I was at an information security “summit” and in the next ballroom over was a “PERL Meeting”
I wandered over and it was a meeting for nephrologists discussing Preventing Early Renal Loss.
The problem is that manuals do not tell you what to do; they just list warnings and do not supply any hints on how you should navigate those obstacles safely.
“they just list warnings and do not supply any hints on how you should navigate those obstacles safely.”
It’s an arguable criticism; maybe they should link to one. I mostly agree that if they describe the problem they should point to a solution in a direct, obvious way. If space permits, describe it with an example immediately, as in secure coding guides. Alternatively, recommend a secure coding guide. :) Let’s test the counterpoint, though: that it should be easy enough to find for a worried developer willing to look into the matter.
“No bounds checking is performed. If the buffer dst is not large enough to hold the result, subsequent memory will be damaged. “
Direct implication: check the size of the input first if worried about preventing errors. That’s at least the obvious part. The worried programmer might also type strcpy into Google with words such as security, vulnerability, crash, and so on to understand what the problem might be. I just tested that in DuckDuckGo and found that a search for “strcpy vulnerability” gives these links as the first two results:
https://stackoverflow.com/questions/21121272/how-to-mitigate-the-strcat-and-strcmp-vulnerability
https://stackoverflow.com/questions/1258550/why-should-you-use-strncpy-instead-of-strcpy
That should give them plenty of information. So, maybe we just need to teach programmers to always type the thing they’re curious about into Google or DuckDuckGo, followed by specific words that are likely to bring up relevant information. Words I’ve used to good effect include bugs, vulnerabilities, prevent, and mitigate/mitigation. People might have gotten too lazy to do this and/or forgotten how to search properly since Google is so good now. It’s not as good as people think, though, and my old techniques from older engines still pay off.
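The “check the size of the input first” advice can be sketched as follows (safe_copy is an illustrative name, not a standard function):

```c
#include <string.h>

/* Copy src into dst only if it fits, NUL terminator included;
 * refuse rather than damage subsequent memory as strcpy would. */
int safe_copy(char *dst, size_t dstsize, const char *src) {
    size_t len = strlen(src);
    if (len >= dstsize)
        return -1;              /* would not fit, including the NUL */
    memcpy(dst, src, len + 1);
    return 0;
}
```

Callers then handle the failure explicitly, e.g. `safe_copy(buf, sizeof buf, input)` returning -1 instead of silently overflowing.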
The other point is the OpenBSD manual is not intended to be a learn to program in C tutorial. It describes the functions available, (hopefully) simply and accurately, noting caveats as necessary. I think our philosophy is closer to do no harm than solve all problems.
Really, I have a hard time imagining a novice programmer reading that page, but knowing nothing of the territory. If you don’t know what strcpy is, how did you find the man page in the first place? If you were pointed at strcpy by some other tutorial, shouldn’t that be the document corrected?
You make a great point. Given that and what Google can find, the reasonable assumption for the new programmer is probably that they learn the language, read some resources on secure coding, and then the docs just serve as a reminder of the risk. It’s not far from a formal spec, either, where it warns that a precondition on size is necessary for correctness. We similarly wouldn’t expect a language tutorial in a formal spec.
Was always my personal favorite router mod:
The TTL-232R-3V3-AJ cables were cheap to come by too.
Some more possible things that could be learnt from this event.
The problem is these are all arguably over-defensive measures without the benefit of hindsight. In an “agile” setting, you won’t be able to justify writing code this defensively all the time. Even less so, since if you code like that, there won’t be any incidents to justify writing code like that :)
An exception could be if you managed to capture these patterns in abstract, reusable libraries, toolchains etc., then the up-front cost could be justifiable.
I wouldn’t put that on “agile”. I worked in quite a lot of agile organisations where safe and defensive practice was exercised around all mission-critical and safety-related data. They just didn’t bother too much if something was released with a button two pixels too far left (which is an issue to be fixed, I don’t want to downplay frontend, but it can easily be fixed with another deployment and has no lasting repercussions).
“pressure” is the killer here.
I am a big fan of LDAP, but instructions on how to light up a particular vendor are really of not much interest, especially when just improving the documentation README could have been more valuable to everyone in this case.
Now, a piece on why to use LDAP would be helpful, particularly for a blog with such a grandiose domain ;-)
This is depressing and horrible. No wonder everything we use is a tower of poor abstraction and riddled with bugs.
Oh, no need to be like that, turn that frown upside down. :-)
It is a great gig replacing all that webscale cruft with 10 lines of shell and making it faster in the process too.
Usually it depresses the developers, but really they brought it upon themselves.
“This is what makes a good developer: finding the right combination of libraries, keeping them up to date, and reducing self-written code to the absolute minimum. Ideally the only code you write should be the part that makes your application special.”
I agree with the author. The part he misses is that modern-day applications do a lot more. Small teams build exceedingly complex applications. So yes, while you have more tools at your disposal (open source, APIs, managed services, etc.), the scope of the work has also expanded. The work has changed: easier in some ways, harder in others.
You’re right!
Actually, I had planned to cover the change in scope of modern applications, but it was so much text that I decided to write a separate article about that topic ;)
Why not just use the other databases that were designed to have transactions from the start instead of bolting it on later?
They use CrateDB as a database for storing and searching product data. They chose CrateDB because it allows them to scale the webshop easily, according to Gestalten.de CEO Frank Rakow.
With ~4.6M rows, they’d be well served by vanilla PostgreSQL. Then again, I don’t know all of their requirements, perhaps they have need of ElasticSearch’s features (CrateDB is a SQL+management layer on top of ElasticSearch). Hopefully Gestalten.de is using this database for analytics and not important data storage.
That said, you couldn’t pay me enough to use ElasticSearch as a primary datastore. A search index? Absolutely. Temporary log storage? Sure. Analytics? I suppose. ElasticSearch has measurably improved over the years, but once bitten, twice shy as the saying goes.
In any case, CrateDB is a remarkably immature database to be running a business on. And let’s look at their marketing page:
On the other hand, CrateDB may not be a good choice if you require:
- Strong (ACID) transactional consistency
- Highly normalized schemas with many tables and many joins
Oh dear.
Why hopefully? These numbers are embarrassing.
I don’t understand why anyone would be proud of these numbers.
product:([sku:3300?`5]; a:3300?0)
xsell:asc ([] sku:4600000?`5; cross_sku:4600000?`5; tstamp:.z.p-4600000?0)
\t select from xsell where not sku in exec sku from product
87
That’s msec; kdb is 100x faster than crate (on my macbook).
I suppose I should be at least happy that crate.io puts some actual benchmarks up when they say it’s “fast” so that we know that they mean not at all fast.
Grouping in a distributed database is much harder than on your local disk. (And yes, this can be a reason not to pick distributed software; but if that’s not your main query, this is also okay.)
…but it’s 4m rows, you don’t need a distributed database, you don’t even need one for 1bn rows. Christ, this is roughly what I would say is the upper bound for CSV files chewed with UNIX sort/join!
“With ~4.6M rows, they’d be well served by vanilla PostgreSQL. Then again, I don’t know all of their requirements, perhaps they have need of ElasticSearch’s features (CrateDB is a SQL+management layer on top of ElasticSearch). Hopefully Gestalten.de is using this database for analytics and not important data storage.”
Webshops have the problem that you rarely write them yourself, and they come with their own share of issues. For example, a popular system is OXID. That means you are often not so free to choose the database layer.
“That said, you couldn’t pay me enough to use ElasticSearch as a primary datastore. A search index? Absolutely. Temporary log storage? Sure. Analytics? I suppose. ElasticSearch has measurably improved over the years, but once bitten, twice shy as the saying goes.”
Elasticsearch themselves do not sell it this way.
ES is very popular in the shop scene, where the databases of your shop software are often a pain to work with and most of your frontend is search anyway. So what happens is that they still use the shop software to store all articles, user data, and transactions and sync that to Elasticsearch, with which they drive their full frontend. I’ve seen that in quite a number of deployments and it works well.
Elasticsearch has good stability, just no guarantees. It’s perfectly fine to use it for something important, just be able to recreate it if it blows.
“On the other hand, CrateDB may not be a good choice if you require:
- Strong (ACID) transactional consistency
- Highly normalized schemas with many tables and many joins”
That’s perfectly fine if none of this is needed on that store.
Yea, I would have added some other labels with it, like kubernetes or monitoring/prometheus, but those don’t exist, so I just put Erlang.
Instead of memfd_create() you can use the POSIX standard shm_open(), so
memfd_create("queue_region", 0)
becomes
shm_open("/queue_region", O_RDWR|O_CREAT, 0600)
Add ‘-lrt’ to your LDFLAGS and remember to shm_unlink() it when you’re done. Everything else stays the same, including the performance.
I vaguely recall it being less effort to simply open /dev/zero and use a private mmap()ing of that.
Of course if you are using this as an IPC between two processes you’ll have to use a regular file.
I don’t think a private map would work here:
MAP_PRIVATE
Create a private copy-on-write mapping. Updates to the mapping are not visible to other processes mapping the same file, and are not carried through to the underlying file.
It doesn’t seem like you can mmap /dev/zero. I get ENODEV “Operation not supported by device” when I try. (macOS)
Edit, showing my work:
#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
int main() {
    int fd = open("/dev/null", O_RDWR);
    if (fd < 0) err(1, "open");
    void *map = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (map == MAP_FAILED) err(1, "mmap");
    return 0;
}
Oops, I had used /dev/zero when I first tried it then accidentally swapped it for /dev/null when I came back to give some code. Either way, the result is the same: ENODEV.
The mmap voodoo seems a bit weird, though I don’t know the exact semantics we’re aiming for. Some of those calls would appear redundant or gratuitous.
I wish I understood how all this worked. Why exactly is a mod operation slow? Why exactly is it faster to do this via page tables? Is it because the kernel is already doing this and it effectively requires zero additional work? Is it because the CPU can handle this in hardware?
I guess I’ve got some research to do.
Mod isn’t super slow, but you can avoid mod entirely without the fancy page tricks by defining your buffer to be a power of 2. For example, a 4KiB buffer is 4096 = 2^12, so you can calculate the wrap-around with ( cur + len ) & 4095 without using mod.
You would still need two separate memcpy’s, and a branch for the wrap-around and non-wrap-around cases (which is normally not a big deal, except when you’re racing against the highly optimized hardware cache in your MMU…).
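A sketch of the mask-based wrap-around with that two-memcpy split (struct and function names are illustrative):

```c
#include <string.h>

#define BUF_SIZE 4096                 /* must be a power of two */
#define MASK     (BUF_SIZE - 1)

struct ring {
    unsigned char buf[BUF_SIZE];
    size_t head;                      /* write offset */
};

/* Write len bytes (len <= BUF_SIZE), splitting at the wrap point. */
void ring_write(struct ring *r, const void *data, size_t len) {
    size_t off = r->head & MASK;
    size_t first = BUF_SIZE - off;    /* room before the edge */
    if (first > len)
        first = len;
    memcpy(r->buf + off, data, first);
    memcpy(r->buf, (const unsigned char *)data + first, len - first);
    r->head = (r->head + len) & MASK; /* AND instead of mod */
}
```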
Branches (conditionals such as if/switch statements) can cause performance problems, so if you can structure things to avoid them you can get a considerable bump in speed.
A lot of people look to software tricks to pull off speedups but this particular data structure can benefit directly from calling upon hardware baked into the CPU (virtual memory mapping).
Most of the time you have a 1:1 mapping of a 4kB contiguous physical memory block to a single virtual 4kB page. This is not the only configuration, though; you can have multiple virtual memory pages mapping back to the same physical memory block, most commonly seen as a way to save RAM when using shared libraries.
This 1:N mapping technique can also be used for a circular buffer.
So you get your software to ask the kernel to configure the MMU to duplicate the mapping of your buffer (page aligned and sized!) immediately after the end of the initial allocation.
Now when you are at 100 bytes short of the end of your 4kB circular buffer and you need to write 200 bytes you can just memcpy()-like-a-boss and ignore the problem of having to split your writes into two parts. Meanwhile your offset incrementer remains simply:
offset = (offset + writelen) % 4096
So the speedup comes from no longer needing the split writes or the wrap-around branch. It is not really that the CPU is handling this in hardware and so it is faster; the hardware is doing no more work than it was before. The performance comes more from a duck-lining-up exercise.
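A minimal Linux-only sketch of the double mapping (memfd_create needs _GNU_SOURCE and a reasonably recent glibc; error handling is abbreviated):

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map the same physical buffer twice, back to back, so a write that
 * crosses the "end" lands transparently at the start. size must be a
 * multiple of the page size. */
static unsigned char *magic_buffer(size_t size)
{
    int fd = memfd_create("ring", 0);
    if (fd < 0 || ftruncate(fd, (off_t)size) < 0)
        return NULL;

    /* Reserve 2*size of address space... */
    unsigned char *p = mmap(NULL, 2 * size, PROT_NONE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* ...then pin both halves of the reservation to the same fd pages. */
    mmap(p,        size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    mmap(p + size, size, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_FIXED, fd, 0);
    close(fd);
    return p;
}
```

With a 4096-byte buffer, a 200-byte write starting at offset 4000 simply spills into the second view and shows up at the start of the first.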
Modulo and division (usually one operation) are much slower than the other usual integer operations like addition and subtraction (which are the same thing), though I’m not sure I can explain why in detail. Fortunately, for division by powers of two, right shift >> and AND & can be used instead.
For why doing this with paging is so efficient, it is because the MMU (part of the CPU) does the translation between virtual and physical addresses directly in hardware. The kernel just has to set up the page tables to tell the MMU how it should do so.
group() is amazing! Especially groupCollapsed(). I considered them, but then I thought I’d be opening the door to talk about count and all the other useful console functions, so I focused only on .log(). Maybe next time :)
I could even live without an adblocker, but for me it is the “I don’t care about cookies” extension.
You don’t have to live without an adblocker… or other stuff really.
uBlock, uMatrix, Privacy Badger, Cookie AutoDelete, HTTPS Everywhere, Smart HTTPS are WebExtensions already. There’s even Tree Tabs (though it doesn’t hide the normal tab bar for now; the APIs for that aren’t there yet).
But yeah, I couldn’t find “I don’t care about cookies” :D
QUIC is cool, but is updating TCP really impossible? Multipath TCP is a thing, and Apple is already using it for Siri.
MPTCP does not solve the stalled-flow problem (head-of-line blocking) caused by packet loss and waiting on that retransmit to get through.
QUIC also has a 0 (zero) RTT cryptographically secure(ish) HTTP request mode: akin to the whole TCP three-way handshake, SSL handshake, and HTTP request rolled up into a single compact UDP packet, so the request is serviced immediately by the server.
QUIC annoys me, because it’s a massive layering violation. It bakes in HTTP, when all I want is a better TCP: something like CurveCP, which has zero roundtrips, and does encryption and authentication in one go.
Guess you hate BTRFS and ZFS too?
I don’t think Google designed QUIC for you, so kinda irrelevant what you want or what I want :-)
Besides you said you have CurveCP so what is the problem?
CurveCP violates ‘layering’ in the same way that QUIC does, so I am not sure what you mean here? Nothing about QUIC prevents you sending non-HTTP traffic over it. QUIC is more DCCP+DTLS+TLSv1.3-0RTT rolled into one; done outside of the IETF initially, as it was a prototype that no one knew would work.
What would you have done differently to avoid the layering violation and make your entire service (including browser support) 0-RTT in a span of months?
If you wanted to gripe about something, you should probably grumble about DCCP/SCTP, but then this ignores that the sane technical reason QUIC (and CurveCP) violate these layers is all the god-awful middleboxes that make up the routing hops.
If you have the time, read about RINA and also the fun presentation by John Day titled Shortening the Dark Ages of Networking (.mov)… it might sway your opinion towards layering actually sometimes being the problem. :-)
What are your thoughts on routing, and what does a public key improve on over a 2^80 IP block allocation?
What advantages does this have over just setting a reverse DNS lookup record and including the public key in something like a TXT RR or maybe some formalised X509 RR?
We could do this with IPv4 today, but as no one is, I am not really sure what this would solve over host-to-host transport-mode IPsec?
Plus, routing is the hard part of a protocol, not how many bits are in the address or whether you slip something cryptographic in there, surely?
Definitely not thought through: but a public key can be self-allocated and works well in an encrypt-by-default world. You’d need something like DNS …
As your service is strongly dependent on a cache, you may find it useful for a booting instance to slurp in a copy of the cache from a neighbouring node; once complete, the node can start advertising that it is healthy.
If you are feeling really nifty, you may want cache misses to actually have nginx cycle and proxy to neighbouring instances until it gets a hit, with the front node locally caching that response. To be fancier still, you can have nodes send around cache digests so you do not need to cycle through all your proxies.
…then of course your current solution is 20 lines of code.