1. -1

    The best SRE recommendation around Memcached is not to use it at all:

    • it’s pretty much abandonware at this point
    • there is no built-in clustering or any of the HA features that you need for reliability

    Don’t use memcached, use redis instead.

    (I do SRE and systems architecture)

    1. 30

      … there was literally a release yesterday, and the project is currently sponsored by a little company called… [checks notes]… Netflix.

      Does it do everything Redis does? No. Sometimes having simpler services is a good thing.

      1. 11

        SRE here. Memcached is great. Redis is great too.

        HA has a price (Leader election, tested failover, etc). It’s an antipattern to use HA for your cache.

        1. 9

          Memcached is definitely not abandonware. It’s a mature project with a narrow scope. It excels at what it does. It’s just not as feature rich as something like Redis. The HA story is usually provided by smart proxies (twemcache and others).

          1. 8

            It’s designed to be a cache, it doesn’t need an HA story. You run many many nodes of it and rely on consistent hashing to scale the cluster. For this, it’s unbelievably good and just works.

            1. 3

              Seems like Hazelcast is the successor of memcached: https://hazelcast.com/use-cases/memcached-upgrade/

              1. 3

                I would put it with a little bit more nuance: if you already have Redis in production (which is quite common), there is little reason to add memcached too and take on complexity / new software you may not have as much experience with.

                1. 1

                  this comment is ridiculous

                  1. 1

                    it’s pretty much abandonware at this point

                    I was under the impression that Facebook uses it extensively; I guess Redis it is.

                    1. 10

                      Many large tech companies, including Facebook, use Memcached. Some even use both Memcached and Redis: Memcached as a cache, and Redis for its complex data structures and persistence.

                      Memcached is faster than Redis on a per-node basis, because Redis is single-threaded and Memcached isn’t. You also don’t need “built-in clustering” for Memcached; most languages have a consistent hashing library that makes running a cluster of Memcacheds relatively simple.

                      If you want a simple-to-operate, in-memory LRU cache, Memcached is the best there is. It has very few features, but for the features it has, they’re better than the competition.
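                      That client-side consistent hashing can be sketched roughly like this (a toy ring; real clients such as ketama-style libraries add weighting and smarter hashing, and the node addresses here are made up):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring for picking a memcached node per key."""

    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` points on the ring to smooth out
        # the key distribution.
        self.keys = []   # sorted hash points
        self.ring = {}   # hash point -> node
        for node in nodes:
            for i in range(replicas):
                h = self._hash(f"{node}:{i}")
                bisect.insort(self.keys, h)
                self.ring[h] = node

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        # Walk clockwise to the first ring point at or after the key's hash.
        h = self._hash(key)
        idx = bisect.bisect(self.keys, h) % len(self.keys)
        return self.ring[self.keys[idx]]

# Hypothetical node addresses:
ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
node = ring.get_node("user:42")   # the same key always maps to the same node
```

                      Adding or removing a node then only remaps the keys on the affected arcs of the ring, rather than reshuffling everything.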

                      1. 1

                        Just as an FYI, most folks run multiple Redis processes per node (CPU count minus one is pretty common), so the “single process” thing is probably moot.

                        1. 5

                          N-1 processes are better than nothing, but they don’t usually compete with multithreading within a single process, since there can be overhead costs. I don’t have public benchmarks for Memcached vs Redis specifically, but at a previous employer we did internally benchmark the two (since we used both, and it would be in some senses simpler to just use Redis), and Redis had higher latency and lower throughput.

                          1. 2

                            Yup. Totally. I just didn’t want people to think that there’s all of these idle CPUs sitting out there. Super easy to multiplex across em.

                            Once you start wanting to do more complex things / structures / caching policies, then it may make sense to move to Redis.

                            1. 1

                              Yeah agreed, and I don’t mean to hate on Redis — if you want to do operations on distributed data structures, Redis is quite good; it also has some degree of persistence, and so cache warming stops being as much of a problem. And it’s still very fast compared to most things, it’s just hard to beat Memcached at the (comparatively few) operations it supports since it’s so simple.

                  1. 6

                    Trying to decide if I want to take a new job. It’s agonizing, not the least of which because I’m always terrified of letting people down or not pulling my weight or whatever.

                    (My current employer is amazing and they went on to even match the offer from the other company, which is flattering…but the other offer is working with some of the biggest names in the world for what I do…but I work with one of my best friends where I’m at…but the other place looks to be more stable…but I’m not sure I’m talented enough to make it there…I’m worried my current company will be in trouble without me…)

                    1. 3

                      the other place looks to be more stable

                      Ride the train to the end of the tracks. Then change trains. That’s what I’m doing.

                      1. 3

                        I’m leaning that way too but I suppose I should point out that the other place looks a lot more stable, and I have a lot of people who depend on me financially.

                      2. 3

                        If you wanna bounce it off someone else in infosec lemme know … I, for one, generally follow the path of “small fish in a big pond”… as I’d rather learn a ton and feel like an idiot to help push myself.

                        1. 2

                          I’m in the exact same boat. Love my current employer, work with three of my best friends, but I just can’t stand the job I’m doing anymore. I talked to my boss and he said he can’t lose me, so he’s given me basically free rein to do whatever I want, but what I want to do doesn’t fit the team I work on. But I’ve never worked professionally doing the other job, so am I good enough to get hired at another company? Or will they just spit me out, and I’m back to doing a job I’m good at but hate every minute of?

                          Burnout combined with imposter syndrome combined with social anxiety. Not a great mix, but that’s what I’m spending my weekend worrying about.

                          1. 3

                            Burnout combined with imposter syndrome combined with social anxiety.

                            Holy crap I’m looking in a mirror.

                          2. 2

                            I’m not sure I’m talented enough to make it there

                            The people who’ve offered you the job think you are talented enough, and they’ve managed to attract these other big names, so that’s gotta be worth something 😀

                          1. 2

                            Data engineering the first 4 days:

                            • pushing the last two oauth2_proxy fronted endpoints up for employee-restricted HTTPS services, and writing a short note internally for how it was done,
                            • moving more parts of a Fargate-based data pipeline into Airflow,
                            • moving from single instance Airflow deployment to multi

                            Statistics the last day:

                            • Fixing a distributed lag time series model on health facility visits and state-actor violence so that it better accounts for seasonality and for dependent data (I’d been using a quasipoisson model, but on a per-observation instead of per-village level. So, for that, we need to be using random effects.)
                            • Still trying to catch up on small area estimation and how we’re using it for humanitarian crisis surveys.
                            1. 1

                              For the multi-instance Airflow, can you elaborate on what you did? Like adding more workers? Separate segments (dev vs stage vs prod, etc.)? Or functionally separate instances?

                              1. 1

                                Sure! It’s nothing too special but:

                                • Earlier this month, we had just one airflow pod running in an EKS Kubernetes cluster in AWS. We’ve since split that into one prod and one staging EKS cluster, each in its own VPC, each with Airflow web UI + LocalExecutor procs running in a single pod deployment, backed by SQLite.
                                • Next, for both the prod and staging cluster, we’ll deploy instead:
                                  • rabbitmq, backed by a persistent volume
                                  • An RDS PostgreSQL DB
                                  • one airflow deployment for the webui
                                  • one airflow deployment for a small number (2 to 4 initially) of celery workers

                                And keep HTTPS inbound running through as it does already:

                                ELB -> nginx ingress controller -> oauth2 proxy -> airflow webui service
                                

                                We’ve gone for Celery and Rabbit since Celery seems like the best supported of Airflow’s executors, and Rabbit as the broker for Celery because I’ve found it a bit easier to supervise than Redis.

                                AWS resources are deployed by Terraform; K8S resources are for the moment just deployed by a shell script running kubectl apply -f.
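                                For reference, the relevant airflow.cfg fragment for that CeleryExecutor + RabbitMQ + RDS setup looks roughly like this (hostnames and credentials are placeholders, and the key names are from the Airflow 1.x config layout):

```ini
[core]
executor = CeleryExecutor
; Hypothetical RDS endpoint and credentials:
sql_alchemy_conn = postgresql+psycopg2://airflow:secret@rds-host:5432/airflow

[celery]
; Hypothetical rabbitmq service name:
broker_url = amqp://airflow:secret@rabbitmq:5672//
result_backend = db+postgresql://airflow:secret@rds-host:5432/airflow
```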

                                1. 1

                                  Awesome, thx. We hadn’t jumped to k8s for the operators yet, as EKS wasn’t supported in Frankfurt until very recently (and we’re very concerned about GDPR, etc.). Just for the record, Redis has been pretty fire-and-forget for us, whereas I’ve had odd stability issues with Rabbit in the past (> a year ago).

                                  1. 1

                                    Thanks! Yes, I’d prefer redis myself, and appreciate the nudge back to it. Seems like it’s equally well supported these days as a celery broker; I’d been supporting older stuff in the past.

                            1. 4

                              I recently deployed Drone as my self-hosted CI solution, so I’m working on a blog post about how I arrived there, and also on a few plugins for it, to automate/simplify some of the things I need it to do.

                              1. 2

                                Look forward to reading. Just looking at drone vs concourse vs gocd now.

                                1. 1

                                  Post is now up, concourse & gocd included (though, neither in depth, because they failed to meet my requirements quite early :/).

                              1. 4

                                Building a multi-tenant job scheduler for an ETL pipeline. We need to strike a balance between reasonable resource utilization and fairness. We can’t have one user monopolizing all our resources. I was pretty surprised I couldn’t find a reasonable open source or COTS solution that I could just plug into our existing pipeline. Everything I saw was for some specific pipeline system.

                                1. 1

                                  I think that’s because most pipelines should have some sort of workload management built into them. Like airflow has priority weight (which works fairly well if you build it into feedback loops, etc). What system(s) are you using in your pipeline?

                                  1. 2

                                    Airflow was actually the first workflow system we evaluated. We needed to trigger one-time jobs based on messages in a queue - i.e. kafka or SQS, and that seemed not at all straightforward to do at the time we looked (about two years ago). It seemed like you would have to have a long-running job that polled for messages.

                                    The other thing is that the priority weighting needs to be dynamic - to support something like round-robin per customer. That also seemed like it would not be a good fit. The workflow system we’re currently using is nextflow but with a custom system to kick jobs off.
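                                    The dynamic, per-customer round-robin can be sketched like this (an illustrative toy, not our actual scheduler; tenant names and job payloads are placeholders):

```python
from collections import OrderedDict, deque

class FairScheduler:
    """Round-robin across per-tenant queues so one tenant can't starve
    the rest (toy sketch, not a production scheduler)."""

    def __init__(self):
        self.queues = OrderedDict()   # tenant -> deque of pending jobs

    def submit(self, tenant, job):
        self.queues.setdefault(tenant, deque()).append(job)

    def next_job(self):
        # Take from the tenant at the front of the rotation, then send
        # them to the back; drop tenants whose queues have drained.
        while self.queues:
            tenant, q = next(iter(self.queues.items()))
            self.queues.move_to_end(tenant)
            if q:
                return tenant, q.popleft()
            del self.queues[tenant]
        return None
```

                                    Static priority weights (Airflow-style) can be layered on top, but the rotation itself is what keeps one customer from monopolizing the workers.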

                                1. 8

                                  work:

                                  – getting pmacct <> rabbitmq <> influxdb working, then putting a nice frontend on top of it

                                  – auditing a Palo Alto install the MSP royally boned on the migration. I know BGP is somewhat obtuse on PANOS but…no excuse. in pre-sales you, unprompted, mentioned having one of four experts qualified to configure whatever is after the top of the line 5000 series. c'mon!

                                  – quickly utilizing the last six days of my Azure $200/30 day credit to boot OpenBSD, get IPsec tunnels with BFD running, and do some iperf tests between regions for a PoC

                                  – setup graylog to ingest wireless controller and firewall logs and make nice dashboards for front line support network troubleshooting

                                  fun work:

                                  – continue building the class outline and course work for a “Python for network engineers” class (a working title, as the name is already in heavy use by Kirk Byers)

                                  – lots of unikernel stuff. Kafka as a unikernel, pmacct as a unikernel. getting rumpkernels to boot with vmm on OpenBSD. getting ExaBGP into a unikernel, then doing ‘stress’ testing against OpenBGPd

                                  – osm + packet clearing house IXP list + peeringDB + d3js = transform spreadsheet currently sitting at http://peering.exposed/ (after a particularly whiskey-infused discussion @ RIPE73)

                                  – play with a couple of network verification tools, one I recently read about and one I’ve been reading about: Propane and NetKAT, respectively

                                  1. 1

                                    Is there some particular reason you’re going to rabbitmq first instead of tossing to influxdb via statsd or some such first? You just want to persist bits in flight?

                                    (just curious)

                                    1. 2

                                      mostly because pmacct speaks amqp natively, and slightly because I do not wish to run node.js in this instance.

                                    2. 1

                                      What are you using for ingesting logs from rabbitmq to InfluxDB?

                                      I’m looking forward to Paolo releasing support for Redis.

                                    1. 4

                                      I’m wondering whether they really had to implement everything in order to learn those lessons. The “lessons” are pretty much the widely known basic definitions of the well-defined terms contained within them. It’s not like we’re talking about a chaotic emergent phenomenon that arises from the complex interactions of the pieces in a complex system.

                                      1. 6

                                        The content is originally from 1999.

                                        1. 4

                                          I’m with enobayram – as a hobbyist game programmer in 1999 it was painfully obvious even then that TCP really wouldn’t cut it for any kind of twitch gaming over the internet.

                                          1. 4

                                            Basic texts on networking also taught TCP was used for reliability with UDP for speed. Troubleshooting or warning guides talked about how TCP performance could drop off due to congestion control algorithms. So, I was writing reliable protocols on top of UDP. Until I discovered UDT. :)

                                            Just sounds like the author never discovered this the easy way when learning about network programming. Then, had to discover it the hard way.
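                                            The “reliable protocol on top of UDP” approach is basically sequence numbers, acks, and retransmission of anything unacked. A rough sketch of just the bookkeeping (no sockets, timers, or windowing, which real implementations like UDT add):

```python
class ReliableSender:
    """Tracks frames until they're acked; anything unacked gets resent."""

    def __init__(self):
        self.next_seq = 0
        self.unacked = {}          # seq -> payload awaiting an ack

    def send(self, payload):
        frame = (self.next_seq, payload)   # would be serialized onto a UDP socket
        self.unacked[self.next_seq] = payload
        self.next_seq += 1
        return frame

    def on_ack(self, seq):
        self.unacked.pop(seq, None)        # duplicate acks are harmless

    def pending(self):
        # On a timeout, resend these, oldest first.
        return sorted(self.unacked.items())


class ReliableReceiver:
    """Delivers each sequence number once and acks everything it sees."""

    def __init__(self):
        self.seen = {}             # seq -> payload, dedupes retransmits

    def on_frame(self, frame):
        seq, payload = frame
        self.seen.setdefault(seq, payload)
        return seq                 # the ack to send back
```

                                            A real implementation would drive pending() from a retransmit timer and add windowing and congestion control on top.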

                                            1. 6

                                              To be fair I think part of the issue was also expectations of LAN vs WAN. A “how bad could it be” when the answer turned out to be horrific.

                                              For the record the author has learned his lesson in many ways and is doing fine (I work with him).

                                      1. 21

                                        With the company shutting down, we also wanted to find a new home for our team … We’re excited that the members of our engineering team will be joining Stripe

                                        If they actually went out and found work for the whole engineering team, that’s extraordinary. Bravo.

                                        1. 1

                                          I wonder if they went through a route like this.

                                          1. 3

                                            Offhand, I’d guess not. They both had the same seed rounds from A16Z, and A16Z is very good at relocating talent within its portfolio (source: I’ve worked for like 5 of the portfolio companies).

                                        1. 2

                                          Meetings meetings meetings :)

                                          Have to interview around a half dozen folks. The most interesting thing is we’re quantitatively measuring our TAMs (technical account managers) on how well they demo and, separately, how they handle some field support situations. While it’s nice to hire smart folks, it’s good to be able to do something like this to objectively qualify that their skills are staying sharp.

                                          On the down side, I have to do a fair amount of glue code / integration work with Salesforce this week.

                                          Finally, I get to start putting some ETL pulls together to start building our customer success measurement analytics (time permitting).

                                          1. 3

                                            I really wish we had the full dumps from both the Friend Finder and OPM hacks. It’d be very interesting to correlate the data between Ashley Madison, Friend Finder, and OPM. Clearance-holding cheating spouses with kinky fetishes and STDs = no no.

                                            1. 3

                                              Not true, actually. I have a multitude of friends with SCI who are into kink, etc. It’s generally not considered a big deal. Drug and alcohol issues are considered much worse by DISA.

                                              1. 2

                                                Would you say the same thing if this was a hack of various people’s gmail accounts? This is private data that happens to be owned by a service many people disagree with existing at all. I don’t think we should even joke about going through it just to play judge.

                                              1. 15

                                                Everything old is new again I suppose. Back in the late 90s when this was the popular approach, the downside was you were effectively limited performance-wise to vertically scaling your database as opposed to being able to horizontally scale your application layer.

                                                Also…with the Hickey quote….it could be argued that you’re keeping things simpler by making the primary function of the database to store data…that placing one’s business logic / transforms within the database is increasing the complexity.

                                                Anyways, like anything else, there’s no one clear answer. It’s always good to revisit assumptions, best practices, etc. as times change to see if there’s anything that is ripe for change / can be done better.

                                                1. 15

                                                  Exactly. It is (typically) much easier to scale your application layer than your DB layer. By putting all of this logic in your DB server, you’re causing yourself extra woe when it comes time to replicate.

                                                  Also, the examples here are relatively simple, but once you start trying to do more complex queries purely through stored procedures, you’re again just eating up memory in your precious DB layer. So, let’s say you start to split things up into smaller functions which are called in sequence from your… application layer. And it all quickly breaks down from there.

                                                  There is a reason this is something we used to do.

                                                1. 6

                                                  I’ve been a pretty big fan of jq for a number of things:

                                                  • lots of my logs these days are in json (so they’re machine parseable and human readable). It makes filtering them child’s play.
                                                  • Some of our ETL pipelines have a decent amount of JSON in them, and jq makes pre / post processing pretty quick / short work.

                                                  1. 16

                                                    I appreciate the conversation and that assumptions about Scrum are being challenged. And I agree that Scrum is not a silver bullet.

                                                    But I’ve seen planning poker work. It does not always go like the author’s anecdote. If some dev just barks “20 points? REALLY?” when the team is trying to come to an estimate, that dev is an asshole and you’ve got larger problems.

                                                    And I’ve seen morning standups work too. Someone has to be tasked with keeping the team on task. The conversation needs to be limited to what happened yesterday, what happens today, and what blockers the PM can tackle for the team.

                                                    I’ve seen this work in large organizations. I’m not talking about an 8 person start-up. Just because it’s not universally applicable or successful doesn’t mean it needs to die in a fire.

                                                    (Nice troll style points for the title and the image of “Visual Studio TFS” branded planning poker cards though.)

                                                    1. 6

                                                      Agreed. I also think the article misses massively on the “Why are we supposed to think developers are not business people?” question. It’s more the case that developers are not necessarily subject matter experts on the business subjects. Your US-based developer is going to understand international finance issues better than the international accounting folks? Please tell me more about all the magical unicorns you’ve employed who hold better subject matter expertise than… well… those who work in the subjects.

                                                      1. 7

                                                        I’ve noticed that developers themselves are very prone to the misconception that being good at writing software makes them good at everything that their software deals with. Particularly annoying is when they have some reductive argument that they are convinced is correct because everyone else is clearly just overcomplicating things.

                                                      2. 2

                                                        Also, planning poker isn’t scrum in the same way that syrup isn’t pancakes. Some people use them together, sure. But it’s a pretty weak argument.

                                                        On the other hand, there’s something to be said about how common it is to do “scrum plus” or “scrum but.” (And, indeed, much has been written about this, and a fair bit more coherently as well.)

                                                        It’s both a criticism and a mundane fact that scrum doesn’t reliably fix every organizational misstep within a group and the groups with which it must interact. It’s not a very opinionated framework, and so it tends to attract opinions, both in favor of planning poker and the like, and against.

                                                      1. 10

                                                        Apologies if this formats poorly:

                                                        • 1 -> Let the team member determine timing. You don’t want a meeting that is compulsory which is seen as a time sink.
                                                        • 2 -> Your job is no longer to code / perform technical things (aside from traffic management). Your job is to make your team members better and let them do their jobs. That means you should be pushing back on product management and helping your team to do what makes sense. You can still do some technical things by (for example) performing code review and design review (to help improve your team); however, it’s probably more helpful to your team to let them do their jobs and eliminate obstacles.
                                                        • 3 -> Always be on time to things. If you’re late that means whoever you’re meeting with is not important.
                                                        • 6 -> Keep your hiring bar high. Period. Once you let your bar down to just “get things done” you are admitting that you are choosing faster and cheaper over high quality (in the infamous triangle of faster, cheaper, quality…pick two). When you hire to get things done, you are incurring talent debt.
                                                        • 7 -> Retention is more important than hiring IMO. The folks you keep are the ones who are determining your culture. The people you hire are the people who change your culture. It’s also typically cheaper to retain people than to hire people.
                                                        • 8 -> I don’t believe in once a year formal performance reviews as they commonly exist. I would rather have 360 reviews with my folks every 6 weeks or so. Here’s where I think you’re doing very well. Here’s where I think you can improve. Here are a couple concrete things you’ve done recently and how I feel about them (good or bad). Now let’s talk about how I am performing for you…or how you feel others can improve/are doing well. I like to think of it like the agile world. I’d rather things being iterative as opposed to “Remember that project you worked on 9 months ago…here’s what you did wrong there.”
                                                          • 9 -> Increasing whose quality and productivity? Yours? Your team members’? Your entire team’s? Those all require different tools and approaches. The only thing I’ve done a lot recently which I feel like most places don’t do is pay down technical debt on a regular basis (be it 1 week per month or 1 out of every N sprints, etc.). Schedule it and do it. Otherwise, I feel there tends to be a lot of lip service to paying down technical debt and not enough doing of it (until it becomes much larger than it needs to be).
                                                        • 10 -> I like the 20% time but I’ve never been able to do this well. Closest thing was a sprint every N which was “a free sprint”. It wasn’t as successful as I wish it had been. I’d love to hear how people do this well (and not just the 2 day hackathon every quarter like some places do).
                                                        1. 1

                                                          1 -> Let the team member determine timing. You don’t want a meeting that is compulsory which is seen as a time sink.

                                                          I disagree with this sentiment. 1-on-1’s are really important and a lot of people either won’t own up to wanting them or won’t realize how important they are. Making it a regular thing (ours are every 2 weeks) means someone doesn’t have to feel weird or out of place asking for a meeting with their manager. I think a lot of people don’t naturally feel comfortable doing that and a regular, compulsory, meeting makes it much easier.

                                                          Most people won’t tell their manager what is on their mind naturally, you have to force it out of them.

                                                          1. 2

                                                            While I agree that it’s easy to underestimate the value of 1-on-1’s as an engineer being managed, it is valuable to give the engineer some input in what frequency is ideal. It helps to make clear that 1-on-1’s don’t need to be super-structured if there is not much to talk about, just grabbing a coffee some weeks and talking about the family can be just as nice as airing complaints about peers or blockers on other weeks.

                                                            As a manager, there is a lot you can glean from casual conversation with your direct reports—about their happiness, productivity, ambitions.

                                                        1. 2

                                                          I’m embarrassed to say! I’m implementing some really simple automated trading code. And it’s taking me fucking forever because I don’t know Java or automated trading systems.

                                                          1. 2

                                                            I love the JVM, but you should be careful about using anything with stop the world pauses for super-low-latency systems.

                                                            1. 1

                                                              There’s ways around it. In particular check out this guy’s blog: http://vanillajava.blogspot.com/ . From the same guy who wrote OpenHFT and Chronicle (Peter Lawrey).

                                                              1. 1

                                                                Agreed (except that I admire the JVM without actually liking it), but we aren’t doing anything super-low-latency.

                                                            1. 1

                                                                I don’t get it. Tracky posts all sorts of stuff that is available in the HTTP headers anyway. There are still multiple invisible pixels (at least on the home page).

                                                              1. 1

                                                                  If I recall, they did that (duplication of the HTTP header info) so they had essentially denormalized records (everything in one flat record) in the JSON logs; it tends to make certain kinds of analysis quicker / easier when you get to the analytics systems.

                                                              1. 1

                                                                “One question begged of Big Data has been – is anybody actually handling data big enough to merit a change to NoSQL architectures?”

                                                                1. 1

                                                                    I think part of the issue is that volume (aka size) is only one of the 4 Vs. I would think that velocity will end up having more of an impact on the architectures because of how some of the consensus algorithms end up working (well… velocity in combination with distribution (think transatlantic / transpacific) in combination with volume (larger clusters)).