Trying (for what feels like the umpteenth time) to get my homelab k8s cluster up and running with either Flux or ArgoCD.
Gonna be taking the k8s route myself very soon as I'm starting to deal with it at work. I'll probably do a single-node k3s cluster to get acquainted with kubectl, then do a multi-node full-blown cluster at some point.
Thanks! There’s a lot to learn. My setup is also based on k3s, and the k8s-at-home GitHub org (and Discord) has been a really valuable resource.
If you’re ever thinking of playing around with it before running it on physical hardware, I’ve been using Kind to experiment - it essentially starts the cluster inside a Docker container and configures kubectl to talk to it. I think some people use Minikube to do something similar in a VM.
Thanks for those resources! I learned about KIND back in June, didn’t even know it was a thing. I’ve yet to try it, but it’s on my list. Gotta get at least a single node up first.
I joined the k8s at home discord and will probably drag a friend of mine in there with me as he’s interested in this stuff too.
“kind” made a lot of things … simple to approach, I felt
@mxp can you explain a bit the benefits this offers over embedding one of the JMESPath libraries in your software directly? https://jmespath.org/libraries.html shows well tested libraries for most popular programming languages.
I hear about “rule engines” but I’m a bit stymied to understand when I want one, the concept is too abstract for me.
You can definitely embed the logic and take on the responsibility of updating and managing rules, including deployments. I believe it’s the same question as why have memcached when you can embed a cache library. The obvious advantages are reusability and centralized rules management, and the one I believe the security folks on your team will like most is sandboxing. A bug in an in-process library has a broader attack surface than one in an isolated sidecar or server.
I’m envisioning this more as a Redis of expression evaluation, where I provide a broad set of very commonly used functions and operators (for example, I already added support for regex matching). JMESPath is the entry point; the value lies in the operations and the broad range of things rules can do. Just as you could have implemented your own version of Redis with all its data structures, having it out of the box makes it so much easier, and reusable.
I get that, but this is so many orders of magnitude slower than linking in a JMESPath library or even running it as a separate local process. And these sorts of tests/queries can easily become bottlenecks in a larger operation. (For example, the “map function” in CouchDB, evaluated using an external JS process.)
I would respectfully disagree; the map function for larger queries is a totally different case, especially the CouchDB part with external evaluation.
I can give you a counterexample: Redis & Lua, which has been out in production for years now, and I’ve seen people successfully deploy production-grade apps on it. I gave a rationale for why you might centralize this in a comment above, and explained how I plan on having a batch call to avoid multiple round-trips. That, combined with the sidecar approach, should be able to give you sub-millisecond responses (as in the benchmarks).
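The batch-call idea can be sketched roughly as follows. This is purely illustrative: `evaluate_rule`, `evaluate_batch`, and the tuple rule format are made up for the sketch, not the project's actual API; the point is just that one request can carry many rules and one response all the results.

```python
# Illustrative only: one round-trip amortized over N rules
# instead of N separate round-trips to the rule server.

def evaluate_rule(rule, document):
    """Stand-in for one server-side expression evaluation."""
    field, op, expected = rule
    value = document.get(field)
    if op == "eq":
        return value == expected
    if op == "gt":
        return value is not None and value > expected
    raise ValueError(f"unknown operator: {op}")

def evaluate_batch(rules, document):
    """One request carries many rules; one response carries all results."""
    return [evaluate_rule(r, document) for r in rules]

doc = {"country": "US", "cart_total": 120}
rules = [
    ("country", "eq", "US"),
    ("cart_total", "gt", 100),
    ("cart_total", "gt", 500),
]
print(evaluate_batch(rules, doc))  # → [True, True, False]
```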
Some common places to use them could be things like:
You can often think of them as glorified Boolean if-thens. Right now I use them in a variety of situations:
Is that a bit more concrete? The “promise” you often hear is that they allow domain experts to define the rules and actions so those don’t need to be put directly into code.
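A minimal sketch of the “glorified Boolean if-then” idea, with rules kept as data that a domain expert could plausibly edit. All field names and the rule format here are made up for illustration:

```python
# Rules as data: condition/action pairs a non-programmer could maintain
# (e.g. as JSON), evaluated by a tiny generic engine.
rules = [
    {"when": {"plan": "free", "requests_over": 1000}, "then": "throttle"},
    {"when": {"plan": "pro", "requests_over": 100000}, "then": "notify"},
]

def first_action(rules, event):
    """Return the action of the first rule whose conditions all hold."""
    for rule in rules:
        cond = rule["when"]
        if (event.get("plan") == cond["plan"]
                and event.get("requests", 0) > cond["requests_over"]):
            return rule["then"]
    return None

print(first_action(rules, {"plan": "free", "requests": 2500}))  # → throttle
print(first_action(rules, {"plan": "pro", "requests": 50}))     # → None
```

Changing behavior then means editing the rule data, not redeploying code, which is the usual selling point.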
Testing Prefect to see if it’s worth it to use instead of Airflow. Besides that I will continue my reading on ‘Where Wizards Stay Up Late: The Origins Of The Internet’.
I found the move to 2.x solved a number of pain points for our deploys. I’m not annoyed enough to move to something else yet, but I’ll be curious to hear your comparative thoughts (I also have folks who’ve espoused Dagster instead).
I wish all the best to lobste.rs readers (: Wesołych Świąt (Merry Christmas) to everyone!
Szczęśliwego Nowego Roku (Happy New Year)!
The best SRE recommendation around Memcached is not to use it at all:
Don’t use memcached, use redis instead.
(I do SRE and systems architecture)
… there was literally a release yesterday, and the project is currently sponsored by a little company called …[checks notes]… Netflix.
Does it do everything Redis does? No. Sometimes having simpler services is a good thing.
SRE here. Memcached is great. Redis is great too.
HA has a price (Leader election, tested failover, etc). It’s an antipattern to use HA for your cache.
Memcached is definitely not abandonware. It’s a mature project with a narrow scope. It excels at what it does. It’s just not as feature rich as something like Redis.
The HA story is usually provided by smart proxies (twemproxy and others).
It’s designed to be a cache, it doesn’t need an HA story. You run many many nodes of it and rely on consistent hashing to scale the cluster. For this, it’s unbelievably good and just works.
Seems like Hazelcast is the successor to Memcached.
I would put it with a bit more nuance: if you already have Redis in production (which is quite common), there is little reason to add Memcached too and take on the complexity of new software you may not have as much experience with.
this comment is ridiculous
it’s pretty much abandonware at this point
I was under the impression that Facebook uses it extensively; I guess Redis it is.
Many large tech companies, including Facebook, use Memcached. Some even use both Memcached and Redis: Memcached as a cache, and Redis for its complex data structures and persistence.
Memcached is faster than Redis on a per-node basis, because Redis is single-threaded and Memcached isn’t. You also don’t need “built-in clustering” for Memcached; most languages have a consistent hashing library that makes running a cluster of Memcacheds relatively simple.
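The consistent-hashing approach mentioned above can be sketched like this: keys map onto a ring of node points, so adding or removing a Memcached node only remaps a small fraction of keys. This is a toy ring for illustration; real client libraries (e.g. ketama-based ones) add weighting and tuning.

```python
# Toy consistent-hash ring: each node gets many "virtual" points on the
# ring to smooth the key distribution; a key belongs to the first node
# point at or after its own hash (wrapping around).
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, replicas=100):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [p for p, _ in self._ring]

    def node_for(self, key: str) -> str:
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["cache1:11211", "cache2:11211", "cache3:11211"])
print(ring.node_for("user:42"))  # always the same node for the same key
```

Because cache1's and cache2's points don't move when cache3 is dropped, only keys that hashed to cache3 get remapped, which is what makes resizing the cluster cheap.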
If you want a simple-to-operate, in-memory LRU cache, Memcached is the best there is. It has very few features, but for the features it has, they’re better than the competition.
Just as an FYI, most folks run multiple Redis processes per node (CPU count minus one is pretty common), so the “single process” thing is probably moot.
N-1 processes is better than nothing but it doesn’t usually compete with multithreading within a single process, since there can be overhead costs. I don’t have public benchmarks for Memcached vs Redis specifically, but at a previous employer we did internally benchmark the two (since we used both, and it would be in some senses simpler to just use Redis) and Redis had higher latency and lower throughput.
Yup. Totally. I just didn’t want people to think that there’s all of these idle CPUs sitting out there. Super easy to multiplex across em.
Once you start wanting to do more complex things / structures / caching policies, then it may make sense to move to Redis.
Yeah agreed, and I don’t mean to hate on Redis — if you want to do operations on distributed data structures, Redis is quite good; it also has some degree of persistence, and so cache warming stops being as much of a problem. And it’s still very fast compared to most things, it’s just hard to beat Memcached at the (comparatively few) operations it supports since it’s so simple.
Trying to decide if I want to take a new job. It’s agonizing, not the least of which because I’m always terrified of letting people down or not pulling my weight or whatever.
(My current employer is amazing and they went on to even match the offer from the other company, which is flattering…but the other offer is working with some of the biggest names in the world for what I do…but I work with one of my best friends where I’m at…but the other place looks to be more stable…but I’m not sure I’m talented enough to make it there…I’m worried my current company will be in trouble without me…)
the other place looks to be more stable
Ride the train to the end of the tracks. Then change trains. That’s what I’m doing.
I’m leaning that way too but I suppose I should point out that the other place looks a lot more stable, and I have a lot of people who depend on me financially.
If you wanna bounce it off someone else in infosec, lemme know … I, for one, generally follow the path of “small fish in a big pond” … as I’d rather learn a ton and feel like an idiot if it helps push myself.
I’m in the exact same boat. Love my current employer, work with three of my best friends, but I just can’t stand the job I’m doing anymore. I talked to my boss and he said he can’t lose me, so he’s given me basically free rein to do whatever I want, but what I want to do doesn’t fit the team I work on. But I’ve never worked professionally doing the other job, so am I good enough to get hired at another company? Or will they just spit me out and I’m back to doing a job I’m good at but hate every minute of?
Burnout combined with imposter syndrome combined with social anxiety. Not a great mix, but that’s what I’m spending my weekend worrying about.
Burnout combined with imposter syndrome combined with social anxiety.
Holy crap I’m looking in a mirror.
I’m not sure I’m talented enough to make it there
The people who’ve offered you the job think you are talented enough, and they’ve managed to attract these other big names, so that’s gotta be worth something 😀
Data engineering the first 4 days:
Statistics the last day:
For the multi-instance Airflow, can you elaborate on what you did? Like adding more workers? Separate segments (dev vs stage vs prod, etc.)? Or functionally separate instances?
Sure! It’s nothing too special but:
And keep HTTPS inbound running through as it does already:
ELB -> nginx ingress controller -> oauth2 proxy -> airflow webui service
We’ve gone with Celery and Rabbit, since Celery seems like the best supported of Airflow’s executors, and Rabbit as the broker for Celery because I’ve found it a bit easier to supervise than Redis.
AWS resources are deployed by Terraform; k8s resources, for the moment, are just deployed by a shell script running kubectl apply -f.
kubectl apply -f
Awesome, thx. We hadn’t jumped to k8s for the operators yet, as EKS wasn’t supported in Frankfurt until very recently (and we’re very concerned about GDPR, etc.). Just for the record, Redis has been pretty fire-and-forget for us, whereas I’ve had odd stability issues with Rabbit in the past (> a year ago).
Thanks! Yes, I’d prefer redis myself, and appreciate the nudge back to it. Seems like it’s equally well supported these days as a celery broker; I’d been supporting older stuff in the past.
I recently deployed Drone as my self-hosted CI solution, so I’m working on a blog post about how I arrived there, and also on a few plugins for it, to automate/simplify some of the things I need it to do.
Looking forward to reading it. Just looking at Drone vs Concourse vs GoCD now.
Post is now up, Concourse & GoCD included (though neither in depth, because they failed to meet my requirements quite early :/).
Building a multi-tenant job scheduler for an ETL pipeline. We need to strike a balance between reasonable resource utilization and fairness. We can’t have one user monopolizing all our resources. I was pretty surprised I couldn’t find a reasonable open source or COTS solution that I could just plug into our existing pipeline. Everything I saw was for some specific pipeline system.
I think that’s because most pipelines should have some sort of workload management built into them. Like airflow has priority weight (which works fairly well if you build it into feedback loops, etc). What system(s) are you using in your pipeline?
Airflow was actually the first workflow system we evaluated. We needed to trigger one-time jobs based on messages in a queue - e.g. Kafka or SQS - and that seemed not at all straightforward to do at the time we looked (about two years ago). It seemed like you would have to have a long-running job that polled for messages.
The other thing is that the priority weighting needs to be dynamic - to support something like round-robin per customer. That also seemed like it would not be a good fit. The workflow system we’re currently using is Nextflow, but with a custom system to kick jobs off.
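A sketch of what per-customer round-robin could look like: one FIFO queue per tenant, plus a rotation of tenants that currently have work, so no single customer monopolizes the workers. The names here are illustrative, not from Nextflow or any particular scheduler.

```python
# Fair round-robin across tenants: each next_job() call serves the
# tenant at the front of the rotation, then sends it to the back if it
# still has pending jobs.
from collections import deque, defaultdict

class FairScheduler:
    def __init__(self):
        self._queues = defaultdict(deque)  # tenant -> pending jobs
        self._rotation = deque()           # tenants with work, in order

    def submit(self, tenant, job):
        if not self._queues[tenant]:       # tenant had no work: join rotation
            self._rotation.append(tenant)
        self._queues[tenant].append(job)

    def next_job(self):
        """Pop one job from the tenant at the front of the rotation."""
        if not self._rotation:
            return None
        tenant = self._rotation.popleft()
        job = self._queues[tenant].popleft()
        if self._queues[tenant]:           # still has work: back of the line
            self._rotation.append(tenant)
        return tenant, job

s = FairScheduler()
for j in ("a1", "a2", "a3"):
    s.submit("alice", j)
s.submit("bob", "b1")
print([s.next_job() for _ in range(4)])
# → [('alice', 'a1'), ('bob', 'b1'), ('alice', 'a2'), ('alice', 'a3')]
```

Note how bob's single job is served after only one of alice's, even though alice submitted three jobs first.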
– getting pmacct <> rabbitmq <> influxdb working, then putting a nice frontend on top of it
– auditing a Palo Alto install the MSP royally boned on the migration. I know BGP is somewhat obtuse on PAN-OS, but… no excuse. In pre-sales you, unprompted, mentioned having one of four experts qualified to configure whatever is after the top-of-the-line 5000 series. C'mon!
– quickly utilizing the last six days of my Azure $200/30 day credit to boot OpenBSD, get IPsec tunnels with BFD running, and do some iperf tests between regions for a PoC
– setup graylog to ingest wireless controller and firewall logs and make nice dashboards for front line support network troubleshooting
– continue building class outline and course work for a “python for network engineers” (a working title as it’s already in heavy use by Kirk Byers)
– lots of unikernel stuff. Kafka as a unikernel, pmacct as a unikernel. getting rumpkernels to boot with vmm on OpenBSD. getting ExaBGP into a unikernel, then doing ‘stress’ testing against OpenBGPd
– osm + packet clearing house IXP list + peeringDB + d3js = transform spreadsheet currently sitting at http://peering.exposed/ (after a particularly whiskey-infused discussion @ RIPE73)
– play with a couple of network verification tools I recently read and have been reading about, respectively: Propane and NetKAT
Is there some particular reason you’re going to rabbitmq first instead of tossing to influxdb via statsd or some such first? You just want to persist bits in flight?
Mostly because pmacct speaks AMQP natively, and slightly because I do not wish to run node.js in this instance.
What are you using for ingesting logs from rabbitmq to InfluxDB?
I’m looking forward to Paolo releasing the Redis support.
I’m wondering whether they really had to implement everything in order to learn those lessons. The “lessons” are pretty much the widely-known basic definitions of the well defined terms contained within them. It’s not like we’re talking about a chaotic emergent phenomenon that arises due to the complex interactions of the pieces in a complex system.
The content is originally from 1999.
I’m with enobayram – as a hobbyist game programmer in 1999 it was painfully obvious even then that TCP really wouldn’t cut it for any kind of twitch gaming over the internet.
Basic texts on networking also taught that TCP was used for reliability and UDP for speed. Troubleshooting and warning guides talked about how TCP performance could drop off due to congestion control algorithms. So I was writing reliable protocols on top of UDP, until I discovered UDT. :)
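The core of “reliable on top of UDP” is just sequence numbers, acks, and retransmission on loss. A toy stop-and-wait sketch over a simulated lossy channel (no real sockets; everything here is illustrative):

```python
# Toy stop-and-wait ARQ: each datagram carries a sequence number and is
# retransmitted until acknowledged. The "network" is a lossy in-memory
# channel, and acks are assumed lossless to keep the sketch short.
import random

def deliver(messages, loss_rate=0.3, seed=7):
    """Return (received (seq, payload) pairs, total datagrams sent)."""
    rng = random.Random(seed)
    received, seq, sends = [], 0, 0
    for payload in messages:
        while True:
            sends += 1
            if rng.random() < loss_rate:   # datagram dropped by network
                continue                   # timeout -> retransmit same seq
            received.append((seq, payload))  # receiver accepts in order
            break                          # ack arrives, move to next seq
        seq += 1
    return received, sends

msgs = ["move:up", "fire", "move:left"]
got, sends = deliver(msgs)
print([p for _, p in got], sends)
```

Real game protocols go further (sliding windows, selective acks, and often dropping stale state updates instead of retransmitting them), but the seq/ack/retransmit loop is the common kernel.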
Just sounds like the author never discovered this the easy way when learning about network programming, and then had to discover it the hard way.
To be fair I think part of the issue was also expectations of LAN vs WAN. A “how bad could it be” when the answer turned out to be horrific.
For the record the author has learned his lesson in many ways and is doing fine (I work with him).
With the company shutting down, we also wanted to find a new home for our team … We’re excited that the members of our engineering team will be joining Stripe
If they actually went out and found work for the whole engineering team, that’s extraordinary. Bravo.
I wonder if they went through a route like this.
Offhand, I’d guess not. They both had seed rounds from A16Z, and A16Z is very good at relocating talent within its portfolio (source: I’ve worked for like 5 of the portfolio companies).
Meetings meetings meetings :)
Have to interview around a half dozen folks. The most interesting thing is we’re quantitatively measuring our TAMs (technical account managers) on how well they demo and, separately, how they handle some field support situations. While it’s nice to hire smart folks, it’s good to be able to do something like this to objectively verify that their skills are staying sharp.
On the down side, I have to do a fair amount of glue code / integration work with Salesforce this week.
Finally, I get to start putting some ETL pulls together to start building our customer success measurement analytics (time permitting).
I really wish we had the full dumps from both the Friend Finder and OPM hacks. It’d be very interesting to correlate the data between Ashley Madison, Friend Finder, and OPM. Clearance-holding cheating spouses with kinky fetishes and STDs = no no.
Not true, actually. I have a multitude of friends with SCI who are into kink, etc. It’s generally not considered a big deal. Drug and alcohol issues are considered much worse by DISA.
Would you say the same thing if this was a hack of various people’s gmail accounts? This is private data that happens to be owned by a service many people disagree with existing at all. I don’t think we should even joke about going through it just to play judge.
Everything old is new again I suppose. Back in the late 90s when this was the popular approach, the downside was you were effectively limited performance-wise to vertically scaling your database as opposed to being able to horizontally scale your application layer.
Also, with the Hickey quote, it could be argued that you’re keeping things simpler by making the primary function of the database to store data - that placing one’s business logic / transforms within the database increases the complexity.
Anyways, like anything else, there’s no one clear answer. It’s always good to revisit assumptions, best practices, etc. as times change to see if there’s anything that is ripe for change / can be done better.
Exactly. It is (typically) much easier to scale your application layer than your DB layer. By putting all of this logic in your DB server, you’re causing yourself extra woe when it comes time to replicate.
Also, the examples here are relatively simple, but once you start trying to do more complex queries purely through stored procedures, you’re again just eating up memory in your precious DB layer. So let’s say you start to split things up into smaller functions which are called in sequence from your… application layer. And it all quickly breaks down from there.
There is a reason this is something we used to do.
I’ve been a pretty big fan of jq for a number of things:
* lots of my logs these days are in JSON (so they’re machine-parseable and human-readable). It makes filtering them child’s play.
* Some of our ETL pipelines have a decent amount of JSON in them, and jq makes short work of the pre/post processing.
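For example, the kind of filter jq makes a one-liner of, like `jq -c 'select(.level == "error")' app.log`, sketched here with Python's stdlib for comparison (the log fields are made up):

```python
# Pull the error records out of JSON-lines logs: the jq one-liner above,
# spelled out with the stdlib json module.
import json

log_lines = [
    '{"level": "info",  "msg": "started"}',
    '{"level": "error", "msg": "db timeout"}',
    '{"level": "info",  "msg": "request ok"}',
]

errors = [rec for rec in map(json.loads, log_lines) if rec["level"] == "error"]
print(errors)  # → [{'level': 'error', 'msg': 'db timeout'}]
```

The jq version wins on brevity at the shell, which is exactly why structured logs plus jq is such a pleasant combination.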
I appreciate the conversation and that assumptions about Scrum are being challenged. And I agree that Scrum is not a silver bullet.
But I’ve seen planning poker work. It does not always go like the author’s anecdote. If some dev just barks “20 points? REALLY?” when the team is trying to come to an estimate, that dev is an asshole and you’ve got larger problems.
And I’ve seen morning standups work too. Someone has to be tasked with keeping the team on task. The conversation needs to be limited to what happened yesterday, what happens today, and what blockers the PM can tackle for the team.
I’ve seen this work in large organizations. I’m not talking about an 8 person start-up. Just because it’s not universally applicable or successful doesn’t mean it needs to die in a fire.
(Nice troll style points for the title and the image of “Visual Studio TFS” branded planning poker cards though.)
Agreed. I also think the article misses massively on the “Why are we supposed to think developers are not business people?” point. It’s more the case that developers are not necessarily subject matter experts on the business subjects. Your US-based developer is going to understand international finance issues better than the international accounting folks? Please tell me more about all the magical unicorns you’ve employed who hold better subject matter expertise than… well… those who work in the subjects.
I’ve noticed that developers themselves are very prone to the misconception that being good at writing software makes them good at everything their software deals with. Particularly annoying is when they have some reductive argument that they are convinced is correct because everyone else is clearly just overcomplicating things.
Also, planning poker isn’t scrum in the same way that syrup isn’t pancakes. Some people use them together, sure. But it’s a pretty weak argument.
On the other hand, there’s something to be said about how common it is to do “scrum plus” or “scrum but.” (And, indeed, much has been written about this, and a fair bit more coherently as well.)
It’s both a criticism and a mundane fact that scrum doesn’t reliably fix every organizational misstep within a group and the groups with which it must interact. It’s not a very opinionated framework, and so it tends to attract opinions, both in favor of planning poker and the like, and against.
Apologies if this formats poorly:
1 -> Let the team member determine timing. You don’t want a compulsory meeting that’s seen as a time sink.
I disagree with this sentiment. 1-on-1’s are really important and a lot of people either won’t own up to wanting them or won’t realize how important they are. Making it a regular thing (ours are every 2 weeks) means someone doesn’t have to feel weird or out of place asking for a meeting with their manager. I think a lot of people don’t naturally feel comfortable doing that and a regular, compulsory, meeting makes it much easier.
Most people won’t tell their manager what is on their mind naturally, you have to force it out of them.
While I agree that it’s easy to underestimate the value of 1-on-1’s as an engineer being managed, it is valuable to give the engineer some input in what frequency is ideal. It helps to make clear that 1-on-1’s don’t need to be super-structured if there is not much to talk about, just grabbing a coffee some weeks and talking about the family can be just as nice as airing complaints about peers or blockers on other weeks.
As a manager, there is a lot you can glean from casual conversation with your direct reports—about their happiness, productivity, ambitions.
I’m embarrassed to say! I’m implementing some really simple automated trading code. And it’s taking me fucking forever because I don’t know Java or automated trading systems.
I love the JVM, but you should be careful about using anything with stop the world pauses for super-low-latency systems.
There are ways around it. In particular, check out this guy’s blog: http://vanillajava.blogspot.com/ - he’s the same guy who wrote OpenHFT and Chronicle (Peter Lawrey).
Agreed (except that I admire the JVM without actually liking it), but we aren’t doing anything super-low-latency.
I don’t get it. Tracky posts all sorts of stuff that is available in the HTTP headers anyway. There are still multiple invisible pixels (at least on the home page).
If I recall, they did that (duplication of the HTTP header info) so they had essentially denormalized records (everything in one flat record) in the JSON logs… it tends to make certain kinds of analysis quicker / easier when you get to the analytics systems.