1. 3

I realize you can’t possibly answer everything (while doing the job itself!), but the peek at the architecture makes me curious about adjacent stuff: hardware, and what Wikipedia’s particular load looks like to a CDN. Just to fire off some random questions:

Does a PoP have relatively cheap boxes or fewer bigger ones? Is a PoP server’s network/disk/CPU/RAM balance far off from a typical app server’s? Is the filesystem layer SSD or HDD? (Would very weakly bet on lower-end SSD, e.g. SATA: no more worries about IOPS, but cheap for an SSD.) Is the size public for any of the PoPs?

Also, given that images, etc. tend to be larger than text but easier in other ways (e.g. they don’t normally need to expire quickly), I wonder how much your cost/complexity is driven by big media files (needing huge storage, etc.) vs. articles (needing more origin fetches?). (That’s not quite even a well-formed question.) I also wonder how those 10% of uncacheable hits break down, e.g. relaying logged-in users’ uncacheable pages vs. actual long-tail article fetches.

Again, I don’t really expect answers, much less complete ones. I hope you at least take the peppering of questions as an indication people find all this stuff interesting. :) And of course much appreciation for what you’re working for as well!

1. 4

Is the filesystem layer SSD or HDD?

We’ve got a mix of cheap SSDs for the OS and good NVMes (Samsung PM1725a/PM1725b) for the on-disk cache. See https://wikitech.wikimedia.org/wiki/Traffic_cache_hardware for the details. That page should indirectly answer some of your qualitative questions too.

Is the size public for any of the PoPs?

Pretty much everything is public. :) On-disk caches are 1.6T per host, see for instance ATS cache usage on this Amsterdam node. We have 16 servers per PoP except for San Francisco and Singapore (12).
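A quick back-of-the-envelope from those figures (my arithmetic, not an official capacity number):

```python
# Rough per-PoP on-disk cache capacity implied by the numbers above:
# 1.6 TB of disk cache per host, 16 hosts per PoP (12 in SFO/Singapore).
CACHE_PER_HOST_TB = 1.6

for servers in (16, 12):
    total_tb = servers * CACHE_PER_HOST_TB
    print(f"{servers} servers -> {total_tb:.1f} TB of disk cache per PoP")
```

So on the order of ~19–26 TB of warm disk cache per PoP, before any in-memory frontend caching.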

given that images, etc. tend to be larger than text but easier in other ways (e.g. they don’t normally need to expire quickly), I wonder how much your cost/complexity is driven by big media files (needing huge storage, etc.) vs. articles (needing more origin fetches?).

Very good question, I’ll keep it in mind for the next article. In brief: we do have two logically distinct cache clusters, one for larger files like images and videos and another for everything else, including html/css/js and the like. The former is called “upload”, the latter “text”. Their VCL configuration is slightly different, see upload vs text, but most importantly the in-memory frontend caches are kept separate given the different access/expiration patterns you’ve mentioned.
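To make the split concrete, here is a toy sketch of the routing decision (the hostname check is my assumption for illustration; the real mapping lives in the Varnish/ATS configuration, not in application code):

```python
# Toy illustration of the "upload" vs "text" cache cluster split
# described above. The exact hostname rule is an assumption for
# illustration; Wikimedia's real routing is done in VCL/ATS config.
def cache_cluster(host: str) -> str:
    """Return which logical cache cluster would serve this hostname."""
    if host == "upload.wikimedia.org":   # large media: images, video
        return "upload"
    return "text"                        # html/css/js and everything else

print(cache_cluster("upload.wikimedia.org"))  # -> upload
print(cache_cluster("en.wikipedia.org"))      # -> text
```

Keeping the two frontends separate means a burst of large media fetches can’t evict the hot article HTML, and vice versa.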

I also wonder how those 10% of uncacheable hits break down, e.g. relaying logged-in users’ uncacheable pages vs. actual long-tail article fetches.

Big difference between text and upload. Traffic for logged-in users, as you guessed, isn’t cacheable and forms the bulk of the ~7% “pass” you see in the breakdown here. For upload, on the other hand, the hitrate is as high as ~96%.

I hope you at least take the peppering of questions as an indication people find all this stuff interesting

This is very useful feedback for the next article, thank you!

1. 2

Thank you! I would have guessed, from the sheer number of views WP gets, that each PoP would need an even bigger pipe than you could fill with 10GbE from 12-16 cache nodes, but poking at the public Grafana for the SFO PoP, it looks like you’re actually plenty well provisioned on that front. Neat!
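For what it’s worth, the aggregate I was sanity-checking looks roughly like this (assuming one 10GbE uplink per cache node, which may not match their actual NIC setup):

```python
# Back-of-the-envelope per-PoP egress ceiling, assuming one 10GbE
# uplink per cache node (my assumption, not a confirmed topology).
NIC_GBPS = 10

for nodes in (12, 16):   # smallest and largest PoP sizes mentioned
    print(f"{nodes} nodes x {NIC_GBPS} GbE -> {nodes * NIC_GBPS} Gbps aggregate")
```

120–160 Gbps of aggregate node bandwidth per PoP is a lot of headroom, which matches what the public Grafana dashboards suggest.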

1. 10

It makes me unreasonably happy that Wikipedia hasn’t succumbed to the trend of using Cloudflare, Cloudfront, or one of the other huge CDNs.

Also, just for fun, I used curl -I to find out what headers Wikipedia returns for a successful request. The returned headers include a GeoIP cookie that did a pretty good job of identifying the region I’m in, including the system’s guess at my country, state, city, and approximate latitude and longitude. I wonder how much it costs Wikipedia to get that information.
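The cookie in question looks roughly like `GeoIP=US:CA:San_Francisco:37.77:-122.41:v4`; the colon-separated layout below is my reading of what `curl -I` returned, not an official specification, so treat this parser as a sketch:

```python
# Parse a Wikipedia-style GeoIP cookie value. The
# country:region:city:lat:lon:version layout is an observed/assumed
# format (from inspecting the Set-Cookie header), not a documented spec.
def parse_geoip(value: str) -> dict:
    country, region, city, lat, lon, version = value.split(":")
    return {
        "country": country,
        "region": region,
        "city": city.replace("_", " "),  # spaces are encoded as underscores
        "lat": float(lat),
        "lon": float(lon),
        "version": version,
    }

info = parse_geoip("US:CA:San_Francisco:37.77:-122.41:v4")
print(info["city"], info["lat"], info["lon"])
```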

1. 4

Indeed. I’m definitely looking forward to the next installments, especially information about Apache Traffic Server (and potentially alternatives considered). Everyone seems happy offloading their CDN workloads to other companies, so I haven’t seen much public content about running your own. How many people really need to pay some company for access to 200+ PoPs and all that fanciness? Clearly not Wikipedia.

Re: location data, it looks like MaxMind offers databases at that granularity for $100 a month. IANAL, but my reading of the licensing info is that Wikipedia would not need a commercial license and could use the MaxMind database at that $100/mo price point:

you may use Geolocation Functionality to customize and target your own ads for your own products and services, surveys, and other content but may not use Geolocation Functionality in connection with a service that customizes or targets any content on behalf of your customers, users, or any third party

1. 5

location data, it looks like MaxMind offers databases at that granularity for $100 a month

That is correct. We also use the netspeed stuff, so that’s ~190 USD a month. See our maxmind puppet configuration and the documentation by the Analytics team for details about what the information is used for.

1. 1

The big question for me is why they upgraded Varnish if they had a working setup with Varnish v3 and they knew that their preferred backend was moved to the proprietary version in v4.

1. 9

Hey! We had to upgrade because v3 wasn’t supported anymore by the Varnish development team, so no more bugfixes. Being a team of 2, we surely did not have the capacity to maintain a Varnish fork on top of all other things. :)

1. 1

Sure, I get that, but it had been working for years. Were you seeing any vulnerabilities or bugs?

1. 7

Frequently, yes. https://phabricator.wikimedia.org/T133866 is just one example, but if you dig into our phab you’ll find plenty more! Plus, of course, the idea is that when a security vulnerability is discovered you want to already be running the supported version. Take into account that upgrading from v3 to v4 was a project that took many months and involved porting hundreds of lines of VCL code; it wasn’t a matter of apt dist-upgrade.

1. 2

Commiserations, that does sound like a bit of a pain. I hope that the new system serves you well :)

2. 2

Personally, I’ve seen Varnish segfault on non-malicious input more than once. Given that, I think it’s implausible to hope that serious security bugs won’t sometimes turn up.

2. 2

Varnish v4 was released in 2014 and v3 went EoL a year after that in 2015. It hasn’t had any security patches etc. from upstream since then.

1. 2

Varnish v4 was released in 2014 and v3 went EoL a year after that in 2015

Correct, and we upgraded in 2016 (one year too late!).

It hasn’t had any security patches etc from upstream since then.

Right, if it’s unsupported, upstream does not provide fixes.

1. 8

The article links here, which contains the phrase “GNU/Systemd”. That really made my day. This is some quality performance art.

1. 3

“catalogue of carnage” did it for me

1. 1

How would that be enforced exactly?

1. 4

As I understand it, the ruling is “storing customer data in the US is not compatible with GDPR compliance”, so it would be enforced using the existing GDPR enforcement regime.

1. 6

Sure, but where can you store a chat conversation between European and US citizens?

1. 4

In Europe

1. 3

On their own devices. Use end-to-end encryption while you still can (but that’s a good question in general)

2. 2

The CLOUD Act seems to be removing the distinction between data stored in the USA versus data stored abroad when it comes to US companies. As far as I understand it, the act in a way extends American jurisdiction to every country where the server of an American company is located, so perhaps a more important thing EU states can do in this regard is not entering CLOUD Act agreements with the US at all? I’m only partially trolling.

3. 0

Why, by giving EU states complete access to their data feeds, of course. I wonder if I’m being paranoid by seeing this as a subtle play for warrantless surveillance?

1. 11

I think it’s far more likely that it will be enforced with the possibility of outlandish fines or loss of market access if found to be in violation of the law. That would (roughly) align with how other data privacy regulations are established in the EU. A gross expansion of warrantless surveillance seems quite unlikely in the EU, as there is a cultural belief that data about one’s self belongs to one’s self, in contrast to the American culture where data about one’s self is typically viewed as belonging to whoever collected the data.

1. 20

In case anyone’s wondering what the deal is here: lots of European countries, especially in Eastern and Central Europe, but also some Western European countries (e.g. Germany), have a bit of a… history with indiscriminate data collection and surveillance. Even those of us who are young enough not to have been under some form of special surveillance are nonetheless familiar with the concept, and had our parents or grandparents subjected to it. (And note that the bar for “young enough” is pretty low; I have a friend who was regularly tailed when he was 12.) And whereas you had to do something more or less suspicious to be placed under special surveillance (which included things like having bugs planted in your house and your phones tapped), “general” surveillance was pretty much for everyone. You could generally expect that conversations in your workplace, for example, would be listened to and reported. With the added bonus that recording and surveillance equipment wasn’t as ubiquitous and cheap as it is today, so it was usually reported by informers.

Granted, totalitarian authorities beyond the Iron Curtain largely employed state agencies, not private companies, for their surveillance operations (at least on their own territory), but that doesn’t mean the very few private enterprises, limited in scope as they were, couldn’t be coopted into any operation. And, of course, the Fascist regimes that flourished in Western Europe for a brief period of time totally partnered with private enterprises if they could. IBM is the notorious example, but there were plenty of others.

Consequently, lots of people here are extremely suspicious about these things. Those who haven’t already experienced the consequences of indiscriminate surveillance have the cautionary tales of those who did, at least for another 20-30 years. If someone doesn’t express any real concern, it’s often either because a) they don’t realize the scope of data collection, or b) they’ve long come to terms with the idea of surveillance and are content with the fact that any amount of data collection won’t reveal anything suspicious.
My parents fall in the latter category – my dad was in the air force, so it’s pretty safe to assume that we were under some form of surveillance pretty much all the time. Probably even after the Iron Curtain fell, too, who knows. But most of us, who were very quickly hushed if we said the wrong thing at a family dinner or whatever because “you can’t say things like that when others are listening”, aren’t fans of this stuff at all.

Edit: Basically, it’s not just a question of who this data belongs to – it’s a pretty deeply-ingrained belief that collecting large swaths of data is a bad idea. The commercial purpose sort of limits the public response, but the only reason that has worked well so far is that, politically, this is a hot potato, so there’s still an overall impression that the primary driving force behind data collection is private enterprise. As soon as there’s some indication that the state might get near that sort of data, tempers start running hot.

1. 5

For more details on this, Wikipedia’s entry on the Stasi, the security service of East Germany, is a great read. The Stasi maintained detailed files (on paper!) on millions of East Germans. Files were kept on shelves, and the shelves were >100 kilometers(!) long when East Germany fell. It is easy to imagine why Facebook’s data collection reminds people of Stasi files.

1. 1

There were some amazing stories floating around in 1989 – like, the Stasi were sneaking across the border into the West to buy shredders, because they couldn’t shred the documents fast enough; and the army of older ladies who have been painstakingly reassembling the bags and bags and bags of shredded documents.

2. 3

To be fair, with powers shifting, companies consolidating, individuals having the same money (and thereby power) as whole governments, individual companies or partnering ones no longer working in individual sectors, governments outsourcing more and more of their stuff (infrastructure (IT & non-IT), security, etc.), and corporations creating pretty much whole towns for their employees and oftentimes their families, they overall become more similar to governments, but usually with fewer guarantees from things like constitutions.

1. 2

Absolutely. There’s been talk of a “minimal state” for decades now, but no talk of a “minimal company”. Between their lack of accountability, the complete lack of transparency, and the steady increase of available funds, I think the leniency we’re granting private enterprises is short-sighted. But that’s a whole other story :).

2. 5

The US actually claims the right to warrantless surveillance of non-US citizens, through FISA. Additionally, through the CLOUD Act, they claim the right to request personal information from US companies, even if this information is not stored on US soil.

Looking at the political side of things, many EU lawmakers are perfectly fine with engaging in a little protectionism for European IT companies, and if EU privacy law makes life difficult for FAANG, that’s perfect. On the other hand, the US is trying to use the world dominance of its IT companies as a way to extend the reach of its justice and surveillance system. Then there are FAANG-paid lobbyists, who keep pushing for treaties that claim the US extends protections to EU citizens’ data, even though it clearly doesn’t. Those don’t last long once they get taken to court. This is why some US tech companies, like Salesforce, are now lobbying for a data protection regime in the US – this would be one way to reconcile the difference.

This is a trade war, and the victims are smaller US companies that shy away from doing business in the EU.

1. 3

It might also just be that having written Homebrew isn’t enough to automatically get a job everywhere.

1. 48

I’ve read through a lot of these kinds of discussions in the last week, and one thing that really strikes me is that they consist almost entirely of white people discussing this.
This seems a bit odd to me because there are plenty of non-white programmers as well. I’d like to think that these people are more than articulate enough to raise these kinds of issues themselves if they have a desire to, but thus far I have not really seen much of that. Quite frankly, I find that the entire thing has more than a bit of a “white saviour” smell to it, and it all comes off as rather patronising. It seems to me that black people are not so fragile that they will recoil at the first sight of the word “master”, in particular when it has no direct relationship to slavery (it’s a common word in quite a few different contexts), but reading between the lines that kind-of seems to be the assumption.

For me personally – as a white person from a not particularly diverse part of the world – this is something where I think it’s much wiser to shut up and listen to people with a life experience and perspective very different from mine (i.e. black people from different parts of the world), rather than try and make arguments for them. I think it’s very unfortunate that in the current climate these voices are not well heard, since both the (usually white) people in favour of and opposed to this are shouting far too loud.

1. 28

It’s called white guilt. Superficial actions like changing CS terms and taking down statues are easy ways to feel better about oneself while avoiding the actual issue (aka: bike-shedding).

1. 5

I had the same thought: this is something that is easy to have an opinion about and feels achievable. That makes it very attractive to take action on, independent of the actual value it has.

1. 8

It is easier to change the name of a git default branch and put that on your CV as an action demonstrating you are not racist than it is to engage in politics and seek to change some of the injustices that still remain.

1. 6

Or to put it really on point: it’s easier for GitHub to talk about changing the default branch name on repos created on GitHub from ‘master’ to ‘main’ than it is for them to cut their contract with ICE.

2. 14

It’s not like you can guess someone’s race from a gravatar. Not to mention, one of the liberating features of the Internet is being able to hide your identity and be treated for what you say instead of what you are. On the flip side, that does mean everybody sees everyone as an adolescent white male. In any case, there’s a black engineer expressing their thanks in the comment section of the OP.

1. 11

I probably wasn’t too clear about this, but I did not guess anyone’s skin colour; I just looked at their profile pictures, names, etc. For example, the author of this post is clearly white, as are the authors of the IETF draft he linked (I did a quick check on this); everyone involved in the Go CL was white, and in the Rubocop discussion everyone was white as well, as far as I could tell – certainly the people who were very much in favour of it at the start. There certainly are non-white people participating – anonymously or otherwise – but in general they seem to be very much a minority voice.

Or, to give an analogy, while I would certainly support something like Black Lives Matter in various ways, I would never speak on the movement’s behalf. It’s simply not my place to do so.

On the flip side that does mean everybody sees everyone as an adolescent white male.

Yeah… that’s true and not great. I try not to make assumptions about the kind of person I’m speaking to, but “talking” to just a name is very contrary to human social interaction, and it’s easy to have a mental picture that’s similar to yourself and those around you. This is kind of what I was getting at: sharing different experiences and perspectives is probably by far the most helpful and constructive thing that can move this debate (as well as several other things) forward, instead of being locked in the shouting match it is today. I have no illusions that this will happen, because far too many people seem far too eager to comment on the matter, and to be honest I’ve been guilty of that as well.

2. 15

If we look back at how visceral the reaction to these types of ideas can be, and especially how that response is so often personally directed, it should be no surprise that someone who feels in any way marginalized or at risk in the software community might be reluctant to speak up.

1. 15

OK, so I think you’re referring to the Reddit Go thread (which was a dumpster fire of “I’m not racist but…” comments; for someone to get so upset about someone else’s internal code base is proof of some underlying issue). Here are some things to think about:

• “It seems entirely white people discuss this”: There’s a really obvious reason for this. Look at Google’s diversity numbers: their value of hiring vs attrition places the number of black people at Google at 3.7%. And yet the census reports 12.1% in the US are African American. Who do you think is going to be discussing this? They’re not here. They can’t be part of this conversation. Worse, black people leave Google faster than other demographics, so even when they get there, they decide they don’t like it and leave. Why would you work hard for your whole life to get a job at Google and then decide to leave? What is it about the software engineering environment that is toxic? Why bother getting upset and making a noise when you’ve already decided it’s hopeless and given up?

• “It has a white savior smell”: It is incumbent on the privileged class to show allyship and help build equality for the underprivileged. It is unacceptable to put on blinkers and go “they’ll work it out”, as it ignores the systemic reasons why inequity exists. A big difference about what is happening now is that white people are going out to the streets and showing their allyship. These protests are very similar to those in Ferguson, except in Ferguson it was all black people. Nothing happened. Now that white people have come out, suddenly people start talking about “movements”. You can’t look to black people in CS and say “you overcome all the systemic problems”, just like we can’t look to women in CS and say “you overcome all the systemic problems and please suck it up when you get battered with toxic behavior, that’s just the way we are lol.” For the privileged class to sit back is for the privileged class to approve of what happens. “White savior” is a weaponized term to say that if you are white, you don’t get to help. Actually, if you are white, you absolutely should be helping.

• “You should listen rather than make arguments for them”: Again, we are back to: who do you listen to? Representation is so horrifically low. The Go thread raised up anyone who identified as black, had the same viewpoint as the mob, and held that viewpoint as representative of the whole black community. You can’t just ask someone on the street and say “there you go, he said it”. You have to talk. And talk. And talk. And talk. To as many people as you can. Over and over again. I am so glad Google has the Black Googlers Network for exactly that sort of discussion.

Names mean something. master/slave has clearly had its time. whitelist/blacklist (as in the Go thread) is unnecessary, a term that we basically invented, and is easily replaced. Would I change master to main? Probably not. But I’m certainly not going to come and say that attempting to move the needle, even if it doesn’t work or the needle moves only a fraction, shouldn’t be attempted.

Anecdote: Google offers a number of optional diversity trainings. I went to one that showed this video. I was in tears. It was so foreign to me and so horrific that I was crying at work and had to leave the room. That video is the result of white America doing nothing.

1. 12

I’m not really referring to the Reddit thread as such. Not only is Reddit really anonymous, so much of the time I have no idea who I’m dealing with; Reddit also has its fair share of… unpleasant… people. On Twitter, Nate Finch mentioned he banned a whole truckload of people who had never posted in /r/golang before coming in from whatever slimepit subreddit they normally hang out in. Unfortunately, this is how things work on Reddit. There were some interesting good-faith conversations, but also a lot of bad-faith bullshit. I was mostly referring to the actual CL and the (short) discussion on that.

As for Google diversity, well, Google is just one company from one part of the world. The total number of developers in India seems comparable to or greater than the number of developers in the US, for example. I’ve also worked with many Brazilian developers over the years, so they also seem to have a healthy IT industry. There are plenty of other countries as well. This is kind of what I meant with the “outside of the Silicon Valley bubble” comment I removed. Besides, just because there are fewer of them doesn’t mean they don’t exist (3.7% is still >4k people) or that I need to argue things in their place. It’s one thing to show your allyship, and I’m all in favour of that, but it’s quite another thing to argue in their place. I have of course not read everything that anyone has written on this topic, but in general, by and large, this is what seems to be happening. This is something that extends beyond the racial issue; I’ve also seen people remove references to things like “silly” as ableist, but it’s not entirely clear to me that anyone is actually bothered by this other than the (undoubtedly well-intentioned) people making the change.
The Go thread raised up anyone who identified as black, had the same viewpoint as the mob and held that viewpoint as representative for the whole black community.

Yeah, this is a problem: “here’s a black person saying something, therefore [..]”. Aside from the fact that I wouldn’t trust such a post without vetting the account that made it (because, you know, /r/AsABlackMan), a single person commenting doesn’t represent anything other than that single person. An initiative from something like the Black Googlers Network would probably be much more helpful than some random GitHub PR with little more than a “please remove oppressive language” truism. If you’re telling people who have been used to these terms for years or decades that all of a sudden they’re racist and oppressive, without any context or explanation, then it’s really not that strange that at least some people are going to be defensive. I really wish people would spend a lot more thought and care on the messaging of this; there is very little effort spent on actually building empathy for any of it; for the most part it’s just… accusations, truisms, shouting. You really need to explain where you’re coming from, otherwise people are just going to be confused and defensive.

2. 4

This seems a bit odd to me because there are plenty of non-white programmers as well, especially if you look beyond the Silicon Valley bubble.

Silicon Valley is full of non-white programmers. White people are somewhat underrepresented in Silicon Valley compared to their percentage of the American population. And of course most of the world is not America.

1. 3

I’ve actually never been to the States, much less Silicon Valley. I just dimly remember reading somewhere that it’s mostly white, but I probably just remembered wrong. I’ll just remove that part since it doesn’t matter for my point and I clearly don’t know what I’m talking about there 😅

1. 4

In my previous company in SV (I was a remote engineer abroad, everybody else US based) we had literally one person on the team who was born and raised in the US; everybody else was from somewhere else. India and China were dominant, but not the only other countries. Other teams looked pretty much the same. The CEO (and founder), the VP of Engineering, and all team leads in Engineering were non-US born and almost all non-white too.

I am now working for a different company with headquarters in SF and it is a bit different. We still have a pretty big mix of backgrounds (I don’t know how to express it better; what I mean is that they are not descendants of white Europeans). We seem to have more people who were born in the US yet are not white. Our European office is more “white” if you will, but still very diverse. At one point we had people from all (inhabited) continents working for us (by place of birth), yet we were only ~30 people in total.

2. 2

Well, it’s full of programmers from Asian countries, to the point where I wouldn’t call their presence diverse. Being a Chinese/Indian/white male isn’t diversity; it’s a little bit more diverse. So while “non-white” is accurate, it’s not really the end game. Software engineering massively underrepresents women and Black and Latinx people.

1. 6

So who exactly sets the rules on what is diverse enough? Is it some committee of US Americans, or how does that work?

1. 1

Ah okay, so here we see the problem. It’s only diversity when there aren’t enough of them; then it stops counting as diversity once you actually have diversity, and the goalposts shift once again.

2. 4

Quite frankly, I find that the entire thing has more than a bit of a “white saviour” smell to it, and it all comes off as rather patronising. It seems to me that black people are not so fragile that they will recoil at the first sight of the word “master”, in particular when it has no direct relationship to slavery (it’s a common word in quite a few different contexts), but reading between the lines that kind-of seems the assumption.

Agreed that black folks are in the main far too sensible to care about this kind of thing. I don’t know that it is really so much about being a ‘white saviour’ (although that may be part of it); rather, I see it more as essentially religious: it is a way for members of a group (in this case, young white people) to perform the rituals which bind the group together and reflect the moral positions the group holds. I don’t mean ‘religious’ here in any derogatory way.

1. 9

Not sure about this specific issue, but in general there’s so much systemic stuff that it’s a bit much to ask black communities alone to speak up for everything. It’s emotionally exhausting if we don’t shoulder at least some of the burden, while at the same time listening to and amplifying existing voices. To be honest I’d never really thought about the ‘master’ name in git before, and I think there might be larger issues we need to tackle, but it’s a pretty low-effort change to make. Regardless, the naming confused me anyway when I first used git and then just faded into the background. I’ll let black people speak up if they think it’s overboard, however, although I’d imagine there’d be different perspectives on this.

1. 3

Not sure about this specific issue, but in general there’s so much systemic stuff that it’s a bit much to ask black communities alone to speak up for everything. It’s emotionally exhausting if we don’t shoulder at least some of the burden, at the same time listening to and amplifying existing voices.

Yeah, I fully agree. I don’t think they should carry all the burden on this, and it’s not just helpful but our responsibility to be supportive both in words and action.
But I do think they should have the initiative. Otherwise it’s just a bunch of white folk sitting around the table musing what black folk could perhaps be bothered by. Maybe the conclusions of that might be correct, but maybe they’re not, or maybe things are more nuanced. 2. 2 Really couldn’t disagree more — one of the big repository hosting services had this discussion just the other week. Much of the agitation came from Black employees, particularly descendants of enslaved Africans brought to America. I agree with you on one count, though: if you’re white and you don’t have any particular investment in this issue, you should probably keep your opinion on it to yourself. 1. 4 Which discussion in particular are you referring to? 1. 2 The idea that this is being primarily driven by white people, specifically as a “white savior” exercise. The word “master” does bring up a painful legacy for lots of Black people, and with the context as muddled as it is with “git master,” it makes sense to defer to them on how they perceive it, especially in an industry where they’re so underrepresented. 1. 3 You mentioned that: one of the big repository hosting services had this discussion just the other week. Much of the agitation came from Black employees So I was wondering if you have a link or something to that discussion? I’d be interested. 1. 3 I wish I had something to share — the conversations have been internal and I wouldn’t want to breach confidentiality (any more than I already have). Once we’ve all forgotten about this, if there’s a blog post to share, I’ll thread it here. 1. 3 Ah cheers, I didn’t realize it was an internal thing. 1. 9 you need something under and acceptable licence, so python is out. What’s wrong with python’s license? This is the first time I’ve heard anyone say there’s issues with it. Also, I think he forgot to mention Rust. Must definitely rewrite everything in Rust. /s 1. 
2

As for the license, python’s license appears fairly similar to Perl’s Artistic License. I would worry a bit about the strong terms in:

This License Agreement will automatically terminate upon a material breach of its terms and conditions.

for which no equivalent is visible in Perl’s license.

1. 2

1. 12

That was fixed in Python 2.0.1, released in June 2001…

1. 1

Those statements seem to be contrarianism of the basic form that highlights the bad aspects of things that are for the most part good. https://www.lesswrong.com/posts/9kcTNWopvXFncXgPy/intellectual-hipsters-and-meta-contrarianism And yes, they are factually correct.

1. 37

At my former employer, for a time I was in charge of upgrading our self-managed Kubernetes cluster in-place to new versions, and found this to eventually be an insurmountable task for a single person to handle without causing significant downtime. We can argue about whether upgrading in-place was a good idea or not (spoiler: it’s not), but it’s what we did at the time for financial reasons (read: we were cheap) and because the nodes we ran on (r4.2xl, if I remember correctly) would often not exist in a quantity significant enough to stand up a whole new cluster and migrate over to it.

My memory of the steps to maybe successfully upgrade your cluster in-place, all sussed out by repeated dramatic failure:

1. Never upgrade more than a single point release at a time; otherwise there are too many moving pieces to handle.
2. Read the change log comprehensively, and have someone else read it as well to make sure you didn’t miss anything important. Also read the issue tracker, and do some searching to see if anyone has had significant problems.
3. Determine how much, if any, of the change log applies to your cluster.
4. If there are breaking changes, have a plan for how to handle the transition.
5. Replace a single master node and let it “bake” as part of the cluster for a sufficient amount of time, not less than a single day.
This gave time to watch the logs and determine if there was an undocumented bug in the release that would break the cluster.
6. Upgrade the rest of the master nodes and monitor, similar to above.
7. Make sure the above process(es) didn’t cause etcd to break.
8. Add a single new node to the cluster, monitoring to make sure it takes load correctly and doesn’t encounter an undocumented breaking change or bug. Bake for some day(s).
9. Drain and replace the remaining nodes, one at a time, over a period of days, allowing the cluster to handle the changes in load over this time. Hope that all the services you have running (DNS, deployments, etc.) can gracefully handle these node changes. Also hope that you don’t end up in a situation where 9/10 of the nodes’ services are broken, but the remaining original service is silently picking up the slack, and hence nothing will fail until the last node gets replaced, at which point everything will fail at once, catastrophically.
10. Watch all your monitoring like a hawk and hope that you don’t encounter any more undocumented breaking changes, deprecations, removals, service disruptions, and/or intermittent failures caused by the interaction of the enormous number of moving parts in any cluster.

There were times that a single point release upgrade would take weeks, if not months, interspersed by us finding Kubernetes bugs that maybe one other person on the internet had encountered and that had no documented solution. After being chastised for “breaking production” so many times despite meticulous effort, I decided that being the “Kubernetes upgrader” wasn’t worth the trouble. After I left, it seems that nobody else was able to upgrade successfully either, and they gave up doing so entirely. This was in the 1.2–1.9 days, for reference, so I wouldn’t be surprised if things are much better now.

1. 33

tl;dr: If you can’t afford 6+ full-time people to babysit k8s, you shouldn’t be using it.

1. 13

Or, at least, not running it on-prem.
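The drain-and-replace loop in step 9 above is the kind of thing worth scripting so every node gets identical treatment. A minimal sketch, with made-up node names: it only *prints* the `kubectl` commands (era-appropriate flags) so the plan can be eyeballed before anything runs.

```python
# Sketch of the rolling node replacement from step 9: cordon, drain and
# replace one node at a time, baking between nodes. This only prints the
# plan rather than executing it; node names are illustrative.
def rolling_replace_plan(nodes: list[str]) -> list[str]:
    cmds = []
    for node in nodes:
        cmds.append(f"kubectl cordon {node}")
        cmds.append(f"kubectl drain {node} --ignore-daemonsets --delete-local-data")
        cmds.append(f"# ...terminate {node} and let a replacement join the cluster...")
        cmds.append("# bake: watch monitoring for a day before touching the next node")
    return cmds

for cmd in rolling_replace_plan(["node-1", "node-2"]):
    print(cmd)
```

The point of generating the plan first is exactly the lesson of the comment above: the failure mode is a silent partial breakage that only surfaces at the last node, so every replacement should be deliberate and observable.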
1. 6

True, if you outsource the management of k8s, you can avoid the full-time team of babysitters, but that’s true of anything. Then you have the outsourcing headache(s), not to mention the cost (you still need someone responsible for the contract, and for interacting with the outsourced team). Outsourcing just gives you different (and, if you selected wisely, fewer) problems.

1. 5

True dat. But every solution to a given problem has trade-offs. Not using Kubernetes in favour of a different orchestration system will give you different problems. Not using orchestration for your containers at all will give you different problems (unless you’re still too small to need orchestration, in which case yes, you should not be using k8s). Not using containers at all will give you different problems. Ad infinitum :)

1. 6

Most companies are too small to really need orchestration.

1. 2

Totally!

2. 2

I keep having flashbacks to when virtualization was new and everyone was freaking out over Xen vs. KVM vs. VMware and how to run their own hypervisors. Now we just push the Amazon or Google button and let them deal with it. I’ll bet in 5 years we’ll laugh about trying to run our own k8s clusters in the same way.

1. 8

Yeah, this is the kind of non-value-added activity that just begs to be outsourced to specialists.

I have a friend who works in a bakery. I learned the other day that they outsource a crucial activity to a contractor: handling their cleaning cloths. Every day, a guy comes to pick up a couple of garbage bags full of dirty cleaning cloths, then drops off the same number of bags full of clean ones. This is crucial: one day the guy was late, and the bakery staff had trouble keeping the bakery clean. The owner lived upstairs and used his own washing machine as a backup, but it could not handle the load.

But the thing is: while the bakery needs this service, it does not need it to differentiate itself. As long as the cloths are there, it can keep on running.
If the guy stops cleaning cloths, he can be trivially replaced with another provider, with minimal impact on the bakery. After all, people don’t buy bread because of how the dirty cloths are handled. They buy bread because the bread is good. The bakery should never outsource its bread making. But the cleaning of dirty cloths? Yes, absolutely.

To get back to Kubernetes and virtualization: what does anyone hope to gain by doing it themselves? Maybe regulation requires it. Maybe there is some special need. I am not saying it is never useful. But for many people, the answer is often: not much. Most customers will not care. They are here for their tasty bread, a.k.a. getting their problem solved.

I would be tempted to go as far as saying that maybe you should outsource one level higher, and not even worry about Kubernetes at all: services like Heroku or Amazon Beanstalk handle the scaling and a lot of other concerns for you with a much simpler model. But at this point you are tying yourself to a provider, and that comes with its own set of problems… I guess it depends.

1. 2

This is a really great analogy, thank you!

1. 2

It really depends on what the business is about: tangible objects or information. The baker’s cloths, given away to a third party, do not include all the personal information of those buying bread. Business-critical information, such as who bought bread, what type and when, is not included in the cloths either. Handing that over would be bad in general, and potentially a disaster if the laundry company were also in the bread business.

1. -7

gosh. so many words to say “outsource, but not your core competency”

1. 1

Nope. :) Despite my verbosity, we haven’t managed to communicate. The article says: do not use things you don’t need (k8s). If you don’t need it, there’s no outsourcing to do. Outsourcing has strategic disadvantages when it comes to your users’ data, entirely unrelated to whether running an infra is your core business or not.
I would now add: avoid metaphors comparing tech and the tangible world, because you end up trivializing the discussion and missing the point.

2. 3

As a counterpoint to the DIY k8s pain: we’ve been using GKE with auto-upgrading nodes for a while now without seeing issues. Admittedly, we aren’t k8s “power users”, mainly just running a bunch of compute-with-ingress services. The main disruption is when API versions get deprecated and we have to upgrade our app configs.

1. 2

I had the same problems with OpenStack :P If it works, it’s kinda nice. If your actual job is not “keeping the infra for your infra running”, don’t do it.

1. 2

Wikimedia Foundation, the non-profit organization behind Wikipedia (Alexa top 5) as well as all sister projects such as Wiktionary, Wikiquote, etc., is hiring Site Reliability Engineers, Application Security Engineers and more. All positions are in San Francisco or remote.

1. 1

Just saw that you’re pretty new there; how is it going?

1. 2

Oh, I have only just seen this comment! I’ve been at WMF for 3.5 years now, and still like it. :)

1. 21

It would also be a way better email if you dropped all the HTML shenanigans. Composing better emails? Plain text should be number 1 on the list.

1. 14

Personally I disagree with this. A proper HTML link is almost always cleaner than “(see link below)”. You can’t underline stuff (you can “put asterisks around stuff” but…). Sometimes you want to just reference an image inline! There’s a reason that word processors are a big business. Laying out a message aesthetically is valuable for human consumption!

The answer to “people always misuse HTML layout” isn’t to get rid of HTML layouts, it’s to teach people how to use them nicely!

1. 2

I totally get what you mean, but emails are not meant for rich text. Link to a shared document (or an HTML page!) if you need to convey information that requires media.

1. 1

I will admit to preferring plain text, but if HTML is used sparingly, it’s fine.
That said…

A proper HTML link is almost always cleaner than “(see link below)”.

This is true, but practically no one does this. Certainly in the 10+ years I’ve been working in corporate environments, no one makes the effort to do a proper link.

The answer to “people always misuse HTML layout” isn’t to get rid of HTML layouts, it’s to teach people how to use it nicely!

That time has come and gone. The only way this is likely feasible is if you change the tools in some way. Teaching people to use something “correctly” when there are a myriad of (easy) ways to use it incorrectly is a losing battle.

2. 10

Sorry to be blunt, but I don’t think it’s good that plain text email is such a shibboleth that you can say the equivalent of “I prefer plain text email” without giving any justification or discussion, and it will be the top comment on an article. There are good arguments against HTML email, and there are good arguments that we should support some form of formatting, whether or not it’s HTML (see the sibling post by rtpg). Whichever view is right, I don’t think it should just be assumed without any attempt to argue for your opinion.

1. 2

I’m all for formatting. I regularly use markdown-style formatting in my plain text mails, and I’m an avid user of references[1] for links.

HTML formatting in emails is an abomination. Period. It’s a hack. It causes all kinds of issues: it enables phishing, automatic “read notifications” that you did not approve of, and difficulties for people who need screen readers, just to name a few. Not to mention the security vulnerabilities in clients that have resulted from trying to support this crap.

[1] Like this.

2. 7

I agree, but this is not realistic in a world where everyone uses Outlook. I swam against the tide and ran mutt at work for years, until one day I missed a critical update from my manager that used rich text to denote something in red. IMO this is a lost cause, but feel free to rage against the dying of the light :)

1.
2

Upvoted for poetry. :)

1. 2

Alternatively, you can go work somewhere where the managers use mutt. :) https://boards.greenhouse.io/wikimedia/jobs/1623040

1. 1

Yup, it’s all about choices. I’m willing to run Outlook as my mailer and deal with a bit of large-corporate white noise because the value I derive from working here far FAR outstrips those minor annoyances. Everybody has to do their own cost/value curve calculations, though.

2. 2

IMO this is a lost cause, but feel free to rage against the dying of the light :)

Don’t worry; I will! https://p.hagelb.org/line.jpg

1. 2

+10 for an entirely apropos ST:TNG

2. 1

It is of course unfortunate that you missed such an important update. This is where it would make sense to use the Subject header to emphasize the importance of the message, such as with [URGENT] or [CRITICAL]. I totally agree that this is a lost cause, but yeah, I will continue to fight for it :)

1. 2

It is of course unfortunate that you missed such an important update. This is where it would make sense to use the Subject header to emphasize the importance of the message, such as with [URGENT] or [CRITICAL].

It had that, but as I said in the post, it was like: Blahblahblah <SUPER CRITICAL STUFF IN RICH TEXT COLORED RED THAT MUTT CAN’T SEE> blahBLAHblahblah. So yeah, no hope at all other than “Don’t use rich text.”

In my work environment, I know of literally maybe 2 people in a team of 150 who use mutt. I don’t know what the stats are for the wider company, but I know it’s a TINY fraction. Expecting my manager to cater to my needs and preferences to this extent is unreasonable in my book.

1. 1

maybe 2 people in a team of 150 who use mutt

OK, but how many are colorblind?

1. 3

You’re preaching to the choir. I personally think leaving critical information to the vagaries of color is a mistake, but it wasn’t my call. I just need to roll with the punches and deal with the technology environment I’m given.
Yes, I know, I could go work over at PERFECT_COMPANY and all manner of things would be well, but having to run Outlook and deal with rich text in my email isn’t enough to blunt what is otherwise a really compelling value prop for me in this job.

1. 7

I feel like it’s worth mentioning that some people feel like using Beamer is a bit of a curse. Nothing makes a presentation less engaging than piles of equations, tiny source code, and bullet points, but that’s precisely what Beamer makes easy to add.

I think some of the javascript libraries for presentations are a better fit as they make it easy to embed videos, animations and transitions that guide the eye to what matters. Unless you need to be able to send someone a pdf of the presentation, I’d hesitate to recommend using this library without large amounts of discipline.

1. 9

I think what’s going on here is that too many people have been sitting in university rooms listening to boring lecturers giving excruciating presentations made with Beamer and filled with hundreds of bullet points.

Not that I’m the biggest Beamer expert out there, but I use it for all my slides and I think the results are pretty good.

I think some of the javascript libraries for presentations are a better fit as they make it easy to embed videos, animations and transitions that guide the eye to what matters.

Animations, videos and transitions can be abused exactly like bullet points. In an effort to escape the boring-lecturer-effect, we should be careful not to err on the side of entertainment and produce presentations filled with animated gifs and almost zero content (I’ve seen many of those too, lately).

1. 2

I think some of the javascript libraries for presentations are a better fit as they make it easy to embed videos, animations and transitions that guide the eye to what matters.

Unless you want to print the slides..?

1. 1

Nothing makes a presentation less engaging than piles of equations, tiny source code, and bullet points, but that’s precisely what Beamer makes easy to add.

At university this has become quite popular. Instead of lecture notes we just get densely populated Beamer presentations, which are neither suited to following during a lecture nor to studying from afterwards.

I think it’s a pity that many of Beamer’s more interactive features beyond \pause are simply forgotten, ignoring seemingly all principles of good presentation-making.

1. 1

I’m not convinced even the advanced features really help. I think what matters is what a tool makes easy to do.

1. 1

This is why I despise Beamer. Also it is a pain to use compared to alternatives.

2. 1

I totally agree! I have used reveal.js with pleasure and success, though I used only a bare minimum of the features, as I find most things in presentation software to be distractions, not attractions.

1. 1

What javascript libraries do you have in mind? I’m a heavy (disciplined) Beamer user and, like @ema, think I produce quality slides, but I am curious about other tools for programmatic presentation generation.

1. 1

Truthfully, these days I use reveal.js with Jupyter notebooks (https://github.com/damianavila/RISE)

I’ve used deck.js, reveal.js and eagle.js. Aside from needing to futz with npm these have all been perfectly adequate. Thanks to MathJax, I can still put in an equation if it’s needed. For some of them you can even use pandoc to generate the html directly from markdown https://pandoc.org/demos.html.

Like I said, if you are disciplined, Beamer can work really well. For me, what counts is what the tool encourages you to do and not to do. From that standpoint, a lot of tools would have trouble outdoing sent.

2. 1

That’s interesting to know. I’m in the process of converting my workshop slides from PowerPoint to beamer. Most of the slides are either code, short definitions, or diagrams, and I wanted to be able to easily find/replace my slides. They’re there to frame the live coding sections, so hopefully the plainness won’t be too much of a problem.

1. 3

Urgh, damn it. I guess I should download Wikipedia while Europeans like me are still allowed to access all of it… It’s only 80 GB (wtf?) anyway.

1. 3

That and the Internet Archive. ;)

Regarding Wikipedia, do they sell offline copies of it so we don’t have to download 80 GB? Seems like it’d be a nice fundraising and sharing strategy combined.

1. 3

I second this. While I know the content might change in the near future, it would be fun to have memorabilia of a digital knowledge base. I regret throwing away the Solaris 10 DVDs that Sun sent me for free back in 2009. I was too dumb back then.

1. 2

It’s a bit out of date, but there’s wikipediaondvd.com, and lots more options at dumps.wikimedia.org.

I wonder how much traffic setting up a local mirror would entail, might be useful. Probably the type of thing that serious preppers do.

1. 1

You can help seeding too.

2. 4

Actually Wikipedia is exempt from this directive, as is also mentioned in the linked article. While I agree that this directive will have a severely negative impact on the internet in Europe, we should be careful not to rely on false arguments.

1. 1

Do you remember the encyclopedias of the 90s? They came on a single CD. 650MB.

1. 5

To be explicit, this is not a “modern systems are bloated” thing. The English Wikipedia has an estimated 3.5 billion words. If you took out every single multimedia, talk page, piece of metadata, and edit history, it’d still be 30 GB of raw text uncompressed.
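As a quick back-of-envelope check on those numbers (my own sketch, not anything from the original comment): the 3.5-billion-word and 30 GB figures imply an average byte count per word that is easy to sanity-check.

```python
# What average byte count per word does the
# "3.5 billion words -> 30 GB uncompressed" estimate imply?
words = 3.5e9
raw_bytes = 30e9
bytes_per_word = raw_bytes / words
print(f"{bytes_per_word:.1f} bytes/word")
```

Roughly 8.6 bytes per word is plausible once you count the spaces, punctuation, and the fact that encyclopedic prose skews toward longer words than everyday English.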

1. 4

Oh that’s not what I was implying. The commenter said “It’s only 80 GB (wtf?)”

I too was surprised at how small it was, but then remembered the old encyclopedias and realized that you can pack a lot of pure text into a fairly small amount of space.

1. 1

Remember that they had a very limited selection, with low-quality images, at least in the ones I had. So it makes sense that there’s a big difference. I feel you, though, on how we used to get a good pile of learning in a small package.

2. 1

30 GB of raw text uncompressed

That sounds like a fun text encoding challenge: try to get that 30GB of wiki text onto a single layer DVD (about 4.6GB?)

I bet it’s technically possible with enough work. AFAIK Claude Shannon experimentally showed that human-readable English carries only about one bit of information per character. Of course there are lots of languages, but each must have some near-optimal encoding. ;)

1. 2

Not even sure it’d be a lot of work. Text packs extremely well; IIRC compression ratios over 20x are not uncommon.

1. 1

Huh! I think gzip usually achieves about 2:1 on ASCII text and lzma is up to roughly twice as good. At least one of those two beliefs has to be definitely incorrect, then.

Okay so, make it challenging: same problem but this time an 700MB CD-R. :)
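Out of curiosity, the gzip-vs-lzma comparison is easy to poke at with Python’s standard library. This is only a rough sketch: the filler text below is synthetic (a made-up ten-word vocabulary), so the absolute ratios won’t match real Wikipedia text, but the relative ordering is what matters.

```python
import gzip
import lzma
import random

# Rough comparison of gzip (DEFLATE) and lzma on English-like filler
# text. The vocabulary and text are invented for illustration; real
# article text compresses differently.
random.seed(0)
vocab = ["the", "encyclopedia", "article", "history", "knowledge",
         "compression", "text", "wiki", "free", "edit"]
raw = " ".join(random.choice(vocab) for _ in range(50_000)).encode()

gz = gzip.compress(raw, compresslevel=9)
lz = lzma.compress(raw, preset=9)
print(f"raw  {len(raw):>8} bytes")
print(f"gzip {len(gz):>8} bytes ({len(raw)/len(gz):.1f}x)")
print(f"lzma {len(lz):>8} bytes ({len(raw)/len(lz):.1f}x)")
```

On repetitive input like this, both do far better than the ~2:1 rule of thumb for ordinary prose, and lzma’s larger window and better modeling consistently beat DEFLATE.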

1. 4

There is actually a well-known text compression benchmark based around Wikipedia, the best compressor manages 85x while taking just under 10 days to decompress. Slightly more practical is lpaq9m at 2.5 hours, but with “only” 69x compression.

1. 1

What does 69x compression mean? Is it just 30 GB / 69 = .43 GB compressed? That doesn’t match up with the page you linked, which (assuming it’s in bytes) is around 143 MB (much smaller than .43 GB).

1. 5

From the page,

enwik9: compressed size of first 10e9 bytes of enwiki-20060303-pages-articles.xml.

So 10e9 = 9.31 GiB. lpaq9m lists 144,054,338 bytes as the compressed output size + compressor (10e9/144,054,338 = 69.41), and 898 nsec/byte decompression throughput, so (10e9*898)/1e9/3600 = 2.49 hours to decompress 9.31GiB.

2. 1

Nice! Thanks.

1. 10

The whole “value gap” theory, on which this proposal is based, is flawed.

The European Commission spent €360.000 to prove that copyright infringement negatively affects sales. The study they (we) paid for concluded that, with the exception of recently released blockbusters, there is no evidence to support the idea that online copyright infringement displaces sales. So they tried to keep it secret, until Julia Reda published it: https://juliareda.eu/2017/09/secret-copyright-infringement-study/

There must be a lack of better things to spend EU money on, I guess.

1. 1

The whole “value gap” theory, on which this proposal is based, is flawed.

But oh so profitable!

1. 46

Possibly a bit Apple-centric. Better title: why I like my old MacBook Pro more than the new one.

1. 10

I’d never have clicked on that one! :)

1. 8

I clicked on it expecting a Thinkpad, came away very disappointed. >:(

1. 3

You might be new to marco.org? I don’t think he’s ever uttered the word “thinkpad” before. Apple fanboyism at its proverbial best.

1. 5

Ubuntu breaking itself with updates while still being less secure than OpenBSD is what made me switch to OpenBSD stable on my laptop. Everything just works, even months later, and it was surprisingly easy to set up (OpenBSD was a breeze for me compared to Arch Linux).

1. 1

Going from Ubuntu to OpenBSD! What a jump, haha. How was the transition? I always imagined it would be very hard to make.

1. 5

I have been all over the place … something like ubuntu -> arch -> fedora -> debian -> ubuntu -> debian -> ubuntu -> openbsd.

I just practised installing OpenBSD once in a VM to make sure I could get i3 working, and after that there was no problem. The older I get, the more I appreciate things that don’t change under your feet.

1. 5

The older I get, the more I appreciate things that don’t change under your feet.

So, there’s this OS called “Debian”… :-)

1. 9

I tried updating a Debian stable machine that I had not touched for six months. It blew up in my face. I’ve never had that happen with OpenBSD.

ever

1. 3

As a counterpoint, I’ve had Debian machines that have gone through 10+ years of upgrades without problems. For example, I’m currently in the process of retiring a VPS that was first installed in 2005 (it’s only being retired because it’s still running a 32-bit userland, has become too much of a snowflake, and needs to be rebuilt using configuration management tools).

That said, I’ve never had a problem with the OpenBSD upgrade procedure either :)

1. 1

Agreed, Debian stable is OK if you stick with the same stable version. Upgrading between stable releases can be… problematic.

With OpenBSD it’s mostly just a matter of reading the release notes to see which config files need to be looked at. I’ve never had a Linux distro be as straightforward as OpenBSD in this regard. And that is why it runs all my routing duties.

1. 1

Regular security or point release upgrades never break, so I imagine you’re talking about an upgrade to a new major release. Do you remember which version you tried to upgrade to and what went wrong exactly?

1. 7

TIL: we need more managers sitting in rooms enforcing processes. Cool.

1. 4

I think it’s more that we still need the usual number of managers enforcing process.

Just like scaling a server backend requires certain coordination technologies (load balancers, container orchestration), scaling a human organization does too (including managers sitting in rooms reviewing change plans).

1. 8

Not surprised. News at 11.

So what useful things do lobsters people put in their motd?

My machines are named after Father Ted characters. So, naturally, all motds contain quotes from Father Ted characters. Rewatching an episode with a particular character to hunt for quotes while setting up and naming a new machine has become a ritual of mine :)

1. 11

Shameless plug, I made this years ago to make pretty pictures for my motd: https://max.io/bash.html

1. 2

This is amazing, you should post this

2. 7

So what useful things do lobsters people put in their motd?

We (Wikimedia Foundation) show: kernel and distro version, server role in puppet, last puppet run, machine installation date and last login. For example:

Linux cp1049 4.9.0-0.bpo.2-amd64 #1 SMP Debian 4.9.13-1~bpo8+1 (2017-02-27) x86_64
Debian GNU/Linux 8.8 (jessie)
The last Puppet run was at Fri Jun 30 07:30:49 UTC 2017 (22 minutes ago).
Debian GNU/Linux 8 auto-installed on Fri Mar 13 17:57:50 UTC 2015.
Last login: Thu Jun 29 15:51:39 2017 from bast3002.wikimedia.org

1. 3

Nothing. Our servers are terminated and replaced on a frequent enough basis that spending time on MOTDs would be a waste.

1. 3

That’s very pragmatic, but also a bit boring. No time for easter eggs?

If no other lobster replies to this thread, yer all a bunch of boring bishops ;)

1. 2

some figlet and lolcat action for me. variety is the spice of life.

1. 2

Our easter eggs mostly end up in some other part of the stack. SSH-ing into a machine is pretty much reserved for major outages.

1. 1

I miss Easter Eggs. The Word and Excel ones were fun to show kids in class. Also a lesson about the threat of subversion where management and/or customers didn’t notice an entire game hidden in their office software. “Code rah… review? I don’t think we’ve done anything like that over here…” ;)

1. 0

No time for easter eggs?

No time for celebrity worshipping.

1. 10

Easter eggs don’t necessarily involve a personality cult. They can just be silly jokes.

2. 3

I only put the most important things: http://ix.io/y6E (best viewed in a terminal)

1. 3

So what useful things do lobsters people put in their motd?

The server hostname.

1. 2

I normally use a template and regenerate stats about the system using cron: things like tailing the last few entries from auth.log; nothing too fancy.
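Something along these lines, perhaps; a minimal Python sketch where the layout, names, and use of a throwaway file for the demo are my assumptions rather than the commenter’s actual setup. In a real deployment you would point `source` at /var/log/auth.log, `target` at /etc/motd, and run it from cron.

```python
import tempfile
from collections import deque
from pathlib import Path

# Sketch of a cron-regenerated MOTD: a fixed template plus the tail of
# auth.log. Template wording and paths are hypothetical.

def tail(path: Path, n: int = 5) -> list[str]:
    """Return the last n lines of a file without reading it all into memory."""
    with path.open() as f:
        return list(deque(f, maxlen=n))

def render_motd(auth_lines: list[str]) -> str:
    return ("=== box summary (auto-generated) ===\n"
            "Last auth.log entries:\n" + "".join(auth_lines))

# Demo against a throwaway file instead of the real logs:
with tempfile.TemporaryDirectory() as d:
    source = Path(d) / "auth.log"
    source.write_text("".join(f"entry {i}\n" for i in range(20)))
    target = Path(d) / "motd"
    target.write_text(render_motd(tail(source)))
    print(target.read_text())
```

Using `deque(f, maxlen=n)` keeps memory bounded even if the log is large, which matters for a job that runs every few minutes.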

1. 2
> cat /etc/motd
mksh: cat: /etc/motd: No such file or directory


guess I’m a boring bishop too :(

1. 2

Nice to see another Father Ted fan here; I’m surprised you haven’t created a custom fortune data file just for the purpose :)

Although almost all of my personal systems are configuration managed, I do still log in to them regularly (they’re halfway between pets and cattle) so a customised MOTD is something I’ve been meaning to look at for ages. Something showing load, pending package updates, etc (I only automatically install security updates). One day…

1. 2

I normally use update-motd to show a summary of the key services the machine handles. It helps prevent “oops” moments where you forget what services it runs before a reboot, etc.
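For illustration, a minimal sketch of such a snippet; the SERVICES mapping and wording are entirely hypothetical. On Debian/Ubuntu, update-motd concatenates the stdout of executable scripts placed in /etc/update-motd.d/.

```python
#!/usr/bin/env python3
# Sketch of an /etc/update-motd.d/ snippet: print which key services
# this box is responsible for, so you think twice before rebooting.
# The SERVICES mapping below is made up for illustration.
SERVICES = {
    "nginx": "public web frontend",
    "postgresql": "primary database (no replica!)",
    "cron": "nightly billing batch jobs",
}

def service_summary(services: dict[str, str]) -> str:
    lines = ["*** This machine handles: ***"]
    for name, role in sorted(services.items()):
        lines.append(f"  {name:<12} {role}")
    return "\n".join(lines)

print(service_summary(SERVICES))
```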