Threads for caius

  1. 3

    Spending a few days with most of my colleagues in 🇲🇽 in meatspace. First time I’ve met all bar one of them in real life, and we’ve got some exciting things planned together. Mostly looking forward to hanging out and putting real-life mannerisms to the faces I’ve seen on Zoom for a year plus.

    1. 14

      This is one of a bunch of different accounts by #mastoadmins. I find it funny that in the past month, scaling Mastodon has gone from a problem no-one ever really had to a problem that is well-documented and understood.

      This is a particularly interesting example, because @nova@hachyderm.io was running the instance as a small instance in her basement that went from 700 users (a fairly small instance) to 30,000 (one of the biggest) in a month. It started out as a pet and had to quickly scale up to a herd of cattle, and it’s a fascinating account of how they did that.

      1. 18

        The money quote of this account:

        In other words, every ugly system is also a successful system. Every beautiful system has never seen spontaneous adoption.

        1. 2

          I believe it’s a paraphrase of the famous quote of John Gall: https://www.goodreads.com/quotes/9353506-a-complex-system-that-works-is-invariably-found-to-have

        2. 10

          It didn’t have to, by the way. Leaving sign-ups open all the way to 30k users while hosting in the basement was a choice.

          1. 2

            A choice that was clear in hindsight, not clear during the initial investment.

            1. 3

              I dunno; it was pretty obvious past a certain point, and that certain point was much, much lower than 30k.

              You get to ten thousand users, you’re seeing one new signup every 90 seconds, what exactly do you think is going to happen next week? You get to 20,000, and what, you think it’s just going to stop for some reason? Doesn’t make any sense to me.

              1. 7

                I think they address that in the article: adding more users didn’t necessarily correlate with increased load on the system. I can see there might be some extra load (new users following other new users that you weren’t already federating with), but by the sounds of it they likely would’ve had similar size issues if they’d capped it at 10k users rather than hitting 30k.

                1. 1

                  That’s an interesting thought. Instance load should correlate with the size of the network that instance follows. As you point out, there should be a point where adding an extra user makes only minimal impact on the overall network, as they will follow people who are already known by the instance. I wonder where that point is.

                  1. 1

                    Timeline generation in Mastodon is surprisingly expensive; it can be a major contributor to the total load of the server. IIRC Mastodon has an optimization to stop timeline generation for inactive users, because mastodon.social had a ton of inactive users whose timeline generation was contributing a lot of the total load.
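                    The idea can be pictured with a toy model (illustrative Python, not Mastodon’s actual Ruby internals; the cutoff value and all names here are made up):

                    ```python
                    from datetime import datetime, timedelta

                    # Hypothetical cutoff: stop maintaining cached home timelines for
                    # accounts idle longer than this.
                    ACTIVE_WINDOW = timedelta(days=14)

                    def should_fan_out(last_seen_at: datetime, now: datetime) -> bool:
                        """Only push new statuses into a user's cached home timeline
                        if the user has been active recently."""
                        return now - last_seen_at <= ACTIVE_WINDOW

                    def deliver(status: str, followers: dict, timelines: dict, now: datetime) -> None:
                        """Fan a status out to active followers only; dormant followers'
                        timelines go stale and would be rebuilt on their next login."""
                        for user, last_seen in followers.items():
                            if should_fan_out(last_seen, now):
                                timelines.setdefault(user, []).append(status)

                    now = datetime(2022, 12, 1)
                    followers = {
                        "alice": now - timedelta(days=2),   # active
                        "bob": now - timedelta(days=90),    # dormant, skipped
                    }
                    timelines = {}
                    deliver("hello fediverse", followers, timelines, now)
                    print(timelines)
                    ```

                    The saving comes from skipping the per-follower write entirely, at the cost of a one-off rebuild when a dormant user returns.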

                  2. 5

                    From TFA:

                    We were nowhere close to hitting limits of DB size, storage size, or network capacity. We just had bad disks.

                    1. 0

                      Sure, but even ignoring the moderation issues, it would have been much easier to move to new disks if the number of users were reasonable. You don’t have to move all the remote media, just the stuff owned by the actual local users.

                2. 2

                  Pretty bold choice, too.

                  1. 2

                    Here’s a thread from mastodon (link) from the hachyderm admin team that addresses this with more of their thinking:

                    Kris Nóva — @nova

                    Was finally able to write-up a little more detail about the production problems we faced with #Hachyderm last week. This covers the migration out of our basement and into Digital Ocean and Hetzner.

                    I can’t say enough about the team who worked so hard on this. I shared some screenshots in the blog, but you would have to have been there to really see how powerful this group of wonderful people really is.

                    @recursive

                    It’s very impressive that you didn’t close new user registrations the whole time. Nice work, y’all deserve a rest.

                    Kris Nóva — @nova

                    We closed them for a few hours the night before the final cutover. I made this decision. I really don’t think it would have made a difference to our performance either way. The only reason for considering it was because we were worried about first impressions with users.

                    I’m not sure where the belief that new users are somehow impacting performance is coming from. We would need to be talking hundreds of thousands of users.

                    Hazel Weakly — @hazelweakly

                    you need to calculate a person’s home feed for the first time when an account is created and that can be expensive computationally.

                    In theory a huge spike of signups might knock you over, and depending on how the feed recalculation logic works, you could wind up with a potential thundering herd type of issue.

                    But… It’s gotta be a lot of signups. Fast enough that the server can’t keep up. It wasn’t the issue or even noticeable for us :)

                    dominic — @dma

                    my theory is that the older accounts that follow more users across instances are the bigger issue than any local-only home feed. those jobs that push and pull across the federation are the expensive ones to run, database wise, and more likely to require retries due to failures from network issues or other instance scaling issues.

                    Hazel Weakly — @hazelweakly

                    right, yeah. That’s one of the reasons we felt that new user signups are a red herring for performance concerns. There’s a lot of other more likely causes

                    Michael Fisher — @mjf_pro

                    One day I have to tell you a story from earlier in my career about an EMC VMAX 40K SAN that had 28 of 32 disk directors go offline due to a bug in the replication firmware. It wrecked a weekend and a few months beyond. What you had going on with the old NFS (can we call it “Hachyd’oh!”?) didn’t look like it had the data loss risk/impact that our incident did at least. But yeah, storage channel problems can REALLY mess up a day.

                    Malte Janduda — @malte

                    I’m still convinced that NFS would work with our setup :)

                    The root cause of our problems was the bad disks. We had fast response times for all GET requests, as the pg database fits into memory. But the POSTs that result in pg updates/inserts were super slow. This way we had lots of requests in flight, resulting in a high number of concurrent disk accesses - also for the sidekiqs.

                    Hazel Weakly — @hazelweakly

                    I think it would work too under ideal conditions, but I particularly didn’t like how brittle it was to tune and how easy it was to have multiple different facets of the system cause each other to fail.

                    Maybe it would’ve been the same with object storage? But at least failing disks on one machine wouldn’t have cascaded to slow and unresponsive other systems

                    Malte Janduda — @malte

                    We basically needed to increase the nfsd thread count to be able to serve all concurrent accesses. The main problem here was really the poor performance of our disks.

                    But you’re right, NFS is backed by one server. When that server has a problem, the whole application has a problem.
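                    For reference, on distributions shipping a recent nfs-utils the nfsd thread count lives in /etc/nfs.conf (older setups use RPCNFSDCOUNT in /etc/default/nfs-kernel-server instead); the value below is illustrative, not Hachyderm’s actual setting:

                    ```ini
                    # /etc/nfs.conf - raise the number of kernel nfsd threads
                    # (8 is a common default; the right number depends on disks and workload)
                    [nfsd]
                    threads=64
                    ```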

                1. 2

                  Playing around with libp2p which I’m hoping to use to create a fully p2p mesh overlay network (aka a thing to setup lots of point to point wireguard tunnels).

                  I’ve got a wireguard setup going for always being on my home network, but it takes too much managing for my liking, so I want to make something that can be completely automatic. But something that doesn’t require giving access to my whole network to some closed-source, VC-funded control system that pinky promises it won’t do whatever. Basically, I miss OG Hamachi.

                  1. 1

                    Headscale?

                    1. 2

                      Headscale

                      That’s definitely plan b. But this whole thing started because I tested out tailscale for a work thing and was unhappy with the inflexibility of the client. I’ve also got a few separate networks I’d like to have and having to put up a headscale server for each seems painful.

                  1. 1

                    Finishing off jobs at home before going away, packing, running the Xmas 5km the local brewery puts on, then flying to 🇲🇽 for my first $work offsite. It’ll be the first time meeting the folks I’ve been working with for a year, can’t wait. Also looking forward to escaping the UK winter for some sunshine and warmth.

                    1. 2

                      This is just the git tag, perhaps we should wait for the announcement or release notes?

                      1. 8

                        Member of the release team here. The announcement will (hopefully) be out soon – we’re waiting on an eval to finish and the channels to update.

                        1. 2

                          Thank you for your work!

                          1. 3

                            Haskell ghcWithPackages is now up to 15 times faster to evaluate, thanks to changing lib.closePropagation from quadratic to linear complexity.

                            Wow!
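                            The shape of that speedup is the classic transitive-closure fix; here’s a hedged Python analogy (the real change lives in Nixpkgs’ lib and differs in detail): checking membership against a list on every visit is quadratic, while tracking visited packages in a set makes the traversal roughly linear:

                            ```python
                            def close_propagation_quadratic(roots, deps):
                                """Naive transitive closure: re-scans the result list for
                                membership on every visit, so O(n^2) in package count."""
                                result = []
                                stack = list(roots)
                                while stack:
                                    pkg = stack.pop()
                                    if pkg not in result:  # linear scan -> quadratic overall
                                        result.append(pkg)
                                        stack.extend(deps.get(pkg, []))
                                return result

                            def close_propagation_linear(roots, deps):
                                """Same closure, but membership checks go through a set,
                                making the whole traversal roughly linear."""
                                seen = set()
                                result = []
                                stack = list(roots)
                                while stack:
                                    pkg = stack.pop()
                                    if pkg not in seen:  # O(1) membership check
                                        seen.add(pkg)
                                        result.append(pkg)
                                        stack.extend(deps.get(pkg, []))
                                return result

                            # Toy dependency graph, purely illustrative.
                            deps = {"ghc": ["base"], "base": ["rts"], "lens": ["base"]}
                            print(close_propagation_linear(["ghc", "lens"], deps))
                            ```

                            With thousands of Haskell packages each pulling in overlapping closures, dropping the quadratic factor plausibly accounts for a double-digit speedup.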

                            1. 2

                              pkgs.polymc is back as pkgs.prismlauncher, nice!

                              1. 1

                                I think the upgrade section is missing? The link at the top points to https://github.com/NixOS/nixpkgs/blob/master/nixos/doc/manual/release-notes/rl-2211.section.md#sec-upgrading

                                1. 1

                                  It works when you visit the release notes on the homepage.

                                  https://nixos.org/manual/nixos/stable/release-notes.html#sec-release-22.11

                              2. 1

                                https://github.com/NixOS/nixpkgs/issues/193585 has the release schedule for reference

                              1. 4

                                iCloud Photo Library with my main MacBook Pro set to sync all content locally, then Arq backs it up into Backblaze b2. (Along with the rest of the machine, including iCloud Drive.)

                                1. 2

                                  How does this work? Does it have to go through iCloud to get to your MBP? And if so, are you constrained on iCloud capacity (we’re paying for 200GB and we’re already hitting that cap). If not, do you have to plug in a USB, or does it work automatically via the network?

                                  1. 1

                                    Syncs through iCloud yes. We have the family “everything” plan (Apple Premier?) so get 2TB cloud storage to use between us. Think my photo library is just over 300GB so far.

                                    If I had to plug the phone in, I’d never sync it.

                                  2. 1

                                    This is precisely what I do. I did recently switch from Backblaze B2 to MS OneDrive, as I needed the MS365 subscription for other (family) reasons, and it comes with 6 x 1TB OneDrive accounts. Two are used for backups.

                                    1. 1

                                      I’m curious why you use Backblaze B2 and not the Backblaze Personal Backup?

                                      1. 2

                                        I used to use their personal backup years ago when it first came out, switched to using Arq against OneDrive and S3 I think it was, then switched to B2 because it’s cheaper (and easier to set up.) Stopped using OneDrive when I left my previous job as it was the corporate account.

                                        I found Backblaze personal backup didn’t back up system paths, which included Homebrew (vague memory, might have been MacPorts?) and I couldn’t restore a database I’d overwritten locally because it hadn’t been backed up. I understand why the client doesn’t do that, but it was a dealbreaker for me at the time. Arq also does client-side encryption and multiple locations for me.

                                    1. 2

                                      Continuing with the slow progress emptying the old office. Don’t really have the motivation for it, but moving small amounts out a few times a day is slowly emptying it.

                                      Started building out the homelab at the weekend, atop NixOS. Using one server as the storage, the other as the compute (for now), with the NAS mounted as an SMB share on the compute node. Working well so far; need to migrate settings and media from the old setup into the new setup though. Finding apps that can’t be configured via configuration files is irritating. Thought I’d configure them via terraform once they were up, but found a bug in the terraform provider for the first one.
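                                      A minimal sketch of what that mount might look like in NixOS (hostname, share name, mount point, and credentials path are all placeholders, not the actual config):

                                      ```nix
                                      {
                                        # Mount a NAS share over SMB/CIFS on the compute node.
                                        fileSystems."/mnt/nas" = {
                                          device = "//nas.lan/media";  # hypothetical NAS host and share
                                          fsType = "cifs";
                                          options = [
                                            "credentials=/etc/nixos/smb-secrets"  # username=... / password=...
                                            "uid=1000"
                                            "gid=100"
                                            # mount lazily so an unreachable NAS doesn't block boot
                                            "noauto"
                                            "x-systemd.automount"
                                            "x-systemd.idle-timeout=60"
                                          ];
                                        };
                                      }
                                      ```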

                                      1. 4

                                        Last weekend to empty the old office and turn it into the new spare room, got a friend visiting next weekend. Need to sort/move the last of the office stuff (3d printer, drawers, random cables/crap) either downstairs into the new office or upstairs to the loft for storage. Then get to swap the tiny sofabed currently in there for a much bigger sofabed. Possibly mount the old TV on the wall (I want to get my Xbox/wheel setup in there for driving games at some point).

                                        Also picked up a Kindle 4 this week (cheap) to try and hack into an e-ink office door sign, so I can display “Alreet” / “Sod off” messages when I’m free or in meetings. Started a #project thread for it on the Elephant https://ruby.social/@caius/109360583504456226.

                                        1. 1

                                          Continue the migration of the old office into the new office. Attacked a large drawer over the weekend and threw out 98% of it (99% once I’d scanned the few letters I needed copies of.) More of that.

                                          The UK is pretty miserable weather-wise now, and I’ve not been getting on the static exercise bike because it’s either zero resistance or all the resistance. It’s supposed to have 25 levels of adjustment. Need to take that apart and look at it, suspect the mechanism that moves the magnet toward the flywheel has gone awry somehow.

                                          At work, continue building out new infrastructure. Currently putting together new AMIs for our EC2 instances, on a newer LTS release and with tooling we want included. Also experimenting with HCP Packer Registry and channels for building, testing, promoting images safely. Working well enough so far.

                                          1. 1

                                            Helping the local scout troop run a “Monopoly-inspired game” in the town for a morning (custom board, roll dice, walk to the place and take a picture. Sends you all round town and keeps them busy for two hours.)

                                            Then splitting my time between various jobs at home (fitting headlights/front bumper back on Z4, emptying old office, throwing stuff out on the way to new office) and entertaining the pets. Hopefully carve out some time in there to progress my scavenger hunt game app as well.

                                            1. 3

                                              $work - continuing with our build out of the Next Generation, and also starting to think about what our goals (North Star) should include for 2023. What problems irritate us? What problems does the business need answers for?

                                              !$work - continuing to empty the spare room (née office) of stuff so it can be tidied up (we’re not decorating it) and the new sofa bed installed ahead of friends staying at the end of the month. Also working on my Scavenger Hunt app, given it’ll be needed in less than four weeks now. Probably upgrade my Mac to Ventura at some point too.

                                              1. 6

                                                Flying to Dublin for some proper Guinness and Irish Whisky. 🇮🇪🍻

                                                1. 6

                                                  For whiskey (spelled with an ‘e’ here), I’d recommend Redbreast 12, Red Spot, and if you can find it: Blue Spot. Connemara is an interesting one as well, as it’s one of the few peated Irish whiskeys on the market at the moment.

                                                  1. 1

                                                    Ha, yes, spent more time in Scotland than Ireland hence the spelling default.

                                                    Connemara is one of our fave Irish whiskeys, tried the Blue Spot (Palace Bar had it, amazingly) but it was a bit too sweet for our palate. Wonderfully spiced though.

                                                    Cheers for the recommendations (and correction 🤪)

                                                1. 1

                                                  TIL about astralship, never would’ve thought there was a hackerspace in that area of the world, let alone in a 200 year old chapel!

                                                  1. 1

                                                    Building out new base images and specialist images based on those at work, expanding our stack to a second region properly and also jumping forwards in OS versions before support expires.

                                                    At home, continuing with the move to the new office. I think all the cabling is done, just need to fit a shelf for the network kit to sit on/under and then I can empty the old office and turn it into a spare room.

                                                    1. 6

                                                      Helping out at the sailing club again to build more of the new jetty, then heading over to a friend’s yacht in Wales for a visit.

                                                      Still tinkering with the homelab setup, think I’ve settled on NAS exposed via NFS with other systems mounting NAS storage over the network to use it. Now to configure it all and test it out.

                                                      1. 2

                                                        Work: building out a second region to functionally match our first region in infrastructure. (Hashistack, leaning on HCP. Mmmm.)

                                                        Not-Work: hacking some more on scroungr and trying to sort out the networking in the house (mostly getting up the motivation to punch down a bunch of RJ11 and RJ45 wall sockets.)

                                                        1. 4

                                                          Working party building more of the new jetty and the last youth training session of the year at the sailing club. If it stays moderately dry I should attack the garden for hopefully the last time this year.

                                                          Also working on a ruby wrapper for Tailscale’s localapi (caius/tsclient), to be used by an omniauth tailscale provider (caius/omniauth-tailscale), to make use of in a rails app I’m writing for a work retreat later this year (caius/scroungr). Scroungr has a proof of concept where accessing it over the tailnet Just Works™ in terms of knowing who you are, which is quite pleasing. (Omniauth support is then because I also want Google SSO supported in the app for non-technical folk in the business.) All very much a WIP, but starting a new rails app and not needing node available is pleasing.

                                                          1. 2

                                                            AVM’s FRITZ!Boxes, which are quite popular in Germany, have been doing the same thing for decades. They use fritz.box as their domain, which was probably pretty safe to use when TLDs were limited to countries.

                                                            1. 5

                                                              FRITZ!Boxes use their own DNS server; they do not man-in-the-middle port 53. Or at least mine does.

                                                              1. 1

                                                                Yeah, I had various Fritz!Boxes over the years and if you use another DNS server on a machine, the fritz.box name just fails to resolve.

                                                                1. 1

                                                                  Netgear business wifi access points do the same, if you’re using their DNS then there’s an easy config host.

                                                              1. 5

                                                                Exactly the kind of thing that’s difficult to Google for prior art as well.

                                                                1. 2

                                                                  contemplating how to better manage my homelab server… in the interest of avoiding a long wait until getting it up and running, I installed Ubuntu Server and used docker compose to manage my services. I eventually moved my docker compose stacks into a Portainer instance, but now editing the compose definitions is getting to be such a chore in the browser that I want to go back to the CLI. I’d also like to avoid some of the duplication and boilerplate in my compose definitions.

                                                                  candidates I’m considering:

                                                                  • leave server as is, use Ansible or something similar to manage docker compose configurations
                                                                  • install NixOS and use Nix to manage services in docker (or systemd?)

                                                                  I already use Nix extensively for laptop development, but I’ve put off setting up a Linux Nix box because I’m not excited about rearchitecting my dotfiles to support both nix-darwin and NixOS. home-manager will make it easier, but I’m still a bit worried. anyone have any cross-compatible nix-darwin/NixOS configs they can share or point me to?

                                                                  1. 4

                                                                    I entirely side stepped the problem of shared configs by starting a new repo for the homelab NixOS configs and leaving my nix-darwin/home manager configs separate. There’s minimal copy/paste so far, but I’ll unite them at some point.

                                                                    1. 2

                                                                      This is probably the answer, tbh. This would give me the opportunity to identify things that are actually duplicated/shared ahead of time, rather than trying to guess and check what I can share while constantly double checking that the configs still work for both systems.