1. 22

If so, what do you use it for?

Since I doubt anybody’s running a TOP500 machine, let’s broadly categorize a supercomputer as “a machine that’s a step above standard high-performance ‘workstations’”: something that’d be overbuilt for any consumer-oriented task and almost all developer tasks.

(I’m not planning on buying/making one, I just find the idea of “supercomputer” really fascinating and weirdly old-fashioned. And it seems like there’d be a subculture of enthusiasts out there)

  1. 12

    I used the University of Leeds high performance computing cluster to perform tensor decompositions for this open-access paper: Evolution of communities of software: using tensor decompositions to compare software ecosystems

    I didn’t record how much CPU time we used, but it was quite a lot. Maybe a few months? We also needed machines with loads of RAM to hold the tensors and perform the decomposition (though that could have been reduced somewhat with a different implementation, IIRC).

    The experiments in the paper are reproducible (including our figures) and the code is available here.

    Edit: I still have pretty mixed feelings about this paper, so I’d love to hear from you if you think it’s useful or not.

    1. 9

      In one sense, most of us have a supercomputer these days, or could do if we wanted one: The single precision FP throughput of a modern graphics card is roughly equivalent to the 500th supercomputer in the TOP500 from a decade ago, or the fastest supercomputer in the world 15 years ago. I still find that kind of astonishing: exponential growth is the gift that keeps on giving.

      At my last contract, we had a relatively big machine (1TB of memory and 128 CPUs, IIRC) for doing large proof checks. Mostly CSP models in FDR, but other researchers also ran SAT checks on it; I think they mostly used minisat as the target SAT solver. Does that count?

      1. 4

        As a researcher I can confirm that having access to a big machine for SAT solving was a godsend. We had access to 128 Xeon E7 threads with ~2TB of DRAM, and our experiments took days to weeks of real time.

      2. 9

        I actually have root on a few HPC clusters right now, though I’m not sure if any of mine are on Top500 this year. 😉

        I also work closely with developers and SREs at a bunch of Top500 sites. The workloads vary a lot, but the ones I’m most familiar with tend to be big simulation applications: weather forecasting, molecular modeling, that kind of thing.

        And of course, right now a huge number of these systems are being used for COVID-19 work!

        1. 3

          I work for one of the few remaining “supercomputer” companies; we were bought out recently. I might know the systems you’re referring to, and code I wrote to speed things up may well be running on them.

        2. 6

          My current “workstation” is:

          • Intel(R) Xeon(R) Gold 6234 CPU @ 3.30GHz (32 core)
          • 32TB nvme (PCIe attached)
          • 1TB ram

          I mostly just do programming on it. I don’t browse the web or anything like that on it because I have spectre/meltdown/etc mitigations disabled. It doesn’t run Shadow Warrior particularly well for some reason.

          -bash-4.2$ free
                        total        used        free      shared  buff/cache   available
          Mem:     1056196276   136832932   509102748       19896   410260596   917534032
          Swap:             0           0           0
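
          As a quick sanity check on the output above: `free` reports kibibytes by default, so the total column can be converted to GiB with a few lines of arithmetic. This is only a sketch; the helper name `kib_to_gib` is made up for illustration.

```python
# free(1) prints sizes in KiB unless asked otherwise.
KIB_PER_GIB = 1024 ** 2


def kib_to_gib(kib: int) -> float:
    """Convert a KiB count (as printed by `free`) to GiB."""
    return kib / KIB_PER_GIB


if __name__ == "__main__":
    total_kib = 1_056_196_276  # "total" column of the Mem: row above
    print(f"{kib_to_gib(total_kib):.1f} GiB")
```

          The result lands a little under 1024 GiB, which is what you would expect from a 1TB machine once the kernel and firmware have reserved their share.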

          I also have a laptop that runs Outlook and Zoom, and a remote desktop client.

          1. 4

            I feel like I have a lot of questions. Is this machine fairly underutilised in this role? Was it not perilously expensive?!

            1. 2

              “Is this machine fairly underutilised in this role?”

              It’s a workstation, so cores are usually idle, but sometimes I want to do something with it that a smaller/slower machine would not be able to do, so I suppose it matters what you mean by that.

              It could be faster.

              “Was it not perilously expensive?!”

              Perhaps it is more than we need, but the project is for something with rather specific performance requirements, so I can support the decision to overbuild on that. This isn’t the biggest machine I’ve tried to run Shadow Warrior on, so I wouldn’t use the word “perilously” though.

              1. 4

                Expensive is one of those funny terms. If you’re an electrician, a 40k truck is a reasonable business expense, even if you only drive it 2 hours a day.

                Spend half that on a desktop PC and people lose their minds.

          2. 5

            I have a Beowulf cluster of Raspberry Pi Zeros, but I’m not sure that counts. It is more powerful than my older Beowulf clusters (one built on PPC Macs and, before that, one built on x86 desktops). I also used to have a SPARCcenter 2000E, but I’m not sure that really counts either.

            The Pi cluster gets powered up mostly to work on projects to approximate Pi to different lengths, although I have used it for batch processing jobs. The low RAM makes it a bit useless for things like a kubernetes cluster so it’s all MPI.
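
            A minimal sketch of how a Pi-approximation job like this might be split across workers. The comment doesn’t say which method the cluster uses, so everything here is an assumption: the midpoint-rule integral of 4/(1+x²) over [0, 1], the function names, and the use of local Python processes as a stand-in for MPI ranks. It also only reaches double precision, not arbitrary lengths.

```python
from concurrent.futures import ProcessPoolExecutor


def partial_pi(rank: int, size: int, steps: int) -> float:
    """Midpoint-rule sum of 4/(1+x^2) over the slices owned by one worker.

    Worker `rank` takes slices rank, rank+size, rank+2*size, ...,
    the same round-robin split commonly used with MPI ranks.
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(rank, steps, size):
        x = (i + 0.5) * h  # midpoint of slice i
        total += 4.0 / (1.0 + x * x)
    return total * h


def approximate_pi(steps: int = 1_000_000, workers: int = 4) -> float:
    """Run one partial sum per worker process and add the results (the 'reduce' step)."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(partial_pi, range(workers), [workers] * workers, [steps] * workers)
        return sum(parts)


if __name__ == "__main__":
    print(approximate_pi())
```

            Each worker needs almost no memory and only sends back a single float, which is why this kind of job suits low-RAM nodes.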

            It runs things like Slurm for job management, but most of the time I just SSH in and run my loads as it’s only me using it.

            My main Windows desktop’s Nvidia GTX 1070 is way more powerful. I’ve done some work porting my MPI Pi calculator over to OpenCL but I’m not very happy with it.

            1. 2

              If you’re interested in running Kubernetes on Pis, you should check out k3s which is a slimmed-down version of Kubernetes for running on IoT devices.

              1. 1

                Funnily enough I spent some time playing with OpenFaas, which in the end was just too much like magic for me to ever go near anything k3s-like again. K3s and K8s seem great for some, just the wrong solution for my use cases.

            2. 3

              Own, no, but I’ve had access to a few at various universities. Most recently, a handful of machines with 64 cores and 1 TB of RAM, give or take a bit, for running machine learning stuff and data mining. Before that I did some work with a proper batch job cluster; each node was about workstation-grade, but there were 64 of them. It was very nice to be able to submit a script full of NumPy and have it run on a few thousand data sets and collect the output for you.

              1. 3

                I don’t think it falls in the category of supercomputer, and it’s not mine, but it’s pretty good: I have access to a server at the Federal University of Minas Gerais (UFMG):

                RAM: 1 terabyte
                LSCPU output:
                CPU(s):                64
                On-line CPU(s) list:   0-63
                Thread(s) per core:    2
                Core(s) per socket:    8
                Socket(s):             4
                NUMA nodes:            8
                CPU family:            21
                Model:                 2
                Model name:            AMD Opteron(tm) Processor 6376
                CPU MHz:               1400.000
                CPU max MHz:           2300,0000
                CPU min MHz:           1400,0000

                I use it for performing genome assembly and genome annotation, mostly.

                Genome assembly: when a genome is sequenced, it comes out fragmented, so we need to join the pieces and close the gaps by checking for overlaps among the many fragments.

                Genome annotation: annotate the assembled genome, finding where genes begin, where they end, finding intron and exon regions, finding binding regions.
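
                To make the assembly step described above concrete, here is a toy sketch of joining fragments by their overlaps. It is a naive greedy merge on exact suffix/prefix matches, not how real assemblers work (they have to cope with sequencing errors, repeats, and far larger inputs), and the function names and example reads are made up for illustration.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that is a prefix of `b` (0 if shorter than min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of fragments with the largest overlap until none overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_n, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return contigs  # nothing left to join
        merged = contigs[best_i] + contigs[best_j][best_n:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```

                The all-pairs comparison is quadratic in the number of fragments, which hints at why real datasets need this much memory and this many cores.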

                1. 1

                  What do the cool kids use for assembly and alignment these days? I worked in biotech ~10 years ago and I can only imagine things have changed considerably.

                  1. 3

                    For assembly: SPAdes, Flye, Canu, Velvet. With 16 GB of RAM, you might be able to perform assembly on some smaller datasets.

                    For alignment, the same old tools are still in use: BLAST, Clustal, MUSCLE, DNAstar.

                2. 3

                  I studied it a long time ago. The interconnects were the thing I was most jealous of for the clusters. The main differentiator for me, though, was the NUMA architectures with cache-coherence. The scale-up machines. They made writing code for a pile of cores and RAM more like multithreaded programming on your desktop. You still had to worry about locality because off-board accesses cost far more than in-board accesses. Then, they started making languages such as Chapel and X10 to make that easier.

                  Fun fact from the old days: Final Fantasy was made on a supercomputer of about 30+ Onyx2s with up to 256 CPUs and multiple graphics cards in each. Looking at today’s games and applications, I can only imagine what a similar setup of today’s hardware would do. You could probably hit 10 Slack instances without your computer locking up.

                  1. 3

                    Does a TIS-100 count?

                    1. 3

                      I have one for machine learning. I don’t know if it was worth the cost. It’s pleasant to use though. What do you want to know specifically?

                        1. 3

                          Drew DeVault took a stab at getting a good POWER9 machine.

                          I was tempted until I heard about his experience. YMMV; Mr. DeVault is a very earnest and knowledgeable guy.

                          Other companies may give you a better experience, and RaptorCS may have changed their tune.

                          1. 6

                            He did write a more positive post about them soon after though, https://drewdevault.com/2019/10/10/RaptorCS-redemption.html

                            1. 3

                              I have a Raptor machine myself (Talos II, though it’s far from being a supercomputer, only 8 cores and 96GB RAM). It works great and there are no issues. The experience that Drew described was certainly a one-time problem and he was just unlucky.

                            2. 3

                              Since clusters with batch systems are not really nice for experimenting with and iterating on neural network models, we have a bunch of machines with various generations of GPUs (e.g. RTX 5000s, Tesla M60, Radeon VIIs). The machines have up to 64 cores and up to 768GB of memory. So, definitely better equipped than most consumer or developer machines.

                              1. 3

                                The last place I worked, and where a bunch of my good friends still work, hosts a password cracking service. It’s used by law enforcement agencies, estates of wealthy people trying to crack open documents, etc.

                                They have some fairly beefy machines, with a bunch of GPUs in there to do password hashing.

                                They hosted a password cracking competition a few times at DefCon. The competing teams would bring some impressive hardware to bear on the problem.

                                1. 2

                                  Related question: what sort of things would you use a mainframe (e.g. IBM Z) for, and why would that be an improvement over a rackful of regular 1U x64 servers?

                                  1. 2

                                    Not a supercomputer by any means, but the company I work for has a 72-core Xeon server with 256GB of RAM and four Titan RTXs. We use it as a mixed-use host with developers given shell access; it also runs CI and other miscellaneous jobs like training, notebooks, or local services for robots.

                                    Definitely fun and useful to be able to remote into a computer that can put any laptop to shame and do build tasks in drastically shorter time frames.

                                    1. 2

                                      Sort of? I’m writing control software for a quantum computer. While the quantum computer itself isn’t particularly powerful in the traditional computing sense, it does allow for new classes of algorithms to be performed in a reasonable time frame. Others on my project get time on one of the real supercomputers at Argonne National Laboratory to simulate quantum experiments.

                                      1. 2

                                        Not sure what qualifies as a “supercomputer”, but we do live video transcoding on the CPU and have encoders with dual Xeon E5-2698 v4s (and some newer Xeon Gold 6242s); that’s 40 cores and 80 threads in total per machine. We run 100-200 concurrent renditions on one machine and we have ~20 of them. Maybe not “super” but also not something that regular consumers would have at home.

                                        1. 2

                                          Does access to an Nvidia DGX Station count?

                                          1. 2

                                            While I don’t own them, I have access to servers with 40+ cores. They’re fantastic for running fuzzers. Since it’s ultimately a search problem, fuzzing benefits a lot from parallelism.

                                            1. 1

                                              I recently built myself a VM server that is a “supercomputer” in the sense of being a step above a standard workstation, although not in the sense of being a machine designed specifically to do raw number-crunching as fast as possible. It’s a rack-mount server with a server-grade motherboard, 128 GB of RAM, and a 32-core CPU. I’m currently running 5 VMs on it, with a few more to come when I get around to configuring it. So, not something that would be outlandish in a data center, but certainly overkill for consumer or developer tasks.

                                              I was originally going for a machine with somewhat higher specs, and made some compromises in the name of price. A 32-core Xeon CPU was expensive enough, without springing for a 64-core one. The motherboard can handle up to 1 TB of memory, and as cool as it would be to run a computer with 1 TB of memory, 8x128GB DIMMs cost more than I’m willing to pay.

                                              I’m basically using it for everything you might use a VPS or other cloud compute service for, except that I’m managing it myself instead of buying from a cloud provider. I run a few different types of web server across the various VMs, it does some of the routing for my home network, file storage, backups, GPU compute, whatever other server or network experimentation I might want to do in the future.