My regular checklist for a new server:
I generally try to avoid installing alternate shells, customized dotfiles, or anything that doesn’t exist in the distro’s default packages. I spend way too much time logged into servers with strict rules about software approvals, or which are simply owned by other people, so I don’t want to become dependent on anything outside the system default experience.
We have some users set up with read-only permissions, mostly for applications which are expected to do some data analysis or visualization but never make changes to the source data. We also restrict access on a table-by-table basis for applications with limited scope.
If application scope changes it means you have to change the permissions, but that little bit of administrative overhead has seemed worthwhile to keep clear boundaries and reduce the scope for accidental breakage.
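For illustration only (our real setup isn’t SQLite), here’s the per-table idea sketched with Python’s `sqlite3` authorizer hook; the table name `source_data` is invented for the example:

```python
# Toy illustration of per-table write restriction using SQLite's
# authorizer hook. In a real deployment this would be database-level
# permissions; the table name "source_data" is made up.
import sqlite3

PROTECTED = {"source_data"}  # tables the app may read but never modify

def readonly_source(action, arg1, arg2, dbname, source):
    """Deny INSERT/UPDATE/DELETE against protected tables; allow the rest.

    For these three actions, arg1 is the target table name."""
    writes = (sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE,
              sqlite3.SQLITE_DELETE)
    if action in writes and arg1 in PROTECTED:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK
```

After `con.set_authorizer(readonly_source)`, any INSERT/UPDATE/DELETE against a protected table fails with a “not authorized” error, while reads and writes to other tables proceed normally.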
On-site, we have an Apple Time Capsule which we back up to using Time Machine. We also do off-site backups to BackBlaze.
I work on HPC clusters that do scientific simulations, and Fortran is alive and well in our world. Not just for libraries, but for plenty of new applications. Though C++ is gaining ground…
Another pro is that a statically linked binary only requires one metadata lookup, vs. potentially dozens or hundreds of lookups for shared libraries. Many scientific apps run on HPC clusters with parallel filesystems like Lustre, which pair a small number of metadata servers (often just one) with many object servers. When several thousand nodes try to launch the same application at the same time, in order to wire up a parallel job, the result can look a lot like a DoS attack on the metadata server…
(This is very much a special case relative to the larger scientific computing world, but one I’m very familiar with. Ah, Lustre, how I hate you…)
I used this extensively for notes at $JOB[-2]. Worked pretty well, though I ran into problems after a couple of years when it got too big and started slowing down. Glad to see it’s still active.
I read a lot of science fiction, fantasy, and history; work in our back yard; play board games and role-playing games; and I’m on a curling team at our local club.
Right now I’m reading:
In Search of Certainty, by Mark Burgess. Second reading for this book, after letting the ideas sit with me for about a year after my first reading. Very interesting.
The Art of Monitoring, by James Turnbull. This one I’m mostly reading with the laptop next to me so I can try out ideas, which makes it slow going. But it’s very interesting so far, and its release this month was very timely for me, as I’m working on a bunch of monitoring problems at $DAYJOB.
The View from the Cheap Seats, a collection of mostly-short non-fiction essays and speeches by Neil Gaiman. I love all of Gaiman’s writing, and actually enjoy his non-fiction writing and speaking more than his fiction. And reading one or two essays at the end of the day is very refreshing, relative to the technical books. :)
Normally I’d also have a fiction book going at the same time, but I’m between novels and nothing’s inspiring me right now.
I would say not. Reddit has a large audience and is actively monetized; IIRC, hiding comment scores at HN was at least sold as a reaction to perceived groupthink or dogpiling. I don’t think any of those conditions currently apply on Lobsters. And the scores do provide some value, and fit with the general philosophy of transparency.
In other words: nothing seems to be broke, so why fix it?
I’m at LISA this week, so most of my work will probably consist of going to talks, taking notes, and trying to keep an eye out for people we can hire. :D
My spare-time project for the past couple days has been setting up an ELK-stack monitoring server and playing around with it. So far I’m less happy with it than I am with Splunk, but free and open source is a strong advantage. We’ll see.
Really interesting perspective on the differences between the ‘traditional HPC’ and machine learning communities' approaches to problems of large-scale computation.
He notes the somewhat paranoid mindset of HPC folks about errors, being worried about everything from numerical stability to silent data corruption and whole-node failure. As he says, some of this is a property of the computations they’re doing - lots of repeated calculations on the same input data are more susceptible to stability problems. Some of it is also historically bred into the HPC programmer, as access to those machines is precious, and HPC folklore has it that the first Cray at Los Alamos suffered from single-event-upset problems (apparently one every 6 hours).
Edit: Because I’m a sucker for supercomputing history, here’s more info about the Cray-1’s memory reliability, from the LANL report on the initial evaluation of the Cray-1 (which also has a nice description of the Cray’s architecture at the end, with classic diagrams). They experienced an MTTF during testing of roughly 2.5 to 7 hours, and 89% of the failures were memory parity errors.
This post also makes it sound like HPC codes use system-level full-memory checkpoint & restart facilities. Maybe that’s more common in industry HPC, but in the DOE and academic scientific computing world I was familiar with, most applications handled checkpointing themselves, persisting much less than the total contents of node memory. The general point does stand though, it’s still not really scalable to write out enough data often enough to guarantee good progress.
BTW, why did he link to a video of Modern English’s “I Melt with you” in his paragraph about checkpoint restart?
As an example of blind full-memory checkpointing being overkill: stencil computations, where you have a regular 2- or 3-D grid and break it up into subgrids to distribute over processors, often have an optimization where each node stores some duplicate data from its neighbors in “ghost cells”. These ghost cells can be used as input for several local iterations before requiring slow network I/O to update them.
However, because they’re redundant, saving them to persistent storage as part of a blind checkpoint wastes time and space.
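A toy sketch of the pattern, in pure Python rather than MPI (grid size, halo width, and the two-“rank” split are all made up for illustration):

```python
# Toy 1-D stencil with ghost cells (halo width = 2), pure Python.
# Two "ranks" each own half of the grid plus copies of two neighbor
# cells; those copies let each rank run two local smoothing sweeps
# before any "network" halo exchange is needed.

HALO = 2

def smooth(cells):
    """One Jacobi-style averaging sweep; endpoints held fixed."""
    return [cells[0]] + [
        (cells[i - 1] + cells[i] + cells[i + 1]) / 3.0
        for i in range(1, len(cells) - 1)
    ] + [cells[-1]]

def run(grid, iters):
    """Serial reference: smooth the whole grid."""
    for _ in range(iters):
        grid = smooth(grid)
    return grid

def run_with_halos(grid, iters):
    """Split the grid across two ranks, exchanging ghost cells only
    every HALO iterations instead of every iteration."""
    mid = len(grid) // 2
    left = grid[:mid + HALO]    # left's owned cells + right-side ghosts
    right = grid[mid - HALO:]   # left-side ghosts + right's owned cells
    for step in range(iters):
        if step > 0 and step % HALO == 0:
            # Halo exchange: refresh each rank's ghost copies from the
            # neighbor's owned cells.
            left[-HALO:] = right[HALO:2 * HALO]
            right[:HALO] = left[-2 * HALO:-HALO]
        left, right = smooth(left), smooth(right)
    return left[:mid] + right[HALO:]  # stitch owned cells back together
```

The point for checkpointing: only the owned cells (`left[:mid]` and `right[HALO:]` here) ever need to be persisted; the ghost copies can be rebuilt from neighbors on restart.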
This post also makes it sound like HPC codes use system-level full-memory checkpoint & restart facilities. Maybe that’s more common in industry HPC, but in the DOE and academic scientific computing world I was familiar with, most applications handled checkpointing themselves, persisting much less than the total contents of node memory.
Application-level checkpointing is a lot more common these days, but I’ve definitely seen system-level checkpointing used in both academia and industry. I’ve mostly seen it on smaller systems with a lot of parallel storage — on bigger systems, the admins won’t let you do anything nearly so wasteful. :)
I also thought the OP made interesting points about restricting the computation model to improve fault-tolerance, a la MapReduce. I’d be really interested to see more “framework-style” distributed computing models come into being, because I also think they’d help improve ease-of-use and adoption for parallel programming in general. The trick would be finding ways to port existing applications to those models (or at least find ways to solve the same problems).
From the author, via twitter:
@mikemccracken The “melt with you” video was a (bad?) joke emphasizing global checkpoint inefficiency, as in “i stop the world …”
I guess I should’ve listened to the song, I didn’t remember that part. :)
I wonder how this is different from the existing NAS CG conjugate gradient benchmark. From what I can tell, it might be mostly packaging and license?
Also, while the slides are pretty readable, I was a little curious what exact points they were trying to make on the slides with results. The first one is clear, though: it’s always entertaining to see the difference between peak flops and app performance. Less than one percent of peak!
I wonder how this is different from the existing NAS CG conjugate gradient benchmark. From what I can tell, it might be mostly packaging and license?
After reading your comment I got curious about this myself and did a little hunting. From the original paper on HPCG[0]:
The NAS Parallel Benchmarks (NPB) [3] include a CG benchmark. It shares many attributes with what is proposed here. Despite the wide use of this benchmark, it has the critical flaw that the matrix is chosen to have a random sparsity pattern with a uniform distribution of entries per row. This choice has led to the unfortunate result that a two-dimensional distribution of the matrix is optimal. Therefore, computation and communication patterns are non-physical. Furthermore, no preconditioning is present, so the important features of local sparse triangular solve are not represented and are not easily introduced, again because of the choice of a non-physical sparsity pattern. Although NPB CG has been extensively used for HPC analysis, it is not appropriate as a broad metric for our effort.
Also, while the slides are pretty readable, I was a little curious what exact points they were trying to make on the slides with results.
This is always the problem with a “bare” slide-deck. :P I think they’re mostly trying to show results on clusters that have well-known perf results in the HPC community, but it’s definitely hard to tell.
Though I thought slide 34, which showed the substantial improvement from tuning on the K computer, was also interesting. I’d be curious how much tuning is necessary to get good perf on HPCG vs. HPL… that might have some bearing on its future adoption.
Interesting, thanks for digging. That proposal you linked is shorter and more readable than I expected :)
So it sounds like NAS CG is more synthetic and un-representative than I would’ve thought. That’s interesting.
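For context, the kernel both benchmarks are built around is plain conjugate gradient. A bare-bones dense version in Python (purely illustrative; real HPCG and NAS CG are sparse and distributed, and HPCG adds preconditioning) looks something like:

```python
# Bare-bones conjugate gradient for A x = b, with A symmetric
# positive-definite. Dense, serial, pure Python: just the shape of
# the kernel, not a benchmark. Each iteration is one matrix-vector
# product plus a few dot products and vector updates.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def cg(A, b, tol=1e-10, max_iter=100):
    n = len(b)
    x = [0.0] * n
    r = b[:]            # residual r = b - A x  (x starts at zero)
    p = r[:]            # search direction
    rs = dot(r, r)
    for _ in range(max_iter):
        Ap = matvec(A, p)
        alpha = rs / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        if rs_new < tol * tol:
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

The matrix-vector product is where the sparsity pattern lives, which is why a non-physical random pattern (as in NAS CG) ends up exercising non-physical communication.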
That reminds me that a totally impossible but really interesting study I always used to dream of doing is to go through the back literature and look for any results that depend on suspect benchmarks, and attempt to reproduce the results with real workloads. It’d be like that recent drug study that went back and reviewed old results: Economist: “Trouble at the lab”. Maybe I’m a pessimist, but I’d expect trouble at the computer lab too…
I was always fond of the “Flash vs. Simulated Flash” paper for the same kind of reason. Wish there was a forum for more things like that.
By far the biggest problem with Keybase.io is the suggestion to upload your private keys to their service. This completely breaks security. I like the ability to easily verify via other channels like Twitter and GitHub; GPG supports varying trust levels, so you could assign such a key marginal trust rather than full trust.
I suppose the target audience for this service is technical people who are (excusably) put off by the complexity of gpg --help. Ultimately, I hope Keybase helps the existing GPG system rather than attempting to create a silo.
the suggestion to upload your private keys to their service.
Not having received my invite yet, I hadn’t realized that was a requirement. That’s not very comforting.
It’s not a requirement: you can do everything using their command line client, which is open source and doesn’t require uploading private keys. Private key hosting is a convenience thing for using their webapp to sign things.
I wish they wouldn’t even make private key signing possible, but it isn’t a hard dependency.
I opened an issue about this: https://github.com/keybase/keybase-issues/issues/160
That response is quite concerning – that sort of security/convenience argument is not a strong one, and it further encourages an insecure default for new users of cryptography, which is precisely the sort of user Keybase is trying to attract.
In particular, no matter how well encrypted your key is, it is possible that an attacker breaks in, downloads all those keys, and attacks them in bulk using standard attacks against a large key database. It’s a massive security hazard, and no amount of convenience can justify it.
That suggestion is part of what makes me suspicious of the entire system. I can’t see how any well meaning, moderately competent security person would suggest such a thing.
Building a new centralized directory, because “PKI is hard” makes me think that “building a silo” is exactly what they’re trying to do.
Hi there, I’m Adam DeConinck. I do system administration (or “devops” if feeling trendy) on high-performance computing clusters. I mostly work on the sort of clusters which are used for scientific computing at universities and national labs – think MPI and Slurm instead of MapReduce and Hadoop. Currently I’m working at NVIDIA, so I think a lot more about GPUs in the datacenter than I used to. :)
I think a physical meetup-like event would be fun, and I’d probably prefer a talk-based or hackathon-based format to drinkups or a train event.
Instead of “one big talk”, it could be fun to do an event (or events) based around a bunch of lightning talks. Maybe based around a theme like distributed systems, or devops, or “talk about your favorite language”.
I think something like this would be really cool. I don’t think a theme is really necessary. Just have people send proposals for talks a week or so ahead of time.
This is an interesting idea because you eliminate the possibility of indirect password compromises through password re-use. Unfortunately you replace it with a new point of failure in the form of your “secure channel”, but that point of failure is on the user end rather than the server, so it’s harder to compromise multiple users at once with a server-side data leak.
I still think 2FA is more secure in general, but I can see the argument that this is a more secure single-factor than a password.
I don’t like the active request-token workflow like a password reset, though. Something time-based like [TOTP](http://en.wikipedia.org/wiki/Time-based_One-time_Password_Algorithm) seems like a better idea.
Two-factor auth is definitely more secure. Or at least it can’t possibly be less secure unless you really mess it up by e.g. sharing the password from the first factor using the second factor. But in general two factors will always be more secure than one.
The idea is that if you are only implementing a single factor, it should be the “something you have” factor (i.e. a code sent to you over a secure channel or a TOTP as you suggest) and not the “something you know” factor.
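For the curious, TOTP is small enough to sketch with nothing but the Python standard library (a minimal illustration, not production code):

```python
# Minimal TOTP (RFC 6238) on top of HOTP (RFC 4226), stdlib only.
# Server and client share a secret and derive a short-lived code
# from the clock: the "something you have" factor.
import hashlib
import hmac
import struct
import time

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP: HMAC-SHA1 of the counter, dynamically truncated."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

def totp(secret: bytes, for_time=None, step: int = 30) -> str:
    """TOTP: HOTP keyed on the current 30-second window."""
    t = time.time() if for_time is None else for_time
    return hotp(secret, int(t) // step)
```

The server runs the same computation on its copy of the secret and compares, usually allowing a window of plus or minus one time step for clock skew.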
I’ve been finding Go really useful for sysadmin tools in my own deployments. While I haven’t been migrating any existing tools (mostly Python and Perl), I have been using it for almost all new tool development and haven’t run into many major pain points yet. I miss some of the libraries in PyPI or CPAN, but the advantages in ease of deployment and maintainable development have made up for it.
1) The stories on Lobsters are great. Very technical and very interesting. All of the members on the site I’ve interacted with are very professional. The discussion is getting better, but many stories still lack a good conversation. I think this will improve over time, so I’m not worried about it.
2) There is one hole in the current content, and that’s industry news. For example, Nest got bought for 3.2 billion dollars and there wasn’t a single story. I think the reason people shy away from submitting this news is that it’s really not important enough for an entire article. Would anyone be interested in a weekly thread (probably on Saturday) that covered the week’s important industry news?
3) Everyone here has been great. A much kinder environment than most of the other sites I visit.
To be honest, I’m not interested much in industry news; I suppose that was one of the original draws of Lobsters away from HN. Of course, if that all went under its own tag, I could filter it out.
I feel that many people aren’t, which is why it would be 1) a weekly post and 2) tagged industry-news so you could also filter it out.
As laid out, it seems sound, but… looking at this from the “reinventing the wheel” standpoint, do we really need another news site? What would lobste.rs do better? (Aside from probably better behaviour.)
To be honest, I’d be worried about behavior getting worse if this became another general tech news site.
For example, Nest got bought for 3.2 billion dollars and there wasn’t a single story.
And I, for one, am grateful. Everywhere I turned, I saw that headline. It was nice that Lobste.rs was safe from that type of fluff headline.
On the whole, Lobste.rs sticks to real information rather than light news headlines that lack any significant meaning. Personally, I get my fluff from a daily Slashdot email digest. (And lately I’ve seen lobste.rs content showing up there two days later.)
I agree with this. One thing that turns me off of HN is the amount of ‘hype’ news that constantly makes it to the front page.
COBOL is a great example of what happens when a programming language becomes widely used in a particular domain, but then falls out of favor with the field at large. Fortran is another good example of this: it’s the bedrock of HPC and scientific computing, but the broader computing field has abandoned it. I learned Fortran in a class on fluid mechanics, where it was still extremely relevant, but mentioning my Fortran experience is always a good way to shock my colleagues who got CS degrees. (I’m not exactly old, either — I’m 33! But my degree is in physics.)
Based on the attitudes of younger colleagues, I wonder if Java is also on this road. Perl was never as popular, but I occasionally see job postings for “please come maintain this huge Perl stack!” And there’s no reason to think it won’t keep happening. Is it realistic to think the same set of languages is going to be in wide use in 30 years?
One possible result is that we will simply always have these languages with us, and they’ll take on the status of domain-specific languages. I wonder if there’s money to be made in running a series of “industry-specific” coding bootcamps. Want a job in banking? Do your CS degree, then come and take my 10-week class in COBOL and Java…
I think there are two things here that this article really kind of misses. One is that, yes, you need COBOL programmers to maintain those millions of lines of COBOL. What’s not said is: first, you want the people who implemented it, because they had all the domain knowledge. If they’re not available, you want COBOL programmers from the same industry; next, you’ll take “good” COBOL developers who can understand code quickly and get up to speed on yours. The last thing you want is fresh grads. This is all assuming you’re OK living with COBOL.
The other thing is that the skills shortage is kind of a known known. You can contract out maintenance or new development to Tata or whoever, so that’s kind of a non-issue. Other issues are the IBM tax for running the systems, and the market perception of these old systems. In some industries, COBOL can be a big minus to customers shopping for claims systems, retirement record-keeping systems, etc. It can be a huge headwind to sales, for example.
I think these companies would be best served by reversing course and promoting COBOL as a viable technology for the future: rock-solid, (essentially) bug-free, battle-tested systems, the bedrock of entire industries, as the article notes. Developers may frown upon it nowadays, and although we may not like it, the business doesn’t need rock stars for this stuff anyway.
Another thing is rehosting mainframe apps; this is where Micro Focus and other providers come in. Running the old mainframe workloads on commodity hardware can save millions, and there are a lot of so-called rehosting solutions around nowadays.