The weekly thread to discuss what you have done recently and are working on this week.
Be descriptive, and don’t hesitate to ask for help!
Implementing proper log-cleaning for my rust bw-tree! It turns out that cramming 50k fragmented fallocate calls to clean up what was originally a single 8mb write is quite a bottleneck :) Lots of flamegraphs and reading papers. There are so many juicy problems that still need to be solved, like implementing a versioned KV on top that will support MVCC, snapshots, transactions, etc… But I’m really happy how progress has been going. Up to 7 million reads/s on a MBP for trivial workloads :) But that 50k fallocate amplification really kills it when you mix in writes, squashing a mixed workload’s max throughput to like 200k ops/s. That will dramatically change with nice log cleaning (and also this will come with multi-OS support, as the fallocate hack only works on linux for now).
I’ve managed to convince a few friends to start helping out with different areas, and I’m super excited to be working with more people on this! If anyone is curious about lock and wait-free algorithms, performance tuning, general DB engineering, or even rust or FFI DB libraries for other languages in general, this thing has very promising theoretical properties, and I’d love to mentor/guide anyone who would like to learn more about this kind of engineering! I’m lucky enough to be taking a break from a real job, and I love talking about systems with people, so I have tons of energy for working with more folks at this point :)
I’m trying to learn J. It’s difficult, slow going, but it triggers the same obsessiveness I have with TLA+. Trying to figure out why. I think it’s because both of those were so far outside what I normally experience as “programming” I can’t/couldn’t wrap my mind around them: I can’t physically think in J. That alienness appeals to me. Not only am I learning a new tool, but I’m learning new ways to think about programs.
Workwise, we’re now knee-deep in the new school year and dealing with launch issues. Mostly on top of it, though. Going to WindyCityRails on Thursday, may give a lightning talk there.
I’ve been tinkering with J recently too. I’ve found the books, PDFs and journals on J/K/APL to be great reading material for approaching problems differently to how I’d normally approach them in more common toolsets.
Packing for, and then actually, moving house! Same town, bigger house, much bigger garden. Means I get a whole room for my office, rather than half a shared room too. And a garage (yay, workshop!). Cannot wait. Rapidly approaching the point of, “why didn’t I pay someone else to move my shit” though.
Also figured out my puppet apply-from-a-git-repo workflow, and extracted a template repository at https://github.com/caius/puppet-apply with my scripts to make it easier. Now that’s in place & I’ve stopped hacking on the “make it work”, I’m now retrofitting all the manifests to make them apply to my existing servers. And pulling in modules from the forge where I can instead of writing things myself. (nginx! mysql!) Feels Good™
Still working on switching back to developer mode from manager mode. This week, that means relearning how to deploy Rails apps, learning some new CSS things (hello, CSS Grid and Flow), and getting more comfortable with ES6 and React again. Happy for crustaceans’ views on the first, by the way: I see so many Rails deployment patterns that I’m having trouble discerning what’s standard for MVPs v. midsize v. heavyweight apps.
IMO, the main difference between size is how much automation you want around bringing up new servers. MVP for tiny app? It’s fine to have a single hand-built server, don’t waste your time automating it yet. If you’re going to have tons of users and a bunch of money involved, then you ought to be able to spin up new servers and have them integrated into your system and serving traffic with one click.
Then there’s all the mechanics of how to set up the details, including how to start the app automatically and keep it running as a service, which Rails server to use, getting it set up properly for your system, figuring out where to log to and how you’re going to access log info to debug in production, setting up connections to the DB and other internal servers, how the app gets API keys to external services, setting up Nginx reverse-proxy and related settings, how deploying new code onto the server and restarting it is going to work, etc.
Thanks! I think I phrased my initial thoughts a bit weakly though. I’m actually very comfy in Kubernetes/Docker/Ansible/etc., and feel comfy navigating that whole domain; my questions are mostly Rails-specific. E.g., when I last did Rails work “for real”, the only game in town was Mongrel, which doesn’t even appear to be a thing. So whether to use Puma, or raw WEBrick, or foreman driving something, etc., is not clear to me.
Ah, I see. At work, we mostly use Passenger for our Rails hosting, and I’ve been using Puma for my personal projects. Unicorn seems fine too. I’d pick any of those and just go; the differences in performance aren’t all that big of a deal. WEBrick isn’t generally recommended for production due to lack of support for threads and processes and all of that stuff, but honestly, we accidentally left one of our SOA apps running on WEBrick in production for a year or two, and nobody noticed.
foreman is AFAIK mostly for testing, so you can spin up all of your various background processes at once. It isn’t meant to run them in production. You should be using whatever kind of services your OS supports to run them in production.
More work on theft this weekend:
Investigating using clang’s coverage tracing to make theft’s search phase smarter. Most of the work on theft so far has gone into its shrinking phase, which runs once it has found failures. Fuzzers like afl-fuzz and libFuzzer do the opposite: work has gone into making them smarter at searching for bugs, but they make little effort to shrink what they find into a simple bug report. Ideally, theft would have a general hook for receiving coverage info as a property test runs, rather than anything too specific to clang’s implementation (and I’d rather not completely reimplement afl-fuzz either!).
Started investigating structural inference algorithms. I think sequitur may be a good fit: While building up property test input, generators request random bits in specific sizes, e.g. [11, 3, 8, 1, 3, 8, 1, 3, 8, 1, 3, 8, 1, ...]. Treating the repeated groups (here, [3, 8, 1]) as a single unit should make searching for simplifications smarter.
Preparing to add multi-core searching and shrinking. So far, it’s mostly been removing things from the API that are optional, have better alternatives now, and would greatly increase the surface area of the concurrency support. In particular, custom shrinker support is gone (autoshrinking works sufficiently well that I’d rather focus on improving it further; writing good custom shrinkers requires understanding some really subtle issues), and custom hashing callbacks are also gone because they can be handled better internal to autoshrinking.
Adding failure tagging: previously, theft couldn’t distinguish between failures that had different root causes, so bugs caused by simpler input could shadow others. Usually, the rarer / more complex bugs are more interesting. If a failure ID is set before returning THEFT_TRIAL_FAIL, then the shrinker will focus on failures of that type and untagged failures (which could be entirely new). This is already implemented in the single-process case, but use with forking overlaps with the multi-core work.
Improved the adaptive weighting for tactics used when shrinking failing input. Now it’s better about reducing the frequency of ineffective operations.
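To make the repeated-group idea from the structural inference item above concrete, here is a minimal Python sketch. It is not Sequitur and not theft's code: it is a much simpler greedy pass that only collapses back-to-back repeats, and the function name `collapse_repeats` and its `max_group` parameter are made up for illustration.

```python
def collapse_repeats(sizes, max_group=8):
    """Greedily collapse immediately repeated groups into (group, count) units.

    Simplified stand-in for grammar inference such as Sequitur: it only
    detects back-to-back repeats, not nested or separated ones.
    """
    out = []
    i = 0
    n = len(sizes)
    while i < n:
        best = None  # (group_length, repeat_count) covering the most items
        for glen in range(1, max_group + 1):
            group = sizes[i:i + glen]
            if len(group) < glen:
                break  # not enough items left for a group this long
            count = 1
            while sizes[i + count * glen:i + (count + 1) * glen] == group:
                count += 1
            # Prefer the candidate covering the most items; ties keep the
            # shorter group because the comparison is strict.
            if count > 1 and (best is None or glen * count > best[0] * best[1]):
                best = (glen, count)
        if best is None:
            out.append(sizes[i])
            i += 1
        else:
            glen, count = best
            out.append((tuple(sizes[i:i + glen]), count))
            i += glen * count
    return out


if __name__ == "__main__":
    requests = [11, 3, 8, 1, 3, 8, 1, 3, 8, 1, 3, 8, 1]
    print(collapse_repeats(requests))  # [11, ((3, 8, 1), 4)]
```

With the stream grouped like this, a shrinker could try dropping or reducing one whole `(3, 8, 1)` unit at a time instead of guessing at individual bit requests.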
Have you been applying theft to theft? Lots of CompSci folks in the space do that, with the KLEE people finding some bugs that way.
Applied to itself as a whole, not yet. There are several tests with a contrived property that stresses the shrinker in a certain way, and I check that it successfully finds the real local minimum quickly. I’m going to be using the non-concurrent/single process version of it to verify some new code that supports the concurrent mode, though.
Cool. Good stuff.
Now I have! As part of the concurrency support, I’m making a new scheduler/planner API that can coordinate meaningful next steps between several dozen independent worker processes. It’s not integrated into theft yet, but I’m running a property test on it as a standalone API (“if it runs N trials with M concurrent workers, with controlled interleaving, can it finish the tests without getting stuck or triggering asserts?”), and it’s found a couple bugs that my integration tests missed.
Good work! Keep an eye out Monday, since I’m submitting something relevant you might want to try. It also caught unexpected concurrency bugs.
I am hoping to spend some time learning about Michelson, the smart contract language for the Tezos platform. My primary resource is likely to be this blog: http://www.michelson-lang.com
Otherwise, business as usual iOS/macOS development.
I am trying to write an HTML formatter tool that is aware of Go templates. I would like to enforce a code style for these templates, and so far, only vim-go and its gohtmltmpl filetype handle such formatting. But having to rely on Vim + plugin to enforce code style is not cool.
I think I need to use an HTML parser like Gumbo, and write the formatter myself (with all the problems that come with it).
Any recommendation (of existing tools, or of resources that would help me) would be appreciated!
Trying out front-end development with react and mobx with a Haskell back-end using scotty, digestive-functors and postgresql-simple on a pet project.
Working on a CzechELib ERMS tender for NTK. The goal is to get a central electronic scientific publications licensing platform going and generally get a better bargaining position when negotiating with publishers.
Workwise: short week for me because I took off Thurs and Friday to go visit some friends. But before I go I have a meeting about how we’re going to make our logging system a bit more rational. Hopefully will make some headway in a major refactor of one of our data ingestion pipelines. Also lots of fixes to crawlers.
Otherwise: I’ve been reading about sockets and communication protocols. I’ll continue to do so, especially with the new IPv10 RFC out now.
I’m working on getting a private beta of deps.co out to some early users. One of the big tasks is getting the servers set up with Terraform. I’d already done a ton of work setting up the supporting infrastructure, but this was my first time really using systemd in anger, and it took quite a lot of time to figure out some dependency cycles for running my main app. I’m also digging into Varnish for the first time, although I’m sort of familiar with it from setting up Fastly for Clojars.
What’s your differentiation from something like Artifactory? Not trying to say it’s better, honestly curious.
Sure, great question. Hosted Artifactory is a great option if you have a large organisation, or you want to store packages of many different kinds. The downside is that it is fairly expensive to run and somewhat complicated to manage and configure. This is in part because each customer runs on their own VM.
Deps is designed for smaller JVM based teams that want simpler management and browsing. Crucially it runs as a multi-tenant service, so we don’t need to run a VM per customer. In exchange you get a simpler interface, higher availability, and cheaper pricing. It will be better for some people, but not for everyone, particularly if you need to handle multiple kinds of packages in one system.
My main project this week is working on my PMA (Positive Mental Attitude). https://www.youtube.com/watch?v=YYjqY0Yg1kk
It’s tough – one could make the argument that we are living in “Coptic Times” indeed.
Finally got off my ass and implemented a simple FP-ish PL using a CEK machine. Of note is the guaranteed TCO: I can compute fact(100000), which isn’t possible in most languages without TCO. Was especially proud of myself for extending the machine to add an explicit sequencing operator for clarity (it is a cons cell!). I’m going to keep adding constructs and semantics and seeing if I like how it’s turning out. I’m not 100% sure if I want to keep using CEK semantics (other than TCO).
This week I need to update the codegen module to actually generate unique symbol names, and prune environments to make them safe-for-space.
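For anyone wondering why a CEK machine gives TCO for free, here is a minimal Python sketch. It is not the implementation described above: the tuple-based term encoding, the `letrec` form, and the names `run` and `counting_loop` are all assumptions made for illustration. The point is that the machine is one loop over (control, environment, continuation), and a call in tail position enters the function body with the caller's continuation unchanged, so nothing piles up.

```python
import operator

PRIMS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def run(expr):
    """Evaluate expr with a CEK loop: Control, Environment, Kontinuation."""
    control, env, kont = expr, {}, ("halt",)
    while True:
        # Eval phase: inspect the control expression.
        tag = control[0]
        if tag == "num":
            value = control[1]
        elif tag == "var":
            value = env[control[1]]
        elif tag == "lam":
            value = ("closure", control[1], control[2], env)
        elif tag == "app":
            kont = ("arg", control[2], env, kont)
            control = control[1]
            continue
        elif tag == "if0":
            kont = ("if0", control[2], control[3], env, kont)
            control = control[1]
            continue
        elif tag == "prim":
            kont = ("prim_right", control[1], control[3], env, kont)
            control = control[2]
            continue
        elif tag == "letrec":
            _, name, lam, body = control
            env = dict(env)
            env[name] = ("closure", lam[1], lam[2], env)  # closure sees itself
            control = body
            continue
        else:
            raise ValueError(f"unknown term: {tag!r}")
        # Apply phase: hand the value to the continuation.
        while True:
            ktag = kont[0]
            if ktag == "halt":
                return value
            if ktag == "arg":  # function evaluated, now evaluate its argument
                _, arg, saved_env, next_k = kont
                control, env, kont = arg, saved_env, ("fn", value, next_k)
                break
            if ktag == "fn":  # argument evaluated, enter the function body
                _, (_, param, body, closure_env), next_k = kont
                env = dict(closure_env)
                env[param] = value
                control, kont = body, next_k  # no new frame: this is the TCO
                break
            if ktag == "if0":
                _, then, other, saved_env, next_k = kont
                control = then if value == 0 else other
                env, kont = saved_env, next_k
                break
            if ktag == "prim_right":
                _, op, right, saved_env, next_k = kont
                control, env = right, saved_env
                kont = ("prim_apply", op, value, next_k)
                break
            # "prim_apply": both operands are ready; keep applying.
            _, op, left_value, next_k = kont
            value, kont = PRIMS[op](left_value, value), next_k


def counting_loop(op, n, start):
    """Build: letrec loop = \\n. \\acc. if0 n acc (loop (n-1) (acc op n))."""
    loop = ("lam", "n", ("lam", "acc",
            ("if0", ("var", "n"),
             ("var", "acc"),
             ("app",
              ("app", ("var", "loop"), ("prim", "-", ("var", "n"), ("num", 1))),
              ("prim", op, ("var", "acc"), ("var", "n"))))))
    call = ("app", ("app", ("var", "loop"), ("num", n)), ("num", start))
    return ("letrec", "loop", loop, call)


if __name__ == "__main__":
    print(run(counting_loop("*", 5, 1)))       # 120
    print(run(counting_loop("+", 100000, 0)))  # 5000050000
```

The second call loops 100000 times, far past Python's default recursion limit, because the interpreter itself never recurses. It uses a sum rather than a factorial only to keep the numbers small.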
Rewriting a CMS originally in (object-oriented spaghetti) PHP into Node.js; because why not?
Learning Groovy a bit more
Rewriting a Node API with Golang. Rewriting an AngularJS SPA with Vue. Lots of slow going learning. Also, started a side project for an internal ticket system that will probably go nowhere because it’s a huge task for little payoff.
Getting used to life as a Mr Manager person at my new gig. Also, working on some lumpy warts in Kafka Connect, which entails Java, which … which … I struggle to say something nice, here, so, let’s just say, of all the languages I have written software in, Java is certainly one of them.
Otherwise, managing being back at work, given that our childcare responsibilities have doubled. My wife is still on leave but we’re trying to get into the habits we’ll need to survive when she’s back at work in December. Signing up for the gym. Trying to drink less on school nights. The old man band is sort of on hiatus, but I’d like to get back to writing songs myself.
Manager, we just say manager.
I hear you on life with kids. I’ve never felt such a compulsion to drink, and I was never much into alcohol to begin with.
I love my daughters (.5 and 2 ⅓ yrs), and I love spending time with them, but it is work, and I am always happy to have an hour or so of decompression time in the evening. Every day increases my wonder at women who do this alone. Holy wow.
Working on Helmspoint, a tool for deploying Keras machine learning models to the web.
Individual background tasks are easy to fire off, with resque, sidekiq, and their ilk. But when I’ve had background tasks that had to be chained together, it got messy. Think I finally have a solution now, where I can build tasks that are chained with AndThen and Map. It’s almost like rewriting promises and effects managers, but now they’re persistent.
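As a rough illustration of what persistent chaining with Map and AndThen could look like, here is a minimal Python sketch. It is not Helmspoint's code: only the Map/AndThen names come from the description above, and `Chain`, `task`, `REGISTRY`, and `run_one_step` are hypothetical. The idea is to store the chain as plain data (a value plus a list of named steps) so that a worker can load it, run one step, and save it back.

```python
import json

REGISTRY = {}  # hypothetical name -> function table shared by all workers


def task(fn):
    """Register a function by name so a stored chain can refer to it."""
    REGISTRY[fn.__name__] = fn
    return fn


class Chain:
    """A value plus the steps still to run, kept as JSON-friendly data."""

    def __init__(self, value, steps=()):
        self.value = value
        self.steps = [list(step) for step in steps]

    def map(self, fn):
        # fn takes a value and returns a plain value.
        return Chain(self.value, self.steps + [["map", fn.__name__]])

    def and_then(self, fn):
        # fn takes a value and returns another Chain to splice in.
        return Chain(self.value, self.steps + [["and_then", fn.__name__]])

    @property
    def done(self):
        return not self.steps

    def dumps(self):
        return json.dumps({"value": self.value, "steps": self.steps})

    @staticmethod
    def loads(text):
        data = json.loads(text)
        return Chain(data["value"], data["steps"])


def run_one_step(text):
    """Load a stored chain, run its first step, and return the new stored form."""
    chain = Chain.loads(text)
    if chain.done:
        return text
    (kind, name), rest = chain.steps[0], chain.steps[1:]
    result = REGISTRY[name](chain.value)
    if kind == "map":
        following = Chain(result, rest)
    else:  # and_then: run the returned chain's steps before the remaining ones
        following = Chain(result.value, result.steps + rest)
    return following.dumps()


@task
def double(x):
    return x * 2


@task
def add_one(x):
    return x + 1


@task
def add_one_then_double(x):
    return Chain(x).map(add_one).map(double)


def demo():
    stored = Chain(5).map(double).and_then(add_one_then_double).dumps()
    while not Chain.loads(stored).done:
        stored = run_one_step(stored)  # each step could run in a different worker
    return Chain.loads(stored).value


if __name__ == "__main__":
    print(demo())  # 22
```

Because steps are stored by name rather than as closures, the chain survives a restart between steps. The trade-off is that every step function has to be registered up front and take a JSON-serializable value.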
Haven’t decided on a topic, but it might be “How do I get other people to give me data”?