The pattern that the author describes is fairly popular and works for small, simple projects. It is pushed frequently by some projects, including Django, because it “looks good” to have the config be in the same language and write a simple variable assignment in it such as DATABASE_USERNAME = "username". However, I disagree with calling it “doing configuration right”. Configuration should be “just data”. This turns configuration into executable code which is a bad idea. Before you know it, non-primitive types like class instances and side effects will creep into your “config”, and it’s a world of pain from there.
Stick to “dumb” data formats such as JSON/YAML/TOML and derive/calculate the other things you need from that original raw information. This keeps your config as data: serializable and free of side effects, which makes it easy for other tooling to read/write/generate/compare/switch it.
If you need something beyond that, look into Dhall, CUE, or Jsonnet (but think twice before doing so; you probably don’t need them!)
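To make the “derive from the raw data” idea concrete, here’s a minimal sketch (the keys and values are invented for illustration):

    import json

    # The stored config stays plain, serializable data.
    raw = json.loads("""
    {
        "database": {"username": "username", "host": "db.example.com", "port": 5432},
        "environment": "development"
    }
    """)

    # Anything richer is derived from the raw data rather than stored in it.
    database_url = "postgresql://{username}@{host}:{port}".format(**raw["database"])
    is_development = raw["environment"] == "development"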
This turns configuration into executable code which is a bad idea.
Where do you draw the line? At what point does your data become executable code when it’s a YAML file? Is a Python module not data? It’s not a pure data structure like a dict, but that doesn’t mean it’s not data. At some point that conversion from plaintext to (your language of choice) needs to happen.
I’d go so far as to argue this is data. Yes, you can abuse it, but at the end of the day you can treat it as simple key-value data wrapped up in a module.
At what point does your data become executable code when it’s a YAML file?
Effectively at no point, because it’s always “just” a YAML file. If my application or some other application wants to read it and do stuff based on it, that’s when code execution comes in.
I have to execute some code to read the YAML file, but I don’t have to execute the file itself. When the config is a Python source file, I do have to execute it. One is data; the other is code that contains data.
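To make that distinction concrete, assuming PyYAML for the YAML side (the file and module names here are placeholders):

    import yaml  # PyYAML, assumed installed

    # Data: parsing turns text into dicts/lists/strings; nothing in the file runs.
    with open("config.yaml") as f:
        config = yaml.safe_load(f)

    # Code that contains data: importing a "config module" executes every
    # top-level statement in it, side effects included.
    import settings  # hypothetical Python config module; it runs right now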
At some point that conversion from plaintext to (your language of choice) needs to happen.
Yes, at some point that transformation needs to happen. If what your application needs internally is very close to the raw format, you can get away with those two things being very similar. For example, your raw JSON/YAML becomes a Python dictionary in a Python application or an object in JavaScript.
If your application needs a richer format, it can construct that from the raw data at the time of conversion. For example, an array of file path strings from the raw configuration data may be transformed into an array of file objects or class instances.
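For instance, a sketch (the key name is made up) of turning path strings into pathlib.Path objects at conversion time:

    import json
    from pathlib import Path

    raw = json.loads('{"watched_files": ["/var/log/app.log", "/etc/app/extra.conf"]}')

    # Construct the richer form from the raw data at load time.
    watched_files = [Path(p) for p in raw["watched_files"]]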
The temptation is for people to skip the serializable data format and turn a source file/module into their “config”. At that point in some ways you don’t really have a config. You are just asking the user/consumer of the application to provide a source file/module of their own that you combine with the rest of your application.
It becomes difficult to make the application robust and handle errors, because the line between config data and application code gets blurred. For example, your entire application may crash just attempting to load its config.
There’s also another nasty pattern where the config module tries to dynamically decide its own values based on other things such as:
    if something:
        CONFIG_VALUE = "this"
    else:
        CONFIG_VALUE = "that"
If you make things like that, at that point you basically don’t have a config. What you have is an application that contains and self-generates its own configuration by executing and evaluating its own code.
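If a value genuinely depends on the environment, the branch can live in application code at load time while the file stays declarative. A sketch (the keys and the env var are invented):

    import json
    import os

    raw = json.loads('{"value_by_env": {"development": "this", "production": "that"}}')

    # The decision happens in the application, against plain data,
    # not inside the config file itself.
    env = os.environ.get("APP_ENV", "production")  # hypothetical variable
    config_value = raw["value_by_env"][env]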
When your config is a source module in some language, effectively it says: “run me to find out what the configuration values are”.
In contrast, when your config is in a conventional data format, it says: “here are the values”.
“Source file as config” is hostile in many contexts. For example, your ops/security team may want to inspect/store/compare/validate/verify/generate all or some parts of the config for the applications, and they may not even use the same language as the app itself. If application config data is in a “source file in language X”, it’s going to make things quite difficult.
The default environment being DEV puts the project at risk, in the sense that when someone tries, for some reason, to spin the app up in the production environment, it might break things in difficult ways, whereas making production the default will just fail fast.
Also, this article does not even touch the subject of distributed secret sharing. This is an approach that works for a single-app, single-service monolith, but it does not scale to multiple services and distributed settings.
If you’ve got a polyglot situation, then it seems like JSON read with json.loads in Python, and whatever the equivalent is in the other language, is better. An example is when you want to maintain some configuration invariant between your React app and your Flask backend.
The difference is so minor, though (if config['environment'] == 'development' is no less readable than if config.is_development), and while the theory is that with the latter you can then move your config['environment'] to config['environments']['canary'] and only have to modify one thing, I don’t think that happens that often in practice. Probably not worth optimizing for at the cost of the indirection.
The advantage of the JSON approach is that you’re guaranteed there’s no logic and you get the polyglot support. The disadvantage is that you’re guaranteed no logic.
Nothing is preventing you from doing that here. They can coexist: the (truly) shared config can exist as a JSON file and get pulled into Python-land with json.loads().
I would argue that config['environment'] == 'development' is worse, although your point on readability is valid. I say this because 'development' has now become a magic string that you need to be aware of.
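One way to get both, as a sketch (names invented): keep the shared config as JSON and hide the magic string behind a single helper, so 'development' is spelled in exactly one place:

    import json

    class Config:
        def __init__(self, raw: dict):
            self._raw = raw

        @property
        def is_development(self) -> bool:
            # The magic string lives here and nowhere else.
            return self._raw["environment"] == "development"

    config = Config(json.loads('{"environment": "development"}'))
    assert config.is_development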
I built one last week (Ryzen 7 3600X/32GB RAM/1 TB M.2 SSD), it was an upgrade after 7.5 years (minus graphics card). I also built the one before that and the one before that. I think the last one I bought off the shelf was in 2005ish.
Overall I’m not really into hardware anymore, but I’ve been buying stuff recommended by a friend who’s really into it for years.
The last laptop I bought was in 2004; since then I’ve exclusively used hand-me-downs, usually from work. Nothing too shabby: I have an X230 (which was broken, and I had to replace the screen) and a T460p, but laptops have never been my personal main machine, ever.
My NAS is an HP Microserver N54L where I added RAM and disks, so kinda half off-the-shelf :P
For work I’ve used ThinkPads since roughly early 2010, the last (and first?) work desktop machine I had was from ~2001 to 2005 and sometimes when working at customers’ offices.
Playing catch up. I’m a consultant and for whatever reason I absolutely cannot make progress during the week or during business hours when I have people interrupting flow state.
This was supposed to be a 100% offline weekend for my wife and me, but I’m afraid I’m gonna need to spend most of it holed up in my office jamming on client work.
Not dreading it, quite the contrary. But I definitely need to work on getting this under control so I can make the best use of my time during normal business hours so I’m free to spend what society considers off time with my wife.
Have Deep Work (the book) queued up but haven’t read it yet.
Hm. Seems overkill. I just tag my Docker images with the git hash. Done. Don’t deploy latest, deploy the tag.
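In Python, that flow can be as small as this sketch (the image name is hypothetical; git and docker are assumed to be on PATH):

    import subprocess

    IMAGE = "example/app"  # hypothetical image name

    # Resolve the current commit and use it as the image tag.
    sha = subprocess.check_output(
        ["git", "rev-parse", "--short", "HEAD"], text=True
    ).strip()

    # Build and tag with the commit hash; deploy this tag, never :latest.
    subprocess.check_call(["docker", "build", "-t", f"{IMAGE}:{sha}", "."])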
I have a trigger for the master branch that tags images as :master, and another trigger on all tags that tags to :latest, so my :latest images are the latest tag. That way I can sort of guarantee that :latest is stable and :master is master’s HEAD.
This is on the Docker Hub, Quay.io does this by default if you leave the default build trigger on.
(I also have a third trigger that tags images with the name of the tag itself, too.)
OK, this is going to stand out from the rest of the crowd.
Extras @home
Visual Studio 2019
There was a lot of pain due to folks upgrading early to VS 2017, so I’m curious how this is working out for you.
Cmder (just enough to survive on Windows)
LOL, we’re all just trying to get a decent terminal on Windows. I initially tried Cygwin, but now I’m over on WSL.
There was a lot of pain due to folks upgrading early to VS 2017, so I’m curious how this is working out for you.
We haven’t transitioned the compiler yet, so we still generate VS 2015 projects. Other than that the IDE is snappier, opens much faster and is generally nice.
Oh, and BTW, for Visual Studio users I strongly recommend the Fast Find extension, which has an excellent fuzzy finder that can deal with huge solutions. At around $15, it is really worth the price.
It’s been some years since I’ve done it, but I basically followed the OpenBSD/octeon guide, which supported the EdgeRouter Lite at the time.
Since the main drive is a USB flash drive, I remember setting noatime,softdep on the mount point in my fstab to minimize the amount of writes. It has been going strong since then, with the base install providing everything I would want for a router (even games ;)
An approach for a recent Python project uses AWS CodeBuild/CodeDeploy/CodePipeline via GitHub webhooks.
The server itself is configured via Terraform. That is, the basic instance type, plus cloud-init stuff so that when a new instance boots for the first time it configures itself to be capable of running the codebase: basic system packages, etc. It runs AWS agents that auth via the instance IAM role, so there is no need to store keys anywhere.
Then the CodeDeploy process does the rest. The build is created and the test suite is run. Once that passes, it gets deployed to the server. This essentially involves destroying the code that is there and doing a clean checkout. Then it restarts the processes for this particular app, which all live under a single parent systemd service. All of that is bootstrapped by CodeDeploy (it’s idempotent, so the same calls get run on every deploy, and if something is missing it gets added).
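The restart step in a deploy hook like that can be tiny; a sketch (the service name is made up):

    #!/usr/bin/env python3
    # Hypothetical CodeDeploy ApplicationStart hook: restart the single
    # parent systemd service that owns this app's processes.
    import subprocess

    subprocess.check_call(["systemctl", "restart", "myapp.service"])  # name assumed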
It’s not a blue/green deploy so there is a blip of downtime. In this instance, that’s okay/acceptable.
No containers. Just a virtual environment to isolate python dependencies and systemd.
It’s fun being able to completely nuke any part of my infrastructure and run “terraform apply” to bring it right back to normal, and a git push to master to trigger a code deploy.
I’ve seen and managed it all: bare metal, Heroku, AWS, GCP, Kubernetes the hard way (in production at FarmLogs), VPSes, etc. There is no one right or wrong way to do it, but the two biggest things I’d focus on are 1. idempotency and 2. infrastructure as code. The underlying stuff running your app doesn’t matter. A container is just isolation around regular Linux processes. It’s all the other stuff that is more important to sort out.
I just brought a 2nd R720 (2U rack server) online in my home rack to use as either a k8s lab box or a single-node k8s server.
I was an early adopter of k8s back in around 2015-2016, using it to migrate FarmLogs (a YC startup) from a series of Heroku apps to a self-managed cluster of a few big nodes that supported our big collection of microservices.
Since then I have not really used it in a production sense, so I want to freshen up and also begin using it to deploy my own private/self-hosted apps.
I essentially want AWS at home: need a random psql db? Need a redis instance too? I want to provision it with code and have it sandboxed for each of my clients, versus having a bunch of services running locally and needing to worry about state and what lives where.
I am exploring some of the pseudo-k8s tools like k3s and similar, but will probably go with k8s the hard way since I have done it from scratch a few times now.
If anyone has tips for a more ‘infrastructure-as-code’ approach to doing this in a homelab I am all ears!