I really don’t like the approach of using python itself as the configuration language. Stuff configuration into effectively global variables muddies dependencies and interfaces, and additionally it’s awkward as soon as a non-python component needs to read this configuration. I’ve also seen cases where the “config” module slowly starts acquiring things that perform side-effects, which is its own whole can of worms.
If you’ve got a polyglot situation, then it seems like json read as loads in Python and whatever equivalent in the other language is better. An example is when you want to maintain some configuration invariant between your React app and your Flask backend.
The difference is so minor, though (if config['environment'] == 'development' is no less readable than if config.is_development) and while the theory is that with the latter you can then move your config[environment] to config[environments][canary] and only have to modify one thing, I don’t think that happens that often in practice. Probably not worth optimizing for at the cost of the indirection.
The advantage of the JSON approach is that you’re guaranteed there’s no logic and you get the polyglot support. The disadvantage is that you’re guaranteed no logic.
Nothing is preventing you from doing that here. They can coexist. The (truly) shared config can exist as a JSON file and get pulled into Python-land with json.loads().
I would argue that config['environment'] == 'development' is worse, although your point on readability is valid. I say this because development has now become a magic string that you need to be aware of.
This pattern of a settings module is used pretty often in Django world. The biggest frustration ends up being that if you have some general configuration in your __init__.py or base.py or wherever, it’s easy to overwrite it in a sub-module, unless there’s some logic that runs based on the configuration.
A classic example is having your settings/__init__.py do something like:
TESTING = ...
if TESTING:
set_up_logging()
If in your settings/local.py you end up just setting TESTING = True, then you’ll miss that base statement and so have to hack in some other stuff.
Ultimately what I do on personal projects is a sort of multi-stage environment setup, but a lot of people are averse to “complicated settings” (Though a part of me feels that complication levels of settings are not based on the usage of Python but more intrinsic requirements..).
The pattern that the author describes is fairly popular and works for simple small projects.
It is pushed frequently by some projects including Django because it “looks good” to have the config be in the same language and write a simple variable assignment in it such as ‘DATABASE_USERNAME = “username”’.
However I disagree with calling it “doing configuration right”. Configuration should be “just data”. This turns configuration into executable code which is a bad idea. Before you know it non-primitive types like class instances, and side effects will creep into your “config” and it’s a world of pain from there.
Stick to “dumb” data formats such as JSON/Yaml/TOML and derive/calculate the other things that you need from that original raw information. This keeps your config as data, serializable, without side effects, and makes it easy for other tooling to read/write/generate/compare/switch it.
If you need something beyond that look into Dhall, CUE, or Jsonnet (but think twice before doing so, you probably don’t need them!)
This turns configuration into executable code which is a bad idea.
Where do you draw the line? At what point does your data become executable code with a YML file? Is a python module not data? It’s not a pure data structure like a dict, but that doesn’t mean it’s not data. At some point that conversion from plaintext to (your language of choice) needs to happen.
I’d go so far as to argue this is data. Yes you can abuse it, but at the end of the day you can treat it as simple key value data wrapped up in a module.
At what point does your data become executable code with a YML file?
Effectively at no point because it’s always “just” a yaml file. If my or some other application wants to read that and do stuff based on it that’s when code execution comes in.
I have to execute some code to read the Yaml file. But I don’t have to execute it.
When it is a Python source file I have to execute it.
One is data, the other one is code that contains data.
At some point that conversion from plaintext to (your language of choice) needs to happen.
Yes at some point that transformation needs to happen. If what your application needs internally is very close to the raw format you can get away with those two things being very similar. For example your raw JSON/Yaml becomes a python dictionary in a python application or an object in javascript.
If your application needs a richer format it can construct that form the raw data at the time of conversion. For example an array of file path strings from the raw configuration data may be transformed into a an array of file objects or class instances.
The temptation is for people to skip the serializable data format and turn a source file/module into their “config”. At that point in some ways you don’t really have a config. You are just asking the user/consumer of the application to provide a source file/module of their own that you combine with the rest of your application.
It makes it difficult to make the application robust and handle errors because the line between config data and application code gets blurred. For example your entire application may crash upon attempting to load the config.
There’s also another nasty pattern where the config module tries to dynamically decide its own values based on other things such as:
if something:
CONFIG_VALUE = “this”
else:
CONFIG_VALUE = “that”
If you make things like that, at that point basically you don’t have a config. What you have there is an application that contains and self-generates its own configuration by executing and evaluating its own code.
When your config is a source module in some language effectively it says:
Anybody who wants to read/inspect me must execute me to find out what my values are
Nobody can generate me in a sane way (you’d have to “template” a source file which is nasty and unsafe)
In contrast when your config is in a conventional data format it says:
Anybody who can read JSON/Yaml/TOML can also read me (up to them what they want to do with it)
Anybody who can write JSON/Yaml/TOML can generate me
“source file as config” is hostile in many contexts. For example your ops/security team may want to inspect/store/compare/validate/verify/generate all or some parts of the config for the applications. And they may not even use the same language as the app itself. If application config data is in a “source file in langauge X” it’s going to make things quite difficult.
The default environment being DEV put the project at risk, in the sense that when someone will try, for some reason, to spin the app in production environment it might break things in difficult ways, whereas making production default, will just fail fast.
Also, this article does not even touch the subject of distributed secret sharing. This is an approach that works with a single app, single service monolith, but does not scale to multiple services and distributed settings.
I really don’t like the approach of using python itself as the configuration language. Stuff configuration into effectively global variables muddies dependencies and interfaces, and additionally it’s awkward as soon as a non-python component needs to read this configuration. I’ve also seen cases where the “config” module slowly starts acquiring things that perform side-effects, which is its own whole can of worms.
Vim is configured with a similar script. If you want to do it this way probably depends on what you’re doing.
Is that supposed to be ‘Modules’?
If you’ve got a polyglot situation, then it seems like json read as
loads
in Python and whatever equivalent in the other language is better. An example is when you want to maintain some configuration invariant between your React app and your Flask backend.The difference is so minor, though (
if config['environment'] == 'development'
is no less readable thanif config.is_development
) and while the theory is that with the latter you can then move yourconfig[environment]
toconfig[environments][canary]
and only have to modify one thing, I don’t think that happens that often in practice. Probably not worth optimizing for at the cost of the indirection.The advantage of the JSON approach is that you’re guaranteed there’s no logic and you get the polyglot support. The disadvantage is that you’re guaranteed no logic.
Nothing is preventing you from doing that here. They can coexist. The (truly) shared config can exist as a JSON file and get pulled into Python-land with
json.loads()
.I would argue that
config['environment'] == 'development'
is worse, although your point on readability is valid. I say this becausedevelopment
has now become a magic string that you need to be aware of.This pattern of a
settings
module is used pretty often in Django world. The biggest frustration ends up being that if you have some general configuration in your__init__.py
orbase.py
or wherever, it’s easy to overwrite it in a sub-module, unless there’s some logic that runs based on the configuration.A classic example is having your
settings/__init__.py
do something like:If in your
settings/local.py
you end up just settingTESTING = True
, then you’ll miss that base statement and so have to hack in some other stuff.Ultimately what I do on personal projects is a sort of multi-stage environment setup, but a lot of people are averse to “complicated settings” (Though a part of me feels that complication levels of settings are not based on the usage of Python but more intrinsic requirements..).
The pattern that the author describes is fairly popular and works for simple small projects. It is pushed frequently by some projects including Django because it “looks good” to have the config be in the same language and write a simple variable assignment in it such as ‘DATABASE_USERNAME = “username”’. However I disagree with calling it “doing configuration right”. Configuration should be “just data”. This turns configuration into executable code which is a bad idea. Before you know it non-primitive types like class instances, and side effects will creep into your “config” and it’s a world of pain from there.
Stick to “dumb” data formats such as JSON/Yaml/TOML and derive/calculate the other things that you need from that original raw information. This keeps your config as data, serializable, without side effects, and makes it easy for other tooling to read/write/generate/compare/switch it.
If you need something beyond that look into Dhall, CUE, or Jsonnet (but think twice before doing so, you probably don’t need them!)
Where do you draw the line? At what point does your data become executable code with a YML file? Is a python module not data? It’s not a pure data structure like a dict, but that doesn’t mean it’s not data. At some point that conversion from plaintext to (your language of choice) needs to happen.
I’d go so far as to argue this is data. Yes you can abuse it, but at the end of the day you can treat it as simple key value data wrapped up in a module.
Effectively at no point because it’s always “just” a yaml file. If my or some other application wants to read that and do stuff based on it that’s when code execution comes in.
I have to execute some code to read the Yaml file. But I don’t have to execute it. When it is a Python source file I have to execute it. One is data, the other one is code that contains data.
Yes at some point that transformation needs to happen. If what your application needs internally is very close to the raw format you can get away with those two things being very similar. For example your raw JSON/Yaml becomes a python dictionary in a python application or an object in javascript.
If your application needs a richer format it can construct that form the raw data at the time of conversion. For example an array of file path strings from the raw configuration data may be transformed into a an array of file objects or class instances.
The temptation is for people to skip the serializable data format and turn a source file/module into their “config”. At that point in some ways you don’t really have a config. You are just asking the user/consumer of the application to provide a source file/module of their own that you combine with the rest of your application.
It makes it difficult to make the application robust and handle errors because the line between config data and application code gets blurred. For example your entire application may crash upon attempting to load the config.
There’s also another nasty pattern where the config module tries to dynamically decide its own values based on other things such as:
if something: CONFIG_VALUE = “this” else: CONFIG_VALUE = “that”
If you make things like that, at that point basically you don’t have a config. What you have there is an application that contains and self-generates its own configuration by executing and evaluating its own code.
When your config is a source module in some language effectively it says:
In contrast when your config is in a conventional data format it says:
“source file as config” is hostile in many contexts. For example your ops/security team may want to inspect/store/compare/validate/verify/generate all or some parts of the config for the applications. And they may not even use the same language as the app itself. If application config data is in a “source file in langauge X” it’s going to make things quite difficult.
The default environment being DEV put the project at risk, in the sense that when someone will try, for some reason, to spin the app in production environment it might break things in difficult ways, whereas making production default, will just fail fast.
Also, this article does not even touch the subject of distributed secret sharing. This is an approach that works with a single app, single service monolith, but does not scale to multiple services and distributed settings.
Yep, that’s noted a few times in the post.