1. 2

    Beautiful work, as can be expected from this author.

    1. 8

      Somewhat related story: on a non-work Linux laptop my wife ended up with a directory of 15 million files. The full story is here: http://pzel.name/til/2020/08/30/Large-directory-feature-not-enabled-on-this-filesystem.html I used find . to list all the files, which surprisingly did not hang.

      1. 1

        I was wondering if find . would hang in the same way. ls is notoriously bad at listing directories once they get past a certain number of files.

      1. 7

        Author’s summary:

        I made Ink as an experiment to study writing interpreters and compilers in mid-2019. Since then, I’ve worked on several other related projects in the programming language space, but continue to write programs and apps in Ink in my day to day work. Because the language and runtime is so small, it’s easy for me to understand the entire stack and debug Ink programs easily. Because Ink’s interpreter is a single static binary that runs across operating systems, deploying Ink programs is also pretty simple.

        What makes this one relevant (vis-a-vis the multitudes of toy or in-progress PLs posted to lobsters at least monthly) is that the author has actually been using it to build some very interesting pieces of software.

        1. 2

          Does SO itself strongly reject this advice? For a long time they were proud of bucking the trend of cloud-scale deployments and just managing some small amount of beefy servers themselves.

          Edit: nvm I see they cover it in the article

          1. 3

            For those who don’t bother reading the article: when StackOverflow launched, Azure was brand-new, Windows didn’t support containers, and Kubernetes didn’t exist. I remember as late as roughly 2012, the Stack team told the Azure team what they’d need in hosted SQL Server alone to go entirely to the cloud, and the Azure team effectively just blanched and dropped it. (I worked at Fog Creek at the time, and we’d do beer o’clock together fairly often and swap stories.)

            It’s obviously an entirely different ballgame in 2021, so they’re allowed to change their minds.

            1. 8

              It’s obviously an entirely different ballgame in 2021, so they’re allowed to change their minds.

              I think a lot of people here and on HN would argue that the trend over the past decade has not been good, and that the old way really was better. They say the move toward containers and cloud services is driven by hype, fashion, and particularly by cloud vendors who have something to sell.

              I admit I was sad to learn that Stack Exchange is no longer bucking the trend. Yes, I make heavy use of cloud infrastructure and containers (currently Amazon ECS rather than k8s) in my own work, and have often publicly defended that choice. But the contrarian, anti-fashion streak still has its appeal.

              1. 2

                Most of the contrarians are wrong: the old systems were not better, simpler, easier to learn, or easier to use. The old way - every place above a certain size and/or age having its own peculiar menagerie of home-grown systems, commercial products, and open source projects bodged together, hopefully with teams to support them - just was how it was. But you got used to it, learned the hidden or half-documented footguns, and for a time there weren’t better choices. Now there are choices and a different landscape (you don’t have to run your own DC), and many of them are very good and well-documented, but they’re still high in complexity and in depth when things go wrong. I think the greatest advantage is that where they interact with home-grown systems, those systems can be cordoned off and put behind common interfaces, so you can come in and say, “Oh, this is an operator doing xyz” and reason a little about it.

                1. 6

                  I think there are just as many – or more – footguns with the overly deep and complex virtualization/orchestration stacks of today. In the “old days” senior developers would take time to introduce juniors into the company culture, and this included tech culture.

                  Nowadays developers are expected to be fungible, tech cultures across companies homogeneous, and we’ve bought into the control mechanisms that make this happen, hook, line, and sinker. </end-ochruch-poast>

                  1. 1

                    Nowadays developers are expected to be fungible, tech cultures across companies homogeneous, and we’ve bought into the control mechanisms that make this happen

                    Lol why is this bad? Imagine if your doctor wasn’t fungible. You’re ill and your doctor retires; well, time for you to not have a doctor. Moreover, homogeneity leads to reliability and safety. Again, imagine if each doctor decided to use a different theory of medicine or a different set of instruments. One of our doctors believes in blood leeches, the other in herbal medicine; which one is right? It’s always fun to be the artisan but a lot less fun to receive the artisan’s services, especially when you really need them.

                    1. 3

                      Lol the doctors whom I trust with my health (and life) are not fungible, because I know them and their philosophy personally. If I want blood leeches I will go to the doctor who uses them well.

                      1. 1

                        So what if they retire or get sick themselves? What if you get sick badly somewhere away from your home? Also, do you expect everyone to do this level of due diligence with all services that people use?

          1. 5

            The original David Parnas paper on extensible code is titled “Designing Software for Ease of Extension and Contraction.”

            Extension means adding cases, contraction means deleting them.

            The contraction bit was always there.

            Extension and contraction go hand in hand; they’re both consequences of well designed modularity.

            I strongly recommend reading that Parnas paper, as well as the one on modularity, if by any bad luck you haven’t yet.

            1. 1

              Thank you for the paper recommendation. The modularity paper seems to be referenced a lot more often than the one you bring up.

              1. 1

                I like the paper a lot. A key insight is to specify the “minimal requirement” and build up from there. If every software project had two requirement sets, one being “minimal” and the other being “all plugins installed, fully bloated state”, it would be a lot easier to keep a modular architecture. The product designer, rather than the programmer, should be responsible for specifying how these two requirement sets can be met at the same time. The solution to the modularity problem might not be in the hands of the programmer, but on the product design side.

              1. 4

                I feel what’s more valuable than the new environment variable proposal is learning all the ways to turn off telemetry in all the tools the author mentioned.

                1. 7

                  If all of those variables are known, why not make a shim that will toggle them all based on the new environment variables?
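
                  Something like the following might work - a rough sketch, assuming the proposed variable ends up being called DO_NOT_TRACK, and with the per-tool opt-out variables below as examples rather than a vetted list:

                  #!/usr/bin/env python3
                  """Hypothetical shim: if DO_NOT_TRACK is set, export known per-tool
                  telemetry opt-outs, then exec the wrapped command unchanged."""
                  import os
                  import sys

                  # Example opt-out variables; a real shim would pull these from a
                  # community-maintained list.
                  OPT_OUTS = {
                      "DOTNET_CLI_TELEMETRY_OPTOUT": "1",
                      "HOMEBREW_NO_ANALYTICS": "1",
                      "GATSBY_TELEMETRY_DISABLED": "1",
                  }

                  if len(sys.argv) < 2:
                      sys.exit("usage: no-telemetry <command> [args...]")

                  if os.environ.get("DO_NOT_TRACK") == "1":
                      for var, value in OPT_OUTS.items():
                          os.environ.setdefault(var, value)

                  # Hand off to the wrapped tool with the augmented environment,
                  # e.g. `DO_NOT_TRACK=1 no-telemetry dotnet build`.
                  os.execvp(sys.argv[1], sys.argv[1:])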

                  1. 2

                    This is a great idea. Like the community-sourced /etc/hosts directories, adblock lists, and the like, it has an actual chance of remaining “in play” as long as there are folks who care about the topic.

                  2. 2

                    I believe figuring out if what you’re using is going to include telemetry, and how to turn it off yourself, is the most sustainable way to move forward.

                    I’ve gotten “used to” Microsoft’s .NET Telemetry so I know to add the flag before I even download it. When you first install .NET Core it will spit out how to opt out, but I’m assuming this is after they’ve already phoned home to say someone has installed.

                  1. 12

                    I wholeheartedly agree with this philosophy. I don’t think of code as an artifact to be preserved, and the idea of ‘lines spent’ really resonates with me.

                      However, I find that –paradoxically– this is a big reason why codebases tend towards unmaintainability as time goes on. Good programmers will try to write code that is easy to change and delete, while bad programmers will come up with schemes that are hard to either change or remove. So in the long run, the solid and heavy bits are all that’s left. Especially since there are perverse incentives at play, where programmers who ‘own’ this type of code gain job security.

                    Unless of course there is some kind of counter force acting on the codebase: for example, the need to keep a system within some performance bounds, or LOC count (I wish!).

                    1. 17

                        I’ve used ab, siege, jmeter and wrk in the past. I think wrk was the best (for quick stuff and heavy load), unless you actually want to set up elaborate tests of complete workflows incl. login, doing stuff, logout, etc. - then jmeter.

                      NB: I’ve not done web development full time for many years, so maybe there’s new stuff around.

                        NB2: Always measure from a second host; a good stress testing tool will destroy the perf of the host it is running on.

                      1. 2

                          Do you know if wrk has fixed its coordinated omission problem, or should one still use wrk2 to get correct measurements?

                        1. 1

                            I’ve always used the (unmaintained-looking) wrk2 instead of wrk, precisely because of C.O. Nothing I could see in the wrk readme suggested they merged the C.O. work back in.

                          1. 1

                            No, as I said it’s been a while and I don’t remember this term or type of problem, sorry. It worked for me at the time.

                        1. 3

                            I love Nils’s work. Good to know he’s still at it. Has anyone worked through the Scheme 9 from Empty Space book? Can you recommend it to someone who has worked through (most of) SICP?

                          1. 2

                            From my memory of it, the Scheme 9 From Empty Space book is more about implementing a scheme.

                            1. 1

                              Yes, so my hope is that it goes into the nitty-gritty that was left out of SICP: hygienic macros, call/cc, threads.. I’m guessing it has to be the case since s9 does R4RS.

                              1. 2

                                Answering myself, for posterity: This is what the book is about (from http://www.t3x.org/s9book/index.html)

                                    How does automatic memory management work?
                                    How is macro expansion implemented?
                                    How does lexical scoping work?
                                    How is arbitrary precision arithmetics implemented?
                                    How does tail call elimination work?
                                    How are first-class continuations implemented?
                                
                          1. 2

                              Excellent approach. I really appreciate how the author adapted his article after getting constructive feedback. The iptables trick is very nice - the kind of thing you think to yourself, “this is obvious!”, but only after finding out about it.

                              I think there is still a living vein of old-school KISS deployment & management of production applications in the Erlang/Elixir world. I know that once upon a time WhatsApp used to do their deployments with a bash script; can anyone corroborate what the state is currently? (Or correct me if I’m repeating BEAMer urban legends.)

                            At a previous job (big chat system), we served the entire world on a cluster of 4 beefy EC2 instances and ansible deployments of OTP release bundles.

                            1. 3

                              One more thought: although it would conflict with the ‘zero cost’ premise of the article, plugging in sqlite+litestream and dropping postgres would reduce the mental overhead of managing Postgres permissions and security.

                            1. 16

                                This is an excellent, empathic piece of writing. I cut my professional teeth in a strictly-XP pairing environment, but always working remote (shared tmux workspace). I even wrote a little thing on the topic of remote pairing, but this post really shines a light on the totalitarian, cultish aspect of the practice when done in-person.

                                I think what made my experience pleasurable was that I was pairing with people who I considered (and still consider) close friends and mentors, making my workdays very enjoyable. I can imagine that having to pair program day-in, day-out with people you dislike really burns you out. Much like creating music, if you’re doing it with someone with whom you have some kind of emotional affinity, the process can continue fruitfully for a very long time. But you just can’t play/create well with people you feel antipathy for; it’s absolutely draining. At least in my experience.

                              1. 4

                                Sometimes I feel like I write just as much shell and Make as I do other languages. I’ve really dug into Make in the last ~3 years and just recently had the necessity to enmakify some Python projects.

                                  I belong to a school of thought that Make is the base build tool, the lowest common denominator. It’s already available on macOS, Linux, etc. out of the box and trivial to install on Windows. I’ve worked at companies with thousands of developers or with just dozens, with varying skill levels, familiarity with a particular ecosystem’s tooling, patience for poor or missing onboarding documentation, and general tolerance for other teams’ preferences ranging from scared to flippant.

                                  Regardless, nearly all of them can figure out what to do if they can run make help and are presented with a menu of tasks that ideally are self-configuring entirely. As in, I clone a repo, run make help and see that there’s a test task that runs tests. I should then be able to run make test and… tests run. It may take some time to set up the environment — install pyenv, install the right Python version, install dependencies, etc. – but it will inevitably run tests with no other action required. This is an incredibly straightforward onboarding process! A brief README plus a well-written Makefile that abstracts away the idiosyncrasies of the repo’s main language’s package manager, build system, or both can accelerate contributors, even if your Makefile is as simple as:

                                help:
                                	@echo Run build or test
                                build:
                                	npm build
                                	sbt build
                                test:
                                	npm test
                                	sbt test
                                

                                My base Pythonic Makefile looks like this now, not guaranteed to work because I’m plucking things. I’m stuck on 3.6 and 3.7 for now but hope to get these projects up to 3.9 or 3.10 by the end of the year. I’m using Poetry along with PyTest, MyPy, Flake8, and Black.

                                # Set this to ~use it everywhere in the project setup
                                PYTHON_VERSION ?= 3.7.1
                                # the directories containing the library modules this repo builds
                                LIBRARY_DIRS = mylibrary
                                # build artifacts organized in this Makefile
                                BUILD_DIR ?= build
                                
                                # PyTest options
                                PYTEST_HTML_OPTIONS = --html=$(BUILD_DIR)/report.html --self-contained-html
                                PYTEST_TAP_OPTIONS = --tap-combined --tap-outdir $(BUILD_DIR)
                                PYTEST_COVERAGE_OPTIONS = --cov=$(LIBRARY_DIRS)
                                PYTEST_OPTIONS ?= $(PYTEST_HTML_OPTIONS) $(PYTEST_TAP_OPTIONS) $(PYTEST_COVERAGE_OPTIONS)
                                
                                # MyPy typechecking options
                                MYPY_OPTS ?= --python-version $(basename $(PYTHON_VERSION)) --show-column-numbers --pretty --html-report $(BUILD_DIR)/mypy
                                # Python installation artifacts
                                PYTHON_VERSION_FILE=.python-version
                                ifeq ($(shell which pyenv),)
                                # pyenv isn't installed, guess the eventual path FWIW
                                PYENV_VERSION_DIR ?= $(HOME)/.pyenv/versions/$(PYTHON_VERSION)
                                else
                                # pyenv is installed
                                PYENV_VERSION_DIR ?= $(shell pyenv root)/versions/$(PYTHON_VERSION)
                                endif
                                PIP ?= pip3
                                
                                POETRY_OPTS ?=
                                POETRY ?= poetry $(POETRY_OPTS)
                                RUN_PYPKG_BIN = $(POETRY) run
                                
                                COLOR_ORANGE = \033[33m
                                COLOR_RESET = \033[0m
                                
                                ##@ Utility
                                
                                .PHONY: help
                                help:  ## Display this help
                                  	@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n  make \033[36m<target>\033[0m\n"} /^[a-zA-Z0-9_-]+:.*?##/ { printf "  \033[36m%-15s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)
                                
                                .PHONY: version-python
                                version-python: ## Echos the version of Python in use
                                	@echo $(PYTHON_VERSION)
                                
                                ##@ Testing
                                
                                .PHONY: test
                                test: ## Runs tests
                                	$(RUN_PYPKG_BIN) pytest \
                                		$(PYTEST_OPTIONS) \
                                		tests/*.py
                                
                                ##@ Building and Publishing
                                
                                .PHONY: build
                                build: ## Runs a build
                                	$(POETRY) build
                                
                                .PHONY: publish
                                publish: ## Publish a build to the configured repo
                                	$(POETRY) publish $(POETRY_PUBLISH_OPTIONS_SET_BY_CI_ENV)
                                
                                .PHONY: deps-py-update
                                deps-py-update: pyproject.toml ## Update Poetry deps, e.g. after adding a new one manually
                                	$(POETRY) update
                                
                                ##@ Setup
                                # dynamic-ish detection of Python installation directory with pyenv
                                $(PYENV_VERSION_DIR):
                                	pyenv install --skip-existing $(PYTHON_VERSION)
                                $(PYTHON_VERSION_FILE): $(PYENV_VERSION_DIR)
                                	pyenv local $(PYTHON_VERSION)
                                
                                .PHONY: deps
                                deps: deps-brew deps-py  ## Installs all dependencies
                                
                                .PHONY: deps-brew
                                deps-brew: Brewfile ## Installs development dependencies from Homebrew
                                	brew bundle --file=Brewfile
                                	@echo "$(COLOR_ORANGE)Ensure that pyenv is setup in your shell.$(COLOR_RESET)"
                                	@echo "$(COLOR_ORANGE)It should have something like 'eval \$$(pyenv init -)'$(COLOR_RESET)"
                                
                                .PHONY: deps-py
                                deps-py: $(PYTHON_VERSION_FILE) ## Installs Python development and runtime dependencies
                                	$(PIP) install --upgrade \
                                		--index-url $(PYPI_PROXY) \
                                		pip
                                	$(PIP) install --upgrade \
                                  		--index-url $(PYPI_PROXY) \
                                  		poetry
                                	$(POETRY) install
                                
                                ##@ Code Quality
                                
                                .PHONY: check
                                check: check-py check-sh ## Runs linters and other important tools
                                
                                .PHONY: check-py
                                check-py: check-py-flake8 check-py-black check-py-mypy ## Checks only Python files
                                
                                .PHONY: check-py-flake8
                                check-py-flake8: ## Runs flake8 linter
                                	$(RUN_PYPKG_BIN) flake8 .
                                
                                .PHONY: check-py-black
                                check-py-black: ## Runs black in check mode (no changes)
                                	$(RUN_PYPKG_BIN) black --check --line-length 118 --fast .
                                
                                .PHONY: check-py-mypy
                                check-py-mypy: ## Runs mypy
                                	$(RUN_PYPKG_BIN) mypy $(MYPY_OPTS) $(LIBRARY_DIRS)
                                
                                .PHONY: format-py
                                format-py: ## Runs black, makes changes where necessary
                                	$(RUN_PYPKG_BIN) black --line-length 118 .
                                

                                  Is this overkill? Maybe, but I can clone this repo and be running tests quickly. I still have some work to do to actually achieve my goal of clone-to-working-env in two commands — it’s three right now: git clone org/repo.git && make deps && make test – but I’ll probably get there in the next few days or weeks. Moreover, this keeps my CI steps as much like what developers run as possible. The only real things that have to be set in CI are some environment variables that Poetry uses for the make publish step, plus the version, set with poetry version $(git describe --tags), because git describe versions are not PEP-440 compliant without some massaging and I’ve been lazy about that since our published tags will always be PEP-440 compliant.
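
                                  (For what it’s worth, the massaging for non-tag commits can be pretty small. Here’s a rough, untested sketch of the kind of thing I mean - the post-release/local-version scheme and the massage_version.py name are made up, not something I actually run in CI:)

                                  import re
                                  import subprocess

                                  def pep440_from_describe() -> str:
                                      """Turn `git describe --tags` output like v1.2.3-4-gdeadbee into a
                                      PEP 440-ish version like 1.2.3.post4+gdeadbee."""
                                      desc = subprocess.run(
                                          ["git", "describe", "--tags"],
                                          check=True, stdout=subprocess.PIPE, universal_newlines=True,
                                      ).stdout.strip()
                                      m = re.fullmatch(r"v?(?P<tag>.+?)(?:-(?P<n>\d+)-g(?P<sha>[0-9a-f]+))?", desc)
                                      if m.group("n") is None:
                                          return m.group("tag")  # exactly on a tag, use it as-is
                                      return "{}.post{}+g{}".format(m.group("tag"), m.group("n"), m.group("sha"))

                                  # e.g. `poetry version "$(python massage_version.py)"` in the CI pipeline
                                  if __name__ == "__main__":
                                      print(pep440_from_describe())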

                                The Brewfile:

                                # basic build tool, get the latest version
                                  # if you want to be sure you're using it, call 'gmake' instead on macOS
                                # or follow caveats in `brew info make` to make make brew's make
                                brew 'make'
                                
                                # python version and environment management
                                brew 'pyenv'
                                # python dependency manager
                                # a version from pypi instead of homebrew may be installed when running make deps
                                brew 'poetry'
                                

                                The full pyproject.toml is an exercise left to the reader but here’s the dev-dependencies selection from one of them:

                                [tool.poetry.dev-dependencies]
                                flake8 = "3.7.9"
                                black = "19.10b0"
                                mypy = "^0.812"
                                pytest = "^6.2.2"
                                pytest-html = "^3.1.1"
                                ansi2html = "*"
                                pytest-tap = "^3.2"
                                pytest-cov = "^2.11.1"
                                pytest-mypy = "^0.8.0"
                                lxml = "^4.6.2"
                                

                                Suggested improvements welcome. I’ve built Makefiles like this for Scala, Ruby, Rust, Java, C, Scheme, and Pandoc projects for a long time but feel like Make really is like Othello: a minute to learn, a lifetime to master.

                                1. 4

                                  Regardless, nearly all of them can figure out what to do if they can run make help and are presented with a menu of tasks that ideally are self-configuring entirely. As in, I clone a repo, run make help and see that there’s a test task that runs tests. I should then be able to run make test and… tests run. It may take some time to set up the environment — install pyenv, install the right Python version, install dependencies, etc. – but it will inevitably run tests with no other action required.

                                  This is the reason we use Makefiles on my teams. Like you say, it’s the lowest common denominator, the glue layer, which enables things like multi-language projects to be managed in a coherent manner. I’d much rather call out to both npm and gradle from the Makefile, than use weird plugins to shovel npm into gradle or vice-versa. Makefiles are scripts + a dependency graph, so you can do things like ensuring particular files are in place before running commands, and this is not just about build artifacts, but also files downloaded externally (hello chromedriver).

                                  I have Makefiles in all my old projects, and they are a real boon when I need to make changes after a long time (like years) has passed. I also make it a habit to always plug this make tutorial whenever the topic of Make comes up. It’s a stellar tutorial.

                                  To quote Joe Armstrong: “Four good tools to learn: Emacs, Bash, Make and Shell. You could use Vi, I am not religious here. Make is pretty damn good! I use Make for everything, that good!”

                                  1. 2

                                    Could this 114 line Makefile be a gist or something?

                                  1. 1
                                    • x if I’m going to be deleting it shortly;
                                    • simon-deleteme if it might stick around for a while.
                                    1. 2

                                      I just stick it in /tmp, and let the system delete it on restart.

                                    1. 6

                                      A good high-level intro, but for folks truly new to make, I always recommend this tutorial. It starts off easy but also goes in depth.

                                      1. 17

                                        Paul Graham knows he’s smarter than most people (that’s not hard, most of us in here are, statistically), and he still thinks this fact makes him right all the time. He’s rich and has a big audience, so this is unlikely to change.

                                        Not sure why this post needs to exist, but definitely can’t see why it needs to be here.

                                        1. 9

                                            PG reminds me of a pattern I’ve seen in other people. That is, a person experienced in one particular area develops the idea that they are experienced in many other areas. They then proceed to share their wisdom on these areas, when in reality they know jack shit. The way these people write or speak can make it difficult to distil bullshit from facts.

                                            When encountering such people, especially if they develop a cult following like PG, I think it’s entirely reasonable to call them out. The article posted may beat around the bush a bit too much, but it provides many good examples. As such, I think its existence is entirely valid.

                                          1. 5

                                              Engineer’s disease? The amusing part to me is that it reminds me of that trope in movies, “oh, you’re a scientist? clearly you’re a polymath” - except it’s real.

                                            1. 1

                                                Yeah, I agree - when I was younger and finding my footing in tech I was quite taken with PG (and I certainly do feel like spending some time learning Lisp has made the functional programming paradigm more intuitive to me), but it has been valuable to me to see critiques of his work as well, especially of his trying to apply the things he actually was expert at to unrelated fields. He certainly was personally successful - more so than most people criticizing him, I’m sure (but things aren’t necessarily fair) - but it can be helpful to point out that at some point he just stopped being very relevant.

                                                …but I personally still don’t like Java, and prefer Lispy FP to heavy-handed OOP.

                                            2. 4

                                              Because, for sure, in this audience there are people who would take his expertise in programming as a source of authority on other topics (pretty much like most people do with celebrities advocating for a cause), and maybe it’s useful to remind them, in terms they can understand, that this is magical thinking a few rich people use to steer the whole sector.

                                              1. 3

                                                They will not rest until they cancel https://timecube.2enp.com/

                                              1. 3

                                                  That’s interesting, though for me Ł and Ó are the easiest of all the special letters to type, and they require just one hand. When testing charset conversion I often use magic words like “łóżko” (“bed”) because “łó” is so easy to type ;), but of course not every hand is built the same way, so I understand that for some people it’s different.

                                                  One also can’t forget the layout used in PN-I-06000:1997, the “typist’s” (214) layout, which also seems to group the special letters on one side of the keyboard, though on the right side. It’s also QWERTZ-based, so it’s a little bit different.

                                                1. 1

                                                    This used to be the same for me, but recently it’s gotten very hard to find keyboards with a short spacebar and the right Alt in a place reachable by the thumb. When both my Noppoo Chocs broke I looked around for a keyboard with a similar layout and couldn’t find one. So I caved and got a Logi G keyboard, which is really great for typing, but the right Alt key is unfortunately so far to the right that I can’t reach it with my thumb.

                                                  The typist’s layout is too alien for me, and not very convenient for programming. I don’t really like having to tweak defaults, so I tried to get used to the regular Polish keyboard, but I don’t want to get RSI (again).

                                                  I’m secretly hoping this layout variant catches on and perhaps in a couple of years might be included in Windows. One can dream.

                                                1. 4

                                                    Hi Lobsters! I know there are quite a few Polish speakers and keyboard tweakers here. If you’re in the center of that Venn diagram, this keyboard layout might interest you. I was sick of contorting my right thumb all the time while typing in Polish, so I remapped the L, N, and O keys on my machine. It’s been pretty sweet so far, so I thought I’d share.

                                                  1. 2

                                                    I like the overall idea, but I’m unclear on something.

                                                    Part 1 says:

                                                    Even logging invalid data could potentially lead to a breach.

                                                    I can’t think how that would be the case.

                                                    Also, the example of that is:

                                                      log_error("invalid_address, #{inspect address}")

                                                    In the reworked example, you show

                                                     {:error, validation_error} ->
                                                        log_error(validation_error)
                                                        return_error_to_user(validation_error)
                                                    

                                                    But validation_error contains (a couple levels deep) an input: field with the original input. So wouldn’t it have the same problem?

                                                    1. 1

                                                        Yeah, I totally agree that we’re cheating here! This is a design tension that we’re not sure how to resolve. On one hand, we don’t want to expose unsanitized inputs to the caller, while on the other we’d love to log examples of payloads that cause the parser to fail (for auditability).

                                                      Do you have any pointers (or links to resources) on ways to resolve this tension? There’s always the option of “defanging” the original input by base64-encoding it, etc., but perhaps there’s a more elegant way out?

                                                    1. 1

                                                        A couple of years ago, I slapped together a couple of modules in Python for taking nested JSON documents and pulling out slices of data for loading into a data warehouse. I’m happy with the general idea, but I have been wanting to refactor the implementation to separate out some of the concerns and improve flexibility. Everything is too bound up, too opinionated.

                                                      If you had a JSON document for a film like so:

                                                      {
                                                        "id": 1,
                                                        "title": "Titantic",
                                                        "cast": [
                                                          {"talent_id": 1, "name": "DeCaprio", "role": "Jack"},
                                                          {"talent_id": 2, "name": "Winslet", "role": "Rose"}
                                                        ],
                                                        "release_dates": [
                                                          {"location": "US", "date": "1997-12-19"},
                                                          {"location": "CA", "date": "1997-12-20"}
                                                        ]
                                                      }
                                                      

                                                      Then you could write schemas like so:

                                                      film = {
                                                        "id": Field("titleId", int),
                                                        "title": Field("title", String50),
                                                        "country_of_origin": Field("originalCountry", NullableString50),
                                                      }
                                                      
                                                      cast = {
                                                        "id": Field("titleId", int),
                                                        "cast": {
                                                          "talent_id": Field("talentId", int),
                                                          "name": Field("talentName", String100),
                                                          "role": Field("role", String100),
                                                        }
                                                      }
                                                      
                                                      release_dates = {...}  # you get the picture
                                                      

                                                      Which would result in dictionaries like:

                                                      films = [{"titleId": 1, "title": "Titanic", "originalCountry": null}]
                                                      
                                                      cast = [
                                                        {"titleId": 1, "talentId": 1, "talentName": "DeCaprio", "role": "Jack"},
                                                        {"titleId": 1, "talentId": 2, "talentName": "Winlet "role": "Rose"},
                                                      ]
                                                      

                                                        I built some plumbing around deserializing the document, passing it to a series of schemas, pulling out the record instances, and serializing each instance to its own location. If any subcomponent fails, I fail out the whole set of records to ensure the database has a logical view of the entity. Overall, it’s worked pretty well for well-organized, consistently typed JSON data. Unfortunately, there is a lot of nasty JSON data out there and it can get pretty complex.
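
                                                        For illustration only - this is a toy sketch, not my actual implementation - the schema-driven flattening at the heart of it looks roughly like this, with Field reduced to a (name, converter) pair:

                                                        from typing import Any, Callable, Dict, List, NamedTuple, Optional

                                                        class Field(NamedTuple):
                                                            name: str                      # output column name
                                                            convert: Callable[[Any], Any]  # type coercion / validation

                                                        def extract(doc: Dict, schema: Dict, parent: Optional[Dict] = None) -> List[Dict]:
                                                            """Flatten one document into rows. A Field renames/converts a scalar;
                                                            a nested dict means 'fan out over that list, repeating parent fields'."""
                                                            row = dict(parent or {})
                                                            nested = None
                                                            for key, spec in schema.items():
                                                                if isinstance(spec, dict):
                                                                    nested = (key, spec)  # handle the nested list last
                                                                else:
                                                                    value = doc.get(key)
                                                                    row[spec.name] = spec.convert(value) if value is not None else None
                                                            if nested is None:
                                                                return [row]
                                                            key, sub_schema = nested
                                                            rows = []
                                                            for item in doc.get(key, []):
                                                                rows.extend(extract(item, sub_schema, row))
                                                            return rows

                                                        # Calling extract() with the film JSON and the cast schema above would
                                                        # yield the flattened cast rows shown above.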

                                                        I suppose this is a long way of saying that this article gives me a couple of ideas for how I might decouple some of this logic. Are you going to be discussing building entire structs next time? Or are you looking at it from a per-field perspective?

                                                      Looking forward to the next article!

                                                      1. 1

                                                        Hey, thanks for the great feedback! Yeah, we’re going to be building entire structs – if you take a look at the previous post, at the end (“Under the Hood”) there’s a snippet that uses a Data.Constructor.struct/3 to specify parsers for the particular fields. The next installment is going to be about how to make your struct parsing more flexible: for example, if you have a big flat JSON object coming in, but want to use it to create a nested hierarchy.

                                                        In general, we’re taking a fractal approach of composing smaller parsers to create larger ones. The ‘struct’ parser constructor is a complex combinator with some specific semantics, but it’s fundamentally similar to the list/1 combinator. So yeah, to answer your question, we will be BOTH constructing entire structs and looking at it from a per-field perspective. It all comes together in the end.

                                                        1. 1

                                                          Awesome! Look forward to reading about it!

                                                      1. 2

                                                          For stateless validations (must be a number between 0 and 100), this is a nice approach. For stateful validations (this e-mail address has already been taken), it should probably be a two-stage process, unless we want to put filesystem/database/etc. calls inside our parsers, which seems like a terrible idea.

                                                        1. 3

                                                          Yes, putting some kind of IO (service-call/db/etc) inside a parser would be terrible. I try to tackle stateful validation problems like this:

                                                          1. Model the syntactically-valid data type, and use a parser to “smart-construct” it. So in this case we’d have an %EmailAddress{}. This data type doesn’t tell us anything about whether the email has been claimed or not.

                                                            2. Down the line, when (if) we actually need to work with email addresses that are unclaimed, we have the service responsible for instantiating them expose a function typed:

                                                          @spec to_unclaimed_email_address(
                                                              %EmailAddress{}) :: Result.t(%UnclaimedEmailAddress{}, some_error())
                                                          

                                                          This function does the necessary legwork to either create a truly unclaimed email address, or tell you that it’s not possible with the data you brought it. It still conforms to the ‘railway-oriented style’, but at another level of the architecture.

                                                          Of course this opens up another can of worms in terms of concurrency, but that’s state for you.