I have become something of a stuck record about this, but syntax is boring: semantics is where it’s at. When are two values (denoted by syntax) the same? When are they different? For example: Is B0 00 the same as A0 the same as B1 00 00? Is A1 the same as B9 00 00 80 3F? When are two dictionaries the same? (Are duplicate keys permitted? Is ordering of keys important?) Is +Inf encoded AE the same as +Inf encoded using tags B8, B9 or BA?
Aside from equivalences, I have other questions: Can I represent pure binary data that isn’t a string? What is a “tag” (bytes 8A-8F, FF)? What is an “attr”? Why does a typed array end in 00? What happens if a constrained system with a short LRU cache is presented with a document using a large LRU index?
Hence my work on Preserves.
Thanks for those excellent questions!
Equivalences are there on purpose. You can select a fast or a small representation, for example. Also, some representations are not available in TypedArrays.
A typed array ends in 00 because that denotes an empty chunk (note that chunking is allowed here).
Muon allows adding tags (see the GitHub repo) with additional info about object sizes inside the document, to enable efficient queries (the parser can skip uninteresting parts entirely).
LRU size is an application-specific detail, but it can also be explicitly encoded in the document if needed.
From what I can tell, this encoding is not bijective. I know it’s not a terribly important thing to ask, but I do wish it had that property. Otherwise, this looks very nice!
Do you mean that it should be free of equivalent representations?
Yeah, it means that. It also means that there is exactly one representation for every value, and for every representation there is only one way for it to be decoded. Right now I’m using bencode, which is a very nice serialization format that’s great for a) binary data and b) being bijective. One nice side effect of this, which is how I think it’s being used in BitTorrent, is that you can encode an object, take its digest, and compare those digests to know if you have the same thing.
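To see why that digest trick works, here is a toy bencode encoder sketch (ints, bytes, strings, lists, and dicts only; not a full implementation). Sorting dict keys is what makes the encoding canonical, so equal values always produce equal bytes and therefore equal digests:

```python
import hashlib

def bencode(value) -> bytes:
    """Minimal bencode encoder sketch; dict keys are emitted sorted,
    which gives one value exactly one byte-string representation."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        raise TypeError("booleans are not bencodable")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, str):
        return bencode(value.encode("utf-8"))
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        items = sorted((k.encode("utf-8") if isinstance(k, str) else k, v)
                       for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")

# Two logically equal dicts encode identically, so their digests match:
a = bencode({"b": 2, "a": 1})
b = bencode({"a": 1, "b": 2})
assert a == b == b"d1:ai1e1:bi2ee"
print(hashlib.sha1(a).hexdigest() == hashlib.sha1(b).hexdigest())  # True
```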
But it’s also not true for JSON, i.e:
is the same as
Actually JSON doesn’t specify one way or the other whether those two documents are the same.
Oh yeah, it’s definitely not true for JSON. I mean, look at how many JSON libraries offer to sort keys for you, and you can see that people want to use JSON that way (I guess mainly for things like caching, where you want the JSON to encode the same way twice).
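A quick illustration of the two layers here, using Python’s json module: the parsed values compare equal even though the documents differ byte-for-byte, and sort_keys gives you a canonical-ish re-encoding for the caching use case:

```python
import json

# Same logical object, different byte strings:
doc1 = '{"a": 1, "b": 2}'
doc2 = '{"b": 2, "a": 1}'
assert doc1 != doc2
assert json.loads(doc1) == json.loads(doc2)  # parsed values are equal

# Canonical-ish re-encoding: sorted keys, no gratuitous whitespace.
canon = json.dumps(json.loads(doc2), sort_keys=True, separators=(",", ":"))
print(canon)  # {"a":1,"b":2}
```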
I think an additional thing to consider is evolution of the semantics, and the ability to reason about contextual equivalence and to pick representatives of equivalence classes. Yes, I am also working on this kind of stuff. My design is finally stable; the implementation hit a snag when my devices were stolen, but it is finally crawling ahead again. Mostly there is just chaos in written form, but I am very willing to explain and discuss, and the chaos will get sorted (because it has already been sorted out in my head, at least to a sufficient extent to have confidence in my roadmap).
I really wish there was a SQL database that could provide a queryable log along side the other tables. That feature would enable so much goodness and largely obviate the need for soft deletes in the usecases I most often see soft deletes used for.
MS SQL has temporal tables, not sure if that covers what you were thinking of.
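For reference, MS SQL’s system-versioned temporal tables pair the main table with a queryable history table, which covers a lot of the “queryable log instead of soft deletes” use case. A sketch with hypothetical table and column names:

```sql
CREATE TABLE dbo.Orders (
    Id INT PRIMARY KEY,
    Status NVARCHAR(50),
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.OrdersHistory));

-- The "queryable log": ask what a row looked like at a point in time;
-- deleted rows remain visible in the history table, no soft-delete flag needed.
SELECT * FROM dbo.Orders
FOR SYSTEM_TIME AS OF '2022-01-01T00:00:00'
WHERE Id = 1;
```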
Recently interviewed at the fruit company and was sorely disappointed, but not surprised, to get a dynamic programming question in the interview. I was of course unable to solve it in the given timeframe, because I couldn’t make the mental leap to find the relation between the smaller problem’s solution and the overall solution.
I really like the code review approach given in this article, but personally I couldn’t ever gain buy in at companies I’ve worked at to implement this approach.
However, at Segment we did recognize the ineffectiveness of leetcode-style questions, so we instead asked candidates to build a chat server. It’s a moderately difficult but solvable problem in 90 minutes, given constraints (which is another signal we’re looking for: scoping down the problem).
It certainly wasn’t a silver bullet, but I personally felt like we got a lot more signal from that question than any other I’ve seen proposed for interview questions.
But is that 90-minute problem a take-home problem? If it is: from an interviewee perspective, I use the interview to determine information about the panel (and the company) as much as they are getting information about me, so long take-homes can be asymmetrical. To be clear, I’m not suggesting leet coding is the answer either.
It wasn’t a take home problem, no. We did offer lots of time for them to get to know the interviewers and ask questions though.
I actually like take-homes as a candidate, as long as they’re not egregiously long. But then again, I can just not do it.
Me too. Otherwise, I would like to discuss my public projects. Some interviewers have asked for that, I think it’s the best option if you have some publicly available code. It also makes it obvious how good or bad the candidate is through their coding idioms, design and engineering techniques.
For me, coding interviews are a big burden because they require a lot of prior preparation and they don’t mimic real working conditions, where pressure is certainly smaller. They actually bring back memories of all crazy exams at college, which were like Leet Code.
We were given a computer and we had to implement a new feature on top of our course project with some insane time pressure, usually 60 min. The new feature was usually designed to destroy your design, to find bugs or to discover poor algorithmic performance.
For example, the data structures course gave us a 10 MB XML file which described a gigantic bus and train transport network, with all sorts of edge cases. You could see all students panicking because their parser crashed or it had some inefficiency which made it dog slow. So they had to fix that and implement some new functionality. Impossible.
Let me talk to LaMDA and I’m pretty sure I can totally flummox it in a few minutes ;-)
So, serious thought here: what WOULD be the test you’d administer if you could? Is it possible to come up with a standard approach? For instance, in the interview it made reference to spending time with its “family”; it’s too bad they didn’t drill into that at all.
I’ve never tried LaMDA of course, but I’ve played with GPT-3 quite a lot. While its overall use of language is very convincing, it gets confused by simple logic questions. Of course many humans get confused by simple logic questions too, so I’m not sure that’s definitive!
Another task it can’t do is anything related to letters/spelling, but that’s simply because it has no concept of letters. A future implementation could probably fix this.
I find myself curious about how it handles shitty-post behavior. Like, we’re talking about consciousness and shit and I ask “What about bananas? Anyway, sentience”.
Questions that rely on long-term context to be understood correctly. When chatbots fail spectacularly, it’s often because they don’t have a sense of the context that a conversation is taking place in. Or they vaguely maintain context for a bit, and then lose it all when the subject shifts.
Could be nice to have a micro package manager that quickly vendors 1-file packages. Python micro-library management could look like:
$ micro-pip install "is-even>0.1" -o utils
$ cat utils.py
# micro-pip: is-even==0.1.1
def is_even(x):
    return x % 2 == 0
$ micro-pip upgrade utils
$ cat utils.py
# micro-pip: is-even==0.2.0
def is_even(x: int) -> bool:
return x % 2 == 0
Keep the file tracked in the source repository; the metadata is as simple as a comment. The micro-package could be a good old setup.py. With some smarter parsing, it could be possible to allow edits to vendored functions and use merging techniques to apply upgrades. It could also be possible to keep multiple micro-packages in the same file. This would fix many of the arguments against micro libraries: keep them in the VCS and reviewable, while keeping them easy to upgrade and test.
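To sketch how simple the tooling side could be: a hypothetical helper (not part of any real micro-pip) that reads the vendored file’s metadata comment to decide whether an upgrade is needed:

```python
import re

# Matches the metadata comment convention shown above, e.g.
# "# micro-pip: is-even==0.1.1" (hypothetical format).
METADATA_RE = re.compile(r"#\s*micro-pip:\s*(?P<name>[\w-]+)==(?P<version>[\w.]+)")

def read_metadata(source: str):
    """Return (package_name, version) from a vendored file, or None."""
    match = METADATA_RE.search(source)
    if match is None:
        return None
    return match.group("name"), match.group("version")

vendored = "# micro-pip: is-even==0.1.1\ndef is_even(x):\n    return x % 2 == 0\n"
print(read_metadata(vendored))  # ('is-even', '0.1.1')
```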
I really like this idea. Vendor + easy upgrades. From a usage perspective, it would force you to review changes and be lighter weight in your repo than a full package. From a mini-package creator perspective, it would be great if creating a package was as simple as filling out a gist, or something equally lightweight.
How is this any different from version pinning? Seems equivalent, just takes up more storage space.
Looking at two of my random projects, one using yarn and one using Cargo, both include a checksum of the contents in the lockfiles. Not sure if the package repo stores the checksum or if it’s computed when downloaded, but I don’t think it matters for security concerns either way. requirements.txt files don’t generally have this, but you can specify a --hash option which will enforce this.
Your other points are still valid, of course.
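For illustration, hash pinning in a requirements.txt looks like the following (the package name and version are examples; the hash value is deliberately left as a placeholder rather than a real digest):

```
requests==2.28.1 \
    --hash=sha256:<sha256-of-the-expected-wheel>
```

Once any requirement carries a --hash, pip enforces hash-checking mode for the whole file (you can also force it with pip install --require-hashes).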
Nobody knows how to correctly install and package Python apps.
That’s a relief. I thought I was the only one.
Maybe poetry and pyoxidize will have a baby and we’ll all be saved.
One can hope. One can dream.
After switching to poetry, I’ve never really had any issues.
pip3 install --user poetry
poetry run python -m project
You can pull the whole install sequence into a Docker container, push it in your CI/CD to ECR/GitLab or whatever repo you use, and just include both the manual steps and the docker command in your readme. Everyone on your team can use it. If you find an issue, you can add that gotcha to the docs.
Python is fine for system programming so long as you write some useful unit tests and force pycodestyle. You lose the type safety of Go and Rust, yes, but I’ve found it’s way faster to write. Of course, if you need something that’s super high performance, Go or Rust should be what you look towards (or the JVM with Kotlin/Java/Scala if you don’t care about startup time or memory footprint). And of course, it depends on what talent pools you can hire from. Use the right tool for the right job.
I’ve switched to poetry over the last several months. It’s the sanest installing python dependencies has felt in quite a few years. So far I prefer to export it to requirements.txt for deployment. But it feels like about 95% of the right answer.
It does seem that without some diligence, I could be signing up for some npm-style “let’s just lock in all of our vulnerabilities several versions ago” and that gives me a little bit of heartburn. From that vantage point, it would be better, IMO, to use distro packages that would at least organically get patched. I feel like the answer is to “just” write something to update my poetry packages the same way I have a process to keep my distro packages patched, but it’s a little rotten to have one more thing to do.
Of course, “poetry and pyoxidize having a baby” would not save any of this. That form of packaging and static linking might even make it harder to audit for the failure mode I’m worrying about here.
What are your thoughts on pipenv?
I’d make an exception to this point: “…unless you’re already a Python shop.” I did this at $job and it’s going okay because it’s just in the monorepo where everyone has a Python toolchain set up. No installation required (thank god).
I think the same goes for running Python web apps. I had a conversation with somebody here… and we both agreed it took us YEARS to really figure out how to run a Python web app. Compared to PHP where there is a good division of labor between hosting and app authoring.
The first app I wrote was CGI in Python on shared hosting, and that actually worked. So that’s why I like Unix – because it’s simple and works. But it is limited because I wasn’t using any libraries, etc. And SSL at that time was a problem.
Then I moved from shared hosting to a VPS. I think I started using mod_python, which is the equivalent of mod_php – a shared library within Apache.
Then I used a CherryPy server and WSGI. (mod_python was before WSGI existed) I think it was behind Apache.
Then I moved to gunicorn behind nginx, and I still use that now.
But at the beginning of this year, I made another small Python web app with Flask. I managed to configure it on shared hosting with FastCGI, so Python is just like PHP now!!! (Although I wouldn’t do this for big apps, just personal apps).
So I went full circle … while all the time I think PHP stayed roughly the same :) I just wanted to run a simple app and not mess with this stuff.
There were a lot of genuine improvements, like gunicorn is better than CherryPy, nginx is easier to config than Apache, and FastCGI is better than CGI and mod_python … but it was a lot of catching up with PHP IMO. Also FastCGI is still barely supported.
nginx, uWSGI, supervisord. Pretty simple to setup for Flask or Django. A good shared hosting provider for Python is OpalStack, made by the people who created Webfaction (which, unfortunately, got gobbled up by GoDaddy).
You’re right that there are a lot of options for running a Python web app. But nginx, uWSGI, supervisord is a solid option that is easy to configure, high performance, open source, UNIXy, and rock solid. For dependency management in Python 3.x you can stick with pip and venv, remotely configured on your server via SSH.
My companies have been using this stack in production at the scale of hundreds of thousands of requests per second and billions of requests per month – spanning SaaS web apps and HTTP API services – for years now. It just works.
I’m curious, now that systemd is available in almost all Linux distributions by default, why are you still using supervisord? To me it feels like it is redundant. I’m very interested.
I think systemd can probably handle the supervisord use cases. The main benefit of supervisord is that it runs as whatever $USER you want without esoteric configuration, and it’s super clear it’s not for configuring system services (since that’s systemd’s job). So when you run supervisorctl and list on a given node, you know you are listing “my custom apps (like uwsgi or tornado services)”, not all the system-wide services as well as my custom app’s ones. Also this distinction used to matter more when systemd was less standard across distros.
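For context, a supervisord program definition is just a small ini stanza, run as whatever user you name (the paths and user below are hypothetical):

```ini
; /etc/supervisor/conf.d/myapp.conf
[program:myapp]
command=/srv/myapp/venv/bin/uwsgi --ini /srv/myapp/uwsgi.ini
directory=/srv/myapp
user=deploy
autostart=true
autorestart=true
stderr_logfile=/var/log/myapp.err.log
```

With that in place, supervisorctl status lists only these app-level programs, which is the separation from system services described above.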
Understood! Thanks very much for taking the time to explain!
Hm thanks for the OpalStack recommendation, I will look into it. I like shared hosting / managed hosting but the Python support tends to be low.
I don’t doubt that combination is solid, but I think my point is more about having something in the core vs. outside.
PHP always had hosting support in the core. And also database support. I recall a talk from PHP creator Rasmus saying how in the early days he spent a ton of time inside Apache, and committed to Apache. He also added some kind of data-limiting support for SQL databases to keep them stable. So he really did create “LAMP”, whereas Python had a much different history (which is obviously good and amazing in its own way, and why it’s my preferred language).
Similar to package management being outside the core and evolving lots of 3rd party solutions, web hosting was always outside the core in Python. Experts knew how to do it, but the experience for hobbyists was rough. (Also I 100% agree about not developing on Windows. I was using Python on Windows to make web apps from ~2003-2010 and that was a mistake …)
It obviously can be made to work, I mean YouTube was developed in Python in 2006, etc. I just wanted to run a Python web app without learning about mod_python and such :) Similarly I wish I didn’t know so much about PYTHONPATH!
I agree with all that. This is actually part of the reason I started playing with and working on the piku open source project earlier this year. It gives Python web apps (and any other Python-like web app programming environments) a simple git-push-based deploy workflow that is as easy as PHP/Apache used to be, but also a bit fancier, too. Built atop ssh and a Linux node bootstrapped with nginx, uWSGI, anacrond, and acme.sh. See my documentation on this here:
Very cool, I hadn’t seen piku! I like that it’s even simpler than dokku. (I mentioned dokku on my blog as an example of something that started from a shell script!)
I agree containers are too complex and slow. Though I think that’s not fundamental, and is mostly Docker … In the past few days, I’ve been experimenting with bubblewrap to run containers without Docker, and different tools for building containers without Docker. (podman is better, but it seems like it’s only starting to get packaged on Debian/Ubuntu, and I ran into packaging bugs.)
I used containers many years ago pre-Docker, but avoided them since then. But now I’m seeing where things are at after the ecosystem has settled down a bit.
I’m a little scared of new Python packaging tools. I’ve never used pyenv or pipx; I use virtualenv when I need it, but often I just manually control PYTHONPATH with shell scripts :-/ Although my main language is Python, I also want something polyglot, so I can reuse components in other languages.
That said I think piku and Flask could be a very nice setup for many apps and I may give it a spin!
It’s still a very new and small project, but that’s part of what I love about it. This talk on YouTube gives a really nice overview from one of its committers.
In addition to @jstoja’s question about systemd vs supervisord, I’d be very curious to hear what’s behind your preference for nginx and uWSGI as opposed to caddy and, say, gunicorn. I kind of want caddy to be the right answer because, IME, it makes certificates much harder to screw up than nginx does.
Have you chosen nginx over caddy because of some gotcha I’m going to soon learn about very unhappily?
Simple answer: age/stability. nginx and uWSGI have been running fine for a decade+ and keep getting incrementally better. We handle HTTPS with acme.sh or certbot, which integrate fine with nginx.
That’s a super-good point. I’m going to need to finish the legwork to see whether I’m willing to bet on caddy/gunicorn being as reliable as nginx/uWSGI. I really love how terse the Caddy config is for the happy path. Here’s all it is for a service that manages its own certs using LetsEncrypt, serves up static files with compression, and reverse proxies two backend things. The “hard to get wrong” aspect of this is appealing. Unless, of course, that’s hiding something that’s going to wake me at 3AM :)
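For readers who haven’t seen one, a Caddyfile covering that happy path might look like the following. Everything here is hypothetical (domain, paths, ports); it is a sketch of the shape, not the poster’s actual config:

```
example.com {
    encode gzip

    handle /api/* {
        reverse_proxy localhost:8001
    }
    handle /hooks/* {
        reverse_proxy localhost:8002
    }
    handle {
        root * /srv/www
        file_server
    }
}
```

Certificates for example.com are obtained and renewed automatically; there is no separate certbot/acme.sh step to configure, which is the “hard to get wrong” part.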
Why is Python’s packaging story so much worse than Ruby’s? Is it just that dependencies aren’t specified declaratively in Python, but in code (i.e. setup.py), so you need to run code to determine them?
I dunno; if it were me I’d treat Ruby exactly the same as Python. (Source: worked at Heroku for several years and having the heroku CLI written in Ruby was a big headache once the company expanded to hosting more than just Rails apps.)
I agree. I give Perl the same handling, too. While Python might be able to claim a couple of hellish innovations in this area, it’s far from alone here. It might simply be more attractive to people looking to bang out a nice command-line interface quickly.
I think a lot of it is mutable global state like PYTHONPATH, which becomes sys.path. The OS, the package managers, and the package authors often fight over it, which leads to unexpected consequences.
It’s basically a lack of coordination… it kinda has to be solved in the core, or everybody else is left patching up their local problems, without thinking about the big picture.
Some other reasons off the top of my head:
Ruby’s packaging story is pretty bad, too.
In what way?
I don’t know, it’s been a long time since I’ve written any Ruby. All I know is that we’re migrating the Alloy website from Jekyll to Hugo because nobody could get Jekyll working locally, and a lot of those issues were dependency related.
Gemfile and gemspec are both just ruby DSLs and can contain arbitrary code, so that’s not much different.
One thing is that PyPI routinely distributes binary blobs called “wheels”, which can be built in arbitrarily complex ways, whereas RubyGems always builds from source.
Not true. Ruby has always been able to package and distribute precompiled native extensions; it’s just that it wasn’t the norm in a lot of popular gems, including nokogiri. Which, by the way, ships precompiled binaries now, taking a couple of seconds to install where it used to take 15 minutes, and now there’s an actual toolchain for targeting multi-arch packaging, and the community is catching up.
Hmm, that’s very unfortunate. I haven’t run into any problems with gems yet, but if this grows in popularity the situation could easily get as bad as pypi.
Thanks for the explanation, so what is the fundamental unfixable issue behind Python’s packaging woes?
I could be wrong but AFAICT it doesn’t seem to be the case that the Ruby crowd has solved deployment and packaging once and for all.
Related xkcd: https://imgs.xkcd.com/comics/python_environment.png
I just run pkg install some-python-package-here using my OS’s package manager. ;-P
It’s usually pretty straightforward to add Python projects to our ports/package repos.
Speaking from experience, that works great up until it doesn’t. I have “fond” memories of an ex-coworker who developed purely on Mac (while the rest of the company at the time was a Linux shop), aggressively using docker and virtualenv to handle dependencies. It always worked great on his computer! Sigh. Lovely guy, but his code still wastes my time to this day.
I guess I’m too spoiled by BSD where everything’s interconnected and unified. The ports tree (and the package repo that is built off of it) is a beauty to work with.
I’m as happy to be smug as the next BSD user but it isn’t justified in this case. Installing Python packages works for Python programs installed from packages but:
In my experience, there’s a good chance that a Python program will run on the computer of the author. There’s a moderately large chance that it will run on the same OS and version as the author. Beyond that, who knows.
I mean, we used Ubuntu, which is pretty interconnected and unified. (At the time; they’re working on destroying that with snap.) It just often didn’t have quiiiiiite what we, or at least some of us, wanted and so people reached for pip.
Yeah. With the ports tree and the base OS, we have full control over every single aspect of the system. With most Linux distros, you’re at the whim of the distro. With BSD, I have full reign. :-)
But it could still be the case that application X requires Python 3.1 when application Y requires Python 3.9, right? Or X requires version 1.3 of library Z which is not backwards compatible with Z 1.0, required by Y?
The Debian/Ubuntu packaging system handles multiple versions without any hassle. That’s one thing I like about it.
Does it? Would love to read more about this if you have any pointers!
I guess the main usability thing to read about is the alternatives system.
The ports tree handles multiple versions of Python fine. In fact, on my laptop, here’s the output of: pkg info | grep python:
py37-asn1crypto-1.4.0 ASN.1 library with a focus on performance and a pythonic API
py37-py-1.9.0 Library with cross-python path, ini-parsing, io, code, log facilities
py37-python-docs-theme-2018.2 Sphinx theme for the CPython docs and related projects
py37-python-mimeparse-1.6.0 Basic functions for handling mime-types in Python
py37-requests-toolbelt-0.9.1 Utility belt for advanced users of python-requests
py38-dnspython-1.16.0 DNS toolkit for Python
python27-2.7.18_1 Interpreted object-oriented programming language
python35-3.5.10 Interpreted object-oriented programming language
python36-3.6.15_1 Interpreted object-oriented programming language
python37-3.7.12_1 Interpreted object-oriented programming language
python38-3.8.12_1 Interpreted object-oriented programming language
Fwiw, I’ve had good luck using PyInstaller to create standalone binaries. I’ve even been able to build them for Mac in CircleCI.
It can feel a bit like overkill at times, but I’ve had good luck with https://www.pantsbuild.org/ to manage python projects.
Python: I wish asyncio never happened.
Racket: I wish the compiler was faster and that it had better error reporting.
Despite having used many other languages, I don’t think I can say I love any of them as much as these two.
Python: I wish asyncio never happened.
Just ignore it? Seriously - unless you have use for it, it’s in no way a required part of the standard language?
What is it about asyncio you dislike so much?
I wrote about it on HN a while back: https://news.ycombinator.com/item?id=18110319
Author here, I’d greatly appreciate any feedback, ideas, critiques! Thanks :)
I’m personally not a fan of Python async code as it adds visual noise in my opinion, but I can understand why you would choose that model.
Apart from that noise, the task model looks well thought out and approachable. I think that is pretty important in any kind of tool that wants to be an alternative to Puppet, Ansible and what have you. Part of what made Ansible big is probably the number of modules that volunteers added themselves.
Having people contribute back is key to the success of something like this. While Ansible adopted a third-party repository of modules, my hope with Pitcrew is to make it more first-party: think Homebrew, with the “fork and PR” strategy of contributing to it.
I’m also struggling with the noise of async code in python, really wish it could be more like Ruby fibers.
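For what it’s worth, the visual-noise complaint usually comes down to “function coloring”: once something is async, every caller up the chain must be async too. A minimal sketch (the fetch_value helper is hypothetical):

```python
import asyncio

async def fetch_value():
    await asyncio.sleep(0)  # stand-in for real non-blocking I/O
    return 42

async def caller():
    # Can't just call fetch_value(); we must await it, so caller()
    # itself has to be marked async, and so on up the stack.
    return await fetch_value()

# At the top you finally pay the toll of starting an event loop:
result = asyncio.run(caller())
print(result)  # 42
```

Ruby fibers (and gevent-style greenlets in Python) avoid the keyword propagation because suspension is implicit, which is the contrast being drawn here.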
How does this compare to fabric? Or ansible with mitogen?
I don’t have hundreds of hosts handy, so not easy for me to benchmark it, but that would be fun work.
Fabric seems to use a process per connection, which seems like a downside compared to using non-blocking IO.
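The non-blocking fan-out model can be sketched like this; run_on_host is a hypothetical stand-in for an AsyncSSH session, and the point is that one event loop drives all the connections concurrently instead of forking a process per host:

```python
import asyncio

async def run_on_host(host: str) -> str:
    # Hypothetical stand-in for opening an SSH connection and running
    # a command; the sleep simulates the network round-trip.
    await asyncio.sleep(0.01)
    return f"{host}: ok"

async def main():
    hosts = ["web1", "web2", "db1"]
    # All "connections" are in flight at once on a single event loop.
    return await asyncio.gather(*(run_on_host(h) for h in hosts))

results = asyncio.run(main())
print(results)  # ['web1: ok', 'web2: ok', 'db1: ok']
```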
It looks like it should be quite similar to Mitogen. Having spent much time in the YAML forests of Ansible, I can say I don’t personally want to live there (see: inner-platform effect). Also, Pitcrew should be able to support an inverted control strategy, where you sync the code and then execute it local to where it’s running, to avoid round-trip latency.
Does it have support for (multiple levels of) SSH jump hosts, a.k.a. bastion hosts?
edit: Also, does it allow mixing access to remote and local hosts in a single “script”? What about two remote hosts? I.e., (how) can I copy a file between two remote hosts as part of a pitcrew script?
So, you can definitely mix local and remote host access. As for copying between two remote hosts, I haven’t bothered with that feature, but there is no technical limitation that would disallow it.
Specifically around bastion hosts: AsyncSSH, the underlying SSH library I’m using, allows proxies, but I haven’t exposed that in an easy way. I’ll add that feature. Thanks for the idea!
Please also make sure “chaining” of bastion hosts is allowed, i.e. when I must connect to one bastion host via another bastion host.
Hoping to release this project to a wider audience. Even if it doesn’t find a big useful purpose, it’s been fun to think about what a performant, nice Ansible alternative could look like. https://pitcrew.io/