Threads for yingw787

  1. 2

    I like property based testing and it helped me a lot in the past. It’s a great tool for certain domains and with a good setup it could allow more people to contribute.

    Having said that, the first part of the article mentions how sales and marketing would rarely contribute to testing, and the very first example starts with using regexes, which are hard even for programmers. Wouldn’t it be easier (yet longer) to implement it with strategy that selects from several other, simpler strategies?

    1. 1

      Thanks for sharing! I thought it would be important to give the context from which I approached property-based testing, and the article is mostly an intro for programmers in the field already, hence discussion around corporate and inter-departmental psychology. I believe hypothesis has strategies.characters() which generates a stream of individual characters, but if you want strings I think the only option is st.from_regex(). I might be wrong though. You definitely can chain multiple strategies together.

    1. 2

      Man this was jarring. I wish they de-risked the UI change by forming a UX migration plan and then A/B testing one part at a time, than doing everything wholesale.

      1. 3

        What is this app?

        If it’s an HTTP(s) server, then yes this is pretty straightforward.

        When your browser visits google.com, the HTTP request it sends includes a line like Host: google.com. Your app can switch on that line without needing to have multiple IP addresses or anything.

        1. 1

          I’m building out a tiny spreadsheet alternative for myself; it’s like open-source Airtable, with APIs and DB cursor access. I mostly want to get away from using Google Sheets. The backend stack would be PostgreSQL + PostgREST (web server to lift PostgreSQL into HTTP). Yeah, PostgREST is an HTTP server. I’d either be provisioning PostgreSQL + PostgREST for an individual spreadsheet, or one db + API for all user spreadsheets, but I’d rather do the former because ElephantSQL sets limits on free tiers per database on disk usage + # of concurrent connections (which makes sense to me from first principles).

          I’m not sure I fully understand your point on routing HTTP requests. Can you elaborate more on that please?

          1. 2

            because ElephantSQL sets limits on free tiers per database on disk usage + # of concurrent connections

            Building around ‘what can I get for free from a specific provider’ is a very fast way to burn a weeks engineering time to save $5.

            They can - and likely will - detect and ban you for circumventing the free tier, which will likely cause you to lose all your data.

            For a project at this stage, I’d strongly recommend a single DB and a single API server, both running on the same box. You’ll need a machine to run the API server anyways - put your postgresql there too.

            AWS, google, azure and oracle each have a free tier which can do this.

            The hardest part will be figuring out cron for backups (it never has the right env vars), but you can leave that until you’ve been using it for a few days (don’t wait months though!).

            I’m not sure I fully understand your point on routing HTTP requests. Can you elaborate more on that please?

            As you’re using an existing off-the-shelf server, some things are slightly more complex (assuming you don’t want to patch & compile it yourself), since you’re restricted to what it already knows how to do.

            Unless you have particularly fancy needs (multiple users?), I would recommend against separating ‘document’ and ‘sheet’ (the way e.g. excel does), instead just having sheets. If you jam it all in one schema in one database on one domain name, you could be using it tomorrow. Manually separate unrelated data by giving it a sensible name.

            If you insist on nesting sheets inside documents, consider using the document name as a prefix - you can get your client to run a list all tables whose names have prefix 'mydocument' query and only show those ones.

            Further separation could be obtained (with additional effort) via the multi-schema support in postgrest. However, then you need to update the config file and restart a server every time you add a spreadsheet - see the relevant bit of the postgrest docs.

            1. 1

              I think you’re right. Thanks for talking me out of my original plan. I do have $50k worth of AWS credits, but since I was on sabbatical and was hesitant to start something serious, I never really expected to take advantage of those credits because I can’t use it after the credits expire. Now I think maybe that’s a bit bone-headed and I should go ahead and build it using the free credits anyways. Worst comes to worst, I can always export my data to S3 and shut down the stack.

              Hmm, I didn’t realize that bit of PostgREST documentation. I should go back and read through the whole thing before pinging people with questions. Thanks again for your help!!

        1. 5

          DNS names map to IP addresses. In principle, it is possible to have multiple IP addresses for a single computer, and make each process listen on a different IP address.

          Hard to make a concrete suggestion without knowing more of the requirements, but for example you could have each process on the application server run a script on the DNS server via SSH, which changes the DNS configuration to point to the IP address that the process is using.

          1. 3

            DNS names map to IP addresses

            Some of them do. SRV records; however, map to a port and DNS record name. If you have a DNS updating API, you can publish SRV records that advertise the location of the service. You can have multiple SRV records for the same resource, so you could do a single DNS query to get a complete list of all of the nodes if it’s small enough.

            1. 1

              SRV records sound interesting, I’m only familiar with CNAME, A, and AAAA so far. Do you know of any particularly great resources / tutorials to read / code more about SRV records along my use case or similar?

              1. 2

                The Wikipedia page is actually pretty good. It describes the format. For your use case, I imagine that you’d want to set a very short (1-10 second) TTL so that they were quickly invalidated, though you may set something longer and just handle the case that some nodes drop out in your system.

                Note that this can be complicated if you want these to be run on consumer connections. If you’re behind a NAT you may need to do some more clever work to figure out your public IP and the port that’s being forwarded (e.g. via a STUN server or UPnP / IGDP). From your post below, it sounds as if you’re just exposing a public server that multiple clients connect to, so DNS SRV records might be a good choice.

                If you’re providing a (potentially) different connection endpoint for each spreadsheet, then you can use this as your addressing layer. If you name sheets as something like {sheet name}.{username}.{your domain}, then you’d look up something _yingw787sheet._tcp.{sheet name}.{username}.{your domain} to get the host IP and port of the server. Or, if you want to give users the ability to transfer ownership of sheets and rename them, you’d do use a CNAME entry for {sheet name}.{username}.{your domain} that would then return a domain in the form {UUID}.{your domain}, and then you’d do the SRV lookup on _yingw787sheet._tcp.{UUID}.{your domain}. That also makes it fairly trivial for other people to host instances of the service and even lets people do their own namespace management while using your hosted version.

                Of course, like most other uses of DNS, this is a horrible abuse and not what DNS was designed for. You may find that free DNS providers start to object when they see that you have 100,000 DNS entries in a single domain with short TTLs and thousands of requests per second.

            2. 1

              I’m thinking about using Route 53 as my DNS server, and I don’t think I have SSH access. I think I do have API access to create and destroy records though. If I use DNS-based service discovery (which I know nothing about currently), would it be possible to register an EC2 instance with a master process to DNS upon provisioning, then have the master process issue API calls to DNS for each process? If this stack does work, how might this change if I add in a proxy layer?

            1. 3

              I think something like https://www.consul.io/ will fit what you’re looking for.

              1. 1

                This looks interesting. I had heard of Consul before but I didn’t put two and two together. Thanks!

                Have you used Consul in production before? How often might it fail, and how often do you need to update it due to a critical need (e.g. security vulnerability)?

                1. 1

                  Also I think Consul is meant to be natively integrated into Kubernetes…and EKS costs something like $0.10 / hr which is $72 / mo…without factoring in EC2 costs on top of that. My budget is something like $5-10 / mo. for personal projects.

                1. 3

                  You can have a wildcard domain, such as `.mydomain.com that routes to some dispatch service.

                  Then each process you run registers (in a database, writes to disk, etc.) itself, and the dispatch service takes myprocessname.mydmain.com and routes it to that process.

                  Theoretically, the dispatch service could also do a lookup of active processes running and dynamically route based on that, but I assume there might be some overhead with that approach (without caching).

                  1. 1

                    Hmm, this is interesting. So if I understand you correctly, you’re saying I can create a master process on the instance (or on a separate server?), that can read a remote database, figures out the process, and routes the request to that process?

                    Are there any examples of this approach on GitHub, or in books you know and recommend? How might this approach change if individual processes and instances are shielded behind a load balancer?

                    Can you name individual UNIX processes, or do you have to search by PID or command? I’m guessing that even if PIDs don’t change over the process lifetime, if the process goes down and it restarts, then it’ll have a different PID, and relying on a failing process to properly issue an HTTP call to update a remote registry isn’t wise because you’ll be coupling failure models within your system design.

                    1. 2

                      Also perhaps look into dbus: https://news.ycombinator.com/item?id=9451023

                      1. 2

                        I don’t know enough about process name / PID internals to know how easy or hard it is to look up - that’s also why I suggest that each process self-registers in some shared store (DB, disk, in-memory service, etc.). Someone further up suggested Consul which fits this role well - in general, “service discovery” is probably what you should be googling.

                        The router (that takes incoming requests and sends them to the right process) can both live on the same instance or somewhere else, assuming you use HTTP for communication. If you want to use sockets or similar IPC, you’ll need to have it on the same instance.

                        To handle failing processes, you could either have a heartbeat mechanism from the router checking that the process is up (with X failures before unregistering it) or you could just have a timeout on all incoming requests and make it so that a new process registering will overwrite the old process’ registration.

                        It’s hard to be more specific without pulling in specific code samples or talking about actual implementation details.

                    1. 2
                      • Where is this going to run? The “where” defines a lot about where DNS is going to point at, and who will be responsible for restarting the process. Is it going to be something like Heroku, AWS Lambda or a VM that the process will run?
                      • If it is going to run on a VM, how is it going to be restarted? systemd, supervisord, someone doing it by hand?
                      • Do you own the domain name, for which you are going to use a lot of subdomain names?
                      • Where is that domain name hosted? Is there an API to manage it?
                      • Is your process a web server, or is it simply opening a socket waiting for connections via TCP or UDP?

                      Mind you, you are not required to know all these answers, but unless you provide some context using them as a guide, any advise we offer is useless.

                      1. 1

                        So the impetus behind this idea is I wanted to move away from Google Sheets to something like Airtable, because I’m more familiar with SQL and databases and APIs than I am with spreadsheets and formulas. But I don’t like depending on closed-source platforms because any change they make to their platform is a Black Swan event to me I may need to address (or can’t address). So I wanted to build out a tiny alternative to Airtable using PostgreSQL and PostgREST, and some very basic UI layer I’ll write myself. I don’t care too much about site performance, I care a little bit about availability, I mostly care about data freedom and integrity.

                        I found that ElephantSQL (PostgreSQL as a Service) can create databases for free using a multi-tenant (?) model, and EC2 / DigitalOcean / etc. is fairly cheap for a tiny instance. I’ve worked with containers and there’s a lot of cognitive load just to get it working in production (especially if I move on in my career and forget about it). So I’d rather create a VM and fill it to the brim with processes.

                        Problem is, PostgREST has a static configuration for a PostgreSQL URI, which means it can’t support multiple databases without restarting. I could model my “spreadsheet” around this and create a spreadsheet as a table. However, ElephantSQL restricts 20MB of data for the free tier on a per database level, and database costs are scary for personal projects. It also seems like a waste of resources to have an entire EC2 instance for one process, which could negatively impact my costs over time.

                        • I’m thinking about running these processes either on Heroku free tier (which I think probably uses EC2 spot instances underneath), or using an EC2 group + load balancer. If it’s the EC2 group, then I’d probably want to fill up the server with processes.
                        • I’m not terribly familiar with systemd or supervisord, but I think systemctl uses systemd underneath the hood, so probably that…
                        • Yes, I own the domain name.
                        • The domain is hosted on AWS Route 53, so I can use AWS APIs to manage it.
                        • The process is a web server.
                        1. 2

                          Here is how I’d approach it:

                          • Domain name on R53. API calls to make any changes needed
                          • Server on Hetzner (cheaper than the rest and considerably better than its previous bad reputation). For a 32G machine you can choose between a previously used bare metal, or a VM.
                          • I’ve run multiple instances of postgrest on a 16G VM on Heztner without any issues
                          • Use docker-compose to run everything: The web service, the postgrests and postgres. I’d use a host directory mount as the postgres data partition.
                          • database backups on S3.
                          • Use Mailgun or some other sending provider to send email if needed (they charge too).

                          YMMV, I cannot judge how much you’re willing to spend. You could do most of these with a $4/month server and no docker depending the load.

                      1. 2

                        About how much it cost per month to run VPC, RDS, EBS and ECS? I think I checked once only for RDS for a hobby project and it was a bit expensive.

                        1. 2

                          That’s a good question! VPC is free (NAT gateways and Elastic IPs you may have to pay for), EBS is pretty cheap, and ECS uses EC2 instances underneath the hood, so if you pay for reserved or spot pricing it’s less of an issue. You can stick multiple containers on an EC2 instance, but if you need to connect them together you may need something like DNS service discovery for those services to talk to each other, which is a PITA. RDS is the most expensive. Everything else is like $5 / mo., RDS is $20-50 / mo.

                          However, since RDS is merely a data sink for this stack, you don’t have to have RDS. You can use something like ElephantSQL’s free plan and punch in the PostgreSQL URI into the custom db compute CloudFormation template. Then it’d be $5 / mo. total.

                          1. 2

                            Thank you very much.

                            1. 2

                              Hmm, correction…I’m using t3.medium instance which is $30 / mo., but you can switch that to t3.nano which would be $3.70 / mo., but you’d have to adjust the “CPU / memory units” that ECS uses to allocate for each container on the VM.

                              The principle should be the same though, you should be able to downscale and not suffer any other behaviour differences except slowness / capacity issues.

                        1. 2

                          If the problem with postgres RDS is lack of extensions or similar, do https://www.elephantsql.com/ or https://aiven.io/ offer more flexibility so that one could avoid the latency of having a postgres to postgres proxy?

                          1. 2

                            Do they offer extensions / root access? I thought they were managed solutions.

                            1. 3

                              Aiven supports 47 postgresql extensions, including pg_cron, postgis, postgres_fdw and timescaledb.

                              I searched Elephant SQL’s docs for “Extension” but got no meaningful hits. Looks like they do not support extensions.

                              1. 1

                                That’s…surprisingly great. Do you know whether the “AWS Marketplace” offerings are purely EC2 costs, or do vendors also charge a flat fee for the AMI? I have Startup School credits through AWS (extremely generous) and that’s why all my infra is AWS; would love to try out Aiven for AWS if the former is true. :P

                          1. 2

                            I tried to understand what exactly this is trying to do, but I don’t understand why you’d have two postgres instances. If I understand correctly, the problem with RDS is that you can’t install custom extensions (but they do supply FDW out of the box). But if you then install a second Postgres (in a regular VM?) and connect it to RDS, isn’t the RDS instance redundant and just sitting there costing you extra money, then?

                            I mean, if you’re installing your own custom install, why not just use that and be done with it?

                            1. 2

                              Only reason I could think of is that the data storage is still on them. Yes you’re proxy going down is more likely, but proxies are normally easy to operate.

                              1. 1

                                One thing I didn’t realize when I first interacted with numerous databases is how SQL is different between each database. I don’t like having to relearn having to fetch and shape data over and over again if I don’t have to, so I proxy data sources through Postgres in order to use Postgres’s flavor of SQL and stored procedures PL/pgSQL.

                                Doesn’t have to be Postgres as the data source, you can use MySQL, Oracle, Google Sheets, GPU VRAM, or your smarthome lightbulbs, or a combination of these: https://wiki.postgresql.org/wiki/Foreign_data_wrappers

                                You can also write your own FDWs using Multicorn, a Python library: https://multicorn.org/

                                1. 3

                                  Just using Postgres as a “front end” to a different database strikes me as very inelegant and too much needless overhead. Also, from my experience, FDWs don’t handle data migrations in the source database very well. It’s very easy for the source table and the FDW definition to get out of sync.

                                  Testing FDW-based tables is a right pain in the neck, too, due to how they interact with transactions (unless you do everything through the FDW, and the databases you connect to are defined statically). I wrote a bit about my experiences with FDW on the company blog a while ago, but I never really bothered to update it. We used it mostly to do queries over multiple instances of the same system.

                                  So while I think FDW are very cool and powerful, I’m not sure I’d be very quick to use them myself.

                                  1. 2

                                    Very interesting! I was planning on using FDWs as a pass-through entity, where dataflow goes from table -> view -> HTTP. In this situation, I wouldn’t be pushing any data on the source table, and therefore wouldn’t have any issues with data migrations (because there would not be any). For synchronizing foreign table references, I was planning on a stored procedure + batch job to pull in foreign tables every so often and blow existing references away, though that may be tricky if there’s any dependencies on those foreign tables. Any transactions you’d handle on the source table, in a “push/pull” configuration (OLTP work on source, OLAP work on destination). I think for BI / analytics stuff, this should be fine.

                                    I just felt restricted by what you can do with an ORM. I figured since an ORM supports so many databases, it has to cater to the least common denominator of database features, and I wanted more from Postgres.

                                    1. 3

                                      It’s good to experiment, so definitely do so! Maybe for your use case the issues I ran into are not even relevant.

                                      I’m not a big fan of ORMs either (they can produce inefficient queries, are often a barrier to getting exactly out of the db what you want because not every query result maps cleanly to “objects”, they can easily accidentally generate many more queries than you would want to perform, and they are a total memory hog), but they don’t have to be the least common denominator. For the query language, usually, yes (and for that, dropping down to the SQL level tends to be painful), but for example Django allows you to use Postgres extensions like arrays, JSON fields, partial indexes, and even Postgis fields and spatial operators. Only thing I kind of miss are proper enum fields.

                              1. 2

                                I was amused by the automatic conversion of inline graphviz code to a graph with JavaScript (try disabling JS (or enabling it if you have it disabled)). Pretty nifty!

                                1. 1

                                  Thanks! All credit goes to Munif Tanjim (creator of Minimo), Adam Charytoniuk (creator of Engimo), and @spf13 / maintainers of Hugo for their hard work and dedication!

                                1. 3

                                  This was a great read! I’ve been looking at postgraphile but wanted something that was just regular rest. Now I know there is postgrest!

                                  But I have been thinking more and more about how critical knowing databases is for being able to build new things, especially when you don’t have the money or the people to build abstractions around databases. But, like mentioned in the article, it does require time, and that is precious.

                                  1. 5

                                    There are a few tools like PostgREST out there. Supabase, Hasura, XGeneCloud, Prisma. You can also generate a REST API from OpenAPI specs with OpenAPI Generator. I’ve been working on a similar tool that can generate a REST API and admin UI from Postgres or MySQL.

                                    1. 2

                                      Do any of these tools intercept the query and rewrite it, possibly joining across multiple SQL backends? Of those that do, do they do query push down?

                                      1. 2

                                        Stardog’s virtual graphs can do that.

                                        1. 2

                                          Not that I’m aware of.

                                          1. 2

                                            I think Hasura can stitch GraphQL queries. Or I guess they call it remote joins: https://hasura.io/blog/remote-joins-a-graphql-api-to-join-database-and-other-data-sources/

                                          2. 1

                                            FWIW I’m writing out the Part 2 of this blog post where I ship actual code on AWS (hopefully for publishing tomorrow, repo here: https://github.com/yingw787/postgres_as_app), and it’s shockingly easy to get started with PostgREST. It’s tar one binary, copy in a conf file, and go. The robustness of Haskell with the simplicity of Go. All runtime conf options, like pagination, are handled via HTTP headers. It’s definitely a tool I want to incorporate into my future apps, because it’s stupid simple and transparent.

                                        1. 18

                                          Finding the will to continue in quarantine.

                                          1. 3

                                            One thing I do every morning is take a picture of myself in the mirror, with my made bed in the background, and sending it to Boss as a Service, with the subject line “I live in the best of all possible worlds. Everything I do today, I choose to do”. I think it really helps me out to start the day well and to explicitly place myself in a well of positive thinking no matter the actual circumstances.

                                            1. 2

                                              Never heard of BaaS.

                                              Sounds cool.

                                            2. 2

                                              Start each day remembering the good stuff you experienced, the good stuff you have, and the good stuff you might still do or experience. On the last one, include some things you can do or experience in quarantine. Then try to take action on those each day. Then, they eventually join the list of good things you’ve done or experienced to motivate you towards the next set.

                                              This is part of my morning routine. How well it goes varies day by day. It helps most days, though. Hope it helps you. :)

                                            1. 3

                                              I would also recommend people that aren’t used to remote, to install a browser extension or else to restrict the time on some websites. Being alone at home (the place where you usually give yourself a good rest) is a more tempting place and time to spend more time than usual on websites such as youtube or lobsters.

                                              1. 1

                                                I’ve been searching for such things for Linux for quite a while. Is there anything you recommend?

                                                1. 2

                                                  I use Ubuntu on my laptop, and I have a multi-layered approach: /etc/hosts to block out a number of websites, and a RescueTime subscription to redirect sites if I happen to unblock something locally (because I have sudo access).

                                              1. 5

                                                This looks pretty great! I’m not in the embedded Rust space, and therefore I’m probably not going to buy a ticket, but more conferences should look at providing quality remote-friendly and remote-first services, and I’m glad you guys are making the commitment!

                                                1. 4

                                                  This is a trial run, but I know at least of one other Rust conference that currently considers at least a remote component.

                                                1. 2

                                                  Continuing to do some work on TinyDevCRM (or “tinydev”): https://bytes.yingw787.com/posts/2020/02/08/tinydevcrm_1/ (Can’t publish a new post due to new lobste.rs rules, which is all good with me!)

                                                  Hopefully I can get to MVP stage by end of next week, how likely that is I’m not sure.

                                                  1. 1

                                                    I use links2 as a CLI-based browser. It’s surprisingly great and I learned a good deal about what’s important in an HTML document and how I want to write front-ends. I’ve also looked at Midori, but if I’m going with a GUI I still like Firefox.

                                                    Chrome I use for WebRTC-related tasks like video chats with friends and professional contacts. I’m definitely not a FOSS purist, but I try to rely on it whenever I can.

                                                    1. 3

                                                      I come to Lobste.rs for the genuine tech conversations that I find harder get on other sites like Hacker News. Beyond Hacker News and Lobste.rs, other sites are kind of just “here’s how to create a basic REST API” or some fluffy crap like that and I don’t bother going there. I would be really sad if I didn’t have you guys to talk to because we didn’t take care of our community.

                                                      I really like how the two people I’ve invited to Lobste.rs prefaced their ask for a Lobste.rs invite with some discussions around some of the blog posts I submitted here to confirm their interest and their ability to knowledgeably add to the discussion.

                                                      I think it’s so great that Lobste.rs is valuable now (as shown / described). It’s important we understand we have lever in extending invites, and we should learn to say no when we’re uncomfortable or unsure. I personally find it hard (which is why I’m mentioning it), but I find the alternative of losing this community scarier.

                                                      1. 3

                                                        It is always super weird to see someone quote me. Really cool! But also super weird. I’m not fully adjusted to that yet ;)

                                                        If you find Haskell’s type system interesting, I’d also recommend checking out an ML! The module system in ML is really cool, powerful, and unique. I’m shocked more languages haven’t adopted similar module systems.

                                                        Also, a couple of quick pedantic nitpicks:

                                                        I was reading through some of Hillel Wayne’s blog posts, and one of them discussed the Curry-Howard correspondence, which somehow proves a 1:1 association between aspects of a mathematical proof and aspects of a type system. You can convert a proof into a type, if your type system supports it. I think Haskell’s type system respects this. I don’t think Python’s type system does.

                                                        I don’t think I explained the CHC very well in that. All sound static type systems are “proofs”, where what you are proving is “this is well-typed”. This is true for both Haskell and Python (mypy). But different languages can encode different consequences of being well-typed, like if you can express the type of sorted lists, you can say “I’ve proven this is well-typed ⇒ this sort function correctly sorts the list.” Haskell’s type system is more powerful than most other commercial languages, but it’s not powerful enough to encode arbitrary theorems. For that you need dependent typing a la Idris or Coq.

                                                        Also proving type-safety for arbitrary theorems is really, really hard. Tradeoffs!

                                                        Python has hypothesis, but having used it a tiny bit for an open-source contribution, I don’t think it’s the same without the Haskell type system.

                                                        Gotta defend hypothesis a bit here! It sounds like you were using quickcheck in the context of learning from a book, while you were using hypothesis for a real-world project. In that context quickcheck is gonna seem more elegant. But Hypothesis is world-class and a lot of more modern PBT libraries follow its example. It’s a lot better at generating inputs like “two dictionaries where the values of the first dict are a superset of the keys in the second.”

                                                        1. 2

                                                          OCaml seems really cool, and I’ve heard it used in industry; is it the most popular ML for practitioners? I also know of ReasonML used by Facebook to create Facebook Messenger. But I think there’s still so much to learn about Haskell, and between going broad and learning about type systems, I’d want to go deep into Haskell and become intermediate, or even advanced if I’m fortunate enough to have the time and resources to do so!

                                                          Hmm, I didn’t know that Idris or Coq would be powerful enough to encode arbitrary theorems. I think if I were to study more into type theory (an interesting subject), I’d try to learn those!

                                                          Yeah, I think Corbin mentioned that Hedgehog was closer to hypothesis than QuickCheck was; I think I was too quick in dismissing hypothesis. I do like the notion of using generators over types, and the integrated shrinking (when I wrote this I assumed that QuickCheck had integrated shrinking; I think that’s actually a Hedgehog or maybe hypothesis innovation). I definitely like your blog post point about creating integration tests from a combination of hypothesis and pycontracts, I’ll give that a shot for my next project :D

                                                          1. 2

                                                            QuickCheck does shrinking. The Arbitrary typeclass has a shrink :: a -> [a] method which should return a list of smaller versions of its argument.

                                                            If you don’t want to create an instance of Arbitrary, you can use the function forAllShrink. It’s type is a more polymorphic version of (RandomSeed -> a) -> (a -> [a]) -> (a -> Bool) -> Bool: the first argument is a generating function, the second is a shrinking function and the third is the property to check.

                                                        1. 19

                                                          As somebody who first learned Python, and then Haskell, and then left both, this review reminds me of many of my own experiences.

                                                          I was reading through some of Hillel Wayne’s blog posts, and one of them discussed the Curry-Howard correspondence, which somehow proves a 1:1 association between aspects of a mathematical proof and aspects of a type system. You can convert a proof into a type, if your type system supports it. I think Haskell’s type system respects this. I don’t think Python’s type system does.

                                                          Yes, but undefined happens to be a value of every type, in Haskell; this corresponds to being able to prove any statement, including the false ones. So while it is indeed possible to turn proofs into programs, it’s not possible to turn Python or Haskell programs into useful valid proofs.

                                                          (Somebody may pedantically note that every Python or Haskell program gives a proof, just not of the fact that we might hope that they prove; they usually have Python-shaped or Haskell-shaped holes in them.)

                                                          I don’t think you can do anything like [fmap] with Python or any other practitioner’s language that I’m familiar with.

                                                          I think that this is part of the roots of the Haskeller memetic complex. Because Haskell has the lightweight syntax of an ML, and its prelude is rich in algebraic tools, suddenly there is a blindness as to the fact that the tools have been there all along. For example, in Python 3:

                                                          >>> plusOne = lambda x: x + 1
                                                          >>> lmap = lambda f: lambda xs: [f(x) for x in xs]
                                                          >>> nada = object()
                                                          >>> mmap = lambda f: lambda x: nada if x is nada else f(x)
                                                          >>> dot = lambda f: lambda g: lambda x: f(g(x))
                                                          >>> zf = dot(lmap)(mmap)(plusOne)
                                                          >>> zf([1, 2, 3])
                                                          [2, 3, 4]
                                                          >>> zf([1, nada, 3])
                                                          [2, <object object at 0x7ff65c5db0d0>, 4]
                                                          

                                                          Even the base error type, bottom or |, is defined as an evaluation that never completes successfully, either because a computation failed, or because there’s an infinite loop. … I don’t think Python’s BaseException has a built-in notion of infinite loops evaluating to an error condition. This appears super powerful to me, because the big error type introduced by going over network connections are network timeouts, which appear to most programs as infinitely long computations.

                                                          There are many opinions on this, but I am of the opinion that the sort of incompleteness generated by an unresponsive network peer, known as FLP impossibility or perhaps FLP “incompleteness” of network requests, is not the same as the incompleteness that Turing, Gödel, Tarski, Lawvere, and others showed via diagonalization. They are connected, in that the FLP result implies that a Turing machine which simulates many computers on a network may encounter a Turing-complete subproblem which prevents the simulated network peers from responding.

                                                          In Alice-and-Bob framing, suppose Alice is waiting for Bob to reply over the network. The FLP impossibility of Alice having to wait forever for Bob’s reply is connected to the Turing undecidability of whether a simulation of Alice and Bob will hang forever while simulating Bob’s preparation of their reply.

                                                          Haskell has Test.QuickCheck, a property-based testing framework that generates values to check a declared property. Python has hypothesis, but having used it a tiny bit for an open-source contribution, I don’t think it’s the same without the Haskell type system.

                                                          I think that this is another memetic root; specifically, the root is type-driven dispatch. To compare apples to apples, a better analogue of QuickCheck in Python might be PayCheck, while Hedgehog is closer to Hypothesis in Haskell. As soon as the one-dimensional comparison becomes two-dimensional, the flexibility of Hypothesis, and crucially the ability to select arbitrary strategies for generating test cases, far outweighs the convenience of simple type-checking. This makes sense, in a certain way; if Haskell’s type system were sound in the first place, then QuickCheck would be little more than a totality checker. Hypothesis’ own author recommends Hedgehog over QuickCheck, and in my language, Monte, I implemented both a classic unit-test runner and also a Hypothesis-style property test runner.

                                                          Ultimately, I don’t think [indentation] is a big issue. Haskell has code auto-formatters like haskell-formatter, much like Python’s black. Use it and move on. Bikeshed about something else, you know?

                                                          Since you didn’t mention it, and none of your code samples seem to use it, and to give an example of something to bikeshed over, it’s worth pointing out that Haskell supports curly-brace-oriented syntax as well as indentation. This is a recurring topic in programming language design: We want code to be readable, and that leads to caring about whitespace.

                                                          1. 9

                                                            This makes sense, in a certain way; if Haskell’s type system were sound in the first place, then QuickCheck would be little more than a totality checker.

                                                            I don’t think that follows unless you’re using incredibly specific types. There are many total functions of type Ord a => [a] -> [a], but not all of them would pass the tests of a sorting function.

                                                            1. 3

                                                              This is a fair observation. QuickCheck test selection indeed is driven not just by types, but by typeclasses and associated behaviors, and I was remiss to dismiss it so glibly.

                                                              The main desire is that one might express algebraic laws, and ultimately all of these testing tools are only approximations to actual equational reasoning. That doesn’t preclude a type system from giving useful annotations.

                                                            2. 4

                                                              Wow, amazing comment! I understand about half of this. I wish I had more than one upvote to give you.

                                                              I am really curious as to whether Python’s NoneType would map to Haskell’s bottom in that regards. Would that be the Python-shaped hole you’re talking about?

                                                              I did see a tutorial by David Beazley on implementing a lambda calculus in Python. I didn’t think it was all that practical tbh, and I think he said the same in the video: https://www.youtube.com/watch?v=pkCLMl0e_0k

                                                              Could you give a brief summary of how FLP impossibility and Turing incompleteness are different? I know of “Turing completeness” and that it’s a thing, but my knowledge ends there. I haven’t heard of FLP impossibliity before. Bookmarked that website.

                                                              Huh, the author of hypothesis says QuickCheck doesn’t have integrated shrinking. From his statement on generators rather than type classes, I understand your point on how Hedgehog and Hypothesis are more similar.

                                                              Hmm, there weren’t a whole lot of examples of curly-brace oriented syntax. Most examples used indentation. The Haskell wiki does mention it though: https://en.wikibooks.org/wiki/Haskell/Indentation#Explicit_characters_in_place_of_indentation

                                                              (I do care about whitespace and clean code; I read through “Clean Code” my second year out of college, but in my experience as a practitioner so far I’d rather use a tool and automated process to have a consistent way of doing things with multiple people, in order to spend political capital elsewhere.)

                                                              1. 10

                                                                (Phew this got long, time to turn it into a blog post ;) )

                                                                I am really curious as to whether Python’s NoneType would map to Haskell’s bottom in that regards.

                                                                It’s important to distinguish between the empty type and the unit type. In (type) theory there are no values of the empty type, and there is one value of the unit type. In a language without _|_, like Agda, we might define them something like this:

                                                                data Empty : Type
                                                                
                                                                data Unit : Type where
                                                                    unit : Unit
                                                                

                                                                This says that Empty is a Type, that Unit is a Type, that unit is a Unit. Since we’ve given no contructors (or “introduction forms”) for Empty, there’s no way to make one. Likewise we can write function types like Int -> Empty which have no values (since there’s no way for any function to return a value of type Empty), we can write tuple types like Pair Int Empty that have no values, etc. We can also write function types like Empty -> Int which do have values (e.g. const 42), but which can never be called (since there are no Empty values to give as an argument). Incidentally, the fact that we can have types with no values is one of the reasons we can’t have a value with type forall a. a (in Haskell) or (a : Type) -> a (in Agda): they claim to return a value of any type, but we might ask for an Empty.

                                                                The unit type is trivial, literally. Since there’s only one value of this type (unit), whenever we know that a value will have type Unit we can infer that the value must be unit, and hence (a) if it’s a return value, we can always optimise the function to simply unit and (b) if it’s an argument, it has no effect on the return value (there’s nothing to branch on). In this sense, the unit type contains 0 bits of information (compared to Bool which contains 1 bit).

                                                                As an aside, Either is called a “sum type” because it contains all the values from its first argument plus those of its second: Either Unit Unit is equivalent to Bool (with values Left unit and Right unit). Pairs are called “product types” since they contain every combination of their first argument and second arguments, which is the number of values in each multiplied together. In this sense Empty acts like zero: the type Either Zero a doesn’t contain any Left values, so it’s equivalent to a, just like 0 + a = a for numbers. Likewise Pair Empty a contains no values, since there’s nothing to put in the first position, just like 0 * a = 0 for numbers. Either Unit a acts like 1 + a, since we have all of the values of a (wrapped in Right) plus the extra value Left unit; note that this is also the same as Maybe a, where Right acts like Just and Left unit acts like Nothing. Pair Unit a is the same as a since each a value can only be paired with one thing, and we know that thing is always unit; this is just like 1 * a = a with numbers.

                                                                Back to your question: things get confusing when we try to apply this to “real” languages, especially since a lot of the terminology is overloaded. In Haskell the type Unit is written () and the value unit is also written ().

                                                                In Haskell the type Empty is called Void, but that is not the same as void in languages like Java! In particular, remember that we cannot have a function which returns Empty, so void being Empty would mean we could not implement methods like public static void putStrLn(String s). Such methods certainly do exist, since they’re all over the place in Java, but what happens if we call them? They will run, and return a value back to us (hence it can’t be Empty). What is the value we get back? We know, without even running the method, that we’ll get back null. That’s just like the Unit type (where we know the value will be unit without having to run the code). Hence Java’s void acts like Unit, and null acts like unit.

                                                                If we follow a similar line of reasoning in Python, we find that None acts like unit (or Java’s null) and hence NoneType is like Unit (or Java’s void). AFAIK Python doesn’t have anything which acts like Empty (Haskell’s Void) since, lacking any values, Empty is only useful for type-level calculations (of the sort which get erased during compilation), which Python doesn’t tend to do.

                                                                That just leaves “bottom”. There are a couple of ways to think about it: we can be “fast and loose” where we ignore bottom as a nuisance, which mostly works since bottom isn’t a “proper” value: in particular, we can’t branch on it; hence any time we do a calculation involving bottoms which results in a “proper” value, we know that the bottoms were irrelevant to the result, and hence can be ignored. Any time a bottom is relevant, we have to abandon our nice, logical purity in any case, since catching them requires IO, so why bother complicating our pure logic by trying to include them?

                                                                Alternatively we can treat bottom as an extra value of every type, in a similar way to null inhabiting every type in Java. From this perspective Haskell’s Void type does contain a value (bottom), and the () type contains two values (() and bottom). In this sense we might think of Java’s void corresponding to Haskell’s Void, but I find this line of thinking difficult to justify. In particular, we can branch on null in Java (in fact we should be doing such “null checks” all the time!), whereas (pure) Haskell doesn’t even let us tell whether we have () or bottom.

                                                                As a final thought, whenever comparing dynamic and static languages, it’s important to not conflate different concepts which just-so-happen to have similar names. In particular I find it useful to think of dynamically typed languages as being “uni-typed”: that is, every value has the same type, which we might write in Haskell as a sum like data DynamicValue = S String | I Int | L List | O Object | E Exception | .... Python is actually even simpler than this, since “everything is an object” (not quite as much as in Smalltalk, but certainly more than e.g. Java), so Python values are more like a pair of a class (which itself is a Python value) and a collection of instance properties.

                                                                This is important because “dynamic types”, like in Python, aren’t directly comparable to “static types” like those of Haskell; they’re more comparable to “tags” (e.g. constructors). In particular, think about a Python function branching based on its argument’s “dynamic type”; this is the same as a Haskell function branching on its argument’s constructor. What does this tell us about “bottom” in Python? From this perspective, there isn’t one (at least not reified; an infinite loop might be comparable, but it’s not something we can e.g. assign to a variable). Python’s None is just a normal value like any other, in the same way that Haskell’s Nothing is a normal value (we might sometimes use it to stand for an error, but that’s not inherent to the system); likewise Python’s exceptions are also normal values (we can assign them to variables, return them from functions, etc.); the idea of “throwing and catching” (or raise/except for Python) is actually a perfectly normal method of control flow (it’s actually a limited form of delimited continuation), and this is orthogonal to error representation and handling.

                                                                This makes raising an exception in Python very different to triggering a bottom in Haskell, since Haskell provides no way to branch on a bottom (or whether we even have one). In Python we can raise a value (with the exception “tag”) as an alternative to using return, and we can catch those values, inspect their tag and value, etc. to determine what our overall return value will be, with none of this being visible by the caller. To do anything like that in Haskell we need some “proper” data to inspect and branch on, hence why I say None, exceptions, etc. are just normal values (even though they’re used to represent errors, and we treat them specially because of that; but the same could be said about Either String values in Haskell, for example). Consider that in Haskell the only way to even check if a bottom exists is to use IO, but the idea behind IO is that it’s like State RealWorld, i.e. every IO a value is a pair of (RealWorld, a), so conceptually we never “actually” see a bottom; it’s more like triggering a bottom changes the RealWorld, and we’re branching on that.

                                                                1. 4

                                                                  Thank you for the clarification on None vs bottom! I think it definitely helps me understand how much more different Python is from Haskell.

                                                                  I like the idea of saying data DynamicValue = S String | I Int | L List | O Object | E Exception for dynamic languages, and I think from a type theory perspective I finally understand what people say when they say “In {Ruby, Python}, everything is an object”.

                                                                  I hate try/catch logic in Python. It works, but I much rather prefer the functorial structure espoused by Haskell.

                                                                  I agree on the description on IO! The book does mention IO is technically without effects, since it is merely a description of what IO actions should be taking place instead of imperatively executing IO.

                                                                  1. 4

                                                                    I like the idea of saying data DynamicValue = S String | I Int | L List | O Object | E Exception for dynamic languages

                                                                    That’s a good intuitive starting point, but it’s not the whole story. The “unitype framework” is a way of understanding dynamic types from a static typing perspective, but we can also start from a “dynamic typing perspective” and see where it gets us. A dynamically-typed language is one where you can manipulate type definitions at runtime, and test for type membership at runtime. So saying DynamicValue = S String | I Int ... will always be incomplete, because we might not know something’s type even after creating it!

                                                                    An intuition I like here is that dynamic types are boolean predicates: x is of type T iff T(x). This means we can define the type of, say, all functions with format as an optional keyword, or the type of all iterables that aren’t strings, or the type of all types that appear as return values in functions.

                                                                    (The downside of this is it gets hellishly confusing very quickly, and allows for a lot of spooky action at a distance. Also, there’s no guarantee that checking type membership is total. That’s one reason even highly dynamic languages shy away from going deep in dynamic types.)

                                                                    I think from a type theory perspective I finally understand what people say when they say “In {Ruby, Python}, everything is an object”.

                                                                    Everything can also be an object in a static system, too. My rule of thumb for what “everything is an object” means: are object methods also objects? Is the call stack an object? If you are using message passing, are the messages objects? Can you send messages to other messages?

                                                                    1. 1

                                                                      RE: “Everything is an object”, I usually put it in quotes since the meaning is usually context-dependent. A colleague once said that to me regarding Java, and I replied that Java classes aren’t objects yet Python classes are; and Python’s if branches aren’t objects yet Smalltalk’s are (at least, the closures are); and so on.

                                                                      RE: Unitypes, etc. yes the static/dynamic distinction can be built up from the dynamic side too (I tend not to think that way since fixed points and things trip me up). The important thing is that people aren’t talking past each other, e.g. it’s fine to argue about whether “foo can/can’t do bar”, but not so useful when each person is using “foo” and “bar” to mean different things ;)

                                                                  2. 3

                                                                    The overloaded word “void” across languages is confusing. In Rust and Typescript (and others?) they called it “never”, which I think makes it really clear: values of this type never actually exist. And “this function returns never” is like a pun: “this function never returns” and “this function’s return type is ‘never’”.

                                                                    1. 3

                                                                      This was wonderfully clear. Thank you.