1. 31
  1. 35

    The classic startup mistake is spending too much time thinking about data stores and not enough time thinking about making money. Personally, I believe many startups fail because they obsess over and re-engineer their tech stack so much. Most startups (if they ever get to making money) change their initial product/offering so much by the time it’s out there that the tech stack decisions they made in the early days aren’t as relevant to the problem they actually end up solving. Use a sane default, then evaluate whether you actually need anything else. So I wholeheartedly agree with his statement: just use Postgres, and you won’t regret it.

    1. 3

      I agree with you after reading lots of Hacker News and Barnacles. They should spend almost all their time marketing, listening to customers, building/testing features, and so on. That said, it’s still good to consider this stuff ahead of time to give them templates to work from that will reduce operational problems early on and/or in maintenance down the road. Like the OP and your Postgres recommendation. Also, like turnkey Postgres appliances for the cheap hosts they’ll probably be using.

      1. 3

        100%. I’ve been in numerous failed startups. I’ve seen plenty more. The datastore wasn’t an issue even once… I’ve seen plenty of startups build on amazing datastores and fail to have any sales and marketing. I’ve seen plenty fail from not having backups, from hiring poorly, from overreaching.

        Datastores can hurt when you’re scaling up. But you’re scaling up now. Congratulations.

        Agree that defaults make sense. However, equally it’s worth being aware that you’re going to have scale problems whatever you do. Every serious relational database I’ve seen is a beast. Comes with the territory.

        1. 2

          Yup, going with Postgres at the beginning makes sense, because it gives you the freedom to experiment and iterate on your domain from a solid foundation. Alas, the databases that are more geared towards scaling are currently trickier to work with in the face of changing requirements, so mistakes become magnified to quite a large degree. But you’ll eventually want to move to a different model once scaling becomes an issue.

          At the moment my thinking is that you use an RDBMS under the hood but treat it as if it were CQRS/ES, which maintains a nice separation of concerns between reading, writing, and the persistence layer. Then you are in a better position to switch to something more scalable in the future, whilst keeping the advantages of in-place migrations in the beginning when requirements are still up in the air.
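
          A minimal sketch of that idea, assuming Python with the standard-library sqlite3 module standing in for Postgres so that it runs anywhere. The table names, the `handle_deposit` command, and the account/balance domain are all hypothetical, picked only for illustration. Writes append to an event log, reads go through a projection, and the projection can be rebuilt from the log:

          ```python
          import json
          import sqlite3

          # Assumption: sqlite3 stands in for Postgres; every name here is hypothetical.
          conn = sqlite3.connect(":memory:")
          conn.executescript("""
          CREATE TABLE events (
              seq     INTEGER PRIMARY KEY AUTOINCREMENT,
              account TEXT NOT NULL,
              kind    TEXT NOT NULL,
              payload TEXT NOT NULL          -- JSON-encoded event data
          );
          CREATE TABLE balances (
              account TEXT PRIMARY KEY,
              amount  INTEGER NOT NULL
          );
          """)


          def apply_event(account, kind, payload):
              """Projection: fold one event into the balances read model."""
              if kind == "deposited":
                  # INSERT OR IGNORE is SQLite syntax; Postgres has ON CONFLICT DO NOTHING.
                  conn.execute(
                      "INSERT OR IGNORE INTO balances (account, amount) VALUES (?, 0)",
                      (account,),
                  )
                  conn.execute(
                      "UPDATE balances SET amount = amount + ? WHERE account = ?",
                      (payload["amount"], account),
                  )


          def handle_deposit(account, amount):
              """Command side: validate, append an event, update the read model."""
              if amount <= 0:
                  raise ValueError("deposit must be positive")
              payload = {"amount": amount}
              with conn:  # one transaction for the log entry and the projection
                  conn.execute(
                      "INSERT INTO events (account, kind, payload) VALUES (?, ?, ?)",
                      (account, "deposited", json.dumps(payload)),
                  )
                  apply_event(account, "deposited", payload)


          def get_balance(account):
              """Query side: reads only the projection, never the event log."""
              row = conn.execute(
                  "SELECT amount FROM balances WHERE account = ?", (account,)
              ).fetchone()
              return row[0] if row else 0


          def rebuild_balances():
              """Throw the read model away and replay the whole log in order."""
              with conn:
                  conn.execute("DELETE FROM balances")
                  rows = conn.execute(
                      "SELECT account, kind, payload FROM events ORDER BY seq"
                  ).fetchall()
                  for account, kind, payload in rows:
                      apply_event(account, kind, json.loads(payload))
          ```

          Callers only ever see `handle_deposit` and `get_balance`, so the storage behind them could change later without touching the rest of the code, while the schema itself can still be migrated in place early on.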

          1. 1

            I had never heard of CQRS before; it’s quite interesting. Could you elaborate a bit on ES?

            1. 1

              Event sourcing…

              And I’m with /u/brendan on this: focus on the “why” (design) of separating concerns, not the “how” (transport). It’s a long time between MVP and having load that demands Kafka, RabbitMQ, NSQ, NATS, etc. There are designs that need those early but I squint hard at that because they add a lot of complexity and operational overhead when you’re still figuring out how to solve other problems.
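
              To illustrate the “why, not how” point, here is a rough sketch in Python where the transport is nothing more than a function call inside one process. `InProcessBus`, `sign_up`, and the event name are hypothetical names for illustration, not any library’s API:

              ```python
              from collections import defaultdict


              class InProcessBus:
                  """Hypothetical event bus: synchronous, in-process, no broker."""

                  def __init__(self):
                      self._handlers = defaultdict(list)

                  def subscribe(self, kind, handler):
                      """Register a callable to receive events of the given kind."""
                      self._handlers[kind].append(handler)

                  def publish(self, kind, payload):
                      """Deliver the event to every subscriber, in subscription order."""
                      for handler in self._handlers[kind]:
                          handler(payload)


              bus = InProcessBus()
              signup_counts = defaultdict(int)  # read model owned by the subscriber


              def count_signup(payload):
                  """Read side: keeps a per-plan tally up to date."""
                  signup_counts[payload["plan"]] += 1


              bus.subscribe("user_signed_up", count_signup)


              def sign_up(email, plan):
                  """Write side: publishes an event and knows nothing about listeners."""
                  bus.publish("user_signed_up", {"email": email, "plan": plan})
              ```

              The separation of concerns lives in the `publish`/`subscribe` boundary. If load ever demands a real broker, only the bus implementation would need to change.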

        2. 3

          Startups don’t exist to make money- they exist to get bought. That’s their profit model: get bought out at a high valuation before your burn outpaces your investment.

          1. 5

            “Startups don’t exist to make money- they exist to get bought.”

            To be fair, their ability to get bought should depend on their ability to make money, and after these crazy bubble times are over, it will.

            “That’s their profit model: get bought out at a high valuation before your burn outpaces your investment.”

            That’s not a profit model though :) It’s a plan for making a return on the time, effort, and money the founders invested in the startup. It’s a gamble too!

        3. 6

          Seeing that Postgres makes working with JSON a completely seamless experience, I don’t see why you’d pick anything else. It’s best of both worlds. Make relational tables when you have relational data, store documents when you have documents. Meanwhile, Citus addresses the common complaint with scaling Postgres quite nicely.
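
          A small sketch of mixing the two, assuming Python’s standard-library sqlite3 (with its JSON functions available) as a runnable stand-in. In Postgres the `details` column would be `jsonb` and the lookups would use the `->>` operator instead of `json_extract`. The customers/orders schema is made up for illustration:

          ```python
          import json
          import sqlite3

          # Assumption: sqlite3 stands in for Postgres; the schema is hypothetical.
          conn = sqlite3.connect(":memory:")
          conn.executescript("""
          CREATE TABLE customers (
              id   INTEGER PRIMARY KEY,
              name TEXT NOT NULL
          );
          CREATE TABLE orders (
              id          INTEGER PRIMARY KEY,
              customer_id INTEGER NOT NULL REFERENCES customers(id),
              details     TEXT NOT NULL      -- JSON document (jsonb in Postgres)
          );
          """)
          conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
          conn.executemany(
              "INSERT INTO orders (customer_id, details) VALUES (?, ?)",
              [
                  (1, json.dumps({"sku": "book", "gift": True})),
                  (1, json.dumps({"sku": "pen"})),  # no "gift" key at all
              ],
          )


          def gift_orders():
              """Join relational rows while filtering on a field inside the document."""
              return conn.execute(
                  "SELECT c.name, json_extract(o.details, '$.sku') "
                  "FROM orders o JOIN customers c ON c.id = o.customer_id "
                  "WHERE json_extract(o.details, '$.gift') = 1 "
                  "ORDER BY o.id"
              ).fetchall()
          ```

          The customer relationship is an ordinary foreign key, while the free-form part of each order stays a document that can still be queried.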

          1. 4

            Thanks for the mention. We at Citus are also biased towards just starting with Postgres and not complicating things early on. And just today we shipped some tooling to make it easy to move from existing Postgres (such as RDS) directly into Citus with essentially no downtime - https://www.citusdata.com/blog/2017/11/16/citus-cloud-2-postgres-and-scale-without-compromise/

            1. 1

              Fantastic work! :)

            2. 3

              Wow, Citus is cool, thanks for sharing! I had not heard about it before.

            3. 6

              I’m a charter member of Team RDBMS, and I heartily endorse this blog post. Just say NO to NoSQL!

              1. 3

                It seems to be a common theme these days… People rediscovering time and time again why properly normalised data, ACID, and a well-thought-through data model are important.

                I walked into a discussion recently where they were bemoaning the fragility, brittleness and complexity of a large json data structure…

                …my only comment was that I felt I had fallen into a time warp and was back in the late 1980s, when people were bemoaning the problems of hierarchical databases and explaining why an RDBMS was needed.


                Sort of sad really.

                I’m still waiting for the pro-SQL types to wake up to what CJ Date has been saying for decades and to up their game beyond nulls and auto-increment keys… but we can’t get there because we keep having to rehash the basic stuff of normalization and ACID.

                1. 7

                  The problem is the lack of SQL databases that take less than days to set up with robust replication. Schemas are not the problem. Arcane, hard-to-administer software is the problem. PostgreSQL replication requires a dedicated DBA. I’m keeping a close eye on CockroachDB.

                  1. 4

                    I use Amazon RDS at the day job. Unless you have enough data to justify a DBA for other reasons, RDS is inexpensive enough and solves PostgreSQL replication.

                2. 3

                  I will observe that the Elixir story for Mongo is…not great. Getting better (sadly) but not great.

                  Almost all value in business is derived from the relationships between objects. Ergo, use a database designed to explicitly capture and query those relationships.

                  1. 1

                    So, if keeping it focused on objects, then we just use AllegroCache’s object database or a clone of it combined with the language that’s a secret weapon for how fast it helps startups move? ;)


                  2. 1

                    Are you aware of a “relational” database that can store entities that could be “incomplete” (e.g. you get only a part from one API call, then all the parts from a second API call) or have optional fields (like you can do with documents), while still having relations between entities (foreign keys and 1..1, 1..n, n..n relations), and that can run on a distributed system? I was thinking of storing optional fields somewhere else, like in a PostgreSQL array or on the filesystem (obviously a lot of keys and string values are repeated, but it could work with that). What would be very useful would be to detect entities in child nodes of a JSON/BSON document (potentially recursively) and create new entries for them (hence the incomplete entities).

                    1. 2

                      If I understand you correctly, you can do this in any SQL database. Just select only the fields you actually want on the first call. As far as allowing relationships to be optional, that’s what nullable foreign keys are.

                      Running on a distributed system is harder, especially if your workload is online transaction processing. But that would be challenging regardless of the data store you choose. If you really need something that high-end, you can always pay for Spanner…

                      But the odds are that you don’t need that. When people frame their requirements as “it needs to be a distributed system”, I take this to mean that they don’t know what their requirements actually are. Do they need redundant copies to prevent data loss? Do they need high availability? Those are mutually exclusive (in practice, with existing solutions, other than Spanner; I’m not trying to argue about the CAP theorem), so… maybe they haven’t thought it through.
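
                      For the single-node part of this, a sketch of the nullable-foreign-key approach mentioned above, assuming Python’s standard-library sqlite3 as a stand-in for any SQL database. The articles/authors schema and the two “API call” functions are hypothetical:

                      ```python
                      import sqlite3

                      # Assumption: sqlite3 stands in for any SQL database; names are hypothetical.
                      conn = sqlite3.connect(":memory:")
                      conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
                      conn.executescript("""
                      CREATE TABLE authors (
                          id   INTEGER PRIMARY KEY,
                          name TEXT NOT NULL
                      );
                      CREATE TABLE articles (
                          id        INTEGER PRIMARY KEY,
                          title     TEXT NOT NULL,
                          body      TEXT,                           -- optional field
                          author_id INTEGER REFERENCES authors(id)  -- nullable foreign key
                      );
                      """)


                      def save_summary(article_id, title):
                          """First API call: only the id and title are known so far."""
                          with conn:
                              conn.execute(
                                  "INSERT INTO articles (id, title) VALUES (?, ?)",
                                  (article_id, title),
                              )


                      def save_details(article_id, body, author_id, author_name):
                          """Second API call: fill in the optional field and the relation."""
                          with conn:
                              conn.execute(
                                  "INSERT OR IGNORE INTO authors (id, name) VALUES (?, ?)",
                                  (author_id, author_name),
                              )
                              conn.execute(
                                  "UPDATE articles SET body = ?, author_id = ? WHERE id = ?",
                                  (body, author_id, article_id),
                              )


                      def titles_with_authors():
                          """Select only the needed columns; LEFT JOIN keeps incomplete rows."""
                          return conn.execute(
                              "SELECT a.title, au.name FROM articles a "
                              "LEFT JOIN authors au ON au.id = a.author_id "
                              "ORDER BY a.id"
                          ).fetchall()
                      ```

                      An article whose author is not known yet simply has a NULL `author_id`, and the foreign key is still enforced once a value is set.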

                      1. 1

                        I think I’ll make an article later to describe exactly what I would like.

                        1. 1

                          Sure, go for it. The response is probably going to be some form of, you can meet the basic needs you’re thinking of but you might have to change the schema a little to make it happen. SQL definitely isn’t like JSON-based storage systems where you can always fit the existing data into it without rearranging it at all.
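
                          As a rough idea of what that rearranging could look like, here is a sketch in Python that walks a parsed JSON document and pulls embedded entities out into their own records. The convention that an entity is any object with both a `"type"` and an `"id"` key is an assumption made up for this example, as is the `{"ref": [...]}` placeholder:

                          ```python
                          def extract_entities(node, store):
                              """Walk a parsed JSON value and move embedded entities into store.

                              Assumption: an entity is any dict with both "type" and "id" keys.
                              Returns the value with each entity replaced by a reference.
                              """
                              if isinstance(node, list):
                                  return [extract_entities(item, store) for item in node]
                              if not isinstance(node, dict):
                                  return node  # scalars pass through unchanged
                              fields = {
                                  key: extract_entities(value, store)
                                  for key, value in node.items()
                              }
                              if "type" in fields and "id" in fields:
                                  key = (fields["type"], fields["id"])
                                  # Merge, so a partial entity seen earlier is completed later.
                                  store.setdefault(key, {}).update(fields)
                                  return {"ref": list(key)}
                              return fields
                          ```

                          Each entry in `store` then maps naturally onto a row in a per-type table, with the references becoming foreign keys and any field that has not arrived yet left NULL.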