1. 29

  2. 27

    I think I’m constitutionally incapable of being fair to tools like Mongo and Javascript, but I’ve seen abominations in “real databases” that make me wonder if the problem is less the pissweak tools and more the total lack of interest in thinking. I mean, you can swap Mongo out for something SQL based and still not understand your data or the way in which the entities you care about are related.

    1. 13

      The selection of pissweak tools may be a priori indicative of a lack of thinking.

      What I find more puzzling is that given the prevalence of “mongo is shit” sentiment on various sites, the twitters, etc., why would anybody willingly brag “made with love with mongo” about their site? It’s like a split reality where I see nothing but scorn for mongodb, but people still pick it. And yes, it can be trendy to hate the thing that’s popular, but a lot of the scorn comes from actual practitioners. Is it really true that people select mongodb unaware that many others have such a low opinion of it?

      1. 21

        I think it’s different social circles entirely. The people who use node.js and mongodb aren’t the same people talking about PLT or CAP on nerd twitter; they’re the ones who came from PHP and wordpress on VPS, in my experience. And boy howdy are they numerous.

        1. 4

          I work at Joyent, and most of the software we’ve written is built on Node: Triton, our cloud orchestration stack; and Manta, our hybrid object storage and compute system. We use PostgreSQL, rather than Mongo, though. And we’re pretty serious about CAP and other models for understanding distributed systems problems – even on nerd twitter!

          1. 2

            Did you guys lose a bet, or?

            1. 4

              No. The reason we keep using Node is that we’ve put a lot of effort into giving it first class live system and post mortem debugging features. We have a lot of DTrace instrumentation we can make active use of on both development and production systems. We also have the ability to take either discretionary or on-abort core files and extract the full Javascript (and C/native) heap, digging out all the objects and the stack trace (including JS frames) of the failure.

              If we were to move to another language or platform, it would have to be at least as debuggable as Node is today – both on live systems and in post mortem.

              1. 1

                the reason we keep nailing our hands to the wall is that we’ve made a tremendous investment in hammers and state of the art surgical facilities. We can fix all manner of puncture wounds, be they shallow glancing gash or through-hole artery-severing crucifixion. No other language or platform could possibly offer as much opportunity for hand surgeon practice!

                1. 3

                  I get that it’s very trendy to bash on Javascript; I hope you receive your share of Internet points for doing so. It’s not, by any stretch of the imagination, my favourite language – for any number of reasons. Engineering is all about trade-offs, and we’ve traded off some obviously undesirable aspects of the Javascript language against the ability to instrument and debug real production systems.

                  What language and platform should I be using instead, assuming first class support for dynamic instrumentation (DTrace) and post mortem core file analysis is a constraint?

                  1. 1

                    I confess to making excessive fun. But that’s because ‘we need post mortem core file analysis’ has been a checked checkbox since the time of UUCP, kremvax and a.out; and dtrace, while truly excellent, kudos and commendations to all involved, is the heaviest of weird hammers to swing in production first.

                    Any of the JVM or BEAM languages provide hands down better language facilities, production support and debuggability, and many of them also provide better concurrency, throughput, numerics, libraries, type systems, and developer pools. And of course all of them are packed full of delicious dtrace probes as well.

                    So: why write anything non-browser in js? Much less anything important like an object storage system?

                    1. 5

                      I completely disagree that post mortem core file analysis is a solved problem for all languages and runtime environments. How do I take an operating system core file from an Erlang application and extract and inspect or report on all of the Erlang-level objects? Attaching a debugger to a live Erlang VM, or relying on some kind of dump format written out by the VM doesn’t count; I mean a dump of the memory and the process state by the operating system without input from the VM.

                      It’s possible that OpenJDK may provide some facility to do this with a combination of jmap, jhat, and VisualVM, but I’ve also never actually seen this work reliably across different JVMs and operating systems. Java environments are also notoriously heavyweight, both on the server and especially in the development tools.

                      With a C application, there are techniques for locating all (or at least many) objects in memory. The debugger I use, mdb, has ::typegraph – a facility for Post Mortem Object Type Identification. We have even better luck with Javascript, because V8 stores heap objects with sufficient self-describing structure that we can, through ::findjsobjects, readily identify type information and available properties. This is all possible long after the failed process is a distant memory and service has been restored by the supervisor. This is not, as far as I know, a feature available for every other (or even most other) languages or runtimes.

                      I also completely disagree that DTrace is a weird hammer to swing when looking at what a production system is doing. Though I may not always emit a raw D script as the first step in determining what a system is doing, I absolutely use tools built on top of DTrace very frequently. For example: our key value store, Moray, includes a tool, moraystat.d, built as a DTrace script. In addition, our logging framework (bunyan) includes a formatting CLI with a flag that allows it to begin collecting log records from one or all processes running on the current system at any log level – it uses DTrace to achieve that, as well.

                      DTrace also makes it trivial for us to produce Flame Graphs by profiling the call frame stack, including both Javascript and native C frames. We can use a tool like stackvis to produce a succinct visual representation of what a process (or the kernel, or both) is doing with its time. When a Node program is spinning on CPU, this is often one of the first pieces of data I will collect, in order to understand what’s going on. It’s also trivial to grab a breakdown of system calls being made, including the JS stack that induced the call.

                      As I said before, not everything is perfect, obviously. But we’ve built a robust software stack using these components, in large part because we’ve been able to look deeply into the system and understand what’s broken or even just suboptimal. I would love to have static types, and stronger types, but I don’t think I’d give up the engineering facilities that we have in order to get there.

                      1. 1

                        How do I take an operating system core file from an Erlang application and extract and inspect or report on all of the Erlang-level objects? Attaching a debugger to a live Erlang VM, or relying on some kind of dump format written out by the VM doesn’t count; I mean a dump of the memory and the process state by the operating system without input from the VM.

                        Given the existence of rich debuggers that do in fact attach to live Erlang VMs, even remotely; and a rich dump format written out by the VM in the rare event that it fails, one has to wonder why you define these particular goalposts so narrowly and make these exclusions so very specifically. Is it maybe because the way that you find out about errors in node.js, and the way that you debug them generally, is that node.js crashes, killing every unit of concurrency and all state in the system, and then requires intense human forensic analysis to bring the system back up? And that, therefore, you gear all of your resources towards making even that behavior borderline tolerable? And that by ignoring that you could have not had those errors in the first place, and further by closing the discussion against the other solutions that are radically operationally superior to core dump analysis, you can soothe yourself that you haven’t invested all of your time into obtaining Stockholm syndrome at a deeply suboptimal local maximum?

                        I love dtrace myself, but just imagine if you had built all of what you’ve built on a more solid foundation. You wouldn’t need half the scaffolding and splints and bandaids and patches and flying buttresses that you’ve apparently erected to get your job done. Imagine if you could dive into dtrace to go see what was going on with a process that was “spinning on CPU”, but it was rarely a critical emergency, because your code wasn’t a mess of single threaded callbacks, and the rest of your program continued running while you looked into the problem and maybe hot-upgraded a fix.

                        Anyway – glad you’re proud of your engineering efforts; hope it works for you.

          2. 3

            I think a lot of the problems that exist are due to people wanting to get an idea implemented as quickly as possible. When time is critical, documentation is passed over in favour of a handful of Google searches that lead to bad (and outdated) Stack Overflow answers, whether the task is choosing what technologies to use or getting help with an error message from some code.

            It’s really easy to ship a product with just a handful of search queries, unfortunately.

          3. 5

            I think a lot of it is “well that won’t affect me”.

            From my perspective, I’m at least aware of a lot of Mongo’s shortcomings, but if I had a project where I didn’t entirely know the data schema yet, and was just spiking out a test and wanted to iterate quickly rather than deal with data cleanliness, I’d still consider using mongo.

            And maybe a lot of that is why it’s so heavily used. Plus, it is super easy to just start using.

            1. 5

              Is it really true that people select mongodb unaware that many others have such a low opinion of it?

              Definitely. There are a lot of people just getting into backend development (perhaps having come from the frontend, or web design), and they only know JS, Node, and MongoDB because that’s what was taught in the bootcamp/tutorial they used to get their idea off the ground.

              1. 2

                For example, FreeCodeCamp teaches Mongo, Node, and Express for their backend developer certification.

              2. 5

                This also happened early in Mongo’s life; they made it very clear they used memory-mapped files, and so DB size is quite limited on 32-bit, yet “mongo lost my data” still became a meme.

                Software is hard.

              3. 8

                Yeah, I think people underestimate or don’t want to consider the degree to which database selection / design is hard. Which is a shame because the design of your database is a fundamental decision on which the rest of the application relies. It is important to get it right.

                Getting it right means:

                • Understanding the sort of data you’re storing
                • Understanding the ways in which you want to be able to use that data
                • Considering what data structures, database type, etc. will best facilitate success under those constraints

                Taking some time at the start of a project to consider these things is worthwhile, and regular review of the constraints (do they still apply? Are there new constraints to consider?) and of your database system’s health (performance, availability, consistency) is worthwhile too.

                All in all, database selection should be hype-less. It is a technical decision for which there are actual arguments to be for or against a particular solution, and making a poor choice will hurt. Take the time to do it right.

                1. 6

                  less the pissweak tools and more the total lack of interest in thinking.

                  I think it’s rather more lack of interest in thinking about the stuff the dev doesn’t find interesting. Which is a problem, to be sure, but it’s not quite obdurate ignorance.

                  One of the arguments for tools like schema’d databases and strongly typed languages is that they force attention on some of those things which many devs would find tedious, rather than interesting. Likewise, they prevent some errors that lack of attention will generate. But, as you point out, attention isn’t necessarily enough to make good code.
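
                  As a hedged illustration of that argument (the schema and field names below are invented, not any particular library’s API): even a hand-rolled write-time check forces the attention a schemaless store lets you defer, turning a shape error into an immediate failure instead of a malformed document discovered months later.

                  ```javascript
                  // Hand-rolled sketch; a real system would lean on DB constraints
                  // or a schema library. The point is only that the check happens
                  // at write time, forcing attention to the record shape up front.
                  const orderSchema = { id: 'number', total: 'number', created: 'string' };

                  function validate(schema, record) {
                    for (const [field, type] of Object.entries(schema)) {
                      if (typeof record[field] !== type) {
                        throw new TypeError(field + ': expected ' + type + ', got ' + typeof record[field]);
                      }
                    }
                    return record;
                  }

                  validate(orderSchema, { id: 1, total: 25.0, created: '2023-01-15' }); // accepted
                  try {
                    validate(orderSchema, { id: 2, total: '18.00', created: '2023-01-20' }); // total drifted to a string
                  } catch (e) {
                    console.log(e.message); // → total: expected number, got string
                  }
                  ```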

                  1. 4

                    I think this is where experience matters. If you’ve been around the block a few times, you know that you have to care deeply about data storage and that no vendor can magic it away behind some marketing.

                    You only get that experience by doing it wrong a few times.

                    1. 2

                      The thing that is often overlooked, at least in the world of software that might choose Mongo, is the need for ad-hoc querying. How else are you going to learn anything about your data, and the way it changes over time? I’ve never seen a database of any size that hasn’t needed additional interfaces, and if all your constraints are in the client, or you use some kind of godawful k/v schema, oh well.
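
                      To make that concrete, here’s a hedged sketch (the document shapes are invented): once the shape of stored blobs has drifted, every ad-hoc question needs bespoke client-side normalization, where a schema’d store would answer it with one query.

                      ```javascript
                      // Invented order documents, as they might accumulate in a
                      // schemaless store after the shape has drifted over time.
                      const orders = [
                        { id: 1, total: 25.0, created: '2023-01-15' },           // early shape
                        { id: 2, amount_cents: 1800, created_at: '2023-01-20' }, // later shape
                        { id: 3, amount_cents: 4200, created_at: '2023-02-02' },
                      ];

                      // Ad-hoc question: revenue per month. With a schema'd store
                      // this is one query, e.g.:
                      //   SELECT strftime('%Y-%m', created) AS month, SUM(total)
                      //   FROM orders GROUP BY month;
                      // Here the constraints live in the client, so the client must
                      // re-implement them for every new question.
                      function revenueByMonth(docs) {
                        const out = {};
                        for (const d of docs) {
                          const total = d.total !== undefined ? d.total : d.amount_cents / 100;
                          const month = (d.created || d.created_at).slice(0, 7);
                          out[month] = (out[month] || 0) + total;
                        }
                        return out;
                      }

                      console.log(revenueByMonth(orders)); // → { '2023-01': 43, '2023-02': 42 }
                      ```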

                      1. 7

                        This is why I believe Kafka is the best thing to ever happen to databases. Consume from a topic and store data in multiple databases based on different query patterns. Then Mongo is just one materialized view over your data.
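
                        A toy, in-memory sketch of that pattern (no real Kafka here; the event names are invented): one append-only log, with each “database” just a different fold over it, so a document-style view and an aggregate view can coexist without either being the system of record.

                        ```javascript
                        // Stand-in for a Kafka topic: an append-only event log.
                        const log = [
                          { type: 'user.created', id: 'u1', name: 'Ada' },
                          { type: 'order.placed', user: 'u1', total: 30 },
                          { type: 'order.placed', user: 'u1', total: 12 },
                        ];

                        // View 1: document-style lookup by id (the role Mongo would play).
                        const usersById = {};
                        // View 2: a running aggregate, as a cache or column store might hold it.
                        const revenueByUser = {};

                        // Each materialized view is a fold over the same log;
                        // replaying the log rebuilds a view from scratch.
                        for (const event of log) {
                          if (event.type === 'user.created') {
                            usersById[event.id] = { name: event.name };
                          } else if (event.type === 'order.placed') {
                            revenueByUser[event.user] = (revenueByUser[event.user] || 0) + event.total;
                          }
                        }

                        console.log(usersById.u1, revenueByUser.u1); // → { name: 'Ada' } 42
                        ```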

                  2. 7

                    We are in the same situation: four years of Mongo, and it’s a pain now. We’re moving to another db.

                    “Performing real-time analytics on blobs of data” – curious about why?

                    1. 3

                      Same thing here about three years ago. To be fair, I don’t think mongo was to blame here; it just wasn’t a good fit for the kind of data we were handling (mostly relational). Not having someone with mongo experience on the team probably also did its share.

                      1. 1

                        Yes, you’re right.

                    2. 5

                      When MongoDB was in its early stages, 1.x, it was one of the only open source DBs which simultaneously supported fast real-time counters and cluster-based data replication. And aside from CouchDB and Solr, by my memory, it was the only storage engine which allowed for heterogeneous document storage.

                      There are real-world use cases for these features. I’d know – I leveraged and exploited them at scale for a real product. And I only built on MongoDB after throwing out a prototype on Postgres that I could prove wouldn’t work.

                      I wrote about database choices based on the shape of your data here in 2012:


                      All the MongoDB hatred in this thread is surprising to me. There’s quite a lot of groupthink going on.

                      There are valid reasons to dislike MongoDB, and I guess it’s fair to dislike the company behind it for trying to convince everyone to use a non-SQL DB as their “primary” or “general purpose” DB. By contrast, a company like Elastic (supporting Elasticsearch) is being more honest by focusing on niche/concrete use cases at which ES shines. But the open source project itself, and the code, is not a “pissweak tool” just because it doesn’t support SQL, joins, or the AP tradeoff.

                      No DB supports SQL better than Postgres, but Postgres has trouble parallelizing queries. No DB has the data structure richness and raw speed of Redis, but Redis lacks transactions and strong durability. No DB has the query/aggregation flexibility of Elasticsearch, but ES lacks transactions and rollbacks, and has a clumsy write path. No DB can support horizontal linear write scalability like Cassandra, but Cass lacks many common forms of filtering and aggregation. No DB can support multi-consumer streaming like Kafka, but Kafka lacks any query capabilities altogether.

                      Are all database designers simply not working hard enough? Or are these problems simply difficult and thus trade-offs are necessary?

                      1. 4

                        Storing large (16MB) documents

                        Isn’t that exactly the max document size? Am I missing the sarcasm in the OP?

                        1. 3

                          My one experience with Mongo was at a place where they tried it “out of curiosity”. I got called in to try to do some reporting with the data and, lo and behold, the need for joins and nice SQL stuff came up and Mongo proved to be very unsuitable, but at that point they were stuck with it. Would not use again.

                          1. 3

                            What’s a good resource on understanding database design choices?

                            1. 2

                              I had the exact same experience after building out a prototype with CouchDB. Fortunately piecewise conversion to Postgres took only a week, but it was still a wasted week I could probably have avoided. Shows that the experience is not limited to Mongo, though.