1.

  2.

    You can get 75% of Prolog in SQL, along with some statistical functions that you can’t easily do in Prolog. That’s pretty good.

    1.

      75% of database Prolog with only limited support for unification. Prolog especially shines when you need unification, backtracking, meta-programming, etc.
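To make the "database Prolog" comparison concrete, here is a minimal sketch, using Python's built-in `sqlite3` module, of the classic Prolog `ancestor/2` rule written as a recursive common table expression. The `parent` table and the names in it are hypothetical. It covers only the Datalog-like subset: there is no unification over nested terms and no backtracking control, which is exactly the gap described above.

```python
import sqlite3

# Facts, as in Prolog: parent(alice, bob). parent(bob, carol). parent(carol, dave).
# (Hypothetical toy data.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)",
    [("alice", "bob"), ("bob", "carol"), ("carol", "dave")],
)

# The rule, as in Prolog:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
# The base case is the first SELECT; the recursive case joins parent to ancestor.
ANCESTORS = """
WITH RECURSIVE ancestor(elder, younger) AS (
    SELECT parent, child FROM parent
    UNION
    SELECT p.parent, a.younger
    FROM parent AS p
    JOIN ancestor AS a ON p.child = a.elder
)
SELECT younger FROM ancestor WHERE elder = ? ORDER BY younger
"""


def descendants(person):
    """Answer the query ?- ancestor(person, Y). by returning every binding for Y."""
    return [row[0] for row in conn.execute(ANCESTORS, (person,))]


print(descendants("alice"))  # ['bob', 'carol', 'dave']
```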

    2.

      This alternative title turned up in my brain and won’t let go, so I’m writing it here to get it out of my head:

      You don’t need ML/AI. You need data queries and business logic.

      I’m not trying to argue with the original title! SQL is definitely the right language for querying data and gathering cases to feed to the business logic. Also, it is a well-written article that makes its case for the use of queries and business logic very nicely. Thanks for sharing it!

      1.

        > You don’t need ML/AI. You need data queries and business logic.

        I think that hits on an important point that the article misses (or sidesteps): business logic. SQL is only helpful if you can describe what you’re looking for, and doing that requires knowledge and skill.

        When you lack the ability to describe what you’re looking for, then SQL seems useless, and ML seems magically useful. Of course, you’ll still end up doing that work if you opt for an ML approach; it will just be less obvious.

        (digression: this parallels the problem of not being able to describe what you’re storing, which leads to poor usage of NoSQL databases)

        Now, if you have a broad notion of what you’re looking for, and a willingness to experiment, then ML (well, analysis of any kind) can help. What about the contents of a basket predicts whether a customer returns after a reminder email? What is the right coupon size to entice people back while minimizing cost? What is the right amount of time to wait before contacting someone after an absence? More customers/transactions/views/baskets means more leeway to experiment, which means more data, which means more use cases for an ML-style approach.

        But, yes, absolutely, simple querying will be a better ROI until you get hyuuge.
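As an illustration of how the first of those questions can start out as a plain query, here is a minimal sketch using SQLite from Python. The `reminders` table, its columns, and its rows are hypothetical toy data, assuming one row per reminder email with a 0/1 flag for whether the customer came back.

```python
import sqlite3

# Hypothetical schema and toy rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reminders (customer_id INTEGER, basket_category TEXT, returned INTEGER);
INSERT INTO reminders VALUES
    (1, 'toys', 1), (2, 'toys', 0), (3, 'toys', 1),
    (4, 'books', 0), (5, 'books', 0), (6, 'books', 1);
""")

# Return rate after a reminder email, broken down by basket category.
# AVG over a 0/1 column is the fraction of customers who returned.
RETURN_RATE = """
SELECT basket_category,
       COUNT(*)      AS reminders_sent,
       AVG(returned) AS return_rate
FROM reminders
GROUP BY basket_category
ORDER BY return_rate DESC
"""

for category, sent, rate in conn.execute(RETURN_RATE):
    print(f"{category}: {sent} reminders, {rate:.0%} returned")
```

A query like this answers "which baskets come back more often?" directly; only when the question can't be pinned down this way does a model start to earn its keep.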

        1.

          > When you lack the ability to describe what you’re looking for, then SQL seems useless, and ML seems magically useful.

          From the point of view of SQL, I would state this as “when you lack the ability to describe your relationships.” Whether you know enough in advance to add table constraints or write queries to discover them, SQL helps (forces) you to make your data categorically consistent.
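Here is a minimal sketch of that "forcing" in practice, using SQLite through Python's `sqlite3` module: `NOT NULL`, `UNIQUE`, `CHECK`, and `FOREIGN KEY` constraints reject rows that would make the data inconsistent. The `customers`/`orders` schema and the `try_insert` helper are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# Hypothetical schema: the constraints encode the relationships up front.
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total_cents INTEGER NOT NULL CHECK (total_cents >= 0)
);
""")
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")


def try_insert(sql):
    """Return None if the row is accepted, else the constraint error message."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


# Accepted: refers to an existing customer and has a non-negative total.
print(try_insert("INSERT INTO orders (customer_id, total_cents) VALUES (1, 1999)"))
# Rejected: customer 42 does not exist.
print(try_insert("INSERT INTO orders (customer_id, total_cents) VALUES (42, 500)"))
# Rejected: negative total violates the CHECK constraint.
print(try_insert("INSERT INTO orders (customer_id, total_cents) VALUES (1, -5)"))
```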

      2.

        Machine Learning can be very useful. The same goes for Deep Learning and other more recent approaches (which have not invalidated the value of ‘traditional’ machine learning and statistics; they merely extended the range of methods to choose from when optimizing for trade-offs). Of course you can do a lot with SQL, and it might be efficient. Really, computational power isn’t what needs to be discussed here.

        The most important aspect missing from the discussion, however, seems to be the business value and value proposition of the technology. The usefulness of SQL to the business is already pretty obvious for booking, storing, and provisioning data. Data analytics and predictive models are where things get interesting, because AI/ML/Deep Learning is quite often promoted without any discussion of how it should be incorporated into business processes and how it can support them.

        Once you start with a business need and then think about solutions, you’ll pretty soon have a simpler heuristic that you can treat as a competitor to a machine learning solution. If you consider aspects such as robustness, reliability, and maintainability, you will often prefer the heuristic even over an existing machine-learning solution.

        1.

          I think the value of ML comes in very particular scenarios.

          1. Tasks where business logic would be too cumbersome to implement. For example, detecting the language of a document, and for that matter most NLP and image recognition tasks.

          2. Tasks where it’s easy to collect data in hindsight. For example, recommendation systems, detecting nudity in videos, and predicting user demographics.

          3. Tasks that are simply statistical tasks. A/B testing and election forecasting involve a fair amount of calculation, and there is no equivalent in terms of SQL and business logic.

          4. Tasks which have an optimization component. This could be deciding what prices to set for which users, or predicting server utilization to save power.

          All these settings can and should be combined with business logic, but business logic alone is unlikely to be enough.
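For the A/B testing case, here is a sketch of one standard calculation: a two-sided, two-proportion z-test with a pooled proportion, using only Python's standard library. The function name and the conversion counts are hypothetical, and the test assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rates between variants A and B.

    Uses the pooled proportion under the null hypothesis that both variants
    convert at the same rate. Returns (z, p_value).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 4% vs 5% conversion on 5,000 visitors each.
z, p = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=250, n_b=5000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

The counts themselves can come straight out of a SQL query; the part that is awkward to express there is deciding whether the difference is distinguishable from noise.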

          1.

            I’m not a part of the data team for our ecommerce stuff, but we take a very similar approach, with a few differences:

            First, our newsletters and engagement are offloaded onto Hubspot. Hubspot uses a lot of these techniques for top-customer reports, abandoned-cart emails, custom emails and newsletters, and other stuff I’m forgetting because I’m not on the marketing team.

            The second big difference is our use of BI software, specifically Tableau. It essentially runs those queries to find out information about our customers for us, and then displays the results on nice graphs and dashboards for the managers to swoon over. I haven’t gone over the marketing material for Tableau, and I’m sure they sprinkle ML/AI into that copy somewhere, but the reality is that it just connects to our databases and makes queries.

            1.

              There seems to be some unspoken assumption here that SQL is simpler/easier/cheaper than ML/AI. That’s not my experience: everything you can do with modern tools is possible in SQL, sure, but the modern tools make it much easier and eliminate a lot of the pitfalls.

              1.

                I don’t follow: what kind of modern ML/AI tools are you thinking of (as simple/easy/cheap as SQL)?

                1.

                  Mainly Spark. Hardware performance/cost is probably worse unless you’re on a really big dataset, but for me it more than made up for it in programmer cost: I found it so much easier to answer questions about our data when I could use a shell in a normal programming language, with our data available as normal values, and just call e.g. `.aggregateByKey` and pass code to do what I wanted.
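For readers who haven't used Spark, here is a rough plain-Python model of what `aggregateByKey(zeroValue, seqFunc, combFunc)` does on a pair RDD: each partition folds its values per key starting from the zero value, then the partial results for each key are merged across partitions. This is an illustration under simplifying assumptions (partitions are plain lists, everything runs locally), not Spark's implementation; `aggregate_by_key` and the order data are hypothetical.

```python
from copy import deepcopy


def aggregate_by_key(partitions, zero_value, seq_func, comb_func):
    """Local model of Spark's aggregateByKey over (key, value) pairs.

    `partitions` is a list of lists standing in for RDD partitions (an assumption
    for illustration). Within each partition, values are folded per key with
    `seq_func`, starting from a fresh copy of `zero_value`; the per-partition
    results are then merged per key with `comb_func`.
    """
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            if key not in acc:
                acc[key] = deepcopy(zero_value)
            acc[key] = seq_func(acc[key], value)
        partials.append(acc)

    merged = {}
    for acc in partials:
        for key, partial in acc.items():
            merged[key] = comb_func(merged[key], partial) if key in merged else partial
    return merged


# Hypothetical order amounts per customer, split across two "partitions".
partitions = [
    [("alice", 10), ("bob", 4)],
    [("alice", 20), ("bob", 6), ("carol", 7)],
]

# Accumulate (sum, count) per customer, then divide to get the average.
sums_counts = aggregate_by_key(
    partitions,
    zero_value=(0, 0),
    seq_func=lambda acc, amount: (acc[0] + amount, acc[1] + 1),
    comb_func=lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
averages = {k: s / n for k, (s, n) in sums_counts.items()}
print(averages)  # {'alice': 15.0, 'bob': 5.0, 'carol': 7.0}
```

The same result is what `SELECT customer, AVG(amount) FROM orders GROUP BY customer` expresses in one line of SQL; the difference is that here you spell out the accumulation yourself.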

                  1.

                    Calling `aggregateByKey` isn’t using ML; it’s straightforward querying of a dataset. Spark isn’t an ML solution so much as it’s a non-relational data store that’s queried differently than SQL-based data stores.

                    You can build ML solutions on top of Spark, much the same as you can on top of SQL.

                    The big difference between the two is instead the declarative vs. imperative nature of querying. With SQL, you describe what you want, whereas with Spark (and a number of other big data and nosql stores), you describe how to get it. The latter is more familiar to many imperative/OO programmers, but the former is generally more approachable to non-programmers, and tends to deal with changes to data more smoothly.

                    In fact, the declarative approach is useful enough that Spark SQL exists and is widely used.
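To illustrate the declarative/imperative split, here is the same question (which customers spent at least some threshold, largest first) answered both ways in Python: once as a SQL query handed to SQLite, once as an explicit loop. The function names and the toy order data are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy order data: (customer, amount).
ORDERS = [("alice", 10), ("bob", 4), ("alice", 20), ("bob", 6), ("carol", 7)]


def top_spenders_declarative(orders, min_total):
    """Describe *what* we want; SQLite's query planner decides how to compute it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", orders)
    rows = conn.execute(
        """
        SELECT customer, SUM(amount) AS total
        FROM orders
        GROUP BY customer
        HAVING SUM(amount) >= ?
        ORDER BY total DESC, customer
        """,
        (min_total,),
    ).fetchall()
    conn.close()
    return rows


def top_spenders_imperative(orders, min_total):
    """Spell out *how*: accumulate totals, filter them, then sort, step by step."""
    totals = defaultdict(int)
    for customer, amount in orders:
        totals[customer] += amount
    kept = [(c, t) for c, t in totals.items() if t >= min_total]
    kept.sort(key=lambda ct: (-ct[1], ct[0]))
    return kept


print(top_spenders_declarative(ORDERS, 8))
print(top_spenders_imperative(ORDERS, 8))
```

Both return the same rows. The SQL version leaves the grouping and sorting strategy to the database, which is part of why declarative queries tend to keep working unchanged as indexes or data volumes change.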

                    1.

                      So maybe: “you don’t necessarily need ML/AI: you should consider just using SQL if you already know it.”?

                      1.

                        I still don’t follow: as far as I know, Spark is not a ML/AI tool.