1. 33
  1. 1

    I can see these replacing some simpler materialized views I use. Not having to refresh the views is a big plus.

    1. 1

      Why would you want a column with derived data mixed in storage with primary, non-derived data (apart from FTS indexing)?

      1. 2

        If you have a need for pre-computed / “cached” data, especially with a workload of few writes and lots of reads, generated columns should help simplify your application / server-side code a lot.

        1. 1

          That’s a quote that doesn’t illuminate the potential use-cases enough for me.

          For example, could I use it to run another query at write-time using the data being inserted?

          1. 5

            For example, could I use it to run another query at write-time using the data being inserted?

            No, I do not think so. You have access to these columns at SELECT, UPDATE, and DELETE time.

            For more use cases, one could also look at Oracle virtual columns; those have been around since 11g, and there are probably more blog posts/examples about them.

            Imagine that in PostgreSQL you can always run select col1, col2, my_transform_function(col1, col2, col3) from some_table_1.

            The value of the 3rd column is whatever comes out of your transformation function. Now, that's pretty useful on its own, and most people in the PG world would use that (for example, when exposing one field of a complex JSON document stored in column 3, transformed based on the data of the same row sitting in columns 1 and 2).
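
            For instance, something along these lines (the table, the jsonb column, and the JSON path are all invented for illustration, standing in for my_transform_function):

              -- computed per query, per row: nothing is stored, so no index can help
              SELECT trade_id,
                     lifecycle #>> '{parties,counterparty,name}' AS counterparty
              FROM   trades;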

            For example, in the world of complex derivatives trade processing, trade life cycle JSON objects can have something like 500 to 1000 fields nested to about 10 levels of hierarchy. Similar complexities exist in clinical or scientific data/experiment management.
            Clearly, if you want to filter or join on the fields inside those complex objects, you will have to extract those values out at some point.

            So, if I want to offer filtering by one of those nested fields, what would I do?

            • I have the option to parse that field out at insert time and store it as a separate column. But then I have to understand the details of those objects at ingest time, and every time they change something, I have to re-ingest the whole thing (which is terabytes and terabytes of data…).

            • I have the option to apply my_transform_function at select time. But then I cannot use the database engine to efficiently filter out rows that do not satisfy a particular criterion on the transform results, and I have to get my database engine to repeat this work for every select sent to my database. Inefficient.

            • A much better option is to pre-compute the value of the virtual column on the first insert, and then on any update to that row (and have the DB figure out when to re-compute, since it knows when inserts/updates happen).

              Then ask my DB engine to create an index (sometimes called a function index) on that column. From there on, selects, joins (and updates/deletes) that filter on the precomputed column will be more efficient. That's the value of generated columns (see the sketch after this list).

            • You could also create a form of materialized view (sort of a projection of the table, or of multiple tables, with this data extracted), but that has its own complexities: materialized views are ‘snapshots’ in time and are pretty demanding to maintain. (This can also be done by hand-coding triggers, probably the least efficient option for maintaining materialized views.)
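
            A minimal sketch of that third option, in PostgreSQL 12 syntax (the table and the JSON path are invented; note that PG 12 only supports STORED generated columns):

              CREATE TABLE trades (
                  trade_id  bigint PRIMARY KEY,
                  lifecycle jsonb  NOT NULL,  -- the big nested trade object, stored as received
                  -- pre-computed by the engine on INSERT and on every UPDATE of the row
                  counterparty text GENERATED ALWAYS AS
                               (lifecycle #>> '{parties,counterparty,name}') STORED
              );

              -- index the generated column so filters and joins on it can use the index
              CREATE INDEX trades_counterparty_idx ON trades (counterparty);

              -- the engine no longer has to re-parse the JSON for every select
              SELECT trade_id FROM trades WHERE counterparty = 'ACME Corp';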

            The generated columns in PostgreSQL 12 (and in other DB engines) have a number of limitations, so they are not quite like true columns: for example, you cannot partition based on those columns, and they probably cannot participate in all types of indexing strategies or in materialized views (although I have to read more about it).

            But overall this is a very useful feature for data hubs that cannot anticipate, at ingest time, all the possible query/filter/join criteria that will be needed for serving selects (or deletes during archiving, as another example) against their data sets.

        2. 1

          Perhaps you want to store a JSON payload exactly as you received it from an external API, but you’d also like easy access to the data within… or maybe you want access to a computed result of that JSON.
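
          A small sketch of that idea (the table, the JSON field names, and the cast are made up for illustration): keep the raw payload verbatim and expose pieces of it as stored generated columns.

            CREATE TABLE api_events (
                id     bigserial PRIMARY KEY,
                raw    jsonb NOT NULL,  -- payload stored exactly as received from the API
                status text    GENERATED ALWAYS AS (raw ->> 'status') STORED,
                cents  integer GENERATED ALWAYS AS (((raw -> 'amount') ->> 'cents')::integer) STORED
            );

            -- the derived columns read like any other column
            SELECT count(*) FROM api_events WHERE status = 'failed';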

          1. 1

            I think that complex indexing expressions (like the FTS indexing you mentioned) are a pretty good use-case. For example, I need to sort a lot of stuff depending on the day (but regardless of the time of day), so I have to index on this ugly big expression “date_trunc('day'::text, timezone('utc'::text, post_time))”. It would be nicer to have a generated column for that, with an index on it.
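
            Something along these lines should work in PG 12 (a sketch with an invented posts table; the generation expression has to be immutable, which this one is):

              CREATE TABLE posts (
                  id        bigserial   PRIMARY KEY,
                  post_time timestamptz NOT NULL,
                  -- the UTC day of the post, pre-computed once per write
                  post_day  timestamp GENERATED ALWAYS AS
                            (date_trunc('day', timezone('utc', post_time))) STORED
              );

              CREATE INDEX posts_post_day_idx ON posts (post_day);

              -- sorting and grouping can now use the plain column (and its index)
              SELECT post_day, count(*) FROM posts GROUP BY post_day ORDER BY post_day;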