Context: I worked on query engines of MongoDB, ClickHouse and YDB.
I am concerned that the author is focusing on the query language before the “I/O layer”. Data access planning is already a gargantuan effort for SQL queries. If we give developers a richer query language, figuring out which indexes to use will become even more complex.
If we instead ask developers to explicitly specify which storage primitives they are using, what will the interface look like? What if one index is better than another depending on the size of the query’s input parameter? Do users encode that manually as well?
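To make the parameter-size question concrete, here is a minimal sketch of the kind of decision a planner (or a user doing it by hand) would have to make. It is not taken from Glowdust or any real engine: `CostModel`, `choose_access_path`, and the cost numbers are all hypothetical, and it compares an index lookup against a full scan as a simple stand-in for choosing between two indexes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    # Hypothetical relative costs, not measurements from any real engine.
    index_probe_cost: int = 400  # one random-access index lookup per key
    scan_row_cost: int = 1       # one sequentially scanned row


def choose_access_path(num_keys: int, table_rows: int,
                       model: CostModel = CostModel()) -> str:
    """Pick the cheaper access path for a query like `id IN (<num_keys> values)`."""
    index_cost = num_keys * model.index_probe_cost  # probe the index once per key
    scan_cost = table_rows * model.scan_row_cost    # read every row, filter in memory
    return "index_lookup" if index_cost < scan_cost else "full_scan"


# The same query shape gets a different plan depending on the parameter size.
small = choose_access_path(num_keys=10, table_rows=1_000_000)
large = choose_access_path(num_keys=500_000, table_rows=1_000_000)
```

With these assumed costs, a short key list favors the index and a very long one favors the scan. If the language exposes storage primitives directly, this branch ends up in user code.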
All in all, I am happy to see an attempt to move away from SQL, but I do believe that the nature of storage and the interface to it will have a ripple effect on all parts of the system, including query language.
That’s a relevant point, especially considering that PRQL compiles directly to SQL.
How do you see storage rippling out to SQL now? SQL, or dialects very close to it, is used with all kinds of storage backends across many different databases. I’m curious whether you have examples of how SQL has been affected by them.
It is true that SQL as a language is not affected by the storage that much. I was not referring to SQL, but to the custom query language Glowdust uses.
My point is that if it follows SQL’s path (meaning it will be high-level and storage primitives will be abstracted), then it is likely to have even more problems with data access because it is more expressive. If it does not follow SQL’s path and becomes more low-level (by exposing index and table access primitives in their programming language), then it seems that thinking about these storage primitives early on is essential.
Ahh that makes sense. Thanks!