This is a very good idea when you can get away with it, but because there are so many profiling tasks you might want to run on the data hidden in that opaque string (e.g. most popular cities; fraud prevention), you’ll often find yourself badly implementing data matching/cleaning software – either because the parsing exercises really do start out simple enough, or because you don’t know pre-existing solutions exist (or you do, and the price scares you off). And if you know how unpleasant that can be (I frequently work with CRM databases with 6M+ contacts in them), you should be forgiven for trying to get the user to help you parse it.
For this reason, I recommend storing both the “printed”/combined/opaque string and the structured/parsed columns. Ask the user to fill out the structured data first, but give them the option to go with the blob, and kick off a task (software if you can, human if you must) to clean it up post-entry – which can and should include recommending schema changes to your development team.
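A minimal sketch of that dual-storage idea in Python. The field names (`raw`, `street`, `city`, `postcode`) and the `needs_cleanup` flag are illustrative assumptions, not anything from the comment:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Address:
    # Always keep the opaque string exactly as the user entered it.
    raw: str
    # Structured columns are best-effort; they may be filled in later
    # by a post-entry cleanup task (software or human).
    street: Optional[str] = None
    city: Optional[str] = None
    postcode: Optional[str] = None

    @property
    def needs_cleanup(self) -> bool:
        """True when we only have the blob and the parse task hasn't run yet."""
        return self.street is None and self.city is None

# User skipped the structured fields; queue the blob for later cleanup.
entered = Address(raw="12 Example St, Auckland 0629")
assert entered.needs_cleanup

# After the cleanup task runs, the structured columns are populated,
# but the original string is never thrown away.
cleaned = Address(
    raw=entered.raw, street="12 Example St", city="Auckland", postcode="0629"
)
assert not cleaned.needs_cleanup
assert cleaned.raw == entered.raw
```

The point of keeping `raw` forever is that any parse is lossy; if the cleanup task (or your schema) turns out to be wrong, you can always re-derive the structured columns from the original string.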
Love finding writing like this, where the author has clearly worked on something very specific and can articulate nuances that you’d either never think about or wouldn’t believe existed (for example, buildings that are numbered zero).
We need something like that for every category, in one place. Plus premade components in common languages that enforce those best practices by default, with escape hatches where it makes sense. minimax’s link is a good start.
In New Zealand, post codes are a pretty much worthless field – NZ Post has done such a poor job of maintaining them. Some areas have multiple post codes; most people don’t use (or even know) their correct post code; there are multiple, conflicting post code maps. We had a team that learnt all of this on a government mailer contract. I don’t think NZ Post even uses those numbers for sorting.
Honestly, if a data type was ever written on paper, I just assume it’s an opaque string unless I have a really good reason to do otherwise.
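One concrete reason to default to opaque strings: “numeric-looking” fields like post codes aren’t numbers. A small illustration (the values are made up):

```python
# A post code stored as an int silently corrupts the data:
as_int = int("0629")         # the leading zero is gone
assert str(as_int) == "629"  # no longer a plausible four-digit post code

# Kept as an opaque string, the value round-trips unchanged,
# and alphanumeric codes (e.g. UK "SW1A 1AA") still fit the column.
as_str = "0629"
assert as_str == "0629"
```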
Glad to see this one made the list already: