I keep digging up and sharing this simple SQL approach to masking production data to create test data sets. Time to codify it in a post.
Rather than generating names and other data with tools such as Mockaroo, it is more practical to use actual data, for a variety of testing reasons.
The SQL below is a self-explanatory approach to removing Personally Identifiable Information (PII) while keeping the data relevant. I use this approach for several reasons:
- We are using production data rather than synthetic data, so data volume, distribution, and the values in the other columns are realistic. The example covers only a subset of columns, but dates and locations, for instance, remain realistic.
- Indexes (and unique indexes) still work, and distribution across the index is adequate for searching. Technically the index …
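To make the masking idea concrete, here is a minimal sketch run against an in-memory SQLite database from Python. It is not the SQL from this post. The `customers` table, its columns, the index names, and the `'First' || id` style masking expressions are all hypothetical stand-ins. Names and emails are rewritten from each row's primary key, while `city` and `signup_date` are left untouched. Because every masked email embeds the unique `id`, the unique index on `email` keeps accepting the masked data.

```python
import sqlite3

# Hypothetical schema standing in for a production table (not the post's SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (
    id          INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    email       TEXT NOT NULL,
    city        TEXT NOT NULL,
    signup_date TEXT NOT NULL
);
CREATE UNIQUE INDEX ux_customers_email ON customers (email);
CREATE INDEX ix_customers_last_name ON customers (last_name);
""")

# Sample rows playing the role of production data.
conn.executemany(
    "INSERT INTO customers (id, first_name, last_name, email, city, signup_date) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Alice", "Smith", "alice@corp.example", "Denver", "2021-03-14"),
        (2, "Bob", "Jones", "bob@corp.example", "Denver", "2022-07-01"),
        (3, "Carol", "Nguyen", "carol@corp.example", "Austin", "2020-11-30"),
    ],
)

# Mask PII by deriving replacement values from the primary key.
# Deriving from id keeps every masked email unique, so the unique index holds.
# city and signup_date are not touched, so their distribution stays realistic.
conn.execute("""
UPDATE customers
SET first_name = 'First' || id,
    last_name  = 'Last'  || id,
    email      = 'user'  || id || '@example.com'
""")
conn.commit()

for row in conn.execute("SELECT * FROM customers ORDER BY id"):
    print(row)
```

Running it prints three rows whose names and emails have become `First1`, `Last1`, `user1@example.com`, and so on, while each row keeps its original city and signup date. The sketch assumes the `UPDATE` runs against a copy of the data. The `||` concatenation operator works in SQLite, PostgreSQL, and Oracle; MySQL and SQL Server would use `CONCAT()` instead.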