Where I work, there is an ongoing discussion about test data generation. At the moment we do not replace or change any production data for use in the test environments, and we don’t generate test data.
That is safe and legal, because production data is tokenized. That is, PII and PCI data are replaced by placeholder tokens, which applications can use to access the actual protected data through specially protected access services. Only a very limited circle of people deals with the data behind the protected services.
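To make the idea of tokenization concrete, here is a minimal sketch in Python. It is not the system described above: the `TokenVault` class, the `tok_` prefix, the `caller_is_authorized` flag, and the choice to give equal values the same token are all assumptions made for illustration.

```python
import secrets


class TokenVault:
    """Hypothetical stand-in for a protected access service."""

    def __init__(self):
        self._value_by_token = {}
        self._token_by_value = {}

    def tokenize(self, value: str) -> str:
        # Assumption: equal values get the same token, so joins on
        # tokenized columns keep working.
        if value in self._token_by_value:
            return self._token_by_value[value]
        # The token is random and carries no information about the value.
        token = "tok_" + secrets.token_hex(8)
        self._value_by_token[token] = value
        self._token_by_value[value] = token
        return token

    def detokenize(self, token: str, caller_is_authorized: bool) -> str:
        # Only callers cleared by the protected service see real data.
        if not caller_is_authorized:
            raise PermissionError("caller may not access protected data")
        return self._value_by_token[token]


vault = TokenVault()
# The application database stores only the token, never the email address.
row = {"customer_id": 42, "email": vault.tokenize("jane@example.com")}
```

Because the row contains nothing but a random token, copying it into a test database exposes no protected data, as long as the test environment cannot reach the detokenizing service.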
Using production data in test databases is also fast, because we copy data in parallel, at line speed, or we make writable redirect-on-write (“copy-on-write” in the age of SSDs) snapshots available.
Assume for a moment we want to change that and
- mask data from production when we copy it to test databases
- reduce the amount of data used in test databases, while …