Production data often needs to be moved to other environments for testing or development. However, some of that data, such as people’s names, birthdays, addresses, and account numbers, is data we don’t want testers or developers to see, for privacy and regulatory reasons. Hence the need to mask it. I can certainly see this need growing over time on all database platforms. There is software out there that does this sort of task, or similar ones, such as data generation tools. Oracle has actually had a Data Masking Pack since 10g for this purpose. Here are some of my thoughts on this topic.
One method of masking data is reshuffling: the values in the target column(s) you want to protect are shuffled randomly across rows, so each value still exists in the table but is no longer attached to its original row.
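To make the idea concrete, here is a minimal sketch of reshuffling in Python. It works on an in-memory list of rows rather than a real database table, and the function name `reshuffle_columns`, the column names, and the sample rows are all made up for illustration. Each target column is shuffled independently, and a random shuffle may leave some values in their original rows.

```python
import random


def reshuffle_columns(rows, columns, seed=None):
    """Return a copy of rows with each target column's values shuffled across rows."""
    rng = random.Random(seed)
    # Copy every row so the original data is left untouched.
    masked = [dict(row) for row in rows]
    for column in columns:
        # Collect the column's values, shuffle them, and write them back.
        values = [row[column] for row in masked]
        rng.shuffle(values)
        for row, value in zip(masked, values):
            row[column] = value
    return masked


# Hypothetical sample data standing in for a production table.
customers = [
    {"id": 1, "name": "Alice Adams", "birthday": "1980-01-15"},
    {"id": 2, "name": "Bob Brown", "birthday": "1975-06-30"},
    {"id": 3, "name": "Carol Clark", "birthday": "1992-11-02"},
    {"id": 4, "name": "Dan Davis", "birthday": "1968-03-21"},
]

masked_customers = reshuffle_columns(customers, ["name", "birthday"], seed=42)
```

The non-target columns (here `id`) stay where they are, and the set of names and birthdays in the table is unchanged; only the pairing between rows and values is different.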
Another way of doing it is data generation: for the target column(s), we simply replace the real values with generated ones.
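Data generation can be sketched along the same lines. Again, this is only an illustration under my own assumptions: the generator functions, the `mask_by_generation` helper, the value formats (a `Person_` prefix, a 10-digit account number), and the sample rows are hypothetical, not taken from any particular tool.

```python
import random
import string
from datetime import date, timedelta


def generate_name(rng):
    # Placeholder name: a fixed prefix plus six random uppercase letters.
    return "Person_" + "".join(rng.choices(string.ascii_uppercase, k=6))


def generate_birthday(rng):
    # Random date within 20000 days after 1950-01-01, as an ISO string.
    return (date(1950, 1, 1) + timedelta(days=rng.randrange(20000))).isoformat()


def generate_account_number(rng):
    # Ten random digits, kept as a string so leading zeros survive.
    return "".join(rng.choices(string.digits, k=10))


def mask_by_generation(rows, generators, seed=None):
    """Return a copy of rows with each target column replaced by a generated value."""
    rng = random.Random(seed)
    masked = []
    for row in rows:
        new_row = dict(row)
        for column, generate in generators.items():
            new_row[column] = generate(rng)
        masked.append(new_row)
    return masked


# Hypothetical sample data standing in for a production table.
accounts = [
    {"id": 1, "name": "Alice Adams", "birthday": "1980-01-15", "account": "0012345678"},
    {"id": 2, "name": "Bob Brown", "birthday": "1975-06-30", "account": "0098765432"},
]

masked_accounts = mask_by_generation(
    accounts,
    {
        "name": generate_name,
        "birthday": generate_birthday,
        "account": generate_account_number,
    },
    seed=7,
)
```

Unlike reshuffling, none of the original values needs to survive in the masked copy, but the generated values are only as realistic as the generators that produce them.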
For reshuffling, obviously …