Showing entries 1 to 3
Displaying posts with tag: data distribution (reset)
Data Masking 101

I continue to dig up and share this simple approach for production data masking via SQL to create testing data sets. Time to codify it into a post.

Rather than generating a set of names and data from tools such as Mockaroo, it is more practical to use actual data for a variety of testing reasons.

The SQL below is a self-explanatory approach of removing Personal Identifiable Information (PII), but keeping data relevant. I use this approach for a number of reasons.

  • We are using production data rather than synthetic data. Data volume, distribution, and additional column values are realistic. This is a subset of an example, but dates and locations are therefore realistic
  • Indexes (and unique indexes) still work, and distribution across the index is adequate for searching. Technically the index …
[Read more]
MySQL thread pool and scalability examples

Nice article about SimCity outage and ways to defend databases: http://www.mysqlperformanceblog.com/2013/03/16/simcity-outages-traffic-control-and-thread-pool-for-mysql/

The graphs showing throughput with and without the thread pool are taken from the benchmark performed by Oracle and taken from here:
http://www.mysql.com/products/enterprise/scalability.html

The main take away is this graph (all rights reserved to Oracle, picture original URL):

Scalability is where throughput can grow and grow, as demand grows. I need to get more from the database, the question …

[Read more]
They say: "Relational Databases Aren't Dead"

This is a good read, claiming: "Relational Databases Aren't Dead. Heck, They're Not Even Sleeping", http://readwrite.com/2013/03/26/relational-databases-far-from-dead. A key quote:
"While not comprehensive, the uses for NoSQL databases center around the acquisition of fast-growing data or data that does not easily fit within uniform structures."
There were 2 parts in the statement about NoSQL's uses. I'll start with the latter:


"data that does not easily fit within uniform structures" - NoSQL is probably the right choice, hmm although I always encourage thinking and architecting in advance. And also online structure changes do exist in the RDBMS world and recently in MySQL: …

[Read more]
Showing entries 1 to 3