Showing entries 1 to 4
Displaying posts with tag: Mike Benshoof (reset)
Managing big data? Say ‘hello’ to HP Vertica

Over the past few months, I’ve seen an increase in the following use case while working on performance and schema review engagements:

I need to store exponentially increasing amounts of data and analyze all of it in real-time.

This is also known simply as: “We have big data.” Typically, this data is used for user interaction analysis, ad tracking, or other common click stream applications. However, it can also be seen in threat assessment (ddos mitigation, etc), financial forecasting, and other applications as well. While MySQL (and other OLTP systems) can handle this to a degree, it is by no means a forte. Some of the pain points include:

  • Cost of rapidly increasing, expensive disk storage (OLTP disks need to be fast == $$)
  • Performance decrease as the data size increases …
[Read more]
Integrating pt-online-schema-change with a Scripted Deployment

Recently, I helped a client that was having issues with deployments causing locking in their production databases.  At a high level, the two key components used in the environment were:

  • Capistrano (scripted deployments) [website]
  • Liquibase (database version control) [website]

At a high level, they currently used a CLI call to Liquibase as a sub-task within a larger deployment task.  The goal of this engagement was to modify that sub-task to run Liquibase in a non-blocking fashion as opposed to the default that just runs native ALTERS against the database.

As I wasn’t very familiar with Liquibase, I took this opportunity to learn more about it and it seems like a very valuable tool.  Essentially, it does the following:

  • Adds two version control tables to …
[Read more]
pt-online-schema-change and binlog_format

Statement-based or row-based, or mixed?  We’ve all seen this discussed at length so I’m not trying to rehash tired arguments.  At a high level, the difference is simple:

  1. Statement based replication (SBR) replicates the SQL statements to the slave to be replayed
  2. Row based replication (RBR) replicates the actual rows changed to the slave to be replayed
  3. Mixed mode uses RBR in the event of a non-deterministic statement, otherwise uses SBR

Recently, I worked with a client to optimize their use of pt-online-schema-change and keep replication delay to a minimum.  We found that using RBR in conjunction with a smaller chunk-time was the best result in their environment due to reduced IO on the slave, but I wanted to recreate the test locally as well to see how it looked in the generic sense (sysbench for data/load).

Here was my local setup:

[Read more]
MySQL 5.6 – InnoDB Memcached Plugin as a caching layer

A common practice to offload traffic from MySQL 5.6 is to use a caching layer to store expensive result sets or objects.  Some typical use cases include:

  • Complicated query result set (search results, recent users, recent posts, etc)
  • Full page output (relatively static pages)
  • Full objects (user or cart object built from several queries)
  • Infrequently changing data (configurations, etc)

In pseudo-code, here is the basic approach:

data = fetchCache(key)
if (data) {
  return data
}
data = callExpensiveFunction(params)
storeCache(data, key)
return data

Memcached is a very popular (and proven) option used in production as a caching layer.  While very fast, one major potential shortcoming of memcached is that it is not persistent.  While a common design consideration when using a cache layer is that “data in cache may go …

[Read more]
Showing entries 1 to 4