Showing entries 31 to 40 of 125
« 10 Newer Entries | 10 Older Entries »
Displaying posts with tag: data (reset)
Configure External Replica For CloudSQL — Replication from CloudSQL to VM

CloudSQL is fully managed MySQL / PostgreSQL database system. CloudSQL reduces the workload for DBAs and anyone can easily manage even without a DBA. In many cases, people wants to have a replica of their production database for Testing purpose or even some other purpose. CloudSQL provides the flexibility to have external replicas on VM or On-prem. In this blog we are going to configure external replica for CloudSQL.

MySQL with GTID:

CloudSQL 2nd generation support GTID based replication and binlog file based replication. But GTID is more consistent. But 1'st generation only support binlog replication.

A global transaction identifier (GTID) is a unique identifier created and associated with each transaction committed on the server of origin (master). This identifier is unique not only to the server on which it originated, but is unique across all servers in a given replication setup. There is a 1-to-1 mapping between …

[Read more]
Black-Box Auditing: Verifying End-to-End Replication Integrity between MySQL and Redshift

Since Yelp introduced its real-time streaming data infrastructure, “Data Pipeline”, it has grown in scope and matured vastly. It now supports some of Yelp’s most critical business requirements in its mission to connect people with great local businesses. Today, it has expanded into a diverse ecosystem of connectors sourcing data from Kafka and MySQL, and sinking data into Cassandra, Elasticsearch, Kafka, MySQL, Redshift, and S3. To ensure that the whole ecosystem is functioning correctly, Yelp’s Data Pipeline infrastructure is continually growing its repertoire of reliability techniques such as write-ahead logging, two-phase commit, fuzz testing, monkey testing, and black-box auditing to...

Array Ranges in MySQL JSON

Pretend you have a JSON array of data that looks roughly like the following.

mysql> insert into x(y) values('["a","b","c","d"]');
Query OK, 1 row affected (0.10 sec)


You could get all the values from that array using $[*]


mysql> select y->"$[*]" from x;
+----------------------+
| y->"$[*]" |
+----------------------+
| ["a", "b", "c", "d"] |
+----------------------+
1 row in set (0.00 sec)

Or the individual members of the array with an index that starts with zero.


mysql> select y->"$[0]" from x;
+-----------+
| y->"$[0]" |
+-----------+
| "a" |
+-----------+
1 row in set (0.00 sec)


But what about the times you want the last item in the array and really do not want to loop through all the items? How about using …

[Read more]
Incremental MYSQL loads to BigQuery using Matillion

As part of building an enterprise DW for one of our customers we had to sync a bunch of tables from a MYSQL slave to BigQuery at 30 min intervals. Considering the range of other non-relational data sources which will be part of the this load, we chose Matillion as ETL tool. Matillion is easy to setup (just provision the VM and start authoring jobs) and long list of integrations so it made sense.

This post explains building a Matillion job that does the following:

  1. Full Load
  2. Incremental load for tables with larger row count and an ID that can be looked up for new rows since last load.

MYSQL Drivers

If you came from a Google search looking for Matillion — I am assuming you are done with provisioning the instance, setting up default project etc are done, so I am skipping those. While Matillion ships with PostgreSQL drivers, for some reason it doesn’t have MYSQL …

[Read more]
Two New MySQL Books!

There are two new MySQL books both from Apress Press. One is an in depth master course on the subject and the other is a quick introduction.


ProMySQL NDB Cluster is subtitled Master the MySQL Cluster Lifecycle and at nearly 700 pages it is vital resource to anyone that runs or is thinking about running NDB Cluster. The authors, Jesper Wisborg Krogh and Mikiya Okuno, have distilled their vast knowledge of this difficult subject in a detail packed but easily readable book.  MySQL Cluster is much more complex in many areas than a regular MySQL server and here you will find all those details. If you run MySQL NDB Cluster then you need this book. The partitioning information in chapter 2 is worth the price of the book alone.  I am only a third of the way …

[Read more]
Handy JSON to MySQL Loading Script

JSON in Flat File to MySQL DatabaseSo how do you load that JSON data file into MySQL. Recently I had this question presented to me and I thought I would share a handy script I use to do such work. For this example I will use the US Zip (postal) codes from JSONAR. Download and unzip the file. The data file is named zips.json and it can not be bread directly into MySQL using the SOURCE command. It needs to have the information wrapped in a more palatable fashion.

head zips.json 
{ "city" : "AGAWAM", "loc" : [ -72.622739, 42.070206 ], "pop" : 15338, "state" : "MA", "_id" : "01001" }
{ "city" : "CUSHMAN", "loc" : [ -72.51564999999999, 42.377017 ], "pop" : 36963, "state" : "MA", "_id" : "01002" }
{ "city" : "BARRE", "loc" : [ -72.10835400000001, 42.409698 ], "pop" : 4546, "state" : "MA", "_id" : "01005" }
{ "city" : "BELCHERTOWN", "loc" : [ -72.41095300000001, …
[Read more]
How to interview an amazon database expert

via GIPHY Amazon releases a new database offering every other day. It sure isn’t easy to keep up. Join 35,000 others and follow Sean Hull on twitter @hullsean. Let’s say you’re hiring a devops & you want to suss out their database knowledge? Or you’re hiring a professional services firm or freelance consultant. Whatever the … Continue reading How to interview an amazon database expert →

Using find() with the MySQL Document Store

The VideoThe find() function for the MySQL Document Store is a very powerful tool and I have just finished a handy introductory video. By the way -- please let me have feed back on the pace, the background music, the CGI special effects (kidding!), and the amount of the content. The ScriptFor those who want to follow along with the videos, the core examples are below. The first step is to connect to a MySQL server to talk to the world_x schema (Instructions on loading that schema at the first link above).

\connect root@localhost/world_x

db is an object to points to the world_x schema. To find the records in the countryinfo collection, use db.countryinfo.find(). But that returns 237 JSON documents, too many! So lets cut it down to …

[Read more]
MySQL Document Store Video Series

I am starting a series of videos on the MySQL Document Store. The Document Store allows those who do not know Structured Query Language (SQL) to use a database without having to know the basics of relational databases, set theory, or data normalization. The goal is to have sort 2-3 minute episodes on the various facets of the Document Store including the basics, using various programming languages (Node.JS, PHP, Python), and materializing free form schemaless, NoSQL data into columns for use with SQL.

The first Episode, Introduction, can be found here.

Please provide feedback and let me know if there are subjects you would want covered in the near future.

Random human recognizable dataset

We all do need sometimes to generate raw valid dummy data for our use cases and applications as we start them. Obviously, one can write their own scripts to generate random data, but it is much better to have data, to which human beings can associate with like names, addresses instead of having them filled with random "lorem ipsum" string data :)

While searching for such a tool, I found a site which does exactly this: http://www.generatedata.com/

Documentation: http://benkeen.github.io/generatedata/

This can also be downloaded and installed locally. It supports three types of installations:
- A single, anonymous user account
- A single user account, requires login
- Multiple accounts

Below is the set of wide varied data types it supports for …

[Read more]
Showing entries 31 to 40 of 125
« 10 Newer Entries | 10 Older Entries »