Showing entries 1 to 10 of 22
10 Older Entries »
Displaying posts with tag: metadata (reset)
Databook: Turning Big Data into Knowledge with Metadata at Uber

From driver and rider locations and destinations, to restaurant orders and payment transactions, every interaction on Uber’s transportation platform is driven by data. Data powers Uber’s global marketplace, enabling more reliable and seamless user experiences across our products for riders, …

The post Databook: Turning Big Data into Knowledge with Metadata at Uber appeared first on Uber Engineering Blog.

Metacat: Making Big Data Discoverable and Meaningful at Netflix

by Ajoy Majumdar, Zhen Li

Most large companies have numerous data sources with different data formats and large data volumes. These data stores are accessed and analyzed by many people throughout the enterprise. At Netflix, our data warehouse consists of a large number of data sets stored in Amazon S3 (via Hive), Druid, Elasticsearch, Redshift, Snowflake and MySql. Our platform supports Spark, Presto, Pig, and Hive for consuming, processing and producing data sets. Given the diverse set of data sources, and to make sure our data platform can interoperate across these data sets as one “single” data warehouse, we built Metacat. In this blog, we will discuss our motivations in building Metacat, a …

[Read more]
How to Deal with MetaData Lock

What is MetaData Lock?

MySQL uses metadata locking to manage concurrent access to database objects, and to ensure data consistency when performing modifications to the schema: DDL operations. Metadata locking applies not just to tables, but also to schemas and stored programs (procedures, functions, triggers, and scheduled events).

In this post I am going to cover metadata locks on tables and triggers, that are usually seen by DBAs during regular operations/maintenance.

Kindly refer to these 4 different connections to MySQL Instance:

 

The screenshot shows that the uncommitted transaction may cause metadata lock to ALTER operations. The ALTER will not proceed until the transaction is committed or rolled-back. What is worse, after the ALTER is issued, any queries to that table (even simple SELECT queries) will be blocked. If the ALTER operation is an …

[Read more]
Tracking Metadata Locks (MDL) in MariaDB 10.0

I recently blogged about tracking metadata locks in the latest MySQL, and now I want to discuss how to track these metadata locks in MariaDB.

In MySQL 5.7, there is a table named `metadata_locks` added to the performance_schema (performance_schema must be enabled *and* the metadata_locks instrument must be specifically enabled as well.

In the MariaDB 10.0 implementation (as of 10.0.7), there is a table named METADATA_LOCK_INFO added to the *information_schema*. This is a new plugin, so the plugin must be installed, but that is very simple with:

INSTALL SONAME 'metadata_lock_info';

Then, you will have the table.

To see it in action:

Connection #1:

mysql> create table t (id int) engine=myisam;
mysql> begin;
mysql> select * from t;

Connection #2:

mysql> alter table t add …
[Read more]
Tracking Metadata Locks (MDL) in MySQL 5.7

I’ve blogged about metadata locks (MDL) in the past (1 2 3) and in particular discussed how best to track them down and troubleshoot threads stuck waiting on metadata locks.

If you’ve had any experience with these, you’ll know finding them isn’t always the most straight-forward task.

So I was glad to see metadata lock instrumentation added to MySQL 5.7.3 as part of performance_schema, which makes tracking these down a breeze! (Note this is only in 5.7.3 currently, and therefore is some time from being GA as of today)!

To use these, performance_schema must be enabled (i.e., performance_schema=1 in your config file).

But, also, the metadata_locks instrument is disabled by default, so even if you enable the …

[Read more]
Data Modeling

Dear data integration fans,

I’m a big fan of “appropriate” data modeling prior to doing any data integration work.  For a number of folks out there that means the creation of an Enterprise Data Warehouse model in classical Bill Inmon style.  Others prefer to use modern modeling techniques like Data Vault, created by Dan Linstedt.  However, the largest group data warehouse architects use a technique called dimensional modeling championed by Ralph Kimball.

Using a modeling technique is very important since it brings structure to your data warehouse. …

[Read more]
Viewing RMAN jobs status and output

Yesterday I was discussing with a fellow DBA about ways to check the status of existing and/or past RMAN jobs. Good backup scripts usually write their output to some sort of log file so, checking the output is usually a straight-forward task. However, backup jobs can be scheduled in many different ways (crontab, Grid Control, Scheduled Tasks, etc) and finding the log file may be tricky if you don’t know the environment well.
Furthermore, log files may also have already been overwritten by the next backup or simply just deleted. An alternative way of accessing that information, thus, may come handy.

Fortunately, RMAN keeps the backup metadata around for some time and it can be accessed through the database’s V$ views. Obviously, if you need this information because your database just crashed and needs to be restored, the method described here is useless.

Backup jobs’ status and metadata

A lot of metadata about …

[Read more]
Dynamic de-normalization of attributes stored in key-value pair tables

Dear Kettlers,

A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI solutions.  The obvious advice I gave back then was to not use those tables in the first place if you’re serious about a BI solution.  And if you have to, do some denormalization.

However, there are occasions where you need to query a source system and get some report going on them.  Let’s take a look at an example :

mysql> select * from person;
+----+-------+----------+
| id | name  | lastname |
+----+-------+----------+
|  1 | Lex   | Luthor   |
|  2 | Clark | Kent     |
|  3 | Lois  | Lane     |
+----+-------+----------+
3 rows in set (0.00 sec)

mysql> select * from person_attribute;
+----+-----------+---------------+------------+
| id | person_id | attr_key      | attr_value | …
[Read more]
Parse nasty XLS with dynamic ETL

Dear Kettle friends,

Last year, right after the summer in version 4.1 of Pentaho Data Integration, we introduced the notion of dynamically inserted ETL metadata (Youtube video here).  Since then we received a lot of positive feedback on this functionality which encouraged me to extend it to a few more steps. Already with support for “CSV Input” and “Select Values” we could do a lot of dynamic things.  However, we can clearly do a lot better by extending our initiative to a few more steps: “Microsoft Excel Input” (which can also read ODS by the way), “Row Normalizer” and “Row De-normalizer”.

Below I’ll describe an actual (obfuscated) example that you will probably recognize as it is equally hideous as simple in it’s horrible complexity.

Take a look at this file:

Let’s assume that this spreadsheet …

[Read more]
Drizzle metadata tables

Giuseppe has a great post about the Evolution of MySQL metadata, and I thought I’d have a look at what we have in Drizzle. It’s pretty easy to work out how many tables are in each schema, we just query the standard INFORMATION_SCHEMA.TABLES view:

drizzle> select table_schema,count(table_name)
    ->  from information_schema.tables
    -> group by table_schema;
+--------------------+-------------------+
| table_schema       | count(table_name) |
+--------------------+-------------------+
| DATA_DICTIONARY    |                53 |
| INFORMATION_SCHEMA |                20 |
+--------------------+-------------------+
2 rows in set (0 sec)

In Drizzle it’s important to note that there is a differentiation between SQL …

[Read more]
Showing entries 1 to 10 of 22
10 Older Entries »