Metacat: Making Big Data Discoverable and Meaningful at Netflix

by Ajoy Majumdar, Zhen Li

Most large companies have numerous data sources with different data formats and large data volumes. These data stores are accessed and analyzed by many people throughout the enterprise. At Netflix, our data warehouse consists of a large number of data sets stored in Amazon S3 (via Hive), Druid, Elasticsearch, Redshift, Snowflake, and MySQL. Our platform supports Spark, Presto, Pig, and Hive for consuming, processing, and producing data sets. Given the diverse set of data sources, and to make sure our data platform can interoperate across these data sets as one “single” data warehouse, we built Metacat. In this blog post, we will discuss our motivations in building Metacat, a …
