Showing entries 1 to 10 of 12
2 Older Entries »
Displaying posts with tag: ClickHouse
Analyze Your Raw MySQL Query Logs with ClickHouse

In this blog post, I’ll look at how you can analyze raw MySQL query logs with ClickHouse.

For typical query performance analyses, we have an excellent tool in Percona Monitoring and Management. You may want to go deeper, though. You might be longing for the ability to query raw MySQL “slow” query logs with SQL.

There are a number of tools for loading MySQL slow query logs into a variety of data stores. For example, you can find posts showing how to do it with LogStash. While very flexible, these solutions have always looked too complicated and too limited in functionality to me.
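
To make the idea of loading raw log entries into a data store concrete, here is a minimal Python sketch of turning one slow-log entry into a flat record. It is an illustration only and is not taken from any of the tools mentioned: the helper name `parse_slow_log_entry` and the sample entry are hypothetical, and it assumes the common `# Query_time: ... Lock_time: ... Rows_sent: ... Rows_examined: ...` metrics line of the MySQL slow query log.

```python
import re

# Metrics line that precedes each statement in the slow query log
# (assumed layout; extended formats carry more fields than this).
METRICS_RE = re.compile(
    r"# Query_time: (?P<query_time>[\d.]+)\s+Lock_time: (?P<lock_time>[\d.]+)"
    r"\s+Rows_sent: (?P<rows_sent>\d+)\s+Rows_examined: (?P<rows_examined>\d+)"
)


def parse_slow_log_entry(entry: str) -> dict:
    """Parse one slow-log entry (hypothetical helper) into a flat dict."""
    record = {}
    sql_lines = []
    for line in entry.strip().splitlines():
        match = METRICS_RE.match(line)
        if match:
            # Convert the numeric metrics so they can be aggregated later.
            record["query_time"] = float(match.group("query_time"))
            record["lock_time"] = float(match.group("lock_time"))
            record["rows_sent"] = int(match.group("rows_sent"))
            record["rows_examined"] = int(match.group("rows_examined"))
        elif line.startswith("#") or line.startswith("SET timestamp="):
            # Skip other header comments and the timestamp statement.
            continue
        else:
            sql_lines.append(line.strip())
    record["query"] = " ".join(sql_lines)
    return record


# Made-up sample entry, for illustration only.
SAMPLE_ENTRY = """\
# Time: 2018-02-01T10:00:00.000000Z
# User@Host: app[app] @ localhost []  Id:    42
# Query_time: 2.500000  Lock_time: 0.000100 Rows_sent: 10  Rows_examined: 100000
SET timestamp=1517479200;
SELECT foo FROM bar WHERE id = 5;
"""

record = parse_slow_log_entry(SAMPLE_ENTRY)
```

Each record produced this way is one row with numeric columns, which is the shape an analytical store needs before the log can be queried with SQL.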

By far the best solution to parse and load MySQL slow query logs (among multiple log types supported) is Charity …

[Read more]
Archiving MySQL Tables in ClickHouse

In this blog post, I will talk about archiving MySQL tables in ClickHouse for storage and analytics.

Why Archive?

Hard drives are cheap nowadays, but storing lots of data in MySQL is not practical and can cause all sorts of performance bottlenecks. To name just a few issues:

  1. The larger the table and index, the slower the performance of all operations (both writes and reads)
  2. Backup and restore for terabytes of data is more challenging, and if we need to have redundancy (replication slave, clustering, etc.) we will have to store all the data N times

The answer is archiving old data. Archiving does not necessarily mean that the data will be permanently removed. Instead, the archived data can be placed into long-term storage (e.g., AWS S3) or loaded into a …

[Read more]
Updating/Deleting Rows From ClickHouse (Part 2)

In this post, we’ll look at updating and deleting rows with ClickHouse. It’s the second of two parts.

In the first part of this post, we gave a high-level overview of implementing incremental refresh on a ClickHouse table as an alternative to native UPDATE/DELETE support. In this part, we will show you the actual steps and sample code.

Prepare Changelog Table

First, we create the changelog table below. It can be stored on any MySQL instance separate from the source of our analytics table. When we run the change capture script, it records the changes in this table, and we can consume them later with the incremental refresh script:

CREATE TABLE `clickhouse_changelog` (
  `db` varchar(255) NOT NULL …
[Read more]
Updating/Deleting Rows with ClickHouse (Part 1)

In this post, we’ll look at updating and deleting rows with ClickHouse. It’s the first of two parts.

Update: Part 2 of this post is here.

ClickHouse is fast – blazing fast! It’s quite easy to pick up, and with ProxySQL integrating with existing applications already using MySQL, it’s way less complex than using other analytics options. However, ClickHouse does not support UPDATE/DELETE (yet). That entry barrier can easily dissuade potential users despite the good things I mentioned.

If there is a will, there is a way! We have so far taken advantage of the new feature that supports a more granular partitioning strategy (by week, by day or something else). With more …

[Read more]
ClickHouse MySQL Silicon Valley Meetup Wednesday, October 25 at Uber Engineering with Percona’s CTO Vadim Tkachenko

I will be presenting at the ClickHouse MySQL Silicon Valley Meetup on Wednesday, October 25, 2017, at 6:30 PM.

ClickHouse is a real-time analytical database system. Even though it is only celebrating one year as open source software, it has already proved itself ready for serious workloads. We will talk about ClickHouse in general, some internals and why it is so fast. ClickHouse works in conjunction with MySQL – traditionally weak for analytical workloads – and this presentation demonstrates how to make the two systems work together.

My talk will cover how we can improve the experience with real-time analytics using ClickHouse, and how we can …

[Read more]
Big Dataset: All Reddit Comments – Analyzing with ClickHouse

In this blog, I’ll use ClickHouse and Tabix to look at a new very large dataset for research.

It is hard to come across interesting datasets, especially big ones (and by big I mean one billion rows or more). Previously, I’ve used the on-time airline performance data available from the Bureau of Transportation Statistics. Another recent example is the NYC Taxi and Uber Trips data, with over one billion records.

However, today I wanted to mention an interesting dataset I found recently that has been available since 2015. This is Reddit’s comments and submissions dataset, made possible thanks to Reddit’s generous API. The …

[Read more]
Upcoming Webinar September 14, 2017: Supercharge Your Analytics with ClickHouse

Join Percona’s CTO Vadim Tkachenko @VadimTk and Altinity’s Co-Founder, Alexander Zaitsev as they present Supercharge Your Analytics with ClickHouse on Thursday, September 14, 2017, at 10:00 am PDT / 1:00 pm EDT (UTC-7).

Reserve Your Spot

ClickHouse is a real-time analytical database system. Even though it is only celebrating one year as open source software, it has already proved itself ready for serious …

[Read more]
Nested Data Structures in ClickHouse

In this blog post, we’ll look at nested data structures in ClickHouse and how this can be used with PMM to look at queries.

Nested structures are not common in Relational Database Management Systems. Usually, it’s just flat tables. Sometimes it would be convenient to store unstructured information in structured databases.

We are working to adapt ClickHouse as long-term storage for Percona Monitoring and Management (PMM), and particularly to store detailed information about queries. One of the problems we are trying to solve is counting the different errors that cause a particular query to fail.
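
One way to picture this counting problem is the following Python sketch. It models a nested column as two parallel arrays per row (error codes and their counts), which mirrors how ClickHouse lays out nested structures as arrays of equal length, and totals the counts per query. The field names `error_codes` and `error_counts`, the error codes and the counts are all hypothetical and chosen only for illustration; this is not PMM's actual schema.

```python
from collections import Counter, defaultdict

# Hypothetical rows: each row carries two parallel arrays of equal length,
# one with error codes and one with how often each code occurred.
rows = [
    {
        "date": "2017-08-17",
        "query": "SELECT foo FROM bar WHERE id=?",
        "error_codes": [1001, 1002],  # made-up codes
        "error_counts": [5, 2],
    },
    {
        "date": "2017-08-17",
        "query": "SELECT foo FROM bar WHERE id=?",
        "error_codes": [1001],
        "error_counts": [3],
    },
]


def errors_per_query(rows):
    """Sum error counts per (query, error code) across all rows."""
    totals = defaultdict(Counter)
    for row in rows:
        # Walk the two arrays in lockstep, like unrolling a nested column.
        for code, count in zip(row["error_codes"], row["error_counts"]):
            totals[row["query"]][code] += count
    return totals


totals = errors_per_query(rows)
```

The point of the design is that a query can carry a variable number of error kinds without a separate table: the arrays grow per row, and aggregation unrolls them.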

For example, for date 2017-08-17 the query:

"SELECT foo FROM bar WHERE id=?" …
[Read more]
ClickHouse: One Year!

In this blog, we’ll look at ClickHouse on its one year anniversary.

It’s been a year already since the Yandex team released ClickHouse as open source software. I’ve had an interest in this project from the very start, as I didn’t think there was an open source analytical database that could compete with industry leaders like Vertica (for example).

This was an exciting year for ClickHouse early adopters. Let’s look at what it has accomplished so far.

ClickHouse initially generated interest due to the Yandex name – the most popular search engine in Russia. It wasn’t long before jaw-dropping responses popped up: guys, this thing is crazy fast! Many early adopters who tried ClickHouse were really impressed.

Fast doesn’t mean convenient …

[Read more]
On Apache Ignite, Apache Spark and MySQL. Interview with Nikita Ivanov

“Spark and Ignite can complement each other very well. Ignite can provide shared storage for Spark so state can be passed from one Spark application or job to another. Ignite can also be used to provide distributed SQL with indexing that accelerates Spark SQL by up to 1,000x.”–Nikita Ivanov.

I have interviewed Nikita Ivanov, CTO of GridGain.
The main topics of the interview are Apache Ignite, Apache Spark and MySQL, and how well they perform on big data analytics.

RVZ

Q1. What are the main technical challenges of SaaS development projects?

Nikita Ivanov: SaaS requires that the applications be highly responsive, reliable and web-scale. SaaS development projects face many of the same challenges as …

[Read more]