Showing entries 11 to 20 of 28
« 10 Newer Entries | 8 Older Entries »
Displaying posts with tag: ClickHouse (reset)
MariaDB Track at Percona Live

There is less than one month left until Percona Live. This time, the committee work was a bit unusual. Instead of having one big committee for the whole conference, we had a few mini-committees, each responsible for a track. Each independent mini-committee, in turn, had a leader who was responsible for the whole process. I led the MariaDB track. In this post, I want to explain how we worked, which topics we chose, and why.

For MariaDB, we had seven slots (five for 50-minute talks and two for 25-minute talks) and 19 submissions, so we had to reject roughly two out of three proposals. We also had to decide how many topics the program should cover. My aim here was to use the MariaDB track to demonstrate as many unique MariaDB features as …

[Read more]
ClickHouse Performance Uint32 vs Uint64 vs Float32 vs Float64

While implementing ClickHouse for query execution statistics storage in Percona Monitoring and Management (PMM), we faced the question of which data type to choose for the metrics we store. It came down to this: what is the difference in performance and space usage between the UInt32, UInt64, Float32, and Float64 column types?
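
As a rough baseline for the space half of that question, the sketch below computes the raw, uncompressed width of each of the four types. It is not taken from the post: the helper names are hypothetical, and it says nothing about on-disk size in ClickHouse, which compresses column data. It also includes a small check showing that a 32-bit float cannot represent every integer above 2^24, which may matter when the metric is a counter.

```python
import struct

# Fixed-width little-endian encodings matching the four column types.
# struct codes: I = unsigned 32-bit, Q = unsigned 64-bit,
#               f = 32-bit float,    d = 64-bit float
FORMATS = {"UInt32": "<I", "UInt64": "<Q", "Float32": "<f", "Float64": "<d"}


def raw_bytes_per_value(type_name):
    """Uncompressed width in bytes of one value of the given type."""
    return struct.calcsize(FORMATS[type_name])


def raw_column_bytes(type_name, rows):
    """Uncompressed size of a column holding `rows` values (no compression)."""
    return raw_bytes_per_value(type_name) * rows


def survives_float32(n):
    """True if the integer n round-trips through a 32-bit float unchanged."""
    packed = struct.pack("<f", float(n))
    return struct.unpack("<f", packed)[0] == float(n)


if __name__ == "__main__":
    for name in FORMATS:
        print(name, raw_bytes_per_value(name), "bytes per value")
```

The raw widths only bound the uncompressed footprint; actual storage and query cost still have to be measured on real data, which is what the benchmark in the post sets out to do.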

To test this, I created a test table with an abbreviated and simplified version of the main table in our ClickHouse Schema.

The “number of queries” is stored four times in four different columns so that we can benchmark queries referencing different columns. We can do this with ClickHouse because it is a column store and it works only with columns referenced by the query. This method would not be appropriate for testing on …

[Read more]
Replicating data into Clickhouse

ClickHouse is a relatively new analytics and data warehouse engine that provides very quick insertion and analysis of data. Like most analytics platforms, it is built on column-oriented storage and, unlike many alternatives, it is completely open source. It’s also exceedingly fast, even on relatively modest platforms.

ClickHouse does differ from other environments in some ways: for example, inserted data cannot easily be updated, and it supports a number of different storage and table engine formats that are used to store and index the information. So how do we get data into it from our MySQL transactional store?

Well, you can do dumps and loads, or you could use Tungsten Replicator to do that for you. The techniques I’m going to describe here are not in an active release, but use the same principles as other parts of our data loading.

We’re going to use the CSV-based batch loading system that is …

[Read more]
Percona Live Europe Presents: ClickHouse at Messagebird: Analysing Billions of Events in Real-Time

We’ll look into how ClickHouse allows us to ingest a large amount of data and run complex, interactive analytical queries at MessageBird. We also present the business needs that brought ClickHouse to our attention and detail the journey to its deployment. We cover the problems we faced and how we dealt with them. We talk about our current cloud production setup and how we deployed it and use it.

We are really enthusiastic to share a use case of ClickHouse and how it helped us to scale our analytics stack, with the good, the bad and the ugly.

The talk could be useful to newcomers and to everyone wondering whether ClickHouse could be useful to them.

What we’re looking forward to…

There are many talks, but these are the ones we’re particularly looking forward to:

[Read more]
ClickHouse: Two Years!

Following my post from a year ago https://www.percona.com/blog/2017/07/06/clickhouse-one-year/, I wanted to review what happened in ClickHouse during this year.
There is indeed some interesting news to share.

1. ClickHouse in DB-Engines Ranking. It did not quite get into the top 100, but the gain from position 174 to 106 is still impressive. Its DB-Engines Ranking score tripled from 0.54 last September to 1.57 this September.

And indeed, in my conversations with customers and partners, the narrative has changed from: “ClickHouse, what is …

[Read more]
Easy and Effective Way of Building External Dictionaries for ClickHouse with Pentaho Data Integration Tool

In this post, I provide an illustration of how to use the Pentaho Data Integration (PDI) tool to set up external dictionaries in MySQL to support ClickHouse. Although I use MySQL in this example, you can use any PDI-supported source.

ClickHouse

ClickHouse is an open-source column-oriented DBMS (columnar database management system) for online analytical processing. Source: wiki.

Pentaho Data Integration

Information from the Pentaho wiki: Pentaho Data Integration (PDI, also called Kettle) is the component of Pentaho responsible for the Extract, Transform and Load (ETL) processes. Though ETL tools are most frequently used in data warehouse environments, PDI can also be used for other purposes:

  • Migrating data between …
[Read more]
Analyze Your Raw MySQL Query Logs with ClickHouse

In this blog post, I’ll look at how you can analyze raw MySQL query logs with ClickHouse.

For typical query performance analyses, we have an excellent tool in Percona Monitoring and Management. You may want to go deeper, though. You might be longing for the ability to query raw MySQL “slow” query logs with SQL.

There are a number of tools to load the MySQL slow query logs into a variety of data stores. For example, you can find posts showing how to do it with LogStash. While very flexible, these solutions always look too complicated and limited in functionality to me.

By far the best solution to parse and load MySQL slow query logs (among multiple log types supported) is Charity …

[Read more]
Archiving MySQL Tables in ClickHouse

In this blog post, I will talk about archiving MySQL tables in ClickHouse for storage and analytics.

Why Archive?

Hard drives are cheap nowadays, but storing lots of data in MySQL is not practical and can cause all sorts of performance bottlenecks. To name just a few issues:

  1. The larger the table and index, the slower the performance of all operations (both writes and reads)
  2. Backup and restore for terabytes of data is more challenging, and if we need to have redundancy (replication slave, clustering, etc.) we will have to store all the data N times

The answer is archiving old data. Archiving does not necessarily mean that the data will be permanently removed. Instead, the archived data can be placed into long-term storage (e.g., AWS S3) or loaded into a …

[Read more]
Updating/Deleting Rows From Clickhouse (Part 2)

In this post, we’ll look at updating and deleting rows with ClickHouse. It’s the second of two parts.

In the first part of this post, we described the high-level overview of implementing incremental refresh on a ClickHouse table as an alternative support for UPDATE/DELETE. In this part, we will show you the actual steps and sample code.

Prepare Changelog Table

First, we create the changelog table below. It can be stored on any other MySQL instance, separate from the source of our analytics table. When we run the change capture script, it records the data in this table, which we can consume later with the incremental refresh script:

CREATE TABLE `clickhouse_changelog` (
  `db` varchar(255) NOT NULL …
[Read more]
Updating/Deleting Rows with ClickHouse (Part 1)

In this post, we’ll look at updating and deleting rows with ClickHouse. It’s the first of two parts.

Update: Part 2 of this post is here.

ClickHouse is fast – blazing fast! It’s quite easy to pick up, and with ProxySQL integrating with existing applications already using MySQL, it’s way less complex than using other analytics options. However, ClickHouse does not support UPDATE/DELETE (yet). That entry barrier can easily dissuade potential users despite the good things I mentioned.

If there is a will, there is a way! We have so far taken advantage of the new feature that supports a more granular partitioning strategy (by week, by day or something else). With more …

[Read more]