Showing entries 1 to 10 of 13
3 Older Entries »
Displaying posts with tag: Normalization (reset)
Using a parser plugin for improved search results with MySQL 5.7 and InnoDB.

With Unicode it is possible for strings to look the same, but with slight differences in which codepoints are used.

For example the é in Café can be <U+0065 U+0301> or <U+00E9>.

The solution is to use Unicode normalization, which is supported in every major programming language. Both versions of Café will be normalized to use U+00E9.

In the best situation the application inserting data into the database will do the normalization, but that often not the case.

This gives the following issue: If you search for Café in the normalized form it won't return non-normalized entries.

I made a proof-of-concept parser plugin which indexes the normalized version of words.

A very short demo:


mysql> CREATE TABLE test1 (id int auto_increment primary key,
-> txt TEXT CHARACTER SET utf8mb4, fulltext (txt));
Query OK, 0 rows affected (0.30 sec)

mysql> …
[Read more]
How to structure and design a relational database to support you data storage needs?

Well, every now and then, when we began to start a new project or app, which has some data storage requirement, we have a deep intriguing thought as to how best represent the data structure so as to support a variety of needs including but not limited to (ACID rules):

1. Normalization
2. Reliability
3. Consistency
4. And many others

Below, I provide a set of steps which you can follow to arrive at a data model that correctly suites your requirements.

Steps:

1. Identify the project or app requirements / specifications and business rules which tell you what your app will be able to do when it is ready.
2. From these business rules, identify possible objects for each business rule and mark them in a paper using rectangular sections like authors, posts etc.
3. Once you have recognized the …

[Read more]
Normalize Your MySQL Database For Efficient Data Access, Query Flexibility and Maintenance

Before you start creating database entities, spend some time designing your database to ensure that it is fit for your purpose. The time you invest in this process saves a lot of time and trouble later. Professional database designers fine-tune their design using a process called 'normalization'. The normalization process takes your database design through a number of 'normal forms', which aim to ensure efficient data access, greater query flexibility, and easier maintenance.

For example, the First Normal Form (or '1NF') ensures that all your database columns contain only a single value. A column that contains multiple data values is difficult to access and keep up to date. It also ensures that each table row only represents a single 'real world' item. Like all the other normal forms, this encourages you to split your data across multiple tables, with less rows in each table. You can quickly see the benefits of this approach as your database …

[Read more]
Every Relation is in First Normal Form, by definition. Every Table may not.

Note: I use a dual terminology(or even triple considering Database Design theory) when I am talking about Databases and their elements.   In Relational Model terminology

  1. Relation
  2. Tuple
  3. Attribute

In SQL terminology

  1. Table            
  2. Row
  3. Column

 

read more

Dynamic de-normalization of attributes stored in key-value pair tables

Dear Kettlers,

A couple of years ago I wrote a post about key/value tables and how they can ruin the day of any honest person that wants to create BI solutions.  The obvious advice I gave back then was to not use those tables in the first place if you’re serious about a BI solution.  And if you have to, do some denormalization.

However, there are occasions where you need to query a source system and get some report going on them.  Let’s take a look at an example :

mysql> select * from person;
+----+-------+----------+
| id | name  | lastname |
+----+-------+----------+
|  1 | Lex   | Luthor   |
|  2 | Clark | Kent     |
|  3 | Lois  | Lane     |
+----+-------+----------+
3 rows in set (0.00 sec)

mysql> select * from person_attribute;
+----+-----------+---------------+------------+
| id | person_id | attr_key      | attr_value | …
[Read more]
Databases: Normalization or Denormalization. Which is the better technique?

This has really been a long debate as to which approach is more performance orientated, normalized databases or denormalized databases. So this article is a step on my part to figure out the right strategy, because neither one of these approaches can be rejected outright. I will start of by discussing the pros and cons of both the approaches. Pros and Cons of a Normalized database design. Normalized databases fair very well under conditions where the applications are write-intensive and the write-load is more than the read-load. This is because of the following reasons: Normalized tables are usually smaller and...

Data Warehousing Best Practices: Comparing Oracle to MySQL pt 1

At Kscope this year, I attended a half day in-depth session entitled Data Warehousing Performance Best Practices, given by Maria Colgan of Oracle. My impression, which was confirmed by folks in the Oracle world, is that she knows her way around the Oracle optimizer.

These are my notes from the session, which include comparisons of how Oracle works (which Maria gave) and how MySQL works (which I researched to figure out the difference, which is why this blog post took a month after the conference to write). Note that I am not an expert on data warehousing in either Oracle or MySQL, so these are more concepts to think about than hard-and-fast advice. In some places, I still have questions, and I am happy to have folks comment and contribute what they know.

One interesting point brought up:
Maria quoted someone (she said the name but I did not grab it) from …

[Read more]
Performance tuning using vertical partitioning.

Does having small data-sets really help? Of course it does! Are memory lookups faster that disk lookups. Of course ! So many times I have seen people complain about queries taking too long now, while they were not taking that long earlier. There is one big reason for this, earlier the size of data-set was small so it could fit into memory. Now that the data-set has grown large enough that it cannot fit entirely into memory, the disk seeks really have slowed down the queries significantly. What to do now? Vertical partitioning. Divide the data-set into separate data-sets vertically....

How to install MongoDB on CentOS 5.4 / RHEL5 and interface with PHP 5

If you’ve been reading up on the various NoSQL offerings and have wanted to try out one but don’t know how to get started, this is one of the easiest ways. I chose MongoDB for this example because I’m going to start using it for a project that needs features that MySQL isn’t as fast at: namely denormalized data with billions of rows. MongoDB has plenty of drivers for other scripting and high-level languages but I’ll focus on the PHP driver today. If there is interest I can do a write up on Python usage later. This example is limited to CentOS, Fedora, and Redhat 5 servers that use the yum package management system. For more information you can reference their download page: http://www.mongodb.org/display/DOCS/Downloads

First install the prerequisites:

  • sudo yum install gcc php php-pear

Then install the mogo php extension via …

[Read more]
Normalization and smoking



An interesting question came yesterday at the end of the MySQL workshop in Athens. An user has a server with about 40,000 tables, and the performance is not brilliant (oh really?). The reason is that there is an application that requires one or more new tables for any user, and the tables are of about ten different structures.
The user asked the wrong question: "will the performance improve if I change the storage engine from InnoDB to MyISAM?"


Salle gave the first answer. Converting the tables to MyISAM would only exacerbate the problem. Instead of having one file per table, the …

[Read more]
Showing entries 1 to 10 of 13
3 Older Entries »