Showing entries 22606 to 22615 of 44106
« 10 Newer Entries | 10 Older Entries »
Doing your own on-time flight time analysis Part III

In the last post, the data from the on-time flight database was loaded in a column-orientated storage engine. Now the numbers can be crunched.

The original goal of this exercise was to find the flight from Los Angeles International Airport, LAX, to Dallas Fort Worth International Airport, DFW, that was the most likely to arrive on-time.

The data is 'opportunity rich' in that there is a lot information in there. It is easy to start wondering about the various nuggets of information in there. Are their certain aircraft (tail numbers) that are routinely bad performers? Are some days of the week better than others? Do national holidays have an effect on the on-time performance? If you are delayed, is there a 'regular amount' of delay? Does early departure make for an early arrival? Can the flight crew make up for a late departure? How much time is usually spend on runways?

But to look for the flight from LAX …

[Read more]
Translation of Summary of Part 1 of "Methods for searching errors in SQL application" just published

There is not much new content this week. Just summary of what was written before.




Summary.


In the summary of this part I'd want to recommend following:



Try to find a query which causes inconsistent behavior



Use log files:



  1. General query log
  2. Log files in your application
  3. MySQL Proxy or any other suitable proxy
  4. Others



Analyze what is wrong, then solve the problem using your findings



Below I place a list of methods which we studied in the first part.


Method #1: use output operator to output query in exactly same way in which RDBMS gets it. …

[Read more]
Problems with subquery transformations

MySQL transforms IN predicates into nearly-equivalent EXISTS predicates, and uses a set of mechanisms to catch all special cases. This transformation, outlined below, also applies to NOT IN predicates since a NOT IN b parses to NOT a IN b. Expressions such as
SELECT * FROM table1 WHERE column1 IN (SELECT column2 FROM table2)
get transformed into
SELECT * FROM table1 WHERE EXISTS ( SELECT column2 FROM table2 WHERE column1 = column2 OR column2 IS NULL )
since the latter can be executed more efficiently. For instance an index can be used for accessing only the rows of table2 that will actually participate in the final result. The added WHERE condition is called a guarded condition, or trigger condition, both terms are used. In order to get correct result, the condition column1 = column2 must be deactivated if we are comparing against a NULL value for column1. This guarantees correct results for most cases but there are some …

[Read more]
detecting and removing unused indexes in MySQL

MySQL performance depends on a balanced usage of MySQL indexes. While it is easy to add an index and identify queries not using indexes via EXPLAIN during development or slow.log it is a lot harder to get rid of unused indexes. Finding and removing them might be crucial for your performance as indexes can create a remarkable cpu cycle and i/o overhead during updates to tables (INSERT/UPDATE/DELETE).

The default MySQL community edition server from mysql.com or your Linux/BSD distribution (which you shouldn't use for a lot of reasons anyway) is not yet helpfull in this regard. There are however inofficial patches for advanced statistics that provide the details needed for optimizing your list of indexes. The easiest way to get started with a patched MySQL server is using a pre-patched binary. At Moviepilot an OurDelta's pre-patchted MySQL 5.0 server that includes the …

[Read more]
Making “Replace Into” Fast, by Avoiding Disk Seeks

In this post two weeks ago, I explained why the semantics of normal ad-hoc insertions with a primary key are expensive because they require disk seeks on large data sets. Towards the end of the post, I claimed that it would be better to use “replace into” or “insert ignore” over normal inserts, because the semantics of these statements do NOT require disk seeks. In this post, I explain how the command “replace into” can be fast with fractal trees.

The semantics of “replace into” are as follows:


  • if the primary (or unique) key does not exist, insert the new row
  • if the primary (or unique) key does exist, overwrite the existing row with the new row

The slow, expensive way B-trees use to implement these semantics are:

[Read more]
Translation of Summary of Part 1 of "Methods for searching errors in SQL application" just published

There is not much new content this week. Just summary of what was written before.




Summary.


In the summary of this part I'd want to recommend following:



Try to find a query which causes inconsistent behavior



Use log files:



  1. General query log
  2. Log files in your application
  3. MySQL Proxy or any other suitable proxy
  4. Others



Analyze what is wrong, then solve the problem using your findings



Below I place a list of methods which we studied in the first part.


Method #1: use output operator to output query in exactly same way in which RDBMS gets it. …

[Read more]
SQL past and future

Ken North, writing in Dr. Dobb's Journal, gives a nice overview of the long and storied history of SQL. The piece helps one understand the wave of mergers among the big database vendors, and make sense of current trends in database and database-like software. And I'd like to offer my opinion about where SQL and database management systems are headed.

North looks into the claims that 'the database is dead' and finds that — yet again — reports of its death were greatly exaggerated:Forrester Research recently estimated the total database market (licenses, support, consulting) would grow from $27 billion in 2009 to $32 billion by 2012. SQL technology is entrenched in many organizations and across millions of web sites. Perhaps that explains why, during the past decade, IBM, Oracle, Sun and SAP made billion-dollar investments in a …

[Read more]
451 CAOS Links 2010.06.29

Elephants on parade: Hadoop goes mainstream. And more.

Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”

Elephants on parade
# Cloudera launched v3 of its Distribution for Hadoop and released v1 of Cloudera Enterprise.

# Karmasphere released new Professional and Analyst Editions of its Hadoop development and deployment studio.

# Talend announced that its Integration Suite now offers native support for Hadoop.

# Yahoo …

[Read more]
Debugging InnoDB Locks using the new InnoDB Plugin’s Information Schema Tables

Tracking down InnoDB lock information using the new Information Schema tables provided with the InnoDB plugin has never been easier.

Long story short, the other day I was trying to identify what transaction was holding the lock for a particular UPDATE. The UPDATE would not complete and kept timing out with “Lock wait timeout exceeded; try restarting transaction”.

Of course I checked the output of SHOW ENGINE INNODB STATUS. From that output, I could tell it was the replication thread holding the lock, *since* it was the only other transaction running. But I could not verify this with output, it just had to be the case.

Next I enabled the InnoDB lock monitor, and examined that output too. Here, it showed more detail on the locks being held, however, it only shows the first 10 locks held by the replication thread. In this case, the …

[Read more]
Rows and Columns, We Can Definitely Live Together

In the world of relational database, we are all aware of the benefits that row oriented databases deliver to businesses. They have been designed to support the relational model, have driven the broad adoption of an approachable open language for managing data (SQL), and have been the foundation for high transaction applications including online airline reservation systems, massive online commerce solutions and massively multiplayer online games.

Over the course of many years these entrenched row oriented databases have been challenged by the likes of object/relational, NoSQL solutions and even Map/Reduce (though it is certainly not a database). Sometimes row-oriented databases are either overkill or not a suitable fit for the task at hand. NoSQL differs significantly from the row oriented relational database, especially when it comes to dealing with large volumes of unstructured data. As an example it may be useful for rapidly storing and …

[Read more]
Showing entries 22606 to 22615 of 44106
« 10 Newer Entries | 10 Older Entries »