I am doing a reporting proof-of-concept (POC) for my company. Business intelligence (BI) is often the last thing considered during an application’s life cycle, because it only really becomes necessary once you have customers; before that, the main focus is on application features. Soon after launch, your coworkers and your customers start asking questions about usage and adoption, and customers begin asking for summary views of their data on top of plain data dumps. If you’re the only person in charge of the database, this is often overwhelming.
Thankfully, we now have many options for what’s called self-service BI. The developer or DBA sets up the basic data models (say Orders) in an automated tool like SQL Server Reporting Services (SSRS) and then allows an [Read more...]
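As a rough illustration of what such a base data model can look like (this sketch is not from the original post; all table and column names here are assumed), the DBA often just exposes a flattened view over the application tables so report designers don’t have to write joins:

-- A minimal sketch, assuming hypothetical orders/customers tables:
-- the view is the "data model" that self-service users build reports on.
create view v_orders_reporting as
select o.order_id, o.order_date, c.customer_name, o.status, o.total_amount
from orders o
join customers c on c.customer_id = o.customer_id;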
So this is about a SELECT COUNT(*) FROM tblname without a WHERE clause. MyISAM has an optimisation for that since it maintains a rowcount for each table. InnoDB and PBXT can’t do that (at least not easily) because of their multi-versioned nature… different transactions may see a different number of rows for the same table!
So, it’s fairly well known, but nevertheless often ignored, that this operation is costly on InnoDB in terms of time: to figure out the exact number of rows, InnoDB has to scan the primary key and tally as it goes. Of course it’s faster if it doesn’t have to read a lot of the blocks from disk (i.e. a smaller dataset or a large enough buffer pool).
I was curious about PBXT’s performance on this, and behold it appears to be quite a bit faster! For a table with 50 million rows, PBXT [Read more...]
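As a rough sketch of the trade-off (tblname as in the post; the statistics query is standard MySQL, not something the post proposes):

-- Exact count: InnoDB/PBXT must scan and tally the rows visible to the
-- current transaction; MyISAM answers instantly from its stored rowcount.
select count(*) from tblname;

-- A cheap alternative when an estimate is good enough: table_rows is
-- exact for MyISAM but only an approximation for MVCC engines like InnoDB.
select table_rows from information_schema.tables
where table_schema = database() and table_name = 'tblname';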
I’m seeking help from the MySQL community on what tools are used today to generate complex reports for enterprise applications that use MySQL. In an Oracle world you have Oracle Report Writer; in a Microsoft world, Crystal Reports.
If anybody has experience using or evaluating these tools, please let me know. This may lead to possible work.
Reporting queries (I will use this term here) are the queries which summarize and group data over a certain period of time. For example, on a social network site we want to know how many messages were sent in a given period of time, grouped by region and status (sent, received, etc.), ordered by the number of messages sent.
As an example I will take a table which is used to send SMS (text messages).
SQL (the excerpt omits the FROM clause; the table name sms below is assumed from context):
select concat('+', substring(region_code, 1, 2), 'xxx') as reg, status, count(*) as cnt
from sms
where submition_date between '2009-01-01' and '2009-04-01'
group by reg, status
having cnt > 100
order by cnt desc, status
limit 100;
This query will do a range scan over submition_date and perform a filesort. There are common, well-known approaches which can be used to optimize table [Read more...]
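The excerpt cuts off before listing them, but one common approach is a covering index; here is a sketch, assuming the sms table above:

-- A covering index lets the range scan be resolved entirely from the
-- index (submition_date for the range, region_code and status for the
-- aggregation), so no full rows are read. The filesort for the
-- GROUP BY/ORDER BY on derived columns still remains.
alter table sms
    add index idx_date_region_status (submition_date, region_code, status);

Another widely used approach is to maintain a pre-aggregated summary table keyed by day, region prefix, and status, and run the report against that instead of the raw message table.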
Update for all of those following the progress of Monolith. I’ve been busy this week coding the reporting engine - the code that analyses the hundreds of variables being collected - and I’m pleased to say that Perl was the correct choice for this application. Installation on the client servers will be a breeze, and relatively few non-standard modules are required for the server application.
I’ve been silently wondering for some time whether the reporting engine equations should be part of a pluggable system (stored in the database) or whether they should be hard-coded. After writing up all of the equations, I’ve come to the conclusion that, for a quicker release, the equations and reporting engine will be hard-coded this time around for the default system alerts. However, user-defined alerts will be pluggable. Revisions ahead of 2.0 might feature pluggable main [Read more...]
Large databases, long mysqldump times, long waits for globally locked tables. These problems basically never go away when you rely on mysqldump with --all-databases or a list of databases, as it dumps schemas serially. I’m not going to explain serial vs parallel processing here since that’s a larger topic. Suffice it to say that in these days of multi-core / multi-CPU servers we only make use of one processor’s core when we serially export databases using mysqldump. So, I have a new script that attempts to alleviate those issues, and now I need testers to provide feedback/improvements.
In order to keep some sanity when dealing with hundreds of database servers, the script takes care of the following:
I started using Oracle, an MVCC database, to develop reporting (data warehousing, BI, take your pick) systems years ago. I’ve come to appreciate the scalability improvements that MVCC provides, particularly for pseudo real-time reporting applications, the ones where loads are occurring at the same time as report generation. So when people say InnoDB, partly due to MVCC, isn’t as good as MyISAM for reporting, I had to look into this in more detail.
What I found is that InnoDB is a good engine for reporting. In some ways, such as performance, it is at times better than MyISAM, and one of the downsides, a larger disk requirement, can be mitigated. The trick is for the primary key to be the one predominant access path. In this example, the InnoDB clustered index is purchaseDate and another column, [Read more...]
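A minimal sketch of that layout (all names other than purchaseDate are assumed, since the excerpt is cut off):

-- Leading the composite primary key with purchaseDate makes the InnoDB
-- clustered index match the predominant reporting access path: a
-- date-range scan reads physically adjacent leaf pages instead of
-- hopping around the table.
create table purchases (
    purchaseDate date not null,
    purchaseId bigint not null,  -- assumed second column, just for uniqueness
    region varchar(16) not null,
    amount decimal(10,2) not null,
    primary key (purchaseDate, purchaseId)
) engine=InnoDB;

-- A typical report then touches only the relevant date range:
select region, sum(amount)
from purchases
where purchaseDate between '2009-01-01' and '2009-03-31'
group by region;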