Showing entries 1 to 8
Displaying posts with tag: data mining
More Metadata Is Written Into Binary Log

In row-based replication, the row data generated by DML is logged into the binary log along with some metadata, for example column type and type length. In the new release, MySQL 8.0.1, more table metadata is written into the binary log. Together, this metadata brings users and us the following benefits:

  • Allows us to build more robust replication and convert row data between types smoothly.
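
To make the type-conversion point concrete, here is a minimal sketch of why a reader of row data needs more than the type and its length. It is not MySQL's binary log format or any MySQL API: the `ColumnMeta` class, its fields, the `decode_int` helper, and the little-endian byte order are all assumptions made for illustration, including the idea that signedness is among the extra metadata.

```python
from dataclasses import dataclass


@dataclass
class ColumnMeta:
    """Hypothetical per-column metadata travelling with a row event."""
    name: str        # assumed extra metadata
    length: int      # storage length in bytes
    unsigned: bool   # assumed extra metadata


def decode_int(raw: bytes, meta: ColumnMeta) -> int:
    """Decode an integer column value (little-endian is assumed here)."""
    if len(raw) != meta.length:
        raise ValueError(f"{meta.name}: expected {meta.length} bytes, got {len(raw)}")
    return int.from_bytes(raw, "little", signed=not meta.unsigned)


raw = b"\xff"  # the same stored byte, read two ways
as_signed = decode_int(raw, ColumnMeta("qty", 1, unsigned=False))
as_unsigned = decode_int(raw, ColumnMeta("qty", 1, unsigned=True))
print(as_signed, as_unsigned)  # -1 255
```

With only the type and length, the two readings cannot be told apart, so a conversion to a wider or differently signed column on the receiving side would have to guess.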

Getting Data into Hadoop in real-time

Moving data between databases is hard. Without ever intending it, I seem to have spent a lifetime working on solutions for getting data into and out of databases, but more frequently between them. In fact, my first job out of university was migrating data from BRS/Text, a free-text database (probably what we would call a NoSQL database today), into a more structured Oracle database.

Today I spend some of my time working in Big Data, more often than not migrating information from existing data stores into Big Data so that it can be analysed, something I covered in more detail here:

http://www.ibm.com/developerworks/library/bd-sqltohadoop1/index.html
http://www.ibm.com/developerworks/library/bd-sqltohadoop2/index.html

[Read more]
Four short links: 21 October 2010
  1. Using MySQL as NoSQL -- 750,000+ qps on a commodity MySQL/InnoDB 5.1 server from remote web clients.
  2. Making an SLR Camera from Scratch -- amazing piece of hardware devotion. (via hackaday.com)
  3. Mac App Store Guidelines -- Apple announce an app store for the Macintosh, similar to its app store for iPhones and iPads. "Mac App" no longer means generic "program", it has a new and specific meaning, a program that must be installed through the App store and which has limited functionality …
[Read more]
Four short links: 10 December 2009
  1. Scriblio -- open source CMS and catalogue built on WordPress, with faceted search and browse. (via titine on Delicious)
  2. Useful Temporal Functions and Queries -- SQL tricksies for those working with timeseries data. (via mbiddulph on Delicious)
  3. Optimal Starting Prices for Negotiations and Auctions -- Mind Hacks discussion of a research paper on whether high or low initial prices lead to higher price outcomes in negotiations and online auctions. Many negotiation books recommend waiting for the other side to …
[Read more]
Four short links: 1 December 2009
  1. Apertus -- open source cinema camera. (via joshua on Delicious)
  2. A Survey of Collaborative Filtering Techniques -- From basic techniques to the state-of-the-art, we attempt to present a comprehensive survey for CF techniques, which can be served as a roadmap for research and practice in this area. (via bos on Delicious)
  3. Drizzle Replication using RabbitMQ as Transport -- we're watching the growing use of message queues in web software, and here's an interesting application. (via …
[Read more]
Four short links: 26 October 2009
  1. Toiling in the Data Mines -- Tom Armitage describes the process that Berg calls "material exploration". Programmers very rarely talk about what their work feels like to do, and that's a shame. Material explorations are something I've really only done since I've joined BERG, and both times have felt very similar - in that they were very, very different to writing production code for an understood product. They demand code to be used as a sculpting tool, rather than as an engineering material, and I wanted to explain the knock-on effects of that: not just in terms of what I do, and the kind of code that's appropriate for that, but also in terms of how I feel as I work on these explorations. Even if the section on the code itself feels foreign, I hope that the explanation of what it …
[Read more]
Business Intelligence for the People



Business intelligence has been talked about for quite a while. Even today, while companies are looking to make budget cuts, some experts are saying that BI can be used to beat the recession.

When I hear about BI systems, the first thing that comes to my mind is a huge and expensive system with very powerful servers that sucks in data from many sources and runs an intensive and even more expensive reporting suite. Since I have been involved in projects to set those systems up, I know that one can probably take around a year to complete.

So everyone is in fact thinking about saving money yet still being …

[Read more]
SQL Puzzle

Dear lazyweb,

I want to mine a code repository for data to map past bugs to source code files.

I have written a small PHP script (the initial version of the script can be found here) to import the relevant data from a Subversion repository into the following tables of a relational database:

bugs            changes         paths
--------        --------        -------
bug_id          path_id   <-->  path_id
revision  <-->  revision        path

What I need now is two queries to ask the database for

  • paths that are most commonly changed during bugfix commits and
  • paths that are commonly changed together …
[Read more]
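
One possible shape for the two queries, sketched against the three tables above. This is not the answer from the original post: the in-memory SQLite database, the sample rows, and the variable names are made up for illustration, and the sketch assumes each (path_id, revision) pair appears at most once in `changes`. Because the second requirement is cut off in the excerpt, the second query simply counts co-changes across all commits.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bugs    (bug_id INTEGER, revision INTEGER);
    CREATE TABLE changes (path_id INTEGER, revision INTEGER);
    CREATE TABLE paths   (path_id INTEGER PRIMARY KEY, path TEXT);

    -- Made-up sample data: three files, four commits, three bugfix commits.
    INSERT INTO paths   VALUES (1, 'a.php'), (2, 'b.php'), (3, 'c.php');
    INSERT INTO changes VALUES (1, 1), (2, 1), (1, 2), (2, 2), (1, 3), (3, 3), (3, 4);
    INSERT INTO bugs    VALUES (10, 1), (11, 2), (12, 3);
""")

# Paths most commonly changed during bugfix commits. COUNT(DISTINCT ...)
# avoids double counting when one revision fixes several bugs.
hot_paths = conn.execute("""
    SELECT p.path, COUNT(DISTINCT c.revision) AS fixes
    FROM bugs b
    JOIN changes c ON c.revision = b.revision
    JOIN paths   p ON p.path_id  = c.path_id
    GROUP BY p.path_id, p.path
    ORDER BY fixes DESC, p.path
""").fetchall()

# Pairs of paths changed in the same revision. The self-join with
# c1.path_id < c2.path_id lists each unordered pair exactly once.
pairs = conn.execute("""
    SELECT p1.path, p2.path, COUNT(*) AS together
    FROM changes c1
    JOIN changes c2 ON c2.revision = c1.revision AND c1.path_id < c2.path_id
    JOIN paths   p1 ON p1.path_id = c1.path_id
    JOIN paths   p2 ON p2.path_id = c2.path_id
    GROUP BY p1.path_id, p1.path, p2.path_id, p2.path
    ORDER BY together DESC, p1.path, p2.path
""").fetchall()

print(hot_paths)
print(pairs)
```

To restrict the second query to bugfix commits only, one option is to add `WHERE c1.revision IN (SELECT revision FROM bugs)`.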