Showing entries 1 to 7
Displaying posts with tag: General BI (reset)
NoSQL Now 2011: Review of AdHoc Analytic Architectures

For those that weren’t able to attend the fantastic NoSQL Now Conference in San Jose last week, but are still interested in the slides about how people are doing Ad Hoc analytics on top of NoSQL data systems, here’s my slides from my presentation:

No sql now2011_review_of_adhoc_architectures View more presentations from ngoodman We obviously continue to hear from our community that LucidDB is a great solution sitting in front of a Big Data/NoSQL system. Allowing easy SQL access (including super fast, analytic database cached views) is a big win for reducing …

[Read more]
DynamoBI: website? bits?

Well, what a soft launch it has been.
Some people have asked:

When are you going to get a website? Errr…. Soon! We soft launched a bit early, due to some “leaking information” but figured heck, it’s open source let’s let it all out. Soon enough, I swear!

Where can I download DynamoDB? Errr… you can’t yet cause we haven’t finished our build/QA/certification process.

However, since DynamoDB is the alter ego business suit wearing brother of LucidDB, just download the 0.9.2 release if you want to get a sense of what DynamoDB is.

There are 3 built binaries (Linux 32, Linux 64, and Windows 32): and you can find installation instructions …

[Read more]

SQL is great. There are millions of people who know it, it’s semantics are standardized (well, as much as anything is ever I suppose), and it’s a great language for interacting with data. The folks at SQLstream have taken a core piece of the technology world, and turbocharged it for a whole new application: querying streams of data. You can think of what SQLstream is doing as the ‘calculus’ of SQL - not just query results of static data in databases but query results of a stream of records flowing by.

Let’s consider some of the possibilities. Remember the results aren’t a one time poll - the values will continue to be updated as the query continues.

// Select the stocks Average price over the last 10 ticks
select stream stock, avg(price) over (previous 10) from stock_ticker group by stock …

[Read more]
Kettles secret in-memory database

Kettles secret in-memory database is

  1. Not actually secret
  2. Not actually Kettles

There. I said it, and I feel much better.
In most circumstances, Kettle is used in conjunction with a database. You are typically doing something with a database: INSERTs, UPDATEs, DELETEs, UPSERTs, DIMENSION UPDATEs, etc. While I do know of some people that are using Kettle without a database (think log munching and summarization) a database is something that a Kettle developer almost always has at their disposal.

Sometimes there isn’t a database. Sometimes you don’t want the slowdown of persistence in a database. Sometimes you just want Kettle to just have an in memory blackboard across transformations. Sometimes you want to ship an example to a customer using database operations but don’t want to fuss with database install, dump files, etc.

Kettle ships with a Hypersonic driver, and therefore, …

[Read more]
MySQL Archive Tablespace for FACTs

I’m visiting a Pentaho customer right now whose current “transaction” volume is 200 million rows per day.  Relatively speaking, this puts their planned warehouse in the top quintile of size.  They will face significant issues with load times, data storage, processing reliability, etc.  Kettle is the tool they selected and it is working really well.  Distributed record processing using Kettle and a FOSS database is a classic case study for Martens scale out manifesto

This organization doesn’t have unlimited budget.  Specifically, they don’t have a telecom type budget for their telecom like volume of data.  One of the issues that has come up with their implementation has been the tradeoff between space, and keeping the …

[Read more]
2gig + 2gig = 50gig

I was recently writing up a volume and performance specification for a customer project when a discussion arose with the current DBA staff about volume projections. The intuitive thinking was the volume requirements for the BI/Data Warehouse would be the sum of the systems from which it sourced data. The group was thinking this would be a good way to approximate the required space for the system. I asserted this method is flawed, and had to suggest why that BI volume is proportionate but not necessariliy directly proportionate.
BI systems required much greater storage than the sum of their sources because:

  • Data are denormalized. With denormalizing data to increase query performance one increases storage from 1-10000 times (it depends on the data, perhaps more).
  • New Data are created. There are many analytically significant items that occur that never even show up in …
[Read more]
Database Operation Complexity Reference

I mentioned this previously, but I’ve been reading “Principles of Distributed Database Systems.” I’m enjoying it, and it’s helping me solidify many of the concepts that I apply daily in my capacity as a Principal BI Solutions consultant. Database theory, and specifically as it relates to tuning is part of any professionals work with Oracle. We’ve all deciphered the performance implications and clues for improvements from EXPLAIN PLAN. I’ve always been told you want to be able to give Oracle the clues/configurations to enable filter of results (selection) before joining. It always made sense but I had never understood fully the concepts behind these recommendations. Until now…

I don’t espouse to …

[Read more]
Showing entries 1 to 7