My latest data integration challenge has been with a new node in
my data landscape: a hadoop/hive installation. Since PDI has
become my favorite hammer for many different tasks, I thought it
would be handy to get connected to the hive database via jdbc.
With that ability, I can enhance hive output by including lookups
and joins with operational (MySQL) databases.
Unfortunately, I didn't have much luck using standard connections
with jdbc and table input steps. I suppose this is because the
hive jdbc driver is still in the embryonic stage.
The turning point for my effort was the discovery of the new User
Defined Java Class in Pentaho 4.0 GA. I struggled a bit before
getting this to work, but I now have a simple working example
that returns the result of a hive query to the stream. There was
quite a bit of late-night thrashing, so excuse the unrefined
code.
In summary, the keys to getting the udjc to work …
This chapter covers possible causes of the "Lost connection to MySQL server" error that were not discussed in the previous one.
Chapter 10. Lost connection to MySQL server during query
You can see the error "Lost connection to MySQL server" not only
because connect_timeout is too small, but for other reasons too. In
this chapter we discuss these reasons.
$ php phpconf2009_4.php
string(44) "Lost connection to MySQL server during query"
Most likely, the error log will show what happened:
Rest of the chapter is here
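On the client side, it can help to tell a dropped connection apart from other failures before going to the server's error log. The sketch below is only an illustration and is not taken from the chapter: it assumes the exception carries the client error code as its first argument (as MySQLdb errors do), and the helper names is_lost_connection and describe are hypothetical. The codes 2006 and 2013 are the MySQL client codes for "MySQL server has gone away" and "Lost connection to MySQL server during query".

```python
# MySQL client error codes for a dropped connection.
CR_SERVER_GONE_ERROR = 2006  # "MySQL server has gone away"
CR_SERVER_LOST = 2013        # "Lost connection to MySQL server during query"


def is_lost_connection(exc):
    """Return True if exc carries a client error code for a dropped connection.

    Assumption: the error code is the first element of exc.args.
    """
    args = getattr(exc, "args", ())
    return len(args) > 0 and args[0] in (CR_SERVER_GONE_ERROR, CR_SERVER_LOST)


def describe(exc):
    """Hypothetical helper: turn an error into a short hint for the operator."""
    if is_lost_connection(exc):
        return "connection dropped (code %d): check the server error log" % exc.args[0]
    return "other error: %s" % (exc,)


if __name__ == "__main__":
    # A stand-in exception, so the sketch runs without a database server.
    fake = Exception(2013, "Lost connection to MySQL server during query")
    print(describe(fake))
```

Classifying the error this way only says that the connection was lost, not why; the cause still has to come from the server side.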
In posts on June 30 and July 6, I explained how implementing the commands “replace into” and “insert ignore” with TokuDB’s fractal tree data structures can be two orders of magnitude faster than implementing them with B-trees. Towards the end of each post, I hinted that there are some caveats that complicate the story a little. On July 21st I explained one caveat, secondary keys, and on August 3rd, Rich explained another caveat. In this post, I explain the other …
Prior posts addressed the performance benefits of a shared cache tier
(ScaleDB CAS) and also the storage flexibility it enables. This post
compares the ScaleDB CAS purpose-built file storage sharing
system against off-the-shelf solutions like NFS and various
cluster file systems (CFS).
When using a clustered database, like ScaleDB, each node has full
access to all of the data in the database. This means that the
file system (SAN, NAS, Cloud, etc.) must allow multiple nodes to
share the data in the file system.
Options include:
1. Network File System (NFS)
2. Cluster File System (CFS)
3. Purpose-built file storage interface
Locking Granularity:
I won’t get deeply …
It has been a while since I posted on my blog - in fact, I believe this is the first time ever that more than one month passed between posts since I started blogging. There are a couple of reasons for the lag:
- Matt Casters, Jos van Dongen and I have spent a lot of time finalizing our forthcoming book, Pentaho Kettle Solutions (Wiley, ISBN: 978-0-470-63517-9). The book is currently being produced, and should be available according to schedule in early September 2010. If you're interested, you might like to read …
If you read Percona's whitepaper on Goal-Driven Performance Optimization, you will notice that we define performance using the combination of three separate terms. You really want to read the paper, but let me summarize it here:
- Response Time - This is the time required to complete a desired task.
- Throughput - Throughput is measured in tasks completed per unit of time.
- Capacity - The system's capacity is the point where load cannot be increased without degrading response time below acceptable levels.
Setting and meeting your response time goal should always be your primary focus, but the closer throughput is to capacity the worse response time can be. It's a trade-off! …
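To make the trade-off concrete, here is a minimal sketch that is not from the whitepaper. It assumes the textbook M/M/1 queueing model, where mean response time is R = S / (1 - utilization), and uses a hypothetical 10 ms service time. Note that saturation_rate below is the theoretical maximum throughput of the model, which is a stricter limit than the "acceptable response time" definition of capacity given above.

```python
def mm1_response_time(service_time, throughput):
    """Mean response time in seconds under the M/M/1 model (an assumption).

    service_time: seconds of work per task with no queueing.
    throughput:   tasks completed per second.
    """
    saturation_rate = 1.0 / service_time  # tasks/s at 100% utilization
    utilization = throughput / saturation_rate
    if utilization >= 1.0:
        raise ValueError("throughput must stay below the saturation rate")
    return service_time / (1.0 - utilization)


if __name__ == "__main__":
    service_time = 0.010  # hypothetical: 10 ms per task
    for throughput in (10, 50, 90, 99):
        r = mm1_response_time(service_time, throughput)
        print("%3d tasks/s -> %.1f ms" % (throughput, r * 1000))
```

In this model, response time roughly doubles at 50% utilization and grows without bound as throughput approaches the saturation rate, which is why a response time goal effectively sets the usable capacity well below the theoretical maximum.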
If you’ve been looking for a simple Python script for MySQL that you can expand upon for your next project, check this one out. It has error handling for the connection, error handling for the SQL call, and loop iteration for the rows returned.
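#!/usr/bin/python
import sys
import MySQLdb

# Connection settings; replace with your own.
my_host = "localhost"
my_user = "user"
my_pass = "password"
my_db = "test"

# Connect, printing the error code and message and exiting on failure.
try:
    db = MySQLdb.connect(host=my_host, user=my_user, passwd=my_pass, db=my_db)
except MySQLdb.Error as e:
    print("Error %d: %s" % (e.args[0], e.args[1]))
    sys.exit(1)

cursor = db.cursor()
# Placeholder query; substitute real column and table names.
sql = "select column1, column2 from table"

# Run the query, closing the connection and exiting if the SQL call fails.
try:
    cursor.execute(sql)
    results = cursor.fetchall()
except MySQLdb.Error as e:
    print("Error %d: %s" % (e.args[0], e.args[1]))
    db.close()
    sys.exit(1)

# Iterate over the returned rows.
for row in results:
    column1 = row[0]
    column2 = row[1]
    print("column1: %s, column2: %s" % (column1, column2))

db.close()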
Compliance. Funding. Financial results. Copyright assignment. And more.
Follow 451 CAOS Links live @caostheory on Twitter and Identi.ca
“Tracking the open source news wires, so you don’t have to.”
Compliance
# The Linux Foundation launched the Open Compliance Program, including tools, training, and consulting.
Funding
# VentureBeat reported that Joyent has raised $7m in a second round of funding.
# Basho Technologies secured $2m from angel investors in a Series C preferred equity financing.
# …
Installing A Web, Email And MySQL Database Cluster (Mirror) On Debian 5.0 With ISPConfig 3
This tutorial describes the installation of a clustered Web, Email, Database and DNS server to be used for redundancy, high availability and load balancing on Debian 5 with the ISPConfig 3 control panel. GlusterFS will be used to mirror the data between the servers and ISPConfig for mirroring the configuration files. I will use a setup of two servers here for demonstration purposes but the setup can scale to a higher number of servers with only minor modifications in the GlusterFS configuration files.