|Showing entries 1 to 30 of 30|
Enabling Real-Time MySQL to HDFS Integration
Batch processing delivered by Map/Reduce remains central to Apache Hadoop, but as the pressure to gain competitive advantage from “speed of thought” analytics grows, Hadoop itself is undergoing significant evolution. Technologies that allow real-time queries, such as Apache Drill, Cloudera Impala and the Stinger Initiative, are emerging, supported by a new generation of resource management in Apache YARN.
To support this growing emphasis on real-time operations, we are releasing a new[Read more...]
This article explains how data is organized in the InnoDB storage engine. First we will look at the various files that InnoDB creates, then at the logical data organization: tablespaces, pages, segments and extents. We will explore each of them in some detail and discuss how they relate to each other. By the end of this article, the reader will have a high-level view of the data layout within the InnoDB storage engine.
MySQL stores all data within the data directory. The data directory can be specified using the command-line option --datadir or in the configuration file as datadir. Refer to the Server Command Options for complete details.
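As a rough illustration of the configuration-file route, the sketch below reads the datadir setting from the [mysqld] group of an option file. The file contents and the helper name read_datadir are assumptions made for this example, and Python's configparser only handles simple option files (it does not understand directives such as !includedir), so treat it as a sketch rather than a general parser.

```python
import configparser

# Hypothetical option-file contents, used here only for illustration.
SAMPLE_CNF = """
[mysqld]
datadir = /var/lib/mysql
port = 3306
"""


def read_datadir(cnf_text, default=None):
    """Return the datadir set in the [mysqld] group, or `default` if absent."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # option files may contain bare flags
        strict=False,          # tolerate repeated groups or options
        interpolation=None,    # treat '%' in values literally
    )
    parser.read_string(cnf_text)
    return parser.get("mysqld", "datadir", fallback=default)


if __name__ == "__main__":
    print(read_datadir(SAMPLE_CNF))
```

On the command line the same setting would be passed to the server as --datadir=/var/lib/mysql (the path is again only an example).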
By default, when InnoDB is initialized, it creates 3[Read more...]
MySQL is deployed in 9 of the top 10 most trafficked sites on the web, including Facebook, Twitter, eBay and YouTube, as well as in some of the fastest growing services such as Tumblr, Pinterest and box.com.
Working with these companies has given MySQL developers, consultants and support engineers unique insight into how to design database-driven web architectures – whether deployed on-premise or in the cloud.
The MySQL Web Reference Architectures (http://www.mysql.com/why-mysql/white-papers/mysql-reference-architectures-for-scalable-web-infrastructure/) are a set of documented and repeatable best practices for building infrastructure that delivers the highest levels of scalability, agility and availability with the lowest levels of cost, risk and complexity.
Four components common to most web and mobile properties are sized, with optimum[Read more...]
Oracle's MySQL team is running or participating in a number of events during the upcoming weeks and months. Don't miss this chance to learn about the latest developments straight from the source and to get all your questions answered!
Additional events will likely be scheduled down the road and posted on our events page (http://www.mysql.com/news-and-events/events/), but you can already register for the following ones:
“Big Data” offers the potential for organizations to revolutionize their operations. With the volume of business data doubling every 1.2 years, analysts and business users are discovering very real benefits when integrating and analyzing data from multiple sources, enabling deeper insight into their customers, partners, and business processes.
As the world’s most popular open source database, and the most deployed database in the web and cloud, MySQL is a key component of many big data platforms, with Hadoop vendors estimating 80% of deployments are integrated with MySQL.
The new Guide to MySQL and Hadoop (http://www.mysql.com/why-mysql/white-papers/mysql_wp_hadoop.php) presents the tools enabling integration between the two data platforms, supporting the data lifecycle from[Read more...]
The scalability enhancements delivered by extensions to multi-threaded data nodes enable MySQL Cluster 7.2 (http://mysql.com/products/cluster/) to deliver over 8x higher[Read more...]
InSourceCode developers work on "Madison" with volunteers.
There wasn't a great deal of hacking, at least in the traditional sense, at the "first congressional hackathon." Given the general shiver that the word still evokes in many a Washingtonian in 2011, that might be for the best. The attendees gathered together in the halls of the United States House of Representatives didn't create a more interactive visualization of how laws are made or a mobile health app. As open government advocate Carl Malamud observed, the "hack" felt like something even rarer in the "Age of the App for[Read more...]
Jon Bruner's "American Migration" visualization, based on IRS data, demonstrates how "Americans are enormously mobile: 37.5 million people moved from one house to another last year, with 4.3 million of them moving between states." Bruner's interactive map lets you click on a specific county and see both the immigration and emigration data for that location — where folks move from and where they move to.
Oracle's turn-about announcement of a NoSQL product wasn't really surprising. When Oracle spends time and effort putting down a technology, you can bet that it's secretly impressed, and trying to re-implement it in its back room. So Oracle's paper "Debunking the NoSQL Hype" should really have been read as a backhanded product announcement. (By the way, don't click that link; the paper appears to have been taken down. Surprise.)
I have to agree with DataStax and other developers in the NoSQL movement:[Read more...]
A new breed of startup is emerging, built to take advantage of the rising tides of data across a variety of verticals and the maturing ecosystem of tools for its large-scale analysis.
These are data startups, and they are the sumo wrestlers on the startup stage. The weight of data is a source of their competitive advantage. But like their sumo mentors, size alone is not enough. The most successful of data startups must be fast (with data), big (with analytics), and focused (with services).
The question of[Read more...]
As we move toward a data economy, can we take the digital content model and apply it to data acquisition and sales? That's a suggestion that Gil Elbaz (@gilelbaz), CEO and co-founder of the data platform Factual, made in passing at his recent talk at Web 2.0 Expo.
Elbaz spoke about some of the hurdles that startups face with big data — not just the question of storage, but the question of access. But as he addressed the emerging data economy, Elbaz said we will likely see novel access methods and new marketplaces for data. Startups will be able to build[Read more...]
The elmcity service connects to a half-dozen other services, including Eventful, Upcoming, EventBrite, Facebook, Delicious, and Yahoo. It's nice that each of these services provides an API that enables elmcity to read their data. It would be even nicer, though, if elmcity didn't have to query, navigate, and interpret the results of each of these APIs in different ways.
For example, the elmcity service asks the same question of Eventful, Upcoming, and EventBrite: "What are the titles, dates, times, locations, and URLs of recent events within radius R of location L?" It has to ask that question three different ways, and then interpret the answers three different ways. Can we imagine a more frictionless approach?
I can. Here's how the question might be asked in a general way using the[Read more...]
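To make the shared question concrete, here is a minimal sketch of "events within radius R of location L" asked once against a common record shape. It is not the approach the excerpt goes on to describe (that text is cut off above); the Event fields, the function names, and the sample data are all assumptions for illustration, and each service's results would first have to be mapped into this shape.

```python
import math
from dataclasses import dataclass


@dataclass
class Event:
    """Hypothetical common shape for an event from any of the services."""
    title: str
    start: str      # ISO 8601 date and time
    lat: float
    lon: float
    url: str


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def events_within(events, lat, lon, radius_km):
    """Return the events whose location lies within radius_km of (lat, lon)."""
    return [e for e in events
            if haversine_km(lat, lon, e.lat, e.lon) <= radius_km]


if __name__ == "__main__":
    # Made-up sample data; the URLs are placeholders.
    sample = [
        Event("Near", "2011-05-01T19:00", 0.0, 0.1, "http://example.com/near"),
        Event("Far", "2011-05-02T19:00", 0.0, 5.0, "http://example.com/far"),
    ]
    for event in events_within(sample, 0.0, 0.0, radius_km=50):
        print(event.title, event.start, event.url)
```

The point of the sketch is only that the question is asked once and answered once, regardless of which service supplied the records.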
I chatted today about VMware's Cloud Foundry with Roger Bodamer, the EVP of products and technology at 10Gen. 10Gen's MongoDB is one of three back-ends (along with MySQL and Redis) supported from the start by Cloud Foundry.
If I understand Cloud Foundry and VMware's declared "Open PaaS" strategy, it should fill a gap in services. Suppose you are a developer who wants to loosen the bonds between your programs and the hardware they run on, for the sake of flexibility, fast ramp-up, or cost savings. Your choices are:
An IaaS (Infrastructure as a Service) product, which hands you an emulation of[Read more...]
Memcached is one of the technologies that holds the modern Internet together, but do you know what it actually does? Brian Aker has certainly earned the title of Memcached guru, and below he offers a peek under the hood. He'll also provide a deeper dive into Memcached in a tutorial at the upcoming 2011 MySQL Conference.
Letting data speak for itself through analysis of entire data sets is eclipsing modeling from subsets. In the past, all too often what were once disregarded as "outliers" on the far edges of a data model turned out to be the telltale signs of a micro-trend that became a major event. To enable these advanced analytics and integrate them in real time with operational processes, companies and public sector organizations are evolving their enterprise architectures to incorporate new tools and approaches.
Whether you prefer "big," "very large," "extremely large," "extreme," "total," or another adjective for the "X" in the "X Data" umbrella term, what's important is accelerated growth in three dimensions: volume, complexity and speed.
Big data is not without its limitations. Many organizations need to revisit business processes, solve data silo[Read more...]
A new healthcare project in Zambia is trying to integrate supervisors, clinics, and community healthcare workers (CHW) into a system that can improve patient service and provide more data about the effectiveness of care. Because of the technical challenges in an extreme rural setting, unique solutions are required. According to Cory Zue, chief technology officer of Dimagi, CouchDB went a long way toward keeping a consistent set of records under extreme circumstances. The full story will be laid out in Zue's talk at the upcoming MySQL conference, but here's a sneak peek.
Today, the United States Department of Commerce's National Telecommunications and Information Administration (NTIA) unveiled a new National Broadband Map, which can be viewed at BroadbandMap.gov.
The map includes more than 25 million searchable records and it incorporates crowdsourced reporting. Built entirely upon WordPress, the map is also one of the largest implementations of open source and open data in government to date.
Importantly, the data behind the map shows that despite an increase in broadband adoption to 68%, a digital divide persists between citizens who have full access to the rich media of the 2011 Internet and those who are limited by geography or means.[Read more...]
MySQL keeps many different files. Some contain real data, some contain metadata. Which ones are important? Which can you throw away?
This is my attempt to create a quick reference of all the files used by MySQL: what's in them, what you can do if they are missing, and what you can do with them.
When I was working for Dell doing Linux support, my first words to a customer were “DO YOU HAVE A COMPLETE AND VERIFIED BACKUP?” Make one now before you think about doing anything I suggest here.
You should always try to manage your data through a MySQL client. If things have gone very bad this may not be possible. MySQL may not start. If your file system gets corrupted you may have missing files. Sometimes people create other files in the MySQL directory (BAD). This should help you understand what is safe to remove.
Before you try to work with one of[Read more...]
HandlerSocket is cool. But, it turns out there are a few issues.
Justin Swanhart points out that HandlerSocket currently lacks atomic operations. Since HandlerSocket uses different connections for reading and writing, you can’t increment/decrement a value without creating a race condition.
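The race described above is the classic lost update. The sketch below uses a toy in-memory store, not HandlerSocket's actual protocol or API; the class and function names are assumptions for illustration. It replays one bad interleaving deterministically: two clients each read the counter, then each write back their own "plus one".

```python
import threading


class ToyStore:
    """In-memory stand-in for a single row; not a HandlerSocket client."""

    def __init__(self):
        self._data = {"counter": 0}
        self._lock = threading.Lock()

    def read(self, key):
        return self._data[key]

    def write(self, key, value):
        self._data[key] = value

    def atomic_increment(self, key, delta=1):
        # Read and write happen under one lock, so no update can be lost.
        with self._lock:
            self._data[key] += delta
            return self._data[key]


def racy_increments(store):
    """Two clients increment via separate read and write steps."""
    seen_by_a = store.read("counter")        # client A reads 0
    seen_by_b = store.read("counter")        # client B also reads 0
    store.write("counter", seen_by_a + 1)    # A writes 1
    store.write("counter", seen_by_b + 1)    # B overwrites with 1
    return store.read("counter")


def atomic_increments(store):
    """The same two increments, each done as one indivisible step."""
    store.atomic_increment("counter")
    store.atomic_increment("counter")
    return store.read("counter")


if __name__ == "__main__":
    print(racy_increments(ToyStore()))    # prints 1: one increment was lost
    print(atomic_increments(ToyStore()))  # prints 2
```

With separate read and write channels there is nothing to hold the two steps together, which is why an increment needs server-side support to be safe.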
Still, the idea of skipping SQL interpretation and just reading the data you know you want is a great one. Writing data might even be better. But being able to use both SQL and NoSQL could be really wonderful. What if we could use complex queries to update complex tables and pluck values out as needed? For example, queries to analyze current weather conditions and produce forecasts that we could then retrieve via a location key? What about updating current condition data[Read more...]
Tomorrow's augmented reality is being built today on mobile devices. The Tasker application for Android is a fun platform for prototyping personal automation and sensing applications. Described modestly as an application which "performs tasks based on contexts," it gives non-programmers access to the sensing and system features of the[Read more...]
One of the themes from News Foo that continues to resonate with me is the importance of data journalism. That skillset has received renewed attention this winter after Tim Berners-Lee called analyzing data the future of journalism.
When you look at data journalism and the big picture, as USA Today's Anthony DeBarros did on his blog in November, it's clear the recent suite of technologies is part of a continuum of technologically enhanced storytelling that traces back to computer-assisted reporting (CAR).
As DeBarros pointed out,[Read more...]
The trend for NoSQL stores such as memcache for fast key-value storage should give us pause for thought: what have regular database vendors been doing all this time? An important new project, HandlerSocket, seeks to leverage MySQL's raw speed for key-value storage.
NoSQL databases offer fast key-value storage for use in backing web applications, but years of work on regular relational[Read more...]
Today's databases are designed for the spinning platter of the hard disk. They take into account that the slowest part of reading data is seeking: physically getting the read head to the part of the disk where it needs to be. But the emergence of cost-effective solid state drives (SSD) is changing all those assumptions.
Over the course of 2010, systems designers have been realizing[Read more...]