Speakers:
Christophe Bisciglia, Founder & Chief Strategy Officer, Cloudera (previously led Google’s Academic Cloud Computing Initiative)
Jeff Hammerbacher, Founder & Chief Scientist, Cloudera (previously led Facebook’s Data Team)
Tom White, Founding Engineer, Cloudera (Hadoop Committer)
Aaron Kimball, Founding Engineer, Cloudera
Abstract: Recently, there has been a lot of buzz around MapReduce and the Apache Hadoop project. Even more recently, proprietary SQL database systems have added support for MapReduce. This is great for data, but challenging if you prefer open source solutions.
In this tutorial, we will provide both the background knowledge and the practical experience necessary to combine the SQL and MapReduce models to get more out of your data. We will use MySQL and Hadoop preloaded with interesting data, and make the complete system available in the cloud to participants. We will focus on how to conduct analysis leveraging both models and go over, in detail, the glue necessary to make this work. A common use case will be extracting data from MySQL, using MapReduce to conduct analysis you can’t do with SQL, and reloading the results into a MySQL database.
Objectives: After this tutorial, participants should:
Description: This tutorial will be three hours, and time will be split roughly evenly between instructional and practical components. Format will be wide open, so participants are free to interrupt, ask questions, and suggest focusing more in-depth on areas of specific interest. We will assume basic-to-intermediate knowledge of MySQL, and will most heavily target participants who are having trouble scaling with their data.
The instructional component will include:
For the practical component, we will provide access to a large Hadoop (for MapReduce) installation in the cloud, as well as MySQL instances preloaded with interesting data. Users will get to write code that extracts data from MySQL (using both queries and dumps), uses MapReduce to analyze that data in greater depth, and dumps the results back into MySQL so they are available to existing systems. We will walk the users through a general data processing pipeline, step by step, but will focus on supporting them in conducting their own analysis.
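As a rough illustration of the extract, analyze, and reload pipeline described above, here is a minimal single-process sketch in Python. It is not the tutorial's actual material: the two-column tab-separated export (user_id, page), the sample rows, and all function names are hypothetical, and the per-page count is only a stand-in for a real analysis. On a cluster the map and reduce steps would run under Hadoop rather than in one process, and the final tab-separated text is the kind of file MySQL can import with LOAD DATA INFILE.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical tab-separated export from MySQL: user_id <TAB> page
SAMPLE_DUMP = "1\t/home\n2\t/home\n1\t/about\n3\t/home\n"


def map_record(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (page, 1) for each exported row."""
    _user_id, page = line.rstrip("\n").split("\t")
    yield page, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_group(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the counts for one page."""
    return key, sum(values)


def run_pipeline(dump: str) -> str:
    """Run map, shuffle, and reduce; return tab-separated rows for reloading."""
    pairs = (
        pair
        for line in dump.splitlines()
        if line
        for pair in map_record(line)
    )
    groups = shuffle(pairs)
    results = [reduce_group(key, values) for key, values in sorted(groups.items())]
    return "".join(f"{key}\t{total}\n" for key, total in results)


if __name__ == "__main__":
    print(run_pipeline(SAMPLE_DUMP), end="")
```

Keeping the map and reduce steps as separate pure functions mirrors how the same logic would be split into a mapper and a reducer when moved onto a cluster.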
Suggested Tracks: In decreasing order of preference.
Christophe Bisciglia joins Cloudera from Google, where he created and managed their Academic Cloud Computing Initiative. Starting in 2007, he began working with the University of Washington to teach students about Google’s core data management and processing technologies – MapReduce and GFS. This quickly brought Hadoop into the curriculum, and has since resulted in an extensive partnership with the National Science Foundation (NSF) which makes Google-hosted Hadoop clusters available for research and education worldwide. Beyond his work with Hadoop, he holds patents related to search quality and personalization, and spent a year working in Shanghai. Christophe earned his degree, and remains a visiting scientist, at the University of Washington.
Aaron Kimball has been working with Hadoop since early 2007. Aaron has worked with the NSF and several other universities nationally and internationally to advance education in the field of large-scale data-intensive computing. He helped create and deliver academic course materials first used at the University of Washington, which were later adopted by many other academic institutions, as well as Hadoop training materials used by several industry partners. Aaron has also worked as an independent consultant focusing on Hadoop and Amazon EC2-based systems. Aaron holds a B.S. in Computer Science from Cornell University, and an M.S. in Computer Science and Engineering from the University of Washington.