Big Data/Analytics Zone is brought to you in partnership with:

Sean Hull is a renowned author, speaker and advisor on MySQL & Oracle databases, web performance and scalability, EC2, Linux & Internet Architecture. He is the author of "Oracle & Open Source" O'Reilly, 2001, and speaks widely at conferences and forums. With 20 years of professional experience, he consults with large enterprises and startups in New York City and the San Francisco Bay Area. Sean is a DZone MVB and is not an employee of DZone and has posted 64 posts at DZone. You can read more from them at their website. View Full User Profile

Big Data – What is it and why is it important?

07.22.2012
| 3945 views |
  • submit to reddit

There’s lots of debate about exactly what constitutes “big” when talking about big data.  Technical folks may be inclined to want a specific number.

But when most CTOs and operations managers are talking about big data, they mean data warehouse and analytics databases.  Data warehouses are unique in that they are tuned to run large reporting queries and churn through large multi-million row tables.  Here you load up on indexes to support those reports, because the data is not constantly changing as in a web-facing transaction oriented database.

More and more databases such as MySQL which were originally built as web-facing databases are being used to support big data analytics.  MySQL does have some advanced features to support large databases such as partitioned tables, but many operations still cannot be done *online* such as table alters, and index creation.  In these cases configuring MySQL in a master-master active/passive cluster provides higher availability.  Perform blocking operations on the inactive side of the cluster, and then switch the active node.

We’ve worked with MySQL databases as large as 750G in size and single user tables as large as 40 million records without problems.  Table size, however has to be taken into consideration for many operations and queries.  But as long as your tables are indexed to fit the query, and you minimize table scans especially on joins, your MySQL database server will happily support these huge datasets.

Published at DZone with permission of Sean Hull, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags: