Big Data – What is it and why is it important?
There’s a lot of debate about exactly what counts as “big” when talking about big data. Technical folks may be inclined to want a specific number.
But when most CTOs and operations managers talk about big data, they mean data warehouses and analytics databases. Data warehouses are distinctive in that they are tuned to run large reporting queries and churn through multi-million-row tables. Here you load up on indexes to support those reports, because the data is not constantly changing as it is in a web-facing, transaction-oriented database.
More and more databases that were originally built as web-facing databases, such as MySQL, are being used to support big data analytics. MySQL does have some advanced features to support large databases, such as partitioned tables, but depending on the MySQL version and storage engine, many operations, such as table alters and index creation, still cannot be done *online*. In these cases, configuring MySQL in a master-master active/passive cluster provides higher availability: perform blocking operations on the inactive side of the cluster, and then switch the active node.
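The "change the inactive side, then switch" sequence can be sketched as a toy model. This is only an illustration of the ordering, not a cluster management tool: the `Node` and `Cluster` classes, the node names `db1` and `db2`, and the `rolling_schema_change` helper are all hypothetical, and no real database is contacted. The model also assumes the change is applied to each node separately and that roles are switched back at the end; how the change reaches the second node in a real master-master pair depends on your replication setup.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    # Schema changes that have been applied to this node so far.
    applied: list = field(default_factory=list)


@dataclass
class Cluster:
    active: Node
    passive: Node

    def switch(self):
        """Swap roles so the passive node starts serving traffic."""
        self.active, self.passive = self.passive, self.active


def rolling_schema_change(cluster, ddl):
    """Apply a blocking change to each node only while that node is passive."""
    steps = []
    for _ in range(2):
        cluster.passive.applied.append(ddl)
        steps.append(f"apply on {cluster.passive.name} (passive)")
        cluster.switch()
        steps.append(f"switch: {cluster.active.name} is now active")
    return steps


cluster = Cluster(active=Node("db1"), passive=Node("db2"))
steps = rolling_schema_change(
    cluster, "ALTER TABLE users ADD INDEX idx_country (country)"
)
for step in steps:
    print(step)
```

At no point in this sequence is the blocking statement run against the node that is serving traffic, which is the whole point of the active/passive arrangement.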
We’ve worked with MySQL databases as large as 750 GB and single user tables as large as 40 million records without problems. Table size, however, has to be taken into consideration for many operations and queries. But as long as your tables are indexed to fit your queries, and you minimize table scans, especially on joins, your MySQL database server will happily support these huge datasets.
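To make "indexed to fit the query" concrete, here is a minimal sketch that checks a query plan before and after adding a matching index. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server; it is not MySQL, where you would inspect the plan with `EXPLAIN` instead. The `users` table, its columns, and the index name are made up for illustration.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, signup_year INTEGER)"
)
conn.executemany(
    "INSERT INTO users (country, signup_year) VALUES (?, ?)",
    [("US" if i % 2 else "DE", 2000 + i % 10) for i in range(1000)],
)

sql = "SELECT id FROM users WHERE country = ? AND signup_year = ?"
params = ("US", 2005)

# Without a suitable index, the engine has to scan the whole table.
plan_before = query_plan(conn, sql, params)

# An index whose columns match the WHERE clause lets it seek instead.
conn.execute("CREATE INDEX idx_users_country_year ON users (country, signup_year)")
plan_after = query_plan(conn, sql, params)

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of `users` and the second a search using `idx_users_country_year`. The same habit applies to MySQL: check the plan of each heavy reporting query and confirm it uses an index rather than a full table scan.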
(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)