Is Your Database Ready for Big Data?
February 26, 2013
Chris Eaton, Technical Specialist, IBM
A lot of people think of Big Data as simply Hadoop. But it’s so much more than that.
IBM, for example, has an entire Big Data Platform that covers analytics on data at rest (as with Hadoop), in-database analytics, and analytics on data in motion with streaming analysis. One way to think about Big Data is as a “tool” to increase the IQ of your company. There is data you already have in house, and data flowing through your corporate systems (much of which you simply throw away today), that could be leveraged to make your company smarter: smarter about your system behavior (the availability of your services to your clients, for example), smarter about your clients’ buying behavior, and even smarter about how much influence some of your clients have on others in their social circles (some of whom may not currently be your clients).
So where do databases fit in, today and into the future? They are, of course, an integral part of the big data platform. Today you store shiny gold nuggets of information in your relational databases. You have spent time transforming the data in your relational systems to make sure it has integrity and consistency. Most would consider relational data to have a high value per byte (compared to a Hadoop or streaming solution, where much of the data has low value per byte but is very valuable in aggregate).
So what’s on the horizon, and is your database ready? As more and more data is analyzed in real time, there is and will be a need to store this data in a richer format (to shine it up, so to speak). This means an increasing need for databases to ingest vast amounts of data in real time and, on the other side, to analyze larger and larger quantities of data in seconds, not minutes or hours.
You may have seen that DB2 10 introduced a new INGEST utility to more efficiently handle continuous data feeds into the database. What about the query side? Certainly DB2 DPF can handle massive amounts of data today by leveraging large-scale scale-out clustering (much as Hadoop divides and conquers big data problems with lots of parallelism). But what about the cases where the amount of data isn’t so huge that you need a scale-out solution, yet the analysis needs to be done in real time? There are lots of niche and not-so-niche vendors out there looking at in-memory techniques for speeding up queries, but most of them require the data to be pulled out of the database system and stored in repositories that are much less flexible (sometimes needing to store everything in memory, which can be very inflexible).
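As a rough illustration, a continuous feed into DB2 via the INGEST utility might look something like the sketch below. The database, pipe, and table names here are all hypothetical, and the exact options supported depend on your DB2 10 environment, so treat this as a shape of the approach rather than a recipe:

```shell
# Hedged sketch: continuous ingest into DB2 10 with the INGEST utility.
# All names (salesdb, /tmp/feed.pipe, sales.transactions) are illustrative.
mkfifo /tmp/feed.pipe        # named pipe an upstream feed writes into

db2 CONNECT TO salesdb

# INGEST reads from the pipe as data arrives and commits in batches,
# so the table remains available for queries while rows stream in.
db2 "INGEST FROM PIPE /tmp/feed.pipe
     FORMAT DELIMITED
     INSERT INTO sales.transactions"
```

Reading from a pipe (rather than a static file) is what makes the feed continuous: the producer keeps writing and INGEST keeps consuming, unlike a traditional LOAD, which takes the table offline for the duration.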
It seems to me that this high-speed in-memory analytics, especially for line-of-business workloads, will take off faster in the next several years than it has in the past. Giving line-of-business users the power to analyze large quantities of data in real time, without forcing IT to build one-off repositories, is only going to grow. The flexibility of in-database processing merged with in-memory analytics will give IT the ability to leverage existing database assets and therefore build line-of-business solutions much faster.
So, keep your eyes on this space and make sure you’re on a path to exploit your existing databases and thereby make your company smarter.
To find out more about managing big data, join IBM for a free event: http://ibm.co/BigDataEvent