Data Analytics, or How Much Info for a Buck?

Bill Cole

Bill Cole – Competitive Sales Specialist, Information Management, IBM

Leave only footprints; take only pictures.  Have you seen that slogan in a national park?  My wife (she’s now an ex) didn’t believe the signs that told us to leave everything exactly where it was.  She didn’t want to just enjoy the beauty.  She wanted to take some home with us.  The flashing light of the Park Ranger car told me we were in trouble for picking up a few rocks along the side of the road.  The nice man in the Smokey hat told me to put the rocks back.  The scenery is for consumption with your eyes, your camera, not for taking home.  I did as instructed, happy to be leaving with my wallet in one piece.

I’ve always produced data and then turned it into information by adding other bits of data together and adding some context.  My users guided me for a while and then I both guided and pushed them.  This seemed to be the natural order of things, sort of like factories and the folks who buy the goods from those factories.

The IT/BI/DA teams accumulate and store the data and then massage it to build what are essentially standard reports.  Standard reports are good for standard thinking, of course.  If you know the answer you're looking for, a standard report probably has it in there somewhere, like those old balance sheets and ledgers that I ran so long ago.  But there was nothing in those reports that would help anyone think outside of the data they contained.  In fact, there was so little insight in them that one of the plant managers actually asked me what good these reports were.  There's really not a good response to that one.

Insights are gained when the lines of business can chase an idea through all sorts of non-standard iterations.  Almost like chasing one of those happy mistakes from science, like penicillin, or those ubiquitous not-very-sticky note sheets that we all stick all over everything so we can easily keep track of passwords, etc.  LOL, like you haven’t done that.

So how do we get to this idea-chasing sort of thing?  This place where the data analysts or, better still, the line of business user can see something interesting and start chasing it?  This is a custom-developed solution, a virtual pair of bespoke shoes made for your situation and only your situation.  The person in the next cubicle needn't look over your shoulder.  It would do them no good after all.  There's a scene in the Maureen O'Hara/John Wayne movie "The Quiet Man" in which John asks directions and the local says "Do you see that road over there?  Don't take it, it'll do you no good."  Insights are like that.  You need to know not to walk down a road that will do you no good.

The trick, it seems to me, is having the right tools.  Let’s start with the database (you know I’m a practicing DBA and that means all discussions start with the database).  DB2 BLU is exactly the right repository for your decision-making data.  After all, it offers both row- and column-oriented models in a single database!  This means you’re getting performance no matter which way your data chooses to be represented.  Moreover, there are different kinds of compression to ensure you save space and improve performance.  What could be better?  And all for the price of an upgrade!  Easy.  No-brainer.
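
If you want to picture what that row-plus-column idea looks like, here is a rough sketch using the Python ibm_db driver against a DB2 10.5 database. The connection string and table names are placeholders for illustration, not a recipe:

import ibm_db

# Hypothetical connection details; substitute your own instance.
conn = ibm_db.connect(
    "DATABASE=salesdb;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;"
    "UID=db2inst1;PWD=********;", "", "")

# The transactional table stays row-organized.
ibm_db.exec_immediate(conn, """
    CREATE TABLE orders (
        order_id INT NOT NULL,
        cust_id  INT,
        amount   DECIMAL(12,2)
    ) ORGANIZE BY ROW""")

# The reporting table is column-organized for BLU Acceleration.
ibm_db.exec_immediate(conn, """
    CREATE TABLE order_history (
        order_id INT,
        cust_id  INT,
        amount   DECIMAL(12,2),
        order_dt DATE
    ) ORGANIZE BY COLUMN""")

ibm_db.close(conn)

Both tables live in the same database, so your OLTP work and your decision-making data share one engine and one set of skills.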

There's a neat coda to this, too.  You're not confined to the old solution of finding a server, building it and installing the software, then building the database.  Let's talk choices, folks.  Lots of choices.  Maybe every choice.  On premises, just like we've always done, works.  Maybe your own cloud would be better.  Build your BI/DA system in a PureFlex or PureApp or PureData cloud hosted in your own data center.  There's a simple solution with lots of benefits, including workload management.  Set it and forget it and go on about your business.  Maybe DBaaS works better.  Virtualize the workload and database in an existing private cloud to make use of those "excess" MIPS.  (Parkinson's Law says that any organization grows to fill all the space available.  I think the demand for MIPS grows to fill the available servers, thus negating the concept of "excess MIPS.")  There's SoftLayer for either a public or private cloud.  Remember, they'll go all the way to bare metal if that's what you need.  Finally, and maybe best, there's DB2 BLU available in the cloud.  I championed this a while back and it's now reality: a pre-configured database that IBM manages and maintains, including backups and upgrades.  Talk about easy!  Go ahead, get some sleep.  We've got this one.

One last thought about the tools.  InfoSphere Analytics Server will do the analysis for you and present your users with suggested insights right out of the box.  And it will help the folks find their own insights by helping them look, filter and massage the data in any way that suits them.  It’s a cool tool for those times when you need the freedom to find your own way through the forest of data.

Finally, I've always kept two Robert Frost poems on my wall.  Perhaps "The Road Not Taken" ("Two roads diverged in a yellow wood…") is the one for this post.  We in IT need to give the folks in the lines of business the right tools to chase down the new roads, new insights.  We'll give them the GPS for the roads less traveled by.  Good luck on your journeys of exploration!

The other poem is “Stopping By Woods On a Snowy Evening,” of course.  We all have miles to go before we sleep, before our work is complete, and using the right tools makes those miles ever so much more productive.  Bundle up on those snowy evenings and enjoy the ride.

Follow Bill Cole on Twitter: @billcole_ibm

Visit the IBM BLU HUB to learn more about the next gen in-memory database technology!

DB2 with BLU Acceleration and Intel – A great partnership!

Allen Wei

Allen Wei, DB2 Warehousing Technology, System Verification Test, IBM

DB2 with BLU Acceleration is a state-of-the-art columnar-store RDBMS (Relational Database Management System) masterpiece that combines and exploits some of the best technologies from IBM and Intel. In the video linked below, there is mention of an 88x speedup compared with the previous generation of row-store RDBMS on the exact same workload. That announcement was made during IBM IOD in November 2013.

Guess what? In a test done a few days ago (less than 3 months after the video was filmed), the speedup, again comparing DB2 with BLU Acceleration against a row-store RDBMS using the exact same workload, this time on new Intel Xeon E7 v2 (Ivy Bridge-EX) based hardware, is now 148x. Really? Need I say more? This shows that not only is DB2 with BLU Acceleration equipped with innovative technologies, but it also combines exactly the set of technologies from both RDBMS and hardware advancement that you really need. This helps BLU Acceleration fully exploit hardware capabilities to the extreme and give you the best ROI (Return on Investment) that every CTO dreams about.

You might start wondering if this is too good to be true. I have shown you the numbers. So, no, it is the truth. You might want to ask, even if this is true, is it complicated? Well, it does take discipline, innovative thinking and effort to build technologies like this. However, my answer is again no. It's completely the opposite! In fact, as seen on TV (the video clip), it's as simple as: create your tables, load your data and voila!! Start using your data. There is no need for extensive performance tuning, mind-boggling query optimization or blood-boiling index creation. Leave those tasks to DB2 with BLU Acceleration.  Can't wait to try it for yourself? It really is that fast and simple.
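
To show just how short that "create, load, go" path is, here is a hedged sketch using the Python ibm_db driver. The file path, table and connection details are made up, and it assumes an instance set up for analytics (DB2_WORKLOAD=ANALYTICS) so new tables default to column organization:

import ibm_db

conn = ibm_db.connect(
    "DATABASE=dwdb;HOSTNAME=db2host;PORT=50000;PROTOCOL=TCPIP;"
    "UID=db2inst1;PWD=********;", "", "")

# 1. Create: no indexes, no partitioning scheme, no tuning knobs.
ibm_db.exec_immediate(conn,
    "CREATE TABLE sales (store_id INT, sku INT, sold_on DATE, amount DECIMAL(12,2))")

# 2. Load: a server-side delimited file, here loaded through ADMIN_CMD
#    (one of several ways to load data).
ibm_db.exec_immediate(conn,
    "CALL SYSPROC.ADMIN_CMD('LOAD FROM /stage/sales.del OF DEL INSERT INTO sales')")

# 3. Go: query immediately; BLU handles compression, memory and parallelism.
stmt = ibm_db.exec_immediate(conn,
    "SELECT store_id, SUM(amount) AS revenue FROM sales GROUP BY store_id")
row = ibm_db.fetch_assoc(stmt)
while row:
    print(row["STORE_ID"], row["REVENUE"])
    row = ibm_db.fetch_assoc(stmt)
ibm_db.close(conn)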

Do you need to hear more before you are 100% convinced? Let me begin by recalling a few key disruptive technologies that are built into DB2 with BLU Acceleration. These are mentioned in the video clip as well, and I will prove to you that we are not listing them just for the sake of listing them.

What state-of-the-art technology was built into DB2 with BLU Acceleration that makes it so great? Here is a summary of what you saw in the video clip:

# Dynamic In-Memory Technology – loads terabytes of data into random access memory instead of hard disks, streamlining query workloads even when data sets exceed the size of the memory.

  • This allows the CPU to operate efficiently without waiting on disk I/O operations
  • In one of my tests, I could fit a 2TB database into 256GB of RAM, or 1/8 of the database size
  • In another test, I could fit a 10TB database into 1TB of RAM, or 1/10 of the database size.

# Actionable Compression – Deep data compression, with the ability to perform actions directly on compressed data

  • Deep compression
  • I noticed storage space consumption that was 2.8x – 4.6x smaller than the corresponding row-store database, depending on the size of the database
  • Data can be accessed as-is in its compressed form; no decompression is needed
  • The CPU can dedicate its power to query processing rather than decompression algorithms.

#  Parallel Vector Processing – Fully utilize available CPU cores

  • Vectors are processed more efficiently, hence an increase in CPU efficiency
  • All CPU cores are fully exploited.

#  Data Skipping – Jump directly to where the data is

  • We do not need to process irrelevant data.

Have you been convinced yet? I know you have. However, you don't need to just take my word for it. Try it. The time you'd spend reading this blog and trying to find a loophole is enough to set up a high-performance database from scratch. Period.

Read more about the Intel and BLU Acceleration partnership here: DB2 BLU Acceleration on Intel Xeon E7v2 Solutions Brief

Allen Wei joined IBM as a developer for the BI OLAP product line, including OLAP Miner. He was a key member of the InfoSphere product line, and has led InfoSphere Warehouse and DB2 LUW SVTs. Currently he focuses on tuning the performance of BLU Acceleration, mainly with respect to the Intel partnership.

Visit the IBM BLU HUB to learn more about the next gen in-memory database technology!

Also checkout this post on the Intel Blog about how IBM and Intel have been working together to extract big data insights.

Introducing the IBM BLU Acceleration Hub


John Park, Product Manager – DB2, BLU Acceleration for Cloud.

Hemingway once wrote "There is nothing to writing. All you do is sit down at the typewriter and bleed" — he also wrote — "The best way to find out if you trust someone is to trust them."

So when Susan Visser "pinged" me on IBM's venerable Sametime system asking me to blog for the launch of ibmbluhub.com, my immediate response was "I don't blog, I only know how to post pictures of my pets and kid to Facebook." She responded, "It's easy, trust me." Hence the quotes.

So here I am, and who am I? Well, my name is John Park. I am an IBM’er, I am a DB2 Product Manager and as of today, I am a blogger (?).

My IBM life has revolved around DB2 and the analytics space, starting off as a developer building the engn_sqm component (think snapshot, event monitor and governor) – so if your stuff is broke, it's probably my fault.

Then I moved into the honorable realm of product management, leading the charge on products such as Smart Analytics Systems, PureData for Operational Analytics and now BLU Acceleration for Cloud … which is why I guess I’m here.

On a personal note, I like to build stuff – specifically, I like to build cool stuff, and BLU Acceleration is freakin' cool. When I think about the significance of this technology, I think back to fixing Version 7 of DB2, building V8 and my last piece of code in V9.5. All along the way, the DB2 team was building features and products that helped our customers and our users use DB2.

Personally, I see BLU as a convergence point, the pinnacle of where all the years of engineering and thought leadership have finally come to “eureka”.  Let me guide you in my thinking …

Autonomic features such as Automatic Maintenance, Self Tuning Memory Management and Automatic Workload Management were all incremental steps across DB2 version releases; each fixed a problem the DB2 user had and improved the experience of using DB2.

DB2's compression story started with row compression, then index compression, then adaptive row compression and now actionable compression, and with each compression story came a better value proposition for the DB2 user.  (Note that the word compression is used six times!)

DB2’s performance optimization journey went from database partitioning and MPP to table and range partitioning, continuous ingest, multi-temp and workload management, making DB2 a leader in performance across all workloads.

Usability in its simplest form, value-driven compression and unprecedented performance are the three key tenets behind the development of BLU. These features improved DB2 incrementally between versions, and as the product grew, our engineers' experience and creativity expanded. With BLU we see these features, and the knowledge gained from developing them, transform into and support the simplest statement I have ever seen in enterprise software – "Create, Load and GO". Simply amazing.

Welcome to the world, ibmbluhub.com. Readers, enjoy the ride; this is the first step in a new direction for data management and technology. And I haven't even talked about BLU Acceleration for Cloud yet…

Until then, here is a picture of my cat.

John Park

Data Warehousing on the Cloud with BLU Acceleration

Adam Ronthal

Adam Ronthal – Technical Marketing, BLU Acceleration for Cloud, IBM

I recently took on a new role at IBM focused on technical marketing for BLU Acceleration for Cloud, IBM’s new cloud-based agile data warehousing solution.

Why cloud?  And specifically, why analytics in the cloud?  Analytics, long recognized as a competitive differentiator, has traditionally required significant resources, both skills and capital investment, just to enter the game.  Most on-premises data warehouses have at least a six-figure price tag associated with them, with many implementations costing millions.  And while you do get significant value and performance with an on-premises implementation, that capital investment means longer procurement lead times, and longer lead times in general to ramp up an analytics project.

Cloud computing represents a paradigm shift… now even small organizations with limited budgets and resources can access the same powerful analytic technology leveraged in the most advanced analytic environments.  BLU for Cloud is a columnar, in-memory solution that brings appliance simplicity and ease of use for data warehousing and analytics to everyone — all for less than the price of a cup of coffee per hour.[1]

BLU for Cloud is perfect for:

  • Pop-up Analytics Environments – need a quick, agile data warehouse for a temporary project?  Put it in the cloud!
  • Dev/Test Environments – Yes, it’s compatible with the enterprise databases already in use within your organization because it’s based on DB2, an industry standard!
  • Analytic Marts – Augment and modernize your existing data warehouse infrastructure by leveraging cloud flexibility
  • Self-Contained Agile Data Warehousing – leverage BLU for Cloud for almost any analytics application

Come find out more at my PULSE and TDWI sessions in Las Vegas next week!

At PULSE:  Tuesday, Feb 25, 5:00pm at the Expo Theater

At the TDWI World Conference:  Wednesday, Feb 26, 12:35pm

Or check out the BLU for Cloud website at http://www.bluforcloud.com for more details.


[1] To be fair, we’re probably talking Starbucks, not Dunkin Donuts…

More on the author – 
Adam Ronthal has worked in the technology industry for over 19 years in technical operations, system administration, and data warehousing and analytics. In 2006, Adam joined Netezza as a Technical Account Manager, working with some of IBM Netezza’s largest data warehousing and analytic customers and helping them architect and implement their Netezza-based solutions. Adam led the team to write the Netezza NZLaunch Handbook, a practical implementation guide for IBM Netezza customers, and served as editor of the final guide. Today, Adam works in technical marketing for IBM’s Cloud, Big Data, and Appliance offerings. Adam is an IBM Certified Specialist for Netezza, and holds a BA from Yale University.

Here's an interesting video on BLU for Cloud:

Tales from the Chipset

Bill Cole

Bill Cole – Competitive Sales Specialist, Information Management, IBM

My history in computing is littered with different roles.  Everything from operator to programmer to architect as well as (forgive me) manager and director.  A few lives back, I worked for a company that built the first computer on a single chip.  It was a beast to make it productive outside the lab.  In fact, it was so difficult that I was the first one to make it work in the field.  The funny thing was that the whole configuration process was carried out using (this is absolutely true) staples.  That’s right, staples!  I keep a copy of that chip in my memento cabinet, too.  By now the whole adventure seems a bit surreal.  It was fun, though, and I learned a lot.

That was my first adventure with chipsets.  But not my last.  Later, when I managed a database development team for that company, I had a further education into just how software and hardware work together to build a system that delivers the performance customers need to address business problems and opportunities.

That's what makes a system special, I think.  It's not a matter of simply slapping together some software or building a chip and hoping customers will show up.  Sure, you can build vanilla software that works on a chipset, but there's no synergy in that.  Synergy is where the system is better than the sum of its parts.  That's what DB2 on Intel is all about.  The IBM and Intel team consists of engineers and developers from both companies who work together to optimize the chip and the DB2 engine so that our mutual customers get the fastest return on investment.

So, you ask, how is that different from any other database?  It’s not just different, it’s unique.  Doesn’t a certain red database run on the same hardware?  Yes.  And they use the same code line for Intel platforms that they do for any other bit of hardware.  The code doesn’t know where it’s running and can’t make use of those features that would give their customers the sort of performance that DB2 delivers on the same chipset.

But SQL Server also runs on the chipset.  Ah, yes, that’s true but it, too, is a prisoner of a code line.  It’s not optimized for the chipset; it’s optimized for the operating system.

So what chipset do most Linux installations run on?  I think we all know the answer to that one.  Intel, of course.  SQL Server is out of the picture.  That red database is still running the same code line that runs on Windows and every other environment.  Still no optimization.

I know, whine, whine, whine.  What does this mean to me and my organization?  Simple.  Better performance.  Better return on investment through improved, sustainable and predictable performance.

Talk, talk, talk.  Show me, you say.  Let’s take the easy one first.  Vector instructions.  I’ve written about these in an earlier post and I’ll amplify that now.  These instructions push multiple data streams through the registers with a single instruction that uses a single processor cycle.  This means we’re multiplying MIPS because we don’t have to wait on multiple cycles to process the same data stream.  Said in another way, it’s sort of like doing homework for all your courses simultaneously.  Wouldn’t that have been nice!

Then there's register width.  DB2 is built to manage compression based on the width of the registers.  Filling a register means more efficient use of the fastest resource in the processor.  That's exactly what DB2 does.  We know the width of the registers on Intel chips – and Power chips, too – so we make sure we're filling those registers to get the most out of the instructions.  Again, this is not only efficient and saves disk space, it makes the overall system faster since we don't have to use as many instructions to compress and decompress the data.  Easy.
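
A toy example of the register-filling idea: if compressed column codes are only 4 bits wide, sixteen of them fit in one 64-bit word, so each word-level operation touches sixteen values at once. This little Python sketch is purely an illustration, not DB2's actual storage format:

def pack_codes(codes, bits=4):
    """Pack small integer codes into 64-bit words (illustration only)."""
    per_word = 64 // bits
    words = []
    for i in range(0, len(codes), per_word):
        word = 0
        for j, code in enumerate(codes[i:i + per_word]):
            word |= (code & ((1 << bits) - 1)) << (j * bits)
        words.append(word)
    return words

codes = [3, 7, 1, 12, 0, 5, 9, 2] * 4        # 32 column values...
print(len(pack_codes(codes)))                 # ...packed into just 2 words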

So the vector instructions and registers make DB2 fast and efficient.  What else is there?  Balance, my friends.  I’ve been involved with building and tuning systems for too long to think that the processor is all there is to system performance.  Indeed, it’s processor, memory and I/O that combine to give us the performance we get.  Again, this is where knowing the chipset comes in very handy.  DB2 is written to take advantage of the various caches on the x86 chipset, including the I/O caches.

And we have worked with Intel to understand where the sweet spot is for the combination of processor, memory and I/O.  Standard configurations are pre-tuned to fit within the guidelines we know will give you the best performance.

And then there are the tools that help us make the most of the chipset.  You may not realize that the release of a chipset includes tools for compiling, profiling and optimizing the software that runs on those chips.  DB2 on Intel has gone through the process of being optimized specifically for Intel on any of the operating systems for which we have distributions.  (I know that was awkward, but my old English teachers would haunt me for weeks if I ended a sentence with a preposition!)  Gotta like that.  Seems a great arrangement to me.

Finally, my wife loves IKEA, where all the furniture comes in kits.  We've built lots of furniture out of those kits.  And they always include the tools that work with that kit.  Taking advantage of a chipset is much the same.  Use the tools to get the most from the hardware.  There's no sense in buying too much hardware just because your database is blind, eh?

Being the program manager for databases as I mentioned above gave me the opportunity to sit in front of many customers to listen to them tell me about their experiences with my hardware and software, both good and bad.  I carry those conversations with me today.  Your story is the one I want to hear.  Let me know what DB2 is doing for you.

Follow Bill Cole on Twitter: @billcole_ibm

Watch this video on the collaboration between Intel and DB2 with BLU Acceleration!

A BLU Valentine

Vineeth

Vineeth George Abraham
Product Marketing, Data Management, IBM

Valentine’s Day is here, once again. A day to celebrate love – be it with family, friends or partners. A day to take a break from the mundane and spend time with the ones you care about.

For a day, everything seems possible when you celebrate that feeling of togetherness and endearment that we term as love. But if you are anything like me, there’s a part of this day that you’ll always dread. Why?

Deciding on a gift or an experience for someone close to me is not an activity that I'm terribly excited about. It's not because I don't care. Au contraire! It's because I invariably get confused while selecting something meaningful, and this exercise eventually turns out to be a lot harder than it initially looks.

There's a ton of options out there. But what gift or gesture would have the most meaning? This struggle with choice rears its head like clockwork on every birthday and anniversary too. I'd be thrilled just to get a useful suggestion when faced with this situation.

At least it's highly improbable that you'll forget Valentine's Day, unlike some birthdays or anniversaries. You will get reminders in passing during your coffee breaks, your lunches and even in general water-cooler conversations. The steady stream of messages through various media, and the general buzz in the air, make sure that only hermits, probably somewhere in the Himalayas or the Andes, stay oblivious to the effects of Valentine's Day.
Phew, that’s a minor relief. Forgetting one of these days might get you the cold shoulder treatment for a while. Brrr…. Now that I think about it, a stint in the Himalayas for a week might actually be a bit more bearable!

I digress… Now where was I? Ah yeah… plans, gifts. When I think of experiences, I tend to oscillate between extremes before finally settling somewhere in the middle. Do something adventurous (skydiving, 'jumps' to mind) or stick closer to terra firma, with a dinner and a movie?

The interesting bit is that all the hints and signals for a great experience were probably already shared with you. If you had paid attention to all the conversations, the tweets, the calls, the glances, a comment made in passing, you'd have had your answer.

It's in times like these that I've come to yearn for an assistant like Jarvis from Iron Man. Imagine how useful he'd be. A voice in your ear: 'Sir – from my analysis of your phone calls, Facebook, Twitter, credit card transactions etc. etc.… there's a high probability that he/she would be thrilled to go 1) spelunking or 2) dancing!'  Dancing it is, then! Problem solved, in seconds. Wouldn't it be fantastic if you could call upon someone like that in your daily life?

Let’s see what happens in an enterprise environment.

The amount of data that an average enterprise generates and sifts through is huge: petabytes, even zettabytes! What if there were a way to crawl quickly through all that data and pull out useful nuggets of information and meaningful insights?

It’s not just enough that data is stored efficiently. It should be easily available to access, manipulate and compare. The right insight at precisely the right time can be a game changer in most industries.

With DB2 and BLU Acceleration, IBM can deliver just that. Part of the larger Watson Foundations solutions, BLU Acceleration in DB2 will help you get better insights faster.  What's great about DB2 and BLU Acceleration is that along with the robustness and great compression that you've come to expect for transactional workloads, you can now have faster analytics and data warehousing capabilities to boot. With its NoSQL capabilities, DB2 can now deal with data in various formats.

Your data is secure, stored efficiently and you can derive insights extremely quickly.  With a cloud solution for BLU Acceleration, data warehousing capabilities can now be accessed easily with as little overhead cost as possible.

Sometime in the near future, I am sure that a portable system similar to Jarvis, with DB2 with BLU Acceleration and a host of IBM technologies in the background, will be a reality. The IBM Watson programme is certainly a leap in the direction of cognitive computing.

Here’s an interesting Valentine’s Infographic on Data Management and BLU Acceleration. Enterprises could do with a bit of love too. Share the love – and the infographic!

6 More ways to love Big Data

In the meanwhile, let me get back to prepping for 14/2/14…

Learn more about DB2 and BLU Acceleration on our Google+ page.
Follow Vineeth on Twitter @VinGAbr

It’s Obvious. It’s in the Data.

Bill Cole

Bill Cole, Competitive Sales Specialist, Information Management, IBM

You’ve had that experience, right?  Somebody says that the answer is in the data so you look harder and all you see is stuff.  There’s not a pattern within a grenade blast of this data.  Maybe if you had a bit more time you’d find it.  Or maybe having the data in the right format would make a difference.

We all know the traditional relational database isn't a great platform for analyzing mass quantities of data.  Your OLTP relational database is built for processing small-ish transactions and maintaining data integrity in the face of an onslaught of concurrent users, all without regard to disk space or processor utilization.  Abuse the resources to get the performance you need!  To paraphrase Admiral Farragut: Ignore the checkbook, full speed ahead!

So we learned to build special-purpose structures for our non-transactional needs, and then manage the fallout as we tried to find anything that even smelled like (consistent) performance.  Each step forward in the data warehouse arena was a struggle.  We demanded resources or explained away failures with a wave of a disk drive or processor.

This situation was clearly not good for our mission of analyzing great chunks of data in a reasonable time.  Subsets of data – data marts – were used to work around our limitations.  But this meant we were either replicating data or losing some data that might be useful in other queries.  Clearly not the best of situations.

Our friends out in Almaden studied the problem and found that column-oriented tables were the best basis for a solution.  After all, we were gathering up large quantities of raw data and analyzing it, not processing OLTP transactions.  There would be little need for those annoying special-purpose structures.  Nor would we need any indexes.  All this would save lots of space and reduce processing time, too, so we could achieve not only predictable performance but VERY good performance.  The kind of performance our friends in the business needed to build better relationships with suppliers and customers.

The implementation of the new analytics platform is DB2 10.5 with BLU Acceleration.  (The answer to why it's called "BLU" is in an earlier blog entry.)  The very cool thing is that BLU is an option you can choose for either the entire database or just the analytics tables.  So you can have your traditional row-oriented tables and the column-oriented tables in a single database if that suits your design.  No need to learn and maintain a whole new technology just for your analytics.

And we can’t forget the synergy with Cognos.  After all, the two products are developed just a few miles from each other.  Turns out the Cognos folks help the DB2 team by sharing typical analytics queries and the DB2 team uses those examples to tune the query engine.  Nice!  Of course, this helps out with the queries we build ourselves or through – gasp!  – other products.  Oh well, DB2 is there to make us all look good.

A quick refresher on column-oriented data.  The easiest way for me to think about it is that we've stood the database on its side, so that instead of seeing everything in rows we're seeing the data in columns grouped together.  A typical description of a table has the column names running across the top of the page, which is analogous to the way data is stored in most relational databases.  The column-oriented table, however, keeps the data for each column grouped together, and rows are built by assembling the data from the columns.  Not ideal for OLTP, but excellent for processing gobs of data that's particular to a group of columns.  (There's a fuller discussion of this in a previous blog post.)  No need for indexes, since we're not looking for individual rows.
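
Here is a tiny illustration of that "stood on its side" picture, with invented data, purely for intuition:

# Row store: each record's values are kept together.
rows = [
    (1, "NC", 250.00),
    (2, "SC", 125.50),
    (3, "NC",  75.25),
]

# Column store: each column's values are kept together; a "row" is
# re-assembled by taking position i from every column.
column_store = {
    "id":     [r[0] for r in rows],
    "state":  [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# An analytic query touches only the columns it needs:
total_nc = sum(a for s, a in zip(column_store["state"], column_store["amount"])
               if s == "NC")
print(total_nc)   # 325.25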

The sort of performance users have reported with DB2 and BLU Acceleration is nothing short of amazing.  Double-digit improvements in throughput.  And it's this reliably predictable performance that allows us to build those applications that require sub-second analysis.  You know the ones I'm talking about.  While you are on the phone or a web site, the agent or the site offers you options based on YOUR previous interactions, not just options for any random caller or user.  The options are specific because we can analyze the data in the time you're on the phone or on the site.

Finally, I’m told the mark of genius is being able to connect seemingly random dots into a pattern.  You know those folks who are at the conclusion while the rest of us are still just looking at the dots.  You don’t need a genius if you’ve got BLU!  You’ll find that pattern/information gem in record time, too.  And you’ll show the business that you’re delivering the data they need when they need it.

Learn more about the innovative technology in BLU Acceleration through this video series on YouTube!

Extreme Performance – DB2 BLU Acceleration


Michael Kwok, Ph.D.; Senior Manager, DB2 LUW Warehouse and BLU Performance

10% faster?

Nope.

50% faster?

Wrong again.

3x faster?

A lot higher.

DB2 with BLU Acceleration can easily speed up an analytic workload by 8 to 25 times!

Are you serious?

Yes.

I lead the DB2 warehouse performance team at the IBM Toronto Lab, responsible for the performance of BLU Acceleration.  My team, together with the greatest minds from research and development, delivers the extreme performance you’ll find in BLU Acceleration.

I personally witnessed an order of magnitude speed-up in analytic workloads using BLU Acceleration when compared to the traditional row-based database.  I even saw queries running 1000 times faster, speeding from minutes to seconds. At first, I thought these queries hit some sort of error that resulted in completing almost instantly.  But, after I checked, all of these queries returned the correct results.  It was just so amazing!

So what makes DB2 with BLU Acceleration so fast?  In short, the innovative, dynamic in-memory columnar technologies are responsible for the extreme performance.

First of all, DB2 with BLU Acceleration massively improves I/O efficiency.  We perform I/O only on the columns involved in a query.  The new prefetching algorithm effectively reads relevant data into memory before it is accessed.  The new compression technique gives substantial storage savings, yielding a significant reduction in I/O.

Our columnar technologies also support massive improvements in memory and cache efficiency.  BLU Acceleration eliminates the need to consume memory, cache space or bandwidth for unneeded columns.  Data is kept compressed in memory, and packed into cache-friendly structures during processing.  In addition to the more effective prefetching algorithm, the new scan-friendly victim selection algorithm allows us to keep a near-optimal set of data buffered in memory.

Our new compression technique makes data a lot smaller both on disk and in memory.  A patented technology preserves order so that the data can be used without decompressing.  We call it “actionable compression.”  In other words, we can now work on predicate evaluation (e.g., =, <>, <, >, …), joins and aggregation directly on the compressed data.  Imagine how many CPU cycles this can save.
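
Here is a small sketch of the idea, using an invented order-preserving dictionary in Python. It is only an illustration of the principle, not DB2's actual encoding:

# Values and codes sort the same way, so a range predicate can be
# evaluated on the codes without ever decoding the column.
values = ["apple", "banana", "cherry", "grape", "melon"]
encode = {v: i for i, v in enumerate(sorted(values))}   # order-preserving codes

column = ["grape", "apple", "melon", "cherry", "grape", "banana"]
encoded = [encode[v] for v in column]                   # stored "compressed"

# Predicate: value > 'banana'. Translate the literal once, then compare codes.
threshold = encode["banana"]
matches = [i for i, code in enumerate(encoded) if code > threshold]
print(matches)   # positions of qualifying rows, found without decompressing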

BLU Acceleration uses parallel vector processing.  Vectors have superior memory efficiency.  The runtime engine of BLU Acceleration is automatically parallelized across cores and achieves excellent multi-core scalability.  For example, careful data placement and alignment, coupled with adapting to the physical server attributes, help achieve this multi-core scalability.  We also leverage Single Instruction, Multiple Data (SIMD) processing.  Using hardware instructions, we can apply a single instruction to many data elements simultaneously: in predicate evaluations, joins, groupings and arithmetic.  This speeds up query processing a lot.
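
As a rough analogy (not the actual hardware SIMD instructions the engine uses), compare a scalar loop with a vectorized expression in NumPy, where one operation is applied across an entire array of column values at once:

import numpy as np

amounts = np.array([12.0, 250.0, 75.5, 980.0, 33.3, 410.0])

# Scalar style: one comparison per loop iteration.
scalar_hits = [a for a in amounts if a > 100.0]

# Vector style: a single expression evaluates the predicate for every element.
vector_hits = amounts[amounts > 100.0]
print(vector_hits)   # [250. 980. 410.]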

Last but not least, data skipping contributes to this extreme performance.  BLU Acceleration automatically creates a small data structure called a "synopsis" to store the minimum and maximum values for each page of column data.  This synopsis allows us to quickly skip pages that cannot contain qualifying data for a query.  This saves I/O and CPU processing.
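
A simplified sketch of the synopsis idea, with made-up pages and values (the real synopsis is built and maintained automatically):

pages = [
    [2014, 2014, 2015],      # page 0
    [2015, 2016, 2016],      # page 1
    [2017, 2018, 2018],      # page 2
]
synopsis = [(min(p), max(p)) for p in pages]   # per-page min/max

def pages_to_scan(lo, hi):
    """Return only the pages whose [min, max] range can contain matches."""
    return [i for i, (pmin, pmax) in enumerate(synopsis)
            if pmax >= lo and pmin <= hi]

print(pages_to_scan(2017, 2018))   # [2] -- pages 0 and 1 are skipped entirely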

BLU Acceleration is not just one technology or one idea, nor an ordinary columnar technology.  It is a collection of many innovative technologies from research to development.  It is CPU-, memory-, and I/O-optimized.

Extreme performance?

See for yourself:

 


Rethinking buffer pool page replacement


Adam Storm – Senior Software Developer, IBM Toronto Lab

Life as we know it is generally quite ordered.  It's common for drivers to take the same route to work every day, place their keys in the same spot when they arrive home, and then go to bed at the same time every night.  These routines soon become habits, which free the mind to focus on more important, thought-intensive work.  In the field of computer science, however, this determinism can sometimes create problems.  For instance, elegant algorithms like Quicksort can perform poorly for certain input sets.  When the algorithm is randomized, however, it ironically becomes more predictable and has better overall performance characteristics.

In the course of developing BLU Acceleration, we observed something interesting about our buffer pool victim selection algorithm (the algorithm that kicks in when we want to find a page to evict from a full buffer pool) – it didn't perform well on workloads which consisted primarily of large table (or, in our case, column) scans.

Modern buffer pool victim selection algorithms usually utilize one or more of the following basic approaches:

  • Least Recently Used (LRU) in which the least recently used page in the buffer pool is victimized.
  • Most Recently Used (MRU) in which the most recently used page in the buffer pool is victimized.
  • Least Frequently Used (LFU) in which the least frequently used page in the buffer pool is victimized.

Unfortunately, all of these approaches perform poorly in cases where a table (or column) that is larger than the buffer pool is scanned repeatedly.  Fortunately, however, we noticed that by modifying the victim selection algorithm to take into account the size of the column relative to the size of the buffer pool, we could greatly improve the page reuse rates (i.e., the buffer pool hit rates).  Determining which pages to keep around was the tricky part.  For that, we turned to randomness.
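
As a toy sketch of that idea (not the actual DB2 implementation), imagine keeping only a randomly chosen, buffer-pool-sized fraction of a large object's pages resident, so that repeated scans find at least those pages in memory instead of churning every page LRU-style:

import random

BUFFER_POOL_PAGES = 1000
object_pages = list(range(5000))            # object is 5x the buffer pool

# Keep a random subset sized to roughly fit the buffer pool.
keep_fraction = BUFFER_POOL_PAGES / len(object_pages)
resident = {p for p in object_pages if random.random() < keep_fraction}

# On the next scan of the same object, roughly len(resident) pages are
# found in memory; a pure LRU pool would have evicted all of them.
hits = sum(1 for p in object_pages if p in resident)
print(hits, "of", len(object_pages), "pages reused")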

In the video below Chris Eaton and I discuss this new page replacement algorithm, which we call the Scan Friendly Victim Selection algorithm, in greater detail.

About the author:

In his 13 years at IBM, Adam Storm has worked on many critical features in DB2, including the Self-Tuning Memory Manager, pureScale and BLU Acceleration. Adam is currently the architect for Insert, Update and Delete on column-organized tables.

Analytics at breakthrough speed on Power Systems

Simple to Acquire, Deploy & Implement


Greg Fry, Power Systems Global Marketing

Unless you're living under a rock, by now you've seen, heard about and/or experienced the "explosion of data."  Social, mobile and cloud technologies are fundamentally changing how we work and interact, and in the process are contributing to staggering growth in digital content.  Businesses are naturally looking to uncover insights within large quantities of information, and those that do are gaining a competitive advantage.  IT departments are tasked with delivering analytics and making sense of this data, while also controlling costs.

Fortunately, DB2 with BLU Acceleration truly represents a new breed of data management innovation.  Thanks to IBM Research and Development Labs, BLU Acceleration technology provides speed and simplicity for analytics and reporting workloads like never before.  For our clients, this means typically 8 to 25 times faster insights from more data to make better decisions – to improve customer experience, reduce risk and improve the efficiency of operations (just to name a few).

Here is a quick rundown of the unique combination of innovations built into BLU Acceleration Technology in DB2 10.5.

1)  Dynamic In-Memory columnar processing speeds processing of terabytes of data compressed to fit in memory, with the intelligence to dynamically move data from storage as it is needed.

2)  A patented actionable compression technique saves storage space, and because order is preserved there is no need to decompress data to process it.

3)  Parallel vector processing spreads work across multiple processor cores and allows a single question to stream through multiple data sets.

4)  Intelligent data skipping is built in: it flags the data relevant to your query and automatically skips processing irrelevant data.

Combining IBM DB2 software and IBM Power Systems based on the IBM POWER7+ processing architecture will help your organization maximize the value of its data – faster than the competition.  Let’s “peek under the hood” and look at why running DB2 with BLU Acceleration on Power Systems offers such optimized performance.

First, DB2 automatically leverages the massive hardware parallelism of the POWER7+ architecture.  POWER's multicore parallel processing and large memory bandwidth speed up analytics.  DB2 also exploits POWER features like larger page sizes.  Lastly, Power Systems are reliable, ensuring high availability and minimizing unplanned downtime.

The recently announced IBM BLU Acceleration Solution – Power Systems Edition reduces deployment time and effort through pre-installed and pre-optimized server, storage and software.  This system is optimized for DB2 BLU, and runs AIX and PowerVM for outstanding security in a virtualized environment.  It supports faster analytics from a sub-terabyte data warehouse up to a 10TB data warehouse with a single server node – with even more capacity planned in the future.  And it’s highly scalable with Capacity on Demand (CoD) so that you can upgrade your system without powering down the server.

To hear from IBM client Coca Cola Bottling Company Consolidated on lessons learned from their testing of DB2 with BLU Acceleration on Power Systems, view the recent webcast sponsored by IBM and InformationWeek – A Simple and Affordable Path to Speed of Thought Analytics.

You can ask your IBM Representative or IBM Business Partner for more information.  To learn more visit our website or come join in on the conversation at:

–        IBM Power Systems Twitter

–        IBM Power Systems Facebook

–        IBM Power Systems LinkedIn

About Greg Fry
Greg has worked in marketing in the B2B tech industry for 7+ years.  Currently on the IBM Power Systems Global Marketing team, he is focused on Big Data & Analytics on IBM Power Systems.  Greg enjoys exploring how technology transforms the way we live, work and interact.  He is a big Philly sports fan and lover of music.
