Bill Cole – Competitive Sales Specialist, Information Management, IBM
I love new features. We breathlessly announce a myriad of new features and I see eyes glaze over around the world. Hey, your database and your application are working just fine now, right? So the new features just get you a lunch when the IBM sales team comes by for the quarterly meeting.
The DB2 Development team spends hours/days/weeks/big bucks determining what features clients want or what the other guys are doing as well as throwing in our own insights. Then there are the thousands of hours doing the implementation & testing and then we do a lousy job of explaining why you care. After all, moving to a new release means extra work for you and there’s got to be some payoff if you’re going to give up a few hours of your life to do the upgrade. So I want to burn some of your time to explain what these new features mean to your life.
For me, the very coolest features are those you get for the price of the upgrade. You know, the ones you don’t have to “do” anything to take advantage of. That’s why the new release of DB2 10.5 for LUW is so nice for database practitioners, and that is where we’ll start.
In this installment, we’ll talk about BLU Acceleration because it’s the group of features that will get you the biggest return on your install/upgrade investment. (See the trivia quiz at the end of the article for more about “BLU.”) First, BLU Acceleration is largely for data warehouses. There aren’t many OLTP systems – or databases – that would benefit from the BLU Acceleration technology. So if you’re simply interested in OLTP or pureScale features, read my next installment. The benefits that follow are specifically for column-oriented tables.
BLU Acceleration is columnar. Okay, column-oriented. Once you think about it, columns make much more sense for data warehouses. After all, we seldom pull all the columns out of the rows we touch. We’re typically processing subsets of columns. So orienting the database sideways makes performance sense. (One description I find useful: we laid the database on its side.) I’ve seldom seen a table in a data warehouse that contains columns that will be used in every (or even most) queries. Given that it takes time to parse a row of columns and pull a small group of columns out of every row, it makes performance sense to build reporting/analysis databases around columns rather than rows.
Example: Pick up a deck of playing cards. If you’re from small towns in the American Midwest, it could be a Pinochle deck. First sort all the cards into suits. We’ll think of the suits as rows. Put the cards of each suit in value order (columns). Now pull all the jacks out. Gotta sort through each suit to find the jacks, right? So now flip the cards on their sides by putting the cards in order by value regardless of suit. So if you have to pull out those pesky jacks it’s easy enough to find the pile of jacks. Much quicker. That’s what we’re talking about with column-oriented databases. And if you’re looking only for red jacks, it makes the search faster, too.
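If you’d rather see the card trick in code, here is a minimal sketch in Python. It is only an illustration of row layout versus column layout, not how DB2 stores anything. The names (rows, columns, jacks_from_rows, jacks_from_columns) are made up for this example, and the “touched” counter is a crude stand-in for the cost of parsing values you don’t need.

```python
from itertools import product

SUITS = ("clubs", "diamonds", "hearts", "spades")
RANKS = ("2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K", "A")
COLOR = {"clubs": "black", "diamonds": "red", "hearts": "red", "spades": "black"}

# Row layout: one tuple per card, every attribute stored together.
rows = [(suit, rank, COLOR[suit]) for suit, rank in product(SUITS, RANKS)]

# Column layout: one list per attribute, all in the same card order.
columns = {
    "suit": [row[0] for row in rows],
    "rank": [row[1] for row in rows],
    "color": [row[2] for row in rows],
}


def jacks_from_rows(rows):
    """Find the jacks by walking whole rows; count every value we pass over."""
    hits, touched = [], 0
    for pos, row in enumerate(rows):
        touched += len(row)  # the whole row is read to get at one field
        if row[1] == "J":
            hits.append(pos)
    return hits, touched


def jacks_from_columns(columns):
    """Find the jacks by scanning only the rank column."""
    rank_col = columns["rank"]
    hits = [pos for pos, rank in enumerate(rank_col) if rank == "J"]
    return hits, len(rank_col)


row_hits, row_touched = jacks_from_rows(rows)
col_hits, col_touched = jacks_from_columns(columns)
print(row_hits == col_hits, row_touched, col_touched)
```

Both scans find the same four cards, but the column scan only looks at one value per card instead of three. Add a few dozen more columns to the table and the gap gets a lot wider.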
Compression is ubiquitous in DB2 10.5 with BLU Acceleration. We compress column values in two different ways (one for alpha and another for numeric) and then whatever is still uncompressed gets compressed in the storage pages. This is another benefit of columnar databases. All the values are lined up neatly so compression is quick and easy.
So what, you say? Well, compression saves disk space. Yeah, I know, disk is cheap. It’s not that cheap when we’re talking about hundreds of gigabytes! We’ve seen space reductions of up to 90%. Your mileage will vary but you’ll save disk real estate. And you’ll save all that time when making copies of the database. Copying 10GB is a whole lot faster than copying 50GB. So the savings multiply when you count all the environments you have to create and manage. And the compression is free!
Then there’s the matter of query performance with compression. Typically, data is stored compressed and then uncompressed for comparison predicates (equal, not equal, greater than, LIKE), which eats lots of processor cycles to no particular purpose. In DB2 10.5 with BLU Acceleration, the predicate value is compressed instead, and then all the candidate column values are compared to this compressed predicate value without being uncompressed first.
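Here is a small Python sketch of the idea of comparing against a compressed predicate. To be clear about the assumptions: the article doesn’t describe BLU’s actual encoding, so this uses a plain sorted dictionary in which each distinct value gets an integer code in sort order. The function names (build_dictionary, equal_to, greater_than) and the sample city column are made up for the illustration.

```python
from bisect import bisect_right


def build_dictionary(values):
    """Assign each distinct value an integer code that preserves sort order."""
    ordered = sorted(set(values))
    codes = {value: code for code, value in enumerate(ordered)}
    return ordered, codes


def equal_to(encoded, codes, value):
    """Encode the predicate once, then compare codes, never the raw values."""
    code = codes.get(value)
    if code is None:  # value is not in the column at all
        return []
    return [pos for pos, c in enumerate(encoded) if c == code]


def greater_than(encoded, ordered, value):
    """Range predicate on codes; works because codes follow sort order."""
    threshold = bisect_right(ordered, value)  # first code whose value > predicate
    return [pos for pos, c in enumerate(encoded) if c >= threshold]


cities = ["Omaha", "Austin", "Omaha", "Boise", "Tulsa", "Austin"]
ordered, codes = build_dictionary(cities)
encoded = [codes[city] for city in cities]  # the "compressed" column

print(equal_to(encoded, codes, "Omaha"))
print(greater_than(encoded, ordered, "Boise"))
```

The column is encoded once at load time. At query time only the single predicate value is translated, and every comparison after that is a cheap integer comparison against the stored codes.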
There’s one other side effect of compressing. We keep the data compressed in the cache, so we get lots more data in the cache. Wow! It’s like swapping all the dollar bills in your pocket for twenties. Same amount of money but smaller stacks. You can have a much larger cash cache in the same space. This follows the aphorism “No I/O is the best I/O.” The system is tuned to reduce the amount of I/O required to drive data through your queries. We ask you to measure how much I/O we’re doing compared to previous versions or row-oriented tables and let us know.
BLU Acceleration loves multi-core hardware and will take advantage of it to improve performance, so your investment in all those cores will make you look brilliant because you got in front of the curve. In fact, BLU is aware of the hardware and will make maximum use of whatever it finds itself running on. I really like that. You invested in premium hardware with extensive instruction sets and you should be able to use it all, or at least get more than the lowest common denominator out of it. BLU will also use the processor caches (e.g., L1 & L2) to keep working data close to the cores and cut trips out to main memory.
Then there’s the use of the SIMD operations. Single instruction stream, multiple data streams. This is also called a vector operation because one execution will operate on many data values. Cheap. Fast. Efficient.
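To get a feel for “one instruction, many values,” here is a rough Python sketch. Plain Python has no SIMD instructions, so this fakes the idea by packing eight 8-bit codes into a single 64-bit integer and testing all eight against a predicate with a handful of whole-word operations. It is an illustration of the principle only, not BLU’s implementation, and the names (pack, broadcast, lanes_equal) are made up for the example.

```python
LANES = 8                      # eight 8-bit codes per 64-bit word
LOW7 = 0x7F7F7F7F7F7F7F7F      # low seven bits of every lane
ONES = 0x0101010101010101      # a 1 in the low bit of every lane
MASK64 = 0xFFFFFFFFFFFFFFFF


def pack(codes):
    """Pack up to eight codes (each 0..255) into one integer, lane 0 lowest."""
    word = 0
    for lane, code in enumerate(codes):
        word |= (code & 0xFF) << (8 * lane)
    return word


def broadcast(code):
    """Copy one 8-bit code into all eight lanes."""
    return code * ONES


def lanes_equal(word, code):
    """Return the lanes of `word` that hold `code`, testing all lanes at once."""
    diff = word ^ broadcast(code)            # a lane becomes 0 where it matches
    # Every nonzero lane ends up 0xFF and every zero lane 0x7F; no carries
    # cross lane boundaries because each lane sum is at most 0xFE.
    t = ((diff & LOW7) + LOW7) | diff | LOW7
    flags = ~t & MASK64                      # 0x80 in each matching lane
    return [lane for lane in range(LANES) if (flags >> (8 * lane + 7)) & 1]


word = pack([3, 7, 3, 0, 255, 3, 1, 2])
print(lanes_equal(word, 3))
```

The comparison itself is a fixed number of operations on the whole word, no matter how many lanes match. Real SIMD hardware does the same kind of thing with much wider registers and dedicated instructions.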
BLU is self-tuning. Autonomic. Yes, you’ve heard that one before. Try it and see. We’ve limited the knobs and levers to practically none. You don’t need to play with the levers and knobs to get good/better/fantastic performance. You’re left to help the business do business better.
If you’re uncertain about which tables should be converted to column-oriented, Optim Query Workload Tuner will make suggestions for you. Cool. Sort of takes the guesswork out of the process, eh? And then just run the db2convert utility for the selected table and you’ve done all that needs to be done.
Some of the tenets we followed in putting together this release of DB2 10.5 were simplicity, transparency and continuity. Keep everything the same where possible. No special steps. No re-learning everything you already know. The SQL statements are the same. The tools are the same and they know how to deal with column-oriented tables. And you can mix table types within a single query. The optimizer knows how to handle that situation so you don’t have to do anything special.
What are the steps to standing up a new database? Create. Load. Go. No days or weeks of designing and modeling. No arguing over indexes. There aren’t any because they’d just slow you down. No materialized anything. Just data. You’re happily surprising your user community with your database acumen. A hero to the business.
Finally, there I was wondering what “BLU” means. I couldn’t make anything that I know fit and then I was given this odd little tip. Use your favorite search engine and look for “BLINK IBM” and “BLINK ULTRA”. The presentation and discussion you’ll find are tremendously useful for understanding the genesis of BLU Acceleration. It all makes sense about the second time you watch the video. Enjoy!
Bill Cole on Twitter : @billcole_ibm