
Friday, September 11, 2015

Understanding how the Y2K bug affects Big Data

It recently dawned on me that Big Data has the same problem as the Y2K bug. Some people are familiar with the whole mess behind the scenes, but in case you aren't: back in 1999 the Y2K bug was all the buzz. The world was going to end, people were stocking up on supplies, and everyone was discussing what would happen. I was 12 at the time and didn't really understand the significance. Someone explained that computers stored the current year as just two decimal digits, so when the calendar rolled over, those computers would jump back to 1900. I wasn't worried, though, because my dad worked for the government and he told me there was nothing to worry about. As we transitioned into the new millennium, we all found out there really was nothing to be worried about.

Now for some technical background. If you took an intro to programming, or are familiar with the original bug, this might be a bit boring, but here goes. Computers run on 1's and 0's, on or off. These are called 'bits', and it takes a group of them to represent an ordinary number: four bits can hold a single decimal digit, eight bits can hold two digits, and so on. You may have heard of a 32-bit or a 64-bit processor. At its simplest, this means the processor works on groups of 32 or 64 bits at a time. Programming languages represent numbers inside these groups. A basic number, an integer or 'int', takes 32 of those bits. That means the largest an integer can be on a 32-bit system is 2,147,483,647 (one of the 32 bits is used to indicate whether the number is negative or positive). There are, of course, other representations that go larger than 32 bits; this is just the most convenient one when programming.
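To make that limit concrete, here's a minimal sketch in Java (chosen because its int is always 32 bits and overflow wraps around in a well-defined way) showing what happens when a counter crosses 2,147,483,647:

public class IntLimitDemo {
    public static void main(String[] args) {
        // The largest value a 32-bit signed int can hold.
        int counter = Integer.MAX_VALUE;           // 2,147,483,647
        System.out.println("At the limit:   " + counter);

        // One more increment wraps around to the most negative value.
        counter = counter + 1;
        System.out.println("After overflow: " + counter); // -2,147,483,648

        // A 64-bit long has plenty of headroom for the same count.
        long bigCounter = (long) Integer.MAX_VALUE + 1;
        System.out.println("As a long:      " + bigCounter); // 2,147,483,648
    }
}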


Let's use a 'for instance' of a programmer starting a new project. He has to keep a record of every time the phone rings at his company. He naturally uses an 'int' and all is well. His program and its data storage work so well that they are integrated into a larger component, and, in a SaaS model, more companies are added to the list. Soon the count starts growing faster. A few years pass, the original programmer finds a new job, and time marches on. Someone even notices the number getting larger and adjusts the column type in the database to handle larger numbers. Suddenly, without warning, alarm, or anyone even noticing, disaster strikes. The number becomes larger than the integer can handle. Time passes further. The number in the database keeps growing, but the code gets stuck at the limit.
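Here's a rough sketch of how that can look in code; the class and method names are made up just for this example. Assume the database column was widened to a 64-bit value, but the application still forces the count into a 32-bit int:

public class CallCountReport {

    // A raw cast silently wraps to a negative number once the true
    // count passes 2,147,483,647.
    static int wrappedCount(long countFromDb) {
        return (int) countFromDb;
    }

    // A "defensive" clamp written years earlier freezes the report at the
    // 32-bit limit forever: the "stuck at the limit" behavior described above.
    static int clampedCount(long countFromDb) {
        return countFromDb > Integer.MAX_VALUE ? Integer.MAX_VALUE : (int) countFromDb;
    }

    public static void main(String[] args) {
        long countFromDb = 2_147_483_650L; // just past the 32-bit limit
        System.out.println("In the database: " + countFromDb);
        System.out.println("Wrapped report:  " + wrappedCount(countFromDb)); // -2147483646
        System.out.println("Clamped report:  " + clampedCount(countFromDb)); // 2147483647
    }
}

Either way, the database keeps counting happily while every report built on that int quietly tells the wrong story.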

There are many ways that failure can occur. Reports can go awry, systems can crash, and data can be lost. Even the best plans around such a transition miss things in a large code base. Unlike its famous cousin, the big data Y2K bug comes with no fanfare. So, if you know a software engineer who has carried a system past the 32-bit mark, make sure to congratulate them with a party worthy of the new millennium.
