It recently dawned on me that Big Data has the same problem as the Y2K bug. Now, some people are probably familiar with the whole mess behind the scenes, but in case you aren't, back in 1999 the Y2K bug was all the buzz. The world was going to end, people were filling their stockrooms, and everyone was discussing what would happen. I was 12 at the time and didn't really understand the significance. Someone described the year as being stored in a two-digit slot, so computers would jump back to 1900. I, however, was not worried, as my dad worked for the government and he told me there was nothing to worry about. As we transitioned into the new millennium, we all found out there really was nothing to be worried about.
Now
for some technical background.
For those of you who took an intro to programming, or are familiar with the original bug, this might be a bit boring, but here goes. Computers run on 1's and 0's, on or off. It takes a combination of four 1's and 0's to represent a decimal digit, eight for two digits, and so on. These things are called 'bits'. You may have heard of a 32-bit processor or a 64-bit processor. At its simplest, this means the processor looks at groups of 32 or 64 bits respectively. Programming languages represent numbers inside of these groups. A basic number, an integer or 'int', takes 32 of these bits. That means the largest an integer can be on a 32-bit system is 2,147,483,647 (there's a trick where the 32nd slot indicates negative or positive). There are, of course, other representations that go larger than 32 bits; this is just the most convenient when programming.
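To make that limit concrete, here is a minimal Java sketch (the class and variable names are mine, purely for illustration) of what happens when a 32-bit int crosses 2,147,483,647: it silently wraps around to a large negative number.

public class OverflowDemo {
    public static void main(String[] args) {
        // The largest value a 32-bit signed int can hold.
        int counter = Integer.MAX_VALUE; // 2,147,483,647

        System.out.println("Before: " + counter);

        // One more increment and the value wraps around to the
        // most negative 32-bit number. No warning, no error.
        counter++;

        System.out.println("After:  " + counter); // -2,147,483,648
    }
}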
Let's use a 'for instance' of a programmer deciding on his new project. He has to keep a record for every time the phone rings at his company. He naturally uses an 'int' and all is well. His program and the data storage work so well that they are integrated into a larger component and, in a SaaS model, more companies are added to the list. Soon, the count starts growing faster. A few years pass, the original programmer finds a new job, and time passes. Someone even notices the number getting larger and adjusts the type in the database to handle larger numbers. Suddenly, without warning, alarm, or anyone even noticing, disaster strikes. The number becomes larger than the integer can handle. Time passes further. The number in the database keeps growing, but the code is stuck at the limit.
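Here's a hypothetical sketch of that mismatch in Java (the names and the specific value are my own, just to illustrate the failure mode): the database column has been widened to a 64-bit type, but the application code still reads the count into a 32-bit int, so the value it sees silently wraps.

public class CallCounter {
    public static void main(String[] args) {
        // Hypothetical count read from the database after the column
        // was widened to a 64-bit BIGINT; it has passed the 32-bit limit.
        long countInDatabase = 2_500_000_000L;

        // The application code was never updated: it still keeps the
        // count in a 32-bit int, so the narrowing cast silently wraps.
        int countInCode = (int) countInDatabase;

        System.out.println("Database says: " + countInDatabase); // 2500000000
        System.out.println("Code sees:     " + countInCode);     // -1794967296
    }
}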
There are many ways that failure can occur. Reports can go awry, systems can crash, and data can get lost. Even the best plans around such a transition miss things in a large code base. Unlike its inferior cousin, the Big Data Y2K bug has no fanfare. Now, if you know a software engineer who has maintained a system past the 32-bit mark, make sure to congratulate them with a party worthy of the new millennium.