Tuesday, September 3, 2013

Big Data

Big data is a topic that many people spout off about, and one that is quite interesting to talk about. If you're unaware, the idea behind big data is that every website you visit, every search you type into Google, and every Facebook post you like is stored somewhere. Those companies in turn use that data to better predict what users like you will want to search for, buy, or look at and like later. So why is this 'big' data? Well, for one, billions of people use the internet every day. For each person, some amount of data is being created and stored on a server belonging to some company. How much data does this equate to? Well, here are some ideas. Simply put, it's far more than you could ever look through, even if you had multiple lifetimes to do it.

Now that we've established just how big the internet is and how much data there is, how do we deal with it? These numbers are so large that they go beyond anything we normally think about outside of physics, in particular physics on an astronomical scale. Let's put them in perspective. There are roughly 3 million emails sent every second. Now, 3 million is a large number; I think we can agree on that. But it also means that roughly 95 trillion emails are sent per year (107 trillion if you're looking at the infographic in the second link). Sure, a great deal of those are spam, but even the 10% that isn't amounts to roughly 10 trillion actual emails. Let's assume that each email is just one line long and that a page holds 50 single-spaced lines, or 100 since we'll even let the pages be double-sided. Let's also be so bold as to say that a novel is generally 500 pages, to make the math easier. 10 trillion emails would then fill 100 billion pages, or 200 million novels, and that's only what we're generating per year!
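If you'd like to check the math yourself, here is a small Python sketch of the same back-of-the-envelope calculation. It assumes a 365-day year, that exactly 10% of email is not spam, and that every email is one line long; the function name `email_novels` is just a made-up helper for illustration. It uses integer division so the results stay exact.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds in a non-leap year


def email_novels(non_spam_emails: int,
                 lines_per_page: int = 100,   # 50 lines per side, printed double-sided
                 pages_per_novel: int = 500) -> tuple[int, int]:
    """Return (pages, novels), assuming every email is exactly one line long."""
    pages = non_spam_emails // lines_per_page
    novels = pages // pages_per_novel
    return pages, novels


emails_per_year = 3_000_000 * SECONDS_PER_YEAR  # 3 million emails every second
non_spam_exact = emails_per_year // 10          # keep the 10% that isn't spam

print(f"{emails_per_year:,} emails per year")
print(email_novels(non_spam_exact))             # using the unrounded 10% figure
print(email_novels(10_000_000_000_000))         # using the round 10 trillion figure
```

With the exact figures, 3 million emails a second works out to about 94.6 trillion emails a year, and 10% of that (about 9.5 trillion) fills roughly 189 million novels. Rounding up to 10 trillion gives the 100 billion pages and 200 million novels used above.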

Personally, I find big data intriguing. Since this is an internet-related problem, people might call it an 'internet-scale problem', but honestly, 'Big Data' just sounds better. It's simpler and doesn't really require you to know why these things are problems. Rest assured, though, big data is a problem that many people are trying to solve. As the years go by, more and more ways of handling it emerge, from NoSQL databases to MapReduce, among other things. If you didn't care for the post, I hope you at least enjoyed the infographics. Until next time.

--CsMiREK
