Big Data: An Introduction


Hey guys, I am back to blogging after a pretty long gap. Since my last blog I have been studying data warehousing. While learning data warehousing techniques, I came across a bigger issue that is troubling IT companies. It's called BIG DATA. So I thought I would share what I have learned about this advanced area of business analytics with you guys.

If you think BIG DATA deals with “data that is big in nature”, then you are perfectly correct. But if your idea of a big data set is a database table with 1,000 to 100,000 rows, then I'm afraid BIG DATA is something much bigger and messier than that. A more formal definition of BIG DATA goes like this:

Big data is a term applied to data sets, both structured and unstructured, whose volume exceeds the capacity of commonly used software tools to capture, manage, and process the data within an acceptable time using usual database and software techniques.

Today, companies face a serious problem: they have access to huge amounts of data and no idea what to do with it. An IBM survey found that over half of today's business leaders realize they don't have access to the insights they need to do their jobs. This data is typically generated by log files, IM chats, Facebook chats, emails, sensors, etc. It is raw in nature and not something you will find in a database table's row-and-column format; it accumulates from the day-to-day work of each and every associate. Companies are trying to tap these data stores to derive business intelligence and strategy. BIG DATA is not about relational databases; it is about data that has no predefined relations between its parts.

BIG DATA can be described by three basic characteristics:

1. VOLUME:

There is a huge amount of data stored in the world, and its volume is growing rapidly. In 2000, around 800,000 petabytes (1 PB = 10^15 bytes) of data were stored worldwide; by 2020, the total is expected to reach 35 zettabytes (1 ZB = 10^21 bytes). Twitter alone generates more than 7 terabytes of data every day, Facebook around 10 terabytes, and some enterprises generate terabytes of data every hour of every day of the year. These figures keep growing exponentially, and most companies have no clear idea what to do with this data or how to process it. It wouldn't be wrong to say we are drowning in an ocean of data.
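To get a feel for these volume figures, here is a small Python sketch of the arithmetic, using the decimal units defined above. The implied annual growth rate is my own derived number, not part of the survey, and it assumes the growth happened at a steady yearly percentage, which the two data points alone don't guarantee.

```python
# Decimal (SI) units, as used in the text: 1 PB = 10**15 bytes, 1 ZB = 10**21 bytes
PB = 10**15
ZB = 10**21

stored_2000 = 800_000 * PB   # ~800,000 PB stored worldwide in 2000
expected_2020 = 35 * ZB      # ~35 ZB expected by 2020

# Overall multiplication factor between the two data points
growth = expected_2020 / stored_2000
years = 2020 - 2000

# Implied compound annual growth rate, ASSUMING steady exponential growth
cagr = growth ** (1 / years) - 1

print(f"Stored in 2000: {stored_2000 / ZB:.1f} ZB")   # 0.8 ZB
print(f"Growth 2000->2020: {growth:.2f}x")            # 43.75x
print(f"Implied annual growth: {cagr:.1%}")
```

In other words, the 2000 figure is still under one zettabyte, and reaching 35 ZB by 2020 would mean the stored total grows by roughly a fifth every single year.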

2. VARIETY:

With the huge volume of data comes another problem: variety. With the rapid spread of technology, data is no longer limited to relational databases; it now includes raw unstructured and semi-structured data coming mainly from web pages, log files, emails, chats, etc. Traditional systems struggle to store this data and perform the required analytics on it, because most of the information generated doesn't lend itself to traditional database technologies.
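To make the variety problem concrete, here is a minimal Python sketch. The schema, the sample records, and the `classify` helper are all hypothetical, made up for illustration, and the heuristic is deliberately crude (a naive comma split that ignores CSV quoting). It shows that only the first record fits a fixed row-and-column schema, while the chat event and the log line do not.

```python
import json

# Hypothetical fixed, relational-style schema (illustration only)
SCHEMA = ("customer_id", "name", "amount")

# Hypothetical sample records of the kinds mentioned above
RECORDS = [
    "1001,Alice,250.00",                                     # CSV row from a table
    '{"user": "bob", "msg": "hi", "tags": ["chat", "im"]}',  # JSON chat event
    "2012-05-01 10:32:07 WARN disk usage at 91% on node-7",  # raw log line
]


def classify(record: str) -> str:
    """Crude heuristic: a JSON object is semi-structured, a line with exactly
    len(SCHEMA) comma-separated fields is structured, anything else is unstructured."""
    try:
        if isinstance(json.loads(record), dict):
            return "semi-structured"
    except json.JSONDecodeError:
        pass  # not valid JSON; fall through to the schema check
    if len(record.split(",")) == len(SCHEMA):
        return "structured"
    return "unstructured"


for rec in RECORDS:
    print(f"{classify(rec):16} {rec}")
```

A real pipeline would need far more robust parsing; the point is only that one fixed schema cannot absorb all three kinds of data.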

3. VELOCITY:

Velocity is the characteristic of BIG DATA that deals with how fast data arrives, is stored, and is used for analytics. In BIG DATA, velocity never comes alone: the rate at which data arrives, combined with its volume and variety, is something traditional database technology can hardly handle. According to one survey, around 2.9 million emails are sent every second, 20 hours of video are uploaded to YouTube every minute, and around 50 million tweets are posted on Twitter every day. So you can imagine the velocity at which data is coming at you.
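The survey figures above use different time units, so here is a short Python snippet that converts each one to a per-day or per-second rate. The inputs are the numbers quoted above; the conversions are plain arithmetic.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440

# Figures quoted from the survey
emails_per_second = 2_900_000
youtube_hours_per_minute = 20
tweets_per_day = 50_000_000

# Scale each figure to a common time unit
emails_per_day = emails_per_second * SECONDS_PER_DAY
youtube_hours_per_day = youtube_hours_per_minute * MINUTES_PER_DAY
tweets_per_second = tweets_per_day / SECONDS_PER_DAY

print(f"Emails per day:        {emails_per_day:,}")         # 250,560,000,000
print(f"YouTube hours per day: {youtube_hours_per_day:,}")  # 28,800
print(f"Tweets per second:     {tweets_per_second:,.0f}")   # 579
```

Roughly 250 billion emails a day, or close to 600 tweets every second, is the kind of arrival rate a traditional database was never designed to ingest and analyze continuously.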

There is one more characteristic of BIG DATA: VALUE. Value is what every company is ultimately looking for. Unless you can derive business intelligence and value from the data you hold, there is no point in having it. In simple terms, value is about turning raw, unstructured data into meaningful statistics that support sound business decisions.
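As a toy illustration of turning raw data into a statistic, the Python sketch below parses raw log lines and counts how many are INFO, WARN, or ERROR. The log format, the sample lines, and the `level_counts` function are hypothetical, made up for this example; lines that don't match the assumed format are simply skipped.

```python
import re
from collections import Counter

# Hypothetical raw log lines; the "<date> <time> <LEVEL> <message>" format is assumed
RAW_LOGS = [
    "2012-05-01 10:32:07 WARN disk usage at 91% on node-7",
    "2012-05-01 10:32:09 INFO user alice logged in",
    "2012-05-01 10:33:15 ERROR payment timeout for order 5531",
    "2012-05-01 10:34:02 INFO user bob logged in",
    "garbled line with no timestamp",
]

LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (INFO|WARN|ERROR) (.*)$")


def level_counts(lines):
    """Count log levels, skipping lines that don't fit the assumed format."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(3)] += 1
    return counts


counts = level_counts(RAW_LOGS)
parsed = sum(counts.values())
error_share = counts["ERROR"] / parsed if parsed else 0.0

print(dict(counts))
print(f"Parsed {parsed} of {len(RAW_LOGS)} lines; ERROR share: {error_share:.0%}")
```

Real value comes from the same idea at scale: raw text goes in, a number a manager can act on (here, the share of errors) comes out.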

Companies are trying to extract all the information they can from this data to gain better intelligence and a better understanding of their customers, the marketplace, and the business. A few technical solutions, such as Hadoop (which I will explain in my next blog), NoSQL databases, distributed key-value stores (DKVS), etc., are tackling BIG DATA problems.

For now, my conclusion is that the right use of BIG DATA will let analysts spot trends and gain niche insights that create value and drive innovation much faster than conventional methods. It will also help companies meet consumer demand better and facilitate growth.