Data Analytics and Big Data


Written by Chris Toms - Technical Developer for CSfD

According to Intel, ‘Big Data’ is data of ‘unusual size’ or data ‘generated at spectacular velocity [sic], such as the data collected from telescopes or by social media providers’ [1]. Searching for something more concrete than ‘unusual size’ and ‘spectacular velocity’ turns up a more technical idea: a dataset that cannot easily be queried using traditional database methods, either because the organisation of the data does not suit the queries being asked, because the dataset is too large, because the answer is needed faster than a traditional query can deliver it, or because of some combination of all three.

The various solutions that have been devised share some common ideas. The first is to use computer clusters, where hundreds of computers work together to speed up the processing. The second is a different way of programming the questions to be asked, dividing each query into two phases. The first phase (map) asks the simplest form of the query, which may have to be asked of each data item of interest; the second phase (reduce) aggregates the results. There are several competing solutions available, such as Apache Hadoop, Google BigQuery and the various NoSQL databases, to name just a few.
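The two phases described above can be sketched in a few lines of Python. This is only a minimal, single-machine illustration of the idea, not how Hadoop or any other product implements it: the delivery records, the depot names and all of the function names are invented for the example, and on a real cluster the map calls would be spread across many computers.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical records: (depot, parcels delivered). Invented for illustration.
DELIVERIES = [
    ("Leeds", 120),
    ("Bristol", 75),
    ("Leeds", 60),
    ("Dover", 40),
    ("Bristol", 25),
]


def map_phase(record):
    """Ask the simplest form of the query of one data item: emit (key, value)."""
    depot, parcels = record
    return (depot, parcels)


def group_by_key(pairs):
    """Collect the mapped values under their key so each key can be reduced alone."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values):
    """Aggregate all the values for one key into a single result (here, a sum)."""
    return reduce(lambda total, value: total + value, values, 0)


def total_parcels_by_depot(records):
    mapped = map(map_phase, records)   # map: one simple question per record
    grouped = group_by_key(mapped)     # bring together values sharing a key
    return {key: reduce_phase(values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(total_parcels_by_depot(DELIVERIES))
```

Because each call to map_phase looks at a single record and each call to reduce_phase looks at a single key, neither depends on the rest of the dataset, which is what allows the work to be divided between the computers in a cluster.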

‘Big Data’, then, is not a thing in itself; it’s a way of processing large datasets, i.e. ‘big data’. As a business, if you are wondering whether ‘Big Data’ is something you should be investing in, you are probably starting in the wrong place. You should instead be asking whether ‘Big Data’ is a possible solution to a problem that you already face.

You may already have large amounts of data, but not in a single traditional database; it may be spread across several different databases and other file systems. If that data (as a whole) could be effectively queried, what would you ask? If you have a worthwhile question, perhaps ‘Big Data’ is what you want. After all, ‘Big Data’ has been driven from the start by the search for ways to answer questions, rather than being a solution looking for a problem [3]. Perhaps what businesses ought to take on is the realisation that it’s OK to ask those data-intensive questions; the only limit is the investment that you’re willing to make.

I’m still trying to conjure an image of data being ‘generated’ at ‘spectacular velocity’ (I always thought of computer systems as capturing or recording data). But I like to think that the data moving at the highest velocity (at least in our solar system) is on the New Horizons space probe, currently moving at approximately 33,000 mph, or 0.0049% of the speed of light [4], somewhere out past Uranus on its way to Pluto*.

* New Horizons’ top speed was approximately 51,000 mph during its Jupiter fly-by.
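The percentage quoted for New Horizons can be checked with a short calculation. The sketch below assumes the international mile (1609.344 m) and the defined value of the speed of light; the function and constant names are made up for the example.

```python
METRES_PER_MILE = 1609.344            # international mile
SECONDS_PER_HOUR = 3600
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def percent_of_light_speed(speed_mph):
    """Convert a speed in miles per hour to a percentage of the speed of light."""
    metres_per_second = speed_mph * METRES_PER_MILE / SECONDS_PER_HOUR
    return 100 * metres_per_second / SPEED_OF_LIGHT_M_PER_S


if __name__ == "__main__":
    # 33,000 mph is the approximate cruising speed quoted in the article.
    print(f"{percent_of_light_speed(33_000):.4f}%")
```

At 33,000 mph this gives roughly 14,750 metres per second, which rounds to the 0.0049% figure quoted above.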

[1] What is Big Data (Intel)

[3] Big Data Myths Give Way To Reality In 2014

[4] New Horizons space probe

 

Rebecca Worrod

Computer Systems for Distribution (CSfD) specialises in innovative, flexible management solutions that help logistics and distribution organisations to support and grow their business. Our team is a solid mix of youth and experience. Working with bright emerging talent as well as seasoned professionals, CSfD has been combining long term knowledge and maturity with originality and fresh ideas for over…

http://www.csfd.com
