As the ‘big data’ phenomenon is upon us, I thought it timely to consider what people actually mean when they talk about ‘big data’. Is it simply lots of information? The answer, much like the phenomenon itself, is complex and difficult to determine. It appears there is no formally agreed definition for the phrase, which some have referred to as the ‘soup du jour’.
Exploring the concept further, my initial ‘Googling’ led me to Wikipedia, which refers to big data as:
‘any collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.’
In Wikipedia’s definition, ‘big’ means large (as in volume), and ‘complex’ indicates that the data is made up of many interconnected parts (not ‘complex’ in the sense, often mistakenly assumed, of difficult to understand or analyse). This means that having lots of data doesn’t in itself make it ‘big’; it also has to be complex. The final part of the definition is relative to our currently available technologies, which suggests that big data is potentially time-bound. That is, once we develop bigger and better technologies (like NoSQL) to store, manage and analyse big data, will the term become extinct?
The multinational technology corporation IBM tells us that big data spans three dimensions: it includes a variety of data (structured, semi-structured and unstructured), it is often time-sensitive and must be used whilst it’s streaming, and the volume comes in one size only…large (IBM, n.d.). Others have extended this to include attributes associated with the value of the data, from which business intelligence can be derived, and the ability to readily assess the accuracy of the data (Gordon, 2013). In this regard, ‘big’ may be attributed to all or a combination of ‘variety’, ‘velocity’, ‘volume’, ‘value’ and ‘veracity’.
As for ‘data’, as you may already be aware, these are facts in their rawest form. Data may be unorganised, unprocessed and potentially useless, much like a single isolated piece of a jigsaw puzzle. However, when pieces are combined and/or put into a specific context, we can see more of the puzzle and begin to derive meaning and information from the data. Nick Milton (Milton, 2009) provides a nice (short) video on the relationships and nuances between data and information (and indeed knowledge), which is worth a look (https://www.youtube.com/watch?v=sdzUfHwNCVQ). It gives a scenario-based explanation of the data-information-knowledge chain. In this way, data and information are two sides of the same coin: data may or may not equate to information, but information is composed of data, and the two are inextricably linked.
So, whether or not it’s appropriate to analyse ‘big’ and ‘data’ as separate components of the single concept, the answer to the question is the same: ‘lots of information’ is part of what ‘big data’ is about, but it’s a limited view. We’ve had a lot of information for quite some time now! What appears to be the most distinctive and revolutionary thing about ‘big data’ is what we can now do with the data: exposing, gathering, organising, linking, reusing and repurposing it to produce new information (Shaw, 2014) for the benefit of society (a dubious benefit, you might say, if big data can detect when women are pregnant based on their purchases). This is why so many big players are in the big data field, all wanting to harness this data to create new business intelligence, push products, predict events and requirements, solve problems and much, much more.
This raises the question of the need for, and relevance of, the information professional in the big data field. Is there room for us? Well, I hope you were all shocked by the suggestion that some would perhaps say not. Information professionals should certainly be making their way to the forefront of the big data movement if they haven’t already. Whilst we might leave the in-field experts to explore, analyse and derive new business intelligence from big data sets, information professionals have a clear and vital role in helping to unlock and expose data, link data, manage it and ensure its accessibility, promote sharing and potentially even ‘push’ data, and assist with data retrieval and research to support business areas. It’s a day in the life of an information professional really, and part of what we already do to manage the lifecycle of any other information asset. Some of the big players in the field just haven’t heard our voices yet, and so have yet to reap the benefits that a more collaborative approach would bring in providing insight and adding more significant value to big data.
Gordon, K. (2013). What is big data? ITNow, 55(3), pp. 12-13.