What is Dark Data?

Dark data is data that an organization has collected and stored but is not currently using, typically because it is unstructured. It accumulates continuously, yet it has never been categorized, labeled, or otherwise organized. Once organized and analyzed, however, this massive trove could yield valuable insights. In short, dark data could strongly influence a business's decisions, provided the business can evaluate it properly with data analytics.

Examples of Dark Data

One example of dark data is a customer call record, which may hold valuable information about a customer’s opinions and location. Such records are routinely captured and stored, but rarely organized or analyzed. Another example is a website log file, which may hold valuable information about visitor behavior and traffic. These logs are collected as a matter of course, but rarely analyzed in any organized or meaningful way.
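To make the log file example concrete, here is a minimal sketch of turning raw log lines into structured records and counting page views. It assumes the logs follow the Common Log Format used by many web servers; the sample lines, the function names parse_log and page_views, and the choice to count only 2xx responses are illustrative assumptions, not something prescribed here.

```python
import re
from collections import Counter

# Pattern for one line in Common Log Format (an assumption; real logs may differ).
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_log(lines):
    """Turn raw log lines into dictionaries, skipping lines that do not match."""
    records = []
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            records.append(match.groupdict())
    return records


def page_views(records):
    """Count successful (2xx) requests per path."""
    return Counter(r["path"] for r in records if r["status"].startswith("2"))


# Made-up sample lines, for illustration only.
SAMPLE = [
    '203.0.113.5 - - [10/Oct/2011:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.9 - - [10/Oct/2011:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 2326',
    '203.0.113.5 - - [10/Oct/2011:13:57:12 +0000] "GET /missing HTTP/1.1" 404 512',
    'not a log line',
]

if __name__ == "__main__":
    views = page_views(parse_log(SAMPLE))
    for path, count in views.most_common():
        print(path, count)
```

Once the lines are records rather than free text, questions about traffic and visitor behavior become simple counts and filters. Real log files are often far messier than this, so the pattern would likely need adjusting.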

Growth of Dark Data

According to a 2011 IDC study, 90% of digital data is unstructured data, or dark data. The study also found that the world’s digital data is doubling every two years, a pace it put ahead of Moore’s Law. New technologies make it cheap to capture and store massive amounts of information. By 2011, the overall cost of capturing and storing large amounts of unstructured information had fallen to just one-sixth of its 2005 level.
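For a rough sense of scale, the two figures above can be converted to yearly rates. The sketch below assumes smooth, constant compounding, which the study does not claim; the helper annual_rate is a name introduced here for illustration.

```python
def annual_rate(total_factor, years):
    """Constant yearly factor that compounds to total_factor over the given years."""
    return total_factor ** (1 / years)


# Data volume doubling every two years.
data_growth = annual_rate(2, 2)

# Cost falling to one-sixth between 2005 and 2011 (six years).
cost_change = annual_rate(1 / 6, 2011 - 2005)

if __name__ == "__main__":
    print(f"Data grows about {data_growth - 1:.0%} per year")
    print(f"Cost falls about {1 - cost_change:.0%} per year")
```

Under that assumption, doubling every two years works out to growth of roughly 41% per year, and a drop to one-sixth over six years works out to a decline of roughly 26% per year.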

Issues with Dark Data

As big data and data analytics see wider use, demand to organize dark data and make it usable is rising quickly. However, this type of data is often complex, very large, and stored in multiple locations, which makes analysis difficult and costly.

Nonetheless, the potential value of analyzing unstructured data is staggering. Many solutions have been proposed for making unstructured data usable in big data endeavors. Some of them are described below.

Solutions to the Dark Data Problem

  • machine learning, in which a computer program changes and improves as it is fed a constant supply of new unstructured data
  • open data, or making unstructured data available for everyone to analyze and explore
  • software that converts dark data to graphics, that is, a program that automatically organizes data into easy-to-understand graphics

These solutions are all feasible, and various companies are now actively exploring and attempting them in the race to acquire and use the newest and most valuable big data.
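As a rough illustration of the machine learning idea, the sketch below shows a tiny text classifier that updates itself each time it sees a new example, so its predictions can improve as more data arrives. It is a simplified naive Bayes model written for illustration, not a production approach. It also assumes that someone supplies a label with each example; the class name IncrementalTextClassifier, the labels "complaint" and "praise", and the sample sentences are all made up.

```python
import math
import re
from collections import Counter, defaultdict


class IncrementalTextClassifier:
    """Tiny multinomial naive Bayes model that keeps learning from new text."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of examples
        self.vocabulary = set()

    @staticmethod
    def tokenize(text):
        """Lowercase the text and split it into simple word tokens."""
        return re.findall(r"[a-z']+", text.lower())

    def learn(self, text, label):
        """Update the model with one newly labeled piece of text."""
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def predict(self, text):
        """Return the most likely label, or None if nothing has been learned."""
        if not self.doc_counts:
            return None
        words = self.tokenize(text)
        total_docs = sum(self.doc_counts.values())
        # The +1 reserves a slot for words the model has never seen.
        vocab_size = len(self.vocabulary) + 1
        best_label, best_score = None, -math.inf
        for label, docs in self.doc_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            score = math.log(docs / total_docs)
            for word in words:
                # Add-one smoothing so an unseen word does not rule out a label.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = IncrementalTextClassifier()
    # Hypothetical call-record snippets with hand-assigned labels.
    model.learn("the app crashes every time I log in", "complaint")
    model.learn("I was charged twice and want a refund", "complaint")
    model.learn("thanks, the new dashboard is great", "praise")
    model.learn("great support, very helpful agent", "praise")
    print(model.predict("the dashboard is great"))
```

Because learn can be called at any time, the model never needs retraining from scratch; each new record simply adjusts the counts. The hard part in practice is obtaining labels for dark data in the first place, which this sketch takes for granted.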

How would you solve the dark data problem?