Big Data and BI

Business Intelligence and Big Data are often mentioned in the same context. But what does Big Data mean, and is it something everyone can use in their BI tool? The usual answer is that what many people think of as Big Data is not actually "big", even if it is a large amount of data with high value for your organisation.

Illustration of BI and Big Data

Business Intelligence (BI) helps you and your organisation better understand your business by using methods, processes and systems to extract and structure data and information. Business Intelligence often combines business analysis, data extraction, data visualisation and the data infrastructure itself.

A BI tool always contains data of various kinds. Often it comes into the tool through integrations and/or imports. The type of data handled varies greatly depending on the type of business.

In some cases, it is data about products or staff. In other cases, it is customer or market data. The variation is great and the challenge is often to prioritise which data is most relevant for your company.

What is Big Data?

Gartner defines Big Data as:

1. Volume: A lot of data, in the order of petabytes (1,000,000,000,000,000 bytes) and exabytes (1,000,000,000,000,000,000 bytes). By this criterion alone, very few companies work with Big Data.

2. Variety: A variety of data sources in different formats.

3. Velocity: The speed of an often constant flow of data from the various data sources, with both structured and unstructured data.

Other sources talk about as many as five Vs, adding Veracity and Value:

4. Veracity: Data quality and accuracy, which describes how much you can trust your data.

5. Value: The value that using Big Data brings to your business.
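To put the Volume criterion in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the helper names, the petabyte threshold used as a cut-off and the 5 TB warehouse are assumptions made for the example, not part of any formal definition.

```python
# Decimal (SI) byte units, expressed as powers of ten.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_bytes(amount, unit):
    """Convert an amount in the given unit to a number of bytes."""
    return amount * UNITS[unit]


def fits_volume_criterion(size_in_bytes, threshold=UNITS["PB"]):
    """Return True if the data set reaches the assumed petabyte threshold."""
    return size_in_bytes >= threshold


# Hypothetical example: a 5 TB data warehouse.
warehouse = to_bytes(5, "TB")

print(fits_volume_criterion(warehouse))  # False
print(UNITS["PB"] // warehouse)          # 200
```

In this sketch, a 5 TB warehouse would have to grow 200 times over before it reached a single petabyte.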

Unstructured and Structured Data

Data is often divided into unstructured and structured data. This division helps you assess the Variety, Velocity, Veracity and Value of your Big Data usage.

Structured data is organised in a way that both computers and humans can read and analyse. Structured data is usually stored in a relational database, in the form of text or numbers, where you can enter information automatically or manually as long as you stick to the structure.
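As a minimal sketch of what "sticking to the structure" means in practice, the example below uses Python's built-in sqlite3 module with an in-memory database. The table name, columns and product rows are made up for illustration.

```python
import sqlite3

# An in-memory relational database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The table definition is the "structure": every row needs a name and a price.
conn.execute(
    "CREATE TABLE products ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL NOT NULL)"
)

# Hypothetical rows that follow the structure.
conn.executemany(
    "INSERT INTO products (name, price) VALUES (?, ?)",
    [("Chair", 49.0), ("Desk", 199.0), ("Lamp", 25.0)],
)

# Because the data is structured, it can be queried directly with SQL.
rows = conn.execute(
    "SELECT name, price FROM products WHERE price > ? ORDER BY price", (30,)
).fetchall()
print(rows)  # [('Chair', 49.0), ('Desk', 199.0)]

# A row that breaks the structure (missing name) is rejected by the database.
try:
    conn.execute("INSERT INTO products (name, price) VALUES (?, ?)", (None, 10.0))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

conn.close()
```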

Unstructured data includes documents, spreadsheets, images, audio and video that are not part of a database.

Semi-structured data includes, for example, emails, XML (Extensible Markup Language) and EDI (Electronic Data Interchange). It has no rigid formal structure such as a database schema, but it still contains markers that separate semantic elements.
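The XML case can be illustrated with Python's standard xml.etree.ElementTree module. The order document below is invented for the example; the point is only that tags and attributes separate the semantic elements, even though free text such as the note is allowed alongside them.

```python
import xml.etree.ElementTree as ET

# A hypothetical order document: tagged fields mixed with a free-text note.
ORDER_XML = """
<order id="1001">
  <customer>Example AB</customer>
  <item sku="A-1" quantity="2">Chair</item>
  <item sku="B-7" quantity="1">Desk</item>
  <note>Deliver before noon if possible.</note>
</order>
"""

root = ET.fromstring(ORDER_XML)

# The tags make it possible to pick out individual elements.
customer = root.findtext("customer")
items = [
    (item.get("sku"), int(item.get("quantity")), item.text)
    for item in root.findall("item")
]

print(customer)  # Example AB
print(items)     # [('A-1', 2, 'Chair'), ('B-7', 1, 'Desk')]
```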

Can you "use" Big Data?

Where the boundary of "big" lies is not only a question of quantity. It also has to do with how you use your data, as well as the conditions, software and competences you have access to.

Rather than depth, Gartner sees today's data trend moving towards shallower data (smaller amounts) with more breadth. According to Gartner, data should be used in smarter ways and closer to the business, thereby creating value.

It is possible to see Big Data as a methodology, where you use tools to manage your data in a different way than in a BI tool. Whether Big Data is something for you therefore depends on a broader perspective than just the type of data in question.

In many companies, Big Data is crucial, but that is because their business model involves needs that traditional BI tools cannot meet. For example, many consumer companies use unstructured social media data to identify trends in customer behaviour. But there are many examples from other industries where Big Data creates new insights and better decisions.

What Should a Big Data Analyst Know?

As the use of Big Data differs greatly between different organisations, it can be difficult to define the skills needed. Most often, those who work with Big Data are statisticians who can program (or vice versa).

  • SQL (Structured Query Language) is a query language often associated with structured data.
  • R is a programming language entirely focused on statistics (mathematics), and a "statistical thoroughbred" in this context.
  • SPSS stands for "Statistical Package for the Social Sciences" and is often used in research for complex statistical analyses. SPSS was developed in 1968 and is now called "IBM SPSS Statistics" and has its own command language.
  • Python is a general-purpose programming language that fits into this list because it is broad enough to cover statistical programming.
  • The analysis software company SAS has various tools in its product range. Most of them are very advanced and related to Big Data. SAS also has its own programming language called SAS.
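As a small taste of what "statistics in code" can look like, here is a sketch using only Python's standard statistics module. The monthly order figures are made up, and real analyses would typically rely on more specialised tools such as those listed above.

```python
import statistics

# Hypothetical number of orders per month over half a year.
monthly_orders = [120, 135, 128, 160, 152, 149]

mean = statistics.mean(monthly_orders)      # arithmetic average
median = statistics.median(monthly_orders)  # middle value of the sorted data
spread = statistics.stdev(monthly_orders)   # sample standard deviation

print(f"mean={mean:.1f}, median={median}, stdev={spread:.1f}")
```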
