Data science. Definition, Chief Data Officer and Big Data
This project is composed of the following topics:
- Data science. Definition, Chief Data Officer and Big Data
- Data science. Data governance and master data
How this case study should be read
The article is useful because it separates three layers that companies often confuse: BI for controlled reporting, Big Data for infrastructure and scale, and Data Science for experimentation, prediction and decision support. The Chief Data Officer appears as the role that connects those layers with governance, ownership and business value.
Continue with data governance and master data or the data warehouse design case if you want the next architectural step.
1. Case study
Suppose we are the CDO (Chief Data Officer) of a multinational company based in your country. The organization produces and distributes telephone, computer and multimedia products.
In this scenario, your main responsibility is to manage the data governance platform, whose main objective is to plan, monitor and manage the use of data.
As the data governance officer, you hold regular follow-up meetings with members of other departments of the organization, such as sales and marketing, where reports are presented on the status and evolution of the products, the brand and other internal company data.
Currently, the Online Marketing department sends the most data to the heart of the organization through the online marketing tools and techniques available to it: SEO (Search Engine Optimization), SEM (Search Engine Marketing), e-mail marketing, product aggregators and publications in third-party media.
In addition to these online marketing techniques, the organization uses Marketing Analytics techniques that, based on all the available data, allow different strategies to be evaluated and the most beneficial decisions for the business to be made.
To this end, the most important metrics are defined and analyzed and, based on the available data, predictive analysis techniques are applied to develop analytical models for exploring possible future scenarios.
In this sense, geographic measurement and visualization tools can help us carry out precise analyses and capture, for example, correlations between different marketing campaigns and the distribution of sales of our products throughout the territory, taking into account analysis variables such as the age, gender and location of customers.
On the other hand, the organization uses social networks to inform customers about new products and to gauge customer satisfaction with the company's products.
However, the multinational's use of social networks goes no further. During the last meeting of the company's management committee, we were informed of the need to expand that use in order to gain a competitive advantage over the other companies in the sector.
The main objective is to detect people with high influence and impact on social networks whose opinions about our products allow us to attract new customers and even new markets. These people will be considered brand ambassadors on social networks.
Consequently, this is a social analytics project that the multinational wants to carry out in all the countries where it is represented, in order to disseminate the company's values, increase the number of customers in its target audience, increase sales and services in current markets and strengthen the brand's online presence.
As this is a global project, the data and analysis obtained must be shared by all the delegations.
In this sense, the company is aware of the problems of current computing systems:
- They are not fast enough to capture and store this information.
- They cannot accommodate the volume of data related to the organization's products that is generated daily on different social networks.
- They cannot manage the multiple sources of origin and the heterogeneity of the information (messages and photos on Twitter, Facebook and Instagram, videos on YouTube, etc.).
2. Analysis of the situation
- Define the concepts of Business Intelligence (BI), Data Science and Big Data.
- Contextualize these three concepts in the use case described in the previous statement, arguing at what point each concept would come into play and indicating how convenient or necessary its application is for the case at hand.
Evaluation criteria
- Correctly understand the differences between Big Data, BI and Data Science.
2.1 Definition of Data Science
In my opinion, the most accurate definition of Data Science is the one given by H. Harris in 2011:
We can define Data Science as the different tasks that a data scientist carries out in a project. The existing documentation on the subject is very abundant and covers all data collection tasks, the use of statistical and machine learning algorithms, and the interpretation of the results, ending with the visualization and communication of the conclusions obtained. (H. Harris, Data Science, Moore's Law, and Moneyball, 2011)
In my view, data alone offers no value a priori, neither to companies nor to society. Sometimes it is not even readable or understandable at first glance, and it must first be transformed before it can be interpreted. [6]
The role of the data scientist is to obtain this data from different channels and in heterogeneous formats, examine it, and extract patterns and interpret trends in order to give it value. For this reason, different disciplines need to be brought together that can examine the data from different points of view and develop an analytical capacity that is global yet specific. Thus, the data scientist will be a specialist in different fields such as mathematics, programming, statistics and even sociology.
This profile allows me to define Data Science as an interdisciplinary field that involves scientific methods, processes and systems to extract knowledge or a better understanding of data in its different forms, whether structured or unstructured. Data Science includes fields of analysis such as descriptive, predictive and prescriptive analytics, statistics, data mining and machine learning. The objective of this science is to allow organizations to obtain valuable information from that data, detect patterns, and thus achieve competitive advantages, identify new business opportunities and improve the user experience. [3]
2.2 Big Data Definition
In my opinion, the most accurate definition of Big Data is the one given by D. Boyd and K. Crawford in 2012:
Big Data can be defined as the cultural, technological and academic phenomenon born from the interaction between the following elements:
- Technology: maximizing computing power and algorithmic precision to collect, analyze, link and compare large data sets.
- Analytics: using large data sets to identify patterns in order to make economic, social, technical and legal claims.
- Mythology: the widespread belief that large data sets offer a superior form of intelligence and knowledge that can generate ideas that were impossible before, with the aura of truth, objectivity and precision. [7]
D. Boyd and K. Crawford, Critical Questions for Big Data, 2012.
Another definition that I consider valid is the following:
A set of strategies, technologies and systems for storing, processing, analyzing and displaying complex data sets.
Both definitions share a very important detail: they describe big data as a set of techniques that allow us to obtain patterns in order to understand a series of data. That is, neither definition equates big data simply with a large amount of data, but rather with a series of data whose relationships are not evident without analysis. [4]
2.3 Definition of Business Intelligence (BI)
For the definition of Business Intelligence I have relied on the document Introduction to Business Intelligence by Josep Curto Díaz and Jordi Conesa Caralt [8], which states:
Business Intelligence is understood as the set of methodologies, applications, practices and capabilities focused on the creation and administration of information that allows the users of an organization to make better decisions. (Josep Curto Díaz and Jordi Conesa Caralt, Introduction to Business Intelligence, 2010)
This definition arose in response to the need for better, faster and more efficient methods to extract an organization's data, transform it into information and distribute it along the value chain.
Within Business Intelligence we find the following technologies:
- Data warehouse.
- Reporting.
- OLAP Analysis (On-Line Analytical Processing).
- Visual analysis.
- Predictive analysis.
- Dashboard.
- Balanced scorecard.
- Data mining.
- Performance management.
- Forecasts.
- Business rules.
- Data integration (including ETL, Extract, Transform and Load).
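To make two of the items above more concrete (data integration via ETL and OLAP-style analysis), here is a minimal sketch in Python. It is an illustration under assumptions, not the method of the cited authors: the sample rows, the field names and the functions `extract`, `transform`, `load` and `roll_up_by_region` are all hypothetical, and a real data warehouse would replace the in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical raw sales rows, as they might arrive from an operational system.
RAW_ROWS = [
    {"region": " north ", "quarter": "Q1", "amount": "1200.50"},
    {"region": "North", "quarter": "Q1", "amount": "800.00"},
    {"region": "south", "quarter": "Q1", "amount": "430.25"},
    {"region": "South", "quarter": "Q2", "amount": "not available"},
    {"region": "north", "quarter": "Q2", "amount": "990.00"},
]


def extract(rows):
    """Extract: yield the raw rows unchanged (stands in for a file or database read)."""
    yield from rows


def transform(rows):
    """Transform: normalize region names, parse amounts, skip unparseable rows."""
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine the bad row
        yield {
            "region": row["region"].strip().title(),
            "quarter": row["quarter"],
            "amount": amount,
        }


def load(rows):
    """Load: accumulate totals per (region, quarter) in an in-memory 'warehouse'."""
    cube = defaultdict(float)
    for row in rows:
        cube[(row["region"], row["quarter"])] += row["amount"]
    return dict(cube)


def roll_up_by_region(cube):
    """OLAP-style roll-up: collapse the quarter dimension into totals per region."""
    totals = defaultdict(float)
    for (region, _quarter), amount in cube.items():
        totals[region] += amount
    return dict(totals)


cube = load(transform(extract(RAW_ROWS)))
region_totals = roll_up_by_region(cube)
```

The point of the sketch is the separation of concerns: cleaning happens once, in the transform step, so that every report built on the loaded data sees the same region names and the same totals.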
2.4 Contextualization with the case study.
Data Science
In my view, in the context of the statement, data science will be responsible for the following:
- Collect all the data from the different departments of the organization, such as sales and marketing, on the status and evolution of the products, the brand and other internal company data. The data collected for study will include data obtained through:
- Online marketing tools and techniques available: SEO (Search Engine Optimization), SEM (Search Engine Marketing), E-mail marketing, product aggregators and publications in third-party media.
- Marketing Analytics techniques, which make it possible to evaluate different strategies and thus take the decisions that are most beneficial for the business.
- Social networks, where we can obtain the degree of customer satisfaction with the company's products.
It is important to highlight at this point that data science tries to obtain and analyze data patterns; it therefore works with the data of interest that Big Data has previously been in charge of storing and interpreting.
- Apply predictive analysis to the data provided by the organization in order to build analytical models for exploring possible future scenarios.
- Detect, using that same data, people with high influence and impact on social networks whose opinions about our products allow us to attract new customers and even new markets. These people will be considered brand ambassadors on social networks.
- Once the previous tasks have been completed, the objective of applying this science in the company will be to better disseminate the company's values, increase the number of customers in its target audience, increase sales and services in current markets and increase the brand's online presence.
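As an illustration of how the detection of brand ambassadors could start, the following Python sketch ranks accounts by a simple influence score. Everything in it is an assumption made for the example: the `Account` fields, the sample figures and the scoring formula (audience size on a logarithmic scale, times interactions per brand mention, times the share of positive mentions) are hypothetical and are not prescribed by the case study. A real project would validate any such score against business outcomes.

```python
from dataclasses import dataclass
from math import log10


@dataclass
class Account:
    handle: str
    followers: int
    brand_mentions: int      # posts that mention the brand
    interactions: int        # likes + shares + replies on those posts
    positive_mentions: int   # mentions classified as positive


def influence_score(account: Account) -> float:
    """Hypothetical score: log-scaled reach * engagement per mention * positive share."""
    if account.brand_mentions == 0:
        return 0.0  # never talks about the brand, so no ambassador potential
    reach = log10(1 + account.followers)
    engagement_per_mention = account.interactions / account.brand_mentions
    positive_share = account.positive_mentions / account.brand_mentions
    return reach * engagement_per_mention * positive_share


def top_ambassadors(accounts, n):
    """Return the handles of the n highest-scoring accounts, best first."""
    ranked = sorted(accounts, key=influence_score, reverse=True)
    return [account.handle for account in ranked[:n]]


# Made-up sample data for the example.
ACCOUNTS = [
    Account("big_but_lukewarm", 1_000_000, 20, 4_000, 5),
    Account("engaged_fan", 100_000, 10, 5_000, 9),
    Account("silent_account", 500, 0, 0, 0),
]

candidates = top_ambassadors(ACCOUNTS, n=2)
```

In this made-up data the smaller but more engaged and more positive account ranks first, which reflects the idea in the case study that influence is about impact on opinions and not only about audience size.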
Big Data
In my opinion, in the context of the statement, big data will be responsible for the following:
- Store all the data from the different departments of the organization, such as sales and marketing, on the status and evolution of the products, the brand and other internal company data. The data stored for study will include data obtained through:
- Online marketing tools and techniques available: SEO (Search Engine Optimization), SEM (Search Engine Marketing), E-mail marketing, product aggregators and publications in third-party media.
- Marketing Analytics techniques, which make it possible to evaluate different strategies and thus take the decisions that are most beneficial for the business.
- Social networks, where we can obtain the degree of customer satisfaction with the company's products.
- Define and analyze metrics.
- Distribute stored data between different delegations.
- Fix the following problems:
- Capture and store this information quickly.
- Host the volume of data related to the organization's products that is generated daily on the different social networks.
- Manage the multiple sources of origin and the heterogeneity of information (messages and photos on Twitter, Facebook and Instagram, videos on YouTube, etc.).
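The heterogeneity problem in the last point can be sketched as follows: each source gets a small adapter that maps its payload to one common record type, so that storage and analysis only deal with a single schema. This is a minimal Python sketch under stated assumptions. The payload shapes, the source names `microblog` and `video_site`, and the `SocialRecord` fields are hypothetical and do not correspond to the real API of any social network.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SocialRecord:
    source: str
    author: str
    kind: str             # "text", "photo" or "video"
    content: str          # message text or media URL
    created_at: datetime


def from_microblog(raw: dict) -> SocialRecord:
    """Adapter for a hypothetical microblog payload (epoch seconds in 'ts')."""
    media_url = raw.get("media_url")
    return SocialRecord(
        source="microblog",
        author=raw["user"],
        kind="photo" if media_url else "text",
        content=media_url or raw["text"],
        created_at=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_video_site(raw: dict) -> SocialRecord:
    """Adapter for a hypothetical video-site payload (ISO 8601 date in 'published')."""
    return SocialRecord(
        source="video_site",
        author=raw["channel"],
        kind="video",
        content=raw["url"],
        created_at=datetime.fromisoformat(raw["published"]),
    )


# One adapter per source; adding a new network means adding one entry here.
ADAPTERS = {"microblog": from_microblog, "video_site": from_video_site}


def normalize(source: str, raw: dict) -> SocialRecord:
    """Convert a raw payload from a known source into the common record type."""
    return ADAPTERS[source](raw)


records = [
    normalize("microblog", {"user": "ana", "text": "Love the new phone", "ts": 0}),
    normalize(
        "video_site",
        {
            "channel": "techreviews",
            "url": "https://example.com/v/1",
            "published": "2024-01-02T03:04:05+00:00",
        },
    ),
]
```

Keeping the adapters separate from the common schema means that a change in one source's format only touches that source's adapter.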
Business Intelligence
In my opinion, in the context of the statement, business intelligence will be responsible for the following:
- Use geographic measurement and visualization tools to perform precise analyses and capture, for example, correlations between different marketing campaigns and the distribution of sales of our products throughout the territory, taking into account analysis variables such as the age, gender and location of customers.
- Inform customers through social networks about the appearance of new products, since I consider this a mechanism for transmitting information.
- Share data analysis between different delegations.
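The correlation between marketing campaigns and the distribution of sales mentioned in the first point can be quantified with the Pearson correlation coefficient. The Python sketch below is only an illustration: the regional figures are invented for the example, and the function `pearson` is a plain implementation of the standard formula, not a tool named in the case study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented figures per region: (campaign spend, units sold).
REGIONS = {
    "North": (10.0, 110.0),
    "South": (20.0, 205.0),
    "East": (30.0, 290.0),
    "West": (40.0, 410.0),
}

spend = [values[0] for values in REGIONS.values()]
sales = [values[1] for values in REGIONS.values()]
r = pearson(spend, sales)
```

A coefficient close to 1 indicates that regions with higher campaign spend also tend to show higher sales. It does not by itself show that the campaigns caused the sales, which is why segmenting by age, gender and location, as the statement suggests, remains necessary.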