Design and construction of the data warehouse. Questions
1. Question 1
Indicate whether the following statement is true or false and justify your answer:The Data Warehouse has as one of its main objectives being a central repository of corporate information that can be supplied by several systems. For that reason, some of its functions are to consolidate and homogenize the information coming from the different company systems, as well as to cleanse and debug the data, ensuring its quality.
1.1 Answer
The statement is True. Important information for a business is normally scattered across different systems, databases or applications. This happens because each project or process usually requires a different management approach, and the same entity is often handled differently depending on the system. For example, a customer is not managed in exactly the same way in an invoicing program as in a customer service system. One of the main objectives of a Data Warehouse is precisely to act as a central repository of corporate information coming from different systems. These projects require advanced knowledge of where the data is located, what it means and how it should be transformed. It is necessary to build a new intermediate architecture that is both permanent and flexible, and to work with products and platforms that are sometimes sophisticated and not always fully mature. A good example would be a commercial performance dashboard for an executive in a ceramics company. That executive may want to know, through a web platform, how many incidents customers have reported, which actions were taken, where customers are located, who handled each case, which offers were chosen the most, which tariffs generated the greatest interest, and so on. To build that dashboard, information must be collected from different company platforms, managed and given meaning before being presented on screen. That requires direct contact with the internal stakeholders responsible for public service, invoicing, sales management and other areas, because they are the ones who provide and validate the information they manage. The dashboard should obtain the information from the Data Warehouse, where the data coming from the different systems has already been unified, cleansed and prepared so that it can be interpreted correctly.2. Question
Indicate whether the following statement is true or false and justify your answer:The aggregation level makes it possible to accumulate data, so that at level 0 we would have the maximum level of detail, and at higher levels we would aggregate data according to some dimension such as calendar, product or geography. This aggregation level is usually unique and rather low in operational databases, whereas in the data warehouse we usually have the same information available at different aggregation levels.
2.1 Answer
It is True. In information management it can be useful to summarize a subset of data into a single value so that the original dataset is simplified and a new variable is generated with greater value, better control or easier availability. At level 0 we would have all the detailed data, for example the number of people who entered a store each minute or an individual sale registered at the point of sale. At a higher level we could add the store location to the sale or aggregate the number of people entering every 30 minutes in order to extract more value from the information.3. Question
Select the correct option:- The operational data warehouse stores all the historical data of the company in order to satisfy the needs of its analysts.False: the operational data store is always up to date, but it does not contain historical data.
- Using a departmental data warehouse always covers all the information needs of the company.False: building a data warehouse is expensive and has demanding performance requirements. The usual solution to obtain low response times is to maintain different stores with only partial business information, that is, only the part relevant to a department or group of users, not all the information needs of the whole company.
- The operational and the corporate data warehouse complement each other.True: the corporate warehouse stores all historical data, but it is not always fully up to date, while the operational one is always up to date, but does not contain historical data.
- All of the above are correct. False: only c) is correct.
- None of the above. False: c) is correct.
- The correct answers are a) and c). False: only c) is correct.
3.1 Answer
The answer is C)4. Question
Select the correct option:- Departmental data warehouses or Data Marts contain partial business information.- True. Building a data warehouse is very expensive and its performance requirements are difficult to achieve. The solution for obtaining low response times is to maintain different stores that only contain the portion of the business relevant to a department or specific group of users. For that reason, departmental data stores only contain the information needed by that department.
- Departmental data warehouses are designed to obtain good response times for the queries of a certain group of analysts.- True. As in the previous case, they can be focused on a specific group of users with specific needs.
- Some examples of Data Marts are finance, marketing or sales.- True. Marketing, finance or invoicing and sales departments will use entities with customer or product data. It is important to use the same entities to preserve data integrity across warehouses belonging to different departments. These shared entities are called conformed dimensions in dimensional models, and they include customers, products, suppliers and accounts, which are critical to the business and therefore used by many departments.
- All of the above.True: a), b) and c) are correct.
- None of the above.False: a), b) and c) are correct.
- The correct answers are b) and c).- False. a) is also correct.
4.1 Answer
The answer is D)5. Question
Select the correct option:- Data warehouses will never contain entities with sensitive information, such as customers, suppliers or products.False: data warehouses may contain information about customers, suppliers or products. For that reason, and depending on the stored data, security processes and techniques such as field encryption and protected access credentials will be required.
- It is common for master data to be shared by several departmental data warehouses.True: master data is commonly shared by several departmental data warehouses and sometimes by non-informational systems that access these entities because the corporate warehouse is often the most faithful version of them.
- Master data quality management processes usually review aspects such as accuracy, integrity, consistency and completeness.True: MDM activities are directly related to data quality monitoring processes that review aspects such as accuracy, integrity, consistency and completeness. Warehouse entities often become master entities that require special management in order to consolidate relevant information coming from different systems, ensure data quality, refresh it and synchronize it with other systems.
- All of the above are correct.False Only b) and c) are correct.
- None of the above is correct.False Only b) and c) are correct.
- The correct answers are b) and c).True The correct answers are b) and c).
5.1 Answer
Answer F) is the correct one.6. Question
Mark the option or options that are characteristic of a Data Warehouse:- It is subject-oriented.True: because user requirements cannot be fully known at the moment the warehouse is built, the information is not structured according to functionality or application use, but by subjects of interest.
- It contains historical information to analyze the temporal evolution of data.True: it is important to know when an event happened in the real world. This historicity allows us to see a movie instead of a still picture. Every datum in the warehouse should therefore be accompanied by its validity period.
- The information may be volatile in order to prioritize performance.False: quite the opposite. Non-volatility tells us when we became aware of events and allows us to know whether a report was produced using one set of data or another. Non-volatility implies that data is not truly modified or deleted; instead, corrections are inserted together with the date on which they were recorded.
- It is functionality-oriented.False: the warehouse is not organized by functionality but by subjects of interest.
- It makes it possible to integrate information from the different company systems.True: operational systems are heterogeneous, run on different hardware and software, use different data models and present the business from different perspectives such as finance, sales or HR. The first step before offering all the data to analysts is therefore to integrate those systems so that the users perceive them as if they came from a single source.
6.1 Answer
Answers a), b) and e) are correct.7. Question
Regarding response times, mark the option or options that are characteristic of a Data Warehouse:- It is important to optimize access to allow reasonable response times.True: operational databases require instantaneous response because the data is updated constantly. In data warehouses the response must still be fast, but not necessarily immediate. Access should therefore be optimized, for example by creating departmental stores containing only the information relevant to a specific analyst or user. The corporate warehouse is not ideal for end users because it is designed to manage and integrate large volumes of data and a high number of users, which can degrade response times.
- It must provide an immediate response.False: data warehouses should be fast, but not necessarily immediate. Their operations are often large analytical queries that cannot be obtained instantly, although they should remain within a reasonable time for analyst work. Some reports can even be scheduled in the background.
- Response time is not relevant.True in the sense that the operations are usually large-scale analytical queries that cannot be instant. What matters is that they complete in a reasonable time according to the analyst's work.
- Response time is more relevant than the truthfulness of the data.True if we interpret it within the historical nature of the warehouse: because of data volume and implementation techniques, the corporate warehouse and the departmental stores that depend on it are not continuously updated. Their users are often more interested in historical data than in immediate freshness.
- It is acceptable for the user to receive a timeout response if a query saturates the database server.False. A query should never be allowed to saturate the warehouse server. It can be deferred to the background or supported by a departmental store, but a timeout is not an acceptable outcome.
7.1 Answer
a) c) d)8. Question
For each of the following options, determine whether it is characteristic of a departmental, operational or corporate data warehouse.- They are volatile and oriented to applications or functionality.Operational data store: unlike the corporate or departmental warehouse, the operational store gives up historical data and provides a volatile repository. It is still integrated and subject-oriented, but it does not contain a temporal perspective.
- Its design is intended to obtain data immediately, taking information from the corporate data warehouse.Departmental data warehouse: because building a warehouse is expensive and performance is difficult to guarantee, the usual solution is to create smaller stores with only the business slice needed by a department or group of analysts. These stores are called departmental data warehouses and are designed to provide good response times for a specific set of analysts.
- They serve, among other things, to solve integration and data multiplicity problems.Corporate data warehouse: because the source systems have different structures and organizations, the data obtained from operational systems cannot be used directly in the warehouse. It must be transformed and integrated according to warehouse requirements.
- It stores an updated image of the organization's data.Operational data store: the users of this store need integrated information that is fully updated.
- It stores a movie built from different snapshots of the data, that is, the history of the data.Corporate data warehouse: this type of repository supports trends, historical evolution, relevant business facts and future possibilities, which are the kind of needs typical of decision-support environments.