What is a data warehouse?
A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
A data warehouse consists of a computer database responsible for the collection and storage of information for a specific organization. This collection of information is then used to manage information efficiently and to analyze the collected data. Although data warehouses vary in overall design, the majority of them are subject-oriented, meaning that the stored information is connected to objects or events that occur in reality. The data that the warehouse provides for analysis describes a specific subject rather than the functions of the company. It is collected from varying sources into one unit, and it is time-variant, meaning that it is kept with a time reference so that changes can be followed over time.
Data warehousing professionals build and maintain critical warehouse infrastructure to support the business and to help executives make sound business decisions. Warehouse ETL (extraction, transformation, and loading of data) is an essential part of data warehousing: it is the process by which data warehousing professionals populate the warehouse with information from production databases. These professionals work with business analysts and change the warehouse ETL as needed, so that reporting on the warehouse table structures stays consistent and accurate.
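The ETL step described above can be sketched in a few lines. This is only an illustrative sketch, not a production pipeline: the table names (`orders`, `sales_fact`), the column names, and the sample rows are all assumptions made up for the example, and two in-memory SQLite databases stand in for a production database and a warehouse.

```python
import sqlite3

# Hypothetical production (operational) database with raw order rows.
production = sqlite3.connect(":memory:")
production.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER, sold_on TEXT)"
)
production.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, " alice ", 1250, "2024-01-05"),
        (2, "BOB", 400, "2024-01-05"),
        (3, "Alice", 750, "2024-02-10"),
    ],
)

# Hypothetical warehouse table that reports will read from.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (order_id INTEGER, customer TEXT, amount REAL, sale_month TEXT)"
)


def extract(conn):
    """Extraction: read the raw rows from the production database."""
    return conn.execute(
        "SELECT id, customer, amount_cents, sold_on FROM orders"
    ).fetchall()


def transform(rows):
    """Transformation: tidy names, convert cents to currency units, derive the month."""
    cleaned = []
    for order_id, customer, amount_cents, sold_on in rows:
        cleaned.append(
            (order_id, customer.strip().title(), amount_cents / 100, sold_on[:7])
        )
    return cleaned


def load(conn, rows):
    """Loading: write the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()


load(warehouse, transform(extract(production)))
loaded = warehouse.execute("SELECT * FROM sales_fact ORDER BY order_id").fetchall()
print(loaded)
```

Keeping the three steps as separate functions mirrors the way ETL is usually discussed: when a report needs a new or corrected field, only the transformation step has to change.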
How does it differ from a database?
A number of fundamental differences separate a data warehouse from a database. The basic ones are described below.
The biggest difference between the two is that most databases place an emphasis on a single application, and this application will generally be one that is based on transactions. If the data is analyzed at all, the analysis is usually done within a single domain, although multiple domains are not uncommon.
Examples of the separate systems that a database may serve include payroll and inventory. Each system places an emphasis on one subject and does not deal with other areas. In contrast, data warehouses deal with multiple domains simultaneously.
Because it deals with multiple subject areas, the data warehouse can find connections between them. This allows the data warehouse to show how the company is performing as a whole, rather than in individual areas. Another powerful aspect of data warehouses is their ability to support the analysis of trends. They are not volatile, and the information stored in them does not change as much as it would in a common database.
The two types of data that you will want to become familiar with are operational data and decision support data. The purpose, format, and structure of these two data types are quite different. In most cases, the operational data will be placed in a relational database.
In the relational database, the data is held in tables, and these tables are usually normalized. The operational data is organized so that it can handle the transactions that are made on a daily basis. Every time the company sells an item to a customer, a record must be made of it. As can be expected, this data is updated frequently. To keep the system efficient, the data must be divided among a certain number of tables, each with its own fields. Because of this, a single transaction may comprise at least five fields.
While this layout may be highly efficient in an operational database, it is not conducive to queries. In this situation, decision support data is often useful, because it supports kinds of analysis that operational data does not readily provide.
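To see why a normalized operational layout is awkward to query, consider the following minimal sketch. The four tables (`customer`, `product`, `invoice`, `invoice_line`) and their contents are hypothetical and chosen only for illustration. Writing a sale touches small, narrow tables, but reading back one complete invoice needs three joins.

```python
import sqlite3

db = sqlite3.connect(":memory:")

# Hypothetical normalized operational schema: each fact is stored once.
db.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, description TEXT, unit_price REAL);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, invoice_date TEXT);
    CREATE TABLE invoice_line (invoice_id INTEGER, product_id INTEGER, quantity INTEGER);

    INSERT INTO customer VALUES (1, 'Acme Ltd');
    INSERT INTO product VALUES (10, 'Widget', 2.50), (11, 'Gadget', 10.00);
    INSERT INTO invoice VALUES (100, 1, '2024-03-01');
    INSERT INTO invoice_line VALUES (100, 10, 4), (100, 11, 1);
    """
)

# Reassembling a single invoice means joining all four tables.
invoice_rows = db.execute(
    """
    SELECT c.name, i.invoice_date, p.description, l.quantity,
           l.quantity * p.unit_price AS line_total
    FROM invoice AS i
    JOIN customer AS c ON c.customer_id = i.customer_id
    JOIN invoice_line AS l ON l.invoice_id = i.invoice_id
    JOIN product AS p ON p.product_id = l.product_id
    WHERE i.invoice_id = ?
    ORDER BY p.product_id
    """,
    (100,),
).fetchall()

for row in invoice_rows:
    print(row)
```

The join is cheap for one invoice, but analytical questions that span many invoices repeat it across the whole history, which is one reason decision support data is usually stored in a differently shaped structure.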
Retrieving a single invoice, for example, will often require joining multiple tables. While operational data deals mostly with the transactions that are made daily, decision support data gives meaning to that operational data. The differences between decision support data and operational data fall into three categories: dimensionality, time span, and granularity.
Dimensionality is a concept which shows that the data is connected in various ways. The data that is stored in a data warehouse will often be multidimensional, and it is much different from the simple view that is often seen with operational data. Many data analysts are concerned with the many dimensional aspects of the data.
Time span is the second concept. Operational data deals with transactions that are atomic, or current, such as an inventory movement or the purchase of an order, so it generally covers a short time frame. Decision support data, however, tends to cover long time frames. Many company managers are interested in transactions that occurred over a certain time period. Instead of dealing with the purchase of one customer, managers are often more interested in the buying patterns of a group of customers. If a sale has just been made, it will not yet be found in a decision support data warehouse.
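The shift from single purchases to buying patterns over a longer period can be illustrated with a small sketch. The transaction tuples, the customer segments, and the function name `monthly_totals_by_segment` are assumptions invented for this example. Individual sales are rolled up into monthly totals per customer group.

```python
from collections import defaultdict

# Hypothetical atomic transactions: (date, customer, segment, amount).
transactions = [
    ("2023-11-03", "C1", "retail", 20.0),
    ("2023-11-17", "C2", "retail", 35.0),
    ("2023-12-02", "C3", "wholesale", 400.0),
    ("2023-12-20", "C1", "retail", 15.0),
    ("2024-01-08", "C3", "wholesale", 380.0),
]


def monthly_totals_by_segment(rows):
    """Roll individual sales up to one total per (segment, month)."""
    totals = defaultdict(float)
    for date, _customer, segment, amount in rows:
        month = date[:7]  # keep only the "YYYY-MM" part of the date
        totals[(segment, month)] += amount
    return dict(totals)


patterns = monthly_totals_by_segment(transactions)
for (segment, month), total in sorted(patterns.items()):
    print(segment, month, total)
```

The individual customer no longer appears in the result. What remains is the kind of view a manager would use to compare groups of customers across months.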
Granularity is the third concept that separates operational data from decision support data. Operational data deals with the individual transactions that have occurred within a certain period of time. Decision support data, however, must be available at different levels of aggregation. While it may be highly summarized, it may also be nearly as detailed as the transactions themselves. The managers within an organization will need information that is summarized to various degrees.
Data warehouses have become more important in the Information Age, and they are a necessity for many large corporations, as well as some medium-sized businesses. They are much more elaborate than a mere database, and they can find connections in data that cannot be readily found within most databases.