Above is the common elements that make up a Data Warehouse. The data warehouse consists of several different elements, the source can come from legacy systems that are usually an operational system used by the corporation or external data sources, the data staging area where it is processed and moved to the presentation server where the data is organized and stored for future queries and reports. The last step is the end user data access point, currently one of the most popular forms to access data is through a web page.
The first step is extraction and involves obtaining the data from the source systems.
Next it is transforming the data using a series of steps to make the data usable for the data warehouse. The steps consists of cleaning the data to improve data quality and consistency, purging of any unnecessary data that is not required by the data warehouse, combining of similar data from different source systems and creating surrogate keys.
After the data is extracted, transformed and cleansed, it is ready to be loaded and indexed into the warehouse for fast querying.
Other steps include running a series of reports to ensure that the data stays consistent and that the quality stays intact. The data is released to users for generating reports and dashboards and is secured against unauthorized user access.
Main steps in the ETL cycle are:
- Initiating the Cycle
- Building of reference data
- Extraction
- Validation
- Transforming
- Staging (if staging tables are necessary)
- Audit reports to verify that all business rules are kept
- Publishing to warehouse tables
- Archiving
- Clean up.