The following DWH/BI components are presented in the testing framework:
1. Verifications of Source and Target Data Requirements and Technical Schema Implementations
Requirements and schema-level tests confirm to what extent the design of each data component matches the targeted business requirements.
This process should include the ability to verify:
- Business and technical requirements for all source and target data
- Data integrity specifications technically implemented (DBMS, file systems, text files, etc.)
- Data models for each implemented data schema
- Source to target data mappings data loaded into DWH targets. Examples of sources and associated targets include source data that are loaded to staging targets as well as staging data that are loaded to data warehouse or data mart targets
Schema quality represents the ability of a schema to adequately and efficiently project ‘information/data’. Schema in this definition refers to the schema of the data warehouse regardless if it is a conceptual, logical or physical schema, star, constellation, or normalized schema. However, this definition is extended here to include the schemas of all data storages used in the whole data warehouse system including the data sourcing, staging, the operational data store, and the data marts. It is beneficial to assess the schema quality in the design phase of the data warehouse.
Detecting, analyzing and correcting schema deficiencies will boost the quality of the DWH/BI system. Schema quality could be viewed from various dimensions, namely schema correctness, schema completeness, schema conformity, schema integrity, interpretability, tractability, understandability, and concise representation.
2. ETL source and target data integrity tests
Most DWH integrity testing and evaluation focus on this process. Various functional and non-functional testing methods are applied to test the ETL process logic for data. The goal is to verify that valid and invalid conditions are correctly processed for all source and target data, ensuring primary and foreign key integrity, verifying test correctness of data transformations, data cleansing, application of business rules, etc.
A properly-designed ETL system extracts data from source systems, enforces data quality and consistency standards, conforms data so that separate sources can be used together, and finally delivers data in a format that enables application developers to build applications and enables end users to make decisions
3. BI reporting verifications
BI applications provide an interface that helps users interact with the back-end. The design of these reports is critical for understanding and planning the data integrity tests.
Insights such as what content uses which information maps, what ranges are leveraged in which indicators, and where interactions exist between indicators is required to build a full suite of test cases. If any measures are defined in the report itself, these should be verified as accurate. However, all other data elements that are pulled straight from the tables map should already have been validated from one of the above two sections.
A sample DWH/BI verification framework and sample verifications
DWH/BI data integrity verification is categorized here as follows. Figure 2 shows a verification classification framework for the techniques applicable to sources and targets in data warehouse, ETL process, and BI report applications.
Figure 2: Framework for DWH/BI Data Integrity Verifications