Strategies for Testing Data Warehouse Applications

Testing Goals

  • Data completeness.
    Ensures that all expected data is loaded.
  • Data transformation.
    Ensures that all data is transformed correctly according to business rules and/or design specifications.
  • Data quality.
    Ensures that the ETL application correctly rejects, substitutes default values, corrects or ignores and reports invalid data.
  • Performance and scalability.
    Ensures that data loads and queries perform within expected time frames and that the technical architecture is scalable.
  • Integration testing.
    Ensures that the ETL process functions well with other upstream and downstream processes.
  • User-acceptance testing.
    Ensures the solution meets users' current expectations and anticipates their future expectations.
  • Regression testing.
    Ensures existing functionality remains intact each time a new release of code is completed.

 


Data Completeness

Preconditions.

1. Objects are deinstalled successfully.
2. Objects are installed successfully.
3. -init scripts have been fulfilled.
4. the n.a., n.d. and initial data is in dimensions, facts are empty.


DIM. Comparing record counts between source and target data.

Script:

SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT COUNT ( * ) AS cnt
           FROM wrk_table_contacts
         MINUS
         SELECT COUNT ( * ) AS cnt
           FROM mv_table_for_dataload );

Strategy:

1. Run loading process.
2. Run loading process again.
3. Test target minus source.
4. Test source minus target.


DIM. Table is loaded with default IDs.

Script:

-- Verify total rows
SELECT CASE WHEN SUM ( 1 ) >= 2 THEN 'OK' ELSE 'ERROR' END AS is_filled
  FROM ( SELECT column_id AS total
           FROM dim_table WHERE column_id IN ('-98', '-99');


DIM. The same script but sometimes it's needed to exclude default values.

SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT table_channel_code AS cnt
           FROM dim_table_channels
          WHERE table_channel_id NOT IN (-98, -99)
         MINUS
         SELECT DISTINCT table_channel AS cnt
           FROM wrk_table_contacts
          WHERE table_channel NOT IN ('n.a', 'n.d.') );


DIM. Populating the full contents of each field to validate that no truncation occurs at any step in the process.

Script:

 SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT user_id_ticket
              , user_id
              , table_theme_id
              , table_channel
              , table_channel_id
              , table_status
           FROM wrk_table_contacts
         MINUS
         SELECT user_id_ticket
              , user_id
              , table_theme_id
              , table_channel
              , table_channel_id
              , table_status
           FROM mv_table_for_dataload );


FCT. Table is loaded with dimension IDs.

Script:

-- Verify total rows
SELECT CASE WHEN SUM ( 1 ) > 0 THEN 'OK' ELSE 'ERROR' END AS is_filled
  FROM ( SELECT column_with_id AS total
           FROM fct_table WHERE column_with_id NOT IN ('-98', '-99');


FCT. Comparing record counts between source and target data.

Script:

-- Verify total count of tickets
SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT COUNT ( table_srcid ) AS cnt
           FROM mv_table_for_dataload
         MINUS
         SELECT COUNT ( table_srcid ) AS cnt
           FROM fct_table_contacts );


FCT.  Find lost data.

-- Verify total count of lost data
SELECT COUNT ( * ) AS LOST
           FROM mv_table_for_dataload a
              , fct_table_contacts b
          WHERE a.table_srcid = b.table_srcid(+)
            AND a.contact_srcid = b.contact_srcid(+)
            AND b.contact_srcid IS NULL );


Data Transformation

Example 1. Validate that data types in the warehouse are as specified in the design and/or the data model.

Manually.

Strategy:

1. Review ETL code.
2. Wait  couple of days.
3. Review ETL code again. If there are defects then repeat from 1 point.


Example 2. SCD transformations. Find wrong flags.

Test 1:

SELECT COUNT ( * ) AS ERRORS
  FROM dim_table_contact_actions_scd
 WHERE valid_to = TO_DATE ( '31.12.9999'
                          , 'DD.MM.YYYY' )
   AND active_flag = 'N'
   AND table_contact_action_surr_id NOT IN (-98, -99);

Test 2:

 SELECT COUNT ( * ) AS ERRORS
FROM(SELECT table_contact_action_surr_id, table_contact_action_id, valid_from, valid_to, active_flag cnt
  FROM dim_table_contact_actions_scd
 WHERE valid_to != TO_DATE ( '31.12.9999'
                           , 'DD.MM.YYYY' )
   AND active_flag = 'Y'
   AND table_contact_action_surr_id NOT IN (-98, -99));


Example 3. SCD transformations. Find lost time frames in dimension. Usefull test, use it! Service team will be happy.

Script:

SELECT SUM ( CASE WHEN valid_to <> NVL ( valid_from_next, '31.12.9999' ) THEN '1' ELSE '0' END ) AS ERRORS
           FROM ( SELECT table_contact_action_surr_id
                       , table_contact_action_id
                       , valid_to
                       , LEAD ( valid_from ) OVER (PARTITION BY table_contact_action_id ORDER BY valid_from ASC)
                            AS valid_from_next
                    FROM dim_table_contact_actions_scd
                   WHERE table_contact_action_id NOT IN (-99, -98) ) );


Example 4. SCD transformations. Verify time filter.

Manually.

...
AND table.event_dt >= user.valid_from(+)
AND table.event_dt < user.valid_to(+) --> if (+) then verify that there is NVL.
...

Don't use BETWEEN function in this case!!!


Data Quality

Example 1. Verify all dimension IDs in fact table, every dim_id.

Script:

SELECT COUNT ( * ) AS ERRORS
  FROM ( SELECT table_surr_id
           FROM fct_table_contacts
         MINUS
         SELECT table_surr_id
           FROM dim_table_scd );


Example 2. Verify all level codes in the filters.

...
FROM DIM_table_SCD prod,
...
Verify filter -> WHERE prod.level_code = 'USER'


Example 3. Verify comments in code.

...
WHERE system_id = 1
...
Verify filter -> -- source system


Example 4. Verify comments of db objects.

TableStructure.sql - generate the html file with table structure and comments.


Performance and scalability

Example 1.

DataAmount.sql - shows the size and growth index of fact table.


Example 2. Analyse log table.

SELECT *
  FROM lib.vw_log_t
 WHERE SYSTEM like '%Project Title%'
   AND timestamp > TO_DATE('10.05.2010', 'DD.MM.YYYY')
   ORDER BY timestamp DESC;


Regression Testing

Building automation during system testing will make the process of regression testing much smoother.

Fitnesse is a good point.


 
Copyright
Andrey Marchenko