While data quality is by no means a technology problem, or one that technology alone can solve, a good software solution can create a platform around which the right processes can be built to achieve and maintain better data governance. Here is a set of key requirements that should appear in any data quality solution RFP:
- Data profiling/validation rule generation – appropriate analysis of existing data structures and content in order to determine rules for ongoing data validation and exceptions reporting.
- Initial database address standardisation and de-duplication – perform initial address standardisation to local postal authority conventions, appending an address quality score to each record. Conduct organisation-level de-duplication on the standardised data at country and site level and, once organisations have been de-duplicated, conduct individual-level de-duplication at organisation level.
- Operational data processing – ongoing ad hoc data loads from internal and external sources, requiring address standardisation and merge/append (i.e. de-duplication) processing for loading to the main database. Monitoring and reporting of data validity and rule compliance.
- Monitoring and maintenance – proactive identification of data quality issues resulting from invalid data loads or user updates. Present data requiring review/correction to appropriate users in order that amendments can be made and then prepared for loading back into the central database.
- Profiling and metrics – ongoing data quality metrics (consistency, completeness, frequency counts, scoring) and intervention reporting (duplicates identified and removed, automated validity amendments, manual corrections) based on set rules. Presentation via a dashboard-style report for easy review.
- Online data capture – real-time validation, standardisation and enhancement of data captured via web-based forms, including contact name and job title, email, telephone number and other elements. Apply formatting to all data (capitalisation etc.) and to telephone numbers (local presentation conventions). Process captured data for merge/append to the main database.
It’s not a comprehensive list, but it’s a good start.
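To make a couple of these requirements concrete, here is a minimal Python sketch of a completeness metric and the two-stage de-duplication described above (organisations first, at country and site level, then individuals within each organisation). The sample records, field names and the crude `normalise` matching key are all assumptions for illustration; a real data quality tool would use proper address standardisation and fuzzy matching rather than exact key comparison.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are assumptions for illustration.
RECORDS = [
    {"id": 1, "org": "Acme Ltd.", "country": "GB", "site": "London",
     "name": "Jane Smith", "email": "jane@acme.example"},
    {"id": 2, "org": "ACME LTD", "country": "GB", "site": "London",
     "name": "jane smith", "email": ""},
    {"id": 3, "org": "Acme Ltd", "country": "GB", "site": "Leeds",
     "name": "Bob Jones", "email": "bob@acme.example"},
    {"id": 4, "org": "Widget GmbH", "country": "DE", "site": "Berlin",
     "name": "", "email": "info@widget.example"},
]


def normalise(text):
    """Lower-case, drop punctuation and collapse whitespace (a crude match key)."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def completeness(records, fields):
    """Return the share of records holding a non-empty value, per field."""
    total = len(records)
    return {
        field: sum(1 for r in records if r.get(field, "").strip()) / total
        for field in fields
    }


def dedupe_organisations(records):
    """Group records into organisations by country, site and normalised name."""
    groups = defaultdict(list)
    for r in records:
        key = (r["country"], normalise(r["site"]), normalise(r["org"]))
        groups[key].append(r)
    return dict(groups)


def dedupe_individuals(org_records):
    """Within one organisation, group record ids by normalised contact name.

    Records with a blank name are skipped here; they would go to manual review.
    """
    people = defaultdict(list)
    for r in org_records:
        key = normalise(r["name"])
        if key:
            people[key].append(r["id"])
    return dict(people)


if __name__ == "__main__":
    print(completeness(RECORDS, ["name", "email"]))
    for org_key, members in dedupe_organisations(RECORDS).items():
        print(org_key, dedupe_individuals(members))
```

Running the organisation pass before the individual pass keeps the second comparison small: contacts are only ever matched against others at the same organisation and site, which reduces both the work done and the risk of merging two different people who happen to share a name.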