For many organisations it is easier to measure data quality than it is to do something about poor-quality data. There can be many reasons for this, such as a lack of clarity about data ownership, a lack of confidence that data cleansing will achieve the desired results, and technical difficulties in implementing the proposed data cleansing solution.
New functionality in SAP BusinessObjects Information Steward 4.2, called Data Cleansing Advisor, empowers data stewards and business users to design and take ownership of data cleansing solutions. This can remove some of the difficulties associated with a data cleansing project. The idea is to simplify the data cleansing process and make it more transparent. Here’s how it works:
Introducing Data Cleansing Advisor
Data Cleansing Advisor starts by identifying the content, or meaning, of a particular data set. If it can identify that content, it can recommend a cleansing solution based on SAP best practices for that content type. However, this will not work with all data sets: SAP has configured Data Cleansing Advisor to work only with what it calls party data, i.e. data about addresses, firms, and people.
Once the cleansing and matching rules are proposed, their impact can be validated on screen in Information Steward. If the cleansing solution is judged fit for purpose, it can then be used to cleanse enterprise data. This is done by publishing the cleansing solution so that SAP BusinessObjects Data Services can use it.
From within Data Services, the cleansing solution can be used to resolve, standardise, correct, match and enrich enterprise data in accordance with its logic and, if required, pass the results back into core systems, thereby automating the data cleansing process. This process flow is illustrated by the following flow diagram.
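To make the idea of content identification more concrete, here is a minimal Python sketch of how a column of values might be labelled as a firm, an address or a postcode. It is purely illustrative: the patterns and the names classify_value and classify_column are assumptions made for this example and say nothing about how Data Cleansing Advisor works internally.

```python
import re
from collections import Counter

# Hypothetical patterns for illustration only; these are NOT SAP's rules.
POSTCODE = re.compile(r"^\d{5}(-\d{4})?$")  # US ZIP or ZIP+4
STREET = re.compile(
    r"^\d+\s+\w+.*\b(st|street|ave|avenue|rd|road|pkwy|parkway|blvd)\b\.?",
    re.IGNORECASE,
)
FIRM_SUFFIX = re.compile(r"\b(inc|llc|ltd|corp|plc|gmbh)\b\.?", re.IGNORECASE)


def classify_value(value: str) -> str:
    """Guess the content type of a single value."""
    value = value.strip()
    if POSTCODE.match(value):
        return "postcode"
    if STREET.match(value):
        return "address"
    if FIRM_SUFFIX.search(value):
        return "firm"
    return "unknown"


def classify_column(values) -> str:
    """Label a column with the most common recognised type of its values."""
    counts = Counter(classify_value(v) for v in values)
    counts.pop("unknown", None)  # ignore values we could not recognise
    if not counts:
        return "unknown"
    return counts.most_common(1)[0][0]
```

A real profiling tool would rely on reference data and far richer rules; the point here is only that a content type is inferred from the values themselves, column by column.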
Data Cleansing Advisor Example
The following example shows how Data Cleansing Advisor can be used as part of an end-to-end data cleanse for a set of records about companies and addresses.
The image below shows some test data created for Information Steward. The file contains several records showing, to varying degrees of accuracy, the addresses of Google's and Walmart’s US headquarters.
Once the file is loaded, Information Steward correctly identifies the content type of each of the individual fields, as shown in the following screenshot.
After creating the cleansing solution in Information Steward, the results are displayed on screen. The following two screenshots show the breakdown of improvements that Information Steward believes it can make, and an example of the impact that the data cleansing solution will have on the actual records, with a before/after view.
Finally, by exposing the data cleansing solution to Data Services Workbench (pictured below), it is possible to use the data cleansing solution as ‘black-box’ logic within a Data Services job. In this job, the data cleansing solution is pointed back to the original source data file and outputs its results as a CSV file (see screenshots below). However, at this point Data Services could also be used to source its data from core systems and upload the cleansed data back to those systems, creating an automated data cleansing routine.
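For readers who want a feel for what standardising and matching involve, the following Python sketch normalises address strings and then compares them for similarity. It is a simplified illustration only: the abbreviation table, the 0.85 threshold and the function names are assumptions made for this example, and it uses the standard library's difflib rather than anything from Information Steward.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table for illustration; a real solution would
# rely on postal reference data and would handle ambiguity ("St" can also
# mean "Saint").
ABBREVIATIONS = {
    "pkwy": "parkway",
    "st": "street",
    "ave": "avenue",
    "inc": "incorporated",
}


def standardise(text: str) -> str:
    """Lower-case, strip punctuation and expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def is_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two values as the same entity if their standardised forms are similar."""
    ratio = SequenceMatcher(None, standardise(a), standardise(b)).ratio()
    return ratio >= threshold


before = "1600 Amphitheatre Pkwy."
after = standardise(before)  # "1600 amphitheatre parkway"
```

With these assumptions, "1600 Amphitheatre Pkwy" and "1600 Amphitheater Parkway" are treated as a match despite the spelling difference, while an unrelated address such as "702 SW 8th Street" is not.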
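The shape of such a job (read the source records, pass each one through the cleansing logic, write the results to CSV) can be sketched in a few lines of Python. This is not Data Services code: cleanse_record is a hypothetical stand-in for the published cleansing solution, run_job is a name invented for this example, and the sketch assumes every row has the same columns as the header.

```python
import csv
import io


def cleanse_record(record: dict) -> dict:
    """Stand-in for the black-box cleansing solution (hypothetical logic):
    collapse repeated whitespace and apply title case to every field."""
    return {key: " ".join(value.split()).title() for key, value in record.items()}


def run_job(source, target) -> int:
    """Read CSV rows from source, cleanse them, write them to target.

    Returns the number of records processed.
    """
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        writer.writerow(cleanse_record(row))
        count += 1
    return count


# In-memory files keep the example self-contained; real files work the same way.
source = io.StringIO("firm,city\n  google   inc ,mountain view\n")
target = io.StringIO()
processed = run_job(source, target)
```

Because the job only calls cleanse_record, the cleansing logic can change without the job itself changing, which is the appeal of the black-box approach. Swapping the file objects for connections to core systems would give the automated routine described above.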
AgilityWorks offers organisations an opportunity to deploy SAP BusinessObjects Data Services and Information Steward with a fixed-price service offering. The AgilityWorks Best Practice Deployment for SAP BusinessObjects Data Services and Information Steward is the product of deep product knowledge, project experience and customer feedback. This service offering delivers the customer a best practice deployment of the two technologies and accelerates their data quality journey with pre-delivered content.
For more information about our Best Practice Deployment for SAP BusinessObjects Information Steward offering, please contact firstname.lastname@example.org
You can find out more about our services in Enterprise Information Management here.