Using Mistake Proofing Principles to Improve Data Quality

As a Black Belt, have you ever worked on projects where data quality was a major contributor to your defect? Diana Hernandez shares ideas on how you can "mistake-proof" your data.

I recently worked on a corporate-wide initiative involving all returned and undeliverable checks, that required the creation of a central repository to house all data, a user friendly and mistake-proof interface for data entry and reporting, and a streamlined process to be consistently applied across departments and users.

Our business partners were repeatedly complaining that creating any types of reports in the current process was labor intense and prone to errors, since users entered, updated, and stored the check information received daily in poorly designed Excel files stored on their local drives, which did not include any data entry restrictions or controls. This caused all sorts of data integrity issues that were reflected on the reports created.

In order to better understand the users’ pain points with the data entry tools and reporting, we interviewed the process owners, key stakeholders, and subject matter experts from different departments in a series of meetings and documented our findings.

Our analysis showed that users mainly complained about the excessive time they spent creating these reports, as they manually had to correct any errors founds, and their inability to create end-to-end reporting, because the data was stored in multiple files that were not linked in any way. Also, there was no consistent process in place to enter and update check information, resulting in a wide range of cycle times.

In order to correct these issues with the existing interface and process, we recommended that we build two business solutions to be used in the short-term and longer-term. Since there was an immediate need to capture all data in a central repository, we recommended that for the short-term we build an Access database rather quickly with available resources, and a more robust solution with all the bells and whistles, and supported by our IT department for the longer term. The Access database is in production, and the robust solution is in development in-house, and it is scheduled to go into production by June 2012.

Challenges with the Current Process:

For the most part, reports were created using a lot of manual intervention due to data integrity issues and decentralized data repositories. To give you a better feel for the challenges the users were experiencing, here are some examples:

Try creating a report grouped by member name (first & last name) where the values are different. For example, a member with values "James Dean" and "James M Dean" appeared on separate reports due to the middle initial and missing account numbers (unique fields) for both records. When there is a list of pre-defined values, it is wise to use combo boxes or list boxes to prevent typing or format errors (I will provide visual examples of combo boxes on the following pages).

Or, try to create a report when there are spelling errors. The same member name spelled differently as "James Deen" and "James Dean" and the account numbers are missing on both records, these two values appeared on two separate reports. This is why it is imperative to include drop downs, combo boxes, or lookups on your data entry forms to restrict what is entered. You must be thinking "What a nightmare!"

In other cases, users typed different words with the same meaning for "Check Status". They typed "Closed", "Close", "Completed" or "Complete" to mean the same thing. Try sorting or grouping on this field and you’ll get four different reports. As a general rule, to prevent these types of issues it is imperative to restrict the values entered by including drop downs, combo boxes, check boxes or lookups to select the proper value from a pre-defined list of values.

Another common mistake was when fields were set up using the wrong data types. Users defined date fields as text causing all sorts of issues with reporting. For example, the same date typed as: "01/01/12", "Jan 1st2012" or "1/1/2012" caused each value to appear on three separate reports. As a general rule, date fields should always be formatted in date format using the same mask (e.g. "DD-MM-YY") consistently across the database. When designing your database, keep in mind these rules as they will save you a lot of headaches with reporting.

Missing values can also cause unreliable reports, as data for multiple members can appear on one report, even though they are different members. When designing your database or creating data entry screens, remember to set up your required fields as required to prevent these issues from happening with reporting.

Poorly designed data entry screens were also the cause of a lot of frustration when users entered or updated information, as for the most part, all data entry was free form with no controls or restrictions. Here are some examples!

The user logging in the check information for the first time had to type her/his name multiple times and frequently made typing errors due to the lack of restrictions. For example, a user typed "James Dean", "J. Dean", and "James M. Dean" resulting in three different reports. The logical thing to do was to provide a list of values to choose from by using drop downs or combo boxes.

Also logical fields were defined as text resulting in additional spelling and format issues. Users typed "Yes", "yes", "Y", or "y". The wise thing to do was to define them as logical fields (Yes/No) and use check boxes to prevent spelling errors and speed up data entry.

Designing Future State:

Now that you have a good understanding of the current state challenges, it is time to show you what was done in future state to mistake-proof the interface. Here are some examples:

For most of the fields on the data entry screen we used combo boxes, list boxes, check boxes, and lookups to restrict the values entered and maintain pre-defined formats.

For "Overall Process Status" we used a combo box to force the user to select one of the pre-defined values: "Steps 1& 2", "Step 3"; "Step 4"; "Step 5", and "Completed" as illustrated below:

We also included a default value (today’s date) for "Logged Date" to speed up data entry and prevent entering the wrong dates as follows:

Also, restricted the values that could be entered in "Check Amount" to only numbers, alerting the user that he/she has made a mistake if other values were entered:

Key Learnings:

Mistake-proofing your database and data entry screens are essential to improve data quality and reporting! Going forward, when you are designing your database, keep in mind to restrict or control what is entered by the users by defining your database properly and including default values, logical data types (Yes/No), required vs. not required fields, valid values, formats, etc.

Also, when designing your data entry screens, mistake-proof your fields by including combo boxes, list boxes, check boxes, option buttons, toggle buttons, calculated fields, and default values.

If you apply these mistake-proofing principles in your next data centric project, you will ensure a successful project.