Case study: Reducing customer information database errors

Contributor Marcelo Grimaldi reports on how one telecom company tackled database errors that were leading to problems with invoices.

Any business school will tell you that watching your cash flow is one of the most important things you can do to make sure you stay in business. You need to keep an eye on the payments coming in and the payments going out. But what happens if your invoicing process goes wrong?

That was the challenge facing one telecommunications company. A high number of the bills being sent to customers contained errors, leading to loss of revenue for the company and problems with cash flow (as due dates were extended for adjusted invoices). A significant percentage of invoices were also being returned by mail, which was increasing operational costs and reducing customer satisfaction.

Business 101: Issuing the right invoice is key to getting paid...

The company identified two main causes of the invoicing errors: the customer's data was wrong, or the customer received a bill for a phone that wasn't theirs.

The company decided to undertake a project to address these problems and identified the following benefits:

Tangible benefits

  • Reduce the number of returned invoices, direct mailing and collection letters.
  • Reduce the number of invoice adjustments.
  • Regular operation of company cash flow.

Intangible or correlated benefits

  • Having a reliable and consistent customer base that allows the company to develop a lasting and profitable relationship with customers (CRM).
  • Reduce the postage cost.
  • Increase customer satisfaction.
  • Improve perception of quality of service.
  • Establish barriers against systemic errors in the company database (reducing rework).

The following table shows the estimated financial opportunity, based on the operational costs and cash flow slippage caused by the problem. The expected benefit equals 60% of the total opportunity (figures in Brazilian currency, R$).

Financial opportunity estimation

A macro flow of the relationship between systems and human processes was designed, allowing us to identify the key process input and output variables:

Key process input and output variables

Root cause investigation

Because the causes had to be investigated by hand and IT queries return large volumes of data, a suitable sample size had to be defined so that the conclusions could be deemed reliable.

We used the expression for an infinite population because, for a large population size, it returns values very similar to those obtained from the finite-population expression.

$$n = \frac{Z^{2}\,p\,q}{d^{2}}$$

where p = q = 0.50 (returns the largest sample size).

n = sample size.

Z = 1.96 (for 95% reliability of the procedure).

p = 1 – q = process proportion defective.

d = maximum allowable error.

With d = 6.5%, the formula gives n ≈ 227, which was the sample size used for the analysis of customer claims.
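As a minimal sketch of the calculation above, the formula can be written as a small Python function. The name `sample_size` and its parameter names are illustrative choices, not something taken from the project; the result is left unrounded so it can be compared with the reported figure.

```python
def sample_size(p: float, d: float, z: float = 1.96) -> float:
    """Infinite-population sample size: n = Z^2 * p * q / d^2, with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if d <= 0.0:
        raise ValueError("d must be positive")
    q = 1.0 - p
    return z ** 2 * p * q / d ** 2


# Values quoted in the text: p = q = 0.50, d = 6.5%, Z = 1.96.
n = sample_size(p=0.50, d=0.065)
print(round(n, 1))  # 227.3, reported in the text as n = 227
```

Setting p = 0.50 maximizes the product p·q, which is why it returns the largest sample size for a given d.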

The sample was drawn from customers who received invoice adjustments in a given month and who therefore added to the invoice error metric.

The results from a first sequence of samplings are summarized in the following charts:

Customer invoice was wrong owing to incorrect data

Customer invoice was wrong because it related to wrong phone

The critical situations are highlighted with the dotted line.

These occurrences helped the project team to map the Backoffice's error-handling processes.

The continued use of sampling led to the root cause of the claims (Pr6 and Pr10):

Change of ownership of phone lines

The project team laid out the possible critical variables in a fishbone diagram:

Fishbone diagram analyzing possible critical variables

The critical variables were analyzed by the project team:

Is the CRM system/billing system interface OK?

Three simulations were undertaken:

  • 1 – Related to the address modifications procedure.
  • 2 – Related to the critical customer information (as defined by the Backoffice).
  • 3 – Related to the procedures used by Telequalification Operation.

The results were deemed satisfactory.

An unreliable Master Data Base makes it necessary to contact customers in order to update the CRM system

This was deemed the most significant input variable. The actions related to the Master Data Base started with the definition of a measurement system that:

  • Could evaluate the quality (proportion defective) of the Master Data Base.
  • Could evaluate the possible recurrence of customers who have already had their data adjusted (in order to evaluate the Telequalification Operation).

Problems with information replication among systems

Correction has already been implemented, including replication of data on CRM.

A possibly wrong download of the customer database one year ago

This Master Data Base download generated a large volume of addresses without supplements, even when such supplements exist. The latest survey conducted by IT reported 8 million customers with blank information.

The CRM system allows site ownership to be changed without changing the contact / The billing system prints a contact person's name on the invoices of "business" customers

For corporate customers, the site contact name appears on the first sheet of the invoice. For older sites, the contact information may be inconsistent. This inconsistency can occur when an operator changes the ownership of a site and doesn't change the contact field.

Whenever the customer's name/CNPJ/CPF (Brazilian official documents) is modified, the CRM system must ask the operator whether all the other data remain valid.

An IT requirement is being drawn up in order to implement the above procedure. This requirement must also specify that the contact name is not to be included in "business" customers' invoices.

Although all the critical variables were analyzed and then improved or controlled, the Master Data Base question seemed the most significant according to the project team's first analysis.

The Master Data Base was used as a reference tool only (as we see in the process map).

There was no link from the Master Data Base to the CRM system that could update it.

CRM was the system that provided the data for issuing invoices.

So the question to be asked was:

Did the Master Data Base really have a poorer quality of information than the CRM system?

In order to answer the question, what was called MDB quality had to be compared with CRM quality over time. The ownership of phone lines was the variable chosen to define:

$p_{MDB}$ and $p_{CRM}$

where, for example, $p_{MDB}$ is the proportion defective of the Master Data Base with respect to ownership.

The comparison in a time line can be seen in the following chart:

Temporal comparison of Master Database quality versus CRM quality

Both processes seem to be under control, and we can see that the Customer Relationship Management system shows a worse level of quality for the essential information, ownership of the phone line. Therefore, one of the first beliefs of those involved with the process had no quantitative basis: the Master Data Base was in fact a better source of correct information for issuing invoices.

So an automatic link from the MDB to the company's CRM would bring significant benefits. We would be exchanging a 30% error risk for a 10% error risk.

The obstacle to be overcome, then, was the creation of an automatic update process so that the changes made by other telecom companies were reflected in our CRM system.

The IT department therefore played a significant part in the creation of this new updating process.

The decisions behind this implementation were based on the premise (quantitatively proven) that the MDB information quality is better than our CRM information quality.

The figure below includes the main improvement actions implemented by the project team. The most significant action (Movement File) is indicated and is a consequence of the quantitative comparison between the MDB quality and the CRM quality.

Main improvement actions implemented by project team

  • Review of systems interfaces (CRM and Billing systems).
  • Master Data Base (MDB) quality evaluation.
  • Treatment of input errors.
  • CRM data base cleansing.
  • Master Data Base cleansing.
  • Treatment of change of ownership backlog.
  • Movement File implementation: automatic updating (Master Data Base / CRM).
  • Improvement of security of information input in CRM.
  • Cleansing of incremental Master Data Base uploads.
  • Correction of the wrong download of the customer database.


After these significant improvement actions were implemented (notably the use of the Master Data Base as the new source of invoice data), the process improvement can be seen in the following charts.

Weekly sampling of customer claims related to change of ownership of phone line

The chart above is on a weekly basis and shows the absolute number of customer claims related to change of ownership of phone lines. This indicator has two weaknesses:

  • Not every invoice failure is reported by the customer.
  • The absolute numbers are not normalized, which is needed to evaluate the process quality. In other words, our indicator is missing a "denominator": the total number of invoices issued each week.

The second chart presents the monthly sampling, in relative terms, of customer claims captured in the company call center due to change of ownership. Every month, we divided the number of claims captured by the number of invoices issued that month, making the metric more reliable. Note the reduction in mean and variance after the improvements were implemented. The process also started to operate with greater stability.

Monthly sampling of customer claims related to change of ownership of phone line divided by number of invoices issued that month
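The relative metric described above is a simple ratio, sketched below in Python. The function name `claim_rate` and the monthly figures are hypothetical and chosen only to illustrate why the denominator matters; they are not the project's data.

```python
def claim_rate(claims: int, invoices_issued: int) -> float:
    """Claims captured in a month divided by invoices issued that month."""
    if invoices_issued <= 0:
        raise ValueError("invoices_issued must be positive")
    return claims / invoices_issued


# Hypothetical figures: (label, claims captured, invoices issued).
months = [
    ("month A", 120, 10_000),
    ("month B", 130, 13_000),
]

for label, claims, invoices in months:
    print(f"{label}: {claims} claims, rate = {claim_rate(claims, invoices):.2%}")
```

In this made-up example month B has more claims in absolute terms but a lower claim rate, which is the kind of distinction the absolute weekly indicator cannot show.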

The following X chart allows us to analyze the modifications more deeply.

Analyzing results using an X Chart

Before the improvements, the process was out of control and showed high variation. After the action plan was implemented, the mean fell from 1.24% to 0.55% and all the measurements for seven months fell within the control limits (a reduction in proportion defective of about 56%, close to the established goal of 60%).
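For readers who want to reproduce this kind of check, the sketch below computes the natural process limits of an individuals (X) chart from the average moving range, using the usual XmR scaling factors 2.66 and 3.268. The seven monthly values are hypothetical, chosen only so that their mean is 0.55%; the project's actual measurements are not given in the text.

```python
from statistics import mean


def xmr_limits(values):
    """Return the centre line and limits of an XmR chart for a list of values."""
    if len(values) < 2:
        raise ValueError("at least two values are needed to form a moving range")
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)
    mr_bar = mean(moving_ranges)
    return {
        "centre": centre,
        # A proportion cannot be negative, so the lower limit is floored at 0.
        "lower": max(0.0, centre - 2.66 * mr_bar),
        "upper": centre + 2.66 * mr_bar,
        "mr_upper": 3.268 * mr_bar,
    }


# Hypothetical monthly proportions defective, in percent.
after = [0.60, 0.52, 0.58, 0.50, 0.57, 0.53, 0.55]
limits = xmr_limits(after)
outside = [x for x in after if not limits["lower"] <= x <= limits["upper"]]

print(f"centre = {limits['centre']:.2f}%")
print(f"limits = {limits['lower']:.2f}% to {limits['upper']:.2f}%")
print("points outside limits:", outside)
```

With only seven points the limits are computed from six moving ranges, so they should be read as provisional.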

From what we saw, it was difficult for the project group to create a totally unbiased metric for the project, because of the limitations inherent in the way customer complaints are recorded.

Following up customer complaints was very helpful, but a more rigorous approach would be needed in order to estimate the process proportion defective.

The project control phase demanded the establishment of procedures and documentation related to the IT systems, as well as the creation of a statistical process control scheme that allowed process owners to monitor and improve the process.

Invoices issued by the company are divided into billing cycles, which correspond to the various invoice due dates. Because each cycle has its own characteristics, process control had to be done separately for each cycle.

The proposed flow for the process control follows below:

Proposed flow for process control

We needed to estimate a sample size that, besides being workable, could estimate the process proportion defective with 95% reliability and a maximum allowable error less than 5%.

The control scheme was implemented after the improvements, so the process was already operating with a proportion defective of 0.55% (an estimate, since this metric is based on customer claims). The population was considered infinite, which is a more conservative approach but makes no significant practical difference, as the real population size (the number of invoices issued each month) is very large.

$$n = \frac{Z^{2}\,p\,q}{d^{2}}$$

where p = 0.0055.

n = sample size.

Z = 1.96 (for 95% reliability of the procedure).

p = 1 – q = process proportion defective.

d = maximum allowable error = 0.01.

So, n = 210.

For an estimated proportion defective of 0.55%, the selected sample size allows the real process proportion defective to be estimated with a maximum error of 1.00%.
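The claim that treating the population as infinite is conservative but changes little in practice can be illustrated with the sketch below. The text does not reproduce the finite-population expression it refers to, so the sketch assumes the commonly used correction n = n0 / (1 + (n0 - 1) / N); the population size of one million invoices is also a hypothetical figure.

```python
def infinite_population_n(p: float, d: float, z: float = 1.96) -> float:
    """Sample size for an infinite population: Z^2 * p * (1 - p) / d^2."""
    return z ** 2 * p * (1.0 - p) / d ** 2


def finite_population_n(p: float, d: float, population: int, z: float = 1.96) -> float:
    """Assumed finite-population correction: n0 / (1 + (n0 - 1) / N)."""
    n0 = infinite_population_n(p, d, z)
    return n0 / (1.0 + (n0 - 1.0) / population)


HYPOTHETICAL_POPULATION = 1_000_000  # invoices per month, illustrative only

n_inf = infinite_population_n(p=0.0055, d=0.01)
n_fin = finite_population_n(p=0.0055, d=0.01, population=HYPOTHETICAL_POPULATION)

print(round(n_inf), round(n_fin))  # 210 210
```

The infinite-population value is always the larger of the two, and for a population this large the difference is a small fraction of one unit.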

Therefore, prior to the issuance of invoices in each cycle, the data would be confirmed with the customers for 210 randomly selected phone numbers, which was perfectly feasible for the operation existing at the time of the project. The XmR charts generated from these monthly samples check whether the process is stable and estimate the common cause variation.
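The per-cycle selection described above could be sketched as follows. The text does not say how the project team drew its samples, so the use of Python's `random.sample`, the function names and the placeholder phone identifiers are all assumptions made for illustration.

```python
import random


def draw_cycle_sample(phone_numbers, n=210, seed=None):
    """Select n distinct phone numbers at random from one billing cycle."""
    if len(phone_numbers) < n:
        raise ValueError("the cycle has fewer phone numbers than the sample size")
    rng = random.Random(seed)  # a seed makes the draw reproducible for audits
    return rng.sample(phone_numbers, n)


def proportion_defective(check_results):
    """Share of checked records found wrong; check_results holds booleans."""
    results = list(check_results)
    if not results:
        raise ValueError("no results to summarize")
    return sum(results) / len(results)


# Placeholder identifiers standing in for the phone numbers of one cycle.
cycle_phone_numbers = [f"line-{i:06d}" for i in range(50_000)]
sample = draw_cycle_sample(cycle_phone_numbers, n=210, seed=1)

print(len(sample), "phone numbers selected for data confirmation")
```

Each month's `proportion_defective` value per cycle would then become one point on that cycle's XmR chart.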

Considerations from a statistical viewpoint

  • In the X chart, we can see that seven points were deemed sufficient for the analysis of the process after improvements. As the workable sampling scheme was on a monthly basis, few points (seven months) were used to deem the process under control. The process monitoring had to be implemented according to what is defined in the Control phase. According to Wheeler (2000), "it is perhaps intuitive that limits which are based on a greater amount of data will, in some sense, be more stable and more trustworthy than limits based on a lesser amount of data".

The next graph shows the relationship between the degrees of freedom (which are related to the amount of data used) and the coefficient of variation:

  • When we have fewer than 10 degrees of freedom the limits will be very soft, and each additional degree of freedom will add valuable information.
  • Between 10 degrees of freedom and 30 degrees of freedom, the limits will coalesce and firm up.
  • Beyond 30 degrees of freedom, the limits will have, for practical purposes, solidified.

So, for project purposes, we ran a risk when we stated that the process had been improved. More data would be needed to state that the process was stable after the project. However, the samples to be gathered in order to control the process would confirm the improvement.

Relationship between the degrees of freedom (which are related to the amount of data used) and the coefficient of variation. Source: WHEELER, D. J. Normality and The Process Behavior Chart. SPC Press, 2000.

According to Dr. Wheeler, "Statistics with small coefficient of variation are less uncertain. The control limits are based on statistics, therefore their uncertainty are also influenced by the coefficient of variation".

Final considerations

In this case study we focused on the main root cause of the problems in the telecom information database: change in ownership of telephone lines. The working group, however, also had the opportunity to address other input variables identified during process mapping, suggesting and implementing improvements to these variables.

The creation of effective measurement instruments was essential to the success of the project, although estimates from the total of customer complaints captured in the call center (graphs A and B) were crucial for the team to get a grip on the course of the project.

The control phase of the project is of paramount importance, since it is in the control of the process that the improvements resulting from the use of SPC (Statistical Process Control) are found.


WHEELER, D. J. Normality and The Process Behavior Chart. SPC Press, 2000.


CRM: Acronym for Customer Relationship Management, which means managing the customer relationship. It proposes a move from a product-oriented world to a customer-oriented one. Companies that have a CRM culture retain customers through knowledge that shows where, how, what and why to do something for the customer.

MDB: Master Data Base. Database containing information (phone number, title, billing address, etc.) on customers of the telephone system, which is replaced by the telecoms periodically in incremental form (containing only the variations in a given period) or in full form.

DATACARE: Tool for the treatment and unification of data from the Brazilian market. It is software that analyzes data quality, cleans and standardizes attributes, and verifies, corrects and standardizes addresses and phone numbers.

TELEQUALIFICATION: Operation implemented for working on the consistency of information about customers.