What Big Data, Big Content, and Big Paper mean for information governance

Paul Mullon

As the amount of data and information generated and stored by our businesses gets bigger, ensuring that everyone creates and keeps the right information, and protects it, isn’t easy.

One of the key challenges with adopting a holistic approach to information governance is that the information is stored in multiple formats, in many locations, and managed by many people with different responsibilities. This article looks at a few of these different perspectives and identifies an approach to gaining control.

Big Data

This has become a hot topic, and remains a current focus of many IT departments today. The challenge with big data is that it is data-focused, and this remains (necessarily) under the control of IT. In the heyday of data warehouses, the challenge was always to gather the right (accurate) information and store it in separate databases that would allow business intelligence processes to be performed, while protecting and preserving the integrity of the original data.

The challenges remain similar, except now there is far more data to be taken into consideration. With the size of large data sets, new considerations emerge regarding appropriate systems to gather, manage and store all this data. From a governance and discovery perspective, the organization needs to ensure that the data is accurate, up to date, and accessible only to those who need it.

The systems to manage big data must sit within the control of the IT department, but questions need to be asked regarding who has the responsibility to gather and curate the data, both in its raw form and as the outputs of the business intelligence models being run.


Big Content

Whilst big data is an issue, an even greater concern may be the management of all the unstructured information in the organization that doesn’t sit in databases and corporate ERP systems. Although different organizations have different views on Enterprise Content Management (ECM), the fact remains that a very high percentage of information in an organization is in unstructured formats. Ignoring paper for a second, the amount of information in scanned images, word processing files, spreadsheets, presentations, videos and pictures is truly staggering.

Conventional industry wisdom talks of 80% of information in an organization being in unstructured formats. Whether the actual percentage is 80% or even as low as 50% is a moot point; the concern remains that organizations hold tens of terabytes of unstructured information. The nature of this information is such that it is generally created and stored in a decentralised manner, by office workers scattered around the globe.

Ensuring that all this information is accurate, collated, used properly, protected, and stored in formal systems that allow its search and retrieval remains a nightmare for many organizations. This information is often out of the control of IT departments and may be stored on local or shared drives, with duplication being a major concern. Finding the latest version, ring-fencing it and preserving it are key issues from an information governance and discovery perspective.

Big Paper

Yes, I know, you’ve all gone paperless, and none of your organizations have any paper stores any more. Sadly this is seldom the case, and the use of paper is still increasing by over 10% per year. For many processes, and many organizations, paper remains a key source document, and hence a key store of information. Information governance requires management of information in all formats and on all media, and paper simply cannot be ignored. While e-discovery (effectively the mining of large amounts of electronic data for use in legal cases) is the fashion of the day, true discovery still raises the question: is there information out there in other formats which may be required in the event of disputes or litigation?

As with electronic unstructured information, paper is often generated in a decentralised manner. It happens "out there" in user departments, very far from the control of IT. Much of this paper is a record and needs to form part of the organization's records management programme, and hence clearly needs to be considered as an integral part of information governance.

Bringing It All Together

Big data, unstructured electronic content and paper records all need to be considered when implementing a holistic information governance programme. The fact that this information is generated and stored in multiple locations, by different departments, is a cause of the problem. Ensuring that everyone creates and keeps the right information, and protects it according to company dictates, isn’t easy.

Formulating coherent global policies and creating a multi-disciplinary information governance steering committee are two key starting points for getting control of the different types of information. The next step is recognising what information exists and where it is stored, and allocating responsibilities for it during its lifecycle. Monitoring, reporting and revising processes on an ongoing basis become the cornerstone for implementation into the future.

Paul Mullon is the founder of COR Concepts, and boasts 24 years' experience of documents and records management. As Chairman of the South African Standards Technical Committee for Document Management Applications, he is one of the country's leading information governance experts.