Big Data in healthcare: the data is only as good as the user (Transcript)

Diana Davis speaks to Athula Herath, Statistical Director at MedImmune Biotech, an R&D unit of UK-based pharmaceutical company AstraZeneca, about the value and challenges of big data in healthcare.

Pex Network: Athula, welcome to the program. Thank you very much for joining us today.

Athula Herath: I'm happy to be here.

Pex Network: To start off, I wonder if you can tell me a bit more about what your role as Statistical Director at AstraZeneca actually entails?

Athula Herath: Sure. MedImmune is the biologics arm of AstraZeneca, a major pharma company in the UK. One of our research sites is situated in Cambridge in the UK, and the other is in Gaithersburg, near Washington DC in the USA. There are more than 4,000 people globally, and about 700 to 800 in the UK.

The R&D Statistics function that I lead in Cambridge is part of the translational science department at MedImmune, and I lead the statistical activities within the UK. Within statistics we support R&D in numerous ways. Today we’ll be discussing translational medicine, or more appropriately, the translation from bedside to bench.

The traditional translation is usually referred to as the translation from bench to bedside. The bench (often a laboratory) is where a chemical or antibody is invented, and the bedside is where a medicine is administered to a patient. As you know, the traditional translation has not improved the success rate in pharmaceutical treatment development, which is, some people say, approximately just one in 100.

We, along with many other people, thought that there needed to be some filters, so the translation from bench to bedside is useful - provided that we can figure out beforehand what needs translating. Bedside to bench is an additional filter that is applied at the beginning of the translation process. You might be familiar with the book by Stephen Covey, a very popular book, which is called The 7 Habits of Highly Effective People. The second habit described in that book is ‘Begin with the End in Mind’. That is what bedside to bench is: beginning with the end in mind for pharmaceutical development.

Pex Network: Can we talk a little bit about how big data then has impacted your work?

Athula Herath: Yes. Currently there is a lot of enthusiasm, and also drive for big data with technology and development. It has been part of our work for a long period of time.

In the late 90s, around the time of the Human Genome Project, what we call the "omic" revolution began, in the sense that people started to analyze gene expression and then moved on to study other omics, like proteomics, mutations, and genome-wide association studies (GWAS). GWAS look at genetics on a large population scale. You carefully design a study associated with some specific disease, to find out what part of the genome is responsible for a disease condition, or ‘trait’ as they call it.

Then, within the last two or three years, you can see an explosion of individual genomes being sequenced, the reason being that the costs are coming down. I think today you can sequence somebody’s genome for less than $1,000. The result is that these studies generate an enormous amount of data.

For example, at the beginning, you go to the clinic, and the clinic will record the demographic data of the patients, like date of birth, gender, and all these patient characteristics, and then they will also gather any variables, or anything that enables the physician to identify the status of the disease.

The clinical outcome variables are very important because, for people with poor outcomes, these are the variables that physicians will attempt to change by treatment. For example, in an asthma patient you want to reduce the number of hospital admissions within any year. While useful, these outcomes may not provide any insight into what has gone wrong within the patient.

Now, with the advent of the omics revolution and the like, you can drill down to the lower levels, the molecular characteristics, at a cost that is affordable, and these will give you a deeper insight into what might have gone wrong. For example, you may know that somebody has been admitted to a hospital due to an asthma exacerbation (difficulties in breathing), but that doesn’t tell you why this person has been admitted to the hospital. In some cases it is possible that the person had their immune system compromised due to a bacterial or viral infection, so the molecular information allows us to figure out what might have gone wrong internally. Hopefully, utilising this insight, we can devise a treatment for it.

In fact, that is one of the biggest impacts of big data in health today: synthesizing evidence at various levels, combining it, and then coming up with a diagnosis, which can then be used to devise a treatment.

Pex Network: So is that where you see the greatest potential of big data within pharmaceutical development?

Athula Herath: I'm a little biased here, as I work for the research function of a pharma company, but the greatest potential of big data is to enhance the health of people, or treat sick people. That is probably the biggest impact, in my opinion, but there are other areas in the pharma industry where big data is utilized. For example, manufacturing, patient communication, marketing, and all sorts of other stuff too.

Pex Network: It can't all be smooth sailing when we're talking about data. What would you say is your biggest big data challenge?

Athula Herath: That's an interesting point. One of the biggest of the big data challenges is, as I sometimes ask, ‘where is the big data?’ It may sound like a joke, but to be honest, it is serious.

When it comes to health, that data is the most private to the individuals involved. In the UK, we usually go to a GP (a General Practitioner). There is a bond between the general practitioner and the patient, patient/doctor confidentiality, and within that trusting environment the patient reveals all relevant information.

If there is no guarantee in terms of privacy, patients will be a little reluctant, for quite the right reasons, to reveal this information.

Health Record systems have been in place for many, many, many years, at least in the UK for the last 40 years or so, in one way or the other, sometimes on paper, but even in electronic form for at least 30-35 years.

The issue we have with electronic health records is that they’re fragmented; they’re held in different systems, for obvious reasons. So one of the biggest issues is accumulating and compiling this databank.

In big data, one of the premises is making inferences at the population level. You can look at all the patients in, for example, the UK, or it could be Europe. The advantage of having access to a large patient population is that it allows you to identify patterns and then use these patterns to classify the outcomes and derive the treatment. I think there is a little bit of a conundrum or paradox here. In order to predict more accurately you need to have access to the data at the population level, but on the other hand, having access may mean loss of privacy, and because of those privacy concerns people don’t contribute. Not having large enough populations impacts the accuracy of the predictions.

The biggest challenge, the big data challenge, is getting access to patient-level data for the population. One example you might remember is from earlier this year, around January or February: the fiasco that broke about the National Health Service in the UK. I think they got the process of consulting individual patients wrong, so access to data is a big challenge.

Pex Network: Once you’ve got access to that data, what would you say are some of the common mistakes that you see pharmaceutical companies making, when they are using statistical analysis in research and development work?

Athula Herath: I think that's a very, very good question indeed. To be honest, the pharma industry is more effective and refined than it has ever been. But one of the challenges in this approach that we have is that we seem to lack the big picture (all causative effects) of a disease.

Statisticians will tell you that you don’t analyze data from an experiment that was not designed.

You might recall that a few months ago a big issue broke out because Google used a scoring system, and data mined from their search records, to predict the incidence of flu epidemics across the world. They used a big data approach, but because of the disparity between the assumption about what drives the search terms (people searching because they have the flu) and the reality (people weren’t actually ill), the results were overestimated in certain parameters. Now, that often happens in well-designed studies. The word here is bias: the estimates, or whatever you derive from the numbers, may have been influenced by unforeseen and unanticipated factors.

When it comes to health, you would like a diagnostic that is as accurate as possible, so one of the most common mistakes anybody could make in big data is to just analyze the data.

Pex Network: What skills and competencies do you think are needed in the pharmaceutical industry to start to harness some of the benefits of big data?

Athula Herath: A very good question. I'm a Statistician, so you will probably hear a little bit of a biased opinion here; as a Statistician I have some idea of what bias means. The prime skills are analytical skills, i.e. a good dose of methodological knowledge in classical statistical methods.

In addition to the classical statistical methods there is a class of statistical methods which deal with multiple (large number of) variables simultaneously.

As I mentioned, we frequently encounter partial or conditional information, so you need a little bit more statistical knowledge to deal with it. As what we are attempting to describe or model is a population consisting of subgroups, each having its own distribution, statistical methodologies that deal with such situations (e.g. inferences on mixtures) are also important.
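
As a rough illustration of what "inferences on mixtures" can look like in practice, the sketch below fits a two-subgroup Gaussian mixture with the expectation-maximisation (EM) algorithm. It is only a minimal sketch under stated assumptions: the data are simulated, the function names (`simulate_population`, `fit_two_group_mixture`) and all numbers are hypothetical, and nothing here is taken from MedImmune's or AstraZeneca's actual methods.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def simulate_population(seed=0):
    """Hypothetical biomarker values from two hidden patient subgroups."""
    rng = random.Random(seed)
    group_a = [rng.gauss(2.0, 1.0) for _ in range(600)]  # assumed subgroup A
    group_b = [rng.gauss(8.0, 1.5) for _ in range(400)]  # assumed subgroup B
    return group_a + group_b


def fit_two_group_mixture(values, iterations=200):
    """Estimate weights, means and SDs of a two-component mixture via EM."""
    n = len(values)
    ordered = sorted(values)
    # Crude starting guesses: lower and upper quartiles as the two means.
    means = [ordered[n // 4], ordered[(3 * n) // 4]]
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n)
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]

    for _ in range(iterations):
        # E-step: probability that each value belongs to subgroup 0.
        resp = []
        for x in values:
            p0 = weights[0] * normal_pdf(x, means[0], sds[0])
            p1 = weights[1] * normal_pdf(x, means[1], sds[1])
            total = p0 + p1
            resp.append(p0 / total if total > 0 else 0.5)

        # M-step: re-estimate each subgroup from its weighted members.
        for k in (0, 1):
            r = resp if k == 0 else [1.0 - v for v in resp]
            mass = sum(r)
            means[k] = sum(ri * x for ri, x in zip(r, values)) / mass
            var = sum(ri * (x - means[k]) ** 2 for ri, x in zip(r, values)) / mass
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
            weights[k] = mass / n

    return weights, means, sds


if __name__ == "__main__":
    weights, means, sds = fit_two_group_mixture(simulate_population())
    print("weights:", weights)
    print("means:", means)
    print("sds:", sds)
```

The point of the sketch is that the subgroup labels are never given to the algorithm; it recovers the proportion and distribution of each subgroup from the pooled values alone, which is the kind of situation described above.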

Also underpinning big data are how to store data, how to manipulate data, and how to access clouds, and these are competencies in computer science.

People also need to delve into the disease: the disease biology, the molecular biology, population genetics, and disease epidemiology.

So, to summarize: statistical skills, computer science skills and disease-specific knowledge. One of the major things that this highlights is that one person cannot be all of these things, and therefore you are a part of a team. I am a part of a team, as opposed to being an individual. Being a part of a team is also an essential skill. It is very much like how Germany demonstrated what teamwork really means by winning the World Cup.