 Is 90% of All Data Collected Rubbish?
Is 90% of all data collected rubbish? A provocative keynote debate at the recent DIA 2nd European Clinical Forum explored
this question.
Specific to electronically delivered data, Nick Lucas of INC Research, UK, said it is never clear whether what is received
is in fact what was requested. Statistics show that 100% of all queries are generated by 40% of all validation checks, raising
the question as to what the rest of the validation checks are checking—probably unnecessary data.
Other statistics show that less than 5% of the data in a database changes after it has been entered. Therefore, more than
95% of the data never changes post entry. This means data cleaning efforts are focused on a very small amount of the data,
and it could be argued that attention should shift to the 95% of untouched data. It may well be clean, but it may also lie outside
the safety and efficacy parameters required of the study and, therefore, be superfluous.
The sheer volume of queries—1 in 10 CRF pages, 10 in 10 CRF pages, or 20 in 10 CRF pages—is an indicator of poor quality data.
Using EDC means that the data quality at point of entry should be better. But sites complain that the computer keeps raising
queries as data is entered. The answer is to persuade sites to enter better data, so that the system does not keep raising queries.
Jane Barrett of the Barrett Consultancy noted that in research, "We do not know what we do not know." It may well be that
90% of data will show nothing, either positive or negative, but knowing that nothing changes is also useful information. Thalidomide
was tested in development in one species, and no teratogenicity was seen. When later tested in another species the teratogenicity
seen in humans was confirmed. If the testing is not done in that second species, we do not know all the answers. It is indeed
possible that only 10% of data is useful, but the other 90%, the normal data, is needed to show that.
Joris Cauquil, of AMITIS/Effi-Stat, France, questioned whether the answers to queries are real, or merely represent the investigator
trying to pacify the sponsor. He also asked whether it is necessary to gather so much data, bearing in mind that the more statistical
tests are performed, the more likely a false positive becomes. It is far more important, he argued, to ensure accuracy and reliability,
both of clinical measurements and of software, and to ensure that the criteria of the study are met.
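The point about false positives can be illustrated with simple arithmetic. The sketch below is not from the debate itself: it assumes the tests are independent and that each uses a significance level of 0.05, and the function name familywise_error_rate is chosen here purely for illustration. Under those assumptions, the chance of at least one false positive across n tests is 1 - (1 - alpha)^n.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests.

    Assumes every test is independent and uses the same significance
    level alpha (0.05 is an illustrative default, not a study value).
    """
    return 1.0 - (1.0 - alpha) ** num_tests


# Show how the risk grows as more tests are run on the same data set.
for n in (1, 5, 20, 100):
    print(f"{n:>3} tests: {familywise_error_rate(n):.2f}")
```

With 20 such tests the chance of at least one spurious finding is already roughly two in three, which is why collecting and testing more variables than the study needs carries a real cost. Real trial endpoints are rarely fully independent, so these figures are indicative only.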
Linda Talley of Eli Lilly, in the United States, reminded the audience that clinical data is the foundation of drug development,
and the data collected are used by a variety of different individuals for different purposes. The trial data collection strategies
for Phases I, II, III, and IV are very distinct and serve different purposes. But Talley emphasized a need to look at the
whole drug development program and not just individual trials when collecting data and determining data needs. The guiding
question should always be the quantity of data collected versus its quality.
Jane Barrett, MBBS, FFPM, LLM, is the Treasurer of the British Association of Pharmaceutical Physicians and principal of The Barrett
Consultancy, janebarrett@doctors.org.uk