Patients who contribute their data to research are primarily
motivated by a desire to help others with the same plight, through the
development of better treatments or even a cure. Out of respect for these
individuals, and to uphold the fundamental tenets of the scientific process, I’d
like the clinical trials community to shift its default position on data
sharing and reuse to one of data availability on publication, as is already the norm in the life sciences. This
will enable more robust, rigorous research, create new opportunities for discovery
and build trust between patients and scientists.
This aspiration is widely shared in the basic research
community, and has been well articulated in considered and public discussions such as a series
led by the National Academy of Sciences in 2003. Nevertheless,
recent articles in the New England
Journal of Medicine have pushed back against data sharing, calling those who reuse data
“research parasites” (followed by a bit of clarification)
and proposing a lengthy and complex embargo procedure for clinical trial data sharing, potentially including payment.
A tradition of sharing
Sharing tools has been the norm (mostly) in genetics and molecular biology since the field’s early days, mainly because you couldn’t get anything done unless people let you use their reagents. This has persisted for over a century, from the first fly lines to cDNA clones, enzymes, antibodies and, now, ‘omics datasets.
The Protein Data Bank in the 1970s, the EMBL Bank (now ENA)
and GenBank nucleotide collections in the 1980s and the Human Genome Project in
the 1990s all thrived thanks to the norms of reagent sharing and data
deposition, and the returns to science were - and are still - huge. Such practices are
pragmatic in terms of both data quality and author credit, each of which provides
incentives for researchers.
I am perhaps painting a beautiful picture of an imperfect
world – there is still much to be done to ensure all this data sharing can work. Compliance, agreeing on things
like adaptable standards, and keeping the infrastructure humming are all
challenges we grapple with on a daily basis in molecular biology. But we have
much to be proud of, and embracing the ethos of sharing has brought us a long
way in a short time.
Data release: why?
Releasing data when you publish a paper isn’t about giving things
up – although I can see that for some, the lack of instant reward might make one
feel that way. Data release is not about rewarding a single PI; it’s about
benefitting the clinical research community as a whole, and making the most of
the data entrusted to you by patients. So - why release data?
We are custodians, not owners, of patient data.
Patients participate in trials to further medical research, benefit from new medicines (potentially) and gain from focused care and advice. But numerous surveys have shown that participants are primarily motivated to share their data – the most valuable aspect of a clinical trial – by the altruistic desire to help others in the future.
So it is very strange that some researchers feel justified in assuming the data produced in a clinical trial is somehow their own scientific property. From the
perspective of patient care, this position is particularly questionable when it
impedes the ability of other scientists to re-examine the data for additional
studies, which would contribute to the progress so eagerly desired by the
participants.
If we’re not doing all this research to improve patient
care, then we should probably change the consent process.
Challenging the interpretation of observations is fundamental to the scientific process.
Evidence is a wonderful thing. Our freedom to base our arguments on reproducible experiments dates back to the 17th century, when people in Europe were finally permitted to openly discuss and debate science based on direct observation. Evidence is the backbone of scientific discourse, so it follows that papers without data can be easily dismissed as well-articulated speculation.
When a dataset is published, readers are then able to drill down
to raw observations, and can verify methods or explore alternative explanations.
Yes, this means they can potentially expose errors in your work and your thinking. But it’s far more
common for readers to double-check the work against other published datasets,
which can answer lots of different questions. Ultimately, this is a good thing
for science.
Sharing data sharpens the mind.
The very real anxieties that come with data sharing are both individual and collective, because we are building knowledge together. Professional pride dictates that if your data will be open for inspection, you will be much more careful about the details. (After all that data cleaning and fixing, confounder/covariate discovery and adjustment, you do not want to be the one who left a howler for others to discover.)
Everyone knows there are skeletons in the data closet,
mostly down to the complications of running real-life experiments, so current
analyses make use of several approaches to boost confidence in the results. But
generally speaking, just knowing your peers could be wandering through your data
sharpens your mind and makes you focus on handling and presenting your analysis
properly.
When an entire community does this, it benefits from a
deeper consensus on what a “good study” looks like. That matters a lot.
Meta-analysis and Serendipity
When it can be done, meta-analysis (the combining of datasets) is a win–win–win (funders, scientists, patients). It’s about building on studies, combining them to gain new insights, asking different questions and finding new leads. Meta-analysis isn’t always possible – clinical trials often look at entirely different things, and even when they do study the same thing, they can’t always be aligned very well. But meta-analysis is only possible when people share their datasets.
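To make the idea of combining datasets concrete, here is a minimal sketch of inverse-variance, fixed-effect pooling – one of the simplest meta-analysis techniques. The function and the trial numbers below are illustrative assumptions, not data from any real study; the only requirement is that each trial reports an effect estimate and its standard error on a comparable scale.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Combine per-trial effect estimates using inverse-variance weights.

    Trials with smaller standard errors (more precise results) get
    proportionally larger weights in the pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical trials measuring the same treatment effect
estimates = [0.42, 0.35, 0.50]
std_errors = [0.10, 0.15, 0.12]

effect, se = pool_fixed_effect(estimates, std_errors)
print(f"pooled effect = {effect:.3f}, standard error = {se:.3f}")
```

The pooled standard error is always smaller than that of any single trial, which is the statistical payoff of sharing: no one study needs to be definitive on its own.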
Serendipity is another benefit of data sharing – I am always
amazed at how important it is for science. Serendipity has guided us to some
seriously profound insights, for example the relationship between the Malaria
parasite and plants, or how metabolic enzymes can be used as lens crystallins. It’s
been behind many of the completely weird discoveries that make biology so
wonderful, and many practical discoveries, such as CRISPR, that push the
frontiers of possibility.
I’ve stumbled happily upon Serendipity many times, and very
often others have made serendipitous discoveries based on data or methods I
have published. You’d have to be pretty cynical to begrudge your fellow
scientists such pleasure, and, frankly, a bit petty to fret over whether
they’ll remember to credit you (nearly all scientists carefully reference their
sources, if only to reassure reviewers of the credibility of the data they use).
For funders, both meta-analysis and serendipitous
discoveries compound their return on investment and make them look good. For scientists, being able to make use
of comparable data to verify or cross-validate their work, or to make unplanned
discoveries, is invaluable. For patients, knowing their contribution is being
used in lots of different and useful ways can give a sense of pride.
Sceptical about whether this really applies to clinical
research? Well, without having access to a large number of trials, I doubt
anyone could say.
Having more large datasets on hand for meta-analysis can
only benefit those planning and analysing the results of clinical trials. And as
clinical trials begin to incorporate more high-dimensional, data-rich datasets
(e.g. imaging, metabolomics, multi-omics) – and to share them – there will be plenty
of opportunities to carry out sophisticated meta-analysis.
As for Serendipity, well, it can strike at any time.
The scoop
It is hardly possible for anyone to “scoop” you simply
because you released your data on publication – particularly if that dataset
represents only what is needed to support your paper. If someone else looks at
that data and comes up with an interesting observation you missed, they can potentially
make that corner of science a little bit better. Dwelling on the negatives will
get you nowhere, but looking on the bright side may land you a new
collaborator.
If the only datasets you share at publication time are those
that relate specifically to that paper, there is no need for complicated
embargo rules that provide authors enough time to perform a full analysis on
all the data collected (as proposed in the most recent NEJM editorial). Tracking and versioning might become more complicated
with later papers, but this approach does the important job of tying the
datasets to the publication in a reasonable timeframe, opening up that piece of
science for proper verification and discourse.
If you really believe you are going to be scooped for some
missing analysis on a dataset, the solution is to delay publication. If you’re
worried that making your data public will expose you to undue criticism, make
your analysis bulletproof. That will be good for you and for the system as a
whole, as understanding the strengths and weaknesses of different analyses only
makes the community stronger.
When data sharing is not straightforward
Human subjects
No matter what, we have to honour patient consent. As scientists we may wish such agreements were more future-proof, but when those consents preclude data sharing beyond the study group, we have to accept it and move on.
Exactly how to future-proof consents for clinical trials is
no simple matter. One solution would be for funding agencies or regulators to begin
insisting that consent forms provide a reasonable level of research access,
which would facilitate research but respect the privacy of individuals.
Currently, for genetic studies, there is a lightweight
vetting process, involving both individual and institutional sign off, which
assures patients that the researchers will perform appropriate research on the
dataset. This is a clunky approach and it certainly needs improvement, but it
is functional.
Standards and infrastructure
Data sharing is only feasible if the parties involved are able to do it, without worrying that they’ll run into trouble transferring files from one site to another, or that their data will disappear into some kind of black hole.
A robust, global archive for this kind of information would
be one important piece of a larger infrastructure that would make biomedical
data sharing straightforward. The EMBL-EBI model – biomolecular archives
supported by international collaborations – is a solid example. Funding for infrastructure
like this is huge value for money, and costs little in the context of global clinical
research funding.
CDISC standards are functional, and well used by the
clinical trials community. But there is a constant need to review standards and
establish new ones for emerging technologies. This work never ends, but the end
goal of harmonisation (i.e. to support meta-analysis) is a good one, and the
whole process helps us along on our eternal quest for a shared language.
Regulatory and commercial concerns
I do not have a lot of experience in this area, but it’s clear that regulation of clinical trials is a huge deal for the pharmaceutical industry. Any data release policy needs to work well for regulators and for commercial interests, which can have different concerns from academia. For both, the science performed in clinical trials must be very sound, so the mind-sharpening step of data release is certainly of value; in any case, most companies I know are delighted when other science emerges from the data they release.
Evidence is beautiful
In this on-going debate about data, let us base our arguments on… data. We are all likely to change our views when presented with compelling data and well-reasoned analysis, which is one of the nice things about being a scientist.
Refreshingly, for the most part I do not think this debate is
one of those boring political ones where everyone chooses a side, closes their
ears and steels themselves for uncomfortable dinner-table discussions. Scientists
already working in an open-data environment understandably campaign for
everyone to join them – though they are fully aware of the downsides. Scientists
working in clinical trials can see there are advantages to sharing data, but
have neither the time nor the inclination to sort out the myriad details that would
make it workable.
As a starting point, we can focus on the simplest, tried-and-tested approach of publishing your data alongside your narrative – a practice that has served science well for over 300 years. But more importantly, we can keep the discussion going, and work with one another to overcome the barriers to realising the full potential of biomedical research. That would be a win for scientists, their funders and, most importantly, patients themselves.