In any enterprise, a tool is only as good as its user’s knowledge of how to apply it. With the modern bioengineer’s toolbox overflowing with the likes of CRISPR-Cas9, RNA interference and many more, biologists are realizing this more than ever before.
As tools like these are refined and perfected, we are looking at a near future where the greatest barrier to their implementation will have little to do with their reliability and much more to do with our limited knowledge of how and where to apply them.
That’s because, for all the progress we have made, our understanding of the intricacies of the human genome remains rudimentary. We are rapidly approaching a point where sequencing a genome is almost a trivial task, but comprehensively understanding the meaning of that sequence remains a distant goal.
In fact, we haven’t even been able to nail down a firm figure for the number of protein-coding genes in the human genome, although we do know that protein-coding sequences make up only about 2% of total human DNA. Of the protein-coding genes that are known, only a portion have been thoroughly characterized.
Even less is known about the significance of the noncoding regions of the genome (that is, the other 98%), not to mention the contribution of epigenetic and environmental factors.
The stakes for gathering this knowledge are high. Before we can elevate biology from a discipline of science into a discipline of engineering, we need to know exactly what we’re working with.
Successful and responsible manipulation of genomes demands an absolutely watertight understanding of their structure and function. So what is it going to take to get there? The short answer is genomes, and a lot of them.
To paint a detailed picture of human genetics, biobanks around the world have already compiled a monumental collection of genetic, clinical, and other phenotypic data from hundreds of thousands of donors. The UK Biobank contains over 500,000 samples.
Iceland’s deCODE biobank now houses the genomes of over a third of the country’s native population. In the US, the Precision Medicine Initiative (PMI) Cohort Program, whose biobank is hosted by the Mayo Clinic, is aiming to enroll a million participants over the next four years.
And yet, despite the impressive scale of these collections, most biobankers will agree that acquiring samples is perhaps the most challenging part of research that aims to learn more about the human genome. What makes sample collection so difficult for biobanks?
Many studies have attempted to answer this question, and they have consistently found that certain concerns correlate with unwillingness to participate in biobanking. Not surprisingly, participants most commonly report that they are deterred by worries about information security and the protection of their anonymity.
Another commonly reported concern relates to the use of their samples in “unspecified” research. For example, people might be willing to part with their DNA in the name of studying a disease that runs in their family, but unwilling to give researchers broad consent to reuse that data in other studies or to share it with other research groups.
Thankfully, there is some evidence that these concerns can be overcome with better participant education and a more thoughtful, deliberate consent process. But there is one other barrier that might not be so easily overcome. Studies of biobank participants have also shown that many individuals’ willingness to participate is contingent on a return of results. Many want to get something meaningful out of the study for themselves, and if they can’t, then you can count them out.
This barrier is trickier than the others. Big data research is rarely instantly gratifying, and returning results can be something of a tall order, especially when participants also expect to remain anonymous, since results are hard to route back to samples that have been deidentified.
In fact, with genomics still in its infancy, many biobank participants may never personally benefit from the studies conducted on their samples.
But this doesn’t change the fact that many people expect biobanking to take the form of a more traditional transaction: you give something, you receive something back. On this front, genomics faces a particularly wicked problem. To provide results, the field must gather data, and to acquire that data, the field must provide results.
How this impasse can be resolved is, for now, anyone’s guess. It is impossible to say how long it will take for genomics to mature enough for researchers to apply their bountiful toolbox at will and deliver the results participants expect.
For now, it remains one of the more intractable challenges to biobanking and, in turn, to a future in which delivering such results is possible.