How Should Scientists' Access To Health Databanks Be Managed?

Sep 6, 2019
Originally published on September 6, 2019 11:16 am

More than a million Americans have donated genetic information and medical data for research projects. But how that information gets used varies a lot, depending on the philosophy of the organizations that have gathered the data.

Some hold the data close, while others are working to make the data available to as many researchers as possible, figuring science will progress faster that way. But scientific openness can be constrained by both practical and commercial considerations.

Three major projects in the United States illustrate these differing philosophies.

VA scientists spearhead research on veterans database

The first project involves three-quarters of a million veterans, mostly men over age 60. Every day, 400 to 500 blood samples show up in a modern lab in the basement of the Veterans Affairs hospital in Boston. Luis Selva, the center's associate director, explains that robots extract DNA from the samples and then the genetic material is sent out for analysis.

The blood samples themselves end up in gigantic, automated freezers for future use — one in Boston and a backup facility at a VA location in Albuquerque, N.M.

Even at this early stage of the process, the volunteers' names have been replaced with bar codes. Scientists can still link the DNA findings to the veterans' medical records, but the entire operation is designed to ensure that no personal information can be deduced from the findings.

Only VA scientists and their collaborators are granted access to the vets' medical records and genetic information. Dr. J. Michael Gaziano, a VA scientist and principal investigator of the Million Veteran Program, says that so far there are 30 projects involving this huge data set.

The studies emphasize health issues of concern to vets "in areas of schizophrenia and bipolar disease, in PTSD, cardiovascular disease, diabetes [and] hypertension," Gaziano says.

Gaziano and his colleagues have published the first of those results and have approved 30 research projects in total. It's a start, but a stark contrast to more than a thousand studies currently underway using the much more accessible data set at the British-based UK Biobank, which is a pioneer in this field.

The U.K. project has granted access to 10,000 qualified scientists, who can download its anonymized data and explore it. That effort had a head start: UK Biobank completed its enrollment in 2010, one year before the VA started to collect samples.

UK Biobank has reported no security or privacy issues, but Gaziano still isn't about to make the VA data in the U.S. as readily accessible.

"I don't think that we understand all the security risks as we move into this new era," he says. "So I think we're being quite cautious."

Gaziano is trying to make the data more accessible to scientists in academia, but doing so is complicated by the fact that the data are housed on computers at the VA and the Energy Department; access is strictly controlled.

"We view this as a national resource," Gaziano says, "and it's a national resource that will not only help veterans but will help all Americans and mankind."

Intermountain Healthcare teams with deCODE genetics

Our second example involves what is largely an extended family: descendants of settlers in Utah, primarily from the Church of Jesus Christ of Latter-day Saints. This year, Intermountain Healthcare in Utah announced that it was going to sequence the complete DNA of half a million of its patients, resulting in what the health system says will be the world's largest collection of complete genomes.

"We have families who have been here for three, four, five, six generations," says Dr. Lincoln Nadauld, executive director of precision medicine and genomics, "and under our care at Intermountain Healthcare, we have taken care of families for multiple generations, so we have health information and health histories on those families and patients."

Family trees provide a great shortcut for understanding the genetic basis of disease. To plumb this information, Intermountain has an exclusive deal with a company in Iceland, deCODE Genetics, which is owned by pharmaceutical giant Amgen. This data set will remain a closely held resource, not available to the broader scientific community.

"We don't anticipate sharing this data outside of the Intermountain Healthcare databases, for example," Nadauld says.

DeCODE will do the DNA sequencing and will get to scour that information with an eye toward developing new drugs.

"It would be natural for deCODE and Amgen to do that, given their expertise and experience there," Nadauld says. "Conversely, if there's an opportunity to implement some novel discovery or finding into clinical care, Intermountain Health will be the lead on that." Insights would be published in the scientific literature, he says.

Similarly restricted databases are held by other medical systems, including Geisinger Health in Pennsylvania and Kaiser Permanente, based in California.

NIH's All of Us aims to diversify and democratize research

Our third and final example is an effort by the National Institutes of Health to recruit a million Americans for a long-term study of health, behavior and genetics. Its philosophy sharply contrasts with that of Intermountain Healthcare.

"We do have a very strong goal around diversity, in making sure that the participants in the All of Us research program reflect the vast diversity of the United States," says Stephanie Devaney, the program's deputy director.

The program has been budgeted $1 billion in taxpayer money so far, and it's expected to take another five years to recruit the million volunteers. The program anticipates needing another billion dollars to attain its goals. (The fully operational UK Biobank has spent about $300 million, from taxpayers and charities.)

So far, Devaney says, the All of Us program is getting excellent diversity in its samples. It's also striving for good diversity among the researchers who will end up using the data.

"We set up from the beginning, when we [got consent from] our participants, that all different types of researchers would be able to ask for access to the data," Devaney says.

"We are not limited to just folks who work at a certain institution or even who live in the United States. We will be open for foreign researchers, and we will be open for folks in the private sector and the government and academia and even ultimately citizen scientists or community scientists."

(Government officials granted access will not be allowed to use the data for crime-solving or similar activities, Devaney says.)

Program officials still need to work out exactly how they will provide this access while ensuring privacy and security. They would like to put the information on computer servers that scientists can access but which will not allow data to be downloaded. The goal is to make the information secure and as accessible as possible, while not putting too many constraints on how the data can be analyzed.

The philosophy is straightforward: The more easily smart people can see the data, the more likely they are to make discoveries that can benefit us all.

You can reach NPR science correspondent Richard Harris at rharris@npr.org.

Copyright 2019 NPR. To see more, visit https://www.npr.org.

NOEL KING, HOST:

All right. More than 1 million Americans have donated their genetic information and medical data for research projects. It's great for science when this information is available, but how should it be shared? NPR science correspondent Richard Harris looked at three different philosophies.

RICHARD HARRIS, BYLINE: The first project involves three-quarters of a million veterans. Every day, 400 to 500 blood samples from this group show up in a modern lab in the basement of the veterans hospital in Boston. Luis Selva is the center's associate director.

LUIS SELVA: This area that you're - we're standing in, we can process both robotically, the samples. And we also have an area - just to my left - where we can process the samples manually. So what you're hearing now is actually the shaking, or the agitation, that's taking place to be able to help and isolate the DNA.

HARRIS: The DNA is tagged with a code number and sent out for analysis. Selva explains that the sample, marked with a barcode, is stashed in an enormous freezer, here, for future use.

SELVA: Half of that inventory is kept locally, and the other half is sent to our secondary biorepository in Albuquerque.

HARRIS: Only VA scientists and their collaborators are granted access to the vets' medical records and genetic information. Dr. Michael Gaziano, who is principal investigator of the Million Veteran Program, says so far there are 30 projects involving this huge data set. The participants are mostly men over 60, and the studies emphasize health issues of concern to vets.

MICHAEL GAZIANO: In areas of schizophrenia and bipolar disease, in PTSD, cardiovascular disease, diabetes, hypertension.

HARRIS: Contrast those 30 studies to more than a thousand studies that are underway using the much more accessible data set at the British-based UK Biobank, which is arguably the world's leading resource of this sort. The U.K. project has had no security or privacy issues, but still, Gaziano isn't about to let the VA data be that readily accessible.

GAZIANO: I don't think that we completely understand all of the security risks as we move into this new era. So I think we're being quite cautious.

HARRIS: The data are being ported to Department of Energy computers with the hope that more scientists will eventually be able to make use of it.

GAZIANO: We view this as a national resource, and it's a national resource that will not only help veterans but will help all Americans and mankind.

HARRIS: Our second example involves what is largely an extended family - descendants of settlers in Utah. Earlier this year, Intermountain Healthcare in Utah announced it was going to sequence the complete DNA of half a million of its patients, which they say will be the largest collection of its kind. Company scientist Lincoln Nadauld says, these people are a rich and unique resource.

LINCOLN NADAULD: We have families who have been here for three, four, five, six generations, and we've taken care of families for multiple generations. So we have health information, health histories on those families and patients.

HARRIS: Family trees provide a great shortcut for understanding the genetic basis of disease. And to plumb this information, Intermountain has an exclusive deal with a company in Iceland, deCODE Genetics, which is owned by pharmaceutical giant Amgen. DeCODE will do the sequencing and get to scour that information with an eye toward developing new drugs.

NADAULD: It would be natural for deCODE and Amgen to do that, given their expertise and experience there. Conversely, if there's an opportunity to implement some novel discovery or finding into clinical care, Intermountain Healthcare will take the lead on that.

HARRIS: Our third and final example is an effort by the National Institutes of Health to recruit a million Americans for a long-term study of health and genetics. Stephanie Devaney, who is deputy director of the All of Us program, described something quite different than the homogeneous population from Utah.

STEPHANIE DEVANEY: We do have a very strong goal around diversity, in making sure that the participants who enroll in All of Us research program reflect the vast diversity of the United States.

HARRIS: They've been budgeted $1 billion in taxpayer money so far, and they expect it will take another five years to recruit their million volunteers. So far, she says they are getting excellent diversity in the sample.

DEVANEY: We set up from the beginning, when we consented our participants, that all different types of researchers would be able to ask for access to the data. We're not limited to just folks that work at a certain institution or even who live within the United States. I mean, we will be open for foreign researchers, and we will be open for folks in the private sector and the government and academia and even, ultimately, citizen scientists.

HARRIS: They still need to work out exactly how they will provide this access while assuring privacy and security. But the goal is to make it as accessible as possible, figuring that the more smart people who can see the data, the more likely they are to make discoveries that can benefit us all. Richard Harris, NPR News.

Transcript provided by NPR, Copyright NPR.