Is it crazy to consider community curation?

Sandra Porter

or is it just an idea that's ahead of the curve?

Last week, I was stunned to discover at least 31 papers in an NCBI Gene database entry that were in the entry for the wrong gene. I wrote about this here, here, here, and here.

Now, an oversight like this is a little understandable. The titles of the articles do include the name of the wrong gene (DRD2 - the dopamine D2 receptor). And it was only four years ago that people figured out that the marker named in those titles mapped somewhere else.

If computers were responsible for the annotation, well, this would be understandable. The annotation programs would most likely add the article titles without ever noticing that some of them contradict each other.

But unlike GenBank, the NCBI Gene database is supposed to be a curated database.

And there is a mechanism for the community to help out. People have published papers during the last four years that had the marker in the right place. Clearly, I wasn't the first person to find the problem - with the citation, if not with the genetic mapping itself.

To me, it seems like this situation is analogous to my days in the lab. We didn't always live up to this ideal, but there was an implicit ethical standard that said that if you broke a piece of lab equipment or found a piece of broken equipment, you had some responsibility either for fixing it or for seeing it get fixed.

I think it might be time to view our shared informational resources in a similar way. Certainly the authors who published later papers on the TaqI polymorphism knew where the marker mapped and knew that the NCBI Gene database had it wrong. Just like the people who act responsibly in the lab, these authors could have acted responsibly in the information world and submitted corrections to the GeneRIFs.

Maybe this whole idea is absurd. With thousands of databases, researchers can't be expected to keep track of every place that their data has gone. However, many databases pull and repackage subsets of information from the NCBI collection anyway, so maybe there could be a few NCBI databases where everyone shares the responsibility and the work of keeping them up to date.

It could happen.

What will it take?