R.I.P. Pfam database

It’s the end of the line for one of my favourite bioinformatics resources, the Pfam database.

I found this out the hard way while preparing to give my usual lecture on transcription factors and DNA-binding domains in my course this semester. To motivate the importance of sequence-specific transcription factors, I have always reported to the students the latest numbers for the human proteome: how many C2H2 Zn fingers (7983 in >1000 proteins!), homeodomains (383 in more than 300 proteins), etc. there are. I’ve always found this information (and other key numbers, such as the fact that 45% of residues in the human proteome are in Pfam domains) on the Pfam website. Sadly, as of this month, the Pfam website is no more.

Fortunately, the Pfam data live on (as part of InterPro). But finding numbers like the ones I just reported will take actual downloading and searching of the database. I will probably just use the old numbers for now.
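
For what it's worth, the searching part need not be much work once the data are on disk. Here is a minimal sketch in Python, assuming the domain annotations have been exported to a tab-separated file with one row per domain instance (protein ID, family label, start, end). That column layout, the function name `count_domains`, and the toy rows are all my own inventions for illustration, not the actual format of any InterPro or Pfam download.

```python
import csv
import io
from collections import defaultdict


def count_domains(rows):
    """Count domain instances and distinct proteins per family.

    rows: iterable of (protein_id, family, start, end) records.
    This layout is a hypothetical one chosen for the example.
    Returns {family: (n_instances, n_proteins)}.
    """
    instances = defaultdict(int)
    proteins = defaultdict(set)
    for protein_id, family, _start, _end in rows:
        instances[family] += 1            # every row is one domain instance
        proteins[family].add(protein_id)  # a set keeps each protein once
    return {fam: (instances[fam], len(proteins[fam])) for fam in instances}


# Toy, made-up annotations standing in for a real download.
toy_tsv = (
    "P1\tC2H2_Zn_finger\t10\t32\n"
    "P1\tC2H2_Zn_finger\t40\t62\n"
    "P2\tC2H2_Zn_finger\t5\t27\n"
    "P3\tHomeodomain\t100\t157\n"
)

reader = csv.reader(io.StringIO(toy_tsv), delimiter="\t")
summary = count_domains(reader)

for family, (n_instances, n_proteins) in sorted(summary.items()):
    print(f"{family}: {n_instances} domains in {n_proteins} proteins")
```

Replacing the `io.StringIO` object with an open file handle would run the same count over a real export, provided it were first reshaped into these four columns.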

More generally, the end of the Pfam website raises a bigger question: what happens to bioinformatics resources in the long term? If I had a set of books where each page was a Pfam family (à la Dayhoff’s Atlas of Protein Sequence and Structure), it would have 19,632 pages, about the length of an old-fashioned encyclopedia. Like the old-fashioned encyclopedia, it would quickly be out of date and painfully slow to search. On the other hand, I would have it on my shelf: it couldn’t just vanish forever. To take another example, I was recently trying to provide classic yeast microarray data for my students as part of my class. To my surprise, the Stanford Genome/Microarray database, which provided access to these data for the past 20 years, is gone without a trace. Some of these data are available as part of other resources, but the specific data set I was looking for is apparently just gone.

As bioinformatics and genome biology mature, the finite shelf-life of individual datasets and resources reminds me that ultimately, data and web tools are very important in the scientific moment, but they are also temporary. Science, in the end, is about creating and sharing knowledge and understanding. Those are the shoulders that the next generation of giants will really be able to stand upon.
