Friday, 17 August 2007

Sequence align view

A recent addition to Ensembl is sequence alignview, which handles resequencing information. An example link is:;individuals=HuAA;individuals=HuBB;individuals=HuCC

The framework for this data has been in place for a while. Now we have what is probably the most obvious display of it - a multiple alignment of individuals or strains. For human, as well as the 4 "Celera" individuals, we will soon have Craig Venter's genome and Jim Watson's genome in. (There has been a persistent rumour that one of the 4 Celera individuals was Craig, so that probably gives us 5 individuals overall, of which only two, Craig and Jim, have high enough coverage to call heterozygote positions.)

This differs from SNP data in one crucial way: one can tell a base which is the same as the reference apart from a base which was not ascertained. This is critical for a bunch of applications. There are a whole bunch of headaches - aligning this many reads is an engineering challenge first off, and then dealing with structural variants and heterozygote calling is non-trivial. But it is definitely the way the world is going, and this framework allows us to handle resequencing data in humans - and other species - elegantly.
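
To make the "same as reference" versus "not ascertained" distinction concrete, here is a minimal Python sketch. It is not Ensembl's code or data model: the `Resequenced` class, the `align_row` function, the `.` and `~` display symbols and the toy sequence are all made up for illustration. The only name borrowed from above is the individual label HuAA, and its coverage and variant here are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Resequenced:
    """Hypothetical record for one resequenced individual (not an Ensembl type)."""
    name: str
    covered: set = field(default_factory=set)     # 0-based reference positions with read coverage
    variants: dict = field(default_factory=dict)  # position -> observed base differing from reference


def align_row(reference: str, individual: Resequenced) -> str:
    """Render one row of a toy alignment against the reference.

    '.' = covered and same as reference
    base letter = covered and different from reference
    '~' = not ascertained (no coverage, so we cannot say either way)
    """
    row = []
    for pos, ref_base in enumerate(reference):
        if pos not in individual.covered:
            row.append("~")
        elif individual.variants.get(pos, ref_base) != ref_base:
            row.append(individual.variants[pos])
        else:
            row.append(".")
    return "".join(row)


# Invented toy data: coverage over the first five bases, one difference at position 2.
reference = "ACGTACGT"
hu_aa = Resequenced("HuAA", covered={0, 1, 2, 3, 4}, variants={2: "T"})
print(hu_aa.name, align_row(reference, hu_aa))
```

With SNP data alone you would only have the `variants` part, so the last three positions would be indistinguishable from "same as reference". Tracking coverage separately is what lets the row say "unknown" there instead.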