Thursday, 30 August 2007

Orthologs and Paralogs

I am sitting in a talk (Interactome meeting) and the speaker is using InParanoid orthologs. At Ensembl we've adopted the TreeFam scheme for ortholog definition, and after a lot of sweat creating statistics to assess the difference between ortholog sets, it turns out there is not a huge difference between InParanoid and TreeFam/Ensembl ortholog calls. (TreeFam/Ensembl is a little better, of course, but it always amazes me how good "simple" approaches can be.)

But the real benefit of the TreeFam scheme is the use of genuine phylogenetic trees rather than just ortholog lists. The tree is the best way to represent the evolution of a gene family. At Ensembl we annotate each internal node of the tree as either a speciation or a duplication node. From this one can ask far more sophisticated questions than just "which human gene is the ortholog of this gene in Drosophila". One can ask, for example, "what are the ancient paralogs of this human gene due to the presumed whole genome duplication in vertebrates" or "for this expanded gene family, which genes would be present in the putative eutherian ancestor".
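
To make the speciation/duplication idea concrete, here is a minimal sketch in Python. It is not the Ensembl or TreeFam code; the `Node` class, the `classify_pairs` function and the gene names are all made up for illustration. It assumes the usual rule that two genes are orthologs if their last common ancestor node is a speciation, and paralogs if it is a duplication.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical gene tree node: a leaf gene or an annotated internal node."""
    kind: str  # "leaf", "speciation" or "duplication"
    gene: Optional[str] = None
    species: Optional[str] = None
    children: list = field(default_factory=list)


def leaves(node):
    """Return all leaf nodes below (or at) this node."""
    if node.kind == "leaf":
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def classify_pairs(node, out=None):
    """Label every pair of leaf genes by the kind of their last common ancestor."""
    if out is None:
        out = {}
    if node.kind == "leaf":
        return out
    relation = "ortholog" if node.kind == "speciation" else "paralog"
    groups = [leaves(child) for child in node.children]
    # Genes from different child subtrees have this node as their last common ancestor.
    for i, left in enumerate(groups):
        for right in groups[i + 1:]:
            for a in left:
                for b in right:
                    out[frozenset((a.gene, b.gene))] = relation
    for child in node.children:
        classify_pairs(child, out)
    return out


# Toy family (invented names): an ancient duplication, then a human/mouse
# speciation in each of the two resulting copies.
tree = Node("duplication", children=[
    Node("speciation", children=[
        Node("leaf", gene="human_A", species="human"),
        Node("leaf", gene="mouse_A", species="mouse"),
    ]),
    Node("speciation", children=[
        Node("leaf", gene="human_B", species="human"),
        Node("leaf", gene="mouse_B", species="mouse"),
    ]),
])

relations = classify_pairs(tree)
```

In this toy tree, `human_A` and `mouse_A` come out as orthologs, while `human_A` and `human_B` (and also `human_A` and `mouse_B`) come out as paralogs, because their last common ancestor is the duplication at the root. A flat ortholog list cannot tell you how old that duplication is; the tree can.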

We visualise trees using GeneTreeView:

http://www.ensembl.org/Homo_sapiens/genetreeview?db=core;gene=ENSG00000120306


These trees are nice to see, but they are now a bit unwieldy due to the number of species in Ensembl. We need to have options to show "just these species".
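
As a rough idea of what "just these species" could mean, here is a small Python sketch of pruning a tree down to a chosen set of species. This is not how GeneTreeView works; the `Node` class, `prune` and the gene names are hypothetical. It drops leaves from unwanted species and collapses any internal node left with a single child.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical gene tree node: a leaf gene or an annotated internal node."""
    kind: str  # "leaf", "speciation" or "duplication"
    gene: Optional[str] = None
    species: Optional[str] = None
    children: list = field(default_factory=list)


def prune(node, keep):
    """Return a copy of the tree restricted to species in `keep`, or None if empty."""
    if node.kind == "leaf":
        return node if node.species in keep else None
    kept = [p for p in (prune(child, keep) for child in node.children) if p is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # An internal node with one surviving child carries no information here.
        return kept[0]
    return Node(kind=node.kind, children=kept)


def gene_names(node):
    """List leaf gene names in left-to-right order."""
    if node.kind == "leaf":
        return [node.gene]
    return [name for child in node.children for name in gene_names(child)]


# Toy family (invented names): fly splits off first, then human and mouse.
tree = Node("speciation", children=[
    Node("leaf", gene="fly_X", species="fly"),
    Node("speciation", children=[
        Node("leaf", gene="human_X", species="human"),
        Node("leaf", gene="mouse_X", species="mouse"),
    ]),
])

mammals_only = prune(tree, {"human", "mouse"})
```

Here `mammals_only` is just the human/mouse speciation node with its two leaves; the fly branch and the now redundant root are gone.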