Thursday, 17 December 2015

5th genome of Christmas: The Fly

The humble fruit fly – Drosophila melanogaster, to be specific – has played a central role in the history of genetics and molecular biology and continues to be important in research. Championed by the legendary Thomas Hunt Morgan at the start of the 20th century, Drosophila provided a practical foundation for genetics – long before the discovery of DNA as the vehicle for passing down heritable information through generations. Morgan and colleagues developed the concepts of 'gene' and 'linkage', and so we have 'Morgans' (and more commonly, centi-Morgans, cM) as the basic units of genetic maps.

You could argue that even the modern approach to genetics and molecular biology research was formed around this creature. The fly has influenced the way laboratories choose a direction of study and the way they share materials and data internationally, which was as critical to the success of early genetics as it is now.

After this strong start, Drosophila kept its momentum during the discovery of DNA, molecular biology and early DNA cloning. Performing large-scale, 'forward genetic' screens, where (one hopes) every possible gene has been knocked out at least once so one can look for specific phenotypes, has unearthed a rich seam of genes involved in development. These days the innovation continues with, amazingly, fly-brain manipulation at a neuronal level.

You can see the footprints of Drosophila research everywhere. The playful Drosophila naming scheme allows for gene names such as “tinman” (mutant flies that don’t have a heart), “dunce” (unable to navigate simple fruit-fly mazes), and “Antennapedia” (antennae are swapped for legs), which permeate biology. The human gene “Sonic Hedgehog” is named after its “hedgehog” ortholog in fly. The “polycomb” in “polycomb repressive complex” (one of the key genome-switching mechanisms) comes from the subtle mutation that gives male flies extra sex combs (rows of specialised bristles) on their legs. Fly molecular biologists are part of a long and great tradition, and are understandably proud of their community’s impact and continuing influence.

This partly explains why fly genomicists were feeling frustrated in the late 1990s, when it became clear that the worm – usually something of a 'junior partner' in the metazoan model-organism world – was going to have its genome completed well before the fly. The fly genome project had done a good deal of groundwork: a century of research had produced excellent genetic maps, helped by a clever trick involving the salivary gland chromosomes (which, bizarrely, duplicate so much that you can see them easily under a microscope). But the project had not committed to the same step-by-step sequencing efforts that the worm community had.

And then came a golden opportunity.

Craig Venter had aligned both investors and technologists to “overtake” the public human genome project with a privately funded project led by the company Celera. To do so, he had assembled a group of scientists including the brilliant computer scientist Gene Myers, who claimed the piecemeal approach taken by the worm and human projects was not necessary. Instead, he posited that a whole-genome shotgun approach was computationally feasible (more on this in another 'Christmas genome' post). Many people didn’t believe him. Others who might have given him the benefit of the doubt found it to be too risky a strategy. Craig and team were ready to bet on it.

But they needed a test project: a genome that was not as big as human, but complex and worth doing.

So the Great and the Good of Drosophila, notably Gerry Rubin and Michael Ashburner, pitched the fly to Celera. In 1998/1999, its genome was 'shotgunned', and the result, published in 2000, became the first large, whole-genome shotgun assembly.

Although shotgun assembly and automatic (computational) annotation are now commonplace, at the time this was radical stuff. There was talk of the largest computational farm ever assembled for biology at Celera, of this whole upstart world of bioinformatics and computational biology being poised to revolutionise biology. This was the dot-com era, so at the same time people were talking about new business models, and how the internet was changing everything.

The Drosophila genome work was happening when I was just ending my PhD. I went to Celera for the Drosophila genome jamboree, and GeneWise – my insanely computationally expensive software for error-tolerant protein or protein HMM alignment – was run across the genome. I also met and chatted with Gene for a while, which was my first exposure to the guts of the assembly problem. But perhaps most of all I realised that the geeks were definitely at the top table – designing and creating the experiments, not just processing the data.