Thursday, 7 November 2013

Heterogeneity in Cancer genomics


I've just come back from a great meeting on Cancer Genomics, held at EMBL Heidelberg (full disclosure: I was an organiser, so no surprise I enjoyed the talks!).

The application of genomics to cancer has been progressing for a long time, but we are now in the era where "cheap enough" exome sequencing (and increasingly whole genome sequencing) is available for both fundamental cancer research and clinical research - and there is really a sense of starting to "mainstream" sequencing into clinical care (clinical care and clinical research seem closer in the Cancer field than in some other areas of medicine).


A cluster of breast cancer cells showing visual evidence of programmed cell death (apoptosis) in yellow. Credit: Annie Cavanagh, Wellcome Images
Before I go into more detail, just a reminder for people not used to thinking about Cancer. Cancer is really a large collection of diseases in which a group of cells in the body is growing uncontrollably. For this to happen, there are always some genomic changes in the cancer cell, and sometimes quite extensive changes (there are also quite a few knock-on changes in RNA and epigenetics, and possibly some of the initiating events are epigenetic changes, though it's pretty clear that in the majority of cancers DNA changes are the main culprit). For a cancer cell to become a concern it not only has to start dividing, but it also has to circumvent a considerable amount of both intra-cellular and immune system based monitoring of its growth, and then if it is in a tissue, it has to encourage blood vessels to grow towards it to feed it nutrients etc - basically a lot of changes and features are needed for a single cancer cell to become a tumour.

The advent of cheap(ish) genomics leads to a very simple sounding experiment: sequence cancer genomes so we have a catalog of these genomic changes. This is more technically demanding than it first looks. Firstly, by the word "changes" we mean "changes from the healthy tissue" - and of course each person is unique. So, to know the difference between the cancer genome and normal, one needs to sequence both the cancer and the healthy genome of the individual who has the cancer. The second problem is that the human genome is very big (3 billion bases), which means one has to be very accurate in the sequencing of both the cancer and the normal; even a small final error rate will cause a considerable number of sequencing artefacts. This means both the cancer genome and the healthy genome need to be sequenced at high depth to give that low error rate, and that you have to be really careful about variant calling. The third problem is that for the majority of cancers there is a mixture of normal and cancer cells in a tumour sample (the normal cells include the surrounding tissue, blood vessels that the cancer has encouraged to grow in and feed the tumour, and immune system cells trying to attack the cancer). Furthermore the cancer continues to evolve, with different cancer cells changing their genome even more (very often the DNA repair and genome stability mechanisms are damaged in cancer), so there isn't even a sense of "1" cancer genome in a tumour. The fourth problem is that the genomic changes are not just the simple changes of single bases. There are all sorts of other things, in particular wholesale movements, losses and duplications of chromosomes (something that has been recognised in cancer for some time) as well as far more "focal" medium scale amplifications or losses. As well as being challenging to "call" from sequencing data that comes in only 100-200 letter runs (these changes will be thousands to multiple millions of letters long), these changes also play havoc with calling the single base pair changes in the context of all this other stuff.
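
To put rough numbers on the second and third problems, here is a minimal back-of-the-envelope sketch in Python. It is an illustration under simplifying assumptions, not how any real variant caller works: errors are treated as independent per base, the mutation is assumed to be present in every cancer cell, and the function names, error rates, purity and depth values are all made up for the example. The only figure taken from the text above is the 3 billion base genome size.

```python
from math import comb

GENOME_SIZE = 3_000_000_000  # bases; the round figure used in the text


def expected_false_calls(per_base_error_rate, genome_size=GENOME_SIZE):
    """Expected number of wrong positions if every base errs independently."""
    return per_base_error_rate * genome_size


def expected_vaf(purity, mutant_copies=1, tumour_copies=2, normal_copies=2):
    """Expected fraction of reads carrying a somatic variant.

    purity is the fraction of cells in the sample that are cancer cells.
    Assumes the variant is in every cancer cell and in no normal cell.
    """
    mutant = purity * mutant_copies
    total = purity * tumour_copies + (1.0 - purity) * normal_copies
    return mutant / total


def prob_at_least(min_reads, depth, vaf):
    """Binomial chance of seeing at least min_reads variant reads at one site."""
    below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - below


# Hypothetical final (post-calling) error rates, chosen only for illustration.
for rate in (1e-4, 1e-6, 1e-8):
    print(f"error rate {rate:.0e}: ~{expected_false_calls(rate):,.0f} false positions")

# A hypothetical sample where only 40% of the cells are cancer cells.
vaf = expected_vaf(purity=0.4)
for depth in (30, 60, 100):
    p = prob_at_least(min_reads=5, depth=depth, vaf=vaf)
    print(f"depth {depth}: P(at least 5 variant reads) = {p:.3f}")
```

The point of the sketch is only the shape of the problem: the number of artefacts scales with genome size, and the more normal cells there are in the sample, the fewer reads carry the variant, so more depth is needed to see it reliably.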

Pretty much as soon as there was cheap sequencing people started to apply this to cancer genomics, but it has taken time to get on top of all these issues - a considerable number of these problems are in the informatics and methods, as well as in getting good samples with good DNA for sequencing. But now there really is a steady stream of cancer genome projects, each covering 100s of cancers of a particular cancer type, being reported both from the American TCGA umbrella project and the "rest of the world" ICGC umbrella project. Cancers are divided mainly by the tissue of origin (eg, Bone cancer, Breast Cancer, Colorectal Cancer) and then sometimes subdivided by features that one can see by looking at the cancer under a microscope (so called histo-pathology). Each cancer project takes a well defined cancer type and sequences a number of cancers (in the 100s at the moment). Currently we are far, far better at analysing changes to protein coding genes, so an effective approach is to focus on exomes.


Back now to the meeting. For me the major theme of the meeting was heterogeneity. There is heterogeneity between cancers - some cancers have a relatively low number of changes (like the astrocytoma study from the DKFZ and EMBL guys, presented by Peter Lichter from DKFZ) and only knock out one or two pathways, while some are just all over the place (such as a lung cancer study from TCGA, part of a tour of the TCGA pan cancer analysis from Josh Stuart). There is heterogeneity between patients - some cancers that look histologically similar have very different genomic alterations. And there is heterogeneity over time - cancers often come back (recur) and this is often due to a single rare change in the original cancer that was resistant to treatment. Elaine Mardis and Sam Aparicio presented results with the general theme of tracking cancer mutations longitudinally. And then there is the long list of "ways a cancer genome changes", with Jan Korbel presenting the work on chromothripsis (shattering) in medulloblastoma. Naz Rahman showed how germline cancer predisposition was very variable, and a surprising feature of mosaicism associated with cancer (biology can endlessly surprise one!).


This heterogeneity in cancers is both a positive and a negative for clinical use of genomic sequence. The positive is that the current low response rate to treatment of some cancers may well be a function of not choosing the right drug for the right molecular type of cancer. By "typing" the cancer better, there can be better tailored treatment. The negative is that the high heterogeneity between patients means that doing well structured trials is hard. Not only are there the challenges of just turning around the whole cancer sequencing and analysis process in time for results (something elegantly presented by Steve Jones from British Columbia Cancer Agency) but the rapid branching of options means that having a simple treat with A/treat with B randomisation scheme is hard (and the confounding between feasible treatment options for a molecular subtype and the aggressiveness of the cancer makes this really annoying). Andrew Biankin - previously from Brisbane and now in Glasgow - presented impressive work from their Australian pancreatic cancer project (some of which is published). To have a good baseline of effective treatments for which one understands the molecular components, one needs a thorough and controlled investigation of genomic lesions vs drug response. Ultan McDermott presented the systematic cancer screening work from the Sanger Institute (some of which is published). Finally there was a rather sobering presentation from Ivo Gut (CNAG, Barcelona) on the heterogeneity in sequencing itself and in somatic variant calling - it's clear that as a community we have to get tighter and understand this process better (this is a clear cut reason why we must have the ability to go back to at least the sequence level data for cancer, and probably stay that way in research for at least another 5 years).


At some level this heterogeneity is daunting - it is going to take a lot of samples with careful analysis to sort out both what is going on biologically in cancers and then how to leverage that knowledge into improving treatments. That said, this heterogeneity is not something generated by these experiments - this is how it is for cancer, and this is the task we have to collectively take on. As a bioinformatician I see the rather conceptually mundane aspect of minimising technical variance, which stresses again the importance of keeping raw sequence data available in archives such as EGA. In addition, I am concerned that it is going to be very easy to find correlations between all sorts of things in these datasets - between types of mutations and outcomes, or between types of RNA expression and changes, or structural variants. What will be far, far harder is deciding whether these changes are causal (the phrase used in this field is "cancer drivers") or whether they are the consequence of a more complex process that confounds the correlation. Peter Campbell from the Sanger Institute gave a great, detailed talk dissecting out one recurrent mutation mechanism in ALL (acute lymphoblastic leukaemia) which, if you didn't know about it, would look like a potential cancer driver.
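
To make the confounding worry concrete, here is a small worked calculation in Python. Every number in it is invented for illustration and none comes from a real dataset or from any talk at the meeting. It assumes a hidden mutational process that both makes a particular mutation more likely and independently worsens outcome. The mutation itself has no effect on outcome once the process is known, yet it still looks strongly associated with poor outcome when the process is ignored.

```python
def posterior_process(mutated, p_process, p_mut_given_process, p_mut_given_none):
    """P(hidden process is active | mutation status), by Bayes' rule."""
    like_process = p_mut_given_process if mutated else 1.0 - p_mut_given_process
    like_none = p_mut_given_none if mutated else 1.0 - p_mut_given_none
    numerator = like_process * p_process
    return numerator / (numerator + like_none * (1.0 - p_process))


def poor_outcome_rate(
    mutated,
    p_process=0.3,            # hypothetical: 30% of tumours have the process
    p_mut_given_process=0.8,  # hypothetical: process usually leaves the mutation
    p_mut_given_none=0.1,     # hypothetical: mutation is rare otherwise
    p_poor_given_process=0.6, # hypothetical: process worsens outcome
    p_poor_given_none=0.2,    # hypothetical: baseline poor-outcome rate
):
    """P(poor outcome | mutation status) when only the process affects outcome."""
    w = posterior_process(mutated, p_process, p_mut_given_process, p_mut_given_none)
    return w * p_poor_given_process + (1.0 - w) * p_poor_given_none


with_mutation = poor_outcome_rate(mutated=True)
without_mutation = poor_outcome_rate(mutated=False)
print(f"P(poor outcome | mutation)    = {with_mutation:.2f}")
print(f"P(poor outcome | no mutation) = {without_mutation:.2f}")
```

Within each group (process active or not) the poor-outcome rate is the same with or without the mutation by construction, so the apparent association comes entirely from the hidden process. Telling such a passenger apart from a true driver is the hard part.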


But the other side of the coin is that this heterogeneity means that even sorting out things in a couple of areas might have a big effect. The "Extreme Responders" shown by Andrew Biankin - people who got targeted therapies and had remarkable improvements - show some of the potential, in particular for cancers such as Pancreatic cancer where the 5 year survival rate is a depressing 2%. Even small gains in understanding might have a real impact on this number. And we're early in this game - as the sample numbers go up from 100s to 1000s (and I am sure into the 10,000s in the future) we will have more power to sort out some of this heterogeneity in all these areas. The pan cancer analysis - the first by the TCGA at the exome level, published this year (http://www.nature.com/tcga/) - and the future plans by the ICGC to have a whole genome analysis are the start of this.


I am by nature a glass half full sort of person, and optimistic about the future of cancer genomics - but realistic about the task, and the fact that this will need to draw on all the talents of many oncologists, clinical geneticists, genomicists, bioinformaticians and mechanistic basic biology researchers worldwide.