Research News



Hi, on this page (more of a blog, really) I give short summaries of some of our current research themes, in case they are of interest. Note that this page isn't updated regularly! My background is in machine learning, but over the last few years biomedical applications of machine learning have become a major draw. Firstly, the biomedical sciences are generating increasingly large datasets (think of the human genome). In addition, a wide variety of data types is often available from the same patient. This suggests the use of data integration algorithms, a particular interest of ours when we were working more on algorithm development. Secondly, of course, some of these projects are potentially very high impact.

  • An ongoing theme of our work has been the development of classifiers for predicting the functional impact of human genetic variation, i.e., whether a variant is pathogenic (a disease driver) or neutral. Some of this work concerns single nucleotide variants (for example, A->C or C->T changes in the human genome sequence), but we are also interested in predicting the pathogenic status of indels (short insertions or deletions of DNA sequence) and in related problems, such as discovering functionally significant combinations of variants involved in disease. We are further interested in developing disease-specific predictors, e.g. for cancer. Prediction in this context draws on about 30 different types of data, so we use data integration algorithms extensively. This project involves Mark Rogers (ISL), Tom Gaunt (Social Medicine) and others. It is described more fully on the Available Software tab to the left (under FATHMM-MKL and CScape), and the technical papers are on my Homepage (see Recent Journal papers). We are members of the Functional Effects GeCiP (headed by Ewan Birney, EBI) of the Genomics England (100,000 Genomes) project, where we expect to apply these methods to rare disease and cancer genomes.


  • 20% of men get prostate cancer during their lifetime and 3% die from the disease. The disease is therefore largely not life-threatening: most men die with the disease rather than from it. Treatment carries risks and should ideally be targeted at those for whom the disease would be life-threatening. Framed this way, it is a classic machine learning problem: previously catalogued genomic and clinical data are used to train an algorithm that predicts, at diagnosis, whether a tumor is aggressive or indolent. Unfortunately, prostate tumors are typically very heterogeneous, often with different genetic signatures in different regions of the same tumor, which makes the problem far less tractable than this framing suggests. In the period 2005-2009, we developed an algorithmic approach called Latent Process Decomposition (LPD); most of the relevant papers are on the Publications:Bioinformatics tab to the left. The LPD algorithm attempts to maximise the probability of the model given the data. An interesting aspect of the method is that it is a mixed membership model: a biopsy sample from a patient is represented as a combinatorial mixture over a number of underlying active states, or processes. In short, it is well suited to the analysis of prostate cancer samples, and indeed of many other cancers, because of the heterogeneity involved. For this reason we applied LPD to prostate cancer in an earlier study with Colin Cooper's group, then at the Institute of Cancer Research, London (CS Cooper, C Campbell, S Jhavar. Nature Reviews Urology 4 (12), 677-687 (2007)). Colin Cooper's group has since acquired improved data, and LPD has now given substantially improved resolution of the disease (see, e.g., national and international coverage, cancer news or the technical paper). Specifically, it has identified a 45-gene signature (the DESNT signature) which plays a core role in the ability of the tumor to progress towards an aggressive outcome.
We are working with Colin Cooper and his team to advance our understanding of this signature, using new data and modifications to the algorithmic framework.


  • With David Wraith's team (Institute of Immunology, University of Birmingham) we are interested in using machine learning to better understand Multiple Sclerosis. David's team, Apitope Ltd and collaborators have developed the peptide vaccine ATX-MS-1467. In phase 1 and 2 drug trials (by Merck Serono), this agent was found to be safe and well tolerated, and to deliver a significant reduction in disease-related activity (as measured by gadolinium-enhanced MRI scans). This ongoing project is focussed on understanding the alterations to the disease process brought about by ATX-MS-1467, as well as gaining any new insights into disease-driver mechanisms.
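The data integration behind the variant-effect predictors described in the first project above can be illustrated, very loosely, with kernels. The MKL in FATHMM-MKL stands for multiple kernel learning: each data type gets its own similarity (kernel) function, and the classifier works on a weighted combination of them. The Python sketch below is not the FATHMM-MKL implementation. It assumes a Gaussian kernel per data type and fixes the kernel weights by hand, whereas MKL proper learns them. It also replaces the support vector machine usually trained on the combined kernel with a much simpler kernel mean score. The data-type names ("conservation", "chromatin") and all feature values are hypothetical toy inputs.

```python
import math
from typing import Dict, List, Sequence

# One variant = several feature vectors, one per data type (names are hypothetical).
Variant = Dict[str, Sequence[float]]


def gaussian_kernel(x: Sequence[float], y: Sequence[float], gamma: float = 1.0) -> float:
    """Gaussian (RBF) kernel between two feature vectors from the same data type."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def combined_kernel(u: Variant, v: Variant, weights: Dict[str, float]) -> float:
    """Weighted sum of per-data-type kernels; in MKL these weights would be learned."""
    return sum(w * gaussian_kernel(u[name], v[name]) for name, w in weights.items())


def kernel_mean_score(query: Variant, pathogenic: List[Variant],
                      neutral: List[Variant], weights: Dict[str, float]) -> float:
    """Mean similarity to pathogenic examples minus mean similarity to neutral ones.

    A simple stand-in for a kernel classifier: > 0 leans pathogenic, < 0 leans neutral."""
    pos = sum(combined_kernel(query, p, weights) for p in pathogenic) / len(pathogenic)
    neg = sum(combined_kernel(query, n, weights) for n in neutral) / len(neutral)
    return pos - neg


if __name__ == "__main__":
    # Hypothetical fixed weights for two illustrative data types (they sum to 1).
    weights = {"conservation": 0.7, "chromatin": 0.3}
    pathogenic = [{"conservation": [0.9, 0.8], "chromatin": [0.6]},
                  {"conservation": [1.0, 0.9], "chromatin": [0.7]}]
    neutral = [{"conservation": [0.1, 0.2], "chromatin": [0.2]},
               {"conservation": [0.0, 0.1], "chromatin": [0.3]}]
    query = {"conservation": [0.85, 0.9], "chromatin": [0.5]}
    score = kernel_mean_score(query, pathogenic, neutral, weights)
    print("pathogenic" if score > 0 else "neutral", round(score, 3))
```

Because each data type keeps its own kernel, another source of data can be added as one more term in the weighted sum. With around 30 data types, learning those weights, rather than fixing them as here, is what lets the model decide which sources are informative.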
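The mixed membership idea behind LPD, described in the prostate cancer project above, can be made concrete with a short Python sketch. In an LPD-style model, a sample's mixing vector theta gives the share of each latent process. Each gene's expression value is then modelled as a mixture of process-specific densities weighted by theta, so different genes in the same biopsy can be explained by different processes. This is a minimal sketch under stated assumptions: Gaussian densities per gene and process, with means and standard deviations already known. It estimates theta for a single sample by plain maximum-likelihood EM, and it omits the prior over theta and the inference procedure used to fit the published LPD model. The function names and toy numbers are hypothetical.

```python
import math
from typing import List, Sequence


def normal_pdf(x: float, mean: float, sd: float) -> float:
    """Density of a univariate Gaussian."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def sample_log_likelihood(expr: Sequence[float], theta: Sequence[float],
                          mu: Sequence[Sequence[float]],
                          sd: Sequence[Sequence[float]]) -> float:
    """log p(sample | theta), where each gene is a theta-weighted mixture over processes.

    expr[g]            : expression of gene g in this sample
    theta[k]           : share of process k in this sample (non-negative, sums to 1)
    mu[g][k], sd[g][k] : assumed Gaussian parameters of gene g under process k
    """
    total = 0.0
    for g, x in enumerate(expr):
        mix = sum(t * normal_pdf(x, mu[g][k], sd[g][k]) for k, t in enumerate(theta))
        total += math.log(mix)  # a production version would work in log space
    return total


def estimate_mixture(expr: Sequence[float], mu: Sequence[Sequence[float]],
                     sd: Sequence[Sequence[float]], n_iter: int = 200) -> List[float]:
    """EM estimate of one sample's process shares, holding process parameters fixed."""
    n_proc = len(mu[0])
    theta = [1.0 / n_proc] * n_proc
    for _ in range(n_iter):
        totals = [0.0] * n_proc
        for g, x in enumerate(expr):
            weighted = [theta[k] * normal_pdf(x, mu[g][k], sd[g][k]) for k in range(n_proc)]
            norm = sum(weighted)
            for k in range(n_proc):
                totals[k] += weighted[k] / norm  # responsibility of process k for gene g
        theta = [t / len(expr) for t in totals]  # M-step: average responsibility
    return theta


if __name__ == "__main__":
    # Toy setup: 4 genes, 2 processes; process 0 centres genes at 0, process 1 at 3.
    mu = [[0.0, 3.0]] * 4
    sd = [[0.5, 0.5]] * 4
    expr = [0.1, -0.2, 2.9, 3.1]  # half the genes look like each process
    theta = estimate_mixture(expr, mu, sd)
    print([round(t, 3) for t in theta])  # expect shares close to one half each
    print(sample_log_likelihood(expr, theta, mu, sd))
```

The toy sample has half its genes near each process mean, so the estimated shares come out close to one half each. A single-membership (clustering) model would instead be forced to assign the whole sample to one process, which is why the mixed membership form suits heterogeneous tumors.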