CEGG Research Projects

Current Projects


Arthropod Genomics

Arthropods are a highly successful group of animals that constitute more than 80% of all described living animal species. CEGG has contributed to the analyses of several arthropod genomes, and continues to be involved with new projects to analyse newly sequenced insects and other arthropods.

Conserved Non-Coding Sequences

About 5% of the human genome is under purifying selection, although only 1.5% of the genome encodes proteins. Conserved non-coding sequences (CNCs) thus represent an interesting repertoire of functional elements. We are developing a pipeline to identify these CNCs comprehensively across vertebrate lineages and assess their function.


Metagenomics ...

miRNA Target Prediction

miRNAs post-transcriptionally repress the expression of protein-coding genes. Over 1000 miRNAs have been proposed in human, and together they are thought to affect the majority of messenger RNAs. However, identifying miRNA targets, and more precisely the strength of target repression, remains challenging. We are developing a comprehensive method to predict miRNA target strength.

Orthology Delineation

The concept of gene orthology (homologous genes in different species that diverged through speciation) is of interest in, e.g., functional inference. Given a gene with known function in one species, one can retrieve its orthologous group and then test whether the other genes in the group have the same function. In addition, orthology is of great interest in evolutionary studies, as it enables the study of how well conserved a given gene is across the phylogeny. We have developed a pipeline that takes the initial FASTA files, performs gene-by-gene comparisons and then clusters the results into orthologous groups. The groups are published in a publicly available database.
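The comparison-then-clustering idea can be illustrated with a minimal sketch. It is not the CEGG pipeline itself: it assumes pairwise similarity scores have already been computed, uses reciprocal best hits as a stand-in criterion for orthology, and groups the resulting pairs by single linkage. The function names and the "species|gene" identifier format are hypothetical.

```python
from collections import defaultdict


def reciprocal_best_hits(scores):
    """Return gene pairs that are each other's best hit across species.

    scores: dict mapping (gene_a, gene_b) -> similarity score, one entry per
    unordered pair. Gene ids are assumed to look like "species|gene".
    """
    best = {}  # (query gene, target species) -> (score, target gene)
    for (a, b), score in scores.items():
        for query, target in ((a, b), (b, a)):
            sp_query = query.split("|")[0]
            sp_target = target.split("|")[0]
            if sp_query == sp_target:
                continue  # skip within-species comparisons
            key = (query, sp_target)
            if key not in best or score > best[key][0]:
                best[key] = (score, target)

    pairs = set()
    for (query, _sp_target), (_score, target) in best.items():
        sp_query = query.split("|")[0]
        back = best.get((target, sp_query))
        if back is not None and back[1] == query:
            pairs.add(tuple(sorted((query, target))))
    return pairs


def cluster(pairs):
    """Single-linkage clustering of gene pairs using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for gene in parent:
        groups[find(gene)].add(gene)
    return sorted(sorted(group) for group in groups.values())
```

Calling `cluster(reciprocal_best_hits(scores))` on a score table yields a list of groups, each a sorted list of gene identifiers. Single linkage is the simplest choice and can merge groups too eagerly; real pipelines typically add stricter criteria.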

Synteny Delineation

Genome rearrangement events (such as inversion, translocation, fusion and fission), which shape genome architectures over millions of years of evolution, are essentially random; however, the outcome may be constrained by possible fitness costs associated with particular break-points. We are developing a pipeline to identify orthologous genomic regions (synteny) across various species, and this should help to distinguish the numerous random breaks from rare “forbidden breaks” that are constrained by selection.
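As an illustration of how syntenic regions and break-points relate, the sketch below finds runs of orthologous genes that keep the same relative order (forward or inverted) in two genomes. It is a simplified toy, not the pipeline under development: it assumes a single chromosome per genome and one-to-one orthologs, and the function and parameter names are hypothetical.

```python
def synteny_blocks(order_a, order_b, orthologs, min_len=2):
    """Find collinear runs of orthologous genes between two gene orders.

    order_a, order_b: lists of gene ids in chromosomal order.
    orthologs: dict mapping a genome-A gene to its genome-B ortholog (one-to-one).
    Returns a list of (genes_in_A_order, orientation), orientation "+" or "-".
    """
    b_genes = set(orthologs.values())
    # Rank genome-B genes among anchored genes only, so that genes
    # without an ortholog do not interrupt a run.
    rank_b = {g: i for i, g in enumerate(g for g in order_b if g in b_genes)}
    anchors = [
        (g, rank_b[orthologs[g]])
        for g in order_a
        if g in orthologs and orthologs[g] in rank_b
    ]

    blocks = []
    run, direction = [], 0

    def flush():
        # Record the current run if it is long enough to count as a block.
        if len(run) >= min_len:
            blocks.append(([g for g, _ in run], "-" if direction < 0 else "+"))

    for gene, rank in anchors:
        if run:
            step = rank - run[-1][1]
            if step in (1, -1) and direction in (0, step):
                direction = step
                run.append((gene, rank))
                continue
            flush()  # a break-point: the run cannot be extended
        run, direction = [(gene, rank)], 0
    flush()
    return blocks
```

Each boundary between consecutive blocks corresponds to a candidate break-point. Comparing such boundaries across many species is one way to ask whether some regions are broken less often than expected by chance.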

Systems Genetics

Systems Genetics is emerging as a new frontier that goes beyond traditional linkage and association techniques by using network concepts to uncover the cause-and-effect relationships among genes. By taking advantage of high-throughput sequencing and gene expression data, we are using this integrative framework to analyze the variation in genotypes and phenotypes from segregating populations to uncover polygenic interactions underlying complex traits.

Vertebrate Genomics

We have participated in the comparative analyses of several vertebrate genomes, including the mouse, rat, chicken and cow. We continue to employ newly sequenced genomes of additional vertebrate species to investigate properties such as the evolutionary forces shaping the size, structure and sequence of the genomes and their encoded genes.

Viral Evolution

Understanding how viruses (from the common cold to HIV) interact with their host, and how they respond and adapt in the presence of anti-viral drugs, is crucial for developing more effective and customized disease therapies. We use deep-sequencing data from clinical samples taken over time to determine the composition and evolutionary dynamics of the viral population, in turn giving clinicians the means to make more informed decisions regarding anti-viral drug therapy.

Viral Genomics

The large amount of deep-sequencing data that can be extracted from clinical samples provides the opportunity to investigate, in great detail, the genetic diversity of a viral population in a particular host or in a host population. After mapping the sequences to a viral reference genome and inferring haplotype composition, we reconstruct phylogenetic trees to infer the evolutionary paths a virus may take during the course of an infection in a single individual, or to analyse the genetic diversity of a virus by comparing samples from many individuals.

Viral Protein Structure and Function

For a virus to gain entry into a mammalian cell, the proteins on its outer coating must first interact with host cell-surface receptors. As viruses evolve quickly, certain amino acid changes in these coat proteins can facilitate host cell entry, and sometimes they make the virus more resistant to anti-viral drugs and compounds. Here we use various comparative modelling techniques and protein design strategies to better understand how small mutations can lead to big changes in viral behavior.

Published Research

cre overview

Aedes aegypti: the dengue mosquito

Comparative genomic analysis of the Aedes mosquito, the primary vector for yellow fever and dengue fever, whose ~1.38 Gbp genome is ~5-fold larger than that of the malaria vector, Anopheles gambiae.

Culex quinquefasciatus: the Southern house mosquito

Genomics and Pathogenomics of the Culex Mosquito: a blood-sucking transmitter of elephantiasis-causing worms and encephalitis-inducing viruses.

Gene Losses

Gene content shows considerable variation among eukaryotic genomes, and the availability of many complete genome sequences allowed us to consistently quantify gene losses in 5 insect and 5 vertebrate species.

Insect Genome Evolution

The sequencing of twelve insect genomes has enabled us to quantify their divergence using synteny conservation and sequence identity of single-copy orthologs.
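One of the two measures mentioned above, sequence identity of single-copy orthologs, can be sketched as follows. This is an illustrative calculation only, assuming the ortholog pairs have already been aligned with "-" as the gap character; the function names are hypothetical and the published analysis may define identity differently.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity over aligned, non-gap columns of a pairwise alignment."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(aln_a, aln_b):
        if x == "-" or y == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


def mean_identity(alignments):
    """Average percent identity over a non-empty list of (aln_a, aln_b) pairs."""
    values = [percent_identity(a, b) for a, b in alignments]
    return sum(values) / len(values)
```

Averaging over many single-copy orthologs gives one divergence value per species pair, which can then be compared with the degree of synteny conservation between the same species.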

Insect Immunity

The sequencing of the Aedes aegypti genome and the availability of the Anopheles gambiae genome provided the first opportunity to undertake a detailed comparative genomic analysis between two mosquito species, both vectors of devastating human diseases.


miROrtho: the catalogue of animal microRNA genes - access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data.

Nasonia vitripennis: the parasitoid wasp

Comparative analysis of Nasonia genomes reveals many features that could be useful to pest control and medicine, and to enhance our understanding of genetics and evolution.

Pediculus humanus: the human body louse

Comparative analysis of the Pediculus genome offers new insights into the intriguing biology of this disease-vector insect.

Picornavirus Genomics

Phylogeny and genomic features of human rhinoviruses, cis-acting replication elements that define human enterovirus and rhinovirus species, and genotyping of circulating picornaviruses.

Tribolium castaneum: the red flour beetle

Comparative genomic analysis of the Tribolium beetle, a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products.