Here you will find an archive of publications by our group members.
Social insect genomes: dramatic evolution in gene composition & regulation, preserving regulatory features linked to sociality
Simola DF, Wissler L, Donahue G, Waterhouse RM, Helmkampf M, Roux J, Nygaard S, Glastad K, Hagen DE, Viljakainen L, Reese JT, Hunt BG, Graur D, Elhaik E, Kriventseva E, Wen J, Parker BJ, Cash E, Privman E, Childers CP, Munoz-Torres MC, Boomsma JJ, Bornberg-Bauer E, Currie C, Elsik CG, Suen G, Goodisman MA, Keller L, Liebig J, Rawls A, Reinberg D, Smith CD, Smith CR, Tsutsui N, Wurm Y, Zdobnov EM, Berger SL, Gadau J. Genome Research PMID: 23636946
Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ~4,000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared to Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of non-coding regulatory elements; however, extant conserved regions are enriched for novel non-coding RNAs and transcription factor binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., CREB) and trans (e.g., Forkhead) for nearly 2000 genes, many of which exhibit phenotypic plasticity.
Our results emphasize that genomic changes can occur remarkably fast in ants, as two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared to other ants or Drosophila. Thus, while the "socio-genomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.
OrthoDB: a hierarchical catalog of animal, fungal and bacterial orthologs
Waterhouse RM, Tegenfeldt F, Li J, Zdobnov EM, Kriventseva EV. Nucleic Acids Res. PMID: 23180791
The concept of orthology provides a foundation for formulating hypotheses on gene and genome evolution, and thus forms the cornerstone of comparative genomics, phylogenomics and metagenomics. We present the update of OrthoDB, the hierarchical catalog of orthologs (http://www.orthodb.org). From its conception, OrthoDB promoted delineation of orthologs at varying resolution by explicitly referring to the hierarchy of species radiations, now also adopted by other resources. The current release provides comprehensive coverage of animals and fungi representing 252 eukaryotic species, and is now extended to prokaryotes with the inclusion of 1115 bacteria. Functional annotations of orthologous groups are provided through mapping to InterPro, GO, OMIM and model organism phenotypes, with cross-references to major resources including UniProt, NCBI and FlyBase. Uniquely, OrthoDB provides computed evolutionary traits of orthologs, such as gene duplicability and loss profiles, divergence rates, sibling groups, and now extended with exon-intron architectures, syntenic orthologs and parent-child trees. The interactive web interface allows navigation along the species phylogenies, complex queries with various identifiers, annotation keywords and phrases, as well as with gene copy-number profiles and sequence homology searches. With the explosive growth of available data, OrthoDB also provides mapping of newly sequenced genomes and transcriptomes to the current orthologous groups.
Identification of Site-Specific Adaptations Conferring Increased Neural Cell Tropism during Human Enterovirus 71 Infection.
Cordey S, Petty TJ, Schibler M, Martinez Y, Gerlach D, van Belle S, Turin L, Zdobnov EM, Kaiser L, Tapparel C. PLoS Pathog PMID: 22910880
Enterovirus 71 (EV71) is one of the most virulent enteroviruses, but the specific molecular features that enhance its ability to disseminate in humans remain unknown. We analyzed the genomic features of EV71 in an immunocompromised host with disseminated disease according to the different sites of infection. Comparison of five full-length genomes sequenced directly from respiratory, gastrointestinal, nervous system, and blood specimens revealed three nucleotide changes that occurred within a five-day period: a non-conservative amino acid change in VP1 located within the BC loop (L97R), a region considered as an immunogenic site and possibly important in poliovirus host adaptation; a conservative amino acid substitution in protein 2B (A38V); and a silent mutation in protein 3D (L175). Infectious clones were constructed using both BrCr (lineage A) and the clinical strain (lineage C) backgrounds containing either one or both non-synonymous mutations. In vitro cell tropism and competition assays revealed that the VP1(97) Leu to Arg substitution within the BC loop conferred a replicative advantage in SH-SY5Y cells of neuroblastoma origin. Interestingly, this mutation was frequently associated in vitro with a second non-conservative mutation (E167G or E167A) in the VP1 EF loop in neuroblastoma cells. Comparative models of these EV71 VP1 variants were built to determine how the substitutions might affect VP1 structure and/or interactions with host cells and suggest that, while no significant structural changes were observed, the substitutions may alter interactions with host cell receptors.
Taken together, our results show that the VP1 BC loop region of EV71 plays a critical role in cell tropism independent of EV71 lineage and, thus, may have contributed to dissemination and neurotropism in the immunocompromised patient.
Structural basis of transcriptional gene silencing mediated by Arabidopsis MOM1.
Nishimura T, Molinard G, Petty TJ, Broger L, Gabus C, Halazonetis TD, Thore S, Paszkowski J. PLoS Genet PMID: 22346760
Shifts between epigenetic states of transcriptional activity are typically correlated with changes in epigenetic marks. However, exceptions to this rule suggest the existence of additional, as yet uncharacterized, layers of epigenetic regulation. MOM1, a protein of 2,001 amino acids that acts as a transcriptional silencer, represents such an exception. Here we define the 82 amino acid domain called CMM2 (Conserved MOM1 Motif 2) as a minimal MOM1 fragment capable of transcriptional regulation. As determined by X-ray crystallography, this motif folds into an unusual hendecad-based coiled-coil. Structure-based mutagenesis followed by transgenic complementation tests in plants demonstrate that CMM2 and its dimerization are effective for transcriptional suppression at chromosomal loci co-regulated by MOM1 and the siRNA pathway but not at loci controlled by MOM1 in an siRNA-independent fashion. These results reveal a surprising separation of epigenetic activities that enable the single, large MOM1 protein to coordinate cooperating mechanisms of epigenetic regulation.
A remarkably stable TipE gene cluster: evolution of insect Para sodium channel auxiliary subunits
Li J, Waterhouse RM and Zdobnov EM. BMC Evolutionary Biology 2011, 11:337 (18 November 2011) PMID: 22098672
Background
First identified in fruit flies with temperature-sensitive paralysis phenotypes, the Drosophila melanogaster TipE locus encodes four voltage-gated sodium (NaV) channel auxiliary subunits. This cluster of TipE-like genes on chromosome 3L, and a fifth family member on chromosome 3R, are important for the optimal expression and functionality of the Para NaV channel but appear quite distinct from auxiliary subunits in vertebrates. Here, we exploited available arthropod genomic resources to trace the origin of TipE-like genes by mapping their evolutionary histories and examining their genomic architectures.
Results
We identified a remarkably conserved synteny block of TipE-like orthologues with well-maintained local gene arrangements from 21 insect species. Homologues in the water flea, Daphnia pulex, suggest an ancestral pancrustacean repertoire of four TipE-like genes; a subsequent gene duplication may have generated functional redundancy allowing gene losses in the silk moth and mosquitoes. Intronic nesting of the insect TipE gene cluster probably occurred following the divergence from crustaceans, but in the flour beetle and silk moth genomes the clusters apparently escaped from nesting. Across Pancrustacea, TipE gene family members have experienced intronic nesting, escape from nesting, retrotransposition, translocation, and gene loss events while generally maintaining their local gene neighbourhoods. D. melanogaster TipE-like genes exhibit coordinated spatial and temporal regulation of expression distinct from their host gene but well-correlated with their regulatory target, the Para NaV channel, suggesting that functional constraints may preserve the TipE gene cluster. We identified homology between TipE-like NaV channel regulators and vertebrate Slo-beta auxiliary subunits of big-conductance calcium-activated potassium (BKCa) channels, which suggests that ion channel regulatory partners have evolved distinct lineage-specific characteristics.
Conclusions
TipE-like genes form a remarkably conserved genomic cluster across all examined insect genomes. This study reveals likely structural and functional constraints on the genomic evolution of insect TipE gene family members maintained in synteny over hundreds of millions of years of evolution. The likely common origin of these NaV channel regulators with BKCa auxiliary subunits highlights the evolutionary plasticity of ion channel regulatory mechanisms.
Rhinovirus genome variation during chronic upper and lower respiratory tract infections.
Tapparel C, Cordey S, Junier T, Farinelli L, Van Belle S, Soccal PM, Aubert JD, Zdobnov EM, Kaiser L. PLoS One PMID: 21713005
Routine screening of lung transplant recipients and hospital patients for respiratory virus infections allowed the identification of human rhinovirus (HRV) in the upper and lower respiratory tracts, including in immunocompromised hosts chronically infected with the same strain over weeks or months. Phylogenetic analysis of 144 HRV-positive samples showed no apparent correlation between a given viral genotype or species and their ability to invade the lower respiratory tract or lead to protracted infection. By contrast, protracted infections were found almost exclusively in immunocompromised patients, thus suggesting that host factors, in particular the immune response, rather than the virus genotype modulate disease outcome. Complete genome sequencing of five chronic cases to study rhinovirus genome adaptation showed that the calculated mutation frequency was in the range observed during acute human infections. Analysis of mutation hot spot regions between specimens collected at different times or in different body sites revealed that non-synonymous changes were mostly concentrated in the viral capsid genes VP1, VP2 and VP3, independent of the HRV type. In an immunosuppressed lung transplant recipient infected with the same HRV strain for more than two years, both classical and ultra-deep sequencing of samples collected at different time points in the upper and lower respiratory tracts showed that these virus populations were phylogenetically indistinguishable over the course of infection, except for the last month. Specific signatures were found in the last two lower respiratory tract populations, including changes in the 5'UTR polypyrimidine tract and the VP2 immunogenic site 2.
These results highlight for the first time the ability of a given rhinovirus to evolve in the course of a natural infection in immunocompromised patients and complement data obtained from previous experimental inoculation studies in immunocompetent volunteers.
An induced fit mechanism regulates p53 DNA binding kinetics to confer sequence specificity.
Petty TJ, Emamzadah S, Costantino L, Petkova I, Stavridi ES, Saven JG, Vauthey E, Halazonetis TD. EMBO J. 2011 Jun 1;30(11):2167-76. Epub 2011 Apr 26. PMID: 21522129
The p53 tumour suppressor gene, the most frequently mutated gene in human cancer, encodes a transcription factor that contains sequence-specific DNA binding and homo-tetramerization domains. Interestingly, the affinities of p53 for specific and non-specific DNA sites differ by only one order of magnitude, making it hard to understand how this protein recognizes its specific DNA targets in vivo. We describe here the structure of a p53 polypeptide containing both the DNA binding and oligomerization domains in complex with DNA. The structure reveals that sequence-specific DNA binding proceeds via an induced fit mechanism that involves a conformational switch in loop L1 of the p53 DNA binding domain. Analysis of loop L1 mutants demonstrated that the conformational switch allows DNA binding off-rates to be regulated independently of affinities. These results may explain the universal prevalence of conformational switching in sequence-specific DNA binding proteins and suggest that proteins like p53 rely more on differences in binding off-rates than on differences in affinities to recognize their specific DNA sites.
Loss of Dicer in Sertoli cells has a major impact on the testicular proteome of mice.
Papaioannou MD, Lagarrigue M, Vejnar CE, Rolland AD, Kühne F, Aubry F, Schaad O, Fort A, Descombes P, Neerman-Arbez M, Guillou F, Zdobnov EM, Pineau C, Nef S. Molecular & Cellular Proteomics PMID: 20467044
Sertoli cells (SCs) are the central, essential coordinators of spermatogenesis, without which germ cell development cannot occur. We previously showed that Dicer, an RNaseIII endonuclease required for microRNA (miRNA) biogenesis, is absolutely essential for Sertoli cells to mature, survive, and ultimately sustain germ cell development. Here, using isotope-coded protein labeling, a technique for protein relative quantification by mass spectrometry, we investigated the impact of Sertoli cell-Dicer and subsequent miRNA loss on the testicular proteome. We found that a large proportion of proteins (50 out of 130) are up-regulated by more than 1.3-fold in testes lacking Sertoli cell-Dicer, yet that this protein up-regulation is mild, never exceeding a 2-fold change, and is not preceded by alterations of the corresponding mRNAs. Of note, the expression levels of six proteins of interest were further validated using the Absolute Quantification (AQUA) peptide technology. Furthermore, through 3'UTR luciferase assays we identified one up-regulated protein, SOD-1, a Cu/Zn superoxide dismutase whose overexpression has been linked to enhanced cell death through apoptosis, as a likely direct target of three Sertoli cell-expressed miRNAs, miR-125a-3p, miR-872 and miR-24. Altogether, our study, which is one of the few in vivo analyses of miRNA effects on protein output, suggests that, at least in our system, miRNAs play a significant role in translation control.
Silencing of c-Fos expression by microRNA-155 is critical for dendritic cell maturation and function.
Dunand-Sauthier I, Santiago-Raber ML, Capponi L, Vejnar CE, Schaad O, Irla M, Seguín-Estévez Q, Descombes P, Zdobnov EM, Acha-Orbea H, Reith W. Blood PMID: 21385848
MicroRNAs (miRNAs) are small, noncoding RNAs that regulate target mRNAs by binding to their 3' untranslated regions. There is growing evidence that microRNA-155 (miR155) modulates gene expression in various cell types of the immune system and is a prominent player in the regulation of innate and adaptive immune responses. To define the role of miR155 in dendritic cells (DCs) we performed a detailed analysis of its expression and function in human and mouse DCs. A strong increase in miR155 expression was found to be a general and evolutionarily conserved feature associated with the activation of DCs by diverse maturation stimuli in all DC subtypes tested. Analysis of miR155-deficient DCs demonstrated that miR155 induction is required for efficient DC maturation and is critical for the ability of DCs to promote antigen-specific T-cell activation. Expression-profiling studies performed with miR155(-/-) DCs and DCs overexpressing miR155, combined with functional assays, revealed that the mRNA encoding the transcription factor c-Fos is a direct target of miR155. Finally, all of the phenotypic and functional defects exhibited by miR155(-/-) DCs could be reproduced by deregulated c-Fos expression. These results indicate that silencing of c-Fos expression by miR155 is a conserved process that is required for DC maturation and function.
The ecoresponsive genome of Daphnia pulex
Colbourne JK, Pfrender ME, Gilbert D, Thomas WK, Tucker A, Oakley TH, Tokishita S, Aerts A, Arnold GJ, Basu MK, Bauer DJ, Cáceres CE, Carmel L, Casola C, Choi JH, Detter JC, Dong Q, Dusheyko S, Eads BD, Fröhlich T, Geiler-Samerotte KA, Gerlach D, Hatcher P, Jogdeo S, Krijgsveld J, Kriventseva EV, Kültz D, Laforsch C, Lindquist E, Lopez J, Manak JR, Muller J, Pangilinan J, Patwardhan RP, Pitluck S, Pritham EJ, Rechtsteiner A, Rho M, Rogozin IB, Sakarya O, Salamov A, Schaack S, Shapiro H, Shiga Y, Skalitzky C, Smith Z, Souvorov A, Sung W, Tang Z, Tsuchiya D, Tu H, Vos H, Wang M, Wolf YI, Yamagata H, Yamada T, Ye Y, Shaw JR, Andrews J, Crease TJ, Tang H, Lucas SM, Robertson HM, Bork P, Koonin EV, Zdobnov EM, Grigoriev IV, Lynch M, Boore JL. Science PMID: 21292972
We describe the draft genome of the microcrustacean Daphnia pulex, which is only 200 megabases and contains at least 30,907 genes. The high gene count is a consequence of an elevated rate of gene duplication resulting in tandem gene clusters. More than a third of Daphnia's genes have no detectable homologs in any other available proteome, and the most amplified gene families are specific to the Daphnia lineage. The coexpansion of gene families interacting within metabolic pathways suggests that the maintenance of duplicated genes is not random, and the analysis of gene expression under different environmental conditions reveals that numerous paralogs acquire divergent expression patterns soon after duplication. Daphnia-specific genes, including many additional loci within sequenced regions that are otherwise devoid of annotations, are the most responsive genes to ecological challenges.
OrthoDB: the hierarchical catalog of eukaryotic orthologs in 2011.
Waterhouse RM, Zdobnov EM, Tegenfeldt F, Li J, Kriventseva EV. Nucleic Acids Res. PMID: 20972218
The concept of homology drives speculation on a gene's function in any given species when its biological roles in other species are characterized. With reference to a specific species radiation, homologous relations define orthologs, i.e. descendants from a single gene of the ancestor. The large-scale delineation of gene genealogies is a challenging task, and the numerous approaches to the problem reflect the importance of the concept of orthology as a cornerstone for comparative studies. Here, we present the updated OrthoDB catalog of eukaryotic orthologs delineated at each radiation of the species phylogeny in an explicitly hierarchical manner of over 100 species of vertebrates, arthropods and fungi (including the metazoa level). New database features include functional annotations, and quantification of evolutionary divergence and relations among orthologous groups. The interface features extended phyletic profile querying and enhanced text-based searches. The ever-increasing sampling of sequenced eukaryotic genomes brings a clearer account of the majority of gene genealogies that will facilitate informed hypotheses of gene function in newly sequenced genomes. Furthermore, uniform analysis across lineages as different as vertebrates, arthropods and fungi with divergence levels varying from several to hundreds of millions of years will provide essential data for uncovering and quantifying long-term trends of gene evolution. OrthoDB is freely accessible from http://cegg.unige.ch/orthodb.
Correlating traits of gene retention, sequence divergence, duplicability and essentiality in vertebrates, arthropods, and fungi.
Waterhouse RM, Zdobnov EM, Kriventseva EV. Genome Biol Evol PMID: 21148284
Delineating ancestral gene relations among a large set of sequenced eukaryotic genomes allowed us to rigorously examine links between evolutionary and functional traits. We classified 86% of over 1.36 million protein-coding genes from 40 vertebrates, 23 arthropods, and 32 fungi into orthologous groups, and linked over 90% of them to Gene Ontology or InterPro annotations. Quantifying properties of ortholog phyletic retention, copy-number variation, and sequence conservation, we examined correlations with gene essentiality and functional traits. More than half of vertebrate, arthropod, and fungal orthologs are universally present across each lineage. These universal orthologs are preferentially distributed in groups with almost all single-copy or all multi-copy genes, and sequence evolution of the predominantly single-copy orthologous groups is markedly more constrained. Essential genes from representative model organisms, Mus musculus, Drosophila melanogaster, and Saccharomyces cerevisiae, are significantly enriched in universal orthologs within each lineage and essential-gene-containing groups consistently exhibit greater sequence conservation than those without. This study of eukaryotic gene repertoire evolution identifies shared fundamental principles and highlights lineage-specific features; it also confirms that essential genes are highly retained and conclusively supports the 'knockout-rate prediction' of stronger constraints on essential gene sequence evolution. However, the distinction between sequence conservation of single- versus multi-copy orthologs is quantitatively more prominent than between orthologous groups with and without essential genes.
The previously under-appreciated difference in the tolerance of gene duplications and contrasting evolutionary modes of "single-copy control" versus "multi-copy license" may reflect a major evolutionary mechanism that allows extended exploration of gene sequence space.
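As an illustration of the phyletic-retention and copy-number categories described in this abstract, here is a minimal Python sketch. It is not the authors' pipeline: the group names, species names and counts are invented toy data, and the strict thresholds ("all" rather than the abstract's "almost all") are a simplifying assumption.

```python
from typing import Dict

# Hypothetical toy data: orthologous group -> {species: gene copy count}.
# All names and numbers are invented for illustration only.
GROUPS: Dict[str, Dict[str, int]] = {
    "OG1": {"sp_a": 1, "sp_b": 1, "sp_c": 1},
    "OG2": {"sp_a": 2, "sp_b": 3, "sp_c": 2},
    "OG3": {"sp_a": 1, "sp_b": 0, "sp_c": 1},
    "OG4": {"sp_a": 1, "sp_b": 2, "sp_c": 1},
}


def classify(profile: Dict[str, int]) -> str:
    """Label one group by phyletic retention and copy number."""
    counts = list(profile.values())
    if any(c == 0 for c in counts):
        # Missing from at least one species: not universally present.
        return "non-universal"
    if all(c == 1 for c in counts):
        return "universal single-copy"
    if all(c > 1 for c in counts):
        return "universal multi-copy"
    return "universal mixed-copy"


def classify_all(groups: Dict[str, Dict[str, int]]) -> Dict[str, str]:
    """Apply classify() to every orthologous group."""
    return {name: classify(profile) for name, profile in groups.items()}


if __name__ == "__main__":
    for name, label in classify_all(GROUPS).items():
        print(name, label)
```

A real analysis would relax the thresholds to tolerate a few exceptions per group and would work from an actual ortholog catalog rather than a hand-written dictionary.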
Pathogenomics of Culex quinquefasciatus and meta-analysis of infection responses to diverse pathogens
Bartholomay LC, Waterhouse RM, Mayhew GF, Campbell CL, Michel K, Zou Z, Ramirez JL, Das S, Alvarez K, Arensburger P, Bryant B, Chapman SB, Dong Y, Erickson SM, Karunaratne SH, Kokoza V, Kodira CD, Pignatelli P, Shin SW, Vanlandingham DL, Atkinson PW, Birren B, Christophides GK, Clem RJ, Hemingway J, Higgs S, Megy K, Ranson H, Zdobnov EM, Raikhel AS, Christensen BM, Dimopoulos G, Muskavitch MA. Science PMID: 20929811
The mosquito Culex quinquefasciatus poses a substantial threat to human and veterinary health as a primary vector of West Nile virus (WNV), the filarial worm Wuchereria bancrofti, and an avian malaria parasite. Comparative phylogenomics revealed an expanded canonical C. quinquefasciatus immune gene repertoire compared with those of Aedes aegypti and Anopheles gambiae. Transcriptomic analysis of C. quinquefasciatus genes responsive to WNV, W. bancrofti, and non-native bacteria facilitated an unprecedented meta-analysis of 25 vector-pathogen interactions involving arboviruses, filarial worms, bacteria, and malaria parasites, revealing common and distinct responses to these pathogen types in three mosquito genera. Our findings provide support for the hypothesis that mosquito-borne pathogens have evolved to evade innate immune responses in three vector mosquito species of major medical importance.
Sequencing of Culex quinquefasciatus establishes a platform for mosquito comparative genomics
Arensburger P, Megy K, Waterhouse RM, Abrudan J, Amedeo P, Antelo B, Bartholomay L, Bidwell S, Caler E, Camara F, Campbell CL, Campbell KS, Casola C, Castro MT, Chandramouliswaran I, Chapman SB, Christley S, Costas J, Eisenstadt E, Feschotte C, Fraser-Liggett C, Guigo R, Haas B, Hammond M, Hansson BS, Hemingway J, Hill SR, Howarth C, Ignell R, Kennedy RC, Kodira CD, Lobo NF, Mao C, Mayhew G, Michel K, Mori A, Liu N, Naveira H, Nene V, Nguyen N, Pearson MD, Pritham EJ, Puiu D, Qi Y, Ranson H, Ribeiro JM, Robertson HM, Severson DW, Shumway M, Stanke M, Strausberg RL, Sun C, Sutton G, Tu ZJ, Tubio JM, Unger MF, Vanlandingham DL, Vilella AJ, White O, White JR, Wondji CS, Wortman J, Zdobnov EM, Birren B, Christensen BM, Collins FH, Cornel A, Dimopoulos G, Hannick LI, Higgs S, Lanzaro GC, Lawson D, Lee NH, Muskavitch MA, Raikhel AS, Atkinson PW. Science PMID: 20929810
Culex quinquefasciatus (the southern house mosquito) is an important mosquito vector of viruses such as West Nile virus and St. Louis encephalitis virus, as well as of nematodes that cause lymphatic filariasis. C. quinquefasciatus is one species within the Culex pipiens species complex and can be found throughout tropical and temperate climates of the world. The ability of C. quinquefasciatus to take blood meals from birds, livestock, and humans contributes to its ability to vector pathogens between species. Here, we describe the genomic sequence of C. quinquefasciatus: its repertoire of 18,883 protein-coding genes is 22% larger than that of Aedes aegypti and 52% larger than that of Anopheles gambiae, with multiple gene-family expansions, including olfactory and gustatory receptors, salivary gland genes, and genes associated with xenobiotic detoxification.
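The percentages quoted in this abstract imply approximate gene counts for the two comparison species. The short Python sketch below works through that arithmetic. Only the 18,883 figure and the two percentages come from the abstract; the function name is illustrative, and the results are back-calculated estimates from rounded percentages, not the published counts.

```python
CULEX_GENES = 18_883  # protein-coding genes, as stated in the abstract


def implied_count(reference: int, percent_larger: float) -> float:
    """Gene count of the species that `reference` exceeds by `percent_larger` %."""
    return reference / (1 + percent_larger / 100)


# Back-calculated estimates from the rounded percentages in the abstract.
aedes_estimate = implied_count(CULEX_GENES, 22)
anopheles_estimate = implied_count(CULEX_GENES, 52)

if __name__ == "__main__":
    print(f"Aedes aegypti (implied):     ~{aedes_estimate:,.0f} genes")
    print(f"Anopheles gambiae (implied): ~{anopheles_estimate:,.0f} genes")
```

The estimates come out at roughly 15,500 and 12,400 genes, respectively.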
Sequence-structure-function relations of the mosquito leucine-rich repeat immune proteins
Waterhouse RM, Povelones M, Christophides GK. BMC Genomics PMID: 20920294
Background
The discovery and characterisation of factors governing innate immune responses in insects has driven the elucidation of many immune system components in mammals and other organisms. Focusing on the immune system responses of the malaria mosquito, Anopheles gambiae, has uncovered an array of components and mechanisms involved in defence against pathogen infections. Two of these immune factors are LRIM1 and APL1C, which are leucine-rich repeat (LRR) containing proteins that activate complement-like defence responses against malaria parasites. In addition to their LRR domains, these leucine-rich repeat immune (LRIM) proteins share several structural features including signal peptides, patterns of cysteine residues, and coiled-coil domains.
Results
The identification and characterisation of genes related to LRIM1 and APL1C revealed putatively novel innate immune factors and furthered the understanding of their likely molecular functions. Genomic scans using the shared features of LRIM1 and APL1C identified more than 20 LRIM-like genes exhibiting all or most of their sequence features in each of three disease-vector mosquitoes with sequenced genomes: An. gambiae, Aedes aegypti, and Culex quinquefasciatus. Comparative sequence analyses revealed that this family of mosquito LRIM-like genes is characterised by a variable number of 6 to 14 LRRs of different lengths. The "Long" LRIM subfamily, with 10 or more LRRs, and the "Short" LRIMs, with 6 or 7 LRRs, also share the signal peptide, cysteine residue patterning, and coiled-coil sequence features of LRIM1 and APL1C. The "TM" LRIMs have a predicted C-terminal transmembrane region, and the "Coil-less" LRIMs exhibit the characteristic LRIM sequence signatures but lack the C-terminal coiled-coil domains.
Conclusions
The evolutionary plasticity of the LRIM LRR domains may provide templates for diverse recognition properties, while their coiled-coil domains could be involved in the formation of LRIM protein complexes or mediate interactions with other immune proteins. The conserved LRIM cysteine residue patterns are likely to be important for structural fold stability and the formation of protein complexes. These sequence-structure-function relations of mosquito LRIMs will serve to guide the experimental elucidation of their molecular roles in mosquito immunity.
X-ray diffraction analysis of the CMM2 region of the Arabidopsis thaliana Morpheus' molecule 1 protein.
Petty TJ, Nishimura T, Emamzadah S, Gabus C, Paszkowski J, Halazonetis TD, Thore S. Acta Crystallogr Sect F Struct Biol Cryst Commun. PMID: 20693667
Of the known epigenetic control regulators found in plants, the Morpheus' molecule 1 (MOM1) protein is atypical in that the deletion of MOM1 does not affect the level of epigenetic marks controlling the transcriptional status of the genome. A short 197-amino-acid fragment of the MOM1 protein sequence can complement MOM1 deletion when coupled to a nuclear localization signal, suggesting that this region contains a functional domain that compensates for the loss of the full-length protein. Numerous constructs centred on the highly conserved MOM1 motif 2 (CMM2) present in these 197 residues have been generated and expressed in Escherichia coli. Following purification and crystallization screening, diamond-shaped single crystals were obtained that diffracted to approximately 3.2 Å resolution. They belonged to the trigonal space group P3(1)21 (or P3(2)21), with unit-cell parameters a = 85.64, c = 292.74 Å. Structure determination is ongoing.
Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle
Kirkness EF, Haas BJ, Sun W, Braig HR, Perotti MA, Clark JM, Lee SH, Robertson HM, Kennedy RC, Elhaik E, Gerlach D, Kriventseva EV, Elsik CG, Graur D, Hill CA, Veenstra JA, Walenz B, Tubío JM, Ribeiro JM, Rozas J, Johnston JS, Reese JT, Popadic A, Tojo M, Raoult D, Reed DL, Tomoyasu Y, Krause E, Mittapalli O, Margam VM, Li HM, Meyer JM, Johnson RM, Romero-Severson J, Vanzee JP, Alvarez-Ponce D, Vieira FG, Aguadé M, Guirao-Rico S, Anzola JM, Yoon KS, Strycharz JP, Unger MF, Christley S, Lobo NF, Seufferheld MJ, Wang N, Dasch GA, Struchiner CJ, Madey G, Hannick LI, Bidwell S, Joardar V, Caler E, Shao R, Barker SC, Cameron S, Bruggner RV, Regier A, Johnson J, Viswanathan L, Utterback TR, Sutton GG, Lawson D, Waterhouse RM, Venter JC, Strausberg RL, Berenbaum MR, Collins FH, Zdobnov EM, Pittendrigh BR. Proc Natl Acad Sci U S A. 2010 Jun 21. [Epub ahead of print] PMID: 20566863
As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes.
The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes less than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.
The Newick Utilities: High-throughput Phylogenetic tree Processing in the UNIX Shell Junier T, Zdobnov EM Bioinformatics. 2010 May 13 PMID: 20472542 Summary: We present a suite of UNIX shell programs for processing any number of phylogenetic trees of any size. They perform frequently-used tree operations without requiring user interaction. They also allow tree drawing as scalable vector graphics (SVG), suitable for high-quality presentations and further editing, and as ASCII graphics for command-line inspection. As an example we include an implementation of bootscanning, a procedure for finding recombination breakpoints in viral genomes.
Availability: C source code, Python bindings, and executables for various platforms are available from http://cegg.unige.ch/newick_utils. The distribution includes a manual and example data. The package is distributed under the BSD License.
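As a rough illustration of the kind of non-interactive tree operation the summary describes, the sketch below lists the leaf labels of trees given in Newick format. It is written in Python for brevity and is not the Newick Utilities code; the function name `leaf_labels` is hypothetical, and the sketch assumes unquoted labels without embedded parentheses, commas, colons or whitespace.

```python
import re
from typing import List

# Assumption: a leaf label is an unquoted token that directly follows
# "(" or ",". Labels that follow ")" belong to inner nodes and are skipped.
LEAF_PATTERN = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_labels(newick: str) -> List[str]:
    """Return the leaf labels of one Newick tree in left-to-right order."""
    return [match.group(1) for match in LEAF_PATTERN.finditer(newick)]


def leaf_labels_per_tree(text: str) -> List[List[str]]:
    """Process any number of trees, one Newick string per non-empty line."""
    return [leaf_labels(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    example = "((A:0.1,B:0.2)n1:0.3,(C:0.1,D:0.4)n2:0.5);\n(X,(Y,Z));"
    for labels in leaf_labels_per_tree(example):
        print(" ".join(labels))
```

Reading one tree per line keeps the function usable in a shell pipeline, which is the style of use the summary emphasizes.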
Rhinovirus Genome Evolution during Experimental Human Infection Cordey S, Junier T, Gerlach D, Gobbini F, Farinelli L, Zdobnov EM, Winther B, Tapparel C, Kaiser L PLoS One. 2010 May 11;5(5):e10588 PMID: 20485673 Human rhinoviruses (HRVs) evolve rapidly due in part to their error-prone RNA polymerase. Knowledge of the diversity of HRV populations emerging during the course of a natural infection is essential and represents a basis for the design of future potential vaccines and antiviral drugs. To evaluate HRV evolution in humans, nasal wash samples were collected daily for five days from 15 immunocompetent volunteers experimentally infected with a reference stock of HRV-39. In parallel, HeLa-OH cells were inoculated to compare HRV evolution in vitro. In vivo, nasal wash samples assessed by real-time PCR showed a viral load that peaked at 48-72 h. Ultra-deep sequencing was used to compare the low-frequency mutation populations present in the HRV-39 inoculum in two human subjects and one HeLa-OH supernatant collected 5 days post-infection. The analysis revealed hypervariable mutation locations in VP2, VP3, VP1, 2C and 3C genes and conserved regions in VP4, 2A, 2B, 3A, 3B and 3D genes. These results were confirmed by classical sequencing of additional samples, both from inoculated volunteers and independent cell infections, and suggest that HRV inter-host transmission is not associated with a strong bottleneck effect. A specific analysis of the VP1 capsid gene of 15 human cases confirmed the high mutation incidence in this capsid region, but not in the antiviral drug-binding pocket. We could also estimate an in vivo mutation frequency of 3.4 × 10⁻⁴ and 3.1 × 10⁻⁴ mutations per nucleotide over the entire ORF and the VP1 gene, respectively. In vivo, HRVs generate new variants rapidly during the course of an acute infection due to mutations that accumulate in hot spot regions located at the capsid level, as well as in 2C and 3C genes.
A Teratocarcinoma-Like Human Embryonic Stem Cell (hESC) Line and Four hESC Lines Reveal Potentially Oncogenic Genomic Changes Hovatta O, Jaconi M, Töhönen V, Béna F, Gimelli S, Bosman A, Holm F, Wyder S, Zdobnov EM, Irion O, Andrews PW, Antonarakis SE, Zucchelli M, Kere J, Feki A PLoS ONE 5(4): e10263 PMID: 20428235 The first Swiss human embryonic stem cell (hESC) line, CH-ES1, has shown features of a malignant cell line. It originated from the only single blastomere that survived cryopreservation of an embryo, and it more closely resembles teratocarcinoma lines than other hESC lines with respect to its abnormal karyotype and its formation of invasive tumors when injected into SCID mice. The aim of this study was to characterize the molecular basis of the oncogenicity of CH-ES1 cells. We looked for abnormal chromosomal copy number (by array Comparative Genomic Hybridization, aCGH) and single nucleotide polymorphisms (SNPs). To see how unique these changes were, we compared these results to data collected from the 2102Ep teratocarcinoma line and four hESC lines (H1, HS293, HS401 and SIVF-02) which displayed normal G-banding results. We identified genomic gains and losses in CH-ES1, including gains in areas containing several oncogenes. These features are similar to those observed in teratocarcinomas, and this explains the high malignancy. The CH-ES1 line was trisomic for chromosomes 1, 9, 12, 17, 19, 20 and X. The karyotypically (based on G-banding) normal hESC lines were also found to have several genomic changes that involved genes with known roles in cancer. The largest changes were found in the H1 line at passage number 56, when large 5 Mb duplications in chromosomes 1q32.2 and 22q12.2 were detected, but the losses and gains were seen already at passage 22. These changes found in the other lines highlight the importance of assessing the acquisition of genetic changes by hESCs before their use in regenerative medicine applications. 
They also point to the possibility that the acquisition of genetic changes by ESCs in culture may be used to explore certain aspects of the mechanisms regulating oncogenesis.
A caspase-like decoy molecule enhances the activity of a paralogous caspase in the yellow fever mosquito, Aedes aegypti. Bryant B, Ungerer MC, Liu Q, Waterhouse RM, Clem RJ. Insect Biochem Mol Biol. PMID: 20417712 Caspases are cysteine proteases that play critical roles in apoptosis and other key cellular processes. A mechanism of caspase regulation that has been described in mammals and nematodes involves caspase-like decoy molecules, enzymatically inactive caspase homologs that have arisen by gene duplication and acquired the ability to regulate other caspases. Caspase-like decoy molecules are not found in Drosophila melanogaster, raising the question of whether this type of caspase regulation exists in insects. Phylogenomic analysis of caspase genes from twelve Drosophila and three mosquito species revealed several examples of duplicated caspase homologs lacking critical catalytic residues, making them candidate caspase-like decoy molecules. One of these, CASPS18 from the mosquito Aedes aegypti, is a homolog of the D. melanogaster caspase Decay and contains substitutions in two critical amino acid positions, including the catalytic cysteine residue. As expected, CASPS18 lacked caspase activity, but co-expression of CASPS18 with a paralogous caspase, CASPS19, in mosquito cells or co-incubation of CASPS18 and CASPS19 recombinant proteins resulted in greatly enhanced CASPS19 activity. The discovery of potential caspase-like decoy molecules in several insect species opens new avenues for investigating caspase regulation in insects, particularly in disease vectors such as mosquitoes.
Functional Characterization of Transcription Factor Motifs Using Cross-species Comparison across Large Evolutionary Distances Kim J, Cunningham R, James B, Wyder S, Gibson JD, Niehuis O, Zdobnov EM, Robertson HM, Robinson GE, Werren JH, Sinha S PLoS Computational Biology 6(1):e1000652 PMID: 20126523 Abstract
We address the problem of finding statistically significant associations between cis-regulatory motifs and functional gene sets, in order to understand the biological roles of transcription factors. We develop a computational framework for this task, whose features include a new statistical score for motif scanning, the use of different scores for predicting targets of different motifs, and new ways to deal with redundancies among significant motif–function associations. This framework is applied to the recently sequenced genome of the jewel wasp, Nasonia vitripennis, making use of the existing knowledge of motifs and gene annotations in another insect genome, that of the fruitfly. The framework uses cross-species comparison to improve the specificity of its predictions, and does so without relying upon non-coding sequence alignment. It is therefore well suited for comparative genomics across large evolutionary divergences, where existing alignment-based methods are not applicable. We also apply the framework to find motifs associated with socially regulated gene sets in the honeybee, Apis mellifera, using comparisons with Nasonia, a solitary species, to identify honeybee-specific associations.
Author Summary
We develop a computational pipeline for predicting the functions of transcription factor motifs, through DNA sequence analysis. The pipeline is applied to the newly sequenced genome of the jewel wasp, Nasonia vitripennis. It exploits the wealth of molecular data available in another insect species, the fruitfly Drosophila melanogaster, and uses cross-species comparison to its advantage. Our main contribution is to show how this can be done despite the large evolutionary divergence between the two species. The methodology presented here may be applied more generally to other scenarios (genomes) where comparative regulatory genomics must deal with large evolutionary divergences.
Sociality is linked to rates of protein evolution in a highly social insect Hunt BG, Wyder S, Elango N, Werren JH, Zdobnov EM, Yi SY, Goodisman MAD Molecular Biology and Evolution 27(3):497-500 PMID: 20110264 Eusocial insects exhibit unparalleled levels of cooperation and dominate terrestrial ecosystems. The success of eusocial insects stems from the presence of specialized castes that undertake distinct tasks. We investigated whether the evolutionary transition to societies with discrete castes was associated with changes in protein evolution. We predicted that proteins with caste-biased gene expression would evolve rapidly due to reduced antagonistic pleiotropy. We found that queen-biased proteins of the honeybee Apis mellifera did indeed evolve rapidly, as predicted. However, worker-biased proteins exhibited slower evolutionary rates than queen-biased or non-biased proteins. We suggest that distinct selective pressures operating on caste-biased genes, rather than a general reduction in pleiotropy, explain the observed differences in evolutionary rates. Our study highlights, for the first time, the interaction between highly social behavior and dynamics of protein evolution.
Functional and evolutionary insights from the genomes of three parasitoid Nasonia species The Nasonia Genome Working Group (incl. Junier T, Gerlach D, Waterhouse RM, Kriventseva EV, Wyder S, Zdobnov EM) Science. 2010 Jan 15;327(5963):343-8. PMID: 20075255 We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.
Discovery of Plasmodium modulators by genome-wide analysis of circulating hemocytes in Anopheles gambiae. Pinto SB, Lombardo F, Koutsos AC, Waterhouse RM, McKay K, An C, Ramakrishnan C, Kafatos FC, Michel K. Proc Natl Acad Sci U S A. PMID: 19940242 Insect hemocytes mediate important cellular immune responses including phagocytosis and encapsulation and also secrete immune factors such as opsonins, melanization factors, and antimicrobial peptides. However, the molecular composition of these important immune cells has not been elucidated in depth, because of their scarcity in the circulating hemolymph, their adhesion to multiple tissues and the lack of primary culture methods to produce sufficient material for a genome-wide analysis. In this study, we report a genome-wide molecular characterization of circulating hemocytes collected from the hemolymph of adult female Anopheles gambiae mosquitoes, the major mosquito vector of human malaria in sub-Saharan Africa. Their molecular profile identified 1,485 transcripts with enriched expression in these cells, and many of these genes belong to innate immune gene families. This hemocyte-specific transcriptome is compared to those of Drosophila melanogaster and two other mosquitoes, Aedes aegypti and Armigeres subalbatus. We report the identification of two genes as ubiquitous hemocyte markers and several others as hemocyte subpopulation markers. We assess, via an RNAi screen, the roles in development of Plasmodium berghei of 63 genes expressed in hemocytes and provide a molecular comparison of the transcriptome of these cells during malaria infection.
Cyclic olefin homopolymer-based microfluidics for protein crystallization and in situ X-ray diffraction. Emamzadah S, Petty TJ, De Almeida V, Nishimura T, Joly J, Ferrer JL, Halazonetis TD Acta Crystallogr D Biol Crystallogr. 2009 Sep;65(Pt 9):913-20. PMID: 19690369 Microfluidics is a promising technology for the rapid identification of protein crystallization conditions. However, most of the existing systems utilize silicone elastomers as the chip material which, despite its many benefits, is highly permeable to water vapour. This limits the time available for protein crystallization to less than a week. Here, the use of a cyclic olefin homopolymer-based microfluidics system for protein crystallization and in situ X-ray diffraction is described. Liquid handling in this system is performed in 2 mm thin transparent cards which contain 500 chambers, each with a volume of 320 nl. Microbatch, vapour-diffusion and free-interface diffusion protocols for protein crystallization were implemented and crystals were obtained of a number of proteins, including chicken lysozyme, bovine trypsin, a human p53 protein containing both the DNA-binding and oligomerization domains bound to DNA and a functionally important domain of Arabidopsis Morpheus' molecule 1 (MOM1). The latter two polypeptides have not been crystallized previously. For X-ray diffraction analysis, either the cards were opened to allow mounting of the crystals on loops or the crystals were exposed to X-rays in situ. For lysozyme, an entire X-ray diffraction data set at 1.5 Å resolution was collected without removing the crystal from the card. Thus, cyclic olefin homopolymer-based microfluidics systems have the potential to further automate protein crystallization and structural genomics efforts.
Identification of the active form of endothelial lipase, a homodimer in a head-to-tail conformation. Griffon N, Jin W, Petty TJ, Millar J, Badellino KO, Saven JG, Marchadier DH, Kempner ES, Billheimer J, Glick JM, Rader DJ J Biol Chem. 2009 Aug 28;284(35):23322-30. PMID: 19567873 Endothelial lipase (EL) is a member of a subfamily of lipases that act on triglycerides and phospholipids in plasma lipoproteins, which also includes lipoprotein lipase and hepatic lipase. EL has a tropism for high density lipoprotein, and its level of phospholipase activity is similar to its level of triglyceride lipase activity. Inhibition or loss-of-function of EL in mice results in an increase in high density lipoprotein cholesterol, making it a potential therapeutic target. Although hepatic lipase and lipoprotein lipase have been shown to function as homodimers, the active form of EL is not known. In these studies, the size and conformation of the active form of EL were determined. Immunoprecipitation experiments suggested oligomerization. Ultracentrifugation experiments showed that the active form of EL had a molecular weight higher than the molecular weight of a simple monomer but less than a dimer. A construct encoding a covalent head-to-tail homodimer of EL (EL-EL) was expressed and had similar lipolytic activity to EL. The functional molecular weights determined by radiation inactivation were similar for EL and the covalent homodimer EL-EL. We previously showed that EL could be cleaved by proprotein convertases, such as PC5, resulting in loss of activity. In cells overexpressing PC5, the covalent homodimeric EL-EL appeared to be more stable, with reduced cleavage and conserved lipolytic activity. A comparative model obtained using other lipase structures suggests a structure for the head-to-tail EL homodimer that is consistent with the experimental findings. These data confirm the hypothesis that EL is active as a homodimer in head-to-tail conformation.
Integration of microRNA miR-122 in hepatic circadian gene expression Gatfield D, Le Martelot G, Vejnar CE, Gerlach D, Schaad O, Fleury-Olela F, Ruskeepää AL, Oresic M, Esau CC, Zdobnov EM, Schibler U Genes Dev. 2009 June 1;23(11):1313-1326 PMID: 19487572 In liver, most metabolic pathways are under circadian control, and hundreds of protein-encoding genes are thus transcribed in a cyclic fashion. Here we show that rhythmic transcription extends to the locus specifying miR-122, a highly abundant, hepatocyte-specific microRNA. Genetic loss-of-function and gain-of-function experiments have identified the orphan nuclear receptor REV-ERBα as the major circadian regulator of mir-122 transcription. Although due to its long half-life mature miR-122 accumulates at nearly constant rates throughout the day, this miRNA is tightly associated with control mechanisms governing circadian gene expression. Thus, the knockdown of miR-122 expression via an antisense oligonucleotide (ASO) strategy resulted in the up- and down-regulation of hundreds of mRNAs, of which a disproportionately high fraction accumulates in a circadian fashion. miR-122 has previously been linked to the regulation of cholesterol and lipid metabolism. The transcripts associated with these pathways indeed show the strongest time point-specific changes upon miR-122 depletion. The identification of Pparβ/δ and the peroxisome proliferator-activated receptor α (PPARα) coactivator Smarcd1/Baf60a as novel miR-122 targets suggests an involvement of the circadian metabolic regulators of the PPAR family in miR-122-mediated metabolic control.
The impact of transmission clusters on primary drug resistance in newly diagnosed HIV-1 infection Yerly S, Junier T, Gayet-Ageron A, Amari EB, von Wyl V, Günthard HF, Hirschel B, Zdobnov EM, Kaiser L, and the Swiss HIV Cohort Study. AIDS. 2009 May 29. [Epub ahead of print] PMID: 19487906 OBJECTIVES: To monitor HIV-1 transmitted drug resistance (TDR) in a well defined urban area with large access to antiretroviral therapy and to assess the potential source of infection of newly diagnosed HIV individuals. METHODS: All individuals resident in Geneva, Switzerland, with a newly diagnosed HIV infection between 2000 and 2008 were screened for HIV resistance. An infection was considered as recent when the positive test followed a negative screening test within less than 1 year. Phylogenetic analyses were performed by using the maximum likelihood method on pol sequences including 1058 individuals with chronic infection living in Geneva.
RESULTS: Of 637 individuals with newly diagnosed HIV infection, 20% had a recent infection. Mutations associated with resistance to at least one drug class were detected in 8.5% [nucleoside reverse transcriptase inhibitors (NRTIs), 6.3%; non-nucleoside reverse transcriptase inhibitors (NNRTIs), 3.5%; protease inhibitors, 1.9%]. TDR (P-trend = 0.015) and, in particular, NNRTI resistance (P = 0.002) increased from 2000 to 2008. Phylogenetic analyses revealed that 34.9% of newly diagnosed individuals, and 52.7% of those with recent infection were linked to transmission clusters. Clusters were more frequent in individuals with TDR than in those with sensitive strains (59.3 vs. 32.6%, respectively; P < 0.0001). Moreover, 84% of newly diagnosed individuals with TDR were part of clusters composed of only newly diagnosed individuals.
CONCLUSION: Reconstruction of the HIV transmission networks using phylogenetic analysis shows that newly diagnosed HIV infections are a significant source of onward transmission, particularly of resistant strains, thus suggesting an important self-fueling mechanism for TDR.
The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution The Bovine Genome Sequencing and Analysis Consortium (incl. Gerlach D, Junier T, Kriventseva EV, Zdobnov EM) Science. 2009 Apr 24;324(5926):522-528 PMID: 19390049 To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
The bovine lactation genome: insights into the evolution of mammalian milk Lemay DG, Lynn DJ, Martin WF, Neville MC, Casey TM, Rincon G, Kriventseva EV, Barris WC, Hinrichs AS, Molenaar AJ, Pollard KS, Maqbool NJ, Singh K, Murney R, Zdobnov EM, Tellam RL, Medrano JF, German JB, Rijnkels M. Genome Biol. 2009;10(4):R43. Epub 2009 Apr 24. PMID: 19393040 BACKGROUND: The newly assembled Bos taurus genome sequence enables the linkage of bovine milk and lactation data with other mammalian genomes. RESULTS: Using publicly available milk proteome data and mammary expressed sequence tags, 197 milk protein genes and over 6,000 mammary genes were identified in the bovine genome. Intersection of these genes with 238 milk production quantitative trait loci curated from the literature decreased the search space for milk trait effectors by more than an order of magnitude. Genome location analysis revealed a tendency for milk protein genes to be clustered with other mammary genes. Using the genomes of a monotreme (platypus), a marsupial (opossum), and five placental mammals (bovine, human, dog, mouse, rat), gene loss and duplication, phylogeny, sequence conservation, and evolution were examined. Compared with other genes in the bovine genome, milk and mammary genes are: more likely to be present in all mammals; more likely to be duplicated in therians; more highly conserved across Mammalia; and evolving more slowly along the bovine lineage. The most divergent proteins in milk were associated with nutritional and immunological components of milk, whereas highly conserved proteins were associated with secretory processes. CONCLUSIONS: Although both copy number and sequence variation contribute to the diversity of milk protein composition across species, our results suggest that this diversity is primarily due to other mechanisms. 
Our findings support the essentiality of milk to the survival of mammalian neonates and the establishment of milk secretory mechanisms more than 160 million years ago.
New respiratory enterovirus genotype and rhinovirus strains identified by genotyping circulating picornaviruses Tapparel C, Junier T, Gerlach D, Van Belle S, Turin L, Cordey S, Muehlemann K, Regamey N, Aubert JD, Soccal PM, Eigenmann P, Zdobnov EM, Kaiser L Emerg Infect Dis. 2009 May;15(5):719-726 PMID: 19402957 Rhinoviruses and enteroviruses are leading causes of respiratory infections. To evaluate genotypic diversity and identify forces shaping picornavirus evolution, we screened persons with respiratory illnesses by using rhinovirus-specific or generic real-time PCR assays. We then sequenced the 5′ untranslated region, capsid protein VP1, and protease precursor 3CD regions of virus-positive samples. Subsequent phylogenetic analysis identified the large genotypic diversity of rhinoviruses circulating in humans. We identified and completed the genome sequence of a new enterovirus genotype associated with respiratory symptoms and acute otitis media, confirming the close relationship between rhinoviruses and enteroviruses and the need to detect both viruses in respiratory specimens. Finally, we identified recombinants among circulating rhinoviruses and mapped their recombination sites, thereby demonstrating that rhinoviruses can recombine in their natural host. This study clarifies the diversity and explains the reasons for evolution of these viruses.
Leucine-Rich Repeat Protein Complex Activates Mosquito Complement in Defense Against Plasmodium Parasites Povelones M, Waterhouse RM, Kafatos FC, Christophides GK Science PMID: 19264986 Leucine-rich repeat–containing proteins are central to host defense in plants and animals. We show that in the mosquito Anopheles gambiae, two such proteins that antagonize malaria parasite infections, LRIM1 and APL1C, circulate in the hemolymph as a high-molecular-weight complex held together by disulfide bridges. The complex interacts with the complement C3-like protein, TEP1, promoting its cleavage or stabilization and its subsequent localization on the surface of midgut-invading Plasmodium berghei parasites, targeting them for destruction. LRIM1 and APL1C are members of a protein family with orthologs in other disease vector mosquitoes and appear to be important effectors in innate mosquito defenses against human pathogens.
Community analysis of betaproteobacterial ammonia-oxidizing bacteria using the amoCAB operon. Junier P, Kim OS, Junier T, Ahn TS, Imhoff JS, and Witzel KP Applied Microbiology and Biotechnology PMID: 19274459 The genes and intergenic regions of the amoCAB operon were analyzed to establish their potential as molecular markers for analyzing ammonia-oxidizing betaproteobacterial (beta-AOB) communities. Initially, sequence similarity for related taxa, evolutionary rates from linear regressions, and the presence of conserved and variable regions were analyzed for all available sequences of the complete amoCAB operon. The gene amoB showed the highest sequence variability of the three amo genes, suggesting that it might be a better molecular marker than the most frequently used amoA to resolve closely related AOB species. To test the suitability of using the amoCAB genes for community studies, a strategy involving nested PCR was employed. Primers to amplify the whole amoCAB operon and each individual gene were tested. The specificity of the products generated was analyzed by denaturing gradient gel electrophoresis, cloning, and sequencing. The fragments obtained showed different grades of sequence identity to amoCAB sequences in the GenBank database. The nested PCR approach provides a possibility to increase the sensitivity of detection of amo genes in samples with low abundance of AOB. It also allows the amplification of the almost complete amoA gene, with about 300 bp more sequence information than the previous approaches. The coupled study of all three amo genes and the intergenic spacer regions that are under different selection pressure might allow a more detailed analysis of the evolutionary processes, which are responsible for the differentiation of AOB communities in different habitats.
Expression profiles of Urbilaterian genes uniquely shared between honey bee and vertebrates Matsui T, Yamamoto T, Wyder S, Zdobnov EM, Kadowaki T BMC Genomics 2009, 10:17 PMID: 19138430
Background
Large-scale comparison of metazoan genomes has revealed that a significant fraction of genes of the last common ancestor of Bilateria (Urbilateria) is lost in each animal lineage. This event could be one of the underlying mechanisms involved in generating metazoan diversity. However, the present functions of these ancient genes have not been addressed extensively. To understand the functions and evolutionary mechanisms of such ancient Urbilaterian genes, we carried out comprehensive expression profile analysis of genes shared between vertebrates and honey bees but not with the other sequenced ecdysozoan genomes (honey bee-vertebrate specific, HVS genes) as a model.
Results
We identified 30 honey bee and 55 mouse HVS genes. Many HVS genes exhibited tissue-selective expression patterns; intriguingly, the expression of 60% of honey bee HVS genes was found to be brain enriched, and 24% of mouse HVS genes were highly expressed in either or both the brain and testis. Moreover, a minimum of 38% of mouse HVS genes demonstrated neuron-enriched expression patterns, and 62% of them exhibited expression in selective brain areas, particularly the forebrain and cerebellum. Furthermore, gene ontology (GO) analysis of HVS genes predicted that 35% of genes are associated with DNA transcription and RNA processing.
Conclusions
These results suggest that HVS genes include genes that are biased towards expression in the brain and gonads. They also demonstrate that at least some of the Urbilaterian genes retained in a specific animal lineage may be selectively maintained to support species-specific phenotypes.
Sertoli cell Dicer is essential for spermatogenesis in mice Papaioannou MD, Pitetti JL, Ro S, Park C, Aubry F, Schaad O, Vejnar CE, Kühne F, Descombes P, Zdobnov EM, McManus MT, Guillou F, Harfe BD, Yan W, Jégou B, Nef S Dev Biol. 2008 Nov 28. Epub ahead of print PMID: 19071104 Spermatogenesis requires intact, fully competent Sertoli cells. Here, we investigate the functions of Dicer, an RNaseIII endonuclease required for microRNA and small interfering RNA biogenesis, in mouse Sertoli cell function. We show that selective ablation of Dicer in Sertoli cells leads to infertility due to complete absence of spermatozoa and progressive testicular degeneration. The first morphological alterations appear already at postnatal day 5 and correlate with a severe impairment of the prepubertal spermatogenic wave, due to defective Sertoli cell maturation and incapacity to properly support meiosis and spermiogenesis. Importantly, we find several key genes known to be essential for Sertoli cell function to be significantly down-regulated in neonatal testes lacking Dicer in Sertoli cells. Overall, our results reveal novel essential roles played by the Dicer-dependent pathway in mammalian reproductive function, and thus pave the way for new insights into human infertility.
Composition of diazotrophic bacterial assemblages in bean-planted soil compared to unplanted soil Junier P, Junier T, Witzel KP, Carú M European Journal of Soil Biology The effect of common bean (Phaseolus vulgaris L.) on the composition of nitrogen fixing bacterial assemblages in soil was studied by comparing planted and unplanted soil. The community composition was studied by terminal restriction fragment length polymorphism (T-RFLP) of the nitrogenase reductase gene (nifH). Principal component analysis (PCA) of T-RFLP profiles showed the separation of profiles from planted and unplanted soil. Terminal restriction fragments (T-RFs) corresponding to rhizobial bacteria were identified preferentially in planted soil; however most nifH T-RFs in soil could not be assigned to T-RFs simulated from a database of known diazotrophs. To specifically study rhizobial bacteria in the soil and nodules, PCR products from the alpha subunit of the nitrogenase enzyme (nifD) were analyzed by denaturing gradient gel electrophoresis (DGGE). DGGE results showed the specific stimulation of the rhizobial microsymbionts in planted soil.
miROrtho: computational survey of microRNA genes Gerlach D, Kriventseva EV, Rahman N, Vejnar CE, Zdobnov EM Nucleic Acids Res. 2009 Jan;37(Database issue):D111-D117. Epub 2008 Oct 15 PMID: 18927110 MicroRNAs (miRNAs) are short, non-protein coding RNAs that direct the widespread phenomenon of post-transcriptional regulation of metazoan genes. The mature approximately 22-nt long RNA molecules are processed from genome-encoded stem-loop structured precursor genes. Hundreds of such genes have been experimentally validated in vertebrate genomes, yet their discovery remains challenging, and substantially higher numbers have been estimated. The miROrtho database (http://cegg.unige.ch/mirortho) presents the results of a comprehensive computational survey of miRNA gene candidates across the majority of sequenced metazoan genomes. We designed and applied a three-tier analysis pipeline: (i) an SVM-based ab initio screen for potent hairpins, plus homologs of known miRNAs, (ii) an orthology delineation procedure and (iii) an SVM-based classifier of the ortholog multiple sequence alignments. The web interface provides direct access to putative miRNA annotations, ortholog multiple alignments, RNA secondary structure conservation, and sequence data. The miROrtho data are conceptually complementary to the miRBase catalog of experimentally verified miRNA sequences, providing a consistent comparative genomics perspective as well as identifying many novel miRNA genes with strong evolutionary support.
TRiFLe, a Program for In Silico Terminal Restriction Fragment Length Polymorphism Analysis with User-Defined Sequence Sets Junier P, Junier T, Witzel KP Applied and Environmental Microbiology PMID: 18757578 We describe TRiFLe, a freely accessible computer program that generates theoretical terminal restriction fragments (T-RFs) from any user-supplied sequence set tailored to a particular group of organisms, sequences from clone libraries, or sequences from specific genes. The program allows a rapid identification of the most polymorphic enzymes, creates a collection of T-RFs for the data set, and can potentially identify specific T-RFs in T-RF length polymorphism (T-RFLP) patterns by comparing theoretical and experimental results. TRiFLe was used for analyzing T-RFLP data generated for the amoA and pmoA genes. The peaks identified in the T-RFLP patterns show an overlap of ammonia- and methane-oxidizing bacteria in the metalimnion of a subtropical lake.
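As a rough illustration of the idea behind in silico T-RFLP (not TRiFLe's actual code or interface), the sketch below computes theoretical T-RF lengths for a small set of sequences and counts how many distinct lengths each enzyme yields. The function names, the toy sequences, and the use of "number of distinct T-RF lengths" as a measure of how polymorphic an enzyme is are all assumptions made for this example. The enzyme data are the standard recognition sites HaeIII (GG^CC) and MspI (C^CGG).

```python
def terminal_fragment_length(sequence, site, cut_offset, labeled_end="5"):
    """Length of the terminal restriction fragment, or None if the site is absent.

    site       -- recognition sequence, e.g. "GGCC" for HaeIII
    cut_offset -- number of bases of the site left of the cut (HaeIII GG^CC -> 2)
    labeled_end -- "5" measures from the start of the sequence, "3" from the end
    """
    seq, site = sequence.upper(), site.upper()
    if labeled_end == "5":
        pos = seq.find(site)       # first site seen from the 5' end
        return None if pos == -1 else pos + cut_offset
    pos = seq.rfind(site)          # last site, i.e. first seen from the 3' end
    return None if pos == -1 else len(seq) - (pos + cut_offset)


def distinct_trf_count(sequences, site, cut_offset):
    """Number of distinct 5' T-RF lengths an enzyme produces over a sequence set."""
    lengths = {terminal_fragment_length(s, site, cut_offset) for s in sequences}
    lengths.discard(None)          # sequences without a site give no T-RF
    return len(lengths)


# Hypothetical toy sequences, for illustration only.
toy_sequences = ["ATGGCCTTAAGGCCTAGG", "ATTTGGCCAACCGGTT", "CCGGATATATGGCC"]
enzymes = {"HaeIII": ("GGCC", 2), "MspI": ("CCGG", 1)}

for name, (site, offset) in enzymes.items():
    print(name, distinct_trf_count(toy_sequences, site, offset))
```

In this toy set HaeIII separates all three sequences while MspI cuts only two of them, which is the kind of comparison a user would make when choosing an enzyme for a given gene.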
Experience using web services for biological sequence analysis Stockinger H, Attwood T, Chohan S, Côté R, Cudré-Mauroux P, Falquet L, Fernandes P, Finn R, Hupponen T, Korpelainen E, Labarga A, Laugraud A, Lima T, Pafilis E, Pagni M, Pettifer S, Phan I, Rahman N Briefings in Bioinformatics PMID: 18621748 Programmatic access to data and tools through the web using so-called web services has an important role to play in bioinformatics. In this article, we discuss the most popular approaches based on SOAP/WS-I and REST and describe our experiences, as a cross section of the community, with providing and using web services in the context of biological sequence analysis. We briefly review the main technological approaches as well as best practice hints that are useful for both users and developers. Finally, syntactic and semantic data integration issues with multiple web services are discussed.
The cis-acting replication elements define human enterovirus and rhinovirus species Cordey S, Gerlach D, Junier T, Zdobnov EM, Kaiser L, Tapparel C. RNA. 2008 Aug;14(8):1568-78 PMID: 18541697 Replication of picornaviruses is dependent on VPg uridylylation, which is linked to the presence of the internal cis-acting replication element (cre). Cre are located within the sequence encoding polyprotein, yet at distinct positions as demonstrated for poliovirus and coxsackievirus-B3, cardiovirus, and human rhinovirus (HRV-A and HRV-B), overlapping proteins 2C, VP2, 2A, and VP1, respectively. Here we report a novel distinct cre element located in the VP2 region of the recently reported HRV-A2 species and provide evolutionary evidence of its functionality. We also experimentally interrogated functionality of recently identified HRV-B cre in the 2C region that is orthologous to the human enterovirus (HEV) cre and show that it is dispensable for replication and appears to be a nonfunctional evolutionary relic. In addition, our mutational analysis highlights two amino acids in the 2C protein that are crucial for replication. Remarkably, we conclude that each genetic clade of HRV and HEV is characterized by a unique functional cre element, where evolutionary success of a new genetic lineage seems to be associated with an invention of a novel cre motif and decay of the ancestral one. Therefore, we propose that cre element could be considered as an additional criterion for human rhinovirus and enterovirus classification.
Genome-wide search reveals a novel GacA-regulated small RNA in Pseudomonas species.González N, Heeb S, Valverde C, Kay E, Reimmann C, Junier T, Haas D. BMC Genomics PMID: 18405392 BACKGROUND: Small RNAs (sRNAs) are widespread among bacteria and have diverse regulatory roles. Most of these sRNAs have been discovered by a combination of computational and experimental methods. In Pseudomonas aeruginosa, a ubiquitous Gram-negative bacterium and opportunistic human pathogen, the GacS/GacA two-component system positively controls the transcription of two sRNAs (RsmY, RsmZ), which are crucial for the expression of genes involved in virulence. In the biocontrol bacterium Pseudomonas fluorescens CHA0, three GacA-controlled sRNAs (RsmX, RsmY, RsmZ) regulate the response to oxidative stress and the expression of extracellular products including biocontrol factors. RsmX, RsmY and RsmZ contain multiple unpaired GGA motifs and control the expression of target mRNAs at the translational level, by sequestration of translational repressor proteins of the RsmA family. RESULTS: A combined computational and experimental approach enabled us to identify 14 intergenic regions encoding sRNAs in P. aeruginosa. Eight of these regions encode newly identified sRNAs. The intergenic region 1698 was found to specify a novel GacA-controlled sRNA termed RgsA. GacA regulation appeared to be indirect. In P. fluorescens CHA0, an RgsA homolog was also expressed under positive GacA control. This 120-nt sRNA contained a single GGA motif and, unlike RsmX, RsmY and RsmZ, was unable to derepress translation of the hcnA gene (involved in the biosynthesis of the biocontrol factor hydrogen cyanide), but contributed to the bacterium's resistance to hydrogen peroxide. In both P. aeruginosa and P. fluorescens the stress sigma factor RpoS was essential for RgsA expression. 
CONCLUSION: The discovery of an additional sRNA expressed under GacA control in two Pseudomonas species highlights the complexity of this global regulatory system and suggests that the mode of action of GacA control may be more elaborate than previously suspected. Our results also confirm that several GGA motifs are required in an sRNA for sequestration of the RsmA protein.
Comparative in silico analysis of PCR primers suited for diagnostics and cloning of ammonia monooxygenase genes from ammonia-oxiJunier P, Kim OS, Molina V, Limburg P, Junier T, Imhoff JF, Witzel KP. FEMS Microbiology Ecology Over recent years, several PCR primers have been described to amplify genes encoding the structural subunits of ammonia monooxygenase (AMO) from ammonia-oxidizing bacteria (AOB). Most of them target amoA, while amoB and amoC have been neglected so far. This study compared the nucleotide sequence of 33 primers that have been used to amplify different regions of the amoCAB operon with alignments of all available sequences in public databases. The advantages and disadvantages of these primers are discussed based on the original description and the spectrum of matching sequences obtained. Additionally, new primers to amplify the almost complete amoCAB operon of AOB belonging to Betaproteobacteria (betaproteobacterial AOB), a primer pair for DGGE analysis of amoA and specific primers for gammaproteobacterial AOB, are also described. The specificity of these new primers was also evaluated using the databases of the sequences created during this study.
The genome of the model beetle and pest Tribolium castaneum.Tribolium Genome Sequencing Consortium; Project leader, Richards S; Principal investigators, Gibbs RA, Weinstock GM; White paper, Brown SJ, Denell R, Beeman RW, Gibbs R; Analysis leaders, Beeman RW, Brown SJ, Bucher G, Friedrich M, Grimmelikhuijzen CJ, Klingler M, Lorenzen M, Richards S, Roth S, Schröder R, Tautz D, Zdobnov EM; DNA sequence and global analysis: DNA sequencing, Muzny D, Gibbs RA, Weinstock GM, Attaway T, Bell S, Buhay CJ, Chandrabose MN, Chavez D, Clerk-Blankenburg KP, Cree A, Dao M, Davis C, Chacko J, Dinh H, Dugan-Rocha S, Fowler G, Garner TT, Garnes J, Gnirke A, Hawes A, Hernandez J, Hines S, Holder M, Hume J, Jhangiani SN, Joshi V, Khan ZM, Jackson L, Kovar C, Kowis A, Lee S, Lewis LR, Margolis J, Morgan M, Nazareth LV, Nguyen N, Okwuonu G, Parker D, Richards S, Ruiz SJ, Santibanez J, Savard J, Scherer SE, Schneider B, Sodergren E, Tautz D, Vattahil S, Villasana D, White CS, Wright R; EST sequencing, Park Y, Beeman RW, Lord J, Oppert B, Lorenzen M, Brown S, Wang L, Savard J, Tautz D, Richards S, Weinstock G, Gibbs RA; genome assembly, Liu Y, Worley K, Weinstock G; G+C content, Elsik CG, Reese JT, Elhaik E, Landan G, Graur D; repetitive DNA, transposons and telomeres, Arensburger P, Atkinson P, Beeman RW, Beidler J, Brown SJ, Demuth JP, Drury DW, Du YZ, Fujiwara H, Lorenzen M, Maselli V, Osanai M, Park Y, Robertson HM, Tu Z, Wang JJ, Wang S; gene prediction and consensus gene set, Richards S, Song H, Zhang L, Sodergren E, Werner D, Stanke M, Morgenstern B, Solovyev V, Kosarev P, Brown G, Chen HC, Ermolaeva O, Hlavina W, Kapustin Y, Kiryutin B, Kitts P, Maglott D, Pruitt K, Sapojnikov V, Souvorov A, Mackey AJ, Waterhouse RM, Wyder S, Zdobnov EM; global gene content analysis, Zdobnov EM, Wyder S, Kriventseva EV, Kadowaki T, Bork P; Developmental processes and signalling pathways, Aranda M, Bao R, Beermann A, Berns N, Bolognesi R, Bonneton F, Bopp D, Brown SJ, Bucher G, Butts T, Chaumot A, 
Denell RE, Ferrier DE, Friedrich M, Gordon CM, Jindra M, Klingler M, Lan Q, Lattorff HM, Laudet V, von Levetsow C, Liu Z, Lutz R, Lynch JA, da Fonseca RN, Posnien N, Reuter R, Roth S, Savard J, Schinko JB, Schmitt C, Schoppmeier M, Schröder R, Shippy TD, Simonnet F, Marques-Souza H, Tautz D, Tomoyasu Y, Trauner J, Van der Zee M, Vervoort M, Wittkopp N, Wimmer EA, Yang X; Pest biology, senses, Medea and RNAi: ligand gated ion channels, Jones AK, Sattelle DB; oxidative phosphorylation, Ebert PR; P450 genes, Nelson D, Scott JG, Beeman RW; chitin and cuticular proteins, Muthukrishnan S, Kramer KJ, Arakane Y, Beeman RW, Zhu Q, Hogenkamp D, Dixit R; digestive proteinases, Oppert B, Jiang H, Zou Z, Marshall J, Elpidina E, Vinokurov K, Oppert C; immunity, Zou Z, Evans J, Lu Z, Zhao P, Sumathipala N, Altincicek B, Vilcinskas A, Williams M, Hultmark D, Hetru C, Jiang H; neurohormones and GPCRs, Grimmelikhuijzen CJ, Hauser F, Cazzamali G, Williamson M, Park Y, Li B, Tanaka Y, Predel R, Neupert S, Schachtner J, Verleyen P; neuropeptide processing enzymes, Raible F, Bork P; opsins, Friedrich M; odorant receptors and gustatory receptors, Walden KK, Robertson HM; odorant binding and chemosensory proteins, Angeli S, Forêt S, Bucher G, Schuetz S, Maleszka R, Wimmer EA; Medea, Beeman RW, Lorenzen M; systemic RNAi, Tomoyasu Y, Miller SC, Grossmann D, Bucher G. Nature. 2008 Mar 23 PMID: 18362917 Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. 
For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.
The Aedes aegypti genome: a comparative perspective Waterhouse RM, Wyder S, Zdobnov EM Insect Mol Biol. 2008 Feb;17(1):1-8 PMID: 18237279 The sequencing of the second mosquito genome, Aedes aegypti, in addition to Anopheles gambiae, is a major milestone that will drive molecular-level and genome-wide high-throughput studies of not only these but also other mosquito vectors of human pathogens. Here we overview the ancestry of the mosquito genes, list the major expansions of gene families that may relate to species adaptation processes, as exemplified by CYP9 cytochrome P450 genes, and discuss the conservation of chromosomal gene arrangements among the two mosquitoes and fruit fly. Many more invertebrate genomes are expected to be sequenced in the near future, including additional vectors of human pathogens (see http://www.vectorbase.org), and further comparative analyses will become increasingly refined and informative, hopefully improving our understanding of the genetic basis of phenotypical differences among these species, their vectorial capacity, and ultimately leading to the development of novel disease control strategies.
OrthoDB: the hierarchical catalog of eukaryotic orthologs Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM Nucleic Acids Res. 2008 Jan;36(Database issue):D271-5. Epub 2007 Oct 18 PMID: 17947323 The concept of orthology is widely used to relate genes across different species using comparative genomics, and it provides the basis for inferring gene function. Here we present the web accessible OrthoDB database that catalogs groups of orthologous genes in a hierarchical manner, at each radiation of the species phylogeny, from more general groups to more fine-grained delineations between closely related species. We used a COG-like and Inparanoid-like ortholog delineation procedure on the basis of all-against-all Smith-Waterman sequence comparisons to analyze 58 eukaryotic genomes, focusing on vertebrates, insects and fungi to facilitate further comparative studies. The database is freely available at orthodb.
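COG-like and Inparanoid-like procedures commonly start from reciprocal best hits between two genomes. The sketch below shows only that first step, under simplifying assumptions: it is not the OrthoDB procedure, the gene names and scores are invented, and a single symmetric score per gene pair stands in for the all-against-all Smith-Waterman comparisons.

```python
def best_hits(scores, query_is_first=True):
    """Map each query gene to its highest-scoring partner in the other genome.

    scores -- dict mapping (gene_in_A, gene_in_B) -> alignment score
    Ties are resolved in favour of the pair encountered first.
    """
    best = {}
    for (gene_a, gene_b), score in scores.items():
        query, target = (gene_a, gene_b) if query_is_first else (gene_b, gene_a)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(scores):
    """Return sorted (gene_in_A, gene_in_B) pairs that are each other's best hit."""
    a_to_b = best_hits(scores, query_is_first=True)
    b_to_a = best_hits(scores, query_is_first=False)
    return sorted((a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a)


# Hypothetical pairwise scores between genes of genome A and genome B.
toy_scores = {
    ("a1", "b1"): 250,
    ("a1", "b2"): 90,
    ("a2", "b1"): 120,
    ("a2", "b2"): 80,
    ("a3", "b2"): 60,
}

print(reciprocal_best_hits(toy_scores))
```

Here only a1 and b1 choose each other; a3 prefers b2, but b2's best hit is a1, so that pair is not reported. Real pipelines add further steps, such as clustering across more than two species and handling in-paralogs.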
Protein Folding, Misfolding and Aggregation: Classical Themes and Novel Approaches Lehmann A, Lanci CJ, Petty TJ, Kang SG & Saven JG RSC Biomolecular Sciences Book Series CHAPTER 9
Protein Design: Tailoring Sequence, Structure, and Folding Properties
Protein design algorithms identify protein sequences consistent with a particular fold, and often simultaneously quantify the many subtle, non-covalent interactions that govern protein folding, stability and function. Efforts in protein design stand to advance our knowledge of protein folding and function and also can identify new proteins with applications to biotechnology, catalysis, and materials research. Here, recent developments in protein design are discussed with a focus on features common to many of the computational design methods. A sampling of studies is presented in which computationally designed proteins have been experimentally realized, exemplifying what may be learned and accomplished with protein design.
Victor Muñoz (Editor)
ISBN: 978-0-85404-257-9
Copyright: 2008
Format: Hardback
Quantification of ortholog losses in insects and vertebrates Wyder S, Kriventseva EV, Schroder R, Kadowaki T and Zdobnov EM Genome Biol. 2007 Nov 16;8(11):R242 PMID: 18021399 Background: The increasing number of sequenced insect and vertebrate genomes of variable divergence enables refined comparative analyses to quantify the major modes of animal genome evolution and allows tracing of gene genealogy (orthology) and pinpointing of gene extinctions (losses), which can reveal lineage-specific traits. Results: We compared the gene repertoires of 5 vertebrates and 5 insects, including honeybee and Tribolium beetle that represent insect orders outside the previously sequenced Diptera, to consistently quantify losses of orthologous groups of genes. We found hundreds of lost Urbilateria genes in each of the lineages and assessed their phylogenetic origin. The rate of losses correlates well with the species' rates of molecular evolution and radiation times, without distinction between insects and vertebrates, indicating their stochastic nature. Remarkably, this extends to the universal single-copy orthologs, losses of dozens of which have been tolerated in each species. Nevertheless, the propensity for loss differs substantially among genes, where roughly 20% of the orthologs have an 8-fold higher chance of becoming extinct. Extrapolation of our data also suggests that the Urbilateria genome contained more than 7,000 genes.
Conclusions: Our results indicate that the seemingly higher number of observed gene losses in insects can be explained by their 2-3 fold higher evolutionary rate. Despite the profound effect of many losses on cellular machinery, overall, they seem to be guided by neutral evolution.
Evolution of genes and genomes on the Drosophila phylogeny Drosophila 12 Genomes Consortium (Zdobnov EM) Nature. 2007 Nov 8;450(7167):203-18. PMID: 17994087 Comparative analysis of multiple genomes in a phylogenetic framework dramatically improves the precision and sensitivity of evolutionary inference, producing more robust results than single-genome analyses can provide. The genomes of 12 Drosophila species, ten of which are presented here for the first time (sechellia, simulans, yakuba, erecta, ananassae, persimilis, willistoni, mojavensis, virilis and grimshawi), illustrate how rates and patterns of sequence divergence across taxa can illuminate evolutionary processes on a genomic scale. These genome sequences augment the formidable genetic tools that have made Drosophila melanogaster a pre-eminent model for animal genetics, and will further catalyse fundamental research on mechanisms of development, cell biology, genetics, disease, neurobiology, behaviour, physiology and evolution. Despite remarkable similarities among these Drosophila species, we identified many putatively non-neutral changes in protein-coding genes, non-coding RNA genes, and cis-regulatory regions. These may prove to underlie differences in the ecology and behaviour of these diverse species.
Biased distributions and decay of long interspersed nuclear elements in the chicken genome. Abrusán G, Krambeck HJ, Junier T, Giordano J, Warburton PE. Genetics PMID: 17947446 The genomes of birds are much smaller than mammalian genomes, and transposable elements (TEs) make up only 10% of the chicken genome, compared with 45% of the human genome. To study the mechanisms that constrain the copy numbers of TEs, and as a consequence the genome size of birds, we analyzed the distributions of LINEs (CR1's) and SINEs (MIRs) on the chicken autosomes and Z chromosome. We show that (1) CR1 repeats are longest on the Z chromosome and their length is negatively correlated with the local GC content; (2) the decay of CR1 elements is highly biased, and the 5'-ends of the insertions are lost much faster than their 3'-ends; (3) the GC distribution of CR1 repeats shows a bimodal pattern with repeats enriched in both AT-rich and GC-rich regions of the genome, but the CR1 families show large differences in their GC distribution; and (4) the few MIRs in the chicken are most abundant in regions with intermediate GC content. Our results indicate that the primary mechanism that removes repeats from the chicken genome is ectopic exchange and that the low abundance of repeats in avian genomes is likely to be the consequence of their high recombination rates.
New complete genome sequences of human rhinoviruses shed light on their phylogeny and genomic features Tapparel C, Junier T, Gerlach D, Cordey S, Van Belle S, Perrin L, Zdobnov EM and Kaiser L BMC Genomics. 2007 Jul 10; 8(1):224 PMID: 17623054 Background
Human rhinoviruses (HRV), the most frequent cause of respiratory infections, include 99 different serotypes segregating into two species, A and B. Rhinoviruses share extensive genomic sequence similarity with enteroviruses and both are part of the picornavirus family. Nevertheless they differ significantly at the phenotypic level. The lack of HRV full-length genome sequences and the absence of analysis comparing picornaviruses at the whole genome level limit our knowledge of the genomic features supporting these differences.
Results
Here we report complete genome sequences of 12 HRV-A and HRV-B serotypes, more than doubling the current number of available HRV sequences. The whole-genome maximum-likelihood phylogenetic analysis suggests that HRV-B and human enteroviruses (HEV) diverged from the last common ancestor after their separation from HRV-A. On the other hand, compared to HEV, HRV-B are more related to HRV-A in the capsid and 3B-C regions. We also identified the presence of a 2C cis-acting replication element (cre) in HRV-B that is not present in HRV-A, and that had been previously characterized only in HEV. In contrast to HEV viruses, HRV-A and HRV-B also share markedly lower GC content along the whole genome length.
Conclusions
Our findings provide a basis to speculate about both the biological similarities and the differences (e.g. tissue tropism, temperature adaptation or acid lability) of these three groups of viruses.
Evolutionary dynamics of immune-related genes and pathways in disease vector mosquitoes Waterhouse RM, Kriventseva EV, Meister S, Xi Z, Alvarez KS, Bartholomay LC, Barillas-Mury C, Bian G, Blandin S, Christensen BM, Dong Y, Jiang H, Kanost MR, Koutsos AC, Levashina EA, Li J, Ligoxygakis P, MacCallum RM, Mayhew GF, Mendes A, Michel K, Osta MA, Paskewitz S, Shin SW, Vlachou D, Wang L, Wei W, Zheng L, Zou Z, Severson DW, Raikhel AS, Kafatos FC, Dimopoulos G, Zdobnov EM, Christophides GK Science. 2007 Jun 22;316(5832):1738-43. PMID: 17588928 Mosquitoes are vectors of parasitic and viral diseases of immense importance for public health. The genome sequence of the yellow fever and Dengue vector, Aedes aegypti (Aa), has enabled a comparative phylogenomic analysis of the insect immune repertoire: in Aa, the malaria vector Anopheles gambiae (Ag) and the fruitfly Drosophila melanogaster (Dm). Analysis of immune signaling pathways and response modules reveals both conservative and rapidly evolving features associated with different functional gene categories and particular aspects of immune reactions. These dynamics reflect in part continuous readjustment between accommodation and rejection of pathogens and suggest how innate immunity may have evolved.
Life cycle transcriptome of the malaria mosquito Anopheles gambiae and comparison with the fruitfly Drosophila melanogaster Koutsos AC, Blass C, Meister S, Schmidt S, Maccallum RM, Soares MB, Collins FH, Benes V, Zdobnov EM, Kafatos FC, Christophides GK Proc Natl Acad Sci USA. 2007 Jun 11 PMID: 17563388 The African mosquito Anopheles gambiae is the major vector of human malaria. We report a genome-wide survey of mosquito gene expression profiles clustered temporally into developmental programs and spatially into adult tissue-specific patterns. Global expression analysis shows that genes that belong to related functional categories or that encode the same or functionally linked protein domains are associated with characteristic developmental programs or tissue patterns. Comparative analysis of our data together with data published from Drosophila melanogaster reveals an overall strong and positive correlation of developmental expression between orthologous genes. The degree of correlation varies, depending on association of orthologs with certain developmental programs or functional groups. Interestingly, the similarity of gene expression is not correlated with the coding sequence similarity of orthologs, indicating that expression profiles and coding sequences evolve independently. In addition to providing a comprehensive view of temporal and spatial gene expression during the A. gambiae life cycle, this large-scale comparative transcriptomic analysis has detected important evolutionary features of insect transcriptomes.
CsrA of Bacillus subtilis regulates translation initiation of the gene encoding the flagellin protein (hag) by blocking ribosomeYakhnin H, Pandit P, Petty TJ, Baker CS, Romeo T, Babitzke P Mol Microbiol. 2007 Jun;64(6):1605-20. PMID: 17555441 The global regulatory Csr (carbon storage regulator) and the homologous Rsm (repressor of secondary metabolites) systems of Gram-negative bacteria typically consist of an RNA-binding protein (CsrA/RsmA) and at least one sRNA that functions as a CsrA antagonist. CsrA modulates gene expression post-transcriptionally by regulating translation initiation and/or mRNA stability of target transcripts. While Csr has been extensively studied in Gram-negative bacteria, until now Csr has not been characterized in any Gram-positive organism. csrA of Bacillus subtilis is the last gene of a flagellum biosynthetic operon. In addition to the previously identified sigma(D)-dependent promoter that controls expression of the entire operon, a sigma(A)-dependent promoter was identified that temporally controls expression of the last two genes of the operon (fliW-csrA); expression peaks 1 h after cell growth deviates from exponential phase. hag, the gene encoding flagellin, was identified as a CsrA-regulated gene. CsrA was found to repress hag'-'lacZ expression, while overexpression of csrA reduces cell motility. In vitro binding studies identified two CsrA binding sites in the hag leader transcript, one of which overlaps the hag Shine-Dalgarno sequence. Toeprint and cell-free translation studies demonstrate that bound CsrA prevents ribosome binding to the hag transcript, thereby inhibiting translation initiation and Hag synthesis.
Computational and transcriptional evidence for microRNAs in the honey bee genome Weaver DB, Anzola JM, Evans JD, Reid JG, Reese JT, Childs KL, Zdobnov EM, Samanta MP, Miller J, Elsik CG Genome Biol. 2007 Jun 1;8(6):R97 PMID: 17543122 BACKGROUND: Noncoding microRNAs (miRNAs) are key regulators of gene expression in eukaryotes. Insect miRNAs help regulate the levels of proteins involved with development, metabolism, and other life history traits. The recently sequenced honey bee genome provides an opportunity to detect novel miRNAs in both this species and others, and begin to infer the roles of miRNAs in honey bee development.
RESULTS: Three independent computational surveys of the assembled honey bee genome identified a total of 68 non-redundant candidate miRNAs, several of which appear to have previously unrecognized orthologs in the Drosophila genome. A subset of these candidate miRNAs were screened for expression by qRT-PCR and/or genome tiling arrays and most predicted miRNAs were confirmed as being expressed in at least one honey bee tissue. Interestingly, the transcript abundance for several known and novel miRNAs displayed caste or age-related differences in honey bees. Genes in proximity to miRNAs in the bee genome are disproportionately associated with the GO terms "physiological process", "nucleus" and "response to stress".
CONCLUSIONS: Computational approaches successfully identified miRNAs in the honey bee and indicated previously unrecognized miRNAs in the well-studied Drosophila melanogaster genome despite the 280MYA distance between these insects. Differentially transcribed miRNAs are likely to be involved in regulating honey bee development, and arguably in the extreme developmental switch between sterile worker bees and highly fertile queens.
Genome Sequence of Aedes aegypti, a Major Arbovirus VectorNene V, Wortman JR, Lawson D, Haas B, Kodira C, Tu ZJ, Loftus B, Xi Z, Megy K, Grabherr M, Ren Q, Zdobnov EM, Lobo NF, Campbell KS, Brown SE, Bonaldo MF, Zhu J, Sinkins SP, Hogenkamp DG, Amedo P, Arsenburger P, Atkinson PW, Bidwell S, Biedler J, Birney E, Bruggner RV, Costas J, Coy MR, Crabtree J, Crawford M, Debruyn B, Decaprio D, Eiglmeier K, Eisenstadt E, El-Dorry H, Gelbart WM, Gomes SL, Hammond M, Hannick LI, Hogan JR, Holmes MH, Jaffe D, Johnston SJ, Kennedy RC, Koo H, Kravitz S, Kriventseva EV, Kulp D, Labutti K, Lee E, Li S, Lovin DD, Mao C, Mauceli E, Menck CF, Miller JR, Montgomery P, Mori A, Nascimento AL, Naveira HF, Nusbaum C, O'leary SB, Orvis J, Pertea M, Quesneville H, Reidenbach KR, Rogers YH, Roth CW, Schneider JR, Schatz M, Shumway M, Stanke M, Stinson EO, Tubio JM, Vanzee JP, Verjovski-Almeida S, Werner D, White O, Wyder S, Zeng Q, Zhao Q, Zhao Y, Hill CA, Raikhel AS, Soares MB, Knudson DL, Lee NH, Galagan J, Salzberg SL, Paulsen IT, Dimopoulos G, Collins FH, Bruce B, Fraser-Liggett CM, Severson DW. Science. 2007 Jun 22;316(5832):1718-23. Epub 2007 May 17. PMID: 17510324 We present a draft sequence of the genome of Aedes aegypti, the primary vector for yellow fever and dengue fever, which at ~1.38 Gbp is ~5-fold larger in size than the genome of the malaria vector, Anopheles gambiae. Nearly 50% of the Aedes aegypti genome consists of transposable elements. These contribute to a ~4-6 fold increase in average gene length and the size of intergenic regions relative to Anopheles gambiae and Drosophila melanogaster. Nevertheless, chromosomal synteny is generally maintained between all three insects although conservation of orthologous gene order is higher (~2-fold) between the mosquito species than between either of them and fruit fly. 
An increase in genes encoding odorant binding, cytochrome P450 and cuticle domains relative to Anopheles gambiae suggests that members of these protein families underpin some of the biological differences between them.
Deep metazoan phylogeny Gerlach D, Wolf M, Dandekar T, Müller T, Pokorny A, Rahmann S In Silico Biology 7, 0015 (2007) PMID: 17688440 We reconstructed a robust phylogenetic tree of the Metazoa, consisting of almost 1,500 taxa, by profile neighbor joining (PNJ), an automated computational method that inherits the efficiency of the neighbor joining algorithm. This tree supports the one proposed in the latest review on metazoan phylogeny. Our main goal is not to discuss aspects of the phylogeny itself, but rather to point out that PNJ can be a valuable tool when the basal branching pattern of a large phylogenetic tree must be estimated, whereas traditional methods would be computationally impractical.
Insights into social insects from the genome of the honeybee Apis mellifera Honeybee Genome Sequencing Consortium Nature. 2006 Oct 26;443(7114):931-49 PMID: 17073008 Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and insights into whether Africanized bees spread throughout the New World via hybridization or displacement.
Quantification of insect genome divergence Zdobnov EM, Bork P Trends Genet. 2007 Jan;23(1):16-20. Epub 2006 Nov 9. PMID: 17097187 The recent sequencing of twelve insect genomes has enabled us to quantify their divergence using synteny conservation and sequence identity of single-copy orthologs. Protein identity correlates well with synteny and is about three times more conserved, an observation consistent with comparisons among vertebrates. The observed distribution of the lengths of synteny blocks follows a power law and differs from the expectations of the currently accepted random breakage model. Our results show that there is only limited selection for conservation of gene order and reveal a few hundred genes, proximity among which seems to be vital.
Effects of a GTP-insensitive mutation of glutamate dehydrogenase on insulin secretion in transgenic mice.Li C, Matter A, Kelly A, Petty TJ, Najafi H, MacMullen C, Daikhin Y, Nissim I, Lazarow A, Kwagh J, Collins HW, Hsu BY, Nissim I, Yudkoff M, Matschinsky FM, Stanley CA J Biol Chem. 2006 Jun 2;281(22):15064-72. PMID: 16574664 Glutamate dehydrogenase (GDH) plays an important role in insulin secretion as evidenced in children by gain of function mutations of this enzyme that cause a hyperinsulinism-hyperammonemia syndrome (GDH-HI) and sensitize beta-cells to leucine stimulation. GDH transgenic mice were generated to express the human GDH-HI H454Y mutation and human wild-type GDH in islets driven by the rat insulin promoter. H454Y transgene expression was confirmed by increased GDH enzyme activity in islets and decreased sensitivity to GTP inhibition. The H454Y GDH transgenic mice had hypoglycemia with normal growth rates. H454Y GDH transgenic islets were more sensitive to leucine- and glutamine-stimulated insulin secretion but had decreased response to glucose stimulation. The fluxes via GDH and glutaminase were measured by tracing 15N flux from [2-15N]glutamine. The H454Y transgene in islets had higher insulin secretion in response to glutamine alone and had 2-fold greater GDH flux. High glucose inhibited both glutaminase and GDH flux, and leucine could not override this inhibition. 15NH4Cl tracing studies showed 15N was not incorporated into glutamate in either H454Y transgenic or normal islets. In conclusion, we generated a GDH-HI disease mouse model that has a hypoglycemia phenotype and confirmed that the mutation of H454Y is disease causing. Stimulation of insulin release by the H454Y GDH mutation or by leucine activation is associated with increased oxidative deamination of glutamate via GDH. 
This study suggests that GDH functions predominantly in the direction of glutamate oxidation rather than glutamate synthesis in mouse islets and that this flux is tightly controlled by glucose.
Overgrowth caused by misexpression of a microRNA with dispensable wild-type functionNairz K, Rottig C, Rintelen F, Zdobnov EM, Moser M, Hafen E. Dev Biol. 2006 Mar 15;291(2):314-24. Epub 2006 Jan 27 PMID: 16443211 MicroRNAs (miRNAs) represent an abundant class of non-coding RNAs that negatively regulate gene expression, primarily at the post-transcriptional level. miRNA genes are frequently located in proximity to fragile chromosomal sites associated with cancers and amplification of a miRNA cluster has been correlated with the etiology of lymphomas and solid tumors. The oncogenic potential of a miRNA polycistron has recently been demonstrated in vivo. Here, we show that misexpression of the Drosophila miRNA mirvana/mir-278 in the developing eye causes massive overgrowth, in part due to inhibition of apoptosis. A single base substitution affecting the mature miRNA blocks the gain-of-function phenotype but is not associated with a detectable reduction-of-function phenotype when homozygous. This result demonstrates that misexpressed miRNAs may acquire novel functions that cause unscheduled proliferation in vivo and thus exemplifies the potential of miRNAs to promote tumor formation.
AnoEST: toward A. gambiae functional genomicsKriventseva EV, Koutsos AC, Blass C, Kafatos FC, Christophides GK, Zdobnov EM Genome Res. 2005 Jun;15(6):893-9. Epub 2005 May 17 PMID: 15899967 Here, we present an analysis of 215,634 EST and cDNA sequences of a major vector of human malaria Anopheles gambiae structured into the AnoEST database. The expressed sequences are grouped into clusters using genomic sequence as template and associated with inferred functional annotation, including the following: corresponding Ensembl gene prediction, putative orthologous genes in other species, homology to known proteins, protein domains, associated Gene Ontology terms, and corresponding classification into broad GO-slim functional groups. AnoEST is a vital resource for interpretation of expression profiles derived using recently developed A. gambiae cDNA microarrays. Using these cDNA microarrays, we have experimentally confirmed the expression of 7961 clusters during mosquito development. Of these, 3100 are not associated with currently predicted genes. Moreover, we found that clusters with confirmed expression are nonbiased with respect to the current gene annotation or homology to known proteins. Consequently, we expect that many as yet unconfirmed clusters are likely to be actual A. gambiae genes. [AnoEST is publicly available at http://komar.embl.de, and is also accessible as a Distributed Annotation Service (DAS).].
Consistency of genome-based methods in measuring Metazoan evolutionZdobnov EM, von Mering C, Letunic I, Bork P FEBS Lett. 2005 Jun 13;579(15):3355-61. Epub 2005 Apr 18 PMID: 15943981 Seven distinct genome-wide divergence measures were applied pairwise to the nine sequenced animal genomes of human, mouse, rat, chicken, pufferfish, fruit fly, mosquito, and two nematode worms (Caenorhabditis briggsae and Caenorhabditis elegans). Qualitatively, all of these divergence measures are found to correlate with the estimated time since speciation; however, marked deviations are observed in a few lineages. The distinct genome divergence measures also correlate well among themselves, indicating that most of the processes shaping genomes are dominated by neutral events. The deviations from the clock-like scenario in some lineages are observed consistently by several measures, implicitly confirming their reliability.
A common core of secondary structure of the internal transcribed spacer 2 (ITS2) throughout the EukaryotaSchultz J, Maisel S, Gerlach D, Muller T, Wolf M RNA. 2005 Apr;11(4):361-4 PMID: 15769870 The ongoing characterization of novel species creates the need for a molecular marker which can be used for species- and, simultaneously, for mega-systematics. Recently, the use of the internal transcribed spacer 2 (ITS2) sequence was suggested, as it shows a high divergence in sequence with an assumed conservation in structure. This hypothesis was mainly based on small-scale analyses, comparing a limited number of sequences. Here, we report a large-scale analysis of more than 54,000 currently known ITS2 sequences with the goal to evaluate the hypothesis of a conserved structural core and to assess its use for automated large-scale phylogenetics. Structure prediction revealed that the previously described core structure can be found for more than 5000 sequences in a wide variety of taxa within the eukaryotes, indicating that the core secondary structure is indeed conserved. This conserved structure allowed an automated alignment of extremely divergent sequences as exemplified for the ITS2 sequences of a ctenophorean eumetazoon and a volvocalean green alga. All classified sequences, together with their structures can be accessed at www.biozentrum.uni-wuerzburg.de/bioinformatik/projects/ITS2.html. Furthermore, we found that, although sample sequences are known for most major taxa, there exists a profound divergence in coverage, which might become a hindrance for general usage. In summary, our analysis strengthens the potential of ITS2 as a general phylogenetic marker and provides a data source for further ITS2-based analyses.
Cloning, characterization and analysis by RNA interference of various genes of the Chelonus inanitus polydnavirusBonvin M, Marti D, Wyder S, Kojic D, Annaheim M, Lanzrein B J Gen Virol. 2005 Apr;86(Pt 4):973-83 PMID: 15784890 Successful parasitism of some endoparasitic wasps depends on an obligately symbiotic association with polydnaviruses. These unique viruses have a segmented genome consisting of circles of double-stranded (ds) DNA and do not replicate in the parasitized host. They are produced in the wasp's ovary and injected into the host along with the egg. Chelonus inanitus is an egg-larval parasitoid; its polydnavirus (CiV) has been shown to protect the parasitoid larva from the host's immune system and to induce developmental arrest in the prepupal stage. The genome of CiV consists of at least 10-12 segments and five have been sequenced up to now. Here, the complete (CiV12g2) or partial (CiV12g1, CiV16.8g1) cloning of three new CiV genes is reported. All three occur only on one viral segment and have no similarity to other known polydnavirus genes, with the exception of a high similarity of CiV12g1 to CiV14g1 and CiV12g2 to CiV14g2. Furthermore, the first attempt of in vivo application of RNA interference to study the function of polydnavirus genes is shown. Injection of dsRNA of two late- and one early- and late-expressed CiV genes into CiV/venom-containing host eggs partially rescued last-instar larvae from developmental arrest. Injection of the same dsRNAs into parasitized eggs partially reduced parasitoid survival, mainly by preventing the successful emergence of the parasitoid from the host. These viral genes thus seem to be involved in inducing developmental arrest and in keeping the cuticle soft, which appears to be necessary for parasitoid emergence and host feeding.
Protein coding potential of retroviruses and other transposable elements in vertebrate genomesZdobnov EM, Campillos M, Harrington ED, Torrents D, Bork P Nucleic Acids Res. 2005 Feb 16;33(3):946-54. Print 2005 PMID: 15716312 We suggest an annotation strategy for genes encoded by retroviruses and transposable elements (RETRA genes) based on a set of marker protein domains. Usually RETRA genes are masked in vertebrate genomes prior to the application of automated gene prediction pipelines under the assumption that they provide no selective advantage to the host. Yet, we show that about 1000 genes in four vertebrate gene sets analyzed contain at least one RETRA gene marker domain. Using the conservation of genomic neighborhood (synteny), we were able to discriminate between RETRA genes with putative functionality in the vertebrates and those that probably function only in the context of mobile elements. We identified 35 such genes in human, along with their corresponding mouse and rat orthologs; which included almost all known human genes with similarity to mobile elements. The results also imply that the vast majority of the remaining RETRA genes in current gene sets are unlikely to encode vertebrate functions. To automatically annotate RETRA genes in other vertebrate genomes, we provide as a tool a set of marker protein domains and a manually refined list of domesticated or ancestral RETRA genes for rescuing genes with vertebrate functions.
Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolutionInternational Chicken Genome Sequencing Consortium Nature. 2004 Dec 9;432(7018):695-716. Erratum in: Nature. 2005 Feb 17;433(7027):777 PMID: 15592404 We present here a draft genome sequence of the red jungle fowl, Gallus gallus. Because the chicken is a modern descendant of the dinosaurs and the first non-mammalian amniote to have its genome sequenced, the draft sequence of its genome--composed of approximately one billion base pairs of sequence and an estimated 20,000-23,000 genes--provides a new perspective on vertebrate genome evolution, while also improving the annotation of mammalian genomes. For example, the evolutionary distance between chicken and human provides high specificity in detecting functional elements, both non-coding and coding. Notably, many conserved non-coding sequences are far from genes and cannot be assigned to defined functional classes. In coding regions the evolutionary dynamics of protein domains and orthologous groups illustrate processes that distinguish the lineages leading to birds and mammals. The distinctive properties of avian microchromosomes, together with the inferred patterns of conserved synteny, provide additional insights into vertebrate chromosome architecture.
Methylated lysine 79 of histone H3 targets 53BP1 to DNA double-strand breaks.Huyen Y, Zgheib O, Ditullio RA Jr, Gorgoulis VG, Zacharatos P, Petty TJ, Sheston EA, Mellert HS, Stavridi ES, Halazonetis TD Nature. 2004 Nov 18;432(7015):406-11. PMID: 15525939 The mechanisms by which eukaryotic cells sense DNA double-strand breaks (DSBs) in order to initiate checkpoint responses are poorly understood. 53BP1 is a conserved checkpoint protein with properties of a DNA DSB sensor. Here, we solved the structure of the domain of 53BP1 that recruits it to sites of DSBs. This domain consists of two tandem tudor folds with a deep pocket at their interface formed by residues conserved in the budding yeast Rad9 and fission yeast Rhp9/Crb2 orthologues. In vitro, the 53BP1 tandem tudor domain bound histone H3 methylated on Lys 79 using residues that form the walls of the pocket; these residues were also required for recruitment of 53BP1 to DSBs. Suppression of DOT1L, the enzyme that methylates Lys 79 of histone H3, also inhibited recruitment of 53BP1 to DSBs. Because methylation of histone H3 Lys 79 was unaltered in response to DNA damage, we propose that 53BP1 senses DSBs indirectly through changes in higher-order chromatin structure that expose the 53BP1 binding site.
Stage-dependent expression of Chelonus inanitus polydnavirus genes in the host and the parasitoidBonvin M, Kojic D, Blank F, Annaheim M, Wehrle I, Wyder S, Kaeslin M, Lanzrein B J Insect Physiol. 2004 Nov;50(11):1015-26 PMID: 15607504 Chelonus inanitus (Braconidae) is a solitary egg-larval parasitoid of Spodoptera littoralis (Noctuidae). Along with the egg it also injects polydnaviruses (CiV) and venom, which are prerequisites for successful parasitoid development. CiV protects the parasitoid from encapsulation by the host's immune system and induces a developmental arrest in the prepupal stage. The polydnavirus genome consists of several double-stranded circular DNA segments. Proviral DNA is integrated in the wasp's genome and virus replication is restricted to the wasp's ovary. Here, the analysis of eight CiV genes located on five different segments revealed four patterns of expression in the course of parasitization: early, late, persistent but variable, and early and late. The comparison between parasitized and CiV/venom only containing hosts indicated that the presence of the parasitoid larva modulates transcript levels. Haemocytes, fat body and nervous tissue contained viral transcripts, values being highest in haemocytes. Small amounts of CiV transcripts were also observed in parasitoid larvae and pupae, suggesting transcription from the proviral integrated form of viral DNA. This is the first comparative analysis of the expression patterns of several viral genes in both parasitized and CiV/venom only containing hosts over the entire period of parasitization, and it reveals intricate interactions between the parasitoid, the polydnavirus and the host.
Genome sequence of the Brown Norway rat yields insights into mammalian evolutionGenome Sequencing Consortium Nature. 2004 Apr 1;428(6982):493-521 PMID: 15057822 The laboratory rat (Rattus norvegicus) is an indispensable tool in experimental medicine and drug development, having made inestimable contributions to human health. We report here the genome sequence of the Brown Norway (BN) rat strain. The sequence represents a high-quality 'draft' covering over 90% of the genome. The BN rat sequence is the third complete mammalian genome to be deciphered, and three-way comparisons with the human and mouse genomes resolve details of mammalian evolution. This first comprehensive analysis includes genes and proteins and their relation to human disease, repeated sequences, comparative genome-wide studies of mammalian orthologous chromosomal regions and rearrangement breakpoints, reconstruction of ancestral karyotypes and the events leading to existing species, rates of variation, and lineage-specific and lineage-independent evolutionary events such as expansion of gene families, orthology relations and protein evolution.
A genome-wide survey of human pseudogenesTorrents D, Suyama M, Zdobnov E, Bork P Genome Res. 2003 Dec;13(12):2559-67 PMID: 14656963 We screened all intergenic regions in the human genome to identify pseudogenes with a combination of homology searches and a functionality test using the ratio of silent to replacement nucleotide substitutions (KA/KS). We identified 19,724 regions of which 95% +/- 3% are estimated to evolve neutrally and thus are likely to encode pseudogenes. Half of these have no detectable truncation in their pseudocoding regions and therefore are not identifiable by methods that require the presence of truncations to prove nonfunctionality. A comparative analysis with the mouse genome showed that 70% of these pseudogenes have a retrotranspositional origin (processed), and the rest arose by segmental duplication (nonprocessed). Although the spread of both types of pseudogenes correlates with chromosome size, nonprocessed pseudogenes appear to be enriched in regions with high gene density. It is likely that the human pseudogenes identified here represent only a small fraction of the total, which probably exceeds the number of genes.
Genome evolution reveals biochemical networks and functional modulesvon Mering C, Zdobnov EM, Tsoka S, Ciccarelli FD, Pereira-Leal JB, Ouzounis CA, Bork P Proc Natl Acad Sci U S A. 2003 Dec 23;100(26):15428-33. Epub 2003 Dec 12 PMID: 14673105 The analysis of completely sequenced genomes uncovers an astonishing variability between species in terms of gene content and order. During genome history, the genes are frequently rearranged, duplicated, lost, or transferred horizontally between genomes. These events appear to be stochastic, yet they are under selective constraints resulting from the functional interactions between genes. These genomic constraints form the basis for a variety of techniques that employ systematic genome comparisons to predict functional associations among genes. The most powerful techniques to date are based on conserved gene neighborhood, gene fusion events, and common phylogenetic distributions of gene families. Here we show that these techniques, if integrated quantitatively and applied to a sufficiently large number of genomes, have reached a resolution which allows the characterization of function at a higher level than that of the individual gene: global modularity becomes detectable in a functional protein network. In Escherichia coli, the predicted modules can be benchmarked by comparison to known metabolic pathways. We found as many as 74% of the known metabolic enzymes clustering together in modules, with an average pathway specificity of at least 84%. The modules extend beyond metabolism, and have led to hundreds of reliable functional predictions both at the protein and pathway level. The results indicate that modularity in protein networks is intrinsically encoded in present-day genomes.
Ovary development and polydnavirus morphogenesis in the parasitic wasp Chelonus inanitus. I. Ovary morphogenesis, amplificationMarti D, Grossniklaus-Burgin C, Wyder S, Wyler T, Lanzrein B J Gen Virol. 2003 May;84(Pt 5):1141-50 PMID: 12692279 Polydnaviruses are unique symbiotic viruses that are replicated in the calyx cells of the ovary of some parasitic wasps. They have a segmented genome of circular double-stranded DNA and are injected along with the wasp's egg into the host, where they are essential for successful parasitism. Polydnaviruses replicate from integrated proviral DNA, and after excision of viral segments, flanking DNA is rejoined. Little is known about ovarian morphogenesis, the mode of amplification of the viral DNA and the involvement of ecdysteroids. Here we have analysed these parameters in the course of pupal-adult development in the braconid wasp Chelonus inanitus. Immediately after pupation, ovarian cells proliferated and calyx cells began to differentiate; at this stage ecdysteroids, in particular 20-hydroxyecdysone, were highest. Thereafter, calyx cells began to increase in size and DNA content and eventually became gigantic. Amplification of non-viral DNA (actin) and viral DNA in its integrated and excised form and of corresponding rejoined flanking regions was measured by quantitative real-time PCR. In the early phase of calyx cell differentiation, copy numbers of actin and integrated viral DNA increased to a similar extent. This, along with the increase in nuclear volume and DNA content in the absence of extensive cell proliferation, suggested polyploidization of the early stage calyx cells. In the following phase, integrated viral DNA was selectively and intensively amplified and eventually excised and circularized. As copy numbers of excised circular viral DNA and rejoined flanking DNA reached similarly high levels, excised viral DNA appeared not to replicate. After adult eclosion, amplification of viral DNA declined.
Fate of polydnavirus DNA of the egg-larval parasitoid Chelonus inanitus in the host Spodoptera littoralisWyder S, Blank F, Lanzrein B J Insect Physiol. 2003 May;49(5):491-500 PMID: 12770628 In situ hybridizations show that 5 min after parasitization, polydnavirus DNA is in close vicinity of the parasitoid egg, but 5 h later also in the yolk and partially in the host embryo. Fifteen hours after parasitization, the viral DNA is seen all over the host embryo and hardly in the yolk. The tissue distribution of the viral DNA was analysed and quantified by dot blots in the fifth instar parasitized larvae. On a per host basis, haemocytes and fat body contained the highest amount of viral DNA, while nervous tissue, intestinal tract and carcass contained less. Of the three viral segments tested, all were found in all tissues. Relative to the quantity of host DNA, viral DNA was most abundant in haemocytes, about five times less abundant in fat body and nervous tissue and about 25 times less abundant in intestinal tract. The total quantity of viral DNA per host was 444+/-145 pg which is similar to the quantity injected by the wasp; thus, the viral DNA persists throughout parasitization. The parasitoid larva contains 820+/-80 pg viral DNA integrated in the genome. This illustrates that the dose of viral DNA injected in virions represents approximately one third of the total viral genomic information present in a host at a late stage of parasitism.
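The "approximately one third" figure in the abstract above can be checked with a short calculation. The sketch below assumes that the total viral DNA in a late-stage host is the sum of the two mean quantities quoted in the abstract (444 pg in host tissues and 820 pg integrated in the parasitoid larva); that reading, and the variable and function names, are illustrative assumptions and are not taken from the paper. The reported uncertainties (+/-145 pg and +/-80 pg) are ignored.

```python
# Mean quantities quoted in the abstract, in picograms per host.
injected_pg = 444.0    # viral DNA found in host tissues (similar to the injected dose)
integrated_pg = 820.0  # viral DNA integrated in the parasitoid larva's genome


def injected_fraction(injected: float, integrated: float) -> float:
    """Share of the total viral DNA that came from injected virions.

    Assumption: total = injected + integrated (our reading of the abstract).
    """
    return injected / (injected + integrated)


fraction = injected_fraction(injected_pg, integrated_pg)
print(f"Injected share of total viral DNA: {fraction:.2f}")
```

Under this assumption the injected dose is 444 of 1264 pg, which is consistent with the abstract's statement of roughly one third.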
The InterPro Database, 2003 brings increased coverage and new featuresMulder NJ, Apweiler R, Attwood TK, Bairoch A, Barrell D, Bateman A, Binns D, Biswas M, Bradley P, Bork P, Bucher P, Copley RR, Courcelle E, Das U, Durbin R, Falquet L, Fleischmann W, Griffiths-Jones S, Haft D, Harte N, Hulo N, Kahn D, Kanapin A, Krestyaninova M, Lopez R, Letunic I, Lonsdale D, Silventoinen V, Orchard SE, Pagni M, Peyruc D, Ponting CP, Selengut JD, Servant F, Sigrist CJ, Vaughan R, Zdobnov EM Nucleic Acids Res. 2003 Jan 1;31(1):315-8 PMID: 12520011 InterPro, an integrated documentation resource of protein families, domains and functional sites, was created in 1999 as a means of amalgamating the major protein signature databases into one comprehensive resource. PROSITE, Pfam, PRINTS, ProDom, SMART and TIGRFAMs have been manually integrated and curated and are available in InterPro for text- and sequence-based searching. The results are provided in a single format that rationalises the results that would be obtained by searching the member databases individually. The latest release of InterPro contains 5629 entries describing 4280 families, 1239 domains, 95 repeats and 15 post-translational modifications. Currently, the combined signatures in InterPro cover more than 74% of all proteins in SWISS-PROT and TrEMBL, an increase of nearly 15% since the inception of InterPro. New features of the database include improved searching capabilities and enhanced graphical user interfaces for visualisation of the data. The database is available via a webserver (http://www.ebi.ac.uk/interpro) and anonymous FTP (ftp://ftp.ebi.ac.uk/pub/databases/interpro).
Initial sequencing and comparative analysis of the mouse genomeMouse Genome Sequencing Consortium; Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, Antonarakis SE, Attwood J, Baertsch R, Bailey J, Barlow K, Beck S, Berry E, Birren B, Bloom T, Bork P, Botcherby M, Bray N, Brent MR, Brown DG, Brown SD, Bult C, Burton J, Butler J, Campbell RD, Carninci P, Cawley S, Chiaromonte F, Chinwalla AT, Church DM, Clamp M, Clee C, Collins FS, Cook LL, Copley RR, Coulson A, Couronne O, Cuff J, Curwen V, Cutts T, Daly M, David R, Davies J, Delehaunty KD, Deri J, Dermitzakis ET, Dewey C, Dickens NJ, Diekhans M, Dodge S, Dubchak I, Dunn DM, Eddy SR, Elnitski L, Emes RD, Eswara P, Eyras E, Felsenfeld A, Fewell GA, Flicek P, Foley K, Frankel WN, Fulton LA, Fulton RS, Furey TS, Gage D, Gibbs RA, Glusman G, Gnerre S, Goldman N, Goodstadt L, Grafham D, Graves TA, Green ED, Gregory S, Guigo R, Guyer M, Hardison RC, Haussler D, Hayashizaki Y, Hillier LW, Hinrichs A, Hlavina W, Holzer T, Hsu F, Hua A, Hubbard T, Hunt A, Jackson I, Jaffe DB, Johnson LS, Jones M, Jones TA, Joy A, Kamal M, Karlsson EK, Karolchik D, Kasprzyk A, Kawai J, Keibler E, Kells C, Kent WJ, Kirby A, Kolbe DL, Korf I, Kucherlapati RS, Kulbokas EJ, Kulp D, Landers T, Leger JP, Leonard S, Letunic I, Levine R, Li J, Li M, Lloyd C, Lucas S, Ma B, Maglott DR, Mardis ER, Matthews L, Mauceli E, Mayer JH, McCarthy M, McCombie WR, McLaren S, McLay K, McPherson JD, Meldrim J, Meredith B, Mesirov JP, Miller W, Miner TL, Mongin E, Montgomery KT, Morgan M, Mott R, Mullikin JC, Muzny DM, Nash WE, Nelson JO, Nhan MN, Nicol R, Ning Z, Nusbaum C, O'Connor MJ, Okazaki Y, Oliver K, Overton-Larty E, Pachter L, Parra G, Pepin KH, Peterson J, Pevzner P, Plumb R, Pohl CS, Poliakov A, Ponce TC, Ponting CP, Potter S, Quail M, Reymond A, Roe BA, Roskin KM, Rubin EM, Rust AG, Santos R, Sapojnikov V, Schultz B, Schultz J, Schwartz MS, Schwartz S, Scott C, Seaman S, Searle S, 
Sharpe T, Sheridan A, Shownkeen R, Sims S, Singer JB, Slater G, Smit A, Smith DR, Spencer B, Stabenau A, Stange-Thomann N, Sugnet C, Suyama M, Tesler G, Thompson J, Torrents D, Trevaskis E, Tromp J, Ucla C, Ureta-Vidal A, Vinson JP, Von Niederhausern AC, Wade CM, Wall M, Weber RJ, Weiss RB, Wendl MC, West AP, Wetterstrand K, Wheeler R, Whelan S, Wierzbowski J, Willey D, Williams S, Wilson RK, Winter E, Worley KC, Wyman D, Yang S, Yang SP, Zdobnov EM, Zody MC, Lander ES Nature. 2002 Dec 5;420(6915):520-62 PMID: 12466850 The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
Comparative genome and proteome analysis of Anopheles gambiae and Drosophila melanogasterZdobnov EM, von Mering C, Letunic I, Torrents D, Suyama M, Copley RR, Christophides GK, Thomasova D, Holt RA, Subramanian GM, Mueller HM, Dimopoulos G, Law JH, Wells MA, Birney E, Charlab R, Halpern AL, Kokoza E, Kraft CL, Lai Z, Lewis S, Louis C, Barillas-Mury C, Nusskern D, Rubin GM, Salzberg SL, Sutton GG, Topalis P, Wides R, Wincker P, Yandell M, Collins FH, Ribeiro J, Gelbart WM, Kafatos FC, Bork P Science. 2002 Oct 4;298(5591):149-59 PMID: 12364792 Comparison of the genomes and proteomes of the two diptera Anopheles gambiae and Drosophila melanogaster, which diverged about 250 million years ago, reveals considerable similarities. However, numerous differences are also observed; some of these must reflect the selection and subsequent adaptation associated with different ecologies and life strategies. Almost half of the genes in both genomes are interpreted as orthologs and show an average sequence identity of about 56%, which is slightly lower than that observed between the orthologs of the pufferfish and human (diverged about 450 million years ago). This indicates that these two insects diverged considerably faster than vertebrates. Aligned sequences reveal that orthologous genes have retained only half of their intron/exon structure, indicating that intron gains or losses have occurred at a rate of about one per gene per 125 million years. Chromosomal arms exhibit significant remnants of homology between the two species, although only 34% of the genes colocalize in small "microsyntenic" clusters, and major interarm transfers as well as intra-arm shuffling of gene order are detected.
The genome sequence of the malaria mosquito Anopheles gambiaeHolt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, de Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, Shao H, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, della Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH, Hoffman SL Science. 2002 Oct 4;298(5591):129-49 PMID: 12364791 Anopheles gambiae is the principal vector of malaria, a disease that afflicts more than 500 million people and causes more than 1 million deaths each year. Tenfold shotgun sequence coverage was obtained from the PEST strain of A. gambiae and assembled into scaffolds that span 278 million base pairs. A total of 91% of the genome was organized in 303 scaffolds; the largest scaffold was 23.1 million base pairs. 
There was substantial genetic variation within this strain, and the apparent existence of two haplotypes of approximately equal frequency ("dual haplotypes") in a substantial fraction of the genome likely reflects the outbred nature of the PEST strain. The sequence produced a conservative inference of more than 400,000 single-nucleotide polymorphisms that showed a markedly bimodal density distribution. Analysis of the genome sequence revealed strong evidence for about 14,000 protein-encoding transcripts. Prominent expansions in specific families of proteins likely involved in cell adhesion and immunity were noted. An expressed sequence tag analysis of genes regulated by blood feeding provided insights into the physiological adaptations of a hematophagous insect.
Immunity-related genes and gene families in Anopheles gambiaeChristophides GK, Zdobnov E, Barillas-Mury C, Birney E, Blandin S, Blass C, Brey PT, Collins FH, Danielli A, Dimopoulos G, Hetru C, Hoa NT, Hoffmann JA, Kanzok SM, Letunic I, Levashina EA, Loukeris TG, Lycett G, Meister S, Michel K, Moita LF, Muller HM, Osta MA, Paskewitz SM, Reichhart JM, Rzhetsky A, Troxler L, Vernick KD, Vlachou D, Volz J, von Mering C, Xu J, Zheng L, Bork P, Kafatos FC. Science. 2002 Oct 4;298(5591):159-65 PMID: 12364793 We have identified 242 Anopheles gambiae genes from 18 gene families implicated in innate immunity and have detected marked diversification relative to Drosophila melanogaster. Immune-related gene families involved in recognition, signal modulation, and effector systems show a marked deficit of orthologs and excessive gene expansions, possibly reflecting selection pressures from different pathogens encountered in these insects' very different life-styles. In contrast, the multifunctional Toll signal transduction pathway is substantially conserved, presumably because of counterselection for developmental stability. Representative expression profiles confirm that sequence diversification is accompanied by specific responses to different immune challenges. Alternative RNA splicing may also contribute to expansion of the immune repertoire.
The EBI SRS server-new featuresZdobnov EM, Lopez R, Apweiler R, Etzold T Bioinformatics. 2002 Aug;18(8):1149-50 PMID: 12176845 MOTIVATION: Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to major databases in the field of molecular biology produced and supported at EBI as well as European public access point to the MEDLINE database provided by US National Library of Medicine (NLM). It is a reference server for latest developments in data and application integration. The new additions include: concept of virtual databases, integration of XML databases like the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE, Metabolic pathways, etc., user friendly data representation in 'Nice views', SRSQuickSearch bookmarklets. AVAILABILITY: SRS6 is a licensed product of LION Bioscience AG freely available for academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.
Comparative genomic analysis in the region of a major Plasmodium-refractoriness locus of Anopheles gambiae. Thomasova D, Ton LQ, Copley RR, Zdobnov EM, Wang X, Hong YS, Sim C, Bork P, Kafatos FC, Collins FH. Proc Natl Acad Sci U S A. 2002 Jun 11;99(12):8179-84 PMID: 12060762 We have sequenced six overlapping clones from a library of bacterial artificial chromosome (BAC) clones derived from a laboratory strain of the mosquito, Anopheles gambiae, the major vector of human malaria in Africa. The resulting uninterrupted 528-kb sequence is from the 8C region of the mosquito 2R chromosome, at or very near the major refractoriness locus associated with melanotic encapsulation of parasites. This sequence represents the first extensive view of the mosquito genome structure encompassing 48 genes. Genomic comparison reveals that the majority of the orthologues are found in six microsyntenic clusters in Drosophila melanogaster. A BAC clone that is wholly contained within this region demonstrates the existence of a remarkable degree of local polymorphism in this species, which may prove important for its population structure and vectorial capacity.
Interactive InterPro-based comparisons of proteins in whole genomes. Kanapin A, Apweiler R, Biswas M, Fleischmann W, Karavidopoulou Y, Kersey P, Kriventseva EV, Mittard V, Mulder N, Oinn T, Phan I, Servant F, Zdobnov E. Bioinformatics. 2002 Feb;18(2):374-5 PMID: 11847096 MOTIVATION: The SWISS-PROT group at the EBI has developed the Proteome Analysis Database utilizing existing resources and providing comprehensive and integrated comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes. The Proteome Analysis Database is accompanied by a program that has been designed to carry out interactive InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
The EBI SRS server--recent developments. Zdobnov EM, Lopez R, Apweiler R, Etzold T. Bioinformatics. 2002 Feb;18(2):368-73 PMID: 11847095 MOTIVATION: Here we report on recent developments at the EBI SRS server (http://srs.ebi.ac.uk). SRS has become an integration system for both data retrieval and sequence analysis applications. The EBI SRS server is a primary gateway to major databases in the field of molecular biology produced and supported at EBI as well as European public access point to the MEDLINE database provided by US National Library of Medicine (NLM). It is a reference server for latest developments in data and application integration. The new additions include: concept of virtual databases, integration of XML databases like the Integrated Resource of Protein Domains and Functional Sites (InterPro), Gene Ontology (GO), MEDLINE, Metabolic pathways, etc., user friendly data representation in 'Nice views', SRSQuickSearch bookmarklets. AVAILABILITY: SRS6 is a licensed product of LION Bioscience AG freely available for academics. The EBI SRS server (http://srs.ebi.ac.uk) is a free central resource for molecular biology data as well as a reference server for the latest developments in data integration.
Characterization of Chelonus inanitus polydnavirus segments: sequences and analysis, excision site and demonstration of clusteri. Wyder S, Tschannen A, Hochuli A, Gruber A, Saladin V, Zumbach S, Lanzrein B. J Gen Virol. 2002 Jan;83(Pt 1):247-56 PMID: 11752722 Polydnaviruses (genera Ichnovirus and Bracovirus) have a segmented genome of circular double-stranded DNA molecules, replicate in the ovary of parasitic wasps and are essential for successful parasitism of the host. Here we show the first detailed analysis of various segments of a bracovirus, the Chelonus inanitus virus (CiV). Four segments were sequenced and two of them, CiV12 and CiV14, were found to be closely related while CiV14.5 and CiV16.8 were unrelated. CiV12, CiV14.5 and CiV16.8 are unique while CiV14 occurs also nested in another larger segment. All four segments are predicted to contain genes and predictions could be substantiated in most cases. Comparison with databases revealed no significant similarities at either the nucleotide or amino acid level. Inverted repeats with identities between 77% and 92% and lengths between 26 bp and 100 bp were found on all segments outside of predicted genes. Hybridization experiments indicate that CiV12 and CiV14 are both flanked by other virus segments, suggesting that proviral CiV segments are clustered in the genome of the wasp. The integration/excision site of CiV14 was analysed and compared to that of CiV12. On both termini of proviral CiV12 and CiV14 as well as in the excised circular molecule and the rejoined DNA a very similar repeat of 14 bp was found. A model to illustrate where the terminal repeats might recombine to yield the circular molecule is presented. Excision of CiV12 and CiV14 is restricted to the female and sets in at a very specific time-point in pupal-adult development.
mmsearch: a motif arrangement language and search program. Junier T, Pagni M, Bucher P. Bioinformatics. 2001 Dec;17(12):1234-5 PMID: 11751236 This paper presents a language for describing arrangements of motifs in biological sequences, and a program that uses the language to find the arrangements in motif match databases. The program does not by itself search for the constituent motifs, and is thus independent of how they are detected, which allows it to use motif match data of various origins. AVAILABILITY: The program can be tested online at http://hits.isb-sib.ch and the distribution is available from ftp://ftp.isrec.isb-sib.ch/pub/software/unix/mmsearch-1.0.tar.gz CONTACT: Thomas.Junier@isrec.unil.ch SUPPLEMENTARY INFORMATION: The full documentation about mmsearch is available from http://hits.isb-sib.ch/~tjunier/mmsearch/doc.
Proteome Analysis Database: online application of InterPro and CluSTr for the functional classification of proteins in whole gen. Apweiler R, Biswas M, Fleischmann W, Kanapin A, Karavidopoulou Y, Kersey P, Kriventseva EV, Mittard V, Mulder N, Phan I, Zdobnov E. Nucleic Acids Res. 2001 Jan 1;29(1):44-8. PMID: 11125045 The SWISS-PROT group at EBI has developed the Proteome Analysis Database utilising existing resources and providing comparative analysis of the predicted protein coding sequences of the complete genomes of bacteria, archaea and eukaryotes (http://www.ebi.ac.uk/proteome/). The two main projects used, InterPro and CluSTr, give a new perspective on families, domains and sites and cover 31-67% (InterPro statistics) of the proteins from each of the complete genomes. CluSTr covers the three complete eukaryotic genomes and the incomplete human genome data. The Proteome Analysis Database is accompanied by a program that has been designed to carry out InterPro proteome comparisons for any one proteome against any other one or more of the proteomes in the database.
trEST, trGEN and Hits: access to databases of predicted protein sequences. Pagni M, Iseli C, Junier T, Falquet L, Jongeneel V, Bucher P. Nucleic Acids Res. 2001 Jan 1;29(1):148-51 PMID: 11125074 High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
Dotlet: diagonal plots in a web browser. Junier T, Pagni M. Bioinformatics. 2000 Feb;16(2):178-9 PMID: 10842741 Dotlet is a program for comparing sequences by the diagonal plot method. It is designed to be platform-independent and to run in a Web browser, thus enabling the majority of researchers to use it. AVAILABILITY: The applet can be tested at http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html, and the source code is available upon request. CONTACT: Thomas.Junier@isrec.unil.ch, Marco.Pagni@isrec.unil.ch SUPPLEMENTARY: The full documentation about dotlet is available from the above URL.
The eukaryotic promoter database (EPD). Perier RC, Junier T, Bonnard C, Bucher P. Nucleic Acids Res. 2000 Jan 1;28(1):302-3 PMID: 10592254 The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. WWW-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch
The Eukaryotic Promoter Database (EPD): recent developments. Perier RC, Junier T, Bonnard C, Bucher P. Nucleic Acids Res. 1999 Jan 1;27(1):307-9 PMID: 9847211 The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Recent efforts have focused on exhaustive cross-referencing to the EMBL nucleotide sequence database, and on the improvement of the WWW-based user interfaces and data retrieval mechanisms. EPD can be accessed at http://www.epd.isb-sib.ch
Evaluation of computer tools for the prediction of transcription factor binding sites on genomic DNA. Roulot E, Fisch I, Junier T, Bucher P, Mermod N. In Silico Biol. 1998;1(1):21-8 PMID: 11471239 Computational molecular biology tools are becoming the method of choice for high throughput screening of newly determined DNA sequences. Such bioinformatic methods indeed offer invaluable tools for the analysis of novel genomic sequences, as they allow for instance the identification of candidate disease-responsible genes [see Rawlings and Searls, 1997, for a review]. Effective DNA sequence analysis demands not only the faithful identification of gene elements and boundaries, but it also requires reliable information on the potential function and regulation of the identified genes. Consequently, powerful software tools are more and more relying on the coupling and integration of various prediction algorithms. Such integrated systems should include devices for the recognition of DNA sequences that act as binding sites for regulatory proteins known as transcription factors. The identification of such sites is not only relevant for locating the promoter as the 5' boundary of a gene, but they may also allow the prediction of a tissue-specific gene-expression pattern and responsiveness to known biological signaling pathways. However, binding sites for sequence-specific DNA-binding transcription factors are typically short and degenerate, and their efficient prediction requires sophisticated computational tools. Databases of promoter and transcription factors have been established [Bucher, 1990; Ghosh, 1993; Wingender et al., 1997], and these compiled data were in turn used for the development of algorithms and program packages for the identification of transcription factor binding sites on DNA
SEView: a Java applet for browsing molecular sequence data. Junier T, Bucher P. In Silico Biol. 1998;1(1):13-20 PMID: 11471238 SEView is a Java applet that represents known or predicted elements of a protein or nucleotide sequence. It replaces or supplements the textual format of databases or program output with an interactive, graphical representation that is easily available through a WWW browser. Independence from the source data's format is achieved through a description language and ad hoc translators, which make the system versatile and flexible.
The Eukaryotic Promoter Database EPD. Cavin Perier R, Junier T, Bucher P. Nucleic Acids Res. 1998 Jan 1;26(1):353-7 PMID: 9399872 The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of experimentally characterised eukaryotic POL II promoters. The underlying definition of a promoter is that of a transcription initiation site. All information presented in EPD results from an independent evaluation of primary experimental data shown in the biological literature. Sequences flanking transcription initiation sites are indirectly given by pointers to EMBL sequences. The annotation part of a promoter entry includes description of the promoter-defining evidence, cross-references to other databases, and bibliographic references. Being designed as a resource for comparative sequence analysis, EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets. The database is available through the World Wide Web at URL http://cmpteam4.unil.ch
Experimental evidence for slipped loop DNA, a novel folding type for polynucleotide chain. Minyat EE, Khomyakova EB, Petrova MV, Zdobnov EM, Ivanov VI. J Biomol Struct Dyn. 1995 Dec;13(3):523-7 PMID: 8825732 DNA regions with short direct repeats (5-7bp) with a spacer in between, when under super-helical stress, are known to become susceptible to single-strand specific nuclease S1. This is in accord with formation of two shifted loops protruding from the opposite chains. Such type of folding could have been additionally stabilized by base pairing between the complementary parts of the loops that explains existence of the protected from S1 moieties of the loops. To test this possibility we designed and synthesized an oligonucleotide of 56 bases, so that it forms a hairpin with a stem which fails to acquire a traditional helix due to a special sequence but may favor the formation of the proposed Slipped Loop Structure (SLS). The oligonucleotide folding was studied by a chemical modification method at one nucleotide level resolution. Three zones, protected from the used probes were found: the one that forms the stem, and the others that are located within the two by-loops in those moieties which have the base pairing potential. Proceeding from the data obtained and stereochemical analysis a 3-D scheme for the SLS form of DNA is suggested.