Rhinovirus genome variation during chronic upper and lower respiratory tract infections.
Tapparel C, Cordey S, Junier T, Farinelli L, Van Belle S, Soccal PM, Aubert JD, Zdobnov EM, Kaiser L. PLoS One PMID: 21713005
Routine screening of lung transplant recipients and hospital patients for respiratory virus infections allowed to identify human rhinovirus (HRV) in the upper and lower respiratory tracts, including immunocompromised hosts chronically infected with the same strain over weeks or months. Phylogenetic analysis of 144 HRV-positive samples showed no apparent correlation between a given viral genotype or species and their ability to invade the lower respiratory tract or lead to protracted infection. By contrast, protracted infections were found almost exclusively in immunocompromised patients, thus suggesting that host factors rather than the virus genotype modulate disease outcome, in particular the immune response. Complete genome sequencing of five chronic cases to study rhinovirus genome adaptation showed that the calculated mutation frequency was in the range observed during acute human infections. Analysis of mutation hot spot regions between specimens collected at different times or in different body sites revealed that non-synonymous changes were mostly concentrated in the viral capsid genes VP1, VP2 and VP3, independent of the HRV type. In an immunosuppressed lung transplant recipient infected with the same HRV strain for more than two years, both classical and ultra-deep sequencing of samples collected at different time points in the upper and lower respiratory tracts showed that these virus populations were phylogenetically indistinguishable over the course of infection, except for the last month. Specific signatures were found in the last two lower respiratory tract populations, including changes in the 5'UTR polypyrimidine tract and the VP2 immunogenic site 2.
These results highlight for the first time the ability of a given rhinovirus to evolve in the course of a natural infection in immunocompromised patients and complement data obtained from previous experimental inoculation studies in immunocompetent volunteers.
The Newick Utilities: High-throughput Phylogenetic tree Processing in the UNIX Shell
Junier T, Zdobnov EM Bioinformatics. 2010 May 13 PMID: 20472542
Summary: We present a suite of UNIX shell programs for processing any number of phylogenetic trees of any size. They perform frequently-used tree operations without requiring user interaction. They also allow tree drawing as scalable vector graphics (SVG), suitable for high-quality presentations and further editing, and as ASCII graphics for command-line inspection. As an example we include an implementation of bootscanning, a procedure for finding recombination breakpoints in viral genomes.
Availability: C source code, Python bindings, and executables for various platforms are available from http://cegg.unige.ch/newick_utils. The distribution includes a manual and example data. The package is distributed under the BSD License.
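For readers unfamiliar with the Newick format that these programs process, the following is a minimal Python sketch of reading a tree and listing its leaves. It is not part of the Newick Utilities and does not use their Python bindings; the function names `parse_newick` and `leaf_labels` are hypothetical, and the parser assumes a simplified Newick dialect without quoted labels or comments.

```python
def parse_newick(text):
    """Parse a simplified Newick string into nested dicts.

    Each node is {'label': str, 'length': float or None, 'children': list}.
    Assumption: no quoted labels and no bracketed comments.
    """
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '('
            while True:
                children.append(parse_node())
                if text[pos] == ",":
                    pos += 1  # another sibling follows
                elif text[pos] == ")":
                    pos += 1  # end of this child list
                    break
                else:
                    raise ValueError(f"unexpected character at position {pos}")
        # Optional label (for inner nodes this is often a support value).
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        label = text[start:pos]
        # Optional branch length after ':'.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"label": label, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaf_labels(node):
    """Return the labels of all leaves below a node, left to right."""
    if not node["children"]:
        return [node["label"]]
    labels = []
    for child in node["children"]:
        labels.extend(leaf_labels(child))
    return labels


tree = parse_newick("((A:0.1,B:0.2)90:0.3,C:0.4);")
print(leaf_labels(tree))
```

The nested-dict representation is chosen only for brevity; it keeps the sketch self-contained and makes the recursive structure of the format visible.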
Rhinovirus Genome Evolution during Experimental Human Infection
Cordey S, Junier T, Gerlach D, Gobbini F, Farinelli L, Zdobnov EM, Winther B, Tapparel C, Kaiser L PLoS One. 2010 May 11;5(5):e10588 PMID: 20485673
Human rhinoviruses (HRVs) evolve rapidly due in part to their error-prone RNA polymerase. Knowledge of the diversity of HRV populations emerging during the course of a natural infection is essential and represents a basis for the design of future potential vaccines and antiviral drugs. To evaluate HRV evolution in humans, nasal wash samples were collected daily for five days from 15 immunocompetent volunteers experimentally infected with a reference stock of HRV-39. In parallel, HeLa-OH cells were inoculated to compare HRV evolution in vitro. Nasal wash in vivo assessed by real-time PCR showed a viral load that peaked at 48-72 h. Ultra-deep sequencing was used to compare the low-frequency mutation populations present in the HRV-39 inoculum in two human subjects and one HeLa-OH supernatant collected 5 days post-infection. The analysis revealed hypervariable mutation locations in VP2, VP3, VP1, 2C and 3C genes and conserved regions in VP4, 2A, 2B, 3A, 3B and 3D genes. These results were confirmed by classical sequencing of additional samples, both from inoculated volunteers and independent cell infections, and suggest that HRV inter-host transmission is not associated with a strong bottleneck effect. A specific analysis of the VP1 capsid gene of 15 human cases confirmed the high mutation incidence in this capsid region, but not in the antiviral drug-binding pocket. We could also estimate a mutation frequency in vivo of 3.4 × 10⁻⁴ mutations/nucleotides and 3.1 × 10⁻⁴ over the entire ORF and VP1 gene, respectively. In vivo, HRV generate new variants rapidly during the course of an acute infection due to mutations that accumulate in hot spot regions located at the capsid level, as well as in 2C and 3C genes.
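A mutation frequency of the kind quoted in this abstract is, in its simplest form, the number of observed mutations divided by the number of nucleotides sequenced. The Python sketch below illustrates only that arithmetic; the function name and the counts are hypothetical, chosen so that the result matches the order of magnitude reported, and the paper's actual procedure may differ.

```python
def mutation_frequency(mutations, nucleotides_sequenced):
    """Return mutations per nucleotide sequenced."""
    if nucleotides_sequenced <= 0:
        raise ValueError("nucleotides_sequenced must be positive")
    return mutations / nucleotides_sequenced


# Hypothetical counts, for illustration only (not data from the study):
# 17 mutations observed across 50,000 sequenced nucleotides.
freq = mutation_frequency(17, 50_000)
print(f"{freq:.1e} mutations per nucleotide")
```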
Functional and evolutionary insights from the genomes of three parasitoid Nasonia species
The Nasonia Genome Working Group (incl. Junier T, Gerlach D, Waterhouse RM, Kriventseva EV, Wyder S, Zdobnov EM) Science. 2010 Jan 15;327(5963):343-8. PMID: 20075255
We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.
The impact of transmission clusters on primary drug resistance in newly diagnosed HIV-1 infection
Yerly S, Junier T, Gayet-Ageron A, Amari EB, von Wyl V, Günthard HF, Hirschel B, Zdobnov EM, Kaiser L, and the Swiss HIV Cohort Study. AIDS. 2009 May 29. [Epub ahead of print] PMID: 19487906
OBJECTIVES: To monitor HIV-1 transmitted drug resistance (TDR) in a well defined urban area with large access to antiretroviral therapy and to assess the potential source of infection of newly diagnosed HIV individuals.
METHODS: All individuals resident in Geneva, Switzerland, with a newly diagnosed HIV infection between 2000 and 2008 were screened for HIV resistance. An infection was considered as recent when the positive test followed a negative screening test within less than 1 year. Phylogenetic analyses were performed by using the maximum likelihood method on pol sequences including 1058 individuals with chronic infection living in Geneva.
RESULTS: Of 637 individuals with newly diagnosed HIV infection, 20% had a recent infection. Mutations associated with resistance to at least one drug class were detected in 8.5% [nucleoside reverse transcriptase inhibitors (NRTIs), 6.3%; non-nucleoside reverse transcriptase inhibitors (NNRTIs), 3.5%; protease inhibitors, 1.9%]. TDR (P-trend = 0.015) and, in particular, NNRTI resistance (P = 0.002) increased from 2000 to 2008. Phylogenetic analyses revealed that 34.9% of newly diagnosed individuals, and 52.7% of those with recent infection were linked to transmission clusters. Clusters were more frequent in individuals with TDR than in those with sensitive strains (59.3 vs. 32.6%, respectively; P < 0.0001). Moreover, 84% of newly diagnosed individuals with TDR were part of clusters composed of only newly diagnosed individuals.
CONCLUSION: Reconstruction of the HIV transmission networks using phylogenetic analysis shows that newly diagnosed HIV infections are a significant source of onward transmission, particularly of resistant strains, thus suggesting an important self-fueling mechanism for TDR.
The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution
The Bovine Genome Sequencing and Analysis Consortium (incl. Gerlach D, Junier T, Kriventseva EV, Zdobnov EM) Science. 2009 Apr 24;324(5926):522-528 PMID: 19390049
To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
New respiratory enterovirus genotype and rhinovirus strains identified by genotyping circulating picornaviruses
Tapparel C, Junier T, Gerlach D, Van Belle S, Turin L, Cordey S, Muehlemann K, Regamey N, Aubert JD, Soccal PM, Eigenmann P, Zdobnov EM, Kaiser L Emerg Infect Dis. 2009 May;15(5):719-726 PMID: 19402957
Rhinoviruses and enteroviruses are leading causes of respiratory infections. To evaluate genotypic diversity and identify forces shaping picornavirus evolution, we screened persons with respiratory illnesses by using rhinovirus-specific or generic real-time PCR assays. We then sequenced the 5' untranslated region, capsid protein VP1, and protease precursor 3CD regions of virus-positive samples. Subsequent phylogenetic analysis identified the large genotypic diversity of rhinoviruses circulating in humans. We identified and completed the genome sequence of a new enterovirus genotype associated with respiratory symptoms and acute otitis media, confirming the close relationship between rhinoviruses and enteroviruses and the need to detect both viruses in respiratory specimens. Finally, we identified recombinants among circulating rhinoviruses and mapped their recombination sites, thereby demonstrating that rhinoviruses can recombine in their natural host. This study clarifies the diversity and explains the reasons for evolution of these viruses.
Community analysis of betaproteobacterial ammonia-oxidizing bacteria using the amoCAB operon.
Junier P, Kim OS, Junier T, Ahn TS, Imhoff JS, and Witzel KP Applied Microbiology and Biotechnology PMID: 19274459
The genes and intergenic regions of the amoCAB operon were analyzed to establish their potential as molecular markers for analyzing ammonia-oxidizing betaproteobacterial (beta-AOB) communities. Initially, sequence similarity for related taxa, evolutionary rates from linear regressions, and the presence of conserved and variable regions were analyzed for all available sequences of the complete amoCAB operon. The gene amoB showed the highest sequence variability of the three amo genes, suggesting that it might be a better molecular marker than the most frequently used amoA to resolve closely related AOB species. To test the suitability of using the amoCAB genes for community studies, a strategy involving nested PCR was employed. Primers to amplify the whole amoCAB operon and each individual gene were tested. The specificity of the products generated was analyzed by denaturing gradient gel electrophoresis, cloning, and sequencing. The fragments obtained showed different grades of sequence identity to amoCAB sequences in the GenBank database. The nested PCR approach provides a possibility to increase the sensitivity of detection of amo genes in samples with low abundance of AOB. It also allows the amplification of the almost complete amoA gene, with about 300 bp more sequence information than the previous approaches. The coupled study of all three amo genes and the intergenic spacer regions that are under different selection pressure might allow a more detailed analysis of the evolutionary processes, which are responsible for the differentiation of AOB communities in different habitats.
Composition of diazotrophic bacterial assemblages in bean-planted soil compared to unplanted soil
Junier P, Junier T, Witzel KP, Carú M European Journal of Soil Biology
The effect of common bean (Phaseolus vulgaris L.) on the composition of nitrogen fixing bacterial assemblages in soil was studied by comparing planted and unplanted soil. The community composition was studied by terminal restriction fragment length polymorphism (T-RFLP) of the nitrogenase reductase gene (nifH). Principal component analysis (PCA) of T-RFLP profiles showed the separation of profiles from planted and unplanted soil. Terminal restriction fragments (T-RFs) corresponding to rhizobial bacteria were identified preferentially in planted soil; however most nifH T-RFs in soil could not be assigned to T-RFs simulated from a database of known diazotrophs. To specifically study rhizobial bacteria in the soil and nodules, PCR products from the alpha subunit of the nitrogenase enzyme (nifD) were analyzed by denaturing gradient gel electrophoresis (DGGE). DGGE results showed the specific stimulation of the rhizobial microsymbionts in planted soil.
TRiFLe, a Program for In Silico Terminal Restriction Fragment Length Polymorphism Analysis with User-Defined Sequence Sets
Junier P, Junier T, Witzel KP Applied and Environmental Microbiology PMID: 18757578
We describe TRiFLe, a freely accessible computer program that generates theoretical terminal restriction fragments (T-RFs) from any user-supplied sequence set tailored to a particular group of organisms, sequences from clone libraries, or sequences from specific genes. The program allows a rapid identification of the most polymorphic enzymes, creates a collection of T-RFs for the data set, and can potentially identify specific T-RFs in T-RF length polymorphism (T-RFLP) patterns by comparing theoretical and experimental results. TRiFLe was used for analyzing T-RFLP data generated for the amoA and pmoA genes. The peaks identified in the T-RFLP patterns show an overlap of ammonia- and methane-oxidizing bacteria in the metalimnion of a subtropical lake.
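The idea of a theoretical terminal restriction fragment can be illustrated with a short Python sketch. This is not TRiFLe's code or interface: the function name, the enzyme table, and the example sequence are hypothetical, and the sketch assumes that the label sits at the 5' end of the given strand and that the enzyme has an unambiguous recognition site (here HaeIII, GG^CC, and MspI, C^CGG).

```python
# Recognition site and cut offset (bases from the start of the site
# to the cut position on the given strand).
ENZYMES = {
    "HaeIII": ("GGCC", 2),  # GG^CC
    "MspI": ("CCGG", 1),    # C^CGG
}


def terminal_fragment_length(sequence, site, cut_offset):
    """Length of the 5'-terminal fragment after digestion.

    Assumption: the labeled end is the 5' end of `sequence`. If the
    site is absent, the whole sequence remains as one fragment.
    """
    index = sequence.upper().find(site)
    if index == -1:
        return len(sequence)
    return index + cut_offset


def trf_table(sequences, enzymes=ENZYMES):
    """Map each sequence name to its theoretical T-RF length per enzyme."""
    return {
        name: {
            enzyme: terminal_fragment_length(seq, site, offset)
            for enzyme, (site, offset) in enzymes.items()
        }
        for name, seq in sequences.items()
    }


# Hypothetical toy sequence, for illustration only.
print(trf_table({"clone_1": "ATATGGCCTTTTCCGGAA"}))
```

Comparing such a table across enzymes is one simple way to see which enzyme separates a given sequence set into the most distinct fragment lengths.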
The cis-acting replication elements define human enterovirus and rhinovirus species
Cordey S, Gerlach D, Junier T, Zdobnov EM, Kaiser L, Tapparel C. RNA. 2008 Aug;14(8):1568-78 PMID: 18541697
Replication of picornaviruses is dependent on VPg uridylylation, which is linked to the presence of the internal cis-acting replication element (cre). Cre are located within the sequence encoding polyprotein, yet at distinct positions as demonstrated for poliovirus and coxsackievirus-B3, cardiovirus, and human rhinovirus (HRV-A and HRV-B), overlapping proteins 2C, VP2, 2A, and VP1, respectively. Here we report a novel distinct cre element located in the VP2 region of the recently reported HRV-A2 species and provide evolutionary evidence of its functionality. We also experimentally interrogated functionality of recently identified HRV-B cre in the 2C region that is orthologous to the human enterovirus (HEV) cre and show that it is dispensable for replication and appears to be a nonfunctional evolutionary relic. In addition, our mutational analysis highlights two amino acids in the 2C protein that are crucial for replication. Remarkably, we conclude that each genetic clade of HRV and HEV is characterized by a unique functional cre element, where evolutionary success of a new genetic lineage seems to be associated with an invention of a novel cre motif and decay of the ancestral one. Therefore, we propose that cre element could be considered as an additional criterion for human rhinovirus and enterovirus classification.
Genome-wide search reveals a novel GacA-regulated small RNA in Pseudomonas species.
González N, Heeb S, Valverde C, Kay E, Reimmann C, Junier T, Haas D. BMC Genomics PMID: 18405392
BACKGROUND: Small RNAs (sRNAs) are widespread among bacteria and have diverse regulatory roles. Most of these sRNAs have been discovered by a combination of computational and experimental methods. In Pseudomonas aeruginosa, a ubiquitous Gram-negative bacterium and opportunistic human pathogen, the GacS/GacA two-component system positively controls the transcription of two sRNAs (RsmY, RsmZ), which are crucial for the expression of genes involved in virulence. In the biocontrol bacterium Pseudomonas fluorescens CHA0, three GacA-controlled sRNAs (RsmX, RsmY, RsmZ) regulate the response to oxidative stress and the expression of extracellular products including biocontrol factors. RsmX, RsmY and RsmZ contain multiple unpaired GGA motifs and control the expression of target mRNAs at the translational level, by sequestration of translational repressor proteins of the RsmA family.
RESULTS: A combined computational and experimental approach enabled us to identify 14 intergenic regions encoding sRNAs in P. aeruginosa. Eight of these regions encode newly identified sRNAs. The intergenic region 1698 was found to specify a novel GacA-controlled sRNA termed RgsA. GacA regulation appeared to be indirect. In P. fluorescens CHA0, an RgsA homolog was also expressed under positive GacA control. This 120-nt sRNA contained a single GGA motif and, unlike RsmX, RsmY and RsmZ, was unable to derepress translation of the hcnA gene (involved in the biosynthesis of the biocontrol factor hydrogen cyanide), but contributed to the bacterium's resistance to hydrogen peroxide. In both P. aeruginosa and P. fluorescens the stress sigma factor RpoS was essential for RgsA expression.
CONCLUSION: The discovery of an additional sRNA expressed under GacA control in two Pseudomonas species highlights the complexity of this global regulatory system and suggests that the mode of action of GacA control may be more elaborate than previously suspected. Our results also confirm that several GGA motifs are required in an sRNA for sequestration of the RsmA protein.
Comparative in silico analysis of PCR primers suited for diagnostics and cloning of ammonia monooxygenase genes from ammonia-oxi
Junier P, Kim OS, Molina V, Limburg P, Junier T, Imhoff JF, Witzel KP. FEMS Microbiology Ecology
Over recent years, several PCR primers have been described to amplify genes encoding the structural subunits of ammonia monooxygenase (AMO) from ammonia-oxidizing bacteria (AOB). Most of them target amoA, while amoB and amoC have been neglected so far. This study compared the nucleotide sequence of 33 primers that have been used to amplify different regions of the amoCAB operon with alignments of all available sequences in public databases. The advantages and disadvantages of these primers are discussed based on the original description and the spectrum of matching sequences obtained. Additionally, new primers to amplify the almost complete amoCAB operon of AOB belonging to Betaproteobacteria (betaproteobacterial AOB), a primer pair for DGGE analysis of amoA and specific primers for gammaproteobacterial AOB, are also described. The specificity of these new primers was also evaluated using the databases of the sequences created during this study.
Biased distributions and decay of long interspersed nuclear elements in the chicken genome.
Abrusán G, Krambeck HJ, Junier T, Giordano J, Warburton PE. Genetics PMID: 17947446
The genomes of birds are much smaller than mammalian genomes, and transposable elements (TEs) make up only 10% of the chicken genome, compared with the 45% of the human genome. To study the mechanisms that constrain the copy numbers of TEs, and as a consequence the genome size of birds, we analyzed the distributions of LINEs (CR1's) and SINEs (MIRs) on the chicken autosomes and Z chromosome. We show that (1) CR1 repeats are longest on the Z chromosome and their length is negatively correlated with the local GC content; (2) the decay of CR1 elements is highly biased, and the 5'-ends of the insertions are lost much faster than their 3'-ends; (3) the GC distribution of CR1 repeats shows a bimodal pattern with repeats enriched in both AT-rich and GC-rich regions of the genome, but the CR1 families show large differences in their GC distribution; and (4) the few MIRs in the chicken are most abundant in regions with intermediate GC content. Our results indicate that the primary mechanism that removes repeats from the chicken genome is ectopic exchange and that the low abundance of repeats in avian genomes is likely to be the consequence of their high recombination rates.
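"Local GC content", as used in this abstract, is commonly computed as the fraction of G and C bases within a window of the sequence. The Python sketch below shows that calculation under simple assumptions; the function names, window size, and step are hypothetical and are not taken from the paper.

```python
def gc_content(sequence):
    """Fraction of G and C bases in a sequence (case-insensitive)."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def windowed_gc(sequence, window, step):
    """GC content of consecutive windows of fixed size.

    Trailing bases that do not fill a complete window are ignored.
    """
    return [
        gc_content(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, step)
    ]


# Toy sequence, for illustration only.
print(windowed_gc("GGCCATAT", window=4, step=4))
```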
New complete genome sequences of human rhinoviruses shed light on their phylogeny and genomic features
Tapparel C, Junier T, Gerlach D, Cordey S, Van Belle S, Perrin L, Zdobnov EM and Kaiser L BMC Genomics. 2007 Jul 10; 8(1):224 PMID: 17623054
Background
Human rhinoviruses (HRV), the most frequent cause of respiratory infections, include 99 different serotypes segregating into two species, A and B. Rhinoviruses share extensive genomic sequence similarity with enteroviruses and both are part of the picornavirus family. Nevertheless they differ significantly at the phenotypic level. The lack of HRV full-length genome sequences and the absence of analysis comparing picornaviruses at the whole genome level limit our knowledge of the genomic features supporting these differences.
Results
Here we report complete genome sequences of 12 HRV-A and HRV-B serotypes, more than doubling the current number of available HRV sequences. The whole-genome maximum-likelihood phylogenetic analysis suggests that HRV-B and human enteroviruses (HEV) diverged from the last common ancestor after their separation from HRV-A. On the other hand, compared to HEV, HRV-B are more related to HRV-A in the capsid and 3B-C regions. We also identified the presence of a 2C cis-acting replication element (cre) in HRV-B that is not present in HRV-A, and that had been previously characterized only in HEV. In contrast to HEV viruses, HRV-A and HRV-B share also markedly lower GC content along the whole genome length.
Conclusions
Our findings provide basis to speculate about both the biological similarities and the differences (e.g. tissue tropism, temperature adaptation or acid lability) of these three groups of viruses.
mmsearch: a motif arrangement language and search program
Junier T, Pagni M, Bucher P. Bioinformatics. 2001 Dec;17(12):1234-5 PMID: 11751236
This paper presents a language for describing arrangements of motifs in biological sequences, and a program that uses the language to find the arrangements in motif match databases. The program does not by itself search for the constituent motifs, and is thus independent of how they are detected, which allows it to use motif match data of various origins.
AVAILABILITY: The program can be tested online at http://hits.isb-sib.ch and the distribution is available from ftp://ftp.isrec.isb-sib.ch/pub/software/unix/mmsearch-1.0.tar.gz
CONTACT: Thomas.Junier@isrec.unil.ch
SUPPLEMENTARY INFORMATION: The full documentation about mmsearch is available from http://hits.isb-sib.ch/~tjunier/mmsearch/doc.
trEST, trGEN and Hits: access to databases of predicted protein sequences
Pagni M, Iseli C, Junier T, Falquet L, Jongeneel V, Bucher P Nucleic Acids Res. 2001 Jan 1;29(1):148-51 PMID: 11125074
High throughput genome (HTG) and expressed sequence tag (EST) sequences are currently the most abundant nucleotide sequence classes in the public database. The large volume, high degree of fragmentation and lack of gene structure annotations prevent efficient and effective searches of HTG and EST data for protein sequence homologies by standard search methods. Here, we briefly describe three newly developed resources that should make discovery of interesting genes in these sequence classes easier in the future, especially to biologists not having access to a powerful local bioinformatics environment. trEST and trGEN are regularly regenerated databases of hypothetical protein sequences predicted from EST and HTG sequences, respectively. Hits is a web-based data retrieval and analysis system providing access to precomputed matches between protein sequences (including sequences from trEST and trGEN) and patterns and profiles from Prosite and Pfam. The three resources can be accessed via the Hits home page (http://hits.isb-sib.ch).
Dotlet: diagonal plots in a web browser
Junier T, Pagni M Bioinformatics. 2000 Feb;16(2):178-9 PMID: 10842741
Dotlet is a program for comparing sequences by the diagonal plot method. It is designed to be platform-independent and to run in a Web browser, thus enabling the majority of researchers to use it.
AVAILABILITY: The applet can be tested at http://www.isrec.isb-sib.ch/java/dotlet/Dotlet.html, and the source code is available upon request.
CONTACT: Thomas.Junier@isrec.unil.ch, Marco.Pagni@isrec.unil.ch
SUPPLEMENTARY: The full documentation about dotlet is available from the above URL.
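The diagonal plot (dot plot) method can be sketched in a few lines of Python. This is an illustration of the general technique only, not Dotlet's implementation: it scores windows by simple identity rather than with a scoring matrix, and the function name and parameters are hypothetical.

```python
def dot_plot(seq_a, seq_b, window=3, min_matches=3):
    """Return the set of (i, j) positions where windows are similar.

    A dot is recorded at (i, j) when at least `min_matches` of the
    `window` aligned positions starting at seq_a[i] and seq_b[j] are
    identical. Runs of dots along a diagonal indicate similar regions.
    """
    rows = max(len(seq_a) - window + 1, 0)
    cols = max(len(seq_b) - window + 1, 0)
    hits = set()
    for i in range(rows):
        for j in range(cols):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_matches:
                hits.add((i, j))
    return hits


def render(hits, rows, cols):
    """Draw the plot as ASCII text, one row per position in seq_a."""
    return "\n".join(
        "".join("*" if (i, j) in hits else "." for j in range(cols))
        for i in range(rows)
    )


hits = dot_plot("ACGTACGT", "ACGTACGT", window=4, min_matches=4)
print(render(hits, 5, 5))
```

Comparing a sequence with itself, as in the usage lines, yields the main diagonal plus off-diagonal dots wherever the sequence repeats.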
The eukaryotic promoter database (EPD)
Perier RC, Junier T, Bonnard C, Bucher P Nucleic Acids Res. 2000 Jan 1;28(1):302-3 PMID: 10592254
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes a description of the initiation site mapping data, exhaustive cross-references to the EMBL nucleotide sequence database, SWISS-PROT, TRANSFAC and other databases, as well as bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. WWW-based interfaces have been developed that enable the user to view EPD entries in different formats, to select and extract promoter sequences according to a variety of criteria, and to navigate to related databases exploiting different cross-references. The EPD web site also features yearly updated base frequency matrices for major eukaryotic promoter elements. EPD can be accessed at http://www.epd.isb-sib.ch
The Eukaryotic Promoter Database (EPD): recent developments
Perier RC, Junier T, Bonnard C, Bucher P Nucleic Acids Res. 1999 Jan 1;27(1):307-9 PMID: 9847211
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of eukaryotic POL II promoters, for which the transcription start site has been determined experimentally. Access to promoter sequences is provided by pointers to positions in nucleotide sequence entries. The annotation part of an entry includes description of the initiation site mapping data, cross-references to other databases, and bibliographic references. EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets for comparative sequence analysis. Recent efforts have focused on exhaustive cross-referencing to the EMBL nucleotide sequence database, and on the improvement of the WWW-based user interfaces and data retrieval mechanisms. EPD can be accessed at http://www.epd.isb-sib.ch
Evaluation of computer tools for the prediction of transcription factor binding sites on genomic DNA
Roulot E, Fisch I, Junier T, Bucher P, Mermod N In Silico Biol. 1998;1(1):21-8 PMID: 11471239
Computational molecular biology tools are becoming the method of choice for high throughput screening of newly determined DNA sequences. Such bioinformatic methods indeed offer invaluable tools for the analysis of novel genomic sequences, as they allow for instance the identification of candidate disease-responsible genes [see Rawlings and Searls, 1997, for a review]. Effective DNA sequence analysis demands not only the faithful identification of gene elements and boundaries, but it also requires reliable information on the potential function and regulation of the identified genes. Consequently, powerful software tools are more and more relying on the coupling and integration of various prediction algorithms. Such integrated systems should include devices for the recognition of DNA sequences that act as binding sites for regulatory proteins known as transcription factors. The identification of such sites is not only relevant for locating the promoter as the 5' boundary of a gene, but they may also allow the prediction of a tissue-specific gene-expression pattern and responsiveness to known biological signaling pathways. However, binding sites for sequence-specific DNA-binding transcription factors are typically short and degenerate, and their efficient prediction requires sophisticated computational tools. Databases of promoter and transcription factors have been established [Bucher, 1990; Ghosh, 1993; Wingender et al., 1997], and these compiled data were in turn used for the development of algorithms and program packages for the identification of transcription factor binding sites on DNA
SEView: a Java applet for browsing molecular sequence data
Junier T, Bucher P In Silico Biol. 1998;1(1):13-20 PMID: 11471238
SEView is a Java applet that represents known or predicted elements of a protein or nucleotide sequence. It replaces or supplements the textual format of databases or program output with an interactive, graphical representation that is easily available through a WWW browser. Independence from the source data's format is achieved through a description language and ad hoc translators, which make the system versatile and flexible.
The Eukaryotic Promoter Database EPD
Cavin Perier R, Junier T, Bucher P Nucleic Acids Res. 1998 Jan 1;26(1):353-7 PMID: 9399872
The Eukaryotic Promoter Database (EPD) is an annotated non-redundant collection of experimentally characterised eukaryotic POL II promoters. The underlying definition of a promoter is that of a transcription initiation site. All information presented in EPD results from an independent evaluation of primary experimental data shown in the biological literature. Sequences flanking transcription initiation sites are indirectly given by pointers to EMBL sequences. The annotation part of a promoter entry includes description of the promoter-defining evidence, cross-references to other databases, and bibliographic references. Being designed as a resource for comparative sequence analysis, EPD is structured in a way that facilitates dynamic extraction of biologically meaningful promoter subsets. The database is available through the World Wide Web at URL http://cmpteam4.unil.ch