miROrtho: the catalogue of animal microRNA genes

About miROrtho

miROrtho contains predictions of precursor miRNA genes covering several animal genomes. The prediction pipeline combines a Support Vector Machine (SVM) model, homology and an orthology procedure.


This is release 1.0 (August 2008) of miROrtho, covering 46 animal genomes. We provide homology-extended alignments of already known miRBase families, as well as putative miRNA families predicted exclusively by our SVM and orthology pipeline. See the Statistics page for details.

Searching the Database

There are various ways to search the data. To find the members of a specific family, search by its miRBase name (e.g. "mir-iab-4"). To find all miRNAs in one species, supply the keyword representing the species (e.g. "Dmel" for Drosophila melanogaster).

Another way to explore the data is to use the browse function. All miRNAs of one species and their corresponding family alignment picture can be recovered by clicking on the corresponding node in the tree.

With the BLAST search, a sequence in FASTA format can be used to find a miRNA gene family containing homologs of the query sequence.
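For readers preparing a BLAST query, the following is a minimal sketch of how a raw sequence can be written as a FASTA record. The function name to_fasta, the header "query1" and the line width are illustrative assumptions and are not part of miROrtho.

```python
def to_fasta(header: str, sequence: str, width: int = 60) -> str:
    """Return a FASTA record for one sequence (hypothetical helper)."""
    # Remove whitespace and upper-case the residues.
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    # Wrap the sequence into lines of at most `width` characters.
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return ">" + header + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy sequence, used only to show the layout of the record.
    print(to_fasta("query1", "ugagguag uaggu", width=8), end="")
```

The header line starts with ">" and the sequence follows on one or more lines; the resulting text can be pasted into a search form that accepts FASTA input.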

Output Fields

There are 18 fields which provide information about the miRNA gene:
  • Species The 4 letter species name abbreviation
  • Internal ID A unique internal identifier
  • Name The name of the gene based on its miRBase homolog (if there is one)
  • Group ID A unique identifier for the orthologous group (og_id). This field is empty if the family was only found by homology to a known miRBase entry (e.g. "mir-1004")
  • Family The name of the family based on homologous sequences in miRBase
  • miRBase Link to the corresponding family in miRBase
  • Chromosome The chromosome on which the pre-miRNA gene is located
  • Start The starting position of the pre-miRNA gene
  • End The end position of the pre-miRNA gene
  • Strand Either the forward (+) or the reverse (-) strand on the genome
Human only:
  • UCSC Link to the UCSC genome browser showing the miRBase miRNA track and the species conservation track
  • RNAz Overlap with RNAz predictions from www.ncrna.org (see refs. 1,2,3)
  • QRNA Overlap with QRNA predictions from www.ncrna.org (see refs. 5,6)
  • EvoFold Overlap with EvoFold predictions from www.ncrna.org (see refs. 3,4)
  • RNAmicro Overlap with RNAmicro predictions from www.ncrna.org (see ref. 7)
  • Berezikov Overlap with Berezikov's predictions from www.ncrna.org (see ref. 8)
  • miRRim Overlap with miRRim predictions from www.ncrna.org (see ref. 9)
  • Li Overlap with Li's predictions from www.ncrna.org (see ref. 10)
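As an illustration of how the general (non-human-specific) fields fit together, here is a minimal sketch of a record type in Python. It assumes a hypothetical tab-separated export with nine columns in the order shown; miROrtho itself is not known to provide such a file, and the class name, column order and 1-based inclusive coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MirnaRecord:
    species: str              # 4-letter abbreviation, e.g. "Dmel"
    internal_id: str          # unique internal identifier
    name: Optional[str]       # miRBase-based gene name, if any
    group_id: Optional[str]   # og_id; None if found by homology only
    family: Optional[str]     # miRBase-based family name, if any
    chromosome: str
    start: int
    end: int
    strand: str               # "+" or "-"

    def length(self) -> int:
        # Assumption: coordinates are 1-based and inclusive.
        return self.end - self.start + 1


def parse_row(line: str) -> MirnaRecord:
    """Parse one line of the hypothetical tab-separated export."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated columns")
    species, internal_id, name, group_id, family, chrom, start, end, strand = cols
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Empty text fields are stored as None.
    return MirnaRecord(species, internal_id, name or None, group_id or None,
                       family or None, chrom, int(start), int(end), strand)


if __name__ == "__main__":
    # Made-up row: the identifier and coordinates are not real miROrtho data.
    print(parse_row("Dmel\tX1\tmir-1004\t\tmir-1004\t2L\t100\t189\t+"))
```

An empty Group ID is mapped to None here, mirroring the case of a family found only by homology to a miRBase entry.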

Getting Data

Below the table, a colour-coded alignment shows consistent and compensatory mutations as well as the consensus secondary structure. In addition, the following data are provided:
  • Alignment reliability estimation (core index) The alignment is colour-coded according to the core index. This index allows an estimation of the consistency between the alignment and the computed library (computed by MUSCLE, ProbConsRNA, MAFFT and RNAplfold). A higher consistency is an indicator of a more reliable alignment. For more information refer to ref. 17.
  • Groups with same seed Get all groups whose mature sequences share a seed with the mature sequences in the precursor miRNA alignment above
  • Alignment Get the alignment
  • Fasta Get the sequences in FASTA format
  • RNAalifold output | RNAalifold (stochastic backtracking) Output of RNAalifold, used to calculate the consensus secondary structure, as well as a stochastic backtracking of 10 randomly chosen structures from the ensemble of structures
  • RNAstrand output Output of RNAstrand, which predicts the strand (reading direction) of a multiple ncRNA sequence alignment
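The "Groups with same seed" lookup can be pictured with the following minimal sketch. It assumes the common convention that the seed is nucleotides 2 to 8 of the mature sequence; the page does not state which definition miROrtho uses, and the function names, group identifiers and sequences below are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List


def seed(mature: str, start: int = 2, end: int = 8) -> str:
    """Return the seed of a mature miRNA (assumed: positions 2-8, 1-based)."""
    seq = mature.upper().replace("T", "U")  # accept DNA or RNA alphabet
    if len(seq) < end:
        raise ValueError("mature sequence shorter than seed end position")
    return seq[start - 1:end]


def groups_by_seed(mature_by_group: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each seed to the list of group ids whose mature sequence carries it."""
    result: Dict[str, List[str]] = defaultdict(list)
    for group_id, mature in mature_by_group.items():
        result[seed(mature)].append(group_id)
    return dict(result)


if __name__ == "__main__":
    # Toy input: the group ids are made up for the example.
    toy = {
        "og_1": "UGAGGUAGUAGGUUGUAUAGUU",
        "og_2": "TGAGGTAGTAGGTTGTGTGGTT",
        "og_3": "UAAGGCACGCGGUGAAUGCC",
    }
    for s, groups in groups_by_seed(toy).items():
        print(s, groups)
```

Groups sharing a seed end up under the same key, so all groups related to a given alignment can be read from a single dictionary entry.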

Family/Group Figures

Each family/orthologous group predicted by the SVM is represented by two figures: one showing the alignment, the other showing the conserved secondary structure. The sequences in the alignment figures are ordered according to their position in the phylogenetic tree. For a new group (one with only an og_id and an empty mirbase_family field), the putative mature part is based on the information content of the alignment. For groups with a mirbase_family name, the mature parts are based on their homologous sequences in miRBase.

The colour coding of the alignment was produced with the Vienna Package. The sequence alignment and the corresponding conserved secondary structure are represented in the following way:


The colour hue of the square indicates the number of different consistent nucleotide pairs occurring for a given base pair:

  • red all sequences have the same two nucleotides
  • ochre two types of base pairs occur
  • green three types of base pairs occur
  • turquoise four types of base pairs occur
  • blue five types of base pairs occur
  • violet all six types of base pairs occur
The saturation of the colour indicates the number of sequences that are not consistent with the base pair, in the sense that they have nucleotides at the relevant positions that do not form one of the six standard RNA base pairs:
  • saturated no inconsistent sequences
  • pale colour one sequence in the sample is inconsistent
  • very pale two sequences in the sample are inconsistent
  • invisible more than two sequences in the sample are inconsistent
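The colour scheme above can be expressed as a small function. The following is a minimal sketch, not the Vienna Package implementation: it takes the two alignment columns of a predicted base pair, counts the distinct standard pair types and the inconsistent sequences, and returns the hue and saturation names from the lists above. Treating gaps and any other non-standard combination as inconsistent is an assumption, as are the function and constant names.

```python
from typing import Optional, Tuple

# The six standard RNA base pairs (Watson-Crick and GU wobble, both orientations).
STANDARD_PAIRS = {"AU", "UA", "GC", "CG", "GU", "UG"}
HUES = ["red", "ochre", "green", "turquoise", "blue", "violet"]
SATURATIONS = ["saturated", "pale", "very pale"]


def pair_colour(col_i: str, col_j: str) -> Optional[Tuple[str, str]]:
    """Return (hue, saturation) for one base pair, or None if it is not drawn.

    col_i and col_j hold one character per aligned sequence.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must have the same number of sequences")
    left = col_i.upper().replace("T", "U")
    right = col_j.upper().replace("T", "U")
    pair_types = set()
    inconsistent = 0
    for a, b in zip(left, right):
        if a + b in STANDARD_PAIRS:
            pair_types.add(a + b)
        else:
            # Assumption: gaps and mismatches both count as inconsistent.
            inconsistent += 1
    if not pair_types or inconsistent > 2:
        return None  # "invisible"
    return HUES[len(pair_types) - 1], SATURATIONS[inconsistent]


if __name__ == "__main__":
    print(pair_colour("GGG", "CCC"))
    print(pair_colour("GGAA", "CUUG"))
```

In the second call, the four sequences contribute the pairs GC, GU and AU plus one mismatch (AG), which corresponds to three pair types with one inconsistent sequence.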

Species abbreviations

Species                        Abbreviation  Genome assembly
Aedes aegypti                  Aaeg          AaegL1
Anolis carolinensis            Acar          anoCar1
Anopheles gambiae              Agam          AgamP3
Apis mellifera                 Amel          Amel_4.0
Bombyx mori                    Bmor          SW_scaffold_ge2k
Bos taurus                     Btar          Btar_3.1
Caenorhabditis elegans         Cele          WB170
Canis familiaris               Cfam          CanFam 2.0
Capitella capitata             CspI          JGI1
Ciona intestinalis             Cint          JGI2
Culex pipiens                  Cpip          CpipJ1
Danio rerio                    Drer          ZFISH6
Daphnia pulex                  Dpul          Dpul JAZZ 1.0
Drosophila ananassae           Dana          CAF1
Drosophila erecta              Dere          CAF1
Drosophila grimshawi           Dgri          CAF1
Drosophila melanogaster        Dmel          CAF1
Drosophila mojavensis          Dmoj          CAF1
Drosophila persimilis          Dper          CAF1
Drosophila pseudoobscura       Dpse          CAF1
Drosophila sechellia           Dsec          CAF1
Drosophila simulans            Dsim          CAF1
Drosophila virilis             Dvir          CAF1
Drosophila willistoni          Dwil          CAF1
Drosophila yakuba              Dyak          CAF1
Gallus gallus                  Ggal          WASHUC2
Gasterosteus aculeatus         Gacu          BROAD S1
Helobdella robusta             Hrob          JGI1
Homo sapiens                   Hsap          NCBI36
Lottia gigantea                Lgig          JGI1
Macaca mulatta                 Mmul          MMUL_1
Monodelphis domestica          Mdom          monDom5
Mus musculus                   Mmus          NCBIM36
Nasonia vitripennis            Nvit          Nvit_1.0
Nematostella vectensis         Nvec          JGI1
Ornithorhynchus anatinus       Oana          Oana-5.0
Pan troglodytes                Ptro          PanTro 2.1
Pediculus humanus              Phum          PhumU1
Petromyzon marinus             Lpet          Petromyzon_marinus-3.0
Rattus norvegicus              Rnor          RGSC 3.4
Schmidtea mediterranea         Smed          WUSTL v.3.0
Strongylocentrotus purpuratus  Surc          Spur_v2.1
Takifugu rubripes              Trub          FUGU4
Tetraodon nigroviridis         Tnig          TETRAODON7
Tribolium castaneum            Tcas          Tcas_2.0
Xenopus tropicalis             Xtro          JGI4.1
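Most abbreviations in the table follow a simple pattern: the first letter of the genus followed by the first three letters of the species epithet. Four entries in the table do not (Btar, CspI, Lpet, Surc), so a lookup needs an explicit exception list. The sketch below illustrates this; the pattern is inferred from the table rather than documented by miROrtho, and the function names are hypothetical.

```python
# Abbreviations from the table that do not follow the inferred pattern.
EXCEPTIONS = {
    "Bos taurus": "Btar",
    "Capitella capitata": "CspI",
    "Petromyzon marinus": "Lpet",
    "Strongylocentrotus purpuratus": "Surc",
}


def default_abbreviation(species: str) -> str:
    """Genus initial plus the first three letters of the species epithet."""
    genus, epithet = species.split()[:2]
    return genus[0].upper() + epithet[:3].lower()


def abbreviation(species: str) -> str:
    """Return the table abbreviation, consulting the exception list first."""
    return EXCEPTIONS.get(species, default_abbreviation(species))


if __name__ == "__main__":
    for name in ("Drosophila melanogaster", "Homo sapiens", "Bos taurus"):
        print(name, abbreviation(name))
```

When searching by species keyword, it is therefore safer to take the abbreviation from the table than to derive it from the species name.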


References

  1. Washietl S, Hofacker IL, Lukasser M, Hüttenhofer A, Stadler PF. Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat Biotechnol. 2005 Nov;23(11):1383-90. PMID: 16273071
  2. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A. 2005 Feb 15;102(7):2454-9. Epub 2005 Jan 21. PMID: 15665081
  3. Kin T, Yamada K, Terai G, Okida H, Yoshinari Y, Ono Y, Kojima A, Kimura Y, Komori T, Asai K. fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res. 2007 Jan;35(Database issue):D145-8. Epub 2006 Nov 11. PMID: 17099231
  4. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Identification and classification of conserved RNA secondary structures in the human genome. PLoS Comput Biol. 2006 Apr;2(4):e33. Epub 2006 Apr 21. PMID: 16628248
  5. Babak T, Blencowe BJ, Hughes TR. A systematic search for new mammalian noncoding RNAs indicates little conserved intergenic transcription. BMC Genomics. 2005 Aug 5;6:104. PMID: 16083503
  6. Rivas E, Eddy SR. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics. 2001;2:8. Epub 2001 Oct 10. PMID: 11801179
  7. Hertel J, Stadler PF. Hairpins in a Haystack: recognizing microRNA precursors in comparative genomics data. Bioinformatics. 2006 Jul 15;22(14):e197-202. PMID: 16873472
  8. Berezikov E, Guryev V, van de Belt J, Wienholds E, Plasterk RH, Cuppen E. Phylogenetic shadowing and computational identification of human microRNA genes. Cell. 2005 Jan 14;120(1):21-4. PMID: 15652478
  9. Terai G, Komori T, Asai K, Kin T. miRRim: a novel system to find conserved miRNAs with high sensitivity and specificity. RNA. 2007 Dec;13(12):2081-90. Epub 2007 Oct 24. PMID: 17959929
  10. Li SC, Pan CY, Lin WC. Bioinformatic discovery of microRNA precursors from human ESTs and introns. BMC Genomics. 2006 Jul 3;7:164. PMID: 16813663
  11. Wilm A, Higgins DG, Notredame C. R-Coffee: a method for multiple alignment of non-coding RNA. Nucleic Acids Res. 2008 May;36(9):e52. Epub 2008 Apr 17. PMID: 18420654
  12. Hofacker IL, et al. Fast Folding and Comparison of RNA Secondary Structures. Monatsh.Chem. 125: 167-188 (1994).
  13. Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL. The Vienna RNA websuite. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W70-4. Epub 2008 Apr 19. PMID: 18424795
  14. Reiche K, Stadler PF. RNAstrand: reading direction of structured RNAs in multiple sequence alignments. Algorithms Mol Biol. 2007 May 31;2:6. PMID: 17540014
  15. Griffiths-Jones S, Saini HK, van Dongen S, Enright AJ. miRBase: tools for microRNA genomics. Nucleic Acids Res. 2008 Jan;36(Database issue):D154-8. Epub 2007 Nov 8. PMID: 17991681
  16. Clamp M, Cuff J, Searle SM, Barton GJ. The Jalview Java alignment editor. Bioinformatics. 2004 Feb 12;20(3):426-7. Epub 2004 Jan 22. PMID: 14960472
  17. http://www.tcoffee.org/Publications/Pdf/core.pp.pdf