Thu, 02/09/2012 - 11:44
OrthoBlock Data Generation
1) Data Sources
- Arthropod genome GFF files: 45 arthropods (Arachnida, Myriapoda, Crustacea, Hemimetabola, Hymenoptera, Coleoptera, Lepidoptera and Diptera), retrieved from AphidBase, BeetleBase, FlyBase, Hymenoptera Genome Database, SilkDB, VectorBase, wFleaBase and various consortia.
- Pairwise species comparison: To identify synteny blocks between two genomes, we extracted gene order information from the GFF files of the genome databases and orthology information from OrthoDB6. This information was stored in a MySQL database, and we applied SQL-based procedures to generate pairwise synteny blocks.
- Species set generation: For each phylogenetic node with at least 5 species, sets of 5 selected species which represent both clades of the node are generated. The species set distance is calculated for each species set based on the branch lengths of the species tree.
- N-wise projection: For each species set, N-wise projection was performed on pairwise synteny block data to generate 5-way synteny blocks.
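To illustrate the pairwise comparison step, here is a minimal Python sketch of how synteny blocks could be derived from gene order and orthology. It is not the SQL-based procedure used for OrthoBlock, whose details are not given here. The function name, the max_gap and min_size parameters, and the assumption that each orthologous group has at most one gene per genome are hypothetical simplifications.

```python
def pairwise_synteny_blocks(order_a, order_b, group_of, max_gap=1, min_size=2):
    """Group orthologous anchors into runs that are close together in both genomes.

    order_a, order_b: gene IDs in chromosomal order (one scaffold each, assumed).
    group_of: gene ID -> orthologous group ID (single-copy groups assumed).
    max_gap: number of intervening genes tolerated between consecutive anchors.
    """
    # Position of each orthologous group in genome B.
    pos_b = {group_of[g]: i for i, g in enumerate(order_b) if g in group_of}
    # Anchors: (position in A, position in B, group) for groups found in both genomes.
    anchors = [
        (i, pos_b[group_of[g]], group_of[g])
        for i, g in enumerate(order_a)
        if g in group_of and group_of[g] in pos_b
    ]
    blocks, current = [], []
    for anchor in anchors:
        if current:
            prev = current[-1]
            close_in_a = anchor[0] - prev[0] <= max_gap + 1
            close_in_b = abs(anchor[1] - prev[1]) <= max_gap + 1  # either orientation
            if not (close_in_a and close_in_b):
                # The run is broken: keep it only if it is long enough.
                if len(current) >= min_size:
                    blocks.append([a[2] for a in current])
                current = []
        current.append(anchor)
    if len(current) >= min_size:
        blocks.append([a[2] for a in current])
    return blocks


# Toy data (invented for illustration): genome B carries G1..G3 inverted,
# then an unrelated gene, then G4..G5 inverted.
order_a = ["a1", "a2", "a3", "a4", "a5"]
order_b = ["b3", "b2", "b1", "b9", "b5", "b4"]
group_of = {
    "a1": "G1", "a2": "G2", "a3": "G3", "a4": "G4", "a5": "G5",
    "b1": "G1", "b2": "G2", "b3": "G3", "b4": "G4", "b5": "G5", "b9": "G9",
}
blocks = pairwise_synteny_blocks(order_a, order_b, group_of)
```

In this toy case the jump in genome B between G3 and G4 is larger than the tolerated gap, so the anchors split into two blocks. Each block is reported as the list of orthologous groups it contains, and its length corresponds to the "Synteny block size" used when ranking blocks.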
OrthoBlock Query Page
OrthoBlock can be queried based on gene or orthologous group.
- Gene: To search by gene, the user must enter an identifier or keyword for the gene(s). Identifiers from UniProtKB, Ensembl, AphidBase, BeetleBase, FlyBase, Hymenoptera Genome Database, SilkDB, VectorBase, wFleaBase, GenBank, RefSeq, InterPro, Gene Ontology, etc. can be used to query OrthoBlock, as can annotation keywords such as protein/gene names, descriptions and phenotypes. To narrow the query, the user can also select a phylogenetic node of interest, but only if the input is a gene ID from the source databases (e.g. FBgn0260642, CPIJ004297, etc.).
- Orthologous group: To search by orthologous group, the user must enter the ID of an orthologous group in OrthoDB6 (current version, ID with format of "EOG6XXXXX"). Optionally, the user can choose a reference species as a fixed member of the 5-species set, meaning that the resulting synteny block must be present in the reference species.
2) Species Tree (The OrthoDB Hierarchy)
The species phylogenetic tree is computed from the multiple alignment of concatenated single-copy orthologs identified by OrthoDB. The numbered nodes are those containing at least 5 species; they can be selected as the input "Phylogenetic Node".
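The 5-species threshold follows from the species set generation step: each set holds 5 species that represent both clades of a node. A minimal Python sketch of enumerating the candidate sets for one node is given below. It assumes a bifurcating node whose two clades are given as plain lists of species names, and it lists every valid combination. How OrthoBlock selects among the candidates is not described here, and the function name and inputs are hypothetical.

```python
from itertools import combinations


def candidate_species_sets(clade_left, clade_right, set_size=5):
    """Return all species sets of the given size that draw from both clades."""
    left, right = set(clade_left), set(clade_right)
    all_species = sorted(left) + sorted(right)
    return [
        combo
        for combo in combinations(all_species, set_size)
        # Keep a set only if each clade contributes at least one species.
        if left.intersection(combo) and right.intersection(combo)
    ]


# Placeholder species names, invented for illustration.
sets_balanced = candidate_species_sets(["A", "B", "C"], ["D", "E", "F"])
sets_skewed = candidate_species_sets(["A", "B", "C", "D", "E"], ["F"])
sets_too_small = candidate_species_sets(["A", "B"], ["C", "D"])
```

A node with fewer than 5 species yields no candidate set at all, which is consistent with only nodes of at least 5 species being selectable.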
3) Best Synteny Block Selection Criteria
Among all the synteny blocks identified for the different species sets at a given phylogenetic node, we display only the best synteny block as the result of a query. We provide three different criteria for selecting the best synteny block:
- Maximize score: By default, the best synteny block has the maximum score (the product of "Synteny block size" and "Species set distance").
- Maximize synteny block size: The best synteny block is the longest, i.e. has the most orthologous groups inside it.
- Maximize species set distance: The best synteny block is the block identified in the most distant 5-species set.
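The three criteria above can be sketched in Python as follows. The Block class, its field names and the criterion keys are hypothetical and only illustrate the ranking. The score is the product of synteny block size and species set distance, as stated above, and the species set distance is taken as a precomputed input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    species_set: tuple   # the 5 species in which the block was identified
    size: int            # number of orthologous groups in the block
    set_distance: float  # species set distance, derived from branch lengths

    @property
    def score(self):
        # Default criterion: block size times species set distance.
        return self.size * self.set_distance


CRITERIA = {
    "score": lambda b: b.score,
    "size": lambda b: b.size,
    "distance": lambda b: b.set_distance,
}


def best_block(blocks, criterion="score"):
    """Return the best block under the chosen criterion, or None if there is none."""
    if not blocks:
        return None
    return max(blocks, key=CRITERIA[criterion])


# Invented values for illustration only.
blocks = [
    Block(("sp1", "sp2", "sp3", "sp4", "sp5"), size=10, set_distance=0.5),
    Block(("sp1", "sp2", "sp3", "sp4", "sp6"), size=4, set_distance=2.0),
    Block(("sp1", "sp2", "sp3", "sp6", "sp7"), size=2, set_distance=3.0),
]
```

With these values the three criteria pick three different blocks, which is why the choice of criterion can change the displayed result. On ties, max keeps the first block encountered; how OrthoBlock breaks ties is not stated here.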
OrthoBlock Result Page
1) Query by gene
- If the input ID is a valid gene identifier from the source genome databases and the user has specified a phylogenetic node, the result page directly shows the best synteny block found, if there is any.
- If the input ID is a valid gene identifier from the source genome databases and no phylogenetic node is specified, the result page shows all the phylogenetic nodes in which a synteny block was identified; the user can then select a node to view the best block at that node.
- If the user queries by other IDs or keywords, the result page shows a list of mapped genes and the phylogenetic nodes with synteny, if there are any.
2) Query by orthologous group
- The result page will directly show the best synteny block found containing the queried orthologous group.
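The cases above amount to a small decision table, sketched here in Python. The function name, argument names and returned labels are hypothetical and do not correspond to the actual implementation of the site.

```python
def result_view(query_type, is_source_gene_id=False, node_selected=False):
    """Return a label for what the result page shows for a given query."""
    if query_type == "orthologous_group":
        # Group queries go straight to the best block containing the group.
        return "best_block"
    if is_source_gene_id and node_selected:
        return "best_block"
    if is_source_gene_id:
        # No node chosen yet: list the nodes that have a synteny block.
        return "node_list"
    # Other IDs or keywords: list mapped genes and their nodes with synteny.
    return "gene_and_node_list"


view_group = result_view("orthologous_group")
view_gene_with_node = result_view("gene", is_source_gene_id=True, node_selected=True)
view_gene_no_node = result_view("gene", is_source_gene_id=True)
view_keyword = result_view("gene")
```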
Show Tree & Show Help
Users can view the phylogenetic tree or the help page at any time by clicking the buttons in the upper-right corner of the page.
Please email Jia Li: jia.li [at] unige.ch
This work by E Zdobnov lab is licensed under a Creative Commons Attribution 3.0 Unported License.