SUPPLEMENTARY DATA to m/s "Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution"
Protein section figures (PDF)
Synteny divergence trees (PDF)
Chicken - predicted orthologs
We performed systematic similarity searches of predicted chicken ORFs against proteins from Human (V19_34a) and Fugu (V19_2). Searches used the Smith-Waterman algorithm, considering only the single longest predicted protein at each locus. We first grouped in-paralogs (recently duplicated genes) by searching for groups of genes that are more similar to each other within a genome than to any gene in any of the other genomes. We then assembled orthologous groups by searching for triangles of reciprocality. Both steps used a simulated-annealing approach to group the higher-scoring hits first, then the lower-scoring hits. Finally, sequences that did not form triangles were allowed to form tuples. After assembling orthologous groups, we pruned them to make sure that all proteins in a group have similarity to each other (to avoid 'domain-walking').
Files:
subset.orthologs.two.species.1.1 (1-to-1 Chicken/Human cases)
subset.orthologs.three.species.1.1.1 (1-to-1-to-1 Chicken/Human/Fugu cases)
subset.orthologs.two.species.n.n (many-to-many Chicken/Human cases)
subset.orthologs.three.species.n.n.n (many-to-many Chicken/Human/Fugu cases)
orthologs.two.species (all pairwise Chicken/Human orthology)
orthologs.three.species (all orthologous Chicken/Human/Fugu groups)
chicken_best_homolog_in_fugu.txt
chicken_best_homolog_in_human.txt
The latter two files may help for genes that did not end up in any orthologous group. They give, for every chicken gene, the best-scoring homolog in human and in fugu. Note that these lists extend far into the 'grey zone', with bit scores as low as 30; use with caution.
Putative chicken losses/oversights (Excel files: by orthology, by homology)
Gene-based synteny (distribution of chicken/human and chicken/mouse synteny blocks in chicken chromosomes)
| | Human (data) | Mouse (data) | Tetraodon (data) | Fugu.v2 (data) [previous ENSEMBL] | Danio (data) | Human/Mouse (data) | Tetraodon/Fugu (data) |
|---|---|---|---|---|---|---|---|
| Number of synteny blocks (excluding chrUn and *random) | 968 | 1158 | 1483 | 1627 [1427] | 1191 | 790 | 1483 |
| Average synteny block size counted by the number of distinct mammalian genes spanned | 10 | 8 | 2.8 | 2.8 [2.7] | 2.5 | 18.6 | 3.6 |
| Number of blocks spanning > 20 genes | 110 | 81 | 1 | 1 [0] | 0 | 227 | 4 |
| spanning > 10 genes | 268 | 263 | 12 | 5 [2] | 1 | 389 | 31 |
| spanning < 3 genes | 198 | 263 | 878 | 916 [852] | 819 | 92 | 665 |
| 10 biggest blocks (bl: internal ID for synteny blocks used in the data dump; sz: number of genes in synteny block) | bl, sz | bl, sz | bl, sz | bl, sz | bl, sz | bl, sz | bl, sz |
| Total number of BRH genes | 11134 | 11062 | 8872 | 9487 [8528] | 8585 | 15501 | 14035 |
| Number of BRH in synteny | 9679 | 9313 | 4224 | 4608 [3890] | 3016 | 14707 | 5348 |
| Retained genomic neighborhood | 87% | 84% | 48% | 48% [45%] | 35% | 95% | 38% |
Supporting data are provided in tab-delimited text format. Each synteny block is characterized by gene accession number, chromosome, strand, and position (gene start) for each species (marked 'a' and 'b'). The last column is the synteny block Id, so all orthologous gene pairs with the same block Id define a syntenic region.
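As a rough illustration of how these files might be read, the sketch below groups gene pairs by the block Id in the last column and counts distinct genes per block. Only the "last column is the block Id" rule comes from the description above. The exact column order, the absence of a header row, and the names `load_blocks`, `block_sizes`, and the demo rows are assumptions made for the example.

```python
import csv
import io
from collections import defaultdict


def load_blocks(handle):
    """Group rows of a tab-delimited synteny file by block Id (last column)."""
    blocks = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        blocks[row[-1]].append(row[:-1])
    return blocks


def block_sizes(blocks, gene_col=0):
    """Count distinct genes per block.

    gene_col=0 assumes the species 'a' accession is the first column;
    this position is an assumption, not something the file description states.
    """
    return {bid: len({r[gene_col] for r in rows}) for bid, rows in blocks.items()}


# Made-up rows in the assumed layout:
# acc_a, chr_a, strand_a, start_a, acc_b, chr_b, strand_b, start_b, block_id
demo = io.StringIO(
    "geneA1\tchr1\t+\t100\tgeneB1\t1\t+\t5000\tblk1\n"
    "geneA2\tchr1\t+\t900\tgeneB2\t1\t+\t7000\tblk1\n"
    "geneA3\tchr2\t-\t300\tgeneB3\t4\t-\t1200\tblk2\n"
)
blocks = load_blocks(demo)
sizes = block_sizes(blocks)
print(sizes)
```

For a real file, `load_blocks(open(path, newline=""))` would replace the in-memory demo. The "Retained genomic neighborhood" percentages in the table are consistent with the ratio of BRH genes in synteny to all BRH genes, for example 9679 / 11134 for human, which rounds to 87%.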
Multi-species synteny intersection
chicken-human-mm3_mouse-rat_synteny.txt
chicken-human-mm3_mouse-rat-tetra-fugu-danio_synteny.txt
chicken-human-mouse-fugu_synt.txt
chicken-human-mouse-rat_synt.txt
chicken-human-mouse-danio_synt.txt
chicken-human-mouse-tetra_synt.txt
chicken-human-mouse_synt.txt
chicken-human-rat_synt.txt
chicken-human-tetra_synt.txt
chicken-human-fugu_synt.txt
chicken-human-danio_synt.txt
chicken-mouse-tetra_synt.txt
chicken-mouse-fugu_synt.txt
chicken-mouse-danio_synt.txt
chicken-rat-tetra_synt.txt
chicken-rat-fugu_synt.txt
chicken-rat-danio_synt.txt
chicken-human-mouse-rat-tetra-fugu-danio_synt.txt
Chromosome-level correspondence
In the following chromosome mapping tables, the headers show the corresponding chromosome names together with the number of putative orthologous genes in brackets.
Each cell shows the number of orthologs (with the random expectation in brackets) and the number of
syntenic blocks (with the random expectation in brackets) between each pair of chromosomes.
Statistically significant similarities are marked in green and significant dissimilarities in red.
| | Chicken | Tetraodon | Mouse |
|---|---|---|---|
| Human | mapping | mapping | mapping |
| Rat | mapping | mapping | mapping |
| Mouse | mapping | mapping | |
| Tetraodon | mapping | | |
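This page does not say how the random expectation in each cell was computed. One common choice, assumed here purely for illustration, is the independence expectation of a contingency table: (orthologs on chromosome A) x (orthologs on chromosome B) / (total orthologs). The function name `expected_counts` and the demo pairs are hypothetical.

```python
from collections import Counter


def expected_counts(pairs):
    """Return {(chrom_a, chrom_b): (observed, expected)} for ortholog pairs.

    'expected' is the count predicted if orthologs were distributed
    independently across chromosomes (an assumed null model, not
    necessarily the one used for the mapping tables).
    """
    pairs = list(pairs)
    n = len(pairs)
    observed = Counter(pairs)
    per_a = Counter(a for a, _ in pairs)
    per_b = Counter(b for _, b in pairs)
    return {
        (a, b): (observed[(a, b)], per_a[a] * per_b[b] / n)
        for a in per_a
        for b in per_b
    }


# Made-up data: each tuple is (chromosome in species a, chromosome in species b)
demo_pairs = [("1", "4")] * 3 + [("1", "5")] + [("2", "5")] * 4
table = expected_counts(demo_pairs)
for key in sorted(table):
    print(key, table[key])
```

Cells where the observed count is well above the expectation would correspond to the similarities highlighted in the tables, and cells well below it to the dissimilarities. A significance test would still be needed to decide which cells to mark, and that test is not specified here.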
Homology-based gene prediction (putative human/mouse/rat orthologs)
hmr.brh:
16124
hmr.brh.fs
hmr.brh.gff
Candidate genes for inclusion in ENSEMBL gene set
List of 562 human genes with a putative chicken counterpart (in tab-delimited text format) for which ENSEMBL chicken predictions were not recognized as orthologs by our automated procedure.
FASTA file with 818 possibly improved Ensembl proteins (number in the ID corresponds to Ensembl midbuild transcript IDs).