Chicken

SUPPLEMENTARY DATA to the manuscript "Sequence and comparative analysis of the chicken genome provide unique perspectives on vertebrate evolution"

Protein section figures (PDF)

Synteny divergence trees (PDF)

Chicken - predicted orthologs

We ran systematic similarity searches of predicted chicken ORFs against proteins from Human (V19_34a) and Fugu (V19_2). Searches used the Smith-Waterman algorithm and considered only the single longest predicted protein at each locus. We first grouped in-paralogs (recently duplicated genes) by searching for groups of genes that are more similar to each other within a genome than to any gene in any of the other genomes. We then assembled orthologous groups by searching for triangles of reciprocality. Both steps used a simulated-annealing approach that groups the higher-scoring hits first and the lower-scoring hits afterwards. Finally, sequences that did not form triangles were allowed to form tuples. After assembling the orthologous groups, we pruned them so that all proteins in a group have similarity to each other (to avoid 'domain-walking').
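The triangle search can be illustrated with a minimal sketch. It assumes that a "triangle of reciprocality" means three genes, one per genome, in which every pair are reciprocal best hits; that reading, the gene names, and the scores below are assumptions made for illustration. The sketch leaves out the in-paralog grouping, the simulated-annealing ordering, the tuple fallback, and the pruning step described above.

```python
def species(gene):
    """Genes are written as 'species:id' in this toy example."""
    return gene.split(":", 1)[0]

def best_hit_table(hits):
    """For every gene, keep its best-scoring hit in each other species."""
    best = {}  # (gene, other_species) -> (score, partner)
    for a, b, score in hits:
        for query, target in ((a, b), (b, a)):
            key = (query, species(target))
            if key not in best or score > best[key][0]:
                best[key] = (score, target)
    return best

def is_reciprocal(best, a, b):
    """True if a and b are each other's best hit across their two species."""
    hit_ab = best.get((a, species(b)))
    hit_ba = best.get((b, species(a)))
    return (hit_ab is not None and hit_ba is not None
            and hit_ab[1] == b and hit_ba[1] == a)

def reciprocal_triangles(hits, species_order=("chicken", "human", "fugu")):
    """Return (gene1, gene2, gene3) triples in which all three pairs are reciprocal."""
    best = best_hit_table(hits)
    s1, s2, s3 = species_order
    triangles = []
    for (gene, other), (_, partner2) in best.items():
        if species(gene) != s1 or other != s2:
            continue
        third = best.get((gene, s3))
        if third is None:
            continue
        partner3 = third[1]
        if (is_reciprocal(best, gene, partner2)
                and is_reciprocal(best, gene, partner3)
                and is_reciprocal(best, partner2, partner3)):
            triangles.append((gene, partner2, partner3))
    return sorted(triangles)

# Hypothetical pairwise similarity scores (symmetric, cross-species only).
hits = [
    ("chicken:c1", "human:h1", 900),
    ("chicken:c1", "fugu:f1", 700),
    ("human:h1", "fugu:f1", 750),
    ("chicken:c2", "human:h2", 400),
    ("chicken:c2", "fugu:f1", 300),
    ("human:h2", "fugu:f2", 350),
]

print(reciprocal_triangles(hits))
```

In this toy input only c1, h1 and f1 close a triangle; c2 pairs reciprocally with h2, but its best fugu hit (f1) prefers c1, so under the procedure above it would be a candidate for a Chicken/Human tuple instead.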

Files:

subset.orthologs.two.species.1.1 (1-to-1 Chicken/Human cases)
subset.orthologs.three.species.1.1.1 (1-to-1-to-1 Chicken/Human/Fugu cases)
subset.orthologs.two.species.n.n (many-to-many Chicken/Human cases)
subset.orthologs.three.species.n.n.n (many-to-many Chicken/Human/Fugu cases)
orthologs.two.species (all pairwise Chicken/Human orthology)
orthologs.three.species (all orthologous Chicken/Human/Fugu groups)
chicken_best_homolog_in_fugu.txt
chicken_best_homolog_in_human.txt

The latter two files may help for genes that did not end up in any orthologous group. They give, for every chicken gene, the best-scoring homolog in human and in fugu. Note that these lists extend far into the 'grey zone', with bitscores as low as 30. Use with caution.

Putative chicken losses/oversights (Excel files: by orthology, by homology)

Gene-based synteny (distribution of chicken/human and chicken/mouse synteny blocks in chicken chromosomes)
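nb  - number of synteny blocks (excluding chrUn and *random)
avg - average synteny block size, counted by the number of distinct mammalian genes spanned
>20 - number of blocks spanning > 20 genes
>10 - number of blocks spanning > 10 genes
<3  - number of blocks spanning < 3 genes
Values in square brackets are for [previous ENSEMBL].
| comparison                 | nb          | avg       | >20   | >10   | <3        |
+----------------------------+-------------+-----------+-------+-------+-----------+
| Human                      | 968         | 10        | 110   | 268   | 198       |
| Mouse                      | 1158        | 8         | 81    | 263   | 263       |
| Tetraodon                  | 1483        | 2.8       | 1     | 12    | 878       |
| Fugu.v2 [previous ENSEMBL] | 1627 [1427] | 2.8 [2.7] | 1 [0] | 5 [2] | 916 [852] |
| Danio                      | 1191        | 2.5       | 0     | 1     | 819       |
| Human/Mouse                | 790         | 18.6      | 227   | 389   | 92        |
| Tetraodon/Fugu             | 1483        | 3.6       | 4     | 31    | 665       |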
10 biggest blocks
bl - internal ID for synteny blocks used in the data dump;
sz - number of genes in synteny block
Human:
| bl  | sz  |
+-----+-----+
| 41 | 166 |
| 873 | 151 |
| 249 | 128 |
| 156 | 100 |
| 630 | 75 |
| 884 | 70 |
| 118 | 65 |
| 526 | 61 |
| 289 | 61 |
| 97 | 61 |
Mouse:
| bl  | sz |
+-----+----+
| 805 | 66 |
| 110 | 63 |
| 433 | 56 |
| 771 | 56 |
| 578 | 56 |
| 820 | 56 |
| 111 | 56 |
| 773 | 55 |
| 332 | 48 |
| 221 | 46 |
Tetraodon:
| bl   | sz |
+------+----+
| 646 | 29 |
| 74 | 16 |
| 442 | 16 |
| 89 | 14 |
| 119 | 12 |
| 218 | 12 |
| 50 | 11 |
| 1056 | 11 |
| 309 | 11 |
| 580 | 11 |
Fugu.v2:
| bl   | sz |
+------+----+
| 9 | 33 |
| 668 | 12 |
| 840 | 11 |
| 544 | 11 |
| 228 | 11 |
| 575 | 10 |
| 711 | 10 |
| 1522 | 10 |
| 1165 | 9 |
| 1201 | 9 |
Danio:
| bl   | sz |
+------+----+
| 21 | 11 |
| 1000 | 10 |
| 365 | 9 |
| 686 | 9 |
| 468 | 9 |
| 571 | 9 |
| 714 | 9 |
| 436 | 8 |
| 1037 | 8 |
| 607 | 8 |
Human/Mouse:
| bl  | sz  |
+-----+-----+
| 339 | 182 |
| 469 | 180 |
| 260 | 150 |
| 243 | 148 |
| 257 | 127 |
| 335 | 126 |
| 735 | 123 |
| 483 | 120 |
| 144 | 116 |
| 1 | 116 |
Tetraodon/Fugu:
| bl   | sz |
+------+----+
| 506 | 57 |
| 4 | 30 |
| 739 | 25 |
| 466 | 22 |
| 542 | 18 |
| 1280 | 18 |
| 76 | 17 |
| 474 | 16 |
| 1129 | 16 |
| 1359 | 15 |
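| comparison                 | Total number of BRH genes | Number of BRH in synteny | Retained genomic neighborhood |
+----------------------------+---------------------------+--------------------------+-------------------------------+
| Human                      | 11134                     | 9679                     | 87%                           |
| Mouse                      | 11062                     | 9313                     | 84%                           |
| Tetraodon                  | 8872                      | 4224                     | 48%                           |
| Fugu.v2 [previous ENSEMBL] | 9487 [8528]               | 4608 [3890]              | 48% [45%]                     |
| Danio                      | 8585                      | 3016                     | 35%                           |
| Human/Mouse                | 15501                     | 14707                    | 95%                           |
| Tetraodon/Fugu             | 14035                     | 5348                     | 38%                           |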
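
The "Retained genomic neighborhood" percentages appear to approximate the share of BRH genes that fall inside synteny blocks (number of BRH in synteny divided by total number of BRH genes). The page does not state this definition, so the short check below is an assumption, and the assignment of values to comparisons follows the column order of the table. The computed ratios agree with the reported figures to within about one percentage point; the small differences suggest the published values were rounded or derived slightly differently.

```python
# (total BRH genes, BRH in synteny, reported retained neighborhood in %)
table = {
    "Human": (11134, 9679, 87),
    "Mouse": (11062, 9313, 84),
    "Tetraodon": (8872, 4224, 48),
    "Fugu.v2": (9487, 4608, 48),
    "Fugu [previous ENSEMBL]": (8528, 3890, 45),
    "Danio": (8585, 3016, 35),
    "Human/Mouse": (15501, 14707, 95),
    "Tetraodon/Fugu": (14035, 5348, 38),
}

def retained_percent(total_brh, brh_in_synteny):
    """Assumed definition: percentage of BRH genes located in synteny blocks."""
    return 100.0 * brh_in_synteny / total_brh

for name, (total, in_synteny, reported) in table.items():
    computed = retained_percent(total, in_synteny)
    print(f"{name:24s} computed {computed:5.1f}%   reported {reported}%")
```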

Supporting data are provided in tab-delimited text format. Each row describes one orthologous gene pair by gene accession number, chromosome, strand and position (gene start) for each species (marked 'a' and 'b'); the last column is a synteny block Id (i.e. all orthologous gene pairs with the same block Id define a syntenic region).
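
As a rough illustration of how such a file might be processed, the sketch below groups gene pairs by block Id and derives block-size statistics of the kind reported in the synteny table. The exact column order (accession, chromosome, strand, start for species 'a', then the same for species 'b', then the block Id), the sample rows, and the choice to measure block size by distinct species 'b' genes are all assumptions; check them against the actual data files before use.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows in the assumed column order:
# acc_a, chr_a, strand_a, start_a, acc_b, chr_b, strand_b, start_b, block_id
SAMPLE = (
    "chickA1\t1\t+\t1000\thumanB1\t5\t+\t20000\t1\n"
    "chickA2\t1\t+\t5000\thumanB2\t5\t+\t26000\t1\n"
    "chickA3\t1\t-\t9000\thumanB3\t5\t-\t31000\t1\n"
    "chickA4\t2\t+\t4000\thumanB4\t7\t-\t88000\t2\n"
)

def read_blocks(handle):
    """Group orthologous gene pairs by synteny block Id (last column)."""
    blocks = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 9:
            continue  # skip headers or malformed lines
        acc_a, chr_a, strand_a, start_a, acc_b, chr_b, strand_b, start_b, block_id = row
        blocks[block_id].append({
            "acc_a": acc_a, "chr_a": chr_a, "strand_a": strand_a, "start_a": int(start_a),
            "acc_b": acc_b, "chr_b": chr_b, "strand_b": strand_b, "start_b": int(start_b),
        })
    return blocks

def block_sizes(blocks):
    """Block size = number of distinct species 'b' genes in the block (assumption)."""
    return {bid: len({pair["acc_b"] for pair in pairs}) for bid, pairs in blocks.items()}

def summarize(sizes):
    """Counts and average analogous to the rows of the synteny statistics table."""
    values = list(sizes.values())
    if not values:
        return {"blocks": 0}
    return {
        "blocks": len(values),
        "average_size": sum(values) / len(values),
        "over_20": sum(v > 20 for v in values),
        "over_10": sum(v > 10 for v in values),
        "under_3": sum(v < 3 for v in values),
    }

blocks = read_blocks(io.StringIO(SAMPLE))
sizes = block_sizes(blocks)
print(sizes)
print(summarize(sizes))
```

To run it on a real file, replace `io.StringIO(SAMPLE)` with an open file handle, for example `open(path, newline="")`.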

Multi-species synteny intersection
chicken-human-mm3_mouse-rat_synteny.txt
chicken-human-mm3_mouse-rat-tetra-fugu-danio_synteny.txt

chicken-human-mouse-fugu_synt.txt
chicken-human-mouse-rat_synt.txt
chicken-human-mouse-danio_synt.txt
chicken-human-mouse-tetra_synt.txt
chicken-human-mouse_synt.txt
chicken-human-rat_synt.txt
chicken-human-tetra_synt.txt
chicken-human-fugu_synt.txt
chicken-human-danio_synt.txt
chicken-mouse-tetra_synt.txt
chicken-mouse-fugu_synt.txt
chicken-mouse-danio_synt.txt
chicken-rat-tetra_synt.txt
chicken-rat-fugu_synt.txt
chicken-rat-danio_synt.txt
chicken-human-mouse-rat-tetra-fugu-danio_synt.txt

Chromosome-level correspondence
In the following chromosome mapping tables, headers show the corresponding chromosome names together with the number of putative orthologous genes in brackets.
Each cell shows the number of orthologs (with the random expectation in brackets) and the number of
syntenic blocks (with the random expectation in brackets) between each pair of chromosomes.
Statistically significant similarities are marked in green and significant dissimilarities in red.
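
The page does not say how the random expectation was derived. One common choice, shown here purely as an assumption, is the independence model of a contingency table: the expected number of orthologs shared by a chromosome pair is the product of the two chromosomes' ortholog totals divided by the overall total. The chromosome names and counts are invented, and the significance test behind the green/red marking is not reproduced.

```python
from collections import defaultdict

def expected_counts(observed):
    """Expected ortholog counts per chromosome pair under independence.

    observed maps (chromosome_a, chromosome_b) -> number of ortholog pairs.
    """
    row_totals = defaultdict(int)
    col_totals = defaultdict(int)
    grand_total = 0
    for (chrom_a, chrom_b), count in observed.items():
        row_totals[chrom_a] += count
        col_totals[chrom_b] += count
        grand_total += count
    return {
        (chrom_a, chrom_b): row_totals[chrom_a] * col_totals[chrom_b] / grand_total
        for chrom_a in row_totals
        for chrom_b in col_totals
    }

# Hypothetical ortholog counts between two chromosomes of each species.
observed = {
    ("chr1", "chrA"): 90,
    ("chr1", "chrB"): 10,
    ("chr2", "chrA"): 20,
    ("chr2", "chrB"): 80,
}

expected = expected_counts(observed)
for pair in sorted(observed):
    print(pair, "observed", observed[pair], "expected", expected[pair])
```

A cell whose observed count lies well above its expectation would correspond to a "similarity" in the tables, and one well below it to a "dissimilarity".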


|           | Chicken | Tetraodon | Mouse   |
+-----------+---------+-----------+---------+
| Human     | mapping | mapping   | mapping |
| Rat       | mapping | mapping   | mapping |
| Mouse     | mapping | mapping   |         |
| Tetraodon | mapping |           |         |




Homology-based gene prediction (putative human/mouse/rat orthologs)
hmr.brh:             16124
hmr.brh.fs
hmr.brh.gff

Candidate genes for inclusion in ENSEMBL gene set
list of 562 human genes with putative chicken counterpart (in tab delimited text format)
for which ENSEMBL chicken predictions were not recognized as orthologs by our automated procedure.

Candidate proteins with higher similarity to human/fugu than corresponding ENSEMBL prediction
FASTA file with 818 possibly improved Ensembl proteins
(number in the ID corresponds to Ensembl midbuild transcript IDs).