2. Material and methods


2.1. In the molecular lab

2.1.1. Genomic DNA isolation

Blood samples from a female short-beaked echidna (Tachyglossus aculeatus, “Annie”) were obtained from the Toronto Zoo and stored 1:2 in lysis buffer (Shaw et al. 2003). Genomic DNA was extracted using a DNeasy Blood and Tissue Kit (Qiagen, Cat. No. 69504). Because the blood sample was very viscous, it was further diluted 1:4 in AL buffer, which is supplied with the kit. Deviating from the manual, 200 µl of blood was combined with 80 µl PBS and 120 µl AL buffer. The rest of the procedure followed the manual. All three elutions were visualized on a 1% agarose gel, and elution 1, which showed the clearest band, was used for all further procedures (Fig. 10).

Figure 10. 1% agarose gel showing all three elutions and two DNA ladders.

2.1.2. Genome-walking PCR 


Elution 1 was used to construct a genome walker library with a Universal GenomeWalker™ Kit (Clontech, Cat. No. K1807-1) (Fig. 11). In short, four genomic libraries with fragment sizes of around 4000 base pairs were created by digestion with the four blunt-end restriction enzymes EcoRV, DraI, PvuII, and SspI (Fig. 11). Restriction digests were purified by phenol-chloroform extraction and ligated to GenomeWalker Adaptors using T4 DNA Ligase (Fermentas, Cat. No. EL0011) (Fig. 11).

Figure 11. Establishing a genome walker library (Universal GenomeWalker™ Kit User Manual, 2000).
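The digestion step can be illustrated with a minimal sketch that cuts a sequence at the recognition sites of the four blunt-end enzymes, producing one library per enzyme. It uses the standard recognition sites (DraI TTT^AAA, EcoRV GAT^ATC, PvuII CAG^CTG, SspI AAT^ATT); taking EcoRV as the Eco enzyme is an assumption based on the requirement for blunt ends. Only one strand is scanned, which suffices because all four sites are palindromic, and complete digestion is assumed. The toy sequence and the function names are hypothetical.

```python
# Recognition sites of the four blunt cutters (assumed textbook sites).
BLUNT_SITES = {
    "DraI": "TTTAAA",
    "EcoRV": "GATATC",
    "PvuII": "CAGCTG",
    "SspI": "AATATT",
}
CUT_OFFSET = 3  # each enzyme cuts bluntly between bases 3 and 4 of its site


def digest(seq: str, enzyme: str) -> list[str]:
    """Return the fragments of a linear sequence after complete digestion."""
    site = BLUNT_SITES[enzyme]
    seq = seq.upper()
    fragments, prev = [], 0
    start = seq.find(site)
    while start != -1:
        cut = start + CUT_OFFSET
        fragments.append(seq[prev:cut])
        prev = cut
        start = seq.find(site, start + 1)  # sites are palindromic: one strand suffices
    fragments.append(seq[prev:])
    return fragments


def build_libraries(seq: str) -> dict[str, list[str]]:
    """One library per enzyme, mirroring the four separate digests."""
    return {enzyme: digest(seq, enzyme) for enzyme in BLUNT_SITES}


if __name__ == "__main__":
    toy = "CCGATATCAAACAGCTGTTTAAAGGAATATTCC"  # hypothetical sequence
    for enzyme, fragments in build_libraries(toy).items():
        print(enzyme, fragments)
```

Because every fragment end is blunt, the same double-stranded adaptor can be ligated to both ends of any fragment, whichever enzyme produced it.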


First-round hot-start PCR was carried out with 1 µl of each genomic library as template under the following conditions: initial denaturation at 95°C for 1 min; 7 cycles of denaturation at 94°C for 25 sec and combined annealing and extension at 72°C for 3 min; 32 further cycles of denaturation at 94°C for 25 sec and combined annealing and extension at 67°C for 3 min; and a final extension at 67°C for 7 min. PCR products were generated with the adaptor primers (AP1 and AP2) from the GenomeWalker kit in combination with degenerate primers 1, 2, and 3 from Davies et al. (2007) and self-designed degenerate primers (Tab. 1).

Table 1. Self-designed degenerate primers used in first round hot-start PCR. 

Sequence name   Sequence 5’ to 3’
gw_91F          TAC CTG GCA GAG CCA TGG CAG TAC TCG GTC
gw_182F         ACG TCA CCA TCC AGC ACA AGA AAC TCC GCA
gw_121R         GAC CGA GTA CTG CCA TGG CTC TGC CAG GTA
gw_212R         TGC GGA GTT TCT TGT GCT GGA TGG TGA CGT
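A quick consistency check on Tab. 1: each forward primer is the exact reverse complement of one reverse primer (gw_91F/gw_121R and gw_182F/gw_212R), which a short sketch can confirm. The sketch uses AAC as the eighth triplet of gw_182F, the only triplet that makes it the exact reverse complement of gw_212R; the helper names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

PRIMERS = {
    "gw_91F": "TAC CTG GCA GAG CCA TGG CAG TAC TCG GTC",
    "gw_182F": "ACG TCA CCA TCC AGC ACA AGA AAC TCC GCA",
    "gw_121R": "GAC CGA GTA CTG CCA TGG CTC TGC CAG GTA",
    "gw_212R": "TGC GGA GTT TCT TGT GCT GGA TGG TGA CGT",
}


def clean(seq: str) -> str:
    """Remove the spaces that group the sequence into triplets."""
    return seq.replace(" ", "").upper()


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous (A/C/G/T) DNA sequence."""
    return clean(seq).translate(COMPLEMENT)[::-1]


def is_complementary_pair(fwd: str, rev: str) -> bool:
    """True if the reverse primer is the exact reverse complement of the forward primer."""
    return reverse_complement(PRIMERS[fwd]) == clean(PRIMERS[rev])


if __name__ == "__main__":
    for fwd, rev in [("gw_91F", "gw_121R"), ("gw_182F", "gw_212R")]:
        print(fwd, rev, is_complementary_pair(fwd, rev))
```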

PCR products were visualized on a 1% agarose gel; fragments of interest were excised and ligated into the pJET1.2 vector following the “sticky-end ligation” protocol of the CloneJET PCR Cloning Kit (Fermentas, Cat. No. 1231). Each construct was then transformed into α-Select Silver Competent Cells (Bioline, Cat. No. 85025), and the transformation mixture was spread onto LB agar plates containing ampicillin. Three clones per construct were screened by PCR using EconoTaq DNA Polymerase (Lucigen, Cat. No. 30031-1) under the following conditions: initial denaturation at 95°C for 3 min; 30 cycles of 94°C for 1 min, 54°C for 1 min, and 72°C for 1 min; and a final extension at 72°C for 7 min. PCR products were visualized on a 1% agarose gel, and products of the expected size were sequenced on a 3130XL Genetic Analyzer (Applied Biosystems) using standard T7 and pIRES primers.


2.1.3. Gene synthesis and site-directed mutagenesis 

Gene synthesis can be achieved by PCR-based assembly of several fragments (Chang et al. 2007). In this study, the rhodopsin protein-coding sequences of the echidna and of three inferred ancestral pigments were synthesised by Geneart AG, Regensburg, Germany (www.geneart.com). Because codon usage bias (i.e. the preferential use of certain codons over synonymous codons for the same amino acid) differs between monophyletic groups (Sharp et al. 1988), the three inferred ancestral sequences were codon-optimized for expression in mammalian cells. The echidna rhodopsin gene, which was sequenced from a living animal, was not modified.

Using the echidna construct as template, coding sequences of echidna mutants at sites 158 and 169 were generated by site-directed mutagenesis following the QuikChange method (www.stratagene.com). These two sites were chosen because both are unique to the echidna (Tab. 4) and lie at structurally interesting positions in the 3D structure of rhodopsin (Borhan et al. 2000) (Fig. 8). Each site was mutated to the residue found in bovine rhodopsin, i.e. T158A and F169A.


To create these mutants, PCR was performed using Pfu Polymerase (Fermentas, Cat. No. EP0501) and specific primers (Tab. 2) under the following conditions: initial denaturation at 95°C for 1 min, followed by 13 cycles of 95°C for 30 s, 55°C for 1 min, and 68°C for 4 min. Afterwards, 1 μl of DpnI was added to each reaction and incubated at 37°C for 60 min to digest the methylated parental template DNA derived from E. coli, leaving the newly synthesized, unmethylated mutant DNA intact.

Table 2. Primers used in site-directed mutagenesis PCR in order to create echidna mutants T158A and F169A. 

Sequence name      Sequence 5’ to 3’
EcRho_T158A_s      CAT GCC ATC ATG GGT GTG GCC TTC ACT TGG ATC ATG GCC
EcRho_T158A_as     GGC CAT GAT CCA AGT GAA GGC CAC ACC CAT GAT GGC ATG
EcRho_F169A_s      CCC TGG CCT GTG CCG CGC CCC CAC TCG TTG G
EcRho_F169A_as     CCA ACG AGT GGG GGC GCG GCA CAG GCC AGG G
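Each mutagenic sense primer in Tab. 2 can be translated in the reading frame of the rhodopsin coding sequence to confirm that it encodes the intended substitution. The sketch below does this with the standard genetic code. The reading-frame offsets (0 for EcRho_T158A_s, 2 for EcRho_F169A_s) were found by inspection and, like the helper names, are assumptions of this example; the reference residues are the echidna positions 152-164 and 165-173 taken from Tab. 4.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G (first position varies slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str, offset: int = 0) -> str:
    """Translate complete codons starting at `offset`; leftover bases are ignored."""
    dna = dna.replace(" ", "").upper()[offset:]
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def substitutions(wild_type: str, mutant: str, first_position: int) -> list[str]:
    """List differences such as 'T158A', numbered from `first_position`."""
    return [
        f"{wt}{first_position + i}{mt}"
        for i, (wt, mt) in enumerate(zip(wild_type, mutant))
        if wt != mt
    ]


# Echidna residues (bovine numbering, Tab. 4) and sense primers (Tab. 2).
ECHIDNA_152_164 = "HAIMGVTFTWIMA"
ECHIDNA_165_173 = "LACAFPPLV"
T158A_S = "CAT GCC ATC ATG GGT GTG GCC TTC ACT TGG ATC ATG GCC"
F169A_S = "CCC TGG CCT GTG CCG CGC CCC CAC TCG TTG G"

if __name__ == "__main__":
    print(substitutions(ECHIDNA_152_164, translate(T158A_S), 152))  # ['T158A']
    print(substitutions(ECHIDNA_165_173, translate(F169A_S, offset=2), 165))  # ['F169A']
```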

2.1.4. An adequate expression vector


All constructs were delivered by Geneart AG in a custom pMA vector. After transformation into α-Select Silver Competent Cells, plasmid DNA of all four constructs was purified with a Plasmid Maxi Kit (Qiagen, Cat. No. 12169) according to the manufacturer's instructions. The inserts were then excised from the pMA vector by digestion with EcoRI and BamHI in 10x buffer (Fermentas), glycogen-precipitated, and ligated into the p1D4 expression vector (Morrow and Chang 2010). This added a tag of eight amino acids (ETSQVAPA) at the carboxy terminus, allowing later purification of the expressed proteins from HEK293 cells (Oprian et al. 1991). These amino acids correspond exactly to the carboxy terminus of bovine rhodopsin and form the epitope of the monoclonal antibody rho 1D4 (Molday and MacKenzie 1983, MacKenzie et al. 1984). The resulting constructs were again transformed, screened, sequenced, stored in 30% glycerol at -80°C, and finally purified according to the Plasmid Maxi Kit instructions.

2.1.5. Protein expression 

To express the rod visual pigments, HEK293 cells were transfected with 8 μg/plate Lipofectamine 2000 (Invitrogen, Cat. No. 11668-019) and 24.8 μg/plate DNA. After 48 hours, cells were harvested following a modified protocol from Starace and Knox (1998) with 1x PBS (Sigma-Aldrich) containing 10 μg/ml aprotinin and leupeptin, incubated with 4 μM 11-cis retinal (R.K. Crouch, Medical University of South Carolina and the National Eye Institute, National Institutes of Health, USA) in the dark for 2-3 hours at 4°C, and solubilized for 3-4 hours at 4°C in 50 mM Tris (pH 6.8), 100 mM NaCl, 1 mM CaCl2, 0.1 mM PMSF (all Sigma-Aldrich), and 1% DM (n-dodecyl-β-D-maltoside; Anatrace). After immunoaffinity purification with the 1D4 monoclonal antibody following a modified protocol from Chang et al. (2002a), the pigments were washed several times with 50 mM Tris (pH 6.8), 0.1% DM, 100 mM NaCl, and 50 mM NaPhos (pH 6.5), eluted for 2-3 hours with elution buffer (0.1% DM, 50 mM NaPhos (pH 6.5), and 0.18 mg/ml 1D4 peptide (University of British Columbia, Canada)), and analysed by spectrophotometry.

2.1.6. Western blot 


To confirm that the correct protein had been expressed, proteins in the host-cell extract were first separated by polyacrylamide gel electrophoresis (PAGE) (Wong 2006). The resolved protein bands were then transferred to a membrane (western blotting) and subjected to immunological detection (Wong 2006).

Harvested protein lysates were resolved on an SDS-polyacrylamide gel (Bio-Rad, Cat. No. 161-1100EDU) at 20 mA for around 1 h. Proteins were electroblotted onto a polyvinylidene fluoride (PVDF) membrane (Pall Corporation) at 50 V for 1 h. Membranes were blocked in 1x TBS with 0.05% Tween and 3% dry milk (all Sigma-Aldrich) and washed in 1x TBS with 0.05% Tween. They were then incubated for 2 h with 0.2 μg/ml mouse 1D4 monoclonal antibody (GE Healthcare, Cat. No. NA931) in 1x TBS, 0.05% Tween, and 3% dry milk. After washing, membranes were incubated for 1 h with 0.2 μg/ml sheep anti-mouse antibody linked to horseradish peroxidase (GE Healthcare, Cat. No. NA931). After final washes, membranes were developed using an ECL Plus Western Blotting Detection System (GE Healthcare, Cat. No. RPN2132).

2.1.7. Spectrophotometry 


The wavelength at which a visual pigment absorbs light maximally (λmax) is regulated by opsin-chromophore interactions (Sakmar et al. 1989).

A spectrophotometer measures how much light a sample absorbs as a function of wavelength. It passes a beam of light through the sample and measures the intensity of the light reaching a detector.

Here, all absorption spectra, including those recorded during the hydroxylamine and acid assays, were taken with a temperature-controlled Cary 4000 spectrophotometer (Varian Inc.) at 25°C. Spectra were recorded continuously from 560 nm to 250 nm with a scan rate of 400 nm/min, an averaging time of 0.1 sec, a data interval of 0.667 nm, an integration time of 0.12 sec, and a slit width of 2 nm. Pigments were photoexcited with light from a fiber-optic lamp for 60 sec. Dark spectra were fitted following the template method of Govardovskii et al. (2000).
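The template fitting of dark spectra can be sketched as follows. This minimal, stdlib-only illustration assumes the A1 (rhodopsin) template of Govardovskii et al. (2000) with its published α-band and β-band constants, scales the template by a single amplitude, and finds λmax by a coarse-to-fine grid search; the search strategy is an assumption of this example and not necessarily how the fits in this study were done. Function names and the synthetic spectrum are hypothetical.

```python
import math


def govardovskii_a1(wavelength: float, lambda_max: float) -> float:
    """A1 template (alpha + beta band), approximately 1.0 at lambda_max."""
    x = lambda_max / wavelength
    a = 0.8795 + 0.0459 * math.exp(-((lambda_max - 300.0) ** 2) / 11940.0)
    alpha = 1.0 / (
        math.exp(69.7 * (a - x))
        + math.exp(28.0 * (0.922 - x))
        + math.exp(-14.9 * (1.104 - x))
        + 0.674
    )
    beta_peak = 189.0 + 0.315 * lambda_max
    beta_width = -40.5 + 0.195 * lambda_max
    beta = 0.26 * math.exp(-(((wavelength - beta_peak) / beta_width) ** 2))
    return alpha + beta


def fit_lambda_max(wavelengths, absorbances, lo=350.0, hi=650.0):
    """Grid-search lambda_max; the amplitude is solved by linear least squares at each step."""

    def sse_and_scale(lmax):
        t = [govardovskii_a1(w, lmax) for w in wavelengths]
        scale = sum(ti * ai for ti, ai in zip(t, absorbances)) / sum(ti * ti for ti in t)
        sse = sum((ai - scale * ti) ** 2 for ti, ai in zip(t, absorbances))
        return sse, scale

    coarse = [lo + i for i in range(int(hi - lo) + 1)]  # 1 nm steps
    best = min(coarse, key=lambda lm: sse_and_scale(lm)[0])
    fine = [best - 1.0 + 0.01 * j for j in range(201)]  # 0.01 nm steps around the best
    best = min(fine, key=lambda lm: sse_and_scale(lm)[0])
    return best, sse_and_scale(best)[1]


if __name__ == "__main__":
    # Synthetic dark spectrum (hypothetical): lambda_max 500 nm, peak absorbance 0.05.
    wl = [400.0 + i for i in range(161)]
    spectrum = [0.05 * govardovskii_a1(w, 500.0) for w in wl]
    print(fit_lambda_max(wl, spectrum))
```

Solving the amplitude analytically leaves λmax as the only nonlinear parameter, which keeps the search one-dimensional.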

Meta II decay assays were carried out on a Cary Eclipse Fluorescence Spectrophotometer (Varian Inc.) with excitation at 295 nm and emission at 330 nm. The excitation slit width was 1.5 nm and the emission slit width 10 nm. Data were collected every 30 sec with an averaging time of 2 sec.

2.1.8. Functional assays: acid bleach, hydroxylamine sensitivity, and meta II decay rate 

Various functional and biochemical assays have been developed to characterise the different types of visual pigments and to elucidate differences between rod and cone opsins (Kito et al. 1968, Shichida et al. 1994, Starace and Knox 1998, Imai et al. 2005, Imai et al. 2007, Sakurai et al. 2007).


In this study, three assays were performed to characterise each expressed rhodopsin.

For the first functional assay, the acid bleach, successfully expressed pigments were treated with freshly prepared hydrochloric acid (HCl) to a final HCl concentration of 2 M in a 130 μl sample. Samples were kept in the dark at 25°C. After the addition of HCl, absorption spectra were taken every 2-5 minutes.

Acid denatures the opsin, whereas the protonated Schiff base linkage between the opsin and 11-cis retinal remains intact. The absorption peak therefore shifts until it reaches a plateau at 440 nm, the characteristic λmax of a protonated 11-cis retinylidene Schiff base in denatured opsin (Kito et al. 1968).


The molar extinction coefficient of a visual pigment can also be estimated with this assay. The molar extinction coefficient measures how strongly a substance absorbs light at a given wavelength and is defined by the Lambert-Beer law

$$A = \varepsilon \, c \, l$$

where A is the absorbance (dimensionless), ε the molar extinction coefficient (M⁻¹ cm⁻¹), c the molar concentration (M), and l the path length (cm). Once the molar extinction coefficient is known, the concentration of a protein in solution can be estimated from its absorbance.


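The molar extinction coefficient of 11-cis retinal bound to denatured opsin is 30,800 M⁻¹ cm⁻¹ (Starace and Knox 1998). Extinction coefficients for all expressed rhodopsins were therefore determined as

$$\varepsilon = \varepsilon_{\mathrm{ret}} \times \frac{A_{\lambda_{\max}}}{A_{440}}$$

where ε_ret = 30,800 M⁻¹ cm⁻¹, A_λmax is the absorbance of the dark pigment at its λmax, and A_440 is the absorbance at 440 nm after acid denaturation.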
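The calculation can be sketched in a few lines: the pigment's ε follows from the ratio of its dark absorbance at λmax to the absorbance at 440 nm after acid denaturation, and the Lambert-Beer law then gives the pigment concentration. The absorbance values below are hypothetical, a 1 cm path length is assumed, and the optional dilution correction (for acid added after the dark reading) is an assumption of this sketch rather than a step described in the text.

```python
EPSILON_RET = 30_800.0  # M^-1 cm^-1: 11-cis retinal on acid-denatured opsin (Starace and Knox 1998)


def extinction_coefficient(a_dark_lambda_max: float, a_acid_440: float,
                           acid_dilution: float = 1.0) -> float:
    """epsilon = epsilon_ret * A(lambda_max) / A(440).

    acid_dilution (assumed correction): if the dark spectrum was read before the
    acid was added, pass V_after / V_before so that A(440) refers to the same
    pigment concentration as the dark reading.
    """
    return EPSILON_RET * a_dark_lambda_max / (a_acid_440 * acid_dilution)


def concentration_molar(absorbance: float, epsilon: float, path_cm: float = 1.0) -> float:
    """Lambert-Beer law solved for c: c = A / (epsilon * l)."""
    return absorbance / (epsilon * path_cm)


if __name__ == "__main__":
    # Hypothetical absorbance readings, not measurements from this study.
    eps = extinction_coefficient(0.050, 0.035)
    conc = concentration_molar(0.050, eps)
    print(f"epsilon = {eps:.0f} M^-1 cm^-1, c = {conc * 1e6:.2f} uM")
    # epsilon = 44000 M^-1 cm^-1, c = 1.14 uM
```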


Second, hydroxylamine assays were performed. Hydroxylamine (NH2OH) is structurally very similar to ammonia, differing only in that one hydrogen atom is replaced by a hydroxyl group, and, like ammonia, it is a weak base (Fig. 12). It can attack the Schiff base linkage between 11-cis retinal and Lys296 of the opsin (Kawamura and Yokoyama 1998). If it enters the chromophore-binding pocket, it reacts with 11-cis retinal to form a retinal oxime, thereby releasing the apoprotein (opsin) (Kawamura and Yokoyama 1998). This oxime absorbs light maximally at 363 nm (Kawamura and Yokoyama 1998).

Testing the sensitivity to hydroxylamine has been used in previous studies to distinguish rod opsins from cone opsins, since this reaction is substantially faster in cone opsins (Wald et al. 1955, Fager and Fager 1981, Okano et al. 1989, Wang et al. 1992, Starace and Knox 1998).

Figure 12. Structural formula of hydroxylamine (http://de.academic.ru/pictures/dewiki/72/Hydroxylamine-2D.png).

Freshly prepared hydroxylamine in PBS was added to samples with a pigment concentration of around 0.007-0.01 μM to a final hydroxylamine concentration of 1 M in a 130 μl sample. Samples were kept in the dark at 25°C. After the addition of hydroxylamine, absorption spectra were recorded every 2-3 minutes for the first 30 min and then every 30 min for another 90 min. At the end of the experiment, the rhodopsin was exposed to light.

Curves were fitted in SigmaPlot 11 by nonlinear regression to

$$f = y_0 + a\left(1 - e^{-bx}\right)$$

a three-parameter, first-order ‘Exponential Rise to Maximum’ equation, where x is the time after the addition of hydroxylamine.
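Outside SigmaPlot, the same three-parameter model can be fitted with a short stdlib-only sketch. Because y0 and a enter the equation linearly, they are solved by least squares for each trial rate constant b, and b is found by a logarithmic grid search followed by golden-section refinement; this variable-projection approach is chosen for brevity and is not SigmaPlot's algorithm. The time course in the example is synthetic, and the half-life t1/2 = ln 2 / b assumes that x is time and that the process is first order.

```python
import math


def _linear_part(x, y, b):
    """For a fixed rate b, solve y0 and a by least squares; return (y0, a, sse)."""
    g = [1.0 - math.exp(-b * xi) for xi in x]  # regressor of f = y0 + a * g
    n = len(x)
    sg, sy = sum(g), sum(y)
    sgg = sum(gi * gi for gi in g)
    sgy = sum(gi * yi for gi, yi in zip(g, y))
    det = n * sgg - sg * sg
    if det == 0.0:
        raise ValueError("rate too small or too few distinct time points")
    a = (n * sgy - sg * sy) / det
    y0 = (sy - a * sg) / n
    sse = sum((yi - y0 - a * gi) ** 2 for gi, yi in zip(g, y))
    return y0, a, sse


def fit_exponential_rise(x, y, b_min=1e-4, b_max=10.0, steps=400):
    """Fit f = y0 + a * (1 - exp(-b * x)); returns (y0, a, b)."""

    def sse(log_b):
        return _linear_part(x, y, math.exp(log_b))[2]

    lo, hi = math.log(b_min), math.log(b_max)
    step = (hi - lo) / steps
    best = min((lo + i * step for i in range(steps + 1)), key=sse)  # coarse log grid
    left, right = best - step, best + step  # golden-section refinement around it
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(80):
        m1 = right - inv_phi * (right - left)
        m2 = left + inv_phi * (right - left)
        if sse(m1) < sse(m2):
            right = m2
        else:
            left = m1
    b = math.exp((left + right) / 2.0)
    y0, a, _ = _linear_part(x, y, b)
    return y0, a, b


def half_life(b):
    """Half-time of a first-order process with rate constant b."""
    return math.log(2.0) / b


if __name__ == "__main__":
    # Synthetic time course (minutes) mimicking the sampling scheme described above.
    t = [float(m) for m in range(0, 31, 2)] + [60.0, 90.0, 120.0]
    signal = [0.1 + 0.8 * (1.0 - math.exp(-0.05 * m)) for m in t]
    y0, a, b = fit_exponential_rise(t, signal)
    print(f"y0={y0:.3f}, a={a:.3f}, b={b:.4f} per min, t1/2={half_life(b):.1f} min")
    # y0=0.100, a=0.800, b=0.0500 per min, t1/2=13.9 min
```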

Third, meta II decay rates were analysed. After photoisomerization of 11-cis retinal, rhodopsin passes through a series of photointermediates with different characteristic absorption maxima (Fig. 13) (Weitz and Nathans 1993, Imai et al. 2005, Kuwayama et al. 2005, Palczewski 2006, Sugawara et al. 2010).


Meta II is the key intermediate that catalyses GDP-GTP exchange on transducin (Fig. 13) (Weitz and Nathans 1993, Imai et al. 2005), and the photoisomerization that initiates its formation is one of the fastest photochemical reactions known in biology (Palczewski 2006). A single photoexcited rhodopsin molecule activates hundreds of transducin molecules (Sagoo and Lagnado 1997, Menon et al. 2001).

Meta II is the active state of rhodopsin, in which the original Schiff base is intact but deprotonated; it absorbs maximally at 380 nm (Sakmar et al. 2002, Heck et al. 2003). In the ground state, the intrinsic tryptophan fluorescence of rhodopsin is quenched by the chromophore (Farrens and Khorana 1995). After photoexcitation, once the chromophore leaves the binding pocket, this quenching is relieved, and the resulting increase in tryptophan fluorescence can be detected (Fig. 13) (Farrens and Khorana 1995, Schädel et al. 2003).

Upon decay, meta II converts via meta III into correctly folded opsin and free all-trans retinal; the opsin can subsequently bind fresh 11-cis retinal (Fig. 13) (Sakamoto and Khorana 1995, Heck et al. 2003, Palczewski 2006). Meta II decays much faster in cone pigments than in rod pigments (Shichida et al. 1994, Sakurai et al. 2007).


Samples with a pigment concentration of around 0.007-0.01 μM were kept in the dark at 25°C. After 5 minutes, samples were bleached with a fiber-optic lamp for 60 sec, and recordings were taken every 30 sec for 30-40 min. Curves were fitted in SigmaPlot 11 by nonlinear regression to the same three-parameter, first-order 'Exponential Rise to Maximum' equation,

$$f = y_0 + a\left(1 - e^{-bx}\right)$$

where x is the time after bleaching.


Figure 13. Reaction scheme of rhodopsin photoproducts (Yan et al. 2003).

2.2. Maximum likelihood analyses

2.2.1. PAML

Ancestral sequence reconstruction and selective constraint analyses were carried out with PAML 4 (Yang 2007), a package of programs for the phylogenetic analysis of DNA and protein sequences by maximum likelihood (Yang 2007). Its strength lies in the many sophisticated substitution models it offers for studying the process of sequence evolution (Yang 2007). Maximum likelihood analyses in PAML start from an alignment of extant gene sequences, a tree describing their phylogenetic relationships, and a specified statistical model of evolution (Yang 2007, Hanson-Smith et al. 2010).


2.2.2. The dataset

The protein-coding rhodopsin sequence of the short-beaked echidna was aligned with 25 further rhodopsin sequences (tetrapods plus the coelacanth and lungfish outgroups) downloaded from the GenBank database at NCBI (Tab. 3). The protein-coding sequence of the snake rhodopsin was kindly provided by the Chang Lab (Toronto). Sequences were aligned using MEGA 4 (Tamura et al. 2007) and checked by eye. Stop codons were removed from all sequences prior to the analysis. For genomic DNA, intron-exon boundaries were identified by comparison with published cDNA sequences. All sequences show intact open reading frames (ORFs), suggesting that the genes are functional (Tab. 4). Amino acid positions mentioned throughout the text are numbered according to bovine rhodopsin (Palczewski et al. 2000). A tetrapod phylogeny was assembled manually based on accepted literature (Fig. 14) (Bininda-Emonds et al. 2007, Meredith et al., Murphy et al. 2007, Wible et al. 2007, Asher and Helgen 2010). Taxa were sampled from a broad range of tetrapods, with only one or two representatives of closely related species chosen in order to maximize divergence. A total of 27 sequences was considered reasonable, since it has been suggested that more taxa are not necessarily better for reconstructing ancestral states (Li et al. 2008). As required by PAML 4, the tree was unrooted, with coelacanth and lungfish as outgroups.

The data acquisition was carried out in close collaboration with Jingjing Du (Toronto).
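Numbering residues by bovine rhodopsin, as done throughout this text, can be sketched by walking along the bovine (cattle) row of the alignment and counting only non-gap symbols. The example uses alignment columns 301-355 of the cattle and echidna rows in Tab. 4, where the bovine sequence contains gaps; it assumes that the cattle row has no gaps before column 301, and the helper names are hypothetical.

```python
def bovine_numbering(bovine_row: str, first_residue: int = 1) -> list:
    """Bovine residue number for each alignment column (None where bovine has a gap)."""
    numbers, current = [], first_residue - 1
    for symbol in bovine_row:
        if symbol == "-":
            numbers.append(None)
        else:
            current += 1
            numbers.append(current)
    return numbers


def residue_at(row: str, bovine_row: str, position: int, first_residue: int = 1) -> str:
    """Residue of `row` in the column aligned to bovine residue `position`."""
    if len(row) != len(bovine_row):
        raise ValueError("aligned rows must have equal length")
    return row[bovine_numbering(bovine_row, first_residue).index(position)]


# Alignment columns 301-355 of Tab. 4; the slice starts at bovine residue 301.
CATTLE_301 = "YNPVIYIMMN" "KQFRNCMVTT" "LCCGKNPLGD" "DEASTTV--S" "KTETSQVA--" "---PA"
ECHIDNA_301 = "YNPVIYIMMN" "KQFRNCMLTT" "ICCGKNPLGD" "DEASATA--S" "KTEQSSVSTS" "QVSPA"

if __name__ == "__main__":
    for pos in (338, 344, 347):
        print(pos, residue_at(ECHIDNA_301, CATTLE_301, pos, first_residue=301))
```

Echidna residues that sit in bovine gap columns map to None and therefore have no bovine position of their own.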


Table 3. Accession numbers of all sequences which were downloaded from NCBI and used in this study. 

Species name                 Common name                    NCBI accession number
Alligator mississippiensis   American alligator             U23802.1
Ambystoma tigrinum           Tiger salamander               U36574.1
Anolis carolinensis          Green anole                    L31503.1
Bos taurus                   Cattle                         NM_001014890.1
Bufo bufo                    European toad                  U59921.1
Caluromys philander          Bare-tailed woolly opossum     AY159786.2
Canis lupus familiaris       Dog                            NM_001008276.1
Cavia porcellus              Guinea pig                     EF457995
Cricetulus griseus           Chinese hamster                X61084.1
Felis catus                  Domestic cat                   NM_001009242.1
Gallus gallus domesticus     Chicken                        NM_001030606.1
Homo sapiens                 Human                          NM_000539.2
Latimeria chalumnae          Coelacanth                     AF131256.1
Loxodonta africana           African elephant               AY686752.1
Macaca fascicularis          Rhesus macaque                 XM_001094250.1
Neoceratodus forsteri        Australian lungfish            EF526295
Ornithorhynchus anatinus     Platypus                       EF050076.1
Oryctolagus cuniculus        European rabbit                NM_001082349.1
Otolemur crassicaudatus      Galago                         AB112594.2
Rana temporaria              European common frog           U59920.1
Sminthopsis crassicaudata    Fat-tailed dunnart             AY313946.1
Sus scrofa                   Wild boar                      NM_214221.1
Trichechus manatus           West Indian manatee            AF055319.1
Ursus maritimus              Polar bear                     AY883926.1
Uta stansburiana             Common side-blotched lizard    DQ100323.1

Table 4. Alignment of rhodopsin amino acid sequences used in this study.

Column      1          11         21         31         41         51
Coelacanth  MNGTEGPNFY VPMSNKTGVV RNPFEYPQYY LADPWKYSAL AAYMFFLILV GFPINFLTLF
Lungfish    MNGTEGPNFY VPMTNKTGVV RSPFEYPQYY LADPWKYSAL AAYMFFLILT GFPINFLTLY
Frog        MNGTEGPNFY IPMSNKTGVV RSPFEYPQYY LAEPWKYSIL AAYMFLLILL GFPINFMTLY
Toad        MNGTEGPNFY IPMSNKTGVV RSPFEYPQYY LAEPWQYSIL CAYMFLLILL GFPINFMTLY
Salamander  MNGTEGPNFY VPFSNKSGVV RSPFEYPQYY LAEPWQYSVL AAYMFLLILL GFPVNFLTLY
Snake       MNGTEGLNFY IPMSNKTGIV RSPFEYPQYY LADPWQYSAL AAYMFLLILL GFPINFLTLY
Anole       MNGTEGQNFY VPMSNKTGVV RNPFEYPQYY LADPWQFSAL AAYMFLLILL GFPINFLTLF
Lizard      MNGTEGQNFY IPMSNKTGVV RSPFEYPQYY LADPWQFSAL AAYMFLLILL GFPINFLTLF
Alligator   MNGTEGPDFY IPFSNKTGVV RSPFEYPQYY LAEPWKYSAL AAYMFMLIIL GFPINFLTLY
Chicken     MNGTEGQDFY VPMSNKTGVV RSPFEYPQYY LAEPWKFSAL AAYMFMLILL GFPVNFLTLY
Platypus    MNGTEGQDFY IPMSNKTGVV RSPFEYPQYY LAEPWQYSVL AAYMFMLIML GFPINFLTLY
Echidna     MNGTEGQDFY IPMSNKTGIV RSPFEYPQYY LAEPWQYSVL AAYMFMLIML GFPINFLTLY
Opossum     MNGTEGPNFY VPFSNKTGVV RSPFEEPQYY LAEPWQFSCL AAYMFMLIVL GFPINFLTLY
Dunnart     MNGTEGPNFY VPYSNKSGVV RSPYEEPQYY LAEPWMFSCL AAYMFMLIVL GFPINFLTLY
Elephant    MNGTEGPNFY VPFSNKTGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Manatee     MNGTEGPNFY VPFSNKTGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Pig         MNGTEGPNFY VPFSNKTGVV RSPFEYPQYY LAEPWQFSML AAYMFMLIVL GFPINFLTLY
Cattle      MNGTEGPNFY VPFSNKTGVV RSPFEAPQYY LAEPWQFSML AAYMFLLIML GFPINFLTLY
Cat         MNGTEGPNFY VPFSNKTGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Bear        ---------- -----?TGVV RSPFESPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Dog         MNGTEGPNFY VPFSNKTGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Hamster     MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Guinea pig  MNGTEGENFY IPFSNATGVV RSPFEYPQYY LAEPWQFSIL AAYMFMLIVL GFPINFLTLY
Rabbit      MNGTEGPDFY IPMSNQTGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Galago      MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFMLIVL GFPINFLTLY
Macaque     MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY
Human       MNGTEGPNFY VPFSNATGVV RSPFEYPQYY LAEPWQFSML AAYMFLLIVL GFPINFLTLY

Column      61         71         81         91         101        111
Coelacanth  VTIQHKKLRT PLNYILLDLA VADLCMVFGG FFVTMYSSMN GYFVLGPTGC NIEGFFATLG
Lungfish    VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTMYTAMN GYFVFGVVGC NLEGFFATFG
Frog        VTIQHKKLRT PLNYILLNLA FANHFMVLCG FTITLYTSLH GYFVFGQSGC YFEGFFATLG
Toad        VTIQHKKLRT PLNYILLNLA FANHFMVLCG FTVTMYSSMN GYFILGATGC YVEGFFATLG
Salamander  VTIQHKKLRT PLNYILLNLA FANHFMVFGG FPVTMYSSMH GYFVFGQTGC YIEGFFATMG
Snake       VTIQHKKLRT PLNYILLNLA VANLFMVLVG FTTTMYTSMN GYFIFGTVGC NVEGFFATLG
Anole       VTIQHKKLRT PLNYILLNLA VANLFMVLMG FTTTMYTSMN GYFIFGTVGC NIEGFFATLG
Lizard      VTIQHKKLRT PLNYILLNLA IANLFMVLIG FTTTMYTSMN GYFIFGTIGC SIEGFFATLG
Alligator   VTVQHKKLRS PLNYILLNLA VADLFMVLGG FTTTLYTSMN GYFVFGVTGC YFEGFFATLG
Chicken     VTIQHKKLRT PLNYILLNLV VADLFMVFGG FTTTMYTSMN GYFVFGVTGC YIEGFFATLG
Platypus    VTIQHKKLRT PLNYILLNLA FANHFMVLGG FTTTLYTSLH GYFVFGPTGC NIEGFFATLG
Echidna     VTIQHKKLRT PLNYILLNLA FANHFMVLGG FTTTLYTSLH GYFVFGPTGC NIEGFFATLG
Opossum     VTIQHK???T PLNYILLNLA IADLFMVFGG FTTTLYTSLH GYFVFGPTGC DLEGFFATLG
Dunnart     VTIQHKKLRT PLNYILLNLA VADLFMVICG FTTTLVTSLN GYFVFGTTGC LVEGFFATTG
Elephant    VTVQHKNVRT PLNYILLNLA VANHFMVFGG FTTTLYTSLH GYFVFGSTGC NLEGFFATLG
Manatee     VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NVEGFFATLG
Pig         VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
Cattle      VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
Cat         VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
Bear        VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
Dog         VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NVEGFFATLG
Hamster     VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
Guinea pig  VTVQHKKLRT PLNYILLNLA VANLFMVLGG FTTTLYTSMN GYFVFGPTGC NLEGFFATLG
Rabbit      VTVQHKKLRT PLNYILLNLA VADLFMVLGG FTTTLYTSLH GYFVFGPTGC NVEGFFATLG
Galago      VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NLEGFFATLG
Macaque     VTVQHKKLRT PLNYILLNLA VADLFMVFGG FTTTLYTSLH GYFVFGPTGC NAEGFFATLG
Human       VTVQHKKLRT PLNYILLNLA VADLFMVLGG FTSTLYTSLH GYFVFGPTGC NLEGFFATLG

Column      121        131        141        151        161        171
Coelacanth  GQVALWALVV LAIERYVVVC KPMSNFRFGE NHAIMGVIFT WIMALSCAVP PLFGWSRYIP
Lungfish    GIIALWCLVV LAIERYIVVC KPISNFRFGE NHAIMGVVFT WIMALACAGP PLFGWSRYIP
Frog        GEIALWSLVA LAIERYIVVC KPMSNFRFGE NHAMMGVAFT WIMALACAVP PLFGWSRYIP
Toad        GEIALWSLVV LAIERYVVVC KPMSNFRFSE NHAVMGVAFT WIMALSCAVP PLLGWSRYIP
Salamander  GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVMMT WIMALACAAP PLFGWSRYIP
Snake       GEIALWSLVI LAVERYVVVC KPMSNFRFTQ THAIIGVSLT WIMALACAVP PLIGWSRYIP
Anole       GEMGLWSLVV LAVERYVVIC KPMSNFRFGE THALIGVSCT WIMALACAGP PLLGWSRYIP
Lizard      GEIALWSLVV LAVERYVVVC KPMSNFRFSE THAIIGVGFT WIMALACAGP PLLGWSRYIP
Alligator   GEVALWCLVV LAIERYIVVC KPMSNFRFGE NHAIMGVVFT WIMALTCAAP PLVGWSRYIP
Chicken     GEIALWSLVV LAVERYVVVC KPMSNFRFGE NHAIMGVAFS WIMAMACAAP PLFGWSRYIP
Platypus    GEIALWSLVV LAIERYIVVC KPMSNFRFGE NHAIMGVAFT WIMALACALP PLVGWSRYIP
Echidna     GEIALWSLVV LAIERYIVVC KPMSNFRFGE NHAIMGVTFT WIMALACAFP PLVGWSRYIP
Opossum     GEIALWSLVV LAIERYIVXC KXMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP
Dunnart     GEVALWALVV LAIERYIVVC KPMSNFRFGE NHAIMGVAFT WIMALACSVP PIFGWSRYIP
Elephant    GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP
Manatee     GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLAGWSRYIP
Pig         GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGLALT WVMALACAAP PLVGWSRYIP
Cattle      GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP
Cat         GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP
Bear        GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLVGWSRYIP
Dog         GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLAGWSRYIP
Hamster     GEIALWSLVV LAIERYVVIC KPMSNFRFGE NHAIMGVVFT WIMALACAAP PLVGWSRYIP
Guinea pig  GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVVFT WIMALACAAP PLVGWSRYIP
Rabbit      GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WIMALACAAP PLVGWSRYIP
Galago      GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGLVFT WIMALACAAP PLVGWSRYIP
Macaque     GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLFGWSRYIP
Human       GEIALWSLVV LAIERYVVVC KPMSNFRFGE NHAIMGVAFT WVMALACAAP PLAGWSRYIP

Column      181        191        201        211        221        231
Coelacanth  EGMQSSCGVD YYTLKPEVNN ESFVIYMFVV HFTIPLIVIF FCYGRLVCTV KDAAAQQQES
Lungfish    EGMQCSCGID YYTLKPEVNN ESFVIYMFIV HFTIPLIIIF FCYGRLMCTV KEAAAQQQES
Frog        EGMQCSCGVD YYTLKPEINN ESFVIYMFVV HFLIPLIIIT FCYGRLVCTV KEAAAQQQES
Toad        EGMQCSCGVD YYTLKPEVNN ESFVIYMFVV HFTIPLIIIF FCYGRLVCTV KEAAAQQQES
Salamander  EGMQCSCGVD YYTLKPEVNN ESFVIYMFLV HFTIPLMIIF FCYGRLVCTV KEAAAQQQES
Snake       EGMQSSCGVD YYTPTPEVHN ESFVIYMFLV HFVIPLTVIF FCYGRLICTV KEAAAQQQES
Anole       EGMQCSCGVD YYTPTPEVHN ESFVIYMFLV HFVTPLTIIF FCYGRLVCTV KAAAAQQQES
Lizard      EGMQCSCGVD YYTPNPEVHN ESFVIYMFLV HFVTPLTIIF FCYGRLLCTV KAAAAQQQES
Alligator   EGMQCSCGVD YYTLKPEVNN ESFVIYMFVV HFAIPLAVIF FCYGRLVCTV KEAAAQQQES
Chicken     EGMQCSCGID YYTLKPEINN ESFVIYMFVV HFMIPLAVIF FCYGNLVCTV KEAAAQQQES
Platypus    EGMQCSCGID YYTLRPEVNN ESFVIYMFVV HFTIPMTIIF FCYGRLVFTV KEAAAQQQES
Echidna     EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMTIIF FCYGRLVFTV KEAAAQQQES
Opossum     EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMVVIF FCYGQLVFTV KEAAAQQQES
Dunnart     EGMQCSCGID YYTLNPEFNN ESFVIYMFVV HFIIPLTVIF FCYGQLVFTV KEAAAQQQES
Elephant    EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMTIIF FCYGQLVFTV KEAAAQQQES
Manatee     EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMIVIF FCYGQLVFTV KEAAAQQQES
Pig         EGLQCSCGID YYTLKPEVNN ESFVIYMFVV HFSIPLVIIF FCYGQLVFTV KEAAAQQQES
Cattle      EGMQCSCGID YYTPHEETNN ESFVIYMFVV HFIIPLIVIF FCYGQLVFTV KEAAAQQQES
Cat         EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMIVIF FCYGQLVFTV KEAAAQQQES
Bear        EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMIVIF FCYGQLVFTV KEAAAQQQES
Dog         EGMQCSCGID YYTLKPEINN ESFVIYMFVV HFAIPMIVIF FCYGQLVFTV KEAAAQQQES
Hamster     EGMQCSCGVD YYTLKPEVNN ESFVIYMFVV HFTIPLIVIF FCYGQLVFTV KEAAAQQQES
Guinea pig  EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMIIIF FCYGQLVFTV KEAAAQQQES
Rabbit      EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPLIIIF FCYGQLVFTV KEAAAQQQES
Galago      EGMQCSCGID YYTLKPEVNN ESFVIYMFVV HFFIPLFVIF FCYGQLVFTV KEAAAQQQES
Macaque     EGLQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMIVIF FCYGQLVFTV KEARAQQQES
Human       EGLQCSCGID YYTLKPEVNN ESFVIYMFVV HFTIPMIIIF FCYGQLVFTV KEAAAQQQES

Column      241        251        261        271        281        291
Coelacanth  ATTQKAEKEV TRMVIVMVIS FLVCWVPYAS VAAYIFFNQG SEFGPVFMTA PSFFAKSASF
Lungfish    ATTQKAEKEV TRMVYIMVIS YLVCWLPYAS VSFYIFTHQG SDFGPVFMTV PAFFAKTASV
Frog        ATTQKAEKEV TRMVIIMVIF FLICWVPYAY VAFYIFCNQG SEFGPIFMTV PAFFAKSSAI
Toad        ATTQKAEKEV TRMVIIMVVF FLICWVPYAS VAFFIFSNQG SEFGPIFMTV PAFFAKSSSI
Salamander  ATTQKAEKEV TRMVIIMVVA FLICWVPYAS VAFYIFSNQG TDFGPIFMTV PAFFAKSSAI
Snake       ATTQKAEKEV TRMVILMVIA FLICWVPYAS VAFYIFTHQG SDFGPVFMTI PSFFAKSSAI
Anole       ATTQKAEREV TRMVVIMVIS FLVCWVPYAS VAFYIFTHQG SDFGPVFMTI PAFFAKSSAI
Lizard      ATTQKAEREV TRMVILMVIS FLICWVPYAS VAFYIFTHQG SDFGPVFMTI PAFFAKSSAI
Alligator   ATTQKAEKEV TRMVIIMVVS FLICWVPYAS VAFYIFSNQG SDFGPVFMTI PAFFAKSSAI
Chicken     ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTNQG SDFGPIFMTI PAFFAKSSAI
Platypus    ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTV PAFFAKSSAI
Echidna     ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTA PAFFAKSSAI
Opossum     ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SNFGPILMTL PAFFAKTSAV
Dunnart     ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SDFGPIFMTL PAFFAKSSSI
Elephant    ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SDFGPILMTL PAFFAKSSAI
Manatee     ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTL PAFFAKSASI
Pig         ATTQKAEKEV TRMVIIMVVA FLICWLPYAS VAFYIFTHQG SDFGPIFMTI PAFFAKSASI
Cattle      ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SDFGPIFMTI PAFFAKTSAV
Cat         ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTL PAFFAKSSSI
Bear        ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SNFGPIFMTL PAFFAKSSSI
Dog         ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SDFGPIFMTL PAFFAKSSSI
Hamster     ATTQKAEKEV TRMVILMVVF FLICWFPYAG VAFYIFTHQG SNFGPIFMTL PAFFAKSSSI
Guinea pig  ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAAYIFTHQG SNFGPIFMTV PAFFAKSSSI
Rabbit      ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTI PAFFAKSSSI
Galago      ATTQKAEKEV TRMVIIMVIA FLICWLPYAG VAFYIFTHQG SNFGPIFMTL PAFFAKTASI
Macaque     ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTI PAFFAKSASI
Human       ATTQKAEKEV TRMVIIMVIA FLICWVPYAS VAFYIFTHQG SNFGPIFMTI PAFFAKSAAI

Column      301        311        321        331        341        351
Coelacanth  YNPVIYILLN KQFRNCMITT LCCGKNPFGD EDATSAAGSS KTEASSVSSS SVSPA
Lungfish    YNPVIYILMN KQFRNCMITT LCCGKNPFGD EETTSA-GTS KTEASSVSSS QVSPA
Frog        YNPVIYIMLN KQFRNCMITT LCCGKNPFGD DDASSAA-TS KTEATSVSTS QVSPA
Toad        YNPVIYIMLN KQFRNCMITT LCCGKNPFGE DDASSAA-TS KTEASSVSSS QVSPA
Salamander  YNPVIYIVLN KQFRNCMITT ICCGKNPFGD DETTSAA-TS KTEASSVSSS QVSPA
Snake       YNPVIYIVMN KQFRNCMLTT LCCGKNPLAE DDTSAG---T KTETSTVSTS QVSPA
Anole       YNPVIYILMN KQFRNCMIMT LCCGKNPLGD EETSAG---T KTETSTVSTS QVSPA
Lizard      YNPVIYILMN KQFRNCMIMT LCCGKNPLAE EDTSAG---T KTETSTVSTS QVSPA
Alligator   YNPVIYIVMN KQFRNCMITT LCCGKNPLGD DETATG---S KTETSSVSTS QVSPA
Chicken     YNPVIYIVMN KQFRNCMITT LCCGKNPLGD EDTSAG---- KTETSSVSTS QVSPA
Platypus    YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD DEASATA--S KTEQSSVSTS QVSPA
Echidna     YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD DEASATA--S KTEQSSVSTS QVSPA
Opossum     YNPVIYIMLN KQFRTCMLTT LCCGKIPLGD DEASATA--S KTETSQVA-- ---PA
Dunnart     YNPVIYIMMN KQFRNCMITT LCCGKNPLGD DEASTTA--S KTETSQVA-- ---PA
Elephant    YNPVIYIMMN KQFRNCMLTT ICCGKNPFGE EEGSTTA--S KTETSQVA-- ---PA
Manatee     YNPVIYIMMN KQFRNCMLTT ICCGKNPFAE EEGATTV--S KTETSQVA-- ---PA
Pig         YNPVIYIMMN KQFRNCMLTT LCCGKNPLGD DEASTTT--S KTETSQVA-- ---PA
Cattle      YNPVIYIMMN KQFRNCMVTT LCCGKNPLGD DEASTTV--S KTETSQVA-- ---PA
Cat         YNPVIYIMMN KQFRNCMLTT LCCGKNPLGD DEASTTG--S KTETSQVA-- ---PA
Bear        YNPVIYIMMN KQFRNCMITT LCCGKNPLGD DEASASA--? ---------- -----
Dog         YNPVIYIMMN KQFRNCMITT LCCGKNPLGD DEASASA--S KTETSQVA-- ---PA
Hamster     YNPVIYIMMN KQFRNCMLTT LCCGKNILGD DEASATA--S KTETSQVA-- ---PA
Guinea pig  YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD DEASTTV--S KTETSQVA-- ---PA
Rabbit      YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD DEASATA--S KTETSQVA-- ---PA
Galago      YNPVIYIMMN KQFRTCMITT LCCGKNPLGD DEASTTA--S KTETSQVA-- ---PA
Macaque     YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD DEASATV--S KTETSQVA-- ---PA
Human       YNPVIYIMMN KQFRNCMLTT ICCGKNPLGD DEASATV--S KTETSQVA-- ---PA

Figure 14. Tetrapod phylogeny used in this study, with coelacanth and lungfish as outgroups. Nodes for which ancestral rhodopsin sequences were reconstructed are indicated by a red star.

2.2.3. Selective constraint analyses

2.2.3.1. Introduction


The ratio ω = dN/dS of the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) measures natural selection acting at the protein level: ω < 1, ω = 1, and ω > 1 indicate purifying selection, neutral evolution (i.e. no selection), and positive selection, respectively (Kimura 1983). To investigate dN/dS ratios along different amniote branches, the CODEML program of PAML 4 was used (Yang 2007).
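As a rough illustration of what ω measures, the sketch below computes dN/dS for a pairwise comparison from counted differences and site numbers, applying a Jukes-Cantor correction to each proportion in the spirit of simple counting methods. This is a swapped-in approximation, not the maximum-likelihood estimation performed by CODEML; the function names and the example counts are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("proportion must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def omega(n_diffs: float, n_sites: float, s_diffs: float, s_sites: float) -> float:
    """Approximate omega = dN/dS from counted differences.

    n_diffs, s_diffs: observed nonsynonymous / synonymous differences
    n_sites, s_sites: numbers of nonsynonymous / synonymous sites
    """
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0.0:
        raise ValueError("dS is zero, so omega is undefined")
    return d_n / d_s


def interpret(w: float, tol: float = 1e-6) -> str:
    """Label an omega value; whether it differs significantly from 1 still needs an LRT."""
    if w < 1.0 - tol:
        return "purifying selection"
    if w > 1.0 + tol:
        return "positive selection"
    return "neutral evolution"


# Hypothetical counts for two aligned coding sequences.
w = omega(n_diffs=12, n_sites=750, s_diffs=30, s_sites=250)
print(round(w, 3), interpret(w))
```

Because most nonsynonymous changes in a conserved protein such as rhodopsin are expected to be removed by selection, values well below 1 are the typical outcome of such a comparison.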

To detect positive selection, two classes of codon models were used: branch models and branch-site models. Unlike nucleotide or amino acid substitution models, codon substitution models treat the codon triplet as the unit of evolution (Goldman and Yang 1994). They account for the transition/transversion rate bias, the ω ratio, and the equilibrium frequencies of codons (Goldman and Yang 1994).
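The three ingredients listed above enter the instantaneous rate of change from one codon to another. A minimal sketch of these rates in the commonly used simplified form with a single ω is given below: changes at more than one codon position have rate 0, the rate is proportional to the equilibrium frequency of the target codon (π_j), multiplied by the transition/transversion ratio κ for transitions and by ω for nonsynonymous changes. The function names are hypothetical, and the diagonal of the rate matrix and its scaling are omitted.

```python
# Standard genetic code (NCBI table 1), codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
PURINES = {"A", "G"}


def is_transition(x: str, y: str) -> bool:
    """A<->G and C<->T changes are transitions; all other changes are transversions."""
    return (x in PURINES) == (y in PURINES)


def gy94_rate(i: str, j: str, kappa: float, omega: float, pi_j: float) -> float:
    """Off-diagonal rate q_ij from codon i to a different codon j."""
    if CODE[i] == "*" or CODE[j] == "*":
        raise ValueError("stop codons are excluded from the codon state space")
    diffs = [(x, y) for x, y in zip(i, j) if x != y]
    if len(diffs) != 1:
        return 0.0  # multiple changes only occur via intermediate codons
    rate = pi_j
    if is_transition(*diffs[0]):
        rate *= kappa
    if CODE[i] != CODE[j]:
        rate *= omega  # nonsynonymous change
    return rate


# Hypothetical parameter values: TTT (Phe) -> TTC (Phe) is a synonymous transition.
print(gy94_rate("TTT", "TTC", kappa=2.0, omega=0.1, pi_j=1 / 61))
```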

Branch models allow some branches in a given phylogeny to have dN/dS ratios estimated separately from the rest of the tree and are useful for detecting positive selection acting on particular lineages (Yang 1998, Yang and Nielsen 1998). However, positive selection acts on individual sites rather than on whole branches, whereas branch models assume the same ω for all sites along a branch. If many sites change along a branch, their combined signal is strong, and positive selection is likely to be detected along that branch with branch models. If only a few sites are under selection, however, their signal may be masked by the many sites under purifying selection along the same branch. Branch-site models, which allow the ω ratio to vary both among branches and among sites, were therefore also implemented to detect positive selection acting on only a few sites (Yang and Nielsen 2002, Yang et al. 2005, Zhang et al. 2005).
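The masking effect can be illustrated with simple arithmetic. Under the rough assumption that the synonymous rate is the same at every site, a single branch-wide ω behaves like the proportion-weighted mean of the ω values of the individual site classes; the proportions and ω values below are hypothetical.

```python
def branch_average_omega(proportions, omegas):
    """Proportion-weighted mean of site-class omegas.

    Rough approximation: assumes the synonymous rate is equal at all sites,
    so one branch-wide dN/dS behaves like this weighted mean.
    """
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("site-class proportions must sum to 1")
    return sum(p * w for p, w in zip(proportions, omegas))


# Hypothetical branch: 95 % of codons under strong purifying selection (omega = 0.05)
# and 5 % under positive selection (omega = 3).
w_branch = branch_average_omega([0.95, 0.05], [0.05, 3.0])
print(w_branch)  # roughly 0.2, i.e. well below 1
```

Although some sites evolve with ω = 3, the branch-wide ratio stays far below 1, so a branch model alone would not indicate positive selection.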


Branch and branch-site models require a priori specification of the foreground branches, i.e. the branches of interest, which have their own ω ratio estimated (Yang 1998, Yang and Nielsen 1998). Background branches comprise all other branches in the phylogeny and share a single estimated ω ratio.
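In CODEML, foreground branches are specified by labelling them in the Newick tree file, e.g. with "#1" after the node whose ancestral branch is of interest. The sketch below adds such a label to one clade of a tree string; the helper name and the heavily reduced example tree are hypothetical and not the tree used in this study.

```python
def mark_foreground(newick: str, clade: str, label: str = "#1") -> str:
    """Attach a CODEML branch label to the branch leading to `clade`.

    `clade` is the exact Newick text of the subtree (or a leaf name) and must
    occur exactly once in the tree string, so the label cannot land on the
    wrong branch.
    """
    count = newick.count(clade)
    if count != 1:
        raise ValueError(f"clade text found {count} times; expected exactly once")
    return newick.replace(clade, f"{clade} {label}", 1)


# Hypothetical reduced tree; marking the clade of marsupials plus placentals
# labels the stem branch of Theria as foreground.
tree = "((Platypus,Echidna),((Opossum,Dunnart),(Human,Dog)));"
theria = "((Opossum,Dunnart),(Human,Dog))"
marked = mark_foreground(tree, theria)
print(marked)
```

Requiring a unique match is a deliberate choice: a plain string replacement on a leaf name that also occurs inside another name would otherwise label the wrong branch silently.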

Statistical significance is assessed by using likelihood ratio tests (LRTs) comparing nested statistical models (Yang 2007).

2.2.3.2. Likelihood ratio test

A likelihood ratio test can be applied to any tree for which the maximum likelihood can be computed (Navidi et al. 1991). For nested models, the alternative model, with p1 free parameters, fits the data at least as well as the null model, with p0 < p1 parameters; the fit of each model is judged by its maximized log-likelihood score (l1 and l0, respectively) (Chang et al. 2002b, Yang 2007). If the null model is true, twice the log-likelihood difference, 2Δl = 2(l1 - l0), approximately follows a χ2 distribution with degrees of freedom (d.f.) equal to the difference in the number of parameters between the two models (p1 - p0) (Chang et al. 2002b, Yang 2007). The test statistic 2Δl is therefore compared with this χ2 distribution to decide whether the null model is rejected in favor of the alternative model (Chang et al. 2002b, Yang 2007).


$2\Delta l = 2(l_1 - l_0) \sim \chi^2_{p_1 - p_0}$

In general, positive selection is inferred only if ω is estimated to be greater than 1 under the alternative model and the LRT is significant.
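The test and the two-part decision rule can be sketched as follows. To stay within the Python standard library, the χ2 upper-tail probability is computed from its closed forms for integer degrees of freedom; a statistics library function such as scipy.stats.chi2.sf would serve the same purpose. The function names and log-likelihood values are hypothetical.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0.0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: P = exp(-y) * sum_{k=0}^{df/2 - 1} y^k / k!
        term = total = 1.0
        for k in range(1, df // 2):
            term *= y / k
            total += term
        return math.exp(-y) * total
    # Odd df: start from df = 1 (erfc) and add one term per two extra degrees of freedom.
    total = math.erfc(math.sqrt(y))
    a = 0.5
    for _ in range((df - 1) // 2):
        total += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return total


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta l, p-value) for two nested models."""
    # A slightly negative statistic can arise from incomplete optimisation; treat it as 0.
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return stat, chi2_sf(stat, df)


def positive_selection_detected(lnl_null, lnl_alt, df, omega_alt, alpha=0.05):
    """Both criteria: omega > 1 under the alternative model and a significant LRT."""
    _, p_value = likelihood_ratio_test(lnl_null, lnl_alt, df)
    return omega_alt > 1.0 and p_value < alpha


# Hypothetical log-likelihoods of a null and an alternative model differing by one parameter.
stat, p_value = likelihood_ratio_test(lnl_null=-5321.4, lnl_alt=-5318.2, df=1)
print(stat, p_value)
```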

2.2.3.3. Branch models

To assess the selective constraint acting on branches of interest, two-ratio models were used in this study (Yang 1998). Because there are no standard names for these models, the names used here were assigned for this study.


One alternative model (MB2a) was compared with two null models (MB1n, also known as M0, and MB2n) (Tab. 5). The alternative model MB2a estimates separate background (ω0) and foreground (ω1) ratios. In the first null model MB1n, the foreground branch is constrained to have the same dN/dS ratio as the background branches (ω0 = ω1). Comparing MB1n with the alternative model tests, via the LRT, whether the ω of the pre-specified foreground branch differs significantly from the background ω. If ω1 is estimated to be greater than 1 while ω0 is smaller than 1, this indicates either relaxed purifying selection or positive selection acting on the foreground branch (Yang 1998).

The second null model MB2n estimates the background ratio, while the foreground ratio is fixed at 1. Comparing this model with the alternative model indicates, via the LRT, whether ω1 differs significantly from 1, i.e. whether it is greater or smaller than 1. Only if ω1 differs significantly from 1 and is estimated to be greater than 1 does this indicate positive selection (Yang 1998).

Table 5. Parameters of branch models used in this study.

Model                    Background ω0   Foreground ω1
Alternative model MB2a   ω0              ω1
First null model MB1n    ω0              ω1 = ω0
Second null model MB2n   ω0              ω1 = 1
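The three branch models correspond to standard CODEML control-file options: model = 0 gives a single ratio for all branches (MB1n), model = 2 estimates separate ratios for labelled and unlabelled branches (MB2a), and with model = 2, fix_omega = 1 fixes the ratio of the labelled (#1) branches at the value given by omega (MB2n). The sketch below writes one control file per model. The file names are hypothetical, and CodonFreq = 2 (F3x4) is an assumption, because the codon-frequency model is not stated in the text.

```python
# Hypothetical file names; CodonFreq = 2 (F3x4) is an assumed setting.
COMMON = {
    "seqfile": "rhodopsin.phy",
    "treefile": "tetrapods_marked.tre",
    "seqtype": 1,    # codon sequences
    "runmode": 0,    # user-supplied tree
    "CodonFreq": 2,
    "NSsites": 0,    # omega does not vary among sites
}

BRANCH_MODELS = {
    "MB1n": {"model": 0, "fix_omega": 0, "omega": 0.5},  # one ratio (M0): omega1 = omega0
    "MB2a": {"model": 2, "fix_omega": 0, "omega": 0.5},  # two ratios, both estimated
    "MB2n": {"model": 2, "fix_omega": 1, "omega": 1},    # two ratios, foreground fixed at 1
}

# (null model, alternative model, degrees of freedom); each alternative adds one parameter.
COMPARISONS = [("MB1n", "MB2a", 1), ("MB2n", "MB2a", 1)]


def control_file(name: str) -> str:
    """Render a codeml.ctl-style text for one of the branch models."""
    settings = {**COMMON, **BRANCH_MODELS[name], "outfile": f"{name}.out"}
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


print(control_file("MB2n"))
```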

2.2.3.4. Branch-site models


Branch-site models assume that the ω ratio varies among codon sites and that the sequence contains four site classes, each with its own ω ratio (Yang and Nielsen 2002, Zhang et al. 2005). Here, branch-site model A was used (Zhang et al. 2005). Unlike the branch model names above, the names of these models are standard.

Again, there is one alternative model (MA) and two null models (M1a and MA1) (Tab. 6). In the alternative model MA, ω0 of site class 0 is free to vary but restricted to be smaller than 1, which represents purifying selection. In site class 1, ω1 is fixed at 1, which represents neutral evolution. In site classes 2a and 2b, ω2a and ω2b are constrained to be greater than or equal to 1 on the foreground branches; on the background branches, sites in these two classes evolve with ω0 and ω1, respectively.


In the first null model M1a, there are only two site classes: ω0 is free to vary but restricted to be smaller than 1, and ω1 is fixed at 1. Site classes 2a and 2b are not included. Comparison with the alternative model shows whether some sites have elevated ω ratios on the foreground branches; such an elevation may reflect either relaxed purifying selection or positive selection.

The second null model MA1 has four site classes. ω0 is free to vary but restricted to be smaller than 1, which represents purifying selection. ω1 is fixed at 1 and represents neutral evolution. ω2a and ω2b are also fixed at 1. Comparing MA1 with MA thus tests whether the sites in site classes 2a and 2b of the MA model have ω ratios significantly greater than 1, i.e. whether positive selection is acting.

If the likelihood ratio test suggests that some sites (i.e. codons) are under positive selection according to the branch-site model, the Bayes Empirical Bayes (BEB) method is used to calculate the posterior probability that each site belongs to a particular site class (Yang et al. 2005, Yang 2007). Sites with high posterior probabilities of belonging to a class with ω > 1 are likely to be under positive selection (Yang et al. 2005). Here, sites with posterior probabilities greater than 0.95 were considered reliable.
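The core of this step is Bayes' rule: the posterior probability of a site class is its proportion times the likelihood of the site's data under that class, divided by the sum over all classes. The sketch below shows only this simpler calculation with parameters fixed at their estimates (naive empirical Bayes); BEB additionally accounts for uncertainty in the parameter estimates (Yang et al. 2005), so it is not CODEML's BEB procedure. All proportions and per-site likelihoods are invented inputs.

```python
def site_class_posteriors(proportions, site_likelihoods):
    """Bayes' rule: P(class k | site data) = p_k f(x | k) / sum_j p_j f(x | j).

    Parameters are treated as fixed (naive empirical Bayes); BEB would also
    integrate over their uncertainty.
    """
    joint = [p * f for p, f in zip(proportions, site_likelihoods)]
    total = sum(joint)
    return [value / total for value in joint]


def selected_sites(proportions, likelihoods_by_site, positive_classes=(2, 3), cutoff=0.95):
    """1-based sites whose posterior probability for the omega > 1 classes exceeds cutoff.

    Class order assumed here: 0, 1, 2a, 2b (indices 0 to 3).
    """
    hits = []
    for site, lik in enumerate(likelihoods_by_site, start=1):
        post = site_class_posteriors(proportions, lik)
        prob = sum(post[k] for k in positive_classes)
        if prob > cutoff:
            hits.append((site, prob))
    return hits


# Hypothetical estimates for classes 0, 1, 2a, 2b and likelihoods for three sites.
proportions = [0.70, 0.20, 0.05, 0.05]
likelihoods = [
    [1e-5, 1e-5, 1e-3, 1e-3],  # elevated, but posterior below the cutoff
    [1e-6, 1e-6, 1e-3, 1e-3],  # strongly favours the omega > 1 classes
    [1e-3, 1e-3, 1e-6, 1e-6],  # conserved site
]
print(selected_sites(proportions, likelihoods))
```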


Table 6. Parameters of branch-site models used in this study.

Model                    Site class   ω
Alternative model MA     0            0 < ω0 < 1
                         1            ω1 = 1
                         2a           ω2a ≥ 1
                         2b           ω2b ≥ 1
First null model M1a     0            0 < ω0 < 1
                         1            ω1 = 1
Second null model MA1    0            0 < ω0 < 1
                         1            ω1 = 1
                         2a           ω2a = 1
                         2b           ω2b = 1
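In CODEML, branch-site model A is usually specified with model = 2 and NSsites = 2 (fix_omega = 0 for MA; fix_omega = 1 and omega = 1 for its null MA1), and site model M1a with model = 0 and NSsites = 1. The sketch below records these settings together with the two comparisons and their degrees of freedom (two extra parameters for MA versus M1a, one for MA versus MA1). The file names are hypothetical.

```python
# Hypothetical file names; model settings follow the usual CODEML conventions.
COMMON = {"seqfile": "rhodopsin.phy", "treefile": "tetrapods_marked.tre", "seqtype": 1, "runmode": 0}

BRANCH_SITE_MODELS = {
    "MA":  {"model": 2, "NSsites": 2, "fix_omega": 0, "omega": 1.5},  # foreground omega2 >= 1 estimated
    "MA1": {"model": 2, "NSsites": 2, "fix_omega": 1, "omega": 1},    # foreground omega2 fixed at 1
    "M1a": {"model": 0, "NSsites": 1, "fix_omega": 0, "omega": 0.5},  # site classes 0 and 1 only
}

# (null model, alternative model, degrees of freedom)
TESTS = [("M1a", "MA", 2), ("MA1", "MA", 1)]

# 5 % critical values of the chi-square distribution.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991}


def control_file(name: str) -> str:
    """Render a codeml.ctl-style text for one of the branch-site models."""
    settings = {**COMMON, **BRANCH_SITE_MODELS[name], "outfile": f"{name}.out"}
    return "\n".join(f"{key} = {value}" for key, value in settings.items())


def significant(lnl_null: float, lnl_alt: float, df: int) -> bool:
    """Compare 2 * delta l with the 5 % chi-square critical value."""
    return 2.0 * (lnl_alt - lnl_null) > CRITICAL_5_PERCENT[df]


print(control_file("MA1"))
```

For the MA versus MA1 comparison, the null distribution is a 50:50 mixture of a point mass at zero and χ2 with one degree of freedom; using the χ2 critical value with one degree of freedom, as recommended by Zhang et al. (2005), makes the test conservative.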

2.2.4. Ancestral sequence reconstruction

Using information from present-day sequences, the nucleotide and amino acid sequences of extinct ancestors can be reconstructed in PAML 4 (Yang 2007). The likelihood approach uses branch lengths and the substitution pattern for ancestral reconstruction (Yang et al. 1995). It starts with an alignment of extant sequences, a phylogeny relating those sequences, and a statistical model of evolution, and calculates the likelihood of each possible ancestral state given the alignment, tree, and model (Smith et al. 2010). The maximum likelihood ancestral state is the state with the highest likelihood (Smith et al. 2010).
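A minimal sketch of this calculation for a single alignment column is shown below. It uses Felsenstein's pruning algorithm to obtain the likelihood of the observed leaves given each state at the root and then picks the state with the highest posterior. As simplifying assumptions, it uses the Jukes-Cantor nucleotide model (not the codon or amino acid models of this study), equal base frequencies, a hypothetical three-taxon tree, and it reconstructs the root only; other internal nodes require re-rooting or a second tree traversal.

```python
import math

STATES = "ACGT"


def jc69_prob(i: str, j: str, t: float) -> float:
    """Jukes-Cantor probability that state i becomes state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihoods(node):
    """Felsenstein pruning: L(s) = P(observed leaves below node | node in state s).

    A node is either an observed leaf state (one-letter string) or a list of
    (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return {s: 1.0 if s == node else 0.0 for s in STATES}
    lik = {s: 1.0 for s in STATES}
    for child, t in node:
        child_lik = partial_likelihoods(child)
        for s in STATES:
            lik[s] *= sum(jc69_prob(s, c, t) * child_lik[c] for c in STATES)
    return lik


def reconstruct_root(tree):
    """Marginal reconstruction at the root, assuming equal base frequencies of 1/4."""
    lik = partial_likelihoods(tree)
    weighted = {s: 0.25 * lik[s] for s in STATES}
    total = sum(weighted.values())
    posterior = {s: w / total for s, w in weighted.items()}
    return max(posterior, key=posterior.get), posterior


# Hypothetical column on the tree ((A, B)X, C)R: leaves A and B carry G, leaf C carries A.
tree = [([("G", 0.1), ("G", 0.1)], 0.05), ("A", 0.2)]
best, posterior = reconstruct_root(tree)
print(best, posterior)
```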

There are two closely related approaches: marginal and joint reconstruction (Yang 2007). The marginal approach assigns a character state to a single node in the tree, summing over all possible states at the other nodes (Koshi and Goldstein 1996, Yang et al. 2005), whereas joint reconstruction assigns a set of character states to all ancestral nodes in the tree simultaneously (Pupko et al. 2000). The marginal approach is more suitable, and often used, when the gene or protein sequence of a particular extinct ancestor is sought (Chang et al. 2002a, Thornton 2004). It is also the default setting in PAML 4 and was therefore used in this study.
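To make the difference concrete, the sketch below enumerates every combination of states for the two ancestral nodes R and X of a hypothetical tree ((A, B)X, C)R under the Jukes-Cantor model (a swapped-in nucleotide model, not one of the models used in this study). Joint reconstruction selects the best pair of states for R and X together, while marginal reconstruction at X sums over all states of R. Brute-force enumeration is only feasible for tiny trees; Pupko et al. (2000) describe an efficient algorithm for the joint case.

```python
import itertools
import math

STATES = "ACGT"


def jc69_prob(i, j, t):
    """Jukes-Cantor probability that state i becomes state j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def joint_probability(r, x, leaves, t):
    """P(R = r, X = x, observed leaves) on ((A, B)X, C)R with root frequencies of 1/4."""
    a, b, c = leaves
    t_a, t_b, t_x, t_c = t  # t_x is the branch from R to X
    return (0.25 * jc69_prob(r, x, t_x) * jc69_prob(x, a, t_a)
            * jc69_prob(x, b, t_b) * jc69_prob(r, c, t_c))


def joint_reconstruction(leaves, t):
    """Best combination of states (R, X) for all ancestral nodes considered together."""
    return max(itertools.product(STATES, repeat=2),
               key=lambda rx: joint_probability(rx[0], rx[1], leaves, t))


def marginal_reconstruction_x(leaves, t):
    """Best state at node X alone, summing over every possible root state."""
    scores = {x: sum(joint_probability(r, x, leaves, t) for r in STATES) for x in STATES}
    return max(scores, key=scores.get)


# Hypothetical column (leaves A, B, C) and branch lengths (t_A, t_B, t_X, t_C).
leaves = ("G", "G", "A")
t = (0.1, 0.1, 0.05, 0.2)
print(joint_reconstruction(leaves, t), marginal_reconstruction_x(leaves, t))
```

On this clear-cut column both approaches agree; they can disagree when several states at a node have similar support.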


Ancestral reconstruction can be conducted under nucleotide, amino acid, and codon-based models (Yang 2007). Here, different codon and amino acid models were first compared with each other to ascertain the consistency of their reconstructions. Among the codon models, branch-site model MA, with Theria marked as foreground, and site model M3 were found to be the most consistent; among the amino acid models, it was JTT with gamma-distributed rates (JTT+gamma). Generally, site model M3 fits most data better than branch and branch-site models (Yang 2007). First, the MA and M3 reconstructions were compared with each other. At every site where their amino acids differed, the JTT+gamma reconstruction agreed with that of M3. Moreover, none of the differing sites was a BEB site, except for site 218, which has a low posterior probability anyway. Finally, model M3 has a much higher likelihood and fewer parameters than model MA.
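The consistency check described above amounts to a site-by-site comparison of reconstructed ancestral sequences. A minimal sketch, assuming the reconstructions for one node are available as aligned amino acid strings; the function names, sequences, and BEB site list are hypothetical and not the data of this study.

```python
def differing_sites(seq_a, seq_b):
    """1-based alignment positions at which two reconstructed sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("reconstructions must have the same aligned length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


def compare_reconstructions(recon_1, recon_2, recon_check, beb_sites):
    """Summarise where two reconstructions differ and how a third one resolves it.

    recon_1, recon_2: e.g. the MA and M3 reconstructions for one node
    recon_check: e.g. the JTT+gamma reconstruction for the same node
    beb_sites: 1-based positions identified by the BEB analysis
    """
    diffs = differing_sites(recon_1, recon_2)
    return {
        "differing_sites": diffs,
        "check_agrees_with_2": [i for i in diffs if recon_check[i - 1] == recon_2[i - 1]],
        "differing_beb_sites": sorted(set(diffs) & set(beb_sites)),
    }


# Hypothetical short reconstructions, not the study's ancestral rhodopsins.
summary = compare_reconstructions("MNGTEGPNFY", "MNGTDGPNFY", "MNGTDGPNFY", beb_sites=[5, 9])
print(summary)
```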


© The content compilation and layout of this publication, as well as its electronic processing, are protected by copyright. Any use not expressly permitted by copyright law requires prior consent. This applies in particular to reproduction, editing, and storage and processing in electronic systems.
DiML DTD Version 4.0
Certified document server of the Humboldt-Universität zu Berlin
HTML version created on:
28.09.2011