##################################################
# Silke Werth's R Code for haploid data analysis #
##################################################

Silke Werth
Swiss Federal Research Institute WSL
Zürcherstr. 111
CH-8903 Birmensdorf

This folder contains a few files with functions that may be useful for
analysing genetic distance and genetic diversity of haploid organisms.


"genetic_dist_source_v3.R"
This file contains a function written by Silke Werth that calculates several
genetic distances and related measures. The function returns the following
components:
(1) $Fst.std, Latter's Fst, standardised (Takezaki & Nei 1996)
    (contains a bug: values > 1 appear)
(2) $Fst, Latter's Fst (Takezaki & Nei 1996)
(3) $D.Nei.Std, Nei's standard genetic distance (Takezaki & Nei 1996)
    (my formula contains a bug!)
(4) $D.Nei.Min, Nei's minimum genetic distance (Takezaki & Nei 1996)
(5) $D.Nei.Da, Nei's Da distance (Takezaki & Nei 1996)
(6) $GeneDiversity, Nei's (1972) gene diversity
    (this is _not_ identical to Nei's (1978) unbiased gene diversity!
    Use the unbiased estimate instead, which is in the other file.)
(7) $J, average within-population (Jx) and between-population (Jxy)
    homozygosity


"unbiased.gene.diversity.Nei1978_source-v2.R"
This file contains a function written by Silke Werth that calculates Nei's
(1978) unbiased gene diversity H.


"Code_resampling.R"
This is a function written by Helene H. Wagner, WSL. It randomly draws one
individual per tree, so that gene diversity and related measures can then be
calculated without bias from an unbalanced sampling design at tree level.
If you replace "TreeID" with another variable, e.g. population ("Pop"), it
will draw one individual per population.
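
Sketch (not part of the original files): the resampling step and the
subsequent diversity calculation could look roughly like the code below.
The function names draw_one_per_group() and unbiased_h() are hypothetical
and are not the functions shipped in "Code_resampling.R" or
"unbiased.gene.diversity.Nei1978_source-v2.R". The toy data frame only
borrows column names from "2pop.txt"; its values are invented. The sketch
also assumes that, for haploid data, the small-sample correction is
n/(n-1), with n the number of sampled individuals.

```r
# Hypothetical helper: keep one randomly chosen row per level of `group`.
draw_one_per_group <- function(data, group = "TreeID") {
  # Row indices belonging to each group; unused factor levels are dropped.
  rows_by_group <- split(seq_len(nrow(data)), data[[group]], drop = TRUE)
  picked <- vapply(rows_by_group, function(idx) {
    # Index into idx explicitly: sample(idx, 1) would draw from 1:idx
    # when a group holds a single row.
    idx[sample.int(length(idx), 1)]
  }, integer(1))
  data[sort(unname(picked)), , drop = FALSE]
}

# Hypothetical helper: unbiased gene diversity at one locus, assuming
# haploid data, i.e. H = n / (n - 1) * (1 - sum(p_i^2)).
unbiased_h <- function(alleles) {
  alleles <- alleles[!is.na(alleles)]
  n <- length(alleles)
  if (n < 2) return(NA_real_)          # undefined for fewer than two samples
  p <- as.numeric(table(alleles)) / n  # allele frequencies
  n / (n - 1) * (1 - sum(p^2))
}

# Toy data (invented values; column names follow "2pop.txt").
toy <- data.frame(
  Sample = paste0("S", 1:6),
  Pop    = rep(1, 6),
  TreeID = c("A", "A", "B", "C", "C", "C"),
  LPu03  = c(10, 10, 12, 11, 11, 14)
)

set.seed(1)                               # only to make the draw repeatable
one_per_tree <- draw_one_per_group(toy, group = "TreeID")
h_resampled  <- unbiased_h(one_per_tree$LPu03)
```

Repeating the draw many times and averaging H over the draws gives an
estimate that does not depend on a single random subsample. Passing
group = "Pop" instead draws one individual per population.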
"2pop.txt" This is a data set containing number of repeat units for 6 microsatellite loci in Lobaria pulmonaria from the Alps in southern Germany (LPu03, LPu09, LPu15, LPu16a, LPu20a, LPu27a), plus the allelic state of an ITS-indel ("Insert"), plus tree coordinates (x,y), plus sample name (Sample), plus population affiliation (Pop), plus a variable indicating the tree affiliation (TreeID). This file may help you to get used to the abovementioned functions. "Code_unbiased_GeneDiversityH_Nei1978_v.2.R" This R-Code shows you how you may use the function in "unbiased.gene.diversity.Nei1978_source-v2.R" for calculating Nei's unbiased gene diversity with or without resampling. "Linkage_index of association.R" This R-Code allows you to calculate chi2-tests (Sokal & Rohlf 2001), testing for the significance of pairwise association of loci in a population. As an index of association, Cramér's V2 (Agresti 1984), and the Tschuproff contingency coefficient T (Legendre & Legendre 1998) are given. "Code_C.M.G.R" This R-Code allows you to calculate C, the minimum number of colonisation events to found a population (=number of allels at the most variable locus, Walser et al. 2003). Furthermore, you may calculate G, the number of multilocus genotypes per population and M, the percentage of multilocus genotypes (Walser et al. 2004). "diploidise.it_source_v1.R" This R-Code allows you to diploidise haploid data from an object containing haploid genotypes at one or several loci, where each column is a locus and each row contains an individual. Diploidised genotype data may be used in population genetic approaches based on allele frequency, where they give the same result as haploid data would have given. Also calculations based on maximum likelihood estimators are no problem. But with Bayesian estimators (like in Geneland, Structure, BAPS) the diploidiosed dataset will give a trade-off between the haploid Bayesian estimator and the haploid maximum likelihood estimator. 
As the dataset is diploidised (n*2), the uncertainty of parameters will be
underestimated in Bayesian analyses.

#Example calculation:

a) produce genotypes for two individuals by the following code:

    Genotypes <- matrix(NA, 2, 3)
    dimnames(Genotypes) <- list(c(1:2), c("Locus1", "Locus2", "Locus3"))
    Genotypes[1, ] <- c(12, 13, 1)
    Genotypes[2, ] <- c(14, 13, 2)

b) load the source code, then run the function:

    #select the pathname of the source code (adjust it to your system)
    source("O://Genetik//MA//Silke//RCode//DiploidiseIt//diploidise.it_source_v1.R")

    #execute the function, and retrieve the diploidised genotypes
    diploidised.genotypes <- diploidise.it(Genotypes)
    diploidised.genotypes


References:

Agresti, A. (1984) Analysis of ordinal categorical data. Wiley, New York,
    p. 23-24.
Legendre & Legendre (1998) Numerical ecology. Elsevier, Amsterdam, p. 221.
Nei, M. (1972) American Naturalist 106: 283-292.
Nei, M. (1978) Genetics 89: 583-590.
Sokal & Rohlf (2001) Biometry. Freeman, New York.
Takezaki & Nei (1996) Genetics 144: 389-399.
Walser, J. C., Sperisen, C., Soliva, M. & Scheidegger, C. (2003) Fungal
    Genetics and Biology 40: 72-82.
Walser, J. C., Gugerli, F., Holderegger, R., Kuonen, D. & Scheidegger, C.
    (2004) Heredity 93: 322-329.


#####################################################################################
PS From June 1st 2005 onwards, please direct questions to swerth"at"gmx.de
#####################################################################################