##################################################
# Silke Werth's R Code for haploid data analysis #
##################################################
Silke Werth
Swiss Federal Research Institute WSL
Zürcherstr. 111
CH-8903 Birmensdorf

This folder contains a few files with functions which may be 
useful for analysing genetic distance and genetic diversity of 
haploid organisms.  

"genetic_dist_source_v3.R"
This file contains a function written by Silke Werth that calculates several genetic 
distances and related measures:
(1) $Fst.std, Latter's Fst, standardised (Takezaki & Nei 1996) (contains a bug: values > 1 appear)
(2) $Fst, Latter's Fst (Takezaki & Nei 1996)
(3) $D.Nei.Std, Nei's standard genetic distance (Takezaki & Nei 1996) (my formula contains a bug!)
(4) $D.Nei.Min, Nei's minimum genetic distance (Takezaki & Nei 1996)
(5) $D.Nei.Da , Nei's Da distance (Takezaki & Nei 1996)
(6) $GeneDiversity, Nei's (1972) gene diversity (this is _not_ identical to 
Nei's (1978) unbiased gene diversity! Use the unbiased estimate instead, which is in 
"unbiased.gene.diversity.Nei1978_source-v2.R".)
(7) $J, average within-population (Jx) and between-population (Jxy) homozygosity
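
To make the measures listed above more concrete, here is a minimal sketch of three of them, using the population-level formulas of Takezaki & Nei (1996) without any sample-size correction. It is not the code in "genetic_dist_source_v3.R". The function name nei_distances_sketch and its input format (two lists of allele-frequency vectors, one vector per locus, alleles in the same order in both populations) are assumptions made for illustration.

```r
# Hypothetical helper, not part of the files in this folder.
# x, y: lists with one allele-frequency vector per locus; within each locus
# the alleles must be in the same order in both populations.
nei_distances_sketch <- function(x, y) {
  stopifnot(length(x) == length(y))
  # average within-population homozygosity (Jx, Jy) over loci
  Jx <- mean(vapply(x, function(p) sum(p^2), numeric(1)))
  Jy <- mean(vapply(y, function(q) sum(q^2), numeric(1)))
  # average between-population homozygosity (Jxy) over loci
  Jxy <- mean(mapply(function(p, q) sum(p * q), x, y))
  list(
    D.Nei.Std = -log(Jxy / sqrt(Jx * Jy)),   # standard genetic distance
    D.Nei.Min = (Jx + Jy) / 2 - Jxy,         # minimum genetic distance
    D.Nei.Da  = 1 - mean(mapply(function(p, q) sum(sqrt(p * q)), x, y))
  )
}

# one locus with two alleles
nei_distances_sketch(list(c(0.5, 0.5)), list(c(1, 0)))
```

The element names mirror those listed above only for readability; the values need not agree with the output of the original function, which is stated to contain bugs for some measures.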

"unbiased.gene.diversity.Nei1978_source-v2.R"
This file contains a function written by Silke Werth which calculates Nei's (1978) 
unbiased gene diversity H.
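
As an illustration of the estimator, the following sketch computes unbiased gene diversity for a single haploid locus as H = n/(n-1) * (1 - sum(p_i^2)), where n is the number of individuals and p_i are the allele frequencies. This haploid form of Nei's (1978) correction is an assumption of the sketch, as is the function name unbiased_H_sketch; the function in the file above may differ in interface and details.

```r
# Hypothetical helper, not the function from the source file.
# alleles: vector with the allelic state of each haploid individual at one locus
unbiased_H_sketch <- function(alleles) {
  alleles <- alleles[!is.na(alleles)]     # drop missing data
  n <- length(alleles)
  if (n < 2) return(NA_real_)             # correction undefined for n < 2
  p <- as.vector(table(alleles)) / n      # allele frequencies
  n / (n - 1) * (1 - sum(p^2))
}

# four individuals carrying three different alleles
unbiased_H_sketch(c(12, 12, 13, 14))
```

For several loci stored as columns of a matrix, a multilocus value could be obtained by averaging the per-locus estimates, e.g. mean(apply(Genotypes, 2, unbiased_H_sketch)).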

"Code_resampling.R"
This file contains a function written by Helene H. Wagner, WSL. It randomly draws one 
individual per tree, so that gene diversity and related measures can subsequently be 
calculated in a way that copes with an unbalanced sampling design at tree level.
If you replace "TreeID" with another variable, e.g. population ("Pop"), it will draw one 
individual per population.
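
The resampling idea can be sketched as follows. This is not Helene Wagner's code; the function name draw_one_per_group and its arguments are assumptions made for illustration. The sketch expects a data frame with a grouping column such as "TreeID" or "Pop".

```r
# Hypothetical helper illustrating the idea, not the code in "Code_resampling.R".
# data:  data frame with one row per individual
# group: name of the column that defines the groups (e.g. "TreeID" or "Pop")
draw_one_per_group <- function(data, group = "TreeID") {
  # row indices of the individuals belonging to each group
  rows_by_group <- split(seq_len(nrow(data)), data[[group]], drop = TRUE)
  # pick one row index per group; sample.int avoids the pitfall of
  # sample(x, 1) when a group holds a single individual
  picked <- vapply(rows_by_group,
                   function(idx) idx[sample.int(length(idx), 1)],
                   integer(1))
  data[picked, , drop = FALSE]
}

# toy data: three trees with 2, 1 and 3 individuals
toy <- data.frame(Sample = paste0("s", 1:6),
                  TreeID = c("A", "A", "B", "C", "C", "C"))
draw_one_per_group(toy, "TreeID")
```

Repeating the draw many times and recalculating the statistic each time gives a distribution of values that does not depend on how many individuals were collected per tree.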

"2pop.txt"
This data set contains the number of repeat units for 6 microsatellite loci in Lobaria
pulmonaria from the Alps in southern Germany (LPu03, LPu09, LPu15, LPu16a, LPu20a, LPu27a), plus the 
allelic state of an ITS indel ("Insert"), tree coordinates (x,y), sample name (Sample), 
population affiliation (Pop), and a variable indicating tree affiliation (TreeID).
This file may help you to get used to the above-mentioned functions.

"Code_unbiased_GeneDiversityH_Nei1978_v.2.R"
This R-Code shows you how you may use the function in "unbiased.gene.diversity.Nei1978_source-v2.R"
for calculating Nei's unbiased gene diversity with or without resampling. 

"Linkage_index of association.R"
This R-Code allows you to calculate chi2-tests (Sokal & Rohlf 2001), testing for the significance of 
pairwise association of loci in a population. As indices of association, Cramér's V2 (Agresti 1984) 
and the Tschuproff contingency coefficient T (Legendre & Legendre 1998) are given.
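
For a single pair of loci, the calculation can be sketched with base R as shown below. The function name locus_association_sketch is an assumption and this is not the code in the file above. The sketch uses V2 = chi2 / (n * min(r-1, c-1)) and T = sqrt(chi2 / (n * sqrt((r-1)*(c-1)))), with r and c the numbers of alleles at the two loci, and it assumes that both loci are polymorphic and that there are no missing data.

```r
# Hypothetical helper, not the code in "Linkage_index of association.R".
# locus1, locus2: allelic states of the same individuals at two loci
locus_association_sketch <- function(locus1, locus2) {
  tab <- table(locus1, locus2)             # contingency table of allele pairs
  # chi2 test without continuity correction; warnings about small
  # expected counts are suppressed here and should be checked in real use
  test <- suppressWarnings(chisq.test(tab, correct = FALSE))
  chi2 <- unname(test$statistic)
  n <- sum(tab)
  r <- nrow(tab)
  k <- ncol(tab)
  list(chi2 = chi2,
       df = unname(test$parameter),
       p.value = test$p.value,
       V2 = chi2 / (n * min(r - 1, k - 1)),              # Cramér's V2
       T = sqrt(chi2 / (n * sqrt((r - 1) * (k - 1)))))   # Tschuproff T
}

locus_association_sketch(c(12, 12, 14, 14, 12, 14), c(1, 1, 2, 2, 2, 1))
```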

"Code_C.M.G.R"
This R-Code allows you to calculate C, the minimum number of colonisation events required to found 
a population (= number of alleles at the most variable locus, Walser et al. 2003). Furthermore, you 
may calculate G, the number of multilocus genotypes per population, and M, the percentage of 
multilocus genotypes (Walser et al. 2004).
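
A minimal sketch of the three statistics for one population follows. It is not the code in the file above; the function name cmg_sketch is an assumption, as is the reading of M as 100 * G / N (N = number of individuals). The sketch also assumes no missing data.

```r
# Hypothetical helper, not the code in "Code_C.M.G.R".
# genotypes: matrix or data frame for ONE population,
#            rows = individuals, columns = loci
cmg_sketch <- function(genotypes) {
  genotypes <- as.data.frame(genotypes)
  # C: number of alleles at the most variable locus
  C <- max(vapply(genotypes, function(locus) length(unique(locus)), integer(1)))
  # G: number of distinct multilocus genotypes
  G <- nrow(unique(genotypes))
  # M: multilocus genotypes as a percentage of individuals (assumed definition)
  M <- 100 * G / nrow(genotypes)
  c(C = C, G = G, M = M)
}

pop <- rbind(c(12, 13, 1),
             c(14, 13, 2),
             c(12, 13, 1))
cmg_sketch(pop)
```

To obtain values per population, the function could be applied to each subset of the data, e.g. after splitting a data frame by "Pop".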

"diploidise.it_source_v1.R"
This R-Code allows you to diploidise haploid data from an object containing haploid genotypes at 
one or several loci, where each column is a locus and each row is an individual.
Diploidised genotype data may be used in population genetic approaches based on allele frequency, where 
they give the same result as the haploid data would have given. Calculations based on maximum likelihood 
estimators are not a problem either. With Bayesian estimators (as in Geneland, Structure, BAPS), however, 
the diploidised dataset will give a trade-off between the haploid Bayesian estimator and the haploid 
maximum likelihood estimator. Because diploidising doubles the dataset (n*2), Bayesian analyses will 
underestimate the uncertainty of parameters.
#Example calculation:
#a) produce genotypes for two individuals at three loci
Genotypes <- matrix(NA, nrow = 2, ncol = 3)
dimnames(Genotypes) <- list(1:2, c("Locus1", "Locus2", "Locus3"))
Genotypes[1, ] <- c(12, 13, 1)
Genotypes[2, ] <- c(14, 13, 2)
#b) load the function; adjust the path to where the source file is stored
source("O://Genetik//MA//Silke//RCode//DiploidiseIt//diploidise.it_source_v1.R")
#c) run the function and retrieve the diploidised genotypes
diploidised.genotypes <- diploidise.it(Genotypes)
diploidised.genotypes
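
If the source file is not at hand, the idea of diploidising can be illustrated with the stand-alone sketch below, which makes every individual homozygous by writing each haploid allele twice. The function name diploidise_sketch and the output layout (two adjacent columns per locus, suffixed ".a" and ".b") are assumptions; the output of diploidise.it may be organised differently. The input is assumed to have column names.

```r
# Hypothetical stand-in, not the function from "diploidise.it_source_v1.R".
# genotypes: matrix with rows = individuals, named columns = loci
diploidise_sketch <- function(genotypes) {
  genotypes <- as.matrix(genotypes)
  # repeat every locus column twice, keeping the two copies adjacent
  doubled <- genotypes[, rep(seq_len(ncol(genotypes)), each = 2), drop = FALSE]
  colnames(doubled) <- paste0(rep(colnames(genotypes), each = 2), c(".a", ".b"))
  doubled
}

haploid <- matrix(c(12, 13, 1,
                    14, 13, 2),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(1:2, c("Locus1", "Locus2", "Locus3")))
diploidise_sketch(haploid)
```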


References: 
Agresti, A. (1984) Analysis of ordinal categorical data. Wiley, New York, p. 23-24.
Legendre & Legendre (1998) Numerical ecology. Elsevier, Amsterdam, p. 221.
Nei, M. (1972) American Naturalist 106: 283-292.
Nei, M. (1978) Genetics 89: 583-590.
Sokal & Rohlf (2001) Biometry. Freeman, New York.
Takezaki & Nei (1996) Genetics 144: 389-399.
Walser, J. C., Sperisen, C., Soliva, M. & Scheidegger, C. (2003) Fungal Genetics and Biology 40: 72-82.
Walser, J. C., Gugerli, F., Holderegger, R., Kuonen, D. & Scheidegger, C. (2004) Heredity 93: 322-329.
#####################################################################################
PS: From June 1st 2005 onwards, please direct questions to swerth"at"gmx.de
#####################################################################################