Swiss Federal Institute for Forest, Snow and Landscape Research WSL

## S-plus codes

Rita Ghosh, Personal Homepage

S-plus codes for T3-plots are available on request from rita.ghosh@wsl.ch.

The empirical moment generating function (emgf): Based on a random sample $X_1, X_2, \ldots, X_n$ from a probability distribution $F$ and a real number $t$, the empirical moment generating function (emgf) is defined as the sample mean $m_n(t) = \{e^{tX_1} + e^{tX_2} + \cdots + e^{tX_n}\}/n$. This quantity is an unbiased estimator of its population counterpart, the moment generating function (mgf) $m(t) = E[e^{tX_1}]$, provided that $m(t)$ exists in an open interval around zero. Because an mgf that exists in such an interval uniquely determines the distribution, the mgf, and hence the emgf, can be used for goodness-of-fit tests.

The empirical characteristic function (ecf) is defined similarly, with $t$ replaced by $it$, where $i = \sqrt{-1}$. Like the emgf, the ecf is an unbiased estimator of its population counterpart, the characteristic function, which always exists. Methods based on the emgf and the ecf are typically asymptotic in nature. The T3-plots are based on the emgf. For goodness-of-fit tests based on the ecf, see for instance Ghosh, S., Ruymgaart, F. (1992) Canadian Journal of Statistics, 20: 429-440, and the references therein. Additional references on these topics can be found in Ghosh (1996, 2013) and Ghosh & Beran (2000).
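As a minimal sketch of the two definitions above, the following Python code evaluates the emgf and the ecf of a sample at a given t. It is not the S-plus code referred to on this page; the function names `emgf` and `ecf`, the simulated data and the seed are assumptions made purely for illustration.

```python
import cmath
import math
import random


def emgf(sample, t):
    """Empirical moment generating function: sample mean of exp(t * x)."""
    return sum(math.exp(t * x) for x in sample) / len(sample)


def ecf(sample, t):
    """Empirical characteristic function: sample mean of exp(i * t * x)."""
    return sum(cmath.exp(1j * t * x) for x in sample) / len(sample)


# Illustrative data: simulated N(0, 1) observations (seed chosen arbitrarily).
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]

for t in (-0.5, 0.0, 0.5):
    # For N(0, 1) the population mgf is exp(t^2 / 2), printed for comparison.
    print(t, emgf(x, t), math.exp(t * t / 2.0), ecf(x, t))
```

At t = 0 both functions equal 1 for any sample, and the modulus of the ecf never exceeds 1, which reflects the fact that the characteristic function always exists while the mgf may not.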

T3-plots: The T3-plots make use of the emgf and are graphical tools for testing univariate normality and for comparing two distributions of arbitrary shapes. They can be used as informal graphical checks as well as for formal hypothesis testing at a given level of significance. In the one-sample case [see (1) below], the test statistic (the sample T3-function) is the third derivative, with respect to the argument t, of the logarithm of the emgf, i.e. of the empirical cumulant generating function. In the two-sample case [see (2) below], the test statistic is the difference between the two T3-functions. A full understanding of the theoretical properties of these methods requires a background in the asymptotic theory of mathematical statistics. Implementing the methods is not difficult, however, and practitioners can apply them easily even without prior experience in interpreting probability plots.
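Writing out the third derivative makes the statistic more concrete. The notation $K_n$ and $m_n^{(j)}$ below is introduced here for illustration and is not taken from the cited papers; any standardization of the data or scaling by the sample size used in the published statistic is left out. With

$$
K_n(t) = \log m_n(t), \qquad m_n^{(j)}(t) = \frac{1}{n}\sum_{i=1}^{n} X_i^{j}\, e^{tX_i}, \quad j = 1, 2, 3,
$$

repeated differentiation gives

$$
K_n'''(t) = \frac{m_n^{(3)}(t)}{m_n(t)} - 3\,\frac{m_n^{(1)}(t)\, m_n^{(2)}(t)}{m_n(t)^2} + 2\left(\frac{m_n^{(1)}(t)}{m_n(t)}\right)^{3}.
$$

For a normal distribution the cumulant generating function is $\mu t + \sigma^2 t^2/2$, so its third derivative is identically zero. This is why a sample curve that stays close to the horizontal zero-line is consistent with normality.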

For the use of T3-plots to test the null hypothesis of normality for stationary long-memory time series observations, see Ghosh, S. (2013): Normality testing for a long-memory sequence using the empirical moment generating function. Journal of Statistical Planning and Inference 143, 944–954.

(1) One-sample T3 plot: Graphical test of univariate normality

With this method one can test the null hypothesis that a set of univariate independent and identically distributed (iid) observations is normally distributed with unknown mean and unknown variance. While the approach is based on asymptotic arguments, the method incorporates finite-sample corrections and is location and scale invariant. The S-plus code allows missing values, and it is not necessary to standardize the data prior to analysis.
References:
Ghosh, S. (1996) Journal of the Royal Statistical Society, Series B.
Ghosh, S. (1999) Encyclopedia of Statistical Sciences, John Wiley.
Details: This procedure is used to test the null hypothesis $H_0: X \sim N(\mu, \sigma^2)$, i.e. that the probability distribution of X is normal with unknown parameters $\mu$ and $\sigma^2$. The relevant S-plus command is T3plot(X), where X is the vector of iid observations in S-plus. The command computes the T3-function and plots it against its argument, together with the 99% and 95% rejection limits. The null hypothesis of normality is rejected if the T3-function of the given sample deviates significantly from the horizontal zero-line, i.e. if it crosses the rejection limits.
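The kind of curve that such a plot displays can be sketched in Python as follows. This is not the T3plot code: the helper name `t3_curve`, the grid of t values on [-1, 1], the simulated data and the omission of any sample-size scaling are all assumptions made for illustration, and no rejection limits or finite-sample corrections are computed.

```python
import math
import random


def t3_curve(sample, grid):
    """Third derivative of the log emgf of the standardized sample on a grid.

    Hypothetical helper for illustration; it draws no rejection limits.
    """
    # Drop missing values (None or NaN), then standardize the rest.
    x = [v for v in sample if v is not None and not math.isnan(v)]
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    z = [(v - mean) / sd for v in x]

    curve = []
    for t in grid:
        w = [math.exp(t * v) for v in z]
        m0 = sum(w)
        # Ratios of the j-th derivative of the emgf to the emgf, j = 1, 2, 3.
        r1 = sum(v * wi for v, wi in zip(z, w)) / m0
        r2 = sum(v ** 2 * wi for v, wi in zip(z, w)) / m0
        r3 = sum(v ** 3 * wi for v, wi in zip(z, w)) / m0
        curve.append(r3 - 3.0 * r1 * r2 + 2.0 * r1 ** 3)
    return curve


# Illustrative comparison on simulated data (seed chosen arbitrarily).
rng = random.Random(1)
grid = [k / 4 for k in range(-4, 5)]
normal = [rng.gauss(10.0, 3.0) for _ in range(500)]
skewed = [rng.expovariate(1.0) for _ in range(500)]
print(max(abs(v) for v in t3_curve(normal, grid)))
print(max(abs(v) for v in t3_curve(skewed, grid)))
```

Because the sample is standardized first, the curve does not change under shifts or rescaling of the data. The curve for the normal sample would be expected to stay much closer to zero than the curve for the exponential sample, but judging whether a deviation is significant requires the rejection limits, which this sketch does not provide.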

(2) Two-sample T3 plot: Graphical comparison of two distributions

Based on two independent random samples, this method tests the null hypothesis that the shapes of the two underlying distributions are the same. The method is location and scale invariant. Small-sample corrections are incorporated in the S-plus code. Missing values are allowed, and it is not necessary to standardize the data prior to analysis.
Reference:
Ghosh, S. & Beran, J. (2000) Journal of Computational and Graphical Statistics.
Details: This procedure is used to test the null hypothesis $H_0: F_1 = F_2$, i.e. that the probability distributions of the univariate and independent random variables $X_1$ and $X_2$ are the same. The two-sample T3 plot works much like the one-sample method, except that the twot3 function is used. It plots the difference between the two one-sample T3-functions together with the corresponding 99% and 95% rejection limits. The null hypothesis is rejected if the two-sample T3-function crosses the rejection limits. In the two-sample case, the rejection bands are constructed by bootstrap.
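The two-sample idea can be sketched in Python in the same spirit. This is not the twot3 code, and the page does not say how the bootstrap is carried out: the pooled resampling of standardized observations, the pointwise percentile bands, the function names `t3_difference` and `bootstrap_bands`, the grid and the simulated data are all assumptions made for illustration, and no small-sample corrections are applied.

```python
import math
import random


def standardize(sample):
    """Center and scale a sample to mean 0 and standard deviation 1."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    return [(v - mean) / sd for v in sample]


def t3(sample, grid):
    """Third derivative of the log emgf of the standardized sample."""
    z = standardize(sample)
    out = []
    for t in grid:
        w = [math.exp(t * v) for v in z]
        m0 = sum(w)
        r1, r2, r3 = (sum(v ** j * wi for v, wi in zip(z, w)) / m0 for j in (1, 2, 3))
        out.append(r3 - 3.0 * r1 * r2 + 2.0 * r1 ** 3)
    return out


def t3_difference(x, y, grid):
    """Difference between the two one-sample curves."""
    return [a - b for a, b in zip(t3(x, grid), t3(y, grid))]


def bootstrap_bands(x, y, grid, n_boot=200, level=0.95, seed=0):
    """Pointwise percentile bands under an assumed pooled-resampling scheme."""
    rng = random.Random(seed)
    pooled = standardize(x) + standardize(y)
    draws = []
    for _ in range(n_boot):
        bx = rng.choices(pooled, k=len(x))
        by = rng.choices(pooled, k=len(y))
        draws.append(t3_difference(bx, by, grid))
    lo_idx = int((1.0 - level) / 2.0 * (n_boot - 1))
    hi_idx = n_boot - 1 - lo_idx
    lower, upper = [], []
    for k in range(len(grid)):
        col = sorted(d[k] for d in draws)
        lower.append(col[lo_idx])
        upper.append(col[hi_idx])
    return lower, upper


# Illustrative comparison: normal versus exponential data (arbitrary seed).
rng = random.Random(2)
grid = [k / 4 for k in range(-4, 5)]
x = [rng.gauss(0.0, 1.0) for _ in range(80)]
y = [rng.expovariate(1.0) for _ in range(80)]
diff = t3_difference(x, y, grid)
lower, upper = bootstrap_bands(x, y, grid)
outside = [t for t, d, lo, hi in zip(grid, diff, lower, upper) if d < lo or d > hi]
print("grid points where the difference leaves the band:", outside)
```

Standardizing each sample before pooling mirrors the location and scale invariance described above: only differences in shape can move the curve away from zero. Pointwise bands of this kind are not simultaneous over the whole grid, so they should be read as a rough guide rather than as the rejection limits of the published method.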