TestMCARNormality.Rd
This function comes from the MissMech
package. The original package describes
it as follows: The main purpose of this
function is to test whether the missing data mechanism, for an incompletely
observed data set, is one of missing completely at random (MCAR). As a
by-product, however, this package can also impute incomplete
data, test whether data have a multivariate normal
distribution, test equality of covariances across groups, and
obtain normal-theory maximum likelihood estimates of the mean and covariance
when data are incomplete. The test of MCAR follows the methodology proposed
by Jamshidian and Jalal (2010). It is based on testing equality of
covariances between groups having identical missing data patterns. The data
are imputed using one of two options, normal-theory or distribution-free, and the
test of equality of covariances between groups with identical missing data
patterns is likewise performed either assuming normality (Hawkins test)
or nonparametrically. Users can optionally use their own method of data
imputation as well. Multiple imputation is an additional feature of the
program that can be used as a diagnostic tool to help identify cases or
variables that contribute to rejection of MCAR when the MCAR hypothesis is
rejected (see Jamshidian and Jalal, 2010, for details). As explained in
Jamshidian, Jalal, and Jansen (2014), this package can also be used for
imputing missing data, testing multivariate normality, and testing equality
of covariances between several groups when data are completely observed.
TestMCARNormality(
data,
del.lesscases = 6,
imputation.number = 1,
method = "Auto",
imputation.method = "Dist.Free",
nrep = 10000,
n.min = 30,
seed = 110,
alpha = 0.05,
imputed.data = NA
)
data
A matrix or data frame consisting of at least two columns. Values must be numeric, with missing data indicated by NA.
del.lesscases
Missing data patterns with del.lesscases or fewer cases are removed from the data set.
imputation.number
Number of imputations to be used, if data are to be multiply imputed.
method
Selects the test to use: Hawkins or nonparametric. If the user is certain that the data have a multivariate normal distribution, method="Hawkins" should be selected. If the data are not normally distributed, method="Nonparametric" should be used. If the user is unsure, the default method="Auto" should be kept, in which case both the Hawkins and the nonparametric tests are run, and the default output follows the recommendation of Jamshidian and Jalal (2010) outlined in the flowchart in Figure 7 of their paper.
imputation.method
"Dist.Free": Missing data are imputed nonparametrically using the method of Srivastava and Dolatabadi (2009); also see Jamshidian and Jalal (2010).
"normal": Missing data are imputed assuming that the data come from a multivariate normal distribution. The maximum likelihood estimates of the mean and covariance obtained from Mls are used to generate the imputed values. The imputed values are based on the conditional distribution of the missing variables given the observed variables; see Jamshidian and Jalal (2010) for more details.
nrep
Number of replications used to simulate the Neyman distribution, which determines the cutoff value for the Neyman test in the program SimNey. Larger values increase the accuracy of the Neyman test.
n.min
The minimum number of cases in a group that triggers the use of the asymptotic chi-square distribution in place of the empirical distribution in the Neyman test of uniformity.
seed
An initial seed for the random number generator. The default is 110, which can be reset to a user-selected number. If the value is set to NA, a system-selected seed is used.
alpha
The significance level at which tests are performed.
imputed.data
The user can optionally provide an imputed data set. In this case the program does not impute the data and uses the supplied imputed data set for the tests performed. Note that the order of cases in the imputed data set must be the same as in the incomplete data set.
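A minimal usage sketch follows, assuming the MissMech package is installed and exports TestMCARNormality with the signature shown above. The simulated data, the sample size, and the missingness rate are illustrative assumptions, not recommendations; the components read from the result are those named in the list of returned values.

```r
# Simulate complete multivariate normal data, then delete values
# completely at random (all settings here are illustrative).
set.seed(1010)
n <- 300          # number of cases
p <- 5            # number of variables
pctmiss <- 0.2    # proportion of values set to NA
y <- matrix(rnorm(n * p), nrow = n)
y[matrix(runif(n * p), nrow = n) < pctmiss] <- NA

# Run the test only if MissMech is available (assumption: it is installed).
if (requireNamespace("MissMech", quietly = TRUE)) {
  out <- MissMech::TestMCARNormality(data = y, del.lesscases = 6,
                                     method = "Auto", alpha = 0.05)
  print(out$g)           # number of missing data patterns used
  print(out$patcnt)      # number of cases in each pattern
  print(out$pvalcomb)    # p-value for the Hawkins test
  print(out$pnormality)  # p-value for the nonparametric test
}
```

With method = "Auto" both tests are run, so both p-values can be inspected side by side before deciding which conclusion to report.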
analyzed.data
The data that were used in the analysis. If
del.lesscases=0, this is the same as the original input data. If
del.lesscases > 0, then this is the data with cases removed.
imputed.data
The analyzed.data after imputation.
If imputation.number > 1, the first imputed data set is returned.
ordered.data
The analyzed.data ordered according to missing
data pattern, using the function OrderMissing.
caseorder
A mapping of case number indices from ordered.data
to the original data.
More specifically, the j-th row of the ordered.data is the caseorder[j]-th
(the j-th element of caseorder) row of the original data.
pnormality
p-value for the nonparametric test: When
imputation.number > 1, this is a vector with each element corresponding to
each of the imputed data sets.
adistar
A matrix consisting of the Anderson-Darling test
statistic for each group (columns) and each imputation (rows).
adstar
Sum of adistar: When imputation.number > 1, this is a
vector with each element corresponding to each of the imputed data sets.
pvalcomb
p-value for the Hawkins test: When
imputation.number > 1, this is a vector with each element corresponding to
each of the imputed data sets.
pvalsn
A matrix consisting of Hawkins test statistics for
each group (columns) and each imputation (rows).
g
Number of patterns used in the analysis.
combp
Hawkins test statistic: When imputation.number > 1, this
is a vector with each element corresponding to each of the imputed data sets.
alpha
The significance level at which the hypothesis tests are
performed.
patcnt
A vector containing the number of cases corresponding
to each pattern in patused.
patused
A matrix indicating the missing data patterns in the
data set, using 1s and NAs.
imputation.number
A value greater than or equal to 1. If a
value larger than 1 is used, data will be imputed imputation.number times.
mu
The normal-theory maximum likelihood estimate of the
variable means.
sigma
The normal-theory maximum likelihood estimate of the
covariance matrix of the variables.
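To make the relationship between patused, patcnt, caseorder, and ordered.data concrete, the base R sketch below reproduces that bookkeeping on a toy matrix. It is an illustration only: it does not call OrderMissing or claim to match its implementation, and the toy data, the helper names ind, key, grp, and keep, and the threshold del.lesscases = 1 are assumptions made for the example. Here caseorder indexes the rows of the data that were ordered.

```r
# Toy data (hypothetical): 6 cases, 3 variables; NA marks a missing value.
y <- matrix(c( 1, NA, 3,
               4,  5, 6,
              NA,  8, 9,
               7, NA, 2,
               3,  1, 4,
               2,  6, 1),
            ncol = 3, byrow = TRUE)

# Missingness indicator in the coding described for patused:
# 1 = observed, NA = missing.
ind <- ifelse(is.na(y), NA, 1)

# Label each case by its pattern, numbering patterns by first appearance.
key <- apply(ind, 1, paste, collapse = " ")
grp <- match(key, unique(key))

patused <- unique(ind)            # one row per distinct pattern
patcnt  <- as.vector(table(grp))  # number of cases per pattern

# Patterns with del.lesscases or fewer cases are dropped.
del.lesscases <- 1                # toy threshold, not the default of 6
keep <- patcnt[grp] > del.lesscases
analyzed.data <- y[keep, , drop = FALSE]

# Order the retained cases so that identical patterns are adjacent.
caseorder <- order(grp[keep])
ordered.data <- analyzed.data[caseorder, , drop = FALSE]

# Row j of ordered.data is row caseorder[j] of the data that were ordered.
j <- 2
identical(ordered.data[j, ], analyzed.data[caseorder[j], ])
```

In this toy case the third row is the only case with its pattern, so it is removed before the remaining five cases are ordered.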
Jamshidian, M. and Bentler, P. M. (1999). "ML estimation of mean and covariance structures with missing data using complete data routines." Journal of Educational and Behavioral Statistics, 24, 21-41.
Jamshidian, M. and Jalal, S. (2010). "Tests of homoscedasticity, normality, and missing completely at random for incomplete multivariate data." Psychometrika, 75, 649-674.
Jamshidian, M., Jalal, S., and Jansen, C. (2014). "MissMech: An R Package for Testing Homoscedasticity, Multivariate Normality, and Missing Completely at Random (MCAR)." Journal of Statistical Software, 56(6), 1-31.