| Title: | Multivariate Hypothesis Tests |
|---|---|
| Description: | Multivariate hypothesis tests and confidence intervals... |
| Authors: | Hasan Bulut [aut, cre] |
| Maintainer: | Hasan Bulut <[email protected]> |
| License: | GPL-2 |
| Version: | 2.3.3 |
| Built: | 2026-05-22 18:58:12 UTC |
| Source: | https://github.com/hsnbulut/mvtests |
Implements an adaptive wrapped robust canonical correlation analysis procedure for potentially contaminated high-dimensional data. The method applies columnwise robust standardization and wrapping to mitigate cellwise outliers, uses a Fisher-consistency correction, enforces positive semi-definiteness of the correlation matrix, applies Ledoit–Wolf type shrinkage, and performs an MCD-based reweighting in the canonical score space to downweight casewise outliers.
AWRcca(X, Y, b = 1.5, c = 4, alpha = 0.975, n_xi = 10000, lambda_cap = 0.5)
| Argument | Description |
|---|---|
| X | A numeric matrix of dimension n × p. |
| Y | A numeric matrix of dimension n × q, with the same n observations (rows) as X. |
| b | Lower wrapping threshold. Default is 1.5. |
| c | Upper wrapping threshold. Default is 4. |
| alpha | Reweighting cutoff probability for the chi-square threshold. Default is 0.975. |
| n_xi | Monte Carlo sample size for the consistency correction. Default is 10000. |
| lambda_cap | Upper bound for the shrinkage intensity. Default is 0.5. |
The wrapping transformation is based on a smooth redescending function
applied to robust z-scores (median/MAD). The shrinkage intensity is
estimated in a Ledoit–Wolf spirit and then capped by lambda_cap to avoid
overshrinkage.
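The wrapping and capping steps described above can be sketched in R as follows. This is a minimal illustration under stated assumptions, not AWRcca's internal code: the helper names `wrap_psi`, `wrap_columns`, and `shrink_cor` are hypothetical; the tanh-based wrapping function and its constants `q1 = 1.540793` and `q2 = 0.8622731` are assumed from the wrapping approach of Raymaekers and Rousseeuw (2021) for `b = 1.5` and `c = 4`; and the shrinkage target (the identity matrix) and the pre-computed intensity `lambda_hat` are assumptions.

```r
# Hypothetical sketch; not the internal code of AWRcca().

# Wrapping function psi_{b,c}: identity on [-b, b], tanh-shaped redescent
# on b < |z| <= c, and 0 beyond c. The constants q1 and q2 are ASSUMED
# (Raymaekers and Rousseeuw, 2021) and only match the defaults b = 1.5, c = 4.
wrap_psi <- function(z, b = 1.5, c = 4, q1 = 1.540793, q2 = 0.8622731) {
  out <- z
  mid <- abs(z) > b & abs(z) <= c
  out[mid] <- q1 * tanh(q2 * (c - abs(z[mid]))) * sign(z[mid])
  out[abs(z) > c] <- 0
  out
}

# Columnwise robust z-scores (median/MAD), then wrapping of every cell.
wrap_columns <- function(X, b = 1.5, c = 4) {
  Z <- sweep(X, 2, apply(X, 2, median), "-")
  Z <- sweep(Z, 2, apply(X, 2, mad), "/")
  apply(Z, 2, wrap_psi, b = b, c = c)
}

# Linear shrinkage of a correlation matrix toward the identity (ASSUMED
# target), with the estimated intensity lambda_hat truncated to [0, lambda_cap].
shrink_cor <- function(R, lambda_hat, lambda_cap = 0.5) {
  lambda <- min(max(lambda_hat, 0), lambda_cap)
  (1 - lambda) * R + lambda * diag(nrow(R))
}

set.seed(1)
W <- wrap_columns(matrix(rnorm(200), nrow = 50))
R_shrunk <- shrink_cor(cor(W), lambda_hat = 0.8)  # intensity capped at 0.5
```

Because wrapped values are bounded (|psi| never exceeds roughly b), a single extreme cell cannot dominate the correlations of its column, which is the motivation for wrapping before the correlation matrix is estimated.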
The function returns (i) canonical correlations, (ii) the shrinkage intensity used, (iii) 0/1 reweighting indicators, and (iv) the first canonical score pair computed from the initial solution (useful for diagnostic plots).
A list with components:
cor: vector of canonical correlations.
shrink_used: shrinkage intensity used in the correlation regularization.
weights: 0/1 weights from MCD-based reweighting in score space.
u1: first canonical score for X (initial solution).
v1: first canonical score for Y (initial solution).
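In the classical setting, the canonical correlations returned in `cor` are the square roots of the eigenvalues of R_xx^{-1} R_xy R_yy^{-1} R_yx, computed from the partitioned joint correlation matrix. The sketch below (hypothetical helper `cancor_from_R`) computes them from any such matrix. That AWRcca applies the same step to its regularized robust correlation matrix is an assumption, not a description of its code; with a classical `cor()` input the result should agree with `stats::cancor`.

```r
# Hypothetical helper: canonical correlations from a partitioned correlation
# matrix R = [Rxx Rxy; Ryx Ryy], where the first p rows/columns belong to X.
cancor_from_R <- function(R, p) {
  ix <- seq_len(p)
  iy <- (p + 1):nrow(R)
  Rxx <- R[ix, ix, drop = FALSE]
  Ryy <- R[iy, iy, drop = FALSE]
  Rxy <- R[ix, iy, drop = FALSE]
  # Squared canonical correlations are the eigenvalues of Rxx^-1 Rxy Ryy^-1 Ryx
  M <- solve(Rxx, Rxy) %*% solve(Ryy, t(Rxy))
  ev <- sort(Re(eigen(M, only.values = TRUE)$values), decreasing = TRUE)
  k <- min(p, length(iy))
  sqrt(pmax(ev[seq_len(k)], 0))
}

set.seed(1)
X <- matrix(rnorm(300), nrow = 100)
Y <- cbind(X[, 1] + rnorm(100), rnorm(100))
rho <- cancor_from_R(cor(cbind(X, Y)), p = ncol(X))
rho  # compare with stats::cancor(X, Y)$cor
```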
# Example: correlated blocks via a shared latent factor
set.seed(123)
n <- 50; p <- 30; q <- 20
u <- rnorm(n)
ax <- rnorm(p); ax <- ax / sqrt(sum(ax^2))
by <- rnorm(q); by <- by / sqrt(sum(by^2))
X <- 1.0 * u %*% t(ax) + matrix(rnorm(n*p), n, p)
Y <- 1.0 * u %*% t(by) + matrix(rnorm(n*q), n, q)
fit <- AWRcca(X, Y)
fit$cor[1]
Bcov tests whether the population covariance matrix is equal to a
given matrix.
Bcov(data, Sigma)
| Argument | Description |
|---|---|
| data | A data frame. |
| Sigma | The covariance matrix specified under the null hypothesis. |
This function computes Bartlett's test statistic for the covariance matrix of one sample.
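A minimal sketch of one standard form of this statistic, assuming the small-sample correction given in textbook treatments such as Rencher (2003): with nu = n - 1, u = nu[ln|Sigma0| - ln|S| + tr(S Sigma0^{-1}) - p], and the corrected statistic u' = [1 - (2p + 1 - 2/(p + 1))/(6 nu - 1)] u is referred to a chi-square distribution with p(p + 1)/2 degrees of freedom. The helper name `bartlett_cov_test` is hypothetical, and Bcov's exact computation may differ.

```r
# Hypothetical sketch of Bartlett's one-sample covariance test (ASSUMED form).
bartlett_cov_test <- function(data, Sigma0) {
  X <- as.matrix(data)
  n <- nrow(X); p <- ncol(X); nu <- n - 1
  S <- cov(X)
  u <- nu * (log(det(Sigma0)) - log(det(S)) +
             sum(diag(S %*% solve(Sigma0))) - p)
  # Small-sample correction factor
  u_adj <- (1 - (2 * p + 1 - 2 / (p + 1)) / (6 * nu - 1)) * u
  df <- p * (p + 1) / 2
  list(ChiSquare = u_adj, df = df,
       p.value = pchisq(u_adj, df, lower.tail = FALSE))
}

# Is the iris covariance matrix equal to the identity?
res <- bartlett_cov_test(iris[, 1:4], Sigma0 = diag(4))
res$p.value
```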
A list with 3 elements:

| Element | Description |
|---|---|
| ChiSquare | The value of the test statistic |
| df | The degrees of freedom of the chi-square statistic |
| p.value | The p-value |
Hasan BULUT <[email protected]>
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
data(iris)
S <- matrix(c(5.71, -0.8, -0.6, -0.5,
              -0.8, 4.09, -0.74, -0.54,
              -0.6, -0.74, 7.38, -0.18,
              -0.5, -0.54, -0.18, 8.33), ncol = 4, nrow = 4)
result <- Bcov(data = iris[, 1:4], Sigma = S)
summary(result)
BoxM tests whether the covariance matrices of two or more independent
samples are equal.
BoxM(data, group)
| Argument | Description |
|---|---|
| data | A data frame. |
| group | A grouping vector. |
This function computes Box's M test statistic for the covariance matrices of independent samples. The hypotheses are H0: the covariance matrices are homogeneous, and H1: the covariance matrices are not homogeneous.
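Box's M statistic can be sketched as follows, using the usual chi-square approximation: with nu_i = n_i - 1 and pooled matrix S_pl = sum(nu_i S_i) / sum(nu_i), the statistic (1 - c1)[sum(nu_i) ln|S_pl| - sum(nu_i ln|S_i|)] is compared with a chi-square distribution on (k - 1)p(p + 1)/2 degrees of freedom, where c1 = [sum(1/nu_i) - 1/sum(nu_i)](2p^2 + 3p - 1)/(6(p + 1)(k - 1)). The helper `box_m` is hypothetical, and BoxM may differ in details.

```r
# Hypothetical sketch of Box's M test with the chi-square approximation.
box_m <- function(data, group) {
  X <- as.matrix(data)
  g <- factor(group)
  p <- ncol(X)
  k <- nlevels(g)
  nu <- as.numeric(table(g)) - 1
  S_list <- lapply(levels(g), function(l) cov(X[g == l, , drop = FALSE]))
  S_pl <- Reduce(`+`, Map(`*`, nu, S_list)) / sum(nu)   # pooled covariance
  logdet <- function(S) as.numeric(determinant(S, logarithm = TRUE)$modulus)
  minus2lnM <- sum(nu) * logdet(S_pl) - sum(nu * vapply(S_list, logdet, numeric(1)))
  c1 <- (sum(1 / nu) - 1 / sum(nu)) * (2 * p^2 + 3 * p - 1) /
        (6 * (p + 1) * (k - 1))
  stat <- (1 - c1) * minus2lnM
  df <- (k - 1) * p * (p + 1) / 2
  list(ChiSquare = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE))
}

r <- box_m(iris[, 1:4], iris$Species)
r$ChiSquare; r$df
```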
A list with 3 elements:

| Element | Description |
|---|---|
| ChiSquare | The value of the test statistic |
| df | The degrees of freedom of the chi-square statistic |
| p.value | The p-value |
Hasan BULUT <[email protected]>
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
data(iris)
results <- BoxM(data = iris[, 1:4], group = iris[, 5])
summary(results)
Bsper tests whether the population correlation matrix is equal to
the identity matrix.
Bsper(data)
| Argument | Description |
|---|---|
| data | A data frame. |
This function computes Bartlett's test statistic for the sphericity test.
The hypotheses are H0: R = I and H1: R ≠ I, where R is the population correlation matrix.
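Bartlett's sphericity statistic is commonly written as chi^2 = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, where R is the sample correlation matrix. The sketch below assumes this standard form; the helper `bartlett_sphericity` is hypothetical.

```r
# Hypothetical sketch of Bartlett's test of sphericity (ASSUMED standard form).
bartlett_sphericity <- function(data) {
  X <- as.matrix(data)
  n <- nrow(X); p <- ncol(X)
  R <- cor(X)
  stat <- -((n - 1) - (2 * p + 5) / 6) * log(det(R))
  df <- p * (p - 1) / 2
  list(ChiSquare = stat, df = df,
       p.value = pchisq(stat, df, lower.tail = FALSE), R = R)
}

r <- bartlett_sphericity(iris[, 1:4])
r$ChiSquare; r$df
```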
A list with 4 elements:

| Element | Description |
|---|---|
| ChiSquare | The value of the test statistic |
| df | The degrees of freedom of the chi-square statistic |
| p.value | The p-value |
| R | The sample correlation matrix |
Hasan BULUT <[email protected]>
Tatlidil, H. (1996). Uygulamali Cok Degiskenli Istatistiksel Yontemler. Cem Web.
data(iris)
results <- Bsper(data = iris[, 1:4])
summary(results)
Classical Concordance Correlation Coefficient
ccc(x, y)
| Argument | Description |
|---|---|
| x | The vector containing the values of the first variable. |
| y | The vector containing the values of the second variable. |
ccc computes the classical concordance correlation coefficient (Lin, 1989) directly from the data.
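Lin's coefficient is rho_c = 2 s_xy / (s_x^2 + s_y^2 + (xbar - ybar)^2). The sketch below uses the 1/n moment estimates of Lin (1989); whether ccc uses n or n - 1 in the denominators is an assumption worth checking. The helper `lin_ccc` is hypothetical.

```r
# Hypothetical sketch of Lin's concordance correlation coefficient.
lin_ccc <- function(x, y) {
  n <- length(x)
  mx <- mean(x); my <- mean(y)
  sx2 <- sum((x - mx)^2) / n          # 1/n moments, as in Lin (1989)
  sy2 <- sum((y - my)^2) / n
  sxy <- sum((x - mx) * (y - my)) / n
  2 * sxy / (sx2 + sy2 + (mx - my)^2)
}

x <- c(1, 2, 3, 4, 5)
lin_ccc(x, x)      # 1: perfect agreement
lin_ccc(x, x + 1)  # 0.8: perfectly correlated, but shifted by 1
```

Unlike Pearson's r, which equals 1 in both calls, the CCC penalizes the location shift, which is why it is used to assess agreement rather than association.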
A list with 1 element:

| Element | Description |
|---|---|
| coef | The value of the concordance correlation coefficient |
Hasan BULUT <[email protected]>
Bulut, H. (2025). A Robust Concordance Correlation Coefficient. (Unpublished)
Lin, L. I. "A Concordance Correlation-Coefficient to Evaluate Reproducibility." Biometrics 45, no. 1 (1989): 255-68.
x <- rnorm(50)
y <- 2 + 3*x + rnorm(50, mean = 3)
ccc(x, y)
Performs a cellMCD-based robust two-sample Hotelling T^2 test for comparing the mean vectors of two independent multivariate samples. The p-value is obtained by a permutation procedure.
CellMCDT2(X1, X2, B = 999, alpha = 0.75, quant = 0.99, crit = 1e-04, seed = NULL, na.rm = TRUE, ...)
| Argument | Description |
|---|---|
| X1 | A numeric matrix or data frame for the first group. |
| X2 | A numeric matrix or data frame for the second group. |
| B | Number of permutations. Default is 999. |
| alpha | The cellMCD alpha parameter. Default is 0.75. |
| quant | Quantile used in the cellMCD procedure. Default is 0.99. |
| crit | Convergence criterion used in the cellMCD procedure. Default is 1e-04. |
| seed | Optional random seed. |
| na.rm | Logical. If TRUE, rows with missing values are removed. Default is TRUE. |
| ... | Additional arguments passed to |
An object of class MVTests containing the test statistic,
permutation p-value, number of successful permutations, and related information.
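The permutation p-value can be illustrated with a generic sketch: pool the two samples, repeatedly assign n1 rows to the first group at random, recompute the statistic, and report (1 + #{T* >= T_obs}) / (B + 1). To keep the example self-contained, the classical pooled Hotelling T^2 is used as a stand-in for the cellMCD-based statistic, and the helpers `ht2_two` and `perm_test` are hypothetical. The (1 + count)/(B + 1) convention is an assumption; CellMCDT2 may count permutations differently, for example by excluding permutations whose cellMCD fit fails.

```r
# Classical pooled two-sample Hotelling T^2 (stand-in for the cellMCD version).
ht2_two <- function(X1, X2) {
  n1 <- nrow(X1); n2 <- nrow(X2)
  d <- colMeans(X1) - colMeans(X2)
  Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2)
  as.numeric((n1 * n2 / (n1 + n2)) * t(d) %*% solve(Sp, d))
}

# Generic two-sample permutation test for any statistic stat(X1, X2).
perm_test <- function(X1, X2, B = 999, stat = ht2_two, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  X <- rbind(as.matrix(X1), as.matrix(X2))
  n1 <- nrow(X1); n <- nrow(X)
  t_obs <- stat(X[1:n1, , drop = FALSE], X[-(1:n1), , drop = FALSE])
  t_perm <- replicate(B, {
    idx <- sample.int(n, n1)            # random relabelling of the pooled rows
    stat(X[idx, , drop = FALSE], X[-idx, , drop = FALSE])
  })
  list(statistic = t_obs, p.value = (1 + sum(t_perm >= t_obs)) / (B + 1))
}

set.seed(1)
A <- matrix(rnorm(60), nrow = 20)
Bm <- matrix(rnorm(60, mean = 3), nrow = 20)
r <- perm_test(A, Bm, B = 199, seed = 42)
r$p.value
```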
Hasan BULUT <[email protected]>
Raymaekers, J. and Rousseeuw, P. J. (2024). The cellwise minimum covariance determinant estimator. Journal of the American Statistical Association, 119(548), 2610–2621.
Bulut, H. and Esmeray, M. A cellwise robust Hotelling test for two-sample comparisons (Unpublished).
if (requireNamespace("mvtnorm", quietly = TRUE) &&
    requireNamespace("cellWise", quietly = TRUE)) {
  set.seed(123)
  x1 <- mvtnorm::rmvnorm(n = 30, mean = rep(0, 5), sigma = diag(5))
  x2 <- mvtnorm::rmvnorm(n = 30, mean = rep(0, 5), sigma = diag(5))
  fit <- CellMCDT2(X1 = x1, X2 = x2, B = 9, seed = 123)
  fit$p.value
}
The data set is given in Table 5.3 of Rencher (2003). It consists of 2 variables (Depth and Number) measured under 2 treatments on 15 observations. The first column contains the location numbers.
Coated
A data frame with 15 rows and 5 columns. The columns are as follows:
The location numbers of observations.
The Depth values in the first treatment
The Number values in the first treatment
The Depth values in the second treatment
The Number values in the second treatment
The data set is used in the book Methods of Multivariate Analysis (Rencher, 2003).
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
The iris data set consists of 4 variables, 3 groups, and 150 observations. The last column contains the iris species.
iris
A data frame with 150 rows and 5 columns. The columns are as follows:
The Sepal length values of iris flowers
The Sepal width values of iris flowers
The Petal length values of iris flowers
The Petal width values of iris flowers
The species of iris flowers
https://archive.ics.uci.edu/ml/datasets/Iris
Pairwise comparison of the covariance matrices of the hth and gth samples
Mhg(Sh, Sg, S, nh, ng, n)
| Argument | Description |
|---|---|
| Sh | The robust covariance matrix of the hth sample. |
| Sg | The robust covariance matrix of the gth sample. |
| S | The robust pooled covariance matrix. |
| nh | The sample size of the hth sample. |
| ng | The sample size of the gth sample. |
| n | The sample size of the full data. |
Mhg computes the proposed Mhg value for a pair of samples, as defined in Bulut (2024).
A list with 1 element:

| Element | Description |
|---|---|
| Mhg | The Mhg value |
Hasan BULUT <[email protected]>
Bulut, H. (2024). A robust permutational test to compare covariance matrices in high dimensional data. (Unpublished)
if (requireNamespace("rrcov", quietly = TRUE) &&
    requireNamespace("mvtnorm", quietly = TRUE)) {
  x1 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = diag(20))
  x2 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = 2 * diag(20))
  x3 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = 3 * diag(20))
  data <- rbind(x1, x2, x3)
  group_label <- c(rep(1, 10), rep(2, 10), rep(3, 10))
  n <- nrow(data)
  p <- ncol(data)
  nk <- table(group_label)
  g <- length(nk)
  Levels <- unique(group_label)
  Si.matrices <- lapply(1:g, function(i)
    rrcov::CovMrcd(data[(group_label == Levels[i]), ], alpha = 0.9)@cov)
  Spool <- Reduce("+", Map("*", nk, Si.matrices)) / n
  # for the first and second groups
  Mhg(Sh = Si.matrices[[1]], Sg = Si.matrices[[2]], S = Spool,
      nh = nk[1], ng = nk[2], n = n)
}
Mpaired computes a test statistic based on the
Hotelling T^2
approach for multivariate paired data sets.
Mpaired(T1, T2)
| Argument | Description |
|---|---|
| T1 | The first treatment data. |
| T2 | The second treatment data. |
This function computes the one-sample Hotelling T^2 statistic for paired data sets, applied to the differences between the two treatments.
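Under the usual paired-data formulation, the test reduces to a one-sample Hotelling T^2 test on the differences D = T1 - T2: T^2 = n dbar' S_d^{-1} dbar and F = (n - p)/((n - 1)p) T^2 on (p, n - p) degrees of freedom. The sketch below assumes this formulation; the helper `paired_ht2` is hypothetical. With a single variable it reduces to the squared paired t statistic, which gives a convenient check.

```r
# Hypothetical sketch: paired Hotelling T^2 as a one-sample test on differences.
paired_ht2 <- function(T1, T2) {
  D <- as.matrix(T1) - as.matrix(T2)
  n <- nrow(D); p <- ncol(D)
  dbar <- colMeans(D)
  T2stat <- as.numeric(n * t(dbar) %*% solve(cov(D), dbar))
  Fval <- (n - p) / ((n - 1) * p) * T2stat
  list(HT2 = T2stat, F = Fval, df = c(p, n - p),
       p.value = pf(Fval, p, n - p, lower.tail = FALSE))
}

# One-variable check against the paired t test
set.seed(2)
a <- rnorm(15)
b <- a + rnorm(15, mean = 0.5)
r <- paired_ht2(matrix(a), matrix(b))
tt <- t.test(a, b, paired = TRUE)
c(r$HT2, unname(tt$statistic)^2)   # should coincide
```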
A list with 7 elements:

| Element | Description |
|---|---|
| HT2 | The value of the Hotelling T^2 test statistic |
| F | The value of the F statistic |
| df | The degrees of freedom of the F statistic |
| p.value | The p-value |
| Descriptive1 | The descriptive statistics of the first treatment |
| Descriptive2 | The descriptive statistics of the second treatment |
| Descriptive.Difference | The descriptive statistics of the differences |
Hasan BULUT <[email protected]>
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
data(Coated)
X <- Coated[, 2:3]; Y <- Coated[, 4:5]
result <- Mpaired(T1 = X, T2 = Y)
summary(result)
OneSampleHT2 computes the one-sample Hotelling T^2 statistic and
confidence intervals.
OneSampleHT2(data, mu0, alpha = 0.05)
| Argument | Description |
|---|---|
| data | A data frame. |
| mu0 | The mean vector specified under the null hypothesis for the population mean. |
| alpha | The significance level used for the confidence intervals. Default is 0.05. |
This function computes the one-sample Hotelling T^2 statistic, which is used to
test whether the population mean vector is equal to a user-specified vector.
When H0 is rejected, the function also computes confidence intervals
for all variables.
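A minimal sketch of the one-sample statistic: T^2 = n(xbar - mu0)' S^{-1} (xbar - mu0), with F = (n - p)/((n - 1)p) T^2 on (p, n - p) degrees of freedom. For the intervals it ASSUMES the T^2-based simultaneous form xbar_j ± sqrt(p(n - 1)/(n - p) F_{1-alpha; p, n-p}) sqrt(s_jj / n); OneSampleHT2 may use another construction (for example, Bonferroni intervals). The helper `one_sample_ht2` is hypothetical and, for simplicity, always returns intervals.

```r
# Hypothetical sketch of the one-sample Hotelling T^2 test.
one_sample_ht2 <- function(data, mu0, alpha = 0.05) {
  X <- as.matrix(data)
  n <- nrow(X); p <- ncol(X)
  xbar <- colMeans(X); S <- cov(X)
  T2stat <- as.numeric(n * t(xbar - mu0) %*% solve(S, xbar - mu0))
  Fval <- (n - p) / ((n - 1) * p) * T2stat
  # ASSUMED: simultaneous T^2-based intervals for each mean
  crit <- sqrt(p * (n - 1) / (n - p) * qf(1 - alpha, p, n - p))
  half <- crit * sqrt(diag(S) / n)
  list(HT2 = T2stat, F = Fval, df = c(p, n - p),
       p.value = pf(Fval, p, n - p, lower.tail = FALSE),
       CI = cbind(lower = xbar - half, upper = xbar + half))
}

r <- one_sample_ht2(iris[1:50, 1:4], mu0 = c(6, 3, 1, 0.25))
r$p.value
r$CI
```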
A list with 7 elements:

| Element | Description |
|---|---|
| HT2 | The value of the Hotelling T^2 test statistic |
| F | The value of the F statistic |
| df | The degrees of freedom of the F statistic |
| p.value | The p-value |
| CI | The lower and upper limits of the confidence intervals for all variables |
| alpha | The alpha value used for the confidence intervals |
| Descriptive | Descriptive statistics |
Hasan BULUT <[email protected]>
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
Tatlidil, H. (1996). Uygulamali Cok Degiskenli Istatistiksel Yontemler. Cem Web.
data(iris)
mean0 <- c(6, 3, 1, 0.25)
result <- OneSampleHT2(data = iris[1:50, -5], mu0 = mean0, alpha = 0.05)
summary(result)
Computes a robust concordance correlation coefficient using Minimum Covariance Determinant (MCD) estimates.
rccc(x, y, alpha = 0.75)
| Argument | Description |
|---|---|
| x | Numeric vector; first variable. |
| y | Numeric vector; second variable. |
| alpha | Numeric in (0.5, 1]; MCD subset size proportion. Default 0.75. |
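The rCCC replaces the means and (co)variances in Lin's CCC with their
MCD counterparts: $\rho_c^{\mathrm{MCD}} = \dfrac{2\hat{\sigma}_{xy}}{\hat{\sigma}_x^2 + \hat{\sigma}_y^2 + (\hat{\mu}_x - \hat{\mu}_y)^2}$, where $\hat{\mu}_x, \hat{\mu}_y$ are the MCD location estimates and $\hat{\sigma}_x^2, \hat{\sigma}_y^2, \hat{\sigma}_{xy}$ are the entries of the MCD scatter matrix.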
A list with one element:

| Element | Description |
|---|---|
| coef | Robust concordance correlation coefficient |
Hasan BULUT <[email protected]>
Bulut, H. (2025). A Robust Concordance Correlation Coefficient. (Unpublished)
if (requireNamespace("robustbase", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(50)
  y <- 2 + 3*x + rnorm(50, mean = 3)
  rccc(x, y)
}
Robust Hotelling T^2 Test for One Sample in High-Dimensional Data
RHT2(data, mu0, alpha = 0.75, d, q)
| Argument | Description |
|---|---|
| data | The data. It must be a matrix or data frame. |
| mu0 | The mean vector specified under the null hypothesis. |
| alpha | Numeric parameter controlling the size of the subsets over which the determinant is minimized. Allowed values are between 0.5 and 1, and the default is 0.75. |
| d | The constant d in Equation (11) of Bulut (2021). |
| q | The second degrees-of-freedom parameter of the approximate F distribution in Equation (11) of Bulut (2021). |
RHT2 performs a robust one-sample Hotelling T^2 test for high-dimensional data based on the minimum regularized covariance determinant (MRCD) estimators.
The function requires the constants q and d, which can be obtained with the simRHT2 function.
For details, see Bulut (2021).
A list with 3 elements:

| Element | Description |
|---|---|
| T2 | The robust Hotelling T^2 value for high-dimensional data |
| Fval | The F value based on T2 |
| pval | The p-value based on the approximate F distribution |
Hasan BULUT <[email protected]>
Bulut, H. (2021). A robust Hotelling test statistic for one sample case in high dimensional data. Communications in Statistics - Theory and Methods.
if (requireNamespace("rrcov", quietly = TRUE)) {
  utils::data("octane", package = "rrcov")
  mu.clean <- colMeans(octane[-c(25, 26, 36, 37, 38, 39), ])
  RHT2(data = octane, mu0 = mu.clean, alpha = 0.84, d = 1396.59, q = 1132.99)
}
Robust Test for Covariance Matrices in High Dimensional Data
Rob_CovTest(x, group, alpha = 0.75)
| Argument | Description |
|---|---|
| x | The data matrix. |
| group | The grouping vector. It must be a factor. |
| alpha | Numeric parameter controlling the size of the subsets over which the determinant is minimized. Allowed values are between 0.5 and 1, and the default is 0.75. |
Rob_CovTest computes the observed value of the test statistic comparing the covariance matrices of two or more independent samples in high-dimensional data, based on the minimum regularized covariance determinant (MRCD) estimators.
A list with 1 element:

| Element | Description |
|---|---|
| TM | The observed value of the test statistic based on the raw data |
Hasan BULUT <[email protected]>
Bulut, H. (2024). A robust permutational test to compare covariance matrices in high dimensional data. (Unpublished)
if (requireNamespace("rrcov", quietly = TRUE) &&
    requireNamespace("mvtnorm", quietly = TRUE)) {
  x1 <- mvtnorm::rmvnorm(n = 8, mean = rep(0, 10), sigma = diag(10))
  x2 <- mvtnorm::rmvnorm(n = 8, mean = rep(0, 10), sigma = 2 * diag(10))
  x3 <- mvtnorm::rmvnorm(n = 8, mean = rep(0, 10), sigma = 3 * diag(10))
  data <- rbind(x1, x2, x3)
  group_label <- c(rep(1, 8), rep(2, 8), rep(3, 8))
  Rob_CovTest(x = data, group = group_label)
}
RobCat computes a p-value based on a robust CAT algorithm to compare two mean vectors
under the multivariate Behrens–Fisher problem.
RobCat(X, Y, M = 1000, alpha = 0.75)
| Argument | Description |
|---|---|
| X | A matrix or data frame for the first group. |
| Y | A matrix or data frame for the second group. |
| M | The number of iterations. Default is 1000. |
| alpha | Numeric parameter controlling the size of the subsets over which the determinant is minimized; roughly alpha*n observations are used for computing the determinant. Allowed values are between 0.5 and 1, and the default is 0.75. |
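This function computes a p-value based on the robust CAT algorithm to compare two mean vectors under the multivariate Behrens–Fisher problem, that is, without assuming equal population covariance matrices. A p-value below the chosen significance level (for example, 0.05) indicates that the difference between the two mean vectors is statistically significant.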
A list with 2 elements:

| Element | Description |
|---|---|
| Cstat | The observed value of the test statistic |
| pval | The p-value |
Hasan BULUT <[email protected]>
data(iris)
if (requireNamespace("robustbase", quietly = TRUE)) {
  RobCat(X = iris[1:20, -5], Y = iris[81:100, -5])
}
Performs a cellwise robust one-sample Hotelling T^2 test based on the cellwise minimum covariance determinant (cellMCD) estimator.
RobCellT2_onesample(data, mu0, d, q, alpha = 0.75, quant = 0.99, crit = 1e-04, na.rm = TRUE, ...)
| Argument | Description |
|---|---|
| data | A numeric matrix or data frame. |
| mu0 | The hypothesized mean vector under the null hypothesis. |
| d | The scaling constant of the approximate F distribution. |
| q | The second degree of freedom of the approximate F distribution. |
| alpha | The cellMCD alpha parameter. Default is 0.75. |
| quant | Quantile used in the cellMCD procedure. Default is 0.99. |
| crit | Convergence criterion used in the cellMCD procedure. Default is 1e-04. |
| na.rm | Logical. If TRUE, rows with missing values are removed. Default is TRUE. |
| ... | Additional arguments passed to |
The function computes a robust Hotelling T^2 statistic by replacing the
classical sample mean vector and covariance matrix with the cellMCD location
and scatter estimates. The statistic is converted to an approximate F
statistic using the constants d and q. These constants can be
obtained by the simRobCellT2_onesample() function.
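The conversion described above can be sketched as follows, ASSUMING the approximation T^2 ≈ d · F(p, q), so that Fval = T^2 / d is referred to an F distribution with p and q degrees of freedom. The location and scatter estimates are passed in, so any estimator (cellMCD or, as in the check below, the classical mean and covariance) can be plugged in; the helper `robust_t2_pvalue` is hypothetical.

```r
# Hypothetical sketch: T^2 from given location/scatter estimates, then an
# approximate F p-value under the ASSUMED relation T^2 ~ d * F(p, q).
robust_t2_pvalue <- function(n, mu_hat, S_hat, mu0, d, q) {
  p <- length(mu0)
  dvec <- mu_hat - mu0
  T2stat <- as.numeric(n * t(dvec) %*% solve(S_hat, dvec))
  Fval <- T2stat / d
  list(T2 = T2stat, Fval = Fval,
       p.value = pf(Fval, p, q, lower.tail = FALSE))
}

# With classical estimates, d = (n - 1)p/(n - p) and q = n - p reproduce
# the classical one-sample Hotelling F test.
set.seed(3)
X <- matrix(rnorm(90), nrow = 30)
r2 <- robust_t2_pvalue(30, colMeans(X), cov(X), rep(0, 3),
                       d = 29 * 3 / 27, q = 27)
r2$p.value
```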
An object of class MVTests containing:

| Element | Description |
|---|---|
| T2 | The cellMCD-based robust Hotelling T^2 statistic. |
| Fval | The approximate F statistic. |
| p.value | The p-value based on the approximate F distribution. |
| mu | The cellMCD location estimate. |
| S | The cellMCD scatter estimate. |
Hasan BULUT <[email protected]>
Raymaekers, J. and Rousseeuw, P. J. (2024). The cellwise minimum covariance determinant estimator. Journal of the American Statistical Association, 119(548), 2610–2621.
Willems, G., Pison, G., Rousseeuw, P. J., and Van Aelst, S. (2002). A robust Hotelling test. Metrika, 55, 125–138.
if (requireNamespace("MASS", quietly = TRUE) &&
    requireNamespace("cellWise", quietly = TRUE)) {
  set.seed(123)
  X <- MASS::mvrnorm(n = 50, mu = rep(0, 5), Sigma = diag(5))
  const <- simRobCellT2_onesample(n = 50, p = 5, nrep = 50, seed = 123)
  fit <- RobCellT2_onesample(
    data = X, mu0 = rep(0, 5),
    d = const$d, q = const$q
  )
  fit$p.value
}
Performs a weighted minimum regularized covariance determinant (MRCD)-based robust one-way MANOVA test for high-dimensional data.
RobHDMANOVA(x, group, N = 100, alpha = 0.75, tau = 0.975, cutoff = c("normal", "chisq"), seed = NULL, verbose = FALSE)
| Argument | Description |
|---|---|
| x | A numeric data matrix or data frame. Rows represent observations and columns represent variables. |
| group | A grouping vector indicating the group membership of each observation. It is converted internally to a factor. |
| N | The number of permutations used to approximate the null distribution. The default is 100. |
| alpha | Numeric parameter controlling the size of the subsets over which the MRCD determinant is minimized. Allowed values are between 0.5 and 1. The default is 0.75. |
| tau | Cutoff probability used in the robust distance-based reweighting step. The default is 0.975. |
| cutoff | The cutoff rule for robust distances. Options are "normal" and "chisq". |
| seed | An optional integer used to set the random seed for the permutation procedure. The default is NULL. |
| verbose | Logical. The default is FALSE. |
The RobHDMANOVA function tests the equality of multivariate group
location vectors in one-way MANOVA settings, particularly when the number of
variables is large relative to the sample size and the data may contain
outlying observations.
The procedure first computes groupwise MRCD location estimates. Then, a pooled MRCD covariance matrix is obtained from group-centered observations. Robust distances are calculated using this pooled covariance matrix, and binary weights are assigned according to a robust distance cutoff. Reweighted group means are then used to construct a robust between-group scatter matrix. A robust Wilks-type statistic is computed as
$$\Lambda = \frac{\det(\hat{\mathbf{S}})}{\det(\hat{\mathbf{S}} + \hat{\mathbf{B}})},$$
where $\hat{\mathbf{S}}$ is the pooled MRCD covariance matrix and $\hat{\mathbf{B}}$ is the robust between-group scatter matrix. The observed value of the test statistic is returned as TR.
Since the finite-sample null distribution is unknown, the p-value is obtained using a permutation procedure.
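For intuition, the classical (non-robust) analogue of the Wilks-type ratio uses the within-group and between-group SSCP matrices E and H, Lambda = det(E)/det(E + H). The sketch below (hypothetical helper `wilks_lambda`) computes it directly and can be checked against `summary.manova`; it uses neither MRCD estimates nor reweighting, so it is not RobHDMANOVA's statistic.

```r
# Hypothetical sketch: classical Wilks' Lambda det(E) / det(E + H).
wilks_lambda <- function(x, group) {
  X <- as.matrix(x)
  g <- factor(group)
  m <- colMeans(X)                       # grand mean
  E <- matrix(0, ncol(X), ncol(X))       # within-group SSCP
  H <- E                                 # between-group SSCP
  for (l in levels(g)) {
    Xg <- X[g == l, , drop = FALSE]
    mg <- colMeans(Xg)
    E <- E + crossprod(sweep(Xg, 2, mg))
    H <- H + nrow(Xg) * tcrossprod(mg - m)
  }
  det(E) / det(E + H)
}

lam <- wilks_lambda(iris[, 1:4], iris$Species)
fit <- manova(as.matrix(iris[, 1:4]) ~ Species, data = iris)
w <- summary(fit, test = "Wilks")$stats["Species", "Wilks"]
c(lam, w)   # should coincide
```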
A list of class MVTests with the following elements:

| Element | Description |
|---|---|
| Lambda | The robust Wilks' Lambda value. |
| TR | The observed robust MANOVA test statistic. |
| p.value | The permutation-based p-value. |
| Permutations_TR | The test statistic values obtained from the permutations. |
| alpha | The trimming parameter used in MRCD estimation. |
| tau | The cutoff probability used for robust distance-based reweighting. |
| cutoff | The cutoff rule used for robust distances. |
| group.centers | The reweighted robust group centers. |
| weights | The binary robust weights of the observations in each group. |
| Test | The name of the test. |
Hasan BULUT <[email protected]>
Boudt, K., Rousseeuw, P. J., Vanduffel, S., and Verdonck, T. (2020). The minimum regularized covariance determinant estimator. Statistics and Computing, 30, 113–128.
Todorov, V. and Filzmoser, P. (2010). Robust statistic for the one-way MANOVA. Computational Statistics and Data Analysis, 54, 37–48.
Bulut, H. (2020). Mahalanobis distance based on minimum regularized covariance determinant estimators for high dimensional data. Communications in Statistics - Theory and Methods, 49, 5897–5907.
if (requireNamespace("rrcov", quietly = TRUE) &&
    requireNamespace("mvtnorm", quietly = TRUE)) {
  set.seed(123)
  x1 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = diag(20))
  x2 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = diag(20))
  x3 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = diag(20))
  x <- rbind(x1, x2, x3)
  group <- c(rep(1, 10), rep(2, 10), rep(3, 10))
  RobHDMANOVA(x = x, group = group, N = 19, alpha = 0.75, tau = 0.975, seed = 123)
}
Robust Permutation Test for Covariance Matrices in High Dimensional Data
RobPer_CovTest(x, group, N = 100, alpha = 0.75)
| Argument | Description |
|---|---|
| x | The data matrix. |
| group | The grouping vector. It must be a factor. |
| N | The number of permutations. Default is 100. |
| alpha | Numeric parameter controlling the size of the subsets over which the determinant is minimized. Allowed values are between 0.5 and 1, and the default is 0.75. |
RobPer_CovTest computes a p-value for comparing the covariance matrices of two or more independent samples in high-dimensional data. The p-value is obtained by comparing the observed value of the test statistic with its permutation distribution; both are based on the minimum regularized covariance determinant (MRCD) estimators.
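The permutation scheme can be sketched as follows: compute the statistic on the observed labels, recompute it after randomly permuting the group labels N times, and report the proportion of permuted values at least as large, using the (1 + count)/(N + 1) convention (an assumption). The dispersion statistic used here, a size-weighted sum of squared Frobenius distances between each classical group covariance matrix and the pooled one, is a hypothetical stand-in for the MRCD-based statistic of Bulut (2024), and the helper names are hypothetical. Label permutation assumes the observations are exchangeable under H0, which in this simple form also presumes equal group means.

```r
# Hypothetical stand-in statistic (classical, not MRCD-based).
cov_disp_stat <- function(X, g) {
  S_list <- lapply(levels(g), function(l) cov(X[g == l, , drop = FALSE]))
  nk <- as.numeric(table(g))
  S_pool <- Reduce(`+`, Map(`*`, nk, S_list)) / sum(nk)
  sum(nk * vapply(S_list, function(S) sum((S - S_pool)^2), numeric(1)))
}

# Permutation p-value by shuffling group labels.
perm_cov_test <- function(x, group, N = 100, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  X <- as.matrix(x)
  g <- factor(group)
  TM <- cov_disp_stat(X, g)
  perms <- replicate(N, cov_disp_stat(X, sample(g)))
  list(pval = (1 + sum(perms >= TM)) / (N + 1), TM = TM,
       Permutations_TM = perms)
}

set.seed(10)
x <- rbind(matrix(rnorm(100), nrow = 25), matrix(rnorm(100, sd = 4), nrow = 25))
g <- rep(1:2, each = 25)
r <- perm_cov_test(x, g, N = 199, seed = 1)
r$pval
```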
A list with 3 elements:

| Element | Description |
|---|---|
| pval | The p-value of the robust permutation test |
| TM | The observed value of the test statistic based on the raw data |
| Permutations_TM | The values of the test statistic computed for each permuted data set |
Hasan BULUT <[email protected]>
Bulut, H. (2024). A robust permutational test to compare covariance matrices in high dimensional data. (Unpublished)
if (requireNamespace("rrcov", quietly = TRUE) &&
    requireNamespace("mvtnorm", quietly = TRUE)) {
  x1 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = diag(20))
  x2 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = 2 * diag(20))
  x3 <- mvtnorm::rmvnorm(n = 10, mean = rep(0, 20), sigma = 3 * diag(20))
  data <- rbind(x1, x2, x3)
  group_label <- c(rep(1, 10), rep(2, 10), rep(3, 10))
  RobPer_CovTest(x = data, group = group_label)
}
Robust Permutation Hotelling T^2 Test for Two Independent Samples in High-Dimensional Data
RperT2(X1, X2, alpha = 0.75, N = 100)
| Argument | Description |
|---|---|
| X1 | The data matrix for the first group. It must be a matrix or data frame. |
| X2 | The data matrix for the second group. It must be a matrix or data frame. |
| alpha | Numeric parameter controlling the size of the subsets over which the determinant is minimized. Allowed values are between 0.5 and 1, and the default is 0.75. |
| N | The number of permutations. Default is 100. |
RperT2 performs a robust permutation Hotelling T^2 test for two independent samples in high-dimensional data based on the minimum regularized covariance determinant (MRCD) estimators.
A list with 2 elements:

| Element | Description |
|---|---|
| T2 | The observed value of the robust Hotelling T^2 statistic based on MRCD estimates |
| p.value | The p-value obtained from the permutation procedure |
Hasan BULUT <[email protected]>
Bulut et al. (2024). A Robust High-Dimensional Test for Two-Sample Comparisons. Axioms.
if (requireNamespace("rrcov", quietly = TRUE) &&
    requireNamespace("mvtnorm", quietly = TRUE)) {
  x <- mvtnorm::rmvnorm(n = 10, sigma = diag(20), mean = rep(0, 20))
  y <- mvtnorm::rmvnorm(n = 10, sigma = diag(20), mean = rep(1, 20))
  RperT2(X1 = x, X2 = y)$p.value
}
Monte Carlo Simulation to obtain d and q constants for RHT2 function
simRHT2(n, p, nrep = 500, alpha = 0.75)
| Argument | Description |
|---|---|
| n | The sample size. |
| p | The number of variables. |
| nrep | The number of iterations. The default value is 500. |
| alpha | Numeric parameter controlling the size of the subsets over which the determinant is minimized. Allowed values are between 0.5 and 1, and the default is 0.75. |
simRHT2 computes the constants d and q used to construct an approximate
F distribution for the robust Hotelling T^2 statistic in high-dimensional data.
These constants are used by the RHT2 function.
For details, see Bulut (2021).
A list with 2 elements:

| Element | Description |
|---|---|
| q | The q value |
| d | The d value |
Hasan BULUT <[email protected]>
Bulut, H. (2021). A robust Hotelling test statistic for one sample case in high dimensional data. Communications in Statistics - Theory and Methods.
Computes the constants d and q required for the approximate
F distribution of the cellMCD-based robust one-sample Hotelling T^2 statistic.
simRobCellT2_onesample(n, p, nrep = 3000, alpha = 0.75, quant = 0.99, crit = 1e-04, seed = NULL, ...)
| Argument | Description |
|---|---|
| n | The sample size. |
| p | The number of variables. |
| nrep | The number of Monte Carlo replications. Default is 3000. |
| alpha | The cellMCD alpha parameter. Default is 0.75. |
| quant | Quantile used in the cellMCD procedure. Default is 0.99. |
| crit | Convergence criterion used in the cellMCD procedure. Default is 1e-04. |
| seed | Optional random seed. |
| ... | Additional arguments passed to |
A list with the following elements:

| Element | Description |
|---|---|
| d | The scaling constant of the approximate F distribution. |
| q | The second degree of freedom of the approximate F distribution. |
| mean.T2 | The Monte Carlo mean of the simulated T^2 statistics. |
| var.T2 | The Monte Carlo variance of the simulated T^2 statistics. |
| n.success | The number of successful Monte Carlo replications. |
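If the approximation T^2 ≈ d · F(p, q) is assumed, d and q can be recovered from the simulated mean and variance of T^2 by moment matching: E(T^2) = dq/(q - 2) and Var(T^2) = 2d^2 q^2 (p + q - 2) / (p (q - 2)^2 (q - 4)), so with r = Var/E^2 one gets q = (4rp + 2p - 4)/(rp - 2) and d = E(q - 2)/q. Whether simRobCellT2_onesample uses exactly these formulas is an assumption; the helper `dq_from_moments` is hypothetical.

```r
# Hypothetical sketch: moment matching for T^2 ~ d * F(p, q) (ASSUMED model).
dq_from_moments <- function(mean_T2, var_T2, p) {
  r <- var_T2 / mean_T2^2
  # Requires r * p > 2, which guarantees a finite q > 4
  q <- (4 * r * p + 2 * p - 4) / (r * p - 2)
  d <- mean_T2 * (q - 2) / q
  list(d = d, q = q)
}

# Round trip: the exact moments of 2 * F(5, 30) should give back d = 2, q = 30
p <- 5; d0 <- 2; q0 <- 30
m <- d0 * q0 / (q0 - 2)
v <- 2 * d0^2 * q0^2 * (p + q0 - 2) / (p * (q0 - 2)^2 * (q0 - 4))
res <- dq_from_moments(m, v, p)
res
```

In practice `mean_T2` and `var_T2` would be the Monte Carlo values (compare mean.T2 and var.T2 above), so the recovered d and q inherit their simulation error.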
Hasan BULUT <[email protected]>
if (requireNamespace("MASS", quietly = TRUE) &&
    requireNamespace("cellWise", quietly = TRUE)) {
  simRobCellT2_onesample(n = 50, p = 5, nrep = 50, seed = 123)
}
summary.MVTests summarizes the results of the functions in this
package.
## S3 method for class 'MVTests'
summary(object, ...)
| Argument | Description |
|---|---|
| object | An object of class MVTests. |
| ... | Additional parameters. |
This function prints a summary of the results of multivariate hypothesis
tests in the MVTests package.
The input object is returned invisibly.
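The print-and-return-invisibly behaviour follows the usual S3 pattern, sketched below for a hypothetical class `DemoTest` (not the package's MVTests method): the method prints a report with `cat()` and returns its input with `invisible()`, so calling `summary()` at the console prints only the report, while the object can still be captured by assignment.

```r
# Hypothetical S3 summary method illustrating the pattern.
summary.DemoTest <- function(object, ...) {
  cat("Test statistic:", format(object$statistic, digits = 4), "\n")
  cat("p-value:       ", format(object$p.value, digits = 4), "\n")
  invisible(object)   # returned without being auto-printed
}

res <- structure(list(statistic = 12.3, p.value = 0.002), class = "DemoTest")
out <- summary(res)   # dispatches to summary.DemoTest and prints two lines
```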
Hasan BULUT <[email protected]>
# One Sample Hotelling T Square Test
data(iris)
X <- iris[1:50, 1:4]
mean0 <- c(6, 3, 1, 0.25)
result.onesample <- OneSampleHT2(data = X, mu0 = mean0, alpha = 0.05)
summary(result.onesample)

# Two Independent Sample Hotelling T Square Test
data(iris)
G <- c(rep(1, 50), rep(2, 50))
result.twosamples <- TwoSamplesHT2(data = iris[1:100, 1:4], group = G, alpha = 0.05)
summary(result.twosamples)

# Box's M Test
data(iris)
result.BoxM <- BoxM(data = iris[, 1:4], group = iris[, 5])
summary(result.BoxM)

# Bartlett's Test of Sphericity
data(iris)
result.Bsper <- Bsper(data = iris[, 1:4])
summary(result.Bsper)

# Bartlett's Test for One Sample Covariance Matrix
data(iris)
S <- matrix(c(5.71, -0.8, -0.6, -0.5, -0.8, 4.09, -0.74, -0.54,
              -0.6, -0.74, 7.38, -0.18, -0.5, -0.54, -0.18, 8.33),
            ncol = 4, nrow = 4)
result.bcov <- Bcov(data = iris[, 1:4], Sigma = S)
summary(result.bcov)
Robust Hotelling T^2 Test Statistic for Two Independent Samples in High-Dimensional Data
TR2(x1, x2, alpha = 0.75)
x1 |
the data matrix for the first group. It must be a matrix or a data.frame. |
x2 |
the data matrix for the second group. It must be a matrix or a data.frame. |
alpha |
numeric parameter controlling the size of the subsets over which the determinant is minimized. Allowed values are between 0.5 and 1 and the default is 0.75. |
The TR2 function calculates the robust Hotelling T^2 test statistic for two independent samples in high-dimensional data, based on minimum regularized covariance determinant (MRCD) estimators.
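The general idea of a plug-in Hotelling-type statistic, where location and scatter estimates are substituted into the two-sample T^2 form, can be sketched as follows. This is an illustrative sketch only and not the exact TR2 formula of Bulut et al. (2024): the helper `plugin_T2` is hypothetical, and pooling the two scatter matrices with (n - 1) weights is an assumption borrowed from the classical test. The demonstration uses classical means and covariances (which require n > p), and, only if rrcov is installed, swaps in MRCD estimates from `rrcov::CovMrcd()`.

```r
# Hypothetical helper: Hotelling-type two-sample statistic built from
# user-supplied location (m1, m2) and scatter (S1, S2) estimates.
plugin_T2 <- function(m1, m2, S1, S2, n1, n2) {
  d  <- m1 - m2
  # Pooled scatter weighted by degrees of freedom (assumption, classical form)
  Sp <- ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
  # solve(Sp, d) avoids forming the explicit inverse
  as.numeric((n1 * n2) / (n1 + n2) * crossprod(d, solve(Sp, d)))
}

set.seed(1)
p  <- 3
x1 <- matrix(rnorm(30 * p), ncol = p)
x2 <- matrix(rnorm(30 * p, mean = 0.5), ncol = p)

# Classical (non-robust) plug-ins; usable here because n > p
T2_classical <- plugin_T2(colMeans(x1), colMeans(x2), cov(x1), cov(x2),
                          nrow(x1), nrow(x2))
T2_classical

# Robust MRCD plug-ins, only if rrcov is available
if (requireNamespace("rrcov", quietly = TRUE)) {
  r1 <- rrcov::CovMrcd(x1, alpha = 0.75)
  r2 <- rrcov::CovMrcd(x2, alpha = 0.75)
  plugin_T2(rrcov::getCenter(r1), rrcov::getCenter(r2),
            rrcov::getCov(r1), rrcov::getCov(r2), nrow(x1), nrow(x2))
}
```

When p exceeds the group sizes, the classical covariance matrices are singular and `solve()` fails, which is why regularized robust estimators such as MRCD are used in the high-dimensional setting.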
a list with 2 elements:
TR2 |
The calculated value of the robust Hotelling T^2 statistic based on MRCD estimates |
Hasan BULUT <[email protected]>
Bulut et al. (2024). A Robust High-Dimensional Test for Two-Sample Comparisons, Axioms
if (requireNamespace("rrcov", quietly = TRUE) &&
    requireNamespace("mvtnorm", quietly = TRUE)) {
  # n = 10 observations per group in p = 20 dimensions (p > n)
  x <- mvtnorm::rmvnorm(n = 10, sigma = diag(20), mean = rep(0, 20))
  y <- mvtnorm::rmvnorm(n = 10, sigma = diag(20), mean = rep(1, 20))
  TR2(x1 = x, x2 = y)
}
The TwoSamplesHT2 function computes the Hotelling T^2 statistic for two
independent samples and gives confidence intervals.
TwoSamplesHT2(data, group, alpha = 0.05, Homogenity = TRUE)
data |
a data frame. |
group |
a group vector consisting of the values 1 and 2. |
alpha |
significance level used for the confidence intervals. Default is 0.05. |
Homogenity |
a logical argument. If the sample covariance matrices are
homogeneous, then Homogenity = TRUE (the default); otherwise, set Homogenity = FALSE. |
This function computes the two-independent-sample Hotelling T^2 statistic,
which is used to test
whether two population mean vectors are equal.
When H0 is rejected, this function computes confidence intervals
for all variables to identify the variable(s) driving the rejection decision.
When the covariance matrices are not homogeneous, the approach proposed
by D. G. Nel and C. A. Van der Merwe (1986) is used instead.
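The two cases described above can be sketched in base R. The helper `ht2_two_sample` is hypothetical and is not the package's implementation. Under homogeneity it uses the pooled-covariance statistic with the usual F transformation, F = (n1 + n2 - p - 1) / ((n1 + n2 - 2) p) * T^2 on (p, n1 + n2 - p - 1) degrees of freedom (Rencher, 2003). Otherwise it uses the unpooled statistic with the Nel and Van der Merwe (1986) approximate degrees of freedom nu, in the form commonly given in textbooks, and F = (nu - p + 1) / (nu p) * T^2 on (p, nu - p + 1) degrees of freedom.

```r
# Hypothetical helper: two-sample Hotelling T^2 with an F approximation.
ht2_two_sample <- function(x1, x2, homogeneous = TRUE) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d  <- colMeans(x1) - colMeans(x2)
  S1 <- cov(x1); S2 <- cov(x2)
  tr <- function(A) sum(diag(A))
  if (homogeneous) {
    # Pooled covariance and classical F transformation
    Sp  <- ((n1 - 1) * S1 + (n2 - 1) * S2) / (n1 + n2 - 2)
    T2  <- as.numeric(n1 * n2 / (n1 + n2) * crossprod(d, solve(Sp, d)))
    df  <- c(p, n1 + n2 - p - 1)
    Fst <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
  } else {
    # Nel & Van der Merwe (1986): unpooled covariance, approximate df nu
    V1 <- S1 / n1; V2 <- S2 / n2; Se <- V1 + V2
    T2 <- as.numeric(crossprod(d, solve(Se, d)))
    nu <- (tr(Se %*% Se) + tr(Se)^2) /
          ((tr(V1 %*% V1) + tr(V1)^2) / (n1 - 1) +
           (tr(V2 %*% V2) + tr(V2)^2) / (n2 - 1))
    df  <- c(p, nu - p + 1)
    Fst <- (nu - p + 1) / (nu * p) * T2
  }
  list(HT2 = T2, F = Fst, df = df,
       p.value = pf(Fst, df[1], df[2], lower.tail = FALSE))
}

data(iris)
x1 <- iris[1:50, 1:4]     # setosa
x2 <- iris[51:100, 1:4]   # versicolor
res_pooled   <- ht2_two_sample(x1, x2)
res_unpooled <- ht2_two_sample(x1, x2, homogeneous = FALSE)
res_pooled$df
res_unpooled$df
```

With equal group sizes, as in this example, the unpooled statistic coincides with the pooled one; only the degrees of freedom of the reference F distribution differ.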
a list with 8 elements:
HT2 |
The value of Hotelling T^2 Test Statistic |
F |
The value of F Statistic |
df |
The degrees of freedom of the F statistic |
p.value |
The p-value |
CI |
The lower and upper limits of the confidence intervals obtained for all variables |
alpha |
The alpha value used for the confidence intervals |
Descriptive1 |
Descriptive statistics for the first group |
Descriptive2 |
Descriptive statistics for the second group |
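One standard construction for per-variable intervals under homogeneity is the simultaneous T^2 interval for each mean difference (Rencher, 2003): d_j +/- sqrt(T^2_crit) * sqrt((1/n1 + 1/n2) s_jj), with T^2_crit = (n1 + n2 - 2) p / (n1 + n2 - p - 1) * F(1 - alpha; p, n1 + n2 - p - 1). The sketch below assumes that construction; the package may use a different interval type (for example, Bonferroni), and the helper `ht2_simultaneous_ci` is hypothetical.

```r
# Hypothetical helper: simultaneous T^2 intervals for mu1_j - mu2_j,
# assuming equal covariance matrices (pooled estimate).
ht2_simultaneous_ci <- function(x1, x2, alpha = 0.05) {
  x1 <- as.matrix(x1); x2 <- as.matrix(x2)
  n1 <- nrow(x1); n2 <- nrow(x2); p <- ncol(x1)
  d  <- colMeans(x1) - colMeans(x2)
  Sp <- ((n1 - 1) * cov(x1) + (n2 - 1) * cov(x2)) / (n1 + n2 - 2)
  # Critical value of T^2 expressed through an F quantile
  crit <- (n1 + n2 - 2) * p / (n1 + n2 - p - 1) *
          qf(1 - alpha, p, n1 + n2 - p - 1)
  half <- sqrt(crit) * sqrt((1 / n1 + 1 / n2) * diag(Sp))
  cbind(Lower = d - half, Upper = d + half)
}

data(iris)
ci <- ht2_simultaneous_ci(iris[1:50, 1:4], iris[51:100, 1:4])
ci
# Variables whose interval excludes 0 contribute to rejecting H0
which(ci[, "Lower"] > 0 | ci[, "Upper"] < 0)
```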
Hasan BULUT <[email protected]>
Rencher, A. C. (2003). Methods of multivariate analysis (Vol. 492). John Wiley & Sons.
Tatlidil, H. (1996). Uygulamali Cok Degiskenli Istatistiksel Yontemler. Cem Web.
Nel, D. G., & Van der Merwe, C. A. (1986). A solution to the multivariate Behrens-Fisher problem. Communications in Statistics - Theory and Methods, 15(12), 3719-3735.
data(iris)
G <- c(rep(1, 50), rep(2, 50))
# When the covariance matrices are homogeneous
results1 <- TwoSamplesHT2(data = iris[1:100, 1:4], group = G, alpha = 0.05)
summary(results1)
# When the covariance matrices are not homogeneous
results2 <- TwoSamplesHT2(data = iris[1:100, 1:4], group = G, Homogenity = FALSE)
summary(results2)