| Title: | Inference on the Overlap Coefficient |
|---|---|
| Description: | Provides functions to construct confidence intervals for the Overlap Coefficient (OVL). OVL measures the similarity between two distributions as the overlapping area of their density functions. The coefficient is intuitive and easy to visualize as the overlap between the histograms of samples drawn from the two distributions, so accurate methods for confidence interval construction can be useful for applied researchers. Implements methods based on the work of Franco-Pereira, A.M., Nakas, C.T., Reiser, B., and Pardo, M.C. (2021) <doi:10.1177/09622802211046386> as well as extensions for multimodal distributions proposed by Alcaraz-Peñalba, A., Franco-Pereira, A., and Pardo, M.C. (2025) <doi:10.1007/s10182-025-00545-2>. |
| Authors: | Alba M. Franco-Pereira [aut, cre, cph], Christos T. Nakas [aut], Benjamin Reiser [aut], M. Carmen Pardo [aut], Alba Alcaraz-Peñalba [aut, cph] |
| Maintainer: | Alba M. Franco-Pereira <[email protected]> |
| License: | GPL-2 |
| Version: | 0.1.1 |
| Built: | 2026-05-23 08:28:09 UTC |
| Source: | https://github.com/cran/OVL.CI |
Fits a univariate Gaussian mixture model using the Expectation-Maximization (EM) algorithm. The function is intended as a lightweight fallback implementation (e.g., when mixtools is unavailable or fails).

EM(X, K = 2, max_iter = 100, tol = 1e-05)

| X | Numeric vector of observations. |
|---|---|
| K | Integer. Number of mixture components. |
| max_iter | Integer. Maximum number of EM iterations. |
| tol | Positive numeric. Convergence tolerance for the absolute change in the log-likelihood. |

The algorithm is initialized using the k-means clustering procedure and then alternates between:
E-step: computing the expectation of the complete-data log-likelihood given the current parameter estimates.
M-step: maximizing that expectation with respect to the parameters.
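
The initialization and the two alternating steps can be sketched in base R as follows. This is a minimal illustration, not the source of the package's `EM()`: the function name `em_sketch` and the returned names `sigma`, `iter`, and `resp` are assumptions made here (only `mu` and `pi` appear in the package example), and the stopping rule simply mirrors the description of `tol`.

```r
# Illustrative EM for a univariate Gaussian mixture (hypothetical helper).
em_sketch <- function(x, K = 2, max_iter = 100, tol = 1e-5) {
  n <- length(x)
  # Initialization from k-means cluster assignments
  km <- kmeans(x, centers = K, nstart = 5)
  mu <- as.numeric(km$centers)
  sigma <- as.numeric(tapply(x, km$cluster, sd))
  pi_k <- as.numeric(table(km$cluster)) / n
  loglik_old <- -Inf
  for (iter in seq_len(max_iter)) {
    # E-step: weighted component densities and posterior responsibilities
    dens <- sapply(seq_len(K), function(k) pi_k[k] * dnorm(x, mu[k], sigma[k]))
    resp <- dens / rowSums(dens)
    # Log-likelihood at the parameters used in this E-step
    loglik <- sum(log(rowSums(dens)))
    # M-step: responsibility-weighted updates of means, sds, and proportions
    nk <- colSums(resp)
    mu <- colSums(resp * x) / nk
    sigma <- sqrt(colSums(resp * (x - rep(mu, each = n))^2) / nk)
    pi_k <- nk / n
    # Stop when the absolute change in log-likelihood falls below tol
    if (abs(loglik - loglik_old) < tol) break
    loglik_old <- loglik
  }
  list(mu = mu, sigma = sigma, pi = pi_k, iter = iter, resp = resp)
}

set.seed(1)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
fit <- em_sketch(x, K = 2)
fit$mu
fit$pi
```

The k-means start is repeated (`nstart = 5`) only to make the sketch less sensitive to a poor initial partition; that choice is not taken from the package.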
A list with the following components:
Numeric vector of estimated component means (length K).
Numeric vector of estimated component standard deviations (length K).
Numeric vector of estimated mixing proportions (length K).
Number of iterations performed.
Matrix of posterior probabilities (responsibilities) with dimension length(X) by K.

set.seed(1)
x <- c(rnorm(100, -2, 1), rnorm(100, 2, 1))
fit <- EM(x, K = 2)
fit$mu
fit$pi

Contains control and case samples generated from a normal distribution and a two-component normal mixture distribution, respectively.
data(mixnorm_data)
A data frame with 100 rows and 2 variables:
Simulated data from a N(5,1) normal distribution.
Simulated data from a two-component normal mixture distribution: 0.8N(2,1) + 0.2N(3,1).
This dataset was artificially generated for the OVL.CI package.
data(mixnorm_data)

Parametric approach that uses the bootstrap to estimate the variance.

OVL.BCAN(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| B | Number of bootstrap resamples. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.BCAN(controls, cases)

Parametric approach using a bias-corrected bootstrap.

OVL.BCbias(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| B | Number of bootstrap resamples. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.BCbias(controls, cases)

Parametric approach using a bootstrap percentile interval.

OVL.BCPB(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| B | Number of bootstrap resamples. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.BCPB(controls, cases)

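
The general idea of a bootstrap percentile interval can be sketched as below. This is an illustration under stated assumptions, not the package's procedure: it resamples each group with replacement, uses a normal plug-in estimate of the OVL, and omits the Box-Cox step entirely. The helpers `ovl_normal` and `ovl_percentile_ci` are hypothetical names introduced for this example.

```r
# OVL of two normal densities by a Riemann sum on a fine grid (hypothetical helper).
ovl_normal <- function(mu1, sigma1, mu2, sigma2, n_grid = 4001) {
  lower <- min(mu1 - 8 * sigma1, mu2 - 8 * sigma2)
  upper <- max(mu1 + 8 * sigma1, mu2 + 8 * sigma2)
  t <- seq(lower, upper, length.out = n_grid)
  sum(pmin(dnorm(t, mu1, sigma1), dnorm(t, mu2, sigma2))) * (t[2] - t[1])
}

# Percentile interval: empirical quantiles of the bootstrap OVL estimates.
ovl_percentile_ci <- function(x, y, alpha = 0.05, B = 200) {
  boot <- replicate(B, {
    xb <- sample(x, replace = TRUE)
    yb <- sample(y, replace = TRUE)
    ovl_normal(mean(xb), sd(xb), mean(yb), sd(yb))
  })
  quantile(boot, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
ci <- ovl_percentile_ci(controls, cases)
ci
```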
Parametric approach using the delta method.

OVL.D(x, y, alpha = 0.05)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.D(controls, cases)

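
As a rough illustration of the quantity these intervals target, the sketch below computes the OVL of two normal densities numerically and plugs in the sample means and standard deviations. The helper `ovl_normal` and its grid-based integration are assumptions made for this example; the package may compute the estimate differently, and the variance estimate that the delta method requires is not shown.

```r
# OVL = integral of min(f1, f2); approximated by a Riemann sum on a fine grid.
# (Hypothetical helper, not part of OVL.CI.)
ovl_normal <- function(mu1, sigma1, mu2, sigma2, n_grid = 4001) {
  lower <- min(mu1 - 8 * sigma1, mu2 - 8 * sigma2)
  upper <- max(mu1 + 8 * sigma1, mu2 + 8 * sigma2)
  t <- seq(lower, upper, length.out = n_grid)
  sum(pmin(dnorm(t, mu1, sigma1), dnorm(t, mu2, sigma2))) * (t[2] - t[1])
}

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)

# Plug-in estimate under normality of both groups
ovl_hat <- ovl_normal(mean(controls), sd(controls), mean(cases), sd(cases))
ovl_hat
```

For two normal densities with equal standard deviation the result can be checked against the closed form 2 * pnorm(-abs(mu1 - mu2) / (2 * sigma)).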
Parametric approach using the delta method after the Box-Cox transformation.

OVL.DBC(x, y, alpha = 0.05, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.DBC(controls, cases)

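
A Box-Cox step of this kind can be sketched as follows. The sketch assumes a single transformation parameter shared by both groups, estimated by maximizing the joint normal profile log-likelihood; whether the package does exactly this is an assumption. It also uses `optimize()` over a fixed interval rather than an optimizer started at `h_ini`, and the names `boxcox_tr` and `boxcox_lambda` are hypothetical.

```r
# Box-Cox transform; lambda = 0 corresponds to the log transform.
boxcox_tr <- function(v, lambda) {
  if (abs(lambda) < 1e-8) log(v) else (v^lambda - 1) / lambda
}

# Common lambda for two positive samples via the joint profile log-likelihood
# (each group keeps its own mean and variance on the transformed scale).
boxcox_lambda <- function(x, y, interval = c(-3, 3)) {
  stopifnot(all(x > 0), all(y > 0))
  profile_loglik <- function(lambda) {
    one_group <- function(v) {
      tv <- boxcox_tr(v, lambda)
      n <- length(v)
      -n / 2 * log(mean((tv - mean(tv))^2)) + (lambda - 1) * sum(log(v))
    }
    one_group(x) + one_group(y)
  }
  optimize(profile_loglik, interval = interval, maximum = TRUE)$maximum
}

set.seed(1)
x <- rlnorm(200, meanlog = 1, sdlog = 0.5)
y <- rlnorm(200, meanlog = 1.2, sdlog = 0.5)
lambda_hat <- boxcox_lambda(x, y)   # log-normal data, so a value near 0 is expected
x_t <- boxcox_tr(x, lambda_hat)
y_t <- boxcox_tr(y, lambda_hat)
```

Because the OVL is invariant under a common monotone transformation of both variables, an interval computed on the transformed scale applies to the original scale as well.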
Parametric approach using the delta method after the Box-Cox transformation, taking into account the variability of the estimated transformation parameter.

OVL.DBCL(x, y, alpha = 0.05, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.DBCL(controls, cases)

Computes a confidence interval for the OVL between two populations when one is modeled as Gaussian and the other as a two-component Gaussian mixture, or when both are modeled as two-component Gaussian mixtures, using EM-based estimation and the delta method.

OVL.Delta.mix(x, y, alpha = 0.05, h = 10^(-5), interv = c(0, 20), all_mix = FALSE)

| x | Numeric vector. Data from the first group. When |
|---|---|
| y | Numeric vector. Data from the second group, modeled as a two-component Gaussian mixture. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| h | Step size used to compute numerical derivatives. |
| interv | Numeric vector of length 2. Search interval for intersection points between the corresponding densities. |
| all_mix | Logical. If |

A list containing a confidence interval.
Additional elements (e.g., var_OVL, parameter estimates, OVL_hat) may also be returned.

set.seed(1)
x <- ifelse(runif(100) < 0.5, rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 2, sd = 1))
y <- ifelse(runif(100) < 0.5, rnorm(100, mean = 2.5, sd = 1), rnorm(100, mean = 2, sd = 1))
res <- OVL.Delta.mix(x, y, all_mix = TRUE, interv = c(-10, 10))
res$IC1
res$IC2

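
To illustrate what a search interval such as `interv` is for, the sketch below locates the points where two densities cross inside a given interval, using a grid scan for sign changes followed by `uniroot()`. This is one plausible way to do it and not necessarily what the package does; `density_crossings` is a hypothetical helper. Crossing points outside the interval are not found, which is why the interval should cover the bulk of both distributions.

```r
# Find crossing points of two density functions f and g inside interv.
# (Hypothetical helper, not part of OVL.CI.)
density_crossings <- function(f, g, interv, n_grid = 2000) {
  t <- seq(interv[1], interv[2], length.out = n_grid)
  d <- f(t) - g(t)
  # Indices where the difference changes sign between neighbouring grid points
  idx <- which(d[-1] * d[-n_grid] < 0)
  # Refine each bracketed crossing with a one-dimensional root finder
  vapply(idx, function(i) {
    uniroot(function(s) f(s) - g(s), c(t[i], t[i + 1]))$root
  }, numeric(1))
}

# A Gaussian density against a two-component Gaussian mixture density
f <- function(t) dnorm(t, 0, 1)
g <- function(t) 0.5 * dnorm(t, 2.5, 1) + 0.5 * dnorm(t, 2, 1)
roots <- density_crossings(f, g, interv = c(-10, 10))
roots
```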
Parametric approach based on generalized inference.

OVL.GPQ(x, y, alpha = 0.05, K = 2500, h_ini = -1.6, BC = FALSE)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| K | Number of simulated generalized pivotal quantities. |
| h_ini | Initial value for the optimization problem. |
| BC | Logical. Indicates whether a Box-Cox transformation is applied to the data. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.GPQ(controls, cases)

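
Generalized inference for normal data can be sketched with the standard generalized pivotal quantities for a normal mean and variance: each simulated pair of parameters yields one OVL value, and the interval is read off the empirical quantiles. This is a simplified illustration without the Box-Cox option, and the helper names `ovl_normal` and `ovl_gpq_ci` are assumptions made here rather than package functions.

```r
# OVL of two normal densities by a Riemann sum on a fine grid (hypothetical helper).
ovl_normal <- function(mu1, sigma1, mu2, sigma2, n_grid = 4001) {
  lower <- min(mu1 - 8 * sigma1, mu2 - 8 * sigma2)
  upper <- max(mu1 + 8 * sigma1, mu2 + 8 * sigma2)
  t <- seq(lower, upper, length.out = n_grid)
  sum(pmin(dnorm(t, mu1, sigma1), dnorm(t, mu2, sigma2))) * (t[2] - t[1])
}

ovl_gpq_ci <- function(x, y, alpha = 0.05, K = 1000) {
  # Generalized pivotal quantities for the variance and mean of a normal sample
  gpq_draws <- function(v, K) {
    n <- length(v)
    sigma2 <- (n - 1) * var(v) / rchisq(K, df = n - 1)
    mu <- mean(v) - rnorm(K) * sqrt(sigma2 / n)
    list(mu = mu, sigma = sqrt(sigma2))
  }
  gx <- gpq_draws(x, K)
  gy <- gpq_draws(y, K)
  # One OVL value per simulated parameter set
  ovl <- mapply(ovl_normal, gx$mu, gx$sigma, gy$mu, gy$sigma)
  quantile(ovl, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
ci <- ovl_gpq_ci(controls, cases, K = 500)
ci
```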
Computes a confidence interval for the OVL between two populations when one is modeled as Gaussian and the other as a two-component Gaussian mixture, or when both are modeled as two-component Gaussian mixtures, using generalized inference.

OVL.GPQ.mix(x, y, alpha = 0.05, interv = c(0, 20), k = 1000, all_mix = FALSE)

| x | Numeric vector. Data from the first group. When |
|---|---|
| y | Numeric vector. Data from the second group, modeled as a two-component Gaussian mixture. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| interv | Numeric vector of length 2. Search interval for intersection points between the corresponding densities. |
| k | Number of simulated generalized pivotal quantities. |
| all_mix | Logical. If |

A confidence interval for the OVL.

set.seed(1)
x <- ifelse(runif(100) < 0.5, rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 2, sd = 1))
y <- ifelse(runif(100) < 0.5, rnorm(100, mean = 2.5, sd = 1), rnorm(100, mean = 2, sd = 1))
res <- OVL.GPQ.mix(x, y, all_mix = TRUE, interv = c(-10, 10))
res$IC1
res$IC2

Kernel approach estimating the variance via bootstrap.

OVL.K(x, y, alpha = 0.05, B = 100, k = 1, h = 1)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| B | Number of bootstrap resamples. |
| k | Kernel. When k = 1 (the default), the Gaussian kernel is used; otherwise, the Epanechnikov kernel is used. |
| h | Bandwidth. When h = 1 (the default), the cross-validation bandwidth is chosen; otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.K(controls, cases)

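
A kernel-based point estimate of the OVL can be sketched with base R's `density()`: both densities are evaluated on a shared grid and the pointwise minimum is integrated numerically. Note the assumptions: this sketch uses R's default rule-of-thumb bandwidth (`bw.nrd0`), not the cross-validation or Schmid and Schmidt (2006) bandwidths described above, and `ovl_kernel` is a hypothetical helper, not a package function.

```r
# Kernel estimate of the OVL on a common grid (hypothetical helper).
ovl_kernel <- function(x, y, kernel = "gaussian", n_grid = 1024) {
  # Pad the data range so the tails of the kernel estimates are covered
  pad <- 3 * max(bw.nrd0(x), bw.nrd0(y))
  from <- min(x, y) - pad
  to <- max(x, y) + pad
  fx <- density(x, kernel = kernel, from = from, to = to, n = n_grid)
  fy <- density(y, kernel = kernel, from = from, to = to, n = n_grid)
  # Riemann sum of the pointwise minimum of the two estimated densities
  step <- fx$x[2] - fx$x[1]
  sum(pmin(fx$y, fy$y)) * step
}

set.seed(1)
controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
ovl_kernel(controls, cases)
ovl_kernel(controls, cases, kernel = "epanechnikov")
```

A bootstrap variance estimate would then repeat this calculation on resampled data, which is the part the sketch leaves out.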
Kernel approach using a bootstrap percentile approach.

OVL.KPB(x, y, alpha = 0.05, B = 100, k = 1, h = 1)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| B | Number of bootstrap resamples. |
| k | Kernel. When k = 1 (the default), the Gaussian kernel is used; otherwise, the Epanechnikov kernel is used. |
| h | Bandwidth. When h = 1 (the default), the cross-validation bandwidth is chosen; otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.KPB(controls, cases)

BCAN procedure carried out on the logit scale and back-transformed.

OVL.LogitBCAN(x, y, alpha = 0.05, B = 100, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| B | Number of bootstrap resamples. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitBCAN(controls, cases)

Parametric approach using the delta method after switching to a logit scale and then transforming back.

OVL.LogitD(x, y, alpha = 0.05)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitD(controls, cases)

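
The logit-scale construction can be illustrated generically: given a point estimate in (0, 1) and its standard error, the delta method gives the standard error of the logit, a normal interval is built on that scale, and the endpoints are mapped back. The function `logit_ci` and the numbers used below are illustrative assumptions, not package output.

```r
# Normal-theory interval on the logit scale, back-transformed to (0, 1).
# (Hypothetical helper, not part of OVL.CI.)
logit_ci <- function(est, se, alpha = 0.05) {
  stopifnot(est > 0, est < 1, se > 0)
  z <- qnorm(1 - alpha / 2)
  # Delta method: d/dp logit(p) = 1 / (p * (1 - p))
  se_logit <- se / (est * (1 - est))
  plogis(qlogis(est) + c(-1, 1) * z * se_logit)
}

est <- 0.9   # illustrative estimate close to the upper boundary
se <- 0.08   # illustrative standard error

est + c(-1, 1) * qnorm(0.975) * se   # untransformed interval; upper limit exceeds 1
logit_ci(est, se)                    # stays inside (0, 1)
```

The back-transformed interval is no longer symmetric around the estimate, which is the price paid for keeping both limits inside the range of the OVL.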
Parametric approach using the delta method after the Box-Cox transformation, carried out on the logit scale and back-transformed.

OVL.LogitDBC(x, y, alpha = 0.05, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitDBC(controls, cases)

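Parametric approach using the delta method after the Box-Cox transformation, taking into account the variability of the estimated transformation parameter, carried out on the logit scale and back-transformed.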

OVL.LogitDBCL(x, y, alpha = 0.05, h_ini = -0.6)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| h_ini | Initial value for the optimization problem. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitDBCL(controls, cases)

Kernel approach estimating the variance via bootstrap, carried out on the logit scale and back-transformed.

OVL.LogitK(x, y, alpha = 0.05, B = 100, k = 1, h = 1)

| x | Numeric vector. Data from the first group. |
|---|---|
| y | Numeric vector. Data from the second group. |
| alpha | Significance level; the interval has nominal confidence level 1 - alpha. |
| B | Number of bootstrap resamples. |
| k | Kernel. When k = 1 (the default), the Gaussian kernel is used; otherwise, the Epanechnikov kernel is used. |
| h | Bandwidth. When h = 1 (the default), the cross-validation bandwidth is chosen; otherwise, the bandwidth considered by Schmid and Schmidt (2006) is used. |

A confidence interval for the OVL.

controls <- rnorm(50, 6, 1)
cases <- rnorm(100, 6.5, 0.5)
OVL.LogitK(controls, cases)

Contains control and case samples drawn from normal distributions.
data(test_data)
A data frame with 100 rows and 2 variables:
Simulated data from a N(10,1) distribution for the control group.
Simulated data from a N(10.5,0.5) distribution for the case group.
This data set was artificially created for the OVL.CI package.
data(test_data)