| Title: | Exploratory and Person/Item Misfit Diagnostics for Polytomous Data |
|---|---|
| Description: | Analyzes items and persons in polytomous item-response data to identify and remove person misfit, using either 'mokken' or a graded response model (GRM, via 'mirt'). Provides automatic thresholds, visual diagnostics (2D/3D), and export utilities. Methods build on Mokken scaling as in Mokken (1971, ISBN:9789027968821) and on the graded response model of Samejima (1969) <doi:10.1007/BF03372160>. |
| Authors: | Hasan Bulut [aut, cre] (ORCID: <https://orcid.org/0000-0002-6924-9651>), Asiye Şengül Avşar [aut] (ORCID: <https://orcid.org/0000-0001-5522-2514>) |
| Maintainer: | Hasan Bulut <[email protected]> |
| License: | GPL-3 |
| Version: | 1.1.1 |
| Built: | 2026-05-24 06:20:18 UTC |
| Source: | https://github.com/hsnbulut/epmfd |
clean_epmfd() removes individuals flagged as misfitting according to
a chosen decision rule and returns a cleaned dataset that can be passed
directly to scale_epmfd().
clean_epmfd(misfit, criterion = c("union", "intersection"), clean_item = FALSE)
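| misfit | An epmfd_misfit object, as returned by misfit_epmfd(). |
|---|---|
| criterion | Character string, either "union" (default) or "intersection"; the two rules are described below. |
| clean_item | Logical. If clean_item = TRUE, the function also cleans items. The default value is FALSE. |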
The function uses logical misfit indicators stored in misfit$table,
including:
misfit_any: TRUE if at least one statistic flagged the person.
Statistic-specific columns (e.g., Gnp, U3p, lpz) indicating
per-statistic misfit decisions.
The set of statistics actually considered is taken from misfit$stats.
Under the "intersection" rule, a person is removed only if all of those
statistics are TRUE. Internally, rowSums(..., na.rm = TRUE) is
used so that NA values do not force removal (i.e., NA behaves
as “not flagged” in the intersection count).
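The two decision rules and the NA handling described above can be reproduced on a toy table. This is a minimal sketch, not the package's internal code: the flag table and the stats vector are hypothetical stand-ins for misfit$table and misfit$stats.

```r
# Toy per-statistic flags (hypothetical data, not package output)
flags <- data.frame(
  id  = paste0("P", 1:4),
  Gnp = c(TRUE, TRUE,  FALSE, TRUE),
  U3p = c(TRUE, FALSE, FALSE, NA)
)
stats <- c("Gnp", "U3p")  # stands in for misfit$stats

# Number of statistics flagging each person; NA counts as "not flagged"
n_flagged <- rowSums(flags[, stats], na.rm = TRUE)

remove_union        <- n_flagged >= 1              # at least one statistic
remove_intersection <- n_flagged == length(stats)  # every statistic

kept_union        <- flags$id[!remove_union]
kept_intersection <- flags$id[!remove_intersection]
```

Person P4 is flagged by Gnp but has NA for U3p, so the intersection rule keeps them while the union rule removes them.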
Only items listed in misfit$scaled$kept are retained in the output.
Person identifiers from the original raw object are preserved for the kept rows.
An epmfd_clean list with:
raw: An epmfd_raw object containing only the retained persons and
items, directly usable in scale_epmfd().
clean_data: The cleaned raw data frame (persons × kept items).
n_removed: Number of persons removed.
criterion: The applied decision rule.
misfit: The original epmfd_misfit object (as provided).
"union" (default): A person is removed if at least one statistic
(e.g., Gnp, U3p, lpz) flags them as misfitting. This is the stricter rule and
removes at least as many persons as "intersection".
"intersection": A person is removed only if all statistics flag them
as misfitting. This is the more lenient rule.
library(epmfd)
data <- load_epmfd(sampledata)
scaling_data <- scale_epmfd(data)
misfit_result <- misfit_epmfd(scaling_data)
clean_data <- clean_epmfd(misfit_result)
head(clean_data$clean_data)
dim(data$data)              # the dimension of raw data
dim(clean_data$clean_data)  # the dimension of clean data
export_epmfd() writes commonly used tables from epmfd_* objects to
CSV / Excel / SPSS files, and (optionally) saves the object itself as an RDS.
export_epmfd(object, dir = NULL, prefix = NULL, format = c("csv", "xlsx", "sav"), save_rds = FALSE, include_misfit = FALSE)
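| object | One of: an epmfd_clean, epmfd_misfit, or epmfd_scaled object. |
|---|---|
| dir | Target directory. If NULL (default), nothing is written and a named list of the tables is returned. |
| prefix | File name prefix (without extension); defaults to NULL. |
| format | Output format; one of "csv", "xlsx", or "sav". |
| save_rds | Logical; if TRUE, the object itself is also saved as an RDS file. Defaults to FALSE. |
| include_misfit | Logical; if TRUE, the misfit table is also exported when one is available. Defaults to FALSE. |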
What is produced depends on the object class:
epmfd_clean: cleaned person-by-item data (clean);
if include_misfit = TRUE and a misfit object is attached, also misfit.
epmfd_misfit: if include_misfit = TRUE, misfit.
epmfd_scaled: item status summary (scale).
When format = "sav", logical columns are converted to labelled factors
(FALSE/TRUE) for SPSS compatibility. Writing .sav does not support list
columns; the function aborts if such columns are present.
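The .sav preparation step can be sketched as follows. The helper prep_for_sav() is hypothetical and not exported by epmfd; it produces plain R factors (which haven can write as value-labelled variables) and assumes the package's own conversion behaves similarly.

```r
# Hypothetical helper (not part of epmfd): prepare a table for SPSS export
prep_for_sav <- function(df) {
  # Abort on list columns, which .sav files cannot store
  is_list <- vapply(df, is.list, logical(1))
  if (any(is_list)) {
    stop("List columns cannot be written to .sav: ",
         paste(names(df)[is_list], collapse = ", "))
  }
  # Convert logical columns to factors with levels FALSE/TRUE (NA stays NA)
  is_lgl <- vapply(df, is.logical, logical(1))
  df[is_lgl] <- lapply(df[is_lgl], function(x)
    factor(x, levels = c(FALSE, TRUE), labels = c("FALSE", "TRUE")))
  df
}

prep_for_sav(data.frame(id = 1:3, misfit_any = c(TRUE, FALSE, NA)))
```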
If dir is NULL, a named list containing the tables that would be
written (e.g., clean, misfit, scale). If dir is non-NULL, (invisibly)
a character vector of file paths that were written.
File naming (when dir is provided): files are named <prefix>_<name>.<format> under dir. For example:
study1_clean.csv, study1_misfit.xlsx, or study1_scale.sav.
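The naming scheme amounts to pasting the prefix, table name, and format together under dir. The helper export_paths() below is a hypothetical illustration of that pattern, not a function provided by epmfd.

```r
# Hypothetical helper mirroring the scheme <prefix>_<name>.<format> under dir
export_paths <- function(dir, prefix, names, format = c("csv", "xlsx", "sav")) {
  format <- match.arg(format)
  file.path(dir, paste0(prefix, "_", names, ".", format))
}

export_paths("out", "study1", c("clean", "misfit"), "csv")
```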
saveRDS(), readr, openxlsx, haven
# Minimal toy objects created inside the example ----
set.seed(1)
toy_clean <- data.frame(
  I1 = sample(0:1, 6, TRUE),
  I2 = sample(0:1, 6, TRUE)
)
toy_misfit <- data.frame(
  person = 1:6,
  Gnp = runif(6),
  U3p = runif(6)
)
clean_obj <- structure(
  list(clean_data = toy_clean, misfit = list(table = toy_misfit)),
  class = "epmfd_clean"
)
misfit_obj <- structure(
  list(table = toy_misfit, method = "mokken"),
  class = "epmfd_misfit"
)
scaled_obj <- structure(
  list(kept = c("I1", "I2"), removed = character()),
  class = "epmfd_scaled"
)

# 1) No writing: return list
lst <- export_epmfd(clean_obj, dir = NULL, include_misfit = TRUE)
str(lst)

# 2) Write to a temporary directory (CRAN policy)
tmpdir <- tempdir()
export_epmfd(clean_obj, dir = tmpdir, prefix = "study1",
             format = "csv", save_rds = TRUE)

# Optional formats guarded by Suggests (run only if installed)
if (requireNamespace("haven", quietly = TRUE)) {
  export_epmfd(misfit_obj, dir = tmpdir, format = "sav",
               include_misfit = TRUE)
}
if (requireNamespace("openxlsx", quietly = TRUE)) {
  export_epmfd(scaled_obj, dir = tmpdir, prefix = "scaleA", format = "xlsx")
}
load_epmfd() prepares raw item-response data for subsequent
functions in the epmfd workflow. It validates input, ensures that all
item responses fall within the expected range of categories, converts
items to ordered factors, and attaches person IDs.
load_epmfd(data, id_col = NULL, likert_levels = NULL)
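| data | A data.frame or tibble with persons in rows and items in columns. All item responses must be integers in 1:K. |
|---|---|
| id_col | Optional name of the column containing person IDs (e.g., "Pid"). |
| likert_levels | Optional number of response categories K (e.g., 4 for responses 1–4). |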
Each column of data is validated to ensure responses are within
1:K. Values outside this range cause an error. Missing values
are allowed and reported.
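The validation and conversion steps can be sketched in base R. validate_likert() is a hypothetical helper written for illustration; it assumes K is already known and is not the code used inside load_epmfd().

```r
# Hypothetical helper: check responses against 1:K, report NAs, make ordered factors
validate_likert <- function(df, K) {
  for (item in names(df)) {
    x <- df[[item]]
    # Non-missing values must be one of the integers 1..K
    bad <- !is.na(x) & !(x %in% seq_len(K))
    if (any(bad)) stop("Item '", item, "' has values outside 1:", K)
    n_na <- sum(is.na(x))
    if (n_na > 0) message("Item '", item, "': ", n_na, " missing value(s)")
    # Fix levels to 1..K so unused categories are kept
    df[[item]] <- factor(x, levels = seq_len(K), ordered = TRUE)
  }
  df
}

validate_likert(data.frame(Item1 = c(1, 2, NA), Item2 = c(4, 3, 1)), K = 4)
```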
An object of class epmfd_raw, a list with elements:
data: A data.frame of ordered-factor responses
id: Vector of person IDs
K: Maximum number of categories per item
# Example: 5 persons × 3 items, responses 1–4
df <- data.frame(
  Pid   = paste0("P", 1:5),
  Item1 = c(1, 2, 3, 2, 1),
  Item2 = c(2, 3, 4, 2, 1),
  Item3 = c(3, 4, 1, 2, 2)
)
raw <- load_epmfd(df, id_col = "Pid", likert_levels = 4)
str(raw)
misfit_epmfd() computes selected person-fit statistics for polytomous
responses and returns an epmfd_misfit object with scores, thresholds,
and logical flags per person.
misfit_epmfd(object, stats = c("auto", "lpz", "Gnp", "U3p"), cut.off = "auto")
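| object | An epmfd_scaled object, as returned by scale_epmfd(). |
|---|---|
| stats | Character vector choosing which statistics to compute. Allowed values: "auto", "lpz", "Gnp", "U3p". |
| cut.off | Cut-off used to flag misfit; defaults to "auto". |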
Auto vs manual decision for misfit_final:
If stats contains "auto":
for "mokken": misfit_final = Gnp & U3p
for "mirt": misfit_final = (lpz & Gnp & U3p) if any such rows exist,
otherwise fallback to lpz only.
If stats is manual (no "auto"): misfit_final is the AND over the
selected statistics (if only one selected, it is used directly).
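The auto and manual rules for misfit_final can be restated as a small function over a flag table. final_flag() is a hypothetical re-implementation for illustration; in particular, treating NA as "no such row" in the mirt fallback check (na.rm = TRUE) is an assumption.

```r
# Hypothetical re-implementation of the misfit_final rule on a logical flag table
final_flag <- function(tab, method, stats, auto = TRUE) {
  if (auto && method == "mokken") {
    return(tab$Gnp & tab$U3p)
  }
  if (auto && method == "mirt") {
    joint <- tab$lpz & tab$Gnp & tab$U3p
    # Fall back to lpz alone when no person is flagged by all three
    return(if (any(joint, na.rm = TRUE)) joint else tab$lpz)
  }
  # Manual selection: AND over the chosen statistics (a single one is used as is)
  Reduce(`&`, tab[stats])
}

tab <- data.frame(lpz = c(TRUE, TRUE, FALSE),
                  Gnp = c(TRUE, FALSE, TRUE),
                  U3p = c(FALSE, TRUE, TRUE))
final_flag(tab, "mirt")  # nobody is flagged by all three, so lpz is returned
```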
Polytomous PerFit statistics assume a common design K (number of
categories) across items. This function uses object$raw$K as the global
design K and maps item responses to 0..K-1 without compressing per-item
gaps (unused categories are allowed and do not trigger an error).
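Mapping with a fixed global K differs from re-coding only the observed categories. The hypothetical helper to_zero_based() below fixes the factor levels to 1..K so that gaps survive; the second call shows what compressing would produce instead.

```r
# Hypothetical helper: map responses on 1..K to 0..K-1 using the global design K
to_zero_based <- function(x, K) {
  # Levels fixed to 1..K, so unused categories are neither dropped nor compressed
  f <- factor(x, levels = seq_len(K), ordered = TRUE)
  as.integer(f) - 1L
}

to_zero_based(c(1, 2, 5), K = 5)     # 0 1 4: gaps preserved
as.integer(factor(c(1, 2, 5))) - 1L  # 0 1 2: what per-item compression would give
```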
An epmfd_misfit list with:
scaled: the input epmfd_scaled object
method: detected method ("mirt" or "mokken")
stats: actually computed statistics (subset of c("lpz","Gnp","U3p"))
thresholds: named list of lists with value and tail
scores: named list of numeric score vectors per statistic
table: a tibble with id, one logical column per statistic,
misfit_any (OR over selected stats), and misfit_final
(see Details)
library(epmfd)
data <- load_epmfd(sampledata)
scaling_data <- scale_epmfd(data)
misfit_result <- misfit_epmfd(scaling_data)
misfit_result
plot_misfit(misfit_result, threeD = TRUE)
Quick visual summaries for three object classes:
epmfd_scaled: Item-level retention summary (Kept vs Removed)
and a quality-statistic histogram (either discrimination a for
mirt or scalability H_i for mokken).
epmfd_misfit: Bar plot of misfit counts per statistic and a global bar summarizing overall misfit ratio.
epmfd_clean: Bar plot comparing remaining vs removed persons.
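The quantities behind the epmfd_misfit bar plots can be computed directly from a flag table. This sketch uses a hypothetical toy table and assumes the "overall misfit ratio" is the share of persons flagged by at least one statistic (as in misfit_any); the plot method may define it differently.

```r
# Toy logical flags, one column per statistic (hypothetical data)
tab <- data.frame(
  Gnp = c(TRUE, FALSE, TRUE, FALSE),
  U3p = c(TRUE, FALSE, FALSE, FALSE)
)
counts <- colSums(tab, na.rm = TRUE)          # misfit count per statistic: Gnp 2, U3p 1
misfit_any <- rowSums(tab, na.rm = TRUE) > 0  # flagged by at least one statistic
overall_ratio <- mean(misfit_any)             # share of persons flagged: 0.5
```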
## S3 method for class 'epmfd_scaled'
plot(x, ...)
## S3 method for class 'epmfd_misfit'
plot(x, ...)
## S3 method for class 'epmfd_clean'
plot(x, ...)
| x | An epmfd_scaled, epmfd_misfit, or epmfd_clean object. |
|---|---|
| ... | Additional aesthetics or layers forwarded to the underlying ggplot2 geoms (e.g., alpha = 0.8). |
If the patchwork package is installed, paired plots are stacked
vertically and returned as a single patchwork object; otherwise a
list of two ggplot2 objects is returned.
A single ggplot2 object, a patchwork object (if
available), or a list of ggplot2 objects, depending on the
class and whether a combined layout is possible.
These methods use ggplot2. For epmfd_scaled objects fitted with
mirt, the method accesses model coefficients via mirt if that
package is installed (it is not required for mokken). Stacking
multiple plots uses patchwork when available.
plot_misfit for 2D/3D scatter visualizations of
person-level misfit, and misfit_epmfd / clean_epmfd
for producing the inputs to these plots.
# Scaled object
p_scaled <- plot(scaled_obj)   # item retention + quality histogram
# Misfit object
p_mf <- plot(misfit_obj)       # per-statistic counts + overall ratio
# Cleaned object
p_cl <- plot(clean_obj)        # remaining vs removed persons
# Add ggplot2 options through '...'
plot(misfit_obj, alpha = 0.8)
plot_misfit() visualizes person-level misfit statistics stored in an
epmfd_misfit object. It supports:
2D: scatter plots for all pairwise combinations of the
selected statistics (or a single 2D plot if exactly two are given).
Points are coloured by the joint exceedance logic (see the any argument).
Axis titles are the statistic names; the main title shows the cut-offs
in parentheses, and dashed lines mark the cut-offs per axis.
3D: an interactive scatter using plotly if three
statistics are supplied; optionally adds three semi-transparent planes
at the x/y/z cut-offs when planes = TRUE.
plot_misfit(object, stats = NULL, threeD = FALSE, any = FALSE, planes = TRUE, label_ids = FALSE, ...)
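| object | An epmfd_misfit object, as returned by misfit_epmfd(). |
|---|---|
| stats | Character vector of length 2 or 3 naming statistics found in the epmfd_misfit object (e.g., c("Gnp", "U3p")). |
| threeD | Logical. If TRUE and three statistics are supplied, an interactive 3D scatter is drawn with plotly. Defaults to FALSE. |
| any | Logical. Selects the joint exceedance rule used to colour points; defaults to FALSE. |
| planes | Logical (3D only). If TRUE (default), three semi-transparent planes are added at the x/y/z cut-offs. |
| label_ids | Logical. If TRUE, points are labelled with their IDs (in 2D via ggrepel when installed). Defaults to FALSE. |
| ... | Additional aesthetics passed to the underlying plotting layers. |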
Cut-off logic. For each selected statistic, a person is deemed to
exceed the cut-off if their score lies above it (upper-tailed statistics)
or below it (lower-tailed statistics). In 2D, dashed vertical
and horizontal lines indicate the cut-offs; the plot title shows
"Y (cutY) vs X (cutX)" with formatted values. In 3D, axis titles
include the cut-off values in parentheses, and (optionally) three grey planes
make the cut-offs explicit.
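The per-statistic exceedance rule can be sketched with a thresholds list shaped like the one misfit_epmfd() returns (value and tail per statistic). The tail labels "upper"/"lower", the helper exceeds(), and the numeric cut-offs are hypothetical choices for illustration.

```r
# Hypothetical helper; the tail coding "upper"/"lower" is an assumption
exceeds <- function(score, cut, tail = c("upper", "lower")) {
  tail <- match.arg(tail)
  if (tail == "upper") score > cut else score < cut
}

scores <- list(Gnp = c(0.1, 0.6), lpz = c(-2.5, 0.3))   # toy scores, 2 persons
thresholds <- list(
  Gnp = list(value = 0.5,    tail = "upper"),           # illustrative cut-offs
  lpz = list(value = -1.645, tail = "lower")
)
# One logical column per statistic, one row per person
flags <- mapply(function(s, th) exceeds(s, th$value, th$tail),
                scores, thresholds[names(scores)])
apply(flags, 1, all)  # exceeds every cut-off
apply(flags, 1, any)  # exceeds at least one cut-off
```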
Returned value. With two statistics, a single ggplot is returned;
with three statistics and threeD = FALSE, a named list of ggplots is
returned for all 2D pairs. With threeD = TRUE and three statistics, a
plotly object is returned.
Dependencies. This function uses ggplot2 for 2D plots and,
for 3D, plotly (required only when threeD = TRUE). Optional labels in
2D use ggrepel when installed.
A ggplot object (2D), a named list of ggplots (all 2D pairs), or a
plotly object (3D), depending on stats and threeD.
misfit_epmfd() for computing statistics and thresholds;
clean_epmfd() for removing misfitting persons.
# Suppose 'mf' is an epmfd_misfit with scores Gnp, U3p, lpz
# 2D: single plot
plot_misfit(mf, stats = c("Gnp", "U3p"), any = TRUE)
# 2D: all pairwise plots
plot_misfit(mf, stats = c("Gnp", "U3p", "lpz"))
# 3D: with cut-off planes
plot_misfit(mf, stats = c("Gnp", "U3p", "lpz"), threeD = TRUE, planes = TRUE)
# 3D: points only (no planes)
plot_misfit(mf, stats = c("Gnp", "U3p", "lpz"), threeD = TRUE, planes = FALSE)
Prints summary information for an epmfd_misfit object.
## S3 method for class 'epmfd_misfit'
print(x, ...)
| x | An object of class epmfd_misfit. |
|---|---|
| ... | Further arguments passed to or from other methods. |
The input object x, returned (invisibly) after printing.
A small toy dataset included in the epmfd package, containing polytomous item responses from simulated persons.
sampledata
A data frame with 20 persons (rows) and 6 items (columns). Each item takes ordered values 1–5.
data(sampledata)
head(sampledata)
scale_epmfd() fits either a parametric graded response model (GRM, via
mirt) or a nonparametric Mokken model (via mokken) to
polytomous item-response data and filters out weak items based on
user-specified thresholds.
scale_epmfd(object, method = c("auto", "mirt", "mokken"), a_thr = 0.5, H_thr = 0.3)
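| object | An epmfd_raw object, as returned by load_epmfd(). |
|---|---|
| method | Scaling method. One of "auto", "mirt", or "mokken". |
| a_thr | Numeric. Threshold for the item discrimination parameter a, used to filter items under the GRM (default 0.5). |
| H_thr | Numeric. Threshold for the item scalability coefficient H_i, used to filter items under Mokken scaling (default 0.3). |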
The function converts ordered factors to numeric before analysis.
For GRM (mirt), items are filtered by their discrimination
parameter a. The overall model fit is attempted using
mirt::M2(); if this fails (e.g., due to insufficient df), a warning
is issued and model_fit = NULL.
For Mokken, item scalability coefficients H_i are computed and
compared to H_thr.
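Threshold-based filtering splits the items into kept and removed sets. The helper filter_items() is hypothetical, and it assumes items at or above the threshold are kept and items with a missing coefficient are removed; the exact comparison used by scale_epmfd() is not stated here. The same logic would apply to a with a_thr.

```r
# Hypothetical helper: keep items whose coefficient is at or above the threshold
filter_items <- function(values, thr) {
  keep <- !is.na(values) & values >= thr
  list(kept = names(values)[keep], removed = names(values)[!keep])
}

Hi <- c(I1 = 0.45, I2 = 0.22, I3 = 0.31)  # toy scalability coefficients
filter_items(Hi, thr = 0.3)               # kept: I1, I3; removed: I2
```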
An object of class epmfd_scaled, a list containing:
raw: the original epmfd_raw object
method: scaling method actually used ("mirt" or
"mokken")
kept: names of items retained
removed: names of items removed
model: fitted GRM model (for "mirt"), else NULL
ai: item parameter estimates (for "mirt")
a_thr: discrimination threshold used (for "mirt")
model_fit: results of mirt::M2() (if available)
Hi: vector of item scalability coefficients (for
"mokken")
H_thr: scalability threshold used (for "mokken")
items: character vector of all item names.
load_epmfd(), misfit_epmfd(), plot.epmfd_scaled()
library(epmfd)
data <- load_epmfd(sampledata)
scale_epmfd(data)
Summary method for epmfd_clean objects
## S3 method for class 'epmfd_clean'
summary(object, ...)
| object | An object of class epmfd_clean. |
|---|---|
| ... | Further arguments (ignored). |
Invisibly returns a named list with summary numbers.