Title: | Indices of Effect Size |
---|---|
Description: | Provide utilities to work with indices of effect size for a wide variety of models and hypothesis tests (see list of supported models using the function 'insight::supported_models()'), allowing computation of and conversion between indices such as Cohen's d, r, odds, etc. References: Ben-Shachar et al. (2020) <doi:10.21105/joss.02815>. |
Authors: | Mattan S. Ben-Shachar [aut, cre] (<https://orcid.org/0000-0002-4287-4801>, @mattansb), Dominique Makowski [aut] (<https://orcid.org/0000-0001-5375-9967>, @Dom_Makowski), Daniel Lüdecke [aut] (<https://orcid.org/0000-0002-8895-3206>, @strengejacke), Indrajeet Patil [aut] (<https://orcid.org/0000-0003-1995-6531>, @patilindrajeets), Brenton M. Wiernik [aut] (<https://orcid.org/0000-0001-9560-6336>, @bmwiernik), Rémi Thériault [aut] (<https://orcid.org/0000-0003-4315-6788>, @rempsyc), Ken Kelley [ctb], David Stanley [ctb], Aaron Caldwell [ctb] , Jessica Burnett [rev] , Johannes Karreth [rev] , Philip Waggoner [aut, ctb] |
Maintainer: | Mattan S. Ben-Shachar <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.8.9 |
Built: | 2024-11-03 09:26:48 UTC |
Source: | https://github.com/easystats/effectsize |
Convert χ² to φ and Other Correlation-like Effect Sizes
Convert between χ² (chi-square), φ (phi), Cramer's V,
Tschuprow's T, Cohen's w,
פ (Fei) and Pearson's C for contingency
tables or goodness of fit.
chisq_to_phi(chisq, n, nrow = 2, ncol = 2, adjust = TRUE, ci = 0.95,
  alternative = "greater", ...)

chisq_to_cohens_w(chisq, n, nrow, ncol, p, ci = 0.95,
  alternative = "greater", ...)

chisq_to_cramers_v(chisq, n, nrow, ncol, adjust = TRUE, ci = 0.95,
  alternative = "greater", ...)

chisq_to_tschuprows_t(chisq, n, nrow, ncol, adjust = TRUE, ci = 0.95,
  alternative = "greater", ...)

chisq_to_fei(chisq, n, nrow, ncol, p, ci = 0.95, alternative = "greater", ...)

chisq_to_pearsons_c(chisq, n, nrow, ncol, ci = 0.95,
  alternative = "greater", ...)

phi_to_chisq(phi, n, ...)
chisq |
The χ² statistic. |
n |
Total sample size. |
nrow , ncol
|
The number of rows/columns in the contingency table. |
adjust |
Should the effect size be corrected for small-sample bias?
Defaults to TRUE. |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
... |
Arguments passed to or from other methods. |
p |
Vector of expected values. See stats::chisq.test(). |
phi |
The φ statistic. |
These functions use the following formulas:

φ = √(χ² / n)    Cohen's w = √(χ² / n)

Cramer's V = √(χ² / (n × min(nrow − 1, ncol − 1)))

Tschuprow's T = √(χ² / (n × √((nrow − 1) × (ncol − 1))))

Pearson's C = √(χ² / (χ² + n))

פ (Fei) = √(χ² / (n × (1 / min(p) − 1)))

Where p are the expected probabilities.
For versions adjusted for small-sample bias of φ, Cramer's V, and
Tschuprow's T, see Bergsma, 2013.
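As a rough numeric check of these relationships, here is a minimal sketch with made-up values (compare against the converters with adjust = FALSE):

chisq <- 95.5; n <- 400; nr <- 3; nc <- 4   # illustrative values
sqrt(chisq / n)                             # phi / Cohen's w
sqrt(chisq / (n * min(nr - 1, nc - 1)))     # Cramer's V (unadjusted)
sqrt(chisq / (chisq + n))                   # Pearson's C
# cf. chisq_to_cramers_v(chisq, n, nrow = nr, ncol = nc, adjust = FALSE)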
A data frame with the effect size(s), and confidence interval(s). See
cramers_v()
.
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or χ² distribution that places the observed
t, F, or χ² test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
4.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
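To make the pivot idea concrete, here is a minimal base-R sketch (not the package's internal implementation) that brackets the ncp for the t example above:

t_obs <- 2.0; df <- 50
# Upper bound: the ncp for which t_obs is the .025 quantile
ncp_hi <- uniroot(function(ncp) pt(t_obs, df, ncp) - 0.025, c(0, 10))$root
# Lower bound: the ncp for which t_obs is the .975 quantile
ncp_lo <- uniroot(function(ncp) pt(t_obs, df, ncp) - 0.975, c(-10, 10))$root
c(ncp_lo, ncp_hi) # roughly 0.04 and 4.04
# These ncp bounds are then rescaled to the effect size metric
# (e.g., d = ncp * sqrt(1/n1 + 1/n2) for a two-sample design).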
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - α)% confidence
interval contains all of the parameter values for which p > α
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen α level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532-574.
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Bergsma, W. (2013). A bias-correction for Cramer's V and Tschuprow's T. Journal of the Korean Statistical Society, 42(3), 323-328.
Johnston, J. E., Berry, K. J., & Mielke Jr, P. W. (2006). Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and motor skills, 103(2), 412-414.
Rosenberg, M. S. (2010). A generalized formula for converting chi-square tests to effect sizes for meta-analysis. PloS one, 5(4), e10059.
phi()
for more details.
Other effect size from test statistic:
F_to_eta2()
,
t_to_d()
data("Music_preferences") # chisq.test(Music_preferences) #> #> Pearson's Chi-squared test #> #> data: Music_preferences #> X-squared = 95.508, df = 6, p-value < 2.2e-16 #> chisq_to_cohens_w(95.508, n = sum(Music_preferences), nrow = nrow(Music_preferences), ncol = ncol(Music_preferences) ) data("Smoking_FASD") # chisq.test(Smoking_FASD, p = c(0.015, 0.010, 0.975)) #> #> Chi-squared test for given probabilities #> #> data: Smoking_FASD #> X-squared = 7.8521, df = 2, p-value = 0.01972 chisq_to_fei( 7.8521, n = sum(Smoking_FASD), nrow = 1, ncol = 3, p = c(0.015, 0.010, 0.975) )
data("Music_preferences") # chisq.test(Music_preferences) #> #> Pearson's Chi-squared test #> #> data: Music_preferences #> X-squared = 95.508, df = 6, p-value < 2.2e-16 #> chisq_to_cohens_w(95.508, n = sum(Music_preferences), nrow = nrow(Music_preferences), ncol = ncol(Music_preferences) ) data("Smoking_FASD") # chisq.test(Smoking_FASD, p = c(0.015, 0.010, 0.975)) #> #> Chi-squared test for given probabilities #> #> data: Smoking_FASD #> X-squared = 7.8521, df = 2, p-value = 0.01972 chisq_to_fei( 7.8521, n = sum(Smoking_FASD), nrow = 1, ncol = 3, p = c(0.015, 0.010, 0.975) )
Compute effect size indices for standardized differences: Cohen's d,
Hedges' g and Glass’s delta (Δ). (This function returns the
population estimate.) Pair with any reported
stats::t.test()
.
Both Cohen's d and Hedges' g estimate the standardized
difference between the means of two populations. Hedges' g provides a
correction for small-sample bias (using the exact method) to Cohen's d. For
sample sizes > 20, the results for both statistics are roughly equivalent.
Glass’s delta is appropriate when the standard deviations are significantly
different between the populations, as it uses only the second group's
standard deviation.
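For intuition, the pooled-SD point estimate can be reproduced by hand (a minimal sketch, mirroring the mtcars example below):

x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]
s_pooled <- sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
  (length(x) + length(y) - 2))
(mean(x) - mean(y)) / s_pooled # matches cohens_d(mpg ~ am, data = mtcars)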
cohens_d(x, y = NULL, data = NULL, pooled_sd = TRUE, mu = 0, paired = FALSE,
  adjust = FALSE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)

hedges_g(x, y = NULL, data = NULL, pooled_sd = TRUE, mu = 0, paired = FALSE,
  ci = 0.95, alternative = "two.sided", verbose = TRUE, ...)

glass_delta(x, y = NULL, data = NULL, mu = 0, adjust = TRUE, ci = 0.95,
  alternative = "two.sided", verbose = TRUE, ...)
x , y
|
A numeric vector, or a character name of one in |
data |
An optional data frame containing the variables. |
pooled_sd |
If |
mu |
a number indicating the true value of the mean (or difference in means if you are performing a two sample test). |
paired |
If |
adjust |
Should the effect size be adjusted for small-sample bias using
Hedges' method? Note that |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods. When |
Set pooled_sd = FALSE
for effect sizes that are to accompany a Welch's
t-test (Delacre et al., 2021).
A data frame with the effect size ( Cohens_d
, Hedges_g
,
Glass_delta
) and their CIs (CI_low
and CI_high
).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or χ² distribution that places the observed
t, F, or χ² test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
4.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - α)% confidence
interval contains all of the parameter values for which p > α
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen α level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
The indices here give the population estimated standardized difference. Some statistical packages give the sample estimate instead (without applying Bessel's correction).
Algina, J., Keselman, H. J., & Penfield, R. D. (2006). Confidence intervals for an effect size when variances are not equal. Journal of Modern Applied Statistical Methods, 5(1), 2.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Delacre, M., Lakens, D., Ley, C., Liu, L., & Leys, C. (2021, May 7). Why Hedges’ g*s based on the non-pooled standard deviation should be reported with Welch's t-test. doi:10.31234/osf.io/tu6mp
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Hunter, J. E., & Schmidt, F. L. (2004). Methods of meta-analysis: Correcting error and bias in research findings. Sage.
rm_d()
, sd_pooled()
, t_to_d()
, r_to_d()
Other standardized differences:
mahalanobis_d()
,
means_ratio()
,
p_superiority()
,
rank_biserial()
,
repeated_measures_d()
data(mtcars)
mtcars$am <- factor(mtcars$am)

# Two Independent Samples ----------

(d <- cohens_d(mpg ~ am, data = mtcars))
# Same as:
# cohens_d("mpg", "am", data = mtcars)
# cohens_d(mtcars$mpg[mtcars$am=="0"], mtcars$mpg[mtcars$am=="1"])

# More options:
cohens_d(mpg ~ am, data = mtcars, pooled_sd = FALSE)
cohens_d(mpg ~ am, data = mtcars, mu = -5)
cohens_d(mpg ~ am, data = mtcars, alternative = "less")
hedges_g(mpg ~ am, data = mtcars)
glass_delta(mpg ~ am, data = mtcars)

# One Sample ----------

cohens_d(wt ~ 1, data = mtcars)
# same as:
# cohens_d("wt", data = mtcars)
# cohens_d(mtcars$wt)

# More options:
cohens_d(wt ~ 1, data = mtcars, mu = 3)
hedges_g(wt ~ 1, data = mtcars, mu = 3)

# Paired Samples ----------

data(sleep)
cohens_d(Pair(extra[group == 1], extra[group == 2]) ~ 1, data = sleep)
# same as:
# cohens_d(sleep$extra[sleep$group == 1], sleep$extra[sleep$group == 2], paired = TRUE)
# cohens_d(sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2])
# rm_d(sleep$extra[sleep$group == 1], sleep$extra[sleep$group == 2], method = "z", adjust = FALSE)

# More options:
cohens_d(Pair(extra[group == 1], extra[group == 2]) ~ 1, data = sleep, mu = -1, verbose = FALSE)
hedges_g(Pair(extra[group == 1], extra[group == 2]) ~ 1, data = sleep, verbose = FALSE)

# Interpretation -----------------------
interpret_cohens_d(-1.48, rules = "cohen1988")
interpret_hedges_g(-1.48, rules = "sawilowsky2009")
interpret_glass_delta(-1.48, rules = "gignac2016")
# Or:
interpret(d, rules = "sawilowsky2009")

# Common Language Effect Sizes
d_to_u3(1.48)
# Or:
print(d, append_CLES = TRUE)
Cohen's g is an effect size of asymmetry (or marginal heterogeneity) for
dependent (paired) contingency tables ranging between 0 (perfect symmetry)
and 0.5 (perfect asymmetry) (see stats::mcnemar.test()
). (Note this is
not a measure of (dis)agreement between the pairs, but of (a)symmetry.)
cohens_g(x, y = NULL, ci = 0.95, alternative = "two.sided", ...)
x |
a numeric vector or matrix. |
y |
a numeric vector; ignored if x is a matrix. |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
... |
Ignored |
A data frame with the effect size (Cohens_g
, Risk_ratio
(possibly with the prefix log_
), Cohens_h
) and its CIs (CI_low
and
CI_high
).
Confidence intervals are based on the proportion (P = g + 0.5)
confidence intervals returned by
stats::prop.test()
(minus 0.5), which give
a close approximation.
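Continuing the hypothetical sketch above, this amounts to (not the package's exact code):

n12 <- 15; n21 <- 5 # hypothetical discordant cells
prop.test(max(n12, n21), n12 + n21)$conf.int - 0.5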
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - α)% confidence
interval contains all of the parameter values for which p > α
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen α level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Other effect sizes for contingency table:
oddsratio()
,
phi()
data("screening_test") phi(screening_test$Diagnosis, screening_test$Test1) phi(screening_test$Diagnosis, screening_test$Test2) # Both tests seem comparable - but are the tests actually different? (tests <- table(Test1 = screening_test$Test1, Test2 = screening_test$Test2)) mcnemar.test(tests) cohens_g(tests) # Test 2 gives a negative result more than test 1!
data("screening_test") phi(screening_test$Diagnosis, screening_test$Test1) phi(screening_test$Diagnosis, screening_test$Test2) # Both tests seem comparable - but are the tests actually different? (tests <- table(Test1 = screening_test$Test1, Test2 = screening_test$Test2)) mcnemar.test(tests) cohens_g(tests) # Test 2 gives a negative result more than test 1!
Enables a conversion between different indices of effect size, such as standardized difference (Cohen's d), (point-biserial) correlation r or (log) odds ratios.
d_to_r(d, n1, n2, ...)
r_to_d(r, n1, n2, ...)
oddsratio_to_d(OR, log = FALSE, ...)
logoddsratio_to_d(logOR, log = TRUE, ...)
d_to_oddsratio(d, log = FALSE, ...)
d_to_logoddsratio(d, log = TRUE, ...)
oddsratio_to_r(OR, n1, n2, log = FALSE, ...)
logoddsratio_to_r(logOR, log = TRUE, ...)
r_to_oddsratio(r, n1, n2, log = FALSE, ...)
r_to_logoddsratio(r, n1, n2, log = TRUE, ...)
d , r , OR , logOR
|
Standardized difference value (Cohen's d), correlation coefficient (r), Odds ratio, or logged Odds ratio. |
n1 , n2
|
Group sample sizes. If either is missing, groups are assumed to be of equal size. |
... |
Arguments passed to or from other methods. |
log |
Take in or output the log of the ratio (such as in logistic models), e.g. when the desired input or output are log odds ratios instead of odds ratios. |
Conversions between d and OR are done through this formula (Borenstein et al., 2009):

d = log(OR) × √3 / π

Converting between d and r is done through these formulae:

r = d / √(d² + h)        d = (r × √h) / √(1 − r²)

Where h = (n1 + n2 − 2) / n1 + (n1 + n2 − 2) / n2.
When groups are of equal size, h reduces to approximately 4. The resulting
r is also called the binomial effect size display (BESD; Rosenthal et al.,
1982).
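These conversions are easy to check numerically (a minimal sketch matching the examples below):

OR <- 8.120534
log(OR) * sqrt(3) / pi # d from an odds ratio: ~1.1547

d <- 1
d / sqrt(d^2 + 4) # r, assuming equal group sizes (h = 4): ~0.4472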
Converted index.
Borenstein, M., Hedges, L. V., Higgins, J. P. T., & Rothstein, H. R. (2009). Converting among effect sizes. Introduction to meta-analysis, 45-49.
Jacobs, P., & Viechtbauer, W. (2017). Estimation of the biserial correlation and its sampling variance for use in meta-analysis. Research synthesis methods, 8(2), 161-180. doi:10.1002/jrsm.1218
Rosenthal, R., & Rubin, D. B. (1982). A simple, general purpose display of magnitude of experimental effect. Journal of educational psychology, 74(2), 166.
Sánchez-Meca, J., Marín-Martínez, F., & Chacón-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. Psychological methods, 8(4), 448.
Other convert between effect sizes:
diff_to_cles
,
eta2_to_f2()
,
odds_to_probs()
,
oddsratio_to_riskratio()
,
w_to_fei()
r_to_d(0.5)
d_to_oddsratio(1.154701)
oddsratio_to_r(8.120534)

d_to_r(1)
r_to_oddsratio(0.4472136, log = TRUE)
oddsratio_to_d(1.813799, log = TRUE)
Convert Standardized Differences to Common Language Effect Sizes
d_to_p_superiority(d)
rb_to_p_superiority(rb)
rb_to_vda(rb)
d_to_u2(d)
d_to_u1(d)
d_to_u3(d)
d_to_overlap(d)
rb_to_wmw_odds(rb)
d , rb
|
A numeric vector of Cohen's d / rank-biserial correlation or
the output from |
These functions use the following formulae for Cohen's d:

Pr(superiority) = Φ(d / √2)

Cohen's U2 = Φ(|d| / 2)    Cohen's U1 = (2 × U2 − 1) / U2

Cohen's U3 = Φ(d)

Overlap (OVL) = 2 × Φ(−|d| / 2)

And the following for the rank-biserial correlation:

Pr(superiority) = (rb + 1) / 2

WMW Odds = Pr(superiority) / (1 − Pr(superiority))
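A quick numeric check of these definitions (reusing the d = 1.48 value from the cohens_d() examples; rb is illustrative):

d <- 1.48
pnorm(d / sqrt(2))     # Pr(superiority)
pnorm(d)               # Cohen's U3; cf. d_to_u3(1.48)
2 * pnorm(-abs(d) / 2) # Overlap (OVL)

rb <- 0.6
(rb + 1) / 2           # non-parametric Pr(superiority) / Vargha-Delaney A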
A list of Cohen's U3
, Overlap
, Pr(superiority)
, a
numeric vector of Pr(superiority)
, or a data frame, depending
on the input.
For d, these calculations assume that the populations have equal variance and are normally distributed.
Vargha and Delaney's A is an alias for the non-parametric probability of superiority.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Routledge.
Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. Journal of the Royal Statistical Society, 48(3), 413-418.
Ruscio, J. (2008). A probability-based measure of effect size: robustness to base rates and other factors. Psychological methods, 13(1), 19–30.
cohens_u3()
for descriptions of the effect sizes (also,
cohens_d()
, rank_biserial()
).
Other convert between effect sizes:
d_to_r()
,
eta2_to_f2()
,
odds_to_probs()
,
oddsratio_to_riskratio()
,
w_to_fei()
effectsize API
Read the Support functions for model extensions vignette.
.es_aov_simple(aov_table, type = c("eta", "omega", "epsilon"), partial = TRUE,
  generalized = FALSE, include_intercept = FALSE, ci = 0.95,
  alternative = "greater", verbose = TRUE)

.es_aov_strata(aov_table, DV_names, type = c("eta", "omega", "epsilon"),
  partial = TRUE, generalized = FALSE, include_intercept = FALSE, ci = 0.95,
  alternative = "greater", verbose = TRUE)

.es_aov_table(aov_table, type = c("eta", "omega", "epsilon"), partial = TRUE,
  generalized = FALSE, include_intercept = FALSE, ci = 0.95,
  alternative = "greater", verbose = TRUE)
aov_table |
Input data frame |
type |
Which effect size to compute? |
partial , generalized , ci , alternative , verbose
|
See |
include_intercept |
Should the intercept ( |
DV_names |
A character vector with the names of all the predictors,
including the grouping variable (e.g., |
More information regarding Confidence (Compatibility) Intervals and how they are computed in effectsize.
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or χ² distribution that places the observed
t, F, or χ² test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
4.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - α)% confidence
interval contains all of the parameter values for which p > α
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen α level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
Some effect sizes are directionless: they do have a minimum value that would
be interpreted as "no effect", but they cannot cross it. For example, a null
value of Kendall's W is 0, indicating no difference between
groups, but it can never have a negative value. The same goes for
U2 and Overlap: the null value of U2 is
0.5, but it can never be smaller than 0.5; an Overlap of 1 means "full
overlap" (no difference), but it cannot be larger than 1.
When bootstrapping CIs for such effect sizes, the bounds of the CIs will
never cross (and often will never cover) the null. Therefore, these CIs
should not be used for statistical inference.
Typically, CIs are constructed as two-tailed intervals, with an equal
proportion of the cumulative probability distribution above and below the
interval. CIs can also be constructed as one-sided intervals,
giving only a lower bound or upper bound. This is analogous to computing a
1-tailed p value or conducting a 1-tailed hypothesis test.
Significance tests conducted using CIs (whether a value is inside the interval)
and using p values (whether p < alpha for that value) are only guaranteed
to agree when both are constructed using the same number of sides/tails.
Most effect sizes are not bounded by zero (e.g., r, d, g), and as such
are generally tested using 2-tailed tests and 2-sided CIs.
Some effect sizes are strictly positive: they have a minimum value of 0.
For example, η², ω², ε², and other variance-accounted-for effect
sizes, as well as Cramer's V and multiple R, range from 0 to 1. These
typically involve F- or χ²-statistics and are generally tested
using 1-tailed tests which test whether the estimated effect size is
larger than the hypothesized null value (e.g., 0). In order for a CI to
yield the same significance decision it must then be a 1-sided CI,
estimating only a lower bound. This is the default CI computed by
effectsize for these effect sizes, where
alternative = "greater"
is set.
This lower bound interval indicates the smallest effect size that is not
significantly different from the observed effect size. That is, it is the
minimum effect size compatible with the observed data, background model
assumptions, and α level. This type of interval does not indicate
a maximum effect size value; anything up to the maximum possible value of the
effect size (e.g., 1) is in the interval.
One-sided CIs can also be used to test against a maximum effect size value
(e.g., is r significantly smaller than a perfect correlation of 1.0?)
by setting
alternative = "less"
. This estimates a CI with only an
upper bound; anything from the minimum possible value of the effect size
(e.g., 0) up to this upper bound is in the interval.
We can also obtain a 2-sided interval by setting alternative = "two.sided"
.
These intervals can be interpreted in the same way as other 2-sided
intervals, such as those for r, d, or g.
An alternative approach to aligning significance tests using CIs and 1-tailed
p values that can often be found in the literature is to construct a
2-sided CI at a lower confidence level (e.g., 100(1 − 2α)% = 100 −
2 × 5% = 90%). This estimates the lower bound and upper bound for the above
1-sided intervals simultaneously. These intervals are commonly reported when
conducting equivalence tests. For example, a 90% 2-sided interval gives
the bounds for an equivalence test with α = .05. However, be aware
that this interval does not give 95% coverage for the underlying effect size
parameter value. For that, construct a 95% 2-sided CI.
data("hardlyworking") fit <- lm(salary ~ n_comps, data = hardlyworking) eta_squared(fit) # default, ci = 0.95, alternative = "greater" #> For one-way between subjects designs, partial eta squared is equivalent #> to eta squared. Returning eta squared. #> # Effect Size for ANOVA #> #> Parameter | Eta2 | 95% CI #> ------------------------------- #> n_comps | 0.19 | [0.14, 1.00] #> #> - One-sided CIs: upper bound fixed at [1.00].
eta_squared(fit, alternative = "less") # Test is eta is smaller than some value #> For one-way between subjects designs, partial eta squared is equivalent #> to eta squared. Returning eta squared. #> # Effect Size for ANOVA #> #> Parameter | Eta2 | 95% CI #> ------------------------------- #> n_comps | 0.19 | [0.00, 0.24] #> #> - One-sided CIs: lower bound fixed at [0.00].
eta_squared(fit, alternative = "two.sided") # 2-sided bounds for alpha = .05 #> For one-way between subjects designs, partial eta squared is equivalent #> to eta squared. Returning eta squared. #> # Effect Size for ANOVA #> #> Parameter | Eta2 | 95% CI #> ------------------------------- #> n_comps | 0.19 | [0.14, 0.25]
eta_squared(fit, ci = 0.9, alternative = "two.sided") # both 1-sided bounds for alpha = .05 #> For one-way between subjects designs, partial eta squared is equivalent #> to eta squared. Returning eta squared. #> # Effect Size for ANOVA #> #> Parameter | Eta2 | 90% CI #> ------------------------------- #> n_comps | 0.19 | [0.14, 0.24]
For very large sample sizes or effect sizes, the width of the CI can be smaller than the tolerance of the optimizer, resulting in CIs of width 0. This can also result in the estimated CIs excluding the point estimate.
In these cases, consider an alternative method for computing CIs, such as the bootstrap.
Bauer, P., & Kieser, M. (1996). A unifying approach for confidence intervals and testing of equivalence and difference. Biometrika, 83(4), 934–937. doi:10.1093/biomet/83.4.934
Rafi, Z., & Greenland, S. (2020). Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, 20(1), Article 244. doi:10.1186/s12874-020-01105-9
Schweder, T., & Hjort, N. L. (2016). Confidence, likelihood, probability: Statistical inference with confidence distributions. Cambridge University Press. doi:10.1017/CBO9781139046671
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9(2), 164–182. doi:10.1037/1082-989x.9.2.164
Xie, M., & Singh, K. (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81(1), 3–39. doi:10.1111/insr.12000
Deprecated / Defunct Functions
convert_odds_to_probs(...)
convert_probs_to_odds(...)
convert_d_to_r(...)
convert_r_to_d(...)
convert_oddsratio_to_d(...)
convert_d_to_oddsratio(...)
convert_oddsratio_to_r(...)
convert_r_to_oddsratio(...)
interpret_d(...)
interpret_g(...)
interpret_delta(...)
interpret_parameters(...)
normalized_chi(...)
chisq_to_normalized(...)
convert_d_to_common_language(...)
d_to_common_language(...)
convert_rb_to_common_language(...)
rb_to_common_language(...)
common_language(...)
... |
Arguments to the deprecated function. |
effectsize options
Currently, the following global options are supported:
es.use_symbols
logical: Should proper symbols be printed (TRUE
) instead of transliterated effect size names (FALSE
; default).
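For example (illustrative; the option only affects how labels are printed):

options(es.use_symbols = TRUE)
eta_squared(aov(mpg ~ factor(cyl), data = mtcars)) # label printed as a symbol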
This function tries to return the best effect-size measure for the provided input model. See details.
## S3 method for class 'BFBayesFactor'
effectsize(model, type = NULL, ci = 0.95, test = NULL, verbose = TRUE, ...)

effectsize(model, ...)

## S3 method for class 'aov'
effectsize(model, type = NULL, ...)

## S3 method for class 'htest'
effectsize(model, type = NULL, verbose = TRUE, ...)
model |
An object of class |
type |
The effect size of interest. See details. |
ci |
Value or vector of probability of the CI (between 0 and 1)
to be estimated. Default to |
test |
The indices of effect existence to compute. Character (vector) or
list with one or more of these options: |
verbose |
Toggle off warnings. |
... |
Arguments passed to or from other methods. See details. |
For an object of class htest
, data is extracted via insight::get_data()
, and passed to the relevant function according to:
A t-test depending on type
: "cohens_d"
(default), "hedges_g"
, or one of "p_superiority"
, "u1"
, "u2"
, "u3"
, "overlap"
.
For a Paired t-test: depending on type
: "rm_rm"
, "rm_av"
, "rm_b"
, "rm_d"
, "rm_z"
.
A Chi-squared test of independence or Fisher's Exact Test, depending on type
: "cramers_v"
(default), "tschuprows_t"
, "phi"
, "cohens_w"
, "pearsons_c"
, "cohens_h"
, "oddsratio"
, "riskratio"
, "arr"
, or "nnt"
.
A Chi-squared test of goodness-of-fit, depending on type:
"fei" (default), "cohens_w", or "pearsons_c".
A One-way ANOVA test, depending on type
: "eta"
(default), "omega"
or "epsilon"
-squared, "f"
, or "f2"
.
A McNemar test returns Cohen's g.
A Wilcoxon test depending on type
: returns "rank_biserial
" correlation (default) or one of "p_superiority"
, "vda"
, "u2"
, "u3"
, "overlap"
.
A Kruskal-Wallis test depending on type
: "epsilon"
(default) or "eta"
.
A Friedman test returns Kendall's W.
(Where applicable, ci
and alternative
are taken from the htest
if not otherwise provided.)
For an object of class BFBayesFactor
, using bayestestR::describe_posterior()
,
A t-test depending on type
: "cohens_d"
(default) or one of "p_superiority"
, "u1"
, "u2"
, "u3"
, "overlap"
.
A correlation test returns r.
A contingency table test, depending on type
: "cramers_v"
(default), "phi"
, "tschuprows_t"
, "cohens_w"
, "pearsons_c"
, "cohens_h"
, "oddsratio"
, or "riskratio"
, "arr"
, or "nnt"
.
A proportion test returns p.
Objects of class anova
, aov
, aovlist
or afex_aov
, depending on type
: "eta"
(default), "omega"
or "epsilon"
-squared, "f"
, or "f2"
.
Other objects are passed to parameters::standardize_parameters()
.
For statistical models it is recommended to directly use the listed functions, for the full range of options they provide.
A data frame with the effect size (depending on input) and its
CIs (CI_low
and CI_high
).
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
vignette(package = "effectsize")
## Hypothesis Testing
## ------------------
data("Music_preferences")
Xsq <- chisq.test(Music_preferences)
effectsize(Xsq)
effectsize(Xsq, type = "cohens_w")

Tt <- t.test(1:10, y = c(7:20), alternative = "less")
effectsize(Tt)

Tt <- t.test(
  x = c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30),
  y = c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29),
  paired = TRUE
)
effectsize(Tt, type = "rm_b")

Aov <- oneway.test(extra ~ group, data = sleep, var.equal = TRUE)
effectsize(Aov)
effectsize(Aov, type = "omega")

Wt <- wilcox.test(1:10, 7:20, mu = -3, alternative = "less", exact = FALSE)
effectsize(Wt)
effectsize(Wt, type = "u2")

## Models and Anova Tables
## -----------------------
fit <- lm(mpg ~ factor(cyl) * wt + hp, data = mtcars)
effectsize(fit, method = "basic")

anova_table <- anova(fit)
effectsize(anova_table)
effectsize(anova_table, type = "epsilon")

## Bayesian Hypothesis Testing
## ---------------------------
bf_prop <- BayesFactor::proportionBF(3, 7, p = 0.3)
effectsize(bf_prop)

bf_corr <- BayesFactor::correlationBF(attitude$rating, attitude$complaints)
effectsize(bf_corr)

data(RCT_table)
bf_xtab <- BayesFactor::contingencyTableBF(RCT_table, sampleType = "poisson", fixedMargin = "cols")
effectsize(bf_xtab)
effectsize(bf_xtab, type = "oddsratio")
effectsize(bf_xtab, type = "arr")

bf_ttest <- BayesFactor::ttestBF(sleep$extra[sleep$group == 1],
  sleep$extra[sleep$group == 2],
  paired = TRUE, mu = -1
)
effectsize(bf_ttest)
Perform a Test for Practical Equivalence for indices of effect size.
## S3 method for class 'effectsize_table'
equivalence_test(x, range = "default", rule = c("classic", "cet", "bayes"), ...)
x |
An effect size table, such as returned by |
range |
The range of practical equivalence of an effect. For one-sided
CIs, a single value can be provided for the lower / upper bound to test
against (but see more details below). For two-sided CIs, a single value is
duplicated to c(-range, range). |
rule |
How should acceptance and rejection be decided? See details. |
... |
Arguments passed to or from other methods. |
The CIs used in the equivalence test are the ones in the provided effect size
table. For results equivalent (ha!) to those that can be obtained using the
TOST approach (e.g., Lakens, 2017), appropriate CIs should be extracted using
the function used to make the effect size table (cohens_d
, eta_squared
,
F_to_r
, etc), with alternative = "two.sided"
. See examples.
"classic"
- the classic method:
If the CI is completely within the ROPE - Accept H0
Else, if the CI does not contain 0 - Reject H0
Else - Undecided
"cet"
- conditional equivalence testing:
If the CI does not contain 0 - Reject H0
Else, If the CI is completely within the ROPE - Accept H0
Else - Undecided
"bayes"
- The Bayesian approach, as put forth by Kruschke:
If the CI is completely outside the ROPE - Reject H0
Else, If the CI is completely within the ROPE - Accept H0
Else - Undecided
A data frame with the results of the equivalence test.
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Campbell, H., & Gustafson, P. (2018). Conditional equivalence testing: An alternative remedy for publication bias. PLOS ONE, 13(4), e0195145. doi:10.1371/journal.pone.0195145
Kruschke, J. K. (2014). Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan. Academic Press
Kruschke, J. K. (2018). Rejecting or accepting parameter values in Bayesian estimation. Advances in Methods and Practices in Psychological Science, 1(2), 270-280. doi:10.1177/2515245918771304
Lakens, D. (2017). Equivalence Tests: A Practical Primer for t Tests, Correlations, and Meta-Analyses. Social Psychological and Personality Science, 8(4), 355–362. doi:10.1177/1948550617697177
For more details, see bayestestR::equivalence_test()
.
data("hardlyworking") model <- aov(salary ~ age + factor(n_comps) * cut(seniority, 3), data = hardlyworking) es <- eta_squared(model, ci = 0.9, alternative = "two.sided") equivalence_test(es, range = c(0, 0.15)) # TOST data("RCT_table") OR <- oddsratio(RCT_table, alternative = "greater") equivalence_test(OR, range = c(0, 1)) ds <- t_to_d( t = c(0.45, -0.65, 7, -2.2, 2.25), df_error = c(675, 525, 2000, 900, 1875), ci = 0.9, alternative = "two.sided" # TOST ) # Can also plot if (require(see)) plot(equivalence_test(ds, range = 0.2)) if (require(see)) plot(equivalence_test(ds, range = 0.2, rule = "cet")) if (require(see)) plot(equivalence_test(ds, range = 0.2, rule = "bayes"))
data("hardlyworking") model <- aov(salary ~ age + factor(n_comps) * cut(seniority, 3), data = hardlyworking) es <- eta_squared(model, ci = 0.9, alternative = "two.sided") equivalence_test(es, range = c(0, 0.15)) # TOST data("RCT_table") OR <- oddsratio(RCT_table, alternative = "greater") equivalence_test(OR, range = c(0, 1)) ds <- t_to_d( t = c(0.45, -0.65, 7, -2.2, 2.25), df_error = c(675, 525, 2000, 900, 1875), ci = 0.9, alternative = "two.sided" # TOST ) # Can also plot if (require(see)) plot(equivalence_test(ds, range = 0.2)) if (require(see)) plot(equivalence_test(ds, range = 0.2, rule = "cet")) if (require(see)) plot(equivalence_test(ds, range = 0.2, rule = "bayes"))
η² and Other Effect Sizes for ANOVA
Functions to compute effect size measures for ANOVAs, such as Eta-squared
(η²), Omega-squared (ω²) and Epsilon-squared (ε²),
and Cohen's f (or their partialled versions) for ANOVA tables. These indices
represent an estimate of how much variance in the response variables is
accounted for by the explanatory variable(s).
When passing models, effect sizes are computed using the sums of squares
obtained from anova(model)
which might not always be appropriate. See
details.
eta_squared(model, partial = TRUE, generalized = FALSE, ci = 0.95,
  alternative = "greater", verbose = TRUE, ...)

omega_squared(model, partial = TRUE, ci = 0.95, alternative = "greater",
  verbose = TRUE, ...)

epsilon_squared(model, partial = TRUE, ci = 0.95, alternative = "greater",
  verbose = TRUE, ...)

cohens_f(model, partial = TRUE, generalized = FALSE, squared = FALSE,
  method = c("eta", "omega", "epsilon"), model2 = NULL, ci = 0.95,
  alternative = "greater", verbose = TRUE, ...)

cohens_f_squared(model, partial = TRUE, generalized = FALSE, squared = TRUE,
  method = c("eta", "omega", "epsilon"), model2 = NULL, ci = 0.95,
  alternative = "greater", verbose = TRUE, ...)

eta_squared_posterior(model, partial = TRUE, generalized = FALSE,
  ss_function = stats::anova, draws = 500, verbose = TRUE, ...)
model |
An ANOVA table (or an ANOVA-like table, e.g., outputs from
|
partial |
If |
generalized |
A character vector of observed (non-manipulated) variables
to be used in the estimation of a generalized Eta Squared. Can also be
|
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods.
|
squared |
Return Cohen's f or Cohen's f-squared? |
method |
What effect size should be used as the basis for Cohen's f? |
model2 |
Optional second model for Cohen's f (/squared). If specified, returns the effect size for R-squared-change between the two models. |
ss_function |
For Bayesian models, the function used to extract
sum-of-squares. Uses |
draws |
For Bayesian models, an integer indicating the number of draws from the posterior predictive distribution to return. Larger numbers take longer to run, but provide estimates that are more stable. |
For aov
(or lm
), aovlist
and afex_aov
models, and for anova
objects
that provide Sums-of-Squares, the effect sizes are computed directly using
Sums-of-Squares. (For maov
(or mlm
) models, effect sizes are computed for
each response separately.)
For other ANOVA tables and models (converted to ANOVA-like tables via
anova()
methods), effect sizes are approximated via test statistic
conversion of the omnibus F statistic (see F_to_eta2()
for more details).
When model
is a statistical model, the sums of squares (or F statistics)
used for the computation of the effect sizes are based on those returned by
anova(model)
. Different models have different default output type. For
example, for aov
and aovlist
these are type-1 sums of squares, but for
lmerMod
(and lmerModLmerTest
) these are type-3 sums of squares. Make
sure these are the sums of squares you are interested in. You might want to
convert your model to an ANOVA(-like) table yourself and then pass the result
to eta_squared()
. See examples below for use of car::Anova()
and the
afex
package.
For type 3 sum of squares, it is generally recommended to fit models with
orthogonal factor weights (e.g., contr.sum
) and centered covariates,
for sensible results. See examples and the afex
package.
Both Omega and Epsilon are unbiased estimators of the
population's Eta, which is especially important in small samples. But
which to choose?
Though Omega is the more popular choice (Albers and Lakens, 2018), Epsilon is
analogous to adjusted R2 (Allen, 2017, p. 382), and has been found to be less
biased (Carroll & Nordholm, 1975).
Cohen's f can take on values between zero, when the population means are all
equal, and an indefinitely large number as standard deviation of means
increases relative to the average standard deviation within each group.
When comparing two models in a sequential regression analysis, Cohen's f for
R-square change is the ratio between the increase in R-square
and the percent of unexplained variance.
Cohen has suggested that the values of 0.10, 0.25, and 0.40 represent small,
medium, and large effect sizes, respectively.
For Bayesian models (fit with brms
or rstanarm
),
eta_squared_posterior()
simulates data from the posterior predictive
distribution (ppd) and for each simulation the Eta Squared is computed for
the model's fixed effects. This means that the returned values are the
population level effect size as implied by the posterior model (and not the
effect size in the sample data). See rstantools::posterior_predict()
for
more info.
A data frame with the effect size(s) between 0-1 (Eta2
, Epsilon2
,
Omega2
, Cohens_f
or Cohens_f2
, possibly with the partial
or
generalized
suffix), and their CIs (CI_low
and CI_high
).
For eta_squared_posterior()
, a data frame containing the ppd of the Eta
squared for each fixed effect, which can then be passed to
bayestestR::describe_posterior()
for summary stats.
A data frame containing the effect size values and their confidence intervals.
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or χ² distribution that places the observed
t, F, or χ² test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
4.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100(1 - α)% confidence
interval contains all of the parameter values for which p > α
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen α level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Albers, C., and Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195.
Allen, R. (2017). Statistics and Experimental Design for Psychologists: A Model Comparison Approach. World Scientific Publishing Company.
Carroll, R. M., & Nordholm, L. A. (1975). Sampling Characteristics of Kelley's epsilon and Hays' omega. Educational and Psychological Measurement, 35(3), 541-554.
Kelley, T. (1935) An unbiased correlation ratio measure. Proceedings of the National Academy of Sciences. 21(9). 554-559.
Olejnik, S., & Algina, J. (2003). Generalized eta and omega squared statistics: measures of effect size for some common research designs. Psychological methods, 8(4), 434.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182.
Other effect sizes for ANOVAs:
rank_epsilon_squared()
data(mtcars)
mtcars$am_f <- factor(mtcars$am)
mtcars$cyl_f <- factor(mtcars$cyl)

model <- aov(mpg ~ am_f * cyl_f, data = mtcars)
(eta2 <- eta_squared(model))

# More types:
eta_squared(model, partial = FALSE)
eta_squared(model, generalized = "cyl_f")
omega_squared(model)
epsilon_squared(model)
cohens_f(model)

model0 <- aov(mpg ~ am_f + cyl_f, data = mtcars) # no interaction
cohens_f_squared(model0, model2 = model)

## Interpretation of effect sizes
## ------------------------------
interpret_omega_squared(0.10, rules = "field2013")
interpret_eta_squared(0.10, rules = "cohen1992")
interpret_epsilon_squared(0.10, rules = "cohen1992")
interpret(eta2, rules = "cohen1992")

plot(eta2) # Requires the {see} package

# Recommended: Type-2 or -3 effect sizes + effects coding
# -------------------------------------------------------
contrasts(mtcars$am_f) <- contr.sum
contrasts(mtcars$cyl_f) <- contr.sum

model <- aov(mpg ~ am_f * cyl_f, data = mtcars)
model_anova <- car::Anova(model, type = 3)
epsilon_squared(model_anova)

# afex takes care of both type-3 effects and effects coding:
data(obk.long, package = "afex")
model <- afex::aov_car(value ~ gender + Error(id / (phase * hour)),
  data = obk.long, observed = "gender"
)
omega_squared(model)
eta_squared(model, generalized = TRUE) # observed vars are pulled from the afex model.

## Approx. effect sizes for mixed models
## -------------------------------------
model <- lme4::lmer(mpg ~ am_f * cyl_f + (1 | vs), data = mtcars)
omega_squared(model)

## Bayesian Models (PPD)
## ---------------------
fit_bayes <- rstanarm::stan_glm(
  mpg ~ factor(cyl) * wt + qsec,
  data = mtcars, family = gaussian(), refresh = 0
)
es <- eta_squared_posterior(fit_bayes,
  verbose = FALSE,
  ss_function = car::Anova, type = 3
)
bayestestR::describe_posterior(es, test = NULL)

# compare to:
fit_freq <- lm(mpg ~ factor(cyl) * wt + qsec, data = mtcars)
aov_table <- car::Anova(fit_freq, type = 3)
eta_squared(aov_table)
Convert Between ANOVA Effect Sizes
eta2_to_f2(es) eta2_to_f(es) f2_to_eta2(f2) f_to_eta2(f)
es |
Any measure of variance explained such as Eta-, Epsilon-, Omega-, or R-Squared, partial or otherwise. See details. |
f , f2
|
Cohen's f or f-squared. |
Any measure of variance explained can be converted to a corresponding Cohen's f via:

$$f^2 = \frac{\eta^2}{1 - \eta^2} \qquad \eta^2 = \frac{f^2}{1 + f^2}$$

If a partial Eta-squared is used, the resulting Cohen's f is a partial Cohen's f; if a less biased estimate of variance explained is used (such as Epsilon- or Omega-squared), the resulting Cohen's f is likewise a less biased estimate of Cohen's f.
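As a quick illustrative round-trip (arbitrary values; a minimal sketch):

# Eta-squared of .16 corresponds to f2 = .16 / (1 - .16)
eta2_to_f2(0.16)
eta2_to_f(0.16) # = sqrt(eta2_to_f2(0.16))

# And back again:
f2_to_eta2(eta2_to_f2(0.16))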
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182.
eta_squared()
for more details.
Other convert between effect sizes:
d_to_r()
,
diff_to_cles
,
odds_to_probs()
,
oddsratio_to_riskratio()
,
w_to_fei()
Convert F and t Statistics to partial-$\eta^2$ and Other ANOVA Effect Sizes

These functions are convenience functions to convert F and t test statistics to partial Eta-squared ($\eta^2_p$), Omega-squared ($\omega^2_p$), and Epsilon-squared ($\epsilon^2_p$; an alias for the adjusted Eta-squared) and Cohen's f. These are useful in cases where the various Sums of Squares and Mean Squares are not easily available or their computation is not straightforward (e.g., in linear mixed models, contrasts, etc.). For test statistics derived from lm and aov models, these functions give exact results. For all other cases, they return close approximations.
See Effect Size from Test Statistics vignette.
F_to_eta2(f, df, df_error, ci = 0.95, alternative = "greater", ...) t_to_eta2(t, df_error, ci = 0.95, alternative = "greater", ...) F_to_epsilon2(f, df, df_error, ci = 0.95, alternative = "greater", ...) t_to_epsilon2(t, df_error, ci = 0.95, alternative = "greater", ...) F_to_eta2_adj(f, df, df_error, ci = 0.95, alternative = "greater", ...) t_to_eta2_adj(t, df_error, ci = 0.95, alternative = "greater", ...) F_to_omega2(f, df, df_error, ci = 0.95, alternative = "greater", ...) t_to_omega2(t, df_error, ci = 0.95, alternative = "greater", ...) F_to_f( f, df, df_error, squared = FALSE, ci = 0.95, alternative = "greater", ... ) t_to_f(t, df_error, squared = FALSE, ci = 0.95, alternative = "greater", ...) F_to_f2( f, df, df_error, squared = TRUE, ci = 0.95, alternative = "greater", ... ) t_to_f2(t, df_error, squared = TRUE, ci = 0.95, alternative = "greater", ...)
df , df_error
|
Degrees of freedom of numerator or of the error estimate (i.e., the residuals). |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
... |
Arguments passed to or from other methods. |
t , f
|
The t or the F statistics. |
squared |
Return Cohen's f or Cohen's f-squared? |
These functions use the following formulae:

$$\eta^2_p = \frac{F \times df_{num}}{F \times df_{num} + df_{den}}$$

$$\epsilon^2_p = \frac{(F - 1) \times df_{num}}{F \times df_{num} + df_{den}}$$

$$\omega^2_p = \frac{(F - 1) \times df_{num}}{F \times df_{num} + df_{den} + 1}$$

$$f^2_p = \frac{\eta^2_p}{1 - \eta^2_p}$$

For t, the conversion is based on the equality of $t^2 = F$ when $df_{num} = 1$.
Both Omega and Epsilon are unbiased estimators of the population Eta. But which to choose? Though Omega is the more popular choice, it should be noted that:
The formula given above for Omega is only an approximation for complex designs.
Epsilon has been found to be less biased (Carroll & Nordholm, 1975).
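For a quick numerical sense of these differences, a minimal sketch with an arbitrary F test:

# Same F statistic, three estimators -- Epsilon and Omega shrink
# the estimate relative to Eta:
F_to_eta2(f = 4, df = 3, df_error = 30)
F_to_epsilon2(f = 4, df = 3, df_error = 30)
F_to_omega2(f = 4, df = 3, df_error = 30)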
A data frame with the effect size(s) between 0-1 (Eta2_partial
,
Epsilon2_partial
, Omega2_partial
, Cohens_f_partial
or
Cohens_f2_partial
), and their CIs (CI_low
and CI_high
).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$ distribution that places the observed
t, F, or $\chi^2$ test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
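A hand-rolled sketch of this pivot idea for the t example above (for illustration only; effectsize performs this internally, with more care):

t_obs <- 2.0
df <- 50

# Lower bound: the ncp for which the observed t is the .975 quantile
ncp_low <- uniroot(function(ncp) pt(t_obs, df, ncp) - 0.975, c(-10, 10))$root

# Upper bound: the ncp for which the observed t is the .025 quantile
ncp_high <- uniroot(function(ncp) pt(t_obs, df, ncp) - 0.025, c(-10, 10))$root

c(ncp_low, ncp_high) # these bounds are then mapped to the effect size scale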
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which $p > \alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Adjusted (partial) Eta-squared is an alias for (partial) Epsilon-squared.
Albers, C., & Lakens, D. (2018). When power analyses based on pilot data are biased: Inaccurate effect size estimators and follow-up bias. Journal of experimental social psychology, 74, 187-195. doi:10.31234/osf.io/b7z4q
Carroll, R. M., & Nordholm, L. A. (1975). Sampling Characteristics of Kelley's epsilon and Hays' omega. Educational and Psychological Measurement, 35(3), 541-554.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532-574.
Friedman, H. (1982). Simplified determinations of statistical power, magnitude of effect and research sample sizes. Educational and Psychological Measurement, 42(2), 521-526. doi:10.1177/001316448204200214
Mordkoff, J. T. (2019). A Simple Method for Removing Bias From a Popular Measure of Standardized Effect Size: Adjusted Partial Eta Squared. Advances in Methods and Practices in Psychological Science, 2(3), 228-232. doi:10.1177/2515245919855053
Morey, R. D., Hoekstra, R., Rouder, J. N., Lee, M. D., & Wagenmakers, E. J. (2016). The fallacy of placing confidence in confidence intervals. Psychonomic bulletin & review, 23(1), 103-123.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182.
eta_squared()
for more details.
Other effect size from test statistic:
chisq_to_phi()
,
t_to_d()
mod <- aov(mpg ~ factor(cyl) * factor(am), mtcars) anova(mod) (etas <- F_to_eta2( f = c(44.85, 3.99, 1.38), df = c(2, 1, 2), df_error = 26 )) if (require(see)) plot(etas) # Compare to: eta_squared(mod) fit <- lmerTest::lmer(extra ~ group + (1 | ID), sleep) # anova(fit) # #> Type III Analysis of Variance Table with Satterthwaite's method # #> Sum Sq Mean Sq NumDF DenDF F value Pr(>F) # #> group 12.482 12.482 1 9 16.501 0.002833 ** # #> --- # #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 F_to_eta2(16.501, 1, 9) F_to_omega2(16.501, 1, 9) F_to_epsilon2(16.501, 1, 9) F_to_f(16.501, 1, 9) ## Use with emmeans based contrasts ## -------------------------------- warp.lm <- lm(breaks ~ wool * tension, data = warpbreaks) jt <- emmeans::joint_tests(warp.lm, by = "wool") F_to_eta2(jt$F.ratio, jt$df1, jt$df2)
Fictional data.
A 2-by-3 table.
data("food_class") food_class #> Soy Milk Meat #> Vegan 47 0 0 #> Not-Vegan 0 12 21
Other effect size datasets:
Music_preferences
,
Music_preferences2
,
RCT_table
,
Smoking_FASD
,
hardlyworking
,
rouder2016
,
screening_test
Transform a standardized vector into character, e.g., c("-1 SD", "Mean", "+1 SD")
.
format_standardize( x, reference = x, robust = FALSE, digits = 1, protect_integers = TRUE, ... )
x |
A standardized numeric vector. |
reference |
The reference vector from which to compute the mean and SD. |
robust |
Logical, if |
digits |
Number of digits for rounding or significant figures. May also
be |
protect_integers |
Should integers be kept as integers (i.e., without decimals)? |
... |
Other arguments to pass to |
format_standardize(c(-1, 0, 1)) format_standardize(c(-1, 0, 1, 2), reference = rnorm(1000)) format_standardize(c(-1, 0, 1, 2), reference = rnorm(1000), robust = TRUE) format_standardize(standardize(mtcars$wt), digits = 1) format_standardize(standardize(mtcars$wt, robust = TRUE), digits = 1)
A sample (simulated) dataset, used in tests and some examples.
A data frame with 500 rows and 6 variables:
salary: Salary, in Shmekels
xtra_hours: Number of overtime hours (on average, per week)
n_comps: Number of compliments given to the boss (observed over the last week)
age: Age in years
seniority: How many years with the company
is_senior: Has this person been working here for more than 4 years?
data("hardlyworking") head(hardlyworking, n = 5) #> salary xtra_hours n_comps age seniority is_senior #> 1 19744.65 4.16 1 32 3 FALSE #> 2 11301.95 1.62 0 34 3 FALSE #> 3 20635.62 1.19 3 33 5 TRUE #> 4 23047.16 7.19 1 35 3 FALSE #> 5 27342.15 11.26 0 33 4 FALSE
Other effect size datasets:
Music_preferences
,
Music_preferences2
,
RCT_table
,
Smoking_FASD
,
food_class
,
rouder2016
,
screening_test
Interpret a value based on a set of rules. See rules()
.
interpret(x, ...) ## S3 method for class 'numeric' interpret(x, rules, name = attr(rules, "rule_name"), transform = NULL, ...) ## S3 method for class 'effectsize_table' interpret(x, rules, transform = NULL, ...)
x |
Vector of value break points (edges defining categories), or a data
frame of class effectsize_table. |
... |
Currently not used. |
rules |
Set of |
name |
Name of the set of rules (will be printed). |
transform |
a function (or name of a function) to apply to |
For numeric input: A character vector of interpretations.
For data frames: the x
input with an additional Interpretation
column.
rules_grid <- rules(c(0.01, 0.05), c("very significant", "significant", "not significant")) interpret(0.001, rules_grid) interpret(0.021, rules_grid) interpret(0.08, rules_grid) interpret(c(0.01, 0.005, 0.08), rules_grid) interpret(c(0.35, 0.15), c("small" = 0.2, "large" = 0.4), name = "Cohen's Rules") interpret(c(0.35, 0.15), rules(c(0.2, 0.4), c("small", "medium", "large"))) bigness <- rules(c(1, 10), c("small", "medium", "big")) interpret(abs(-5), bigness) interpret(-5, bigness, transform = abs) # ---------- d <- cohens_d(mpg ~ am, data = mtcars) interpret(d, rules = "cohen1988") d <- glass_delta(mpg ~ am, data = mtcars) interpret(d, rules = "gignac2016") interpret(d, rules = rules(1, c("tiny", "yeah okay"))) m <- lm(formula = wt ~ am * cyl, data = mtcars) eta2 <- eta_squared(m) interpret(eta2, rules = "field2013") X <- chisq.test(mtcars$am, mtcars$cyl == 8) interpret(oddsratio(X), rules = "chen2010") interpret(cramers_v(X), "lovakov2021")
Interpret Bayes Factor (BF)
interpret_bf( bf, rules = "jeffreys1961", log = FALSE, include_value = FALSE, protect_ratio = TRUE, exact = TRUE )
bf |
Value or vector of Bayes factor (BF) values. |
rules |
Can be |
log |
Is the |
include_value |
Include the value in the output. |
protect_ratio |
Should values smaller than 1 be represented as ratios? |
exact |
Should very large or very small values be reported with a scientific format (e.g., 4.24e5), or as truncated values (as "> 1000" and "< 1/1000"). |
Argument names can be partially matched.
Rules apply to BF as ratios, so BF of 10 is as extreme as a BF of 0.1 (1/10).
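For example:

# BF = 10 and BF = 1/10 receive the same strength label,
# just for opposite hypotheses:
interpret_bf(10)
interpret_bf(1 / 10)

# Log Bayes factors can be passed with log = TRUE:
interpret_bf(log(10), log = TRUE)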
Jeffreys (1961) ("jeffreys1961"
; default)
BF = 1 - No evidence
1 < BF <= 3 - Anecdotal
3 < BF <= 10 - Moderate
10 < BF <= 30 - Strong
30 < BF <= 100 - Very strong
BF > 100 - Extreme.
Raftery (1995) ("raftery1995"
)
BF = 1 - No evidence
1 < BF <= 3 - Weak
3 < BF <= 20 - Positive
20 < BF <= 150 - Strong
BF > 150 - Very strong
Jeffreys, H. (1961), Theory of Probability, 3rd ed., Oxford University Press, Oxford.
Raftery, A. E. (1995). Bayesian model selection in social research. Sociological methodology, 25, 111-164.
Jarosz, A. F., & Wiley, J. (2014). What are the odds? A practical guide to computing and reporting Bayes factors. The Journal of Problem Solving, 7(1), 2.
interpret_bf(1) interpret_bf(c(5, 2, 0.01))
Interpretation of standardized differences using different sets of rules of thumb.
interpret_cohens_d(d, rules = "cohen1988", ...) interpret_hedges_g(g, rules = "cohen1988") interpret_glass_delta(delta, rules = "cohen1988")
d , g , delta
|
Value or vector of effect size values. |
rules |
Can be |
... |
Not directly used. |
Rules apply equally to positive and negative d (i.e., they are given as absolute values).
Cohen (1988) ("cohen1988"
; default)
d < 0.2 - Very small
0.2 <= d < 0.5 - Small
0.5 <= d < 0.8 - Medium
d >= 0.8 - Large
Sawilowsky (2009) ("sawilowsky2009"
)
d < 0.1 - Tiny
0.1 <= d < 0.2 - Very small
0.2 <= d < 0.5 - Small
0.5 <= d < 0.8 - Medium
0.8 <= d < 1.2 - Large
1.2 <= d < 2 - Very large
d >= 2 - Huge
Lovakov & Agadullina (2021) ("lovakov2021"
)
d < 0.15 - Very small
0.15 <= d < 0.36 - Small
0.36 <= d < 0.65 - Medium
d >= 0.65 - Large
Gignac & Szodorai (2016) ("gignac2016"
, based on the d_to_r()
conversion, see interpret_r()
)
d < 0.2 - Very small
0.2 <= d < 0.41 - Small
0.41 <= d < 0.63 - Moderate
d >= 0.63 - Large
Lovakov, A., & Agadullina, E. R. (2021). Empirically Derived Guidelines for Effect Size Interpretation in Social Psychology. European Journal of Social Psychology.
Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Personality and individual differences, 102, 74-78.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Sawilowsky, S. S. (2009). New effect size rules of thumb.
interpret_cohens_d(.02) interpret_cohens_d(c(.5, .02)) interpret_cohens_d(.3, rules = "lovakov2021")
Interpret Cohen's g
interpret_cohens_g(g, rules = "cohen1988", ...)
g |
Value or vector of effect size values. |
rules |
Can be |
... |
Not directly used. |
Rules apply equally to positive and negative g (i.e., they are given as absolute values).
Cohen (1988) ("cohen1988"
; default)
d < 0.05 - Very small
0.05 <= d < 0.15 - Small
0.15 <= d < 0.25 - Medium
d >= 0.25 - Large
"Since g is so transparently clear a unit, it is expected that workers in any given substantive area of the behavioral sciences will very frequently be able to set relevant [effect size] values without the proposed conventions, or set up conventions of their own which are suited to their area of inquiry." - Cohen, 1988, page 147.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
interpret_cohens_g(.02) interpret_cohens_g(c(.3, .15))
Interpret Direction
interpret_direction(x)
x |
Numeric value. |
interpret_direction(.02) interpret_direction(c(.5, -.02))
Interpretation of Bayesian diagnostic indices, such as Effective Sample Size (ESS) and Rhat.
interpret_ess(ess, rules = "burkner2017") interpret_rhat(rhat, rules = "vehtari2019")
ess |
Value or vector of Effective Sample Size (ESS) values. |
rules |
A character string (see Rules) or a custom set of |
rhat |
Value or vector of Rhat values. |
Bürkner, P. C. (2017) ("burkner2017"
; default)
ESS < 1000 - Insufficient
ESS >= 1000 - Sufficient
Vehtari et al. (2019) ("vehtari2019"
; default)
Rhat < 1.01 - Converged
Rhat >= 1.01 - Failed
Gelman & Rubin (1992) ("gelman1992"
)
Rhat < 1.1 - Converged
Rhat >= 1.1 - Failed
Bürkner, P. C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1-28.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple sequences. Statistical science, 7(4), 457-472.
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P. C. (2019). Rank-normalization, folding, and localization: An improved Rhat for assessing convergence of MCMC. arXiv preprint arXiv:1903.08008.
interpret_ess(1001) interpret_ess(c(852, 1200)) interpret_rhat(1.00) interpret_rhat(c(1.5, 0.9))
Interpretation of indices of fit found in confirmatory analysis or structural equation modelling, such as RMSEA, CFI, NFI, IFI, etc.
interpret_gfi(x, rules = "byrne1994") interpret_agfi(x, rules = "byrne1994") interpret_nfi(x, rules = "byrne1994") interpret_nnfi(x, rules = "byrne1994") interpret_cfi(x, rules = "byrne1994") interpret_rfi(x, rules = "default") interpret_ifi(x, rules = "default") interpret_pnfi(x, rules = "default") interpret_rmsea(x, rules = "byrne1994") interpret_srmr(x, rules = "byrne1994") ## S3 method for class 'lavaan' interpret(x, ...) ## S3 method for class 'performance_lavaan' interpret(x, ...)
x |
vector of values, or an object of class |
rules |
Can be the name of a set of rules (see below) or custom set of
|
... |
Currently not used. |
Chisq: The model Chi-squared assesses overall fit and the discrepancy between the sample and fitted covariance matrices. Its p-value should be > .05 (i.e., the hypothesis of a perfect fit cannot be rejected). However, it is quite sensitive to sample size.
GFI/AGFI: The (Adjusted) Goodness of Fit is the proportion of variance
accounted for by the estimated population covariance. Analogous to R2. The
GFI and the AGFI should be > .95 and > .90, respectively (Byrne, 1994;
"byrne1994"
).
NFI/NNFI/TLI: The (Non) Normed Fit Index. An NFI of 0.95 indicates that the
model of interest improves the fit by 95% relative to the null model. The
NNFI (also called the Tucker Lewis index; TLI) is preferable for smaller
samples. They should be > .90 (Byrne, 1994; "byrne1994") or > .95
(Schumacker & Lomax, 2004; "schumacker2004").
CFI: The Comparative Fit Index is a revised form of NFI. Not very
sensitive to sample size (Fan, Thompson, & Wang, 1999). Compares the fit of a
target model to the fit of an independent, or null, model. It should be > .96
(Hu & Bentler, 1999; "hu&bentler1999"
) or .90 (Byrne, 1994; "byrne1994"
).
RFI: the Relative Fit Index, also known as RHO1, is not guaranteed to vary from 0 to 1. However, RFI close to 1 indicates a good fit.
IFI: the Incremental Fit Index (IFI) adjusts the Normed Fit Index (NFI) for sample size and degrees of freedom (Bollen, 1989). Over 0.90 is a good fit, but the index can exceed 1.
PNFI: the Parsimony-Adjusted Measures Index. There is no commonly agreed-upon cutoff value for an acceptable model for this index. Should be > 0.50.
RMSEA: The Root Mean Square Error of Approximation is a
parsimony-adjusted index. Values closer to 0 represent a good fit. It should
be < .08 (Awang, 2012; "awang2012"
) or < .05 (Byrne, 1994; "byrne1994"
).
The p-value printed with it tests the hypothesis that RMSEA is less than or
equal to .05 (a cutoff sometimes used for good fit), and thus should not be
significant.
RMR/SRMR: the (Standardized) Root Mean Square Residual represents the
square-root of the difference between the residuals of the sample covariance
matrix and the hypothesized model. As the RMR can sometimes be hard to
interpret, it is better to use the SRMR. Should be < .08 (Byrne, 1994;
"byrne1994").
See the documentation for fitmeasures()
.
For structural equation models (SEM), Kline (2015) suggests that at a minimum the following indices should be reported: The model chi-square, the RMSEA, the CFI and the SRMR.
When possible, it is recommended to report dynamic cutoffs of fit indices. See https://dynamicfit.app/cfa/.
Awang, Z. (2012). A handbook on SEM. Structural equation modeling.
Byrne, B. M. (1994). Structural equation modeling with EQS and EQS/Windows. Thousand Oaks, CA: Sage Publications.
Fan, X., B. Thompson, and L. Wang (1999). Effects of sample size, estimation method, and model specification on structural equation modeling fit indexes. Structural Equation Modeling, 6, 56-83.
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural equation modeling: a multidisciplinary journal, 6(1), 1-55.
Kline, R. B. (2015). Principles and practice of structural equation modeling. Guilford publications.
Schumacker, R. E., and Lomax, R. G. (2004). A beginner's guide to structural equation modeling, Second edition. Mahwah, NJ: Lawrence Erlbaum Associates.
Tucker, L. R., and Lewis, C. (1973). The reliability coefficient for maximum likelihood factor analysis. Psychometrika, 38, 1-10.
interpret_gfi(c(.5, .99)) interpret_agfi(c(.5, .99)) interpret_nfi(c(.5, .99)) interpret_nnfi(c(.5, .99)) interpret_cfi(c(.5, .99)) interpret_rmsea(c(.07, .04)) interpret_srmr(c(.5, .99)) interpret_rfi(c(.5, .99)) interpret_ifi(c(.5, .99)) interpret_pnfi(c(.5, .99)) # Structural Equation Models (SEM) structure <- " ind60 =~ x1 + x2 + x3 dem60 =~ y1 + y2 + y3 dem60 ~ ind60 " model <- lavaan::sem(structure, data = lavaan::PoliticalDemocracy) interpret(model)
The value of an ICC lies between 0 and 1, with 0 indicating no reliability among raters and 1 indicating perfect reliability.
interpret_icc(icc, rules = "koo2016", ...)
icc |
Value or vector of Intraclass Correlation Coefficient (ICC) values. |
rules |
Can be |
... |
Not used for now. |
Koo (2016) ("koo2016"
; default)
ICC < 0.50 - Poor reliability
0.5 <= ICC < 0.75 - Moderate reliability
0.75 <= ICC < 0.9 - Good reliability
ICC >= 0.9 - Excellent reliability
Koo, T. K., and Li, M. Y. (2016). A guideline of selecting and reporting intraclass correlation coefficients for reliability research. Journal of chiropractic medicine, 15(2), 155-163.
interpret_icc(0.6) interpret_icc(c(0.4, 0.8))
Interpret Kendall's Coefficient of Concordance W
interpret_kendalls_w(w, rules = "landis1977")
w |
Value or vector of Kendall's coefficient of concordance. |
rules |
Can be |
Landis & Koch (1977) ("landis1977"
; default)
0.00 <= w < 0.20 - Slight agreement
0.20 <= w < 0.40 - Fair agreement
0.40 <= w < 0.60 - Moderate agreement
0.60 <= w < 0.80 - Substantial agreement
w >= 0.80 - Almost perfect agreement
Landis, J. R., & Koch G. G. (1977). The measurement of observer agreement for categorical data. Biometrics, 33:159-74.
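No usage example is shown on this page; a minimal one (arbitrary values):

interpret_kendalls_w(0.1)
interpret_kendalls_w(c(0.15, 0.5, 0.85))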
Interpret Odds Ratio
interpret_oddsratio(OR, rules = "chen2010", log = FALSE, ...)
OR |
Value or vector of (log) odds ratio values. |
rules |
Can be " |
log |
Are the provided values log odds ratios? |
... |
Currently not used. |
Rules apply to OR as ratios, so an OR of 10 is as extreme as an OR of 0.1 (1/10).
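For example:

# OR = 10 and OR = 1/10 are treated as equally extreme:
interpret_oddsratio(10)
interpret_oddsratio(1 / 10)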
Chen et al. (2010) ("chen2010"
; default)
OR < 1.68 - Very small
1.68 <= OR < 3.47 - Small
3.47 <= OR < 6.71 - Medium
OR >= 6.71 - Large
Cohen (1988) ("cohen1988"
, based on the oddsratio_to_d()
conversion, see interpret_cohens_d()
)
OR < 1.44 - Very small
1.44 <= OR < 2.48 - Small
2.48 <= OR < 4.27 - Medium
OR >= 4.27 - Large
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Chen, H., Cohen, P., & Chen, S. (2010). How big is a big odds ratio? Interpreting the magnitudes of odds ratios in epidemiological studies. Communications in Statistics-Simulation and Computation, 39(4), 860-864.
Sánchez-Meca, J., Marín-Martínez, F., & Chacón-Moscoso, S. (2003). Effect-size indices for dichotomized outcomes in meta-analysis. Psychological methods, 8(4), 448.
interpret_oddsratio(1) interpret_oddsratio(c(5, 2))
Interpret ANOVA Effect Sizes
interpret_omega_squared(es, rules = "field2013", ...) interpret_eta_squared(es, rules = "field2013", ...) interpret_epsilon_squared(es, rules = "field2013", ...) interpret_r2_semipartial(es, rules = "field2013", ...)
es |
Value or vector of (partial) eta / omega / epsilon squared or semipartial r squared values. |
rules |
Can be |
... |
Not used for now. |
Field (2013) ("field2013"
; default)
ES < 0.01 - Very small
0.01 <= ES < 0.06 - Small
0.06 <= ES < 0.14 - Medium
ES >= 0.14 - Large
Cohen (1992) ("cohen1992"
) applicable to one-way anova, or to partial
eta / omega / epsilon squared in multi-way anova.
ES < 0.02 - Very small
0.02 <= ES < 0.13 - Small
0.13 <= ES < 0.26 - Medium
ES >= 0.26 - Large
Field, A (2013) Discovering statistics using IBM SPSS Statistics. Fourth Edition. Sage:London.
Cohen, J. (1992). A power primer. Psychological bulletin, 112(1), 155.
https://imaging.mrc-cbu.cam.ac.uk/statswiki/FAQ/effectSize/
interpret_eta_squared(.02) interpret_eta_squared(c(.5, .02), rules = "cohen1992")
Interpret p-Values
interpret_p(p, rules = "default")
p |
Value or vector of p-values. |
rules |
Can be |
Default
p >= 0.05 - Not significant
p < 0.05 - Significant
Benjamin et al. (2018) ("rss"
)
p >= 0.05 - Not significant
0.005 <= p < 0.05 - Suggestive
p < 0.005 - Significant
Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R., ... & Cesarini, D. (2018). Redefine statistical significance. Nature Human Behaviour, 2(1), 6-10.
interpret_p(c(.5, .02, 0.001)) interpret_p(c(.5, .02, 0.001), rules = "rss") stars <- rules(c(0.001, 0.01, 0.05, 0.1), c("***", "**", "*", "+", ""), right = FALSE, name = "stars" ) interpret_p(c(.5, .02, 0.001), rules = stars)
Interpret Probability of Direction (pd)
interpret_pd(pd, rules = "default", ...)
pd |
Value or vector of probabilities of direction. |
rules |
Can be |
... |
Not directly used. |
Default (i.e., equivalent to p-values)
pd <= 0.975 - not significant
pd > 0.975 - significant
Makowski et al. (2019) ("makowski2019"
)
pd <= 0.95 - uncertain
pd > 0.95 - possibly existing
pd > 0.97 - likely existing
pd > 0.99 - probably existing
pd > 0.999 - certainly existing
Makowski, D., Ben-Shachar, M. S., Chen, S. H., and Lüdecke, D. (2019). Indices of effect existence and significance in the Bayesian framework. Frontiers in psychology, 10, 2767.
interpret_pd(.98) interpret_pd(c(.96, .99), rules = "makowski2019")
Interpret Correlation Coefficient
interpret_r(r, rules = "funder2019", ...) interpret_phi(r, rules = "funder2019", ...) interpret_cramers_v(r, rules = "funder2019", ...) interpret_rank_biserial(r, rules = "funder2019", ...) interpret_fei(r, rules = "funder2019", ...)
r |
Value or vector of correlation coefficient. |
rules |
Can be |
... |
Not directly used. |
Since Cohen's w does not have a fixed upper bound, for all but the most
simple of cases (2-by-2 or 1-by-2 tables), interpreting Cohen's w as a
correlation coefficient is inappropriate (Ben-Shachar et al., 2023; Cohen,
1988, p. 222). Please use cramers_v() or the like instead.
Rules apply to positive and negative r alike.
Funder & Ozer (2019) ("funder2019"
; default)
r < 0.05 - Tiny
0.05 <= r < 0.1 - Very small
0.1 <= r < 0.2 - Small
0.2 <= r < 0.3 - Medium
0.3 <= r < 0.4 - Large
r >= 0.4 - Very large
Gignac & Szodorai (2016) ("gignac2016"
)
r < 0.1 - Very small
0.1 <= r < 0.2 - Small
0.2 <= r < 0.3 - Moderate
r >= 0.3 - Large
Cohen (1988) ("cohen1988"
)
r < 0.1 - Very small
0.1 <= r < 0.3 - Small
0.3 <= r < 0.5 - Moderate
r >= 0.5 - Large
Lovakov & Agadullina (2021) ("lovakov2021"
)
r < 0.12 - Very small
0.12 <= r < 0.24 - Small
0.24 <= r < 0.41 - Moderate
r >= 0.41 - Large
Evans (1996) ("evans1996"
)
r < 0.2 - Very weak
0.2 <= r < 0.4 - Weak
0.4 <= r < 0.6 - Moderate
0.6 <= r < 0.8 - Strong
r >= 0.8 - Very strong
As $\phi$ can be larger than 1, it is recommended to compute
and interpret Cramer's V instead.
Lovakov, A., & Agadullina, E. R. (2021). Empirically Derived Guidelines for Effect Size Interpretation in Social Psychology. European Journal of Social Psychology.
Funder, D. C., & Ozer, D. J. (2019). Evaluating effect size in psychological research: sense and nonsense. Advances in Methods and Practices in Psychological Science.
Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Personality and individual differences, 102, 74-78.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Evans, J. D. (1996). Straightforward statistics for the behavioral sciences. Thomson Brooks/Cole Publishing Co.
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Page 88 of APA's 6th Edition.
interpret_r(.015) interpret_r(c(.5, -.02)) interpret_r(.3, rules = "lovakov2021")
Interpret Coefficient of Determination ($R^2$)
interpret_r2(r2, rules = "cohen1988")
r2 |
Value or vector of $R^2$ values. |
rules |
Can be |
Cohen (1988) ("cohen1988"
; default)
R2 < 0.02 - Very weak
0.02 <= R2 < 0.13 - Weak
0.13 <= R2 < 0.26 - Moderate
R2 >= 0.26 - Substantial
Falk & Miller (1992) ("falk1992"
)
R2 < 0.1 - Negligible
R2 >= 0.1 - Adequate
Chin, W. W. (1998) ("chin1998"
)
R2 < 0.19 - Very weak
0.19 <= R2 < 0.33 - Weak
0.33 <= R2 < 0.67 - Moderate
R2 >= 0.67 - Substantial
Hair et al. (2011) ("hair2011"
)
R2 < 0.25 - Very weak
0.25 <= R2 < 0.50 - Weak
0.50 <= R2 < 0.75 - Moderate
R2 >= 0.75 - Substantial
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Falk, R. F., & Miller, N. B. (1992). A primer for soft modeling. University of Akron Press.
Chin, W. W. (1998). The partial least squares approach to structural equation modeling. Modern methods for business research, 295(2), 295-336.
Hair, J. F., Ringle, C. M., & Sarstedt, M. (2011). PLS-SEM: Indeed a silver bullet. Journal of Marketing theory and Practice, 19(2), 139-152.
interpret_r2(.02) interpret_r2(c(.5, .02))
Interpretation of the percentage in ROPE.
interpret_rope(rope, ci = 0.9, rules = "default")
rope |
Value or vector of percentages in ROPE. |
ci |
The Credible Interval (CI) probability, corresponding to the proportion of HDI, that was used. Can be |
rules |
A character string (see details) or a custom set of |
Default
For CI < 1
Rope = 0 - Significant
0 < Rope < 1 - Undecided
Rope = 1 - Negligible
For CI = 1
Rope < 0.01 - Significant
0.01 < Rope < 0.025 - Probably significant
0.025 < Rope < 0.975 - Undecided
0.975 < Rope < 0.99 - Probably negligible
Rope > 0.99 - Negligible
BayestestR's reporting guidelines
interpret_rope(0, ci = 0.9) interpret_rope(c(0.005, 0.99), ci = 1)
Interpret VIF index of multicollinearity.
interpret_vif(vif, rules = "default")
vif |
Value or vector of VIFs. |
rules |
Can be |
Default
VIF < 5 - Low
5 <= VIF < 10 - Moderate
VIF >= 10 - High
interpret_vif(c(1.4, 30.4))
For use by other functions and packages.
is_effectsize_name(x, ignore_case = TRUE) get_effectsize_name(x, ignore_case = TRUE) get_effectsize_label( x, ignore_case = TRUE, use_symbols = getOption("es.use_symbols", FALSE) )
x |
A character, or a vector. |
ignore_case |
Should case of input be ignored? |
use_symbols |
Should proper symbols be printed (e.g., "φ") instead of transliterated effect size names (e.g., "phi")? |
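No usage example is shown on this page; a minimal sketch:

is_effectsize_name(c("Cohens_d", "Intercept"))
get_effectsize_name("cohens_d", ignore_case = TRUE)
get_effectsize_label("Eta2_partial")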
Compute effect size indices for standardized difference between two normal
multivariate distributions or between one multivariate distribution and a
defined point. This is the standardized effect size for Hotelling's $T^2$
test (e.g., DescTools::HotellingsT2Test()). D is computed as:

$$D = \sqrt{(\bar{X}_1 - \bar{X}_2 - \mu)^{T} \, \Sigma_{pooled}^{-1} \, (\bar{X}_1 - \bar{X}_2 - \mu)}$$

Where $\bar{X}_i$ are the column means, $\Sigma_{pooled}$ is the pooled
covariance matrix, and $\mu$ is a vector of the null differences for each
variable. When there is only one variate, this formula reduces to Cohen's
d.
mahalanobis_d( x, y = NULL, data = NULL, pooled_cov = TRUE, mu = 0, ci = 0.95, alternative = "greater", verbose = TRUE, ... )
x , y
|
A data frame or matrix. Any incomplete observations (with NA values) are dropped. |
data |
An optional data frame containing the variables. |
pooled_cov |
Should equal covariance be assumed? Currently only pooled_cov = TRUE is supported. |
mu |
A named list/vector of the true difference in means for each variable. Can also be a vector of length 1, which will be recycled. |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
verbose |
Toggle warnings and messages on or off. |
... |
Not used. |
To specify x as a formula:
Two sample case: DV1 + DV2 ~ group
or cbind(DV1, DV2) ~ group
One sample case: DV1 + DV2 ~ 1
or cbind(DV1, DV2) ~ 1
A data frame with the Mahalanobis_D
and potentially its CI
(CI_low
and CI_high
).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are
estimated using the noncentrality parameter method (also called the "pivot
method"). This method finds the noncentrality parameter ("ncp") of a
noncentral t, F, or $\chi^2$ distribution that places the observed
t, F, or $\chi^2$ test statistic at the desired probability point of
the distribution. For example, if the observed t statistic is 2.0, with 50
degrees of freedom, for which cumulative noncentral t distribution is t =
2.0 the .025 quantile (answer: the noncentral t distribution with ncp =
.04)? After estimating these confidence bounds on the ncp, they are
converted into the effect size metric to obtain a confidence interval for the
effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which $p > \alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Del Giudice, M. (2017). Heterogeneity coefficients for Mahalanobis' D as a multivariate effect size. Multivariate Behavioral Research, 52(2), 216-221.
Mahalanobis, P. C. (1936). On the generalized distance in statistics. National Institute of Science of India.
Reiser, B. (2001). Confidence intervals for the Mahalanobis distance. Communications in Statistics-Simulation and Computation, 30(1), 37-45.
stats::mahalanobis()
, cov_pooled()
Other standardized differences:
cohens_d()
,
means_ratio()
,
p_superiority()
,
rank_biserial()
,
repeated_measures_d()
## Two samples -------------- mtcars_am0 <- subset(mtcars, am == 0, select = c(mpg, hp, cyl) ) mtcars_am1 <- subset(mtcars, am == 1, select = c(mpg, hp, cyl) ) mahalanobis_d(mtcars_am0, mtcars_am1) # Or mahalanobis_d(mpg + hp + cyl ~ am, data = mtcars) mahalanobis_d(mpg + hp + cyl ~ am, data = mtcars, alternative = "two.sided") # Different mu: mahalanobis_d(mpg + hp + cyl ~ am, data = mtcars, mu = c(mpg = -4, hp = 15, cyl = 0) ) # D is a multivariate d, so when only 1 variate is provided: mahalanobis_d(hp ~ am, data = mtcars) cohens_d(hp ~ am, data = mtcars) # One sample --------------------------- mahalanobis_d(mtcars[, c("mpg", "hp", "cyl")]) # Or mahalanobis_d(mpg + hp + cyl ~ 1, data = mtcars, mu = c(mpg = 15, hp = 5, cyl = 3) )
Computes the ratio of two means (also known as the "response ratio"; RR) of
variables on a ratio scale (with an absolute 0). Pair with any reported
stats::t.test()
.
means_ratio( x, y = NULL, data = NULL, paired = FALSE, adjust = TRUE, log = FALSE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ... )
x , y
|
A numeric vector, or a character name of one in data. |
data |
An optional data frame containing the variables. |
paired |
If |
adjust |
Should the effect size be adjusted for small-sample bias?
Defaults to |
log |
Should the log-ratio be returned? Defaults to |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods. When |
The Means Ratio ranges from 0 to $\infty$, with values smaller than 1
indicating that the second mean is larger than the first, values larger than
1 indicating that the second mean is smaller than the first, and values of 1
indicating that the means are equal.
A data frame with the effect size (Means_ratio
or
Means_ratio_adjusted
) and their CIs (CI_low
and CI_high
).
Confidence intervals are estimated as described by Lajeunesse (2011 & 2015) using the log-ratio standard error assuming a normal distribution. By this method, the log is taken of the ratio of means, which makes this outcome measure symmetric around 0 and yields a corresponding sampling distribution that is closer to normality.
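A rough sketch of this log-scale normal approximation, following Hedges et al. (1999) (the internal computation may differ in detail, e.g., in the small-sample bias adjustment):

x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)

log_rr <- log(mean(x) / mean(y))
se <- sqrt(var(x) / (length(x) * mean(x)^2) +
  var(y) / (length(y) * mean(y)^2))

# Symmetric CI on the log scale, back-transformed to the ratio scale:
exp(log_rr + c(-1, 1) * qnorm(0.975) * se)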
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which $p > \alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
The small-sample bias corrected response ratio reported from this function is derived from Lajeunesse (2015).
Lajeunesse, M. J. (2011). On the meta-analysis of response ratios for studies with correlated and multi-group designs. Ecology, 92(11), 2049-2055. doi:10.1890/11-0423.1
Lajeunesse, M. J. (2015). Bias and correction for the log response ratio in ecological meta-analysis. Ecology, 96(8), 2056-2063. doi:10.1890/14-2402.1
Hedges, L. V., Gurevitch, J., & Curtis, P. S. (1999). The meta-analysis of response ratios in experimental ecology. Ecology, 80(4), 1150–1156. doi:10.1890/0012-9658(1999)080[1150:TMAORR]2.0.CO;2
Other standardized differences:
cohens_d()
,
mahalanobis_d()
,
p_superiority()
,
rank_biserial()
,
repeated_measures_d()
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30) y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29) means_ratio(x, y) means_ratio(x, y, adjust = FALSE) means_ratio(x, y, log = TRUE) # The ratio is scale invariant, making it a standardized effect size means_ratio(3 * x, 3 * y)
Fictional data.
A 3-by-4 table, with a row for each major and a column for each type of music.
data("Music_preferences") Music_preferences #> Pop Rock Jazz Classic #> Psych 150 100 165 130 #> Econ 50 65 35 10 #> Law 2 55 40 25
Other effect size datasets:
Music_preferences2
,
RCT_table
,
Smoking_FASD
,
food_class
,
hardlyworking
,
rouder2016
,
screening_test
Fictional data, with more extreme preferences than Music_preferences.
A 3-by-4 table, with a row for each major and a column for each type of music.
data("Music_preferences2") Music_preferences2 #> Pop Rock Jazz Classic #> Psych 151 130 12 7 #> Econ 77 6 111 4 #> Law 0 4 2 165
Other effect size datasets:
Music_preferences
,
RCT_table
,
Smoking_FASD
,
food_class
,
hardlyworking
,
rouder2016
,
screening_test
Convert Between Odds and Probabilities
odds_to_probs(odds, log = FALSE, ...) ## S3 method for class 'data.frame' odds_to_probs(odds, log = FALSE, select = NULL, exclude = NULL, ...) probs_to_odds(probs, log = FALSE, ...) ## S3 method for class 'data.frame' probs_to_odds(probs, log = FALSE, select = NULL, exclude = NULL, ...)
odds |
The Odds (or log(odds), if log = TRUE) to convert. |
log |
Take in or output log odds (such as in logistic models). |
... |
Arguments passed to or from other methods. |
select |
When a data frame is passed, character or list of column names to be transformed. |
exclude |
When a data frame is passed, character or list of column names to be excluded from transformation. |
probs |
Probability values to convert. |
Converted index.
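The underlying identities are odds = p / (1 - p) and p = odds / (1 + odds); for example:

p <- 0.75
(odds <- p / (1 - p)) # 3
odds_to_probs(odds) # back to 0.75
probs_to_odds(p) # same as computing p / (1 - p) directly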
Other convert between effect sizes:
d_to_r()
,
diff_to_cles
,
eta2_to_f2()
,
oddsratio_to_riskratio()
,
w_to_fei()
odds_to_probs(3) odds_to_probs(1.09, log = TRUE) probs_to_odds(0.95) probs_to_odds(0.95, log = TRUE)
Compute Odds Ratios, Risk Ratios, Cohen's h, Absolute Risk Reduction or
Number Needed to Treat. Report with any stats::chisq.test()
or
stats::fisher.test()
.
Note that these are computed with each column representing the different
groups, with the first column representing the treatment group and the
second column the baseline (or control). Effects are given as
treatment / control. If you wish to use rows as groups, you must pass a
transposed table, or switch the x and y arguments.
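A minimal sketch of the column-orientation point (made-up counts; any 2-by-2 table or matrix works):

tab <- matrix(c(71, 30, 50, 100),
  nrow = 2,
  dimnames = list(
    Diagnosis = c("Sick", "Recovered"),
    Group = c("Treatment", "Control")
  )
)

oddsratio(tab) # columns (Treatment vs. Control) are the groups
oddsratio(t(tab)) # transpose to use rows as the groups instead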
oddsratio(x, y = NULL, ci = 0.95, alternative = "two.sided", log = FALSE, ...) riskratio(x, y = NULL, ci = 0.95, alternative = "two.sided", log = FALSE, ...) cohens_h(x, y = NULL, ci = 0.95, alternative = "two.sided", ...) arr(x, y = NULL, ci = 0.95, alternative = "two.sided", ...) nnt(x, y = NULL, ci = 0.95, alternative = "two.sided", ...)
x |
a numeric vector or matrix. |
y |
a numeric vector; ignored if |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
log |
Take in or output the log of the ratio (such as in logistic models), e.g. when the desired input or output are log odds ratios instead of odds ratios. |
... |
Ignored |
A data frame with the effect size (Odds_ratio
, Risk_ratio
(possibly with the prefix log_
), Cohens_h
, ARR
, NNT
) and its CIs
(CI_low
and CI_high
).
Confidence intervals are estimated using the standard normal parametric method (see Katz et al., 1978; Szumilas, 2010).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The $100(1 - \alpha)\%$ confidence
interval contains all of the parameter values for which $p > \alpha$
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provide clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen $\alpha$ level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kieser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Katz, D. J. S. M., Baptista, J., Azen, S. P., & Pike, M. C. (1978). Obtaining confidence intervals for the risk ratio in cohort studies. Biometrics, 469-474.
Szumilas, M. (2010). Explaining odds ratios. Journal of the Canadian academy of child and adolescent psychiatry, 19(3), 227.
Other effect sizes for contingency table:
cohens_g()
,
phi()
data("RCT_table") RCT_table # note groups are COLUMNS oddsratio(RCT_table) oddsratio(RCT_table, alternative = "greater") riskratio(RCT_table) cohens_h(RCT_table) arr(RCT_table) nnt(RCT_table)
data("RCT_table") RCT_table # note groups are COLUMNS oddsratio(RCT_table) oddsratio(RCT_table, alternative = "greater") riskratio(RCT_table) cohens_h(RCT_table) arr(RCT_table) nnt(RCT_table)
Convert Between Odds Ratios, Risk Ratios and Other Metrics of Change in Probabilities
oddsratio_to_riskratio(OR, p0, log = FALSE, verbose = TRUE, ...) oddsratio_to_arr(OR, p0, log = FALSE, verbose = TRUE, ...) oddsratio_to_nnt(OR, p0, log = FALSE, verbose = TRUE, ...) logoddsratio_to_riskratio(logOR, p0, log = TRUE, verbose = TRUE, ...) logoddsratio_to_arr(logOR, p0, log = TRUE, verbose = TRUE, ...) logoddsratio_to_nnt(logOR, p0, log = TRUE, verbose = TRUE, ...) riskratio_to_oddsratio(RR, p0, log = FALSE, verbose = TRUE, ...) riskratio_to_arr(RR, p0, verbose = TRUE, ...) riskratio_to_logoddsratio(RR, p0, log = TRUE, verbose = TRUE, ...) riskratio_to_nnt(RR, p0, verbose = TRUE, ...) arr_to_riskratio(ARR, p0, verbose = TRUE, ...) arr_to_oddsratio(ARR, p0, log = FALSE, verbose = TRUE, ...) arr_to_logoddsratio(ARR, p0, log = TRUE, verbose = TRUE, ...) arr_to_nnt(ARR, ...) nnt_to_oddsratio(NNT, p0, log = FALSE, verbose = TRUE, ...) nnt_to_logoddsratio(NNT, p0, log = TRUE, verbose = TRUE, ...) nnt_to_riskratio(NNT, p0, verbose = TRUE, ...) nnt_to_arr(NNT, ...)
OR , logOR , RR , ARR , NNT
|
Odds-ratio of |
p0 |
Baseline risk |
log |
If:
|
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to and from other methods. |
Converted index, or if OR
/logOR
is a logistic regression model, a
parameter table with the converted indices.
Grant, R. L. (2014). Converting an odds ratio to a range of plausible relative risks for better communication of research findings. Bmj, 348, f7450.
oddsratio()
, riskratio()
, arr()
, and nnt()
.
Other convert between effect sizes:
d_to_r()
,
diff_to_cles
,
eta2_to_f2()
,
odds_to_probs()
,
w_to_fei()
p0 <- 0.4 p1 <- 0.7 (OR <- probs_to_odds(p1) / probs_to_odds(p0)) (RR <- p1 / p0) (ARR <- p1 - p0) (NNT <- arr_to_nnt(ARR)) riskratio_to_oddsratio(RR, p0 = p0) oddsratio_to_riskratio(OR, p0 = p0) riskratio_to_arr(RR, p0 = p0) arr_to_oddsratio(nnt_to_arr(NNT), p0 = p0) m <- glm(am ~ factor(cyl), data = mtcars, family = binomial() ) oddsratio_to_riskratio(m, verbose = FALSE) # RR is relative to the intercept if p0 not provided
Cohen's U1, U2, and U3, probability of superiority, proportion of overlap, Wilcoxon-Mann-Whitney odds, and Vargha and Delaney's A are CLESs. These are effect sizes that represent differences between two (independent) distributions in probabilistic terms (see Details). Pair with any reported stats::t.test() or stats::wilcox.test().
p_superiority(
  x, y = NULL, data = NULL, mu = 0, paired = FALSE, parametric = TRUE,
  ci = 0.95, alternative = "two.sided", verbose = TRUE, ...
)

cohens_u1(
  x, y = NULL, data = NULL, mu = 0, parametric = TRUE, ci = 0.95,
  alternative = "two.sided", iterations = 200, verbose = TRUE, ...
)

cohens_u2(
  x, y = NULL, data = NULL, mu = 0, parametric = TRUE, ci = 0.95,
  alternative = "two.sided", iterations = 200, verbose = TRUE, ...
)

cohens_u3(
  x, y = NULL, data = NULL, mu = 0, parametric = TRUE, ci = 0.95,
  alternative = "two.sided", iterations = 200, verbose = TRUE, ...
)

p_overlap(
  x, y = NULL, data = NULL, mu = 0, parametric = TRUE, ci = 0.95,
  alternative = "two.sided", iterations = 200, verbose = TRUE, ...
)

vd_a(
  x, y = NULL, data = NULL, mu = 0, ci = 0.95, alternative = "two.sided",
  verbose = TRUE, ...
)

wmw_odds(
  x, y = NULL, data = NULL, mu = 0, paired = FALSE, ci = 0.95,
  alternative = "two.sided", verbose = TRUE, ...
)
x , y
|
A numeric vector, or a character name of one in data. |
data |
An optional data frame containing the variables. |
mu |
a number indicating the true value of the mean (or difference in means if you are performing a two sample test). |
paired |
If |
parametric |
Use parametric estimation (see |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods. When |
iterations |
The number of bootstrap replicates for computing confidence
intervals. Only applies when |
These measures of effect size present group differences in probabilistic terms:
Probability of superiority is the probability that, when sampling one observation from each group at random, the observation from the second group will be larger than the observation from the first group. For the one-sample (or paired) case, it is the probability that the sample (or difference) is larger than mu. (Vargha and Delaney's A is an alias for the non-parametric probability of superiority.)
Cohen's U1 is the proportion of the total of both distributions that does not overlap.

Cohen's U2 is the proportion of one of the groups that exceeds the same proportion in the other group.

Cohen's U3 is the proportion of the second group that is smaller than the median of the first group.
Overlap (OVL) is the proportional overlap between the distributions.
(When parametric = FALSE
, bayestestR::overlap()
is used.)
Wilcoxon-Mann-Whitney odds are the odds of non-parametric superiority (via probs_to_odds()), that is, the odds that, when sampling one observation from each group at random, the observation from the second group will be larger than the observation from the first group.
Where U1, U2, and Overlap are agnostic to the direction of the difference between the groups, U3 and probability of superiority are not.

The parametric version of these effect sizes assumes normality of both populations and homoscedasticity. If those assumptions are not met, the non-parametric versions should be used.
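Under those parametric assumptions, the CLESs are simple transformations of Cohen's d; for example, probability of superiority is pnorm(d / sqrt(2)) and U3 is pnorm(d). A minimal hand check (a sketch, assuming the usual normal-theory identities; direction is ignored here):

d <- cohens_d(mpg ~ am, data = mtcars)$Cohens_d

pnorm(abs(d) / sqrt(2)) # parametric Pr(superiority), ignoring direction
pnorm(abs(d))           # parametric Cohen's U3
# compare: p_superiority(mpg ~ am, data = mtcars)
#          cohens_u3(mpg ~ am, data = mtcars)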
A data frame containing the common language effect sizes (and optionally their CIs).
For parametric CLES, the CIs are transformed CIs for Cohen's d (see
d_to_u3()
). For non-parametric (parametric = FALSE
) CLES, the CI of
Pr(superiority) is a transformed CI of the rank-biserial correlation
(rb_to_p_superiority()
), while for all others, confidence intervals are
estimated using the bootstrap method (using the {boot}
package).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100 (1 - )% confidence
interval contains all of the parameter values for which p >
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provided clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
Some effect sizes are directionless: they do have a minimum value that would be interpreted as "no effect", but they cannot cross it. For example, a null value of Kendall's W is 0, indicating no difference between groups, but W can never have a negative value. The same goes for U2 and Overlap: the null value of U2 is 0.5, but it can never be smaller than 0.5; an Overlap of 1 means "full overlap" (no difference), but it cannot be larger than 1.
When bootstrapping CIs for such effect sizes, the bounds of the CIs will
never cross (and often will never cover) the null. Therefore, these CIs
should not be used for statistical inference.
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
If mu
is not 0, the effect size represents the difference between the
first shifted sample (by mu
) and the second sample.
Cohen, J. (1977). Statistical power analysis for the behavioral sciences. New York: Routledge.
Reiser, B., & Faraggi, D. (1999). Confidence intervals for the overlapping coefficient: the normal equal variance case. Journal of the Royal Statistical Society, 48(3), 413-418.
Ruscio, J. (2008). A probability-based measure of effect size: robustness to base rates and other factors. Psychological methods, 13(1), 19–30.
Vargha, A., & Delaney, H. D. (2000). A critique and improvement of the CL common language effect size statistics of McGraw and Wong. Journal of Educational and Behavioral Statistics, 25(2), 101-132.
O’Brien, R. G., & Castelloe, J. (2006, March). Exploiting the link between the Wilcoxon-Mann-Whitney test and a simple odds statistic. In Proceedings of the Thirty-first Annual SAS Users Group International Conference (pp. 209-31). Cary, NC: SAS Institute.
Agresti, A. (1980). Generalized odds ratios for ordinal data. Biometrics, 59-67.
Other standardized differences:
cohens_d()
,
mahalanobis_d()
,
means_ratio()
,
rank_biserial()
,
repeated_measures_d()
Other rank-based effect sizes:
rank_biserial()
,
rank_epsilon_squared()
cohens_u2(mpg ~ am, data = mtcars)
p_superiority(mpg ~ am, data = mtcars, parametric = FALSE)
wmw_odds(mpg ~ am, data = mtcars)

x <- c(1.83, 0.5, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.3)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)

p_overlap(x, y)
p_overlap(y, x) # direction of effect does not matter

cohens_u3(x, y)
cohens_u3(y, x) # direction of effect does matter
Phi (φ) and Other Contingency Tables Correlations

Compute phi (φ), Cramer's V, Tschuprow's T, Cohen's w, פ (Fei), and Pearson's C (the contingency coefficient) for contingency tables or goodness-of-fit. Pair with any reported stats::chisq.test().
phi(x, y = NULL, adjust = TRUE, ci = 0.95, alternative = "greater", ...)

cramers_v(x, y = NULL, adjust = TRUE, ci = 0.95, alternative = "greater", ...)

tschuprows_t(x, y = NULL, adjust = TRUE, ci = 0.95, alternative = "greater", ...)

cohens_w(x, y = NULL, p = rep(1, length(x)), ci = 0.95, alternative = "greater", ...)

fei(x, p = rep(1, length(x)), ci = 0.95, alternative = "greater", ...)

pearsons_c(x, y = NULL, p = rep(1, length(x)), ci = 0.95, alternative = "greater", ...)
x |
a numeric vector or matrix. |
y |
a numeric vector; ignored if |
adjust |
Should the effect size be corrected for small-sample bias?
Defaults to |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
... |
Ignored. |
p |
a vector of probabilities of the same length as |
Phi (φ), Cramer's V, Tschuprow's T, Cohen's w, and Pearson's C are effect sizes for tests of independence in 2D contingency tables. For 2-by-2 tables, phi, Cramer's V, Tschuprow's T, and Cohen's w are identical, and are equal to the simple correlation between two dichotomous variables, ranging between 0 (no dependence) and 1 (perfect dependence).

For larger tables, Cramer's V, Tschuprow's T, or Pearson's C should be used, as they are bounded between 0 and 1. (Cohen's w can also be used, but since it is not bounded at 1 (it can be larger), its interpretation is more difficult.) For square tables, Cramer's V and Tschuprow's T give the same results, but for non-square tables Tschuprow's T is more conservative: while V will be 1 if either the columns are fully dependent on the rows (for each column, there is only one non-zero cell) or the rows are fully dependent on the columns, T will only be 1 if both are true.
For goodness-of-fit in 1D tables, Cohen's w, פ (Fei), or Pearson's C can be used. Cohen's w has no upper bound (it can be arbitrarily large, depending on the expected distribution). Fei is an adjusted Cohen's w, accounting for the expected distribution, making it bounded between 0 and 1 (Ben-Shachar et al., 2023); a hand computation of Fei is sketched after the list below. Pearson's C is also bounded between 0 and 1.
To summarize, for correlation-like effect sizes, we recommend:
For a 2x2 table, use phi()
For larger tables, use cramers_v()
For goodness-of-fit, use fei()
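To make the adjustment concrete, here is a hand computation of Fei from a chi-squared statistic, assuming the definition Fei = sqrt(χ² / (n · (1/min(p) − 1))) from Ben-Shachar et al. (2023) (a sketch; the counts mirror the Smoking_FASD example):

observed <- c(17, 11, 640)
p_expected <- c(0.015, 0.010, 0.975)

gof <- chisq.test(observed, p = p_expected)
n <- sum(observed)

sqrt(unname(gof$statistic) / (n * (1 / min(p_expected) - 1)))
# compare: fei(observed, p = p_expected)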
A data frame with the effect size (Cramers_v
, phi
(possibly with
the suffix _adjusted
), Cohens_w
, Fei
) and its CIs (CI_low
and
CI_high
).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are estimated using the noncentrality parameter method (also called the "pivot method"). This method finds the noncentrality parameter ("ncp") of a noncentral t, F, or χ² distribution that places the observed t, F, or χ² test statistic at the desired probability point of the distribution. For example, if the observed t statistic is 2.0, with 50 degrees of freedom, for which noncentral t distribution is t = 2.0 the .025 quantile? (Answer: approximately the noncentral t distribution with ncp ≈ 4.) After estimating these confidence bounds on the ncp, they are converted into the effect size metric to obtain a confidence interval for the effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100 (1 - )% confidence
interval contains all of the parameter values for which p >
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provided clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Johnston, J. E., Berry, K. J., & Mielke Jr, P. W. (2006). Measures of effect size for chi-squared and likelihood-ratio goodness-of-fit tests. Perceptual and motor skills, 103(2), 412-414.
Rosenberg, M. S. (2010). A generalized formula for converting chi-square tests to effect sizes for meta-analysis. PloS one, 5(4), e10059.
chisq_to_phi()
for details regarding estimation and CIs.
Other effect sizes for contingency table:
cohens_g()
,
oddsratio()
## 2-by-2 tables
## -------------
data("RCT_table")
RCT_table # note groups are COLUMNS

phi(RCT_table)
pearsons_c(RCT_table)

## Larger tables
## -------------
data("Music_preferences")
Music_preferences

cramers_v(Music_preferences)
cohens_w(Music_preferences)
pearsons_c(Music_preferences)

## Goodness of fit
## ---------------
data("Smoking_FASD")
Smoking_FASD

fei(Smoking_FASD)
cohens_w(Smoking_FASD)
pearsons_c(Smoking_FASD)

# Use custom expected values:
fei(Smoking_FASD, p = c(0.015, 0.010, 0.975))
cohens_w(Smoking_FASD, p = c(0.015, 0.010, 0.975))
pearsons_c(Smoking_FASD, p = c(0.015, 0.010, 0.975))
Methods for {effectsize} Tables

Printing, formatting and plotting methods for effectsize tables.
## S3 method for class 'effectsize_table'
plot(x, ...)

## S3 method for class 'effectsize_table'
print(x, digits = 2, use_symbols = getOption("es.use_symbols", FALSE), ...)

## S3 method for class 'effectsize_table'
print_md(x, digits = 2, use_symbols = getOption("es.use_symbols", FALSE), ...)

## S3 method for class 'effectsize_table'
print_html(x, digits = 2, use_symbols = getOption("es.use_symbols", FALSE), ...)

## S3 method for class 'effectsize_table'
format(x, digits = 2, output = c("text", "markdown", "html"),
  use_symbols = getOption("es.use_symbols", FALSE), ...)

## S3 method for class 'effectsize_difference'
print(x, digits = 2, append_CLES = NULL, ...)
x |
Object to print. |
... |
Arguments passed to or from other functions. |
digits |
Number of digits for rounding or significant figures. May also
be |
use_symbols |
Should proper symbols be printed ( |
output |
Which output is the formatting intended for? Affects how title and footers are formatted. |
append_CLES |
Which Common Language Effect Sizes should be printed as well? Only applicable to Cohen's d, Hedges' g for independent samples of equal variance (pooled sd) or for the rank-biserial correlation for independent samples (See d_to_cles). |
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
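For instance, the same effect size table can be rendered for the console or as markdown (a small sketch; the exact output depends on the effect size computed):

d <- cohens_d(mpg ~ am, data = mtcars)

print(d, digits = 3)            # console output
print_md(d, use_symbols = TRUE) # markdown table, with proper symbols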
Semi-Partial (Part) Correlation Squared (ΔR²)

Compute the semi-partial (part) correlation squared (also known as ΔR²). Currently, only lm() models are supported.
r2_semipartial(
  model,
  type = c("terms", "parameters"),
  ci = 0.95,
  alternative = "greater",
  ...
)
model |
An |
type |
Type, either "terms" (default) or "parameters". |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
... |
Arguments passed to or from other methods. |
This is similar to the last column of the "Conditional Dominance Statistics" section of the parameters::dominance_analysis() output. For each term, the model is refit without the columns of the model matrix that correspond to that term. The R² of this sub-model is then subtracted from the R² of the full model to yield the ΔR². (For type = "parameters", this is done for each column in the model matrix.)
Note that this is unlike parameters::dominance_analysis()
, where term
deletion is done via the formula interface, and therefore may lead to
different results.
For other, non-lm()
models, as well as more verbose information and
options, please see the documentation for parameters::dominance_analysis()
.
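The term-deletion logic can be reproduced by hand; a minimal sketch (the model and variables are arbitrary):

m_full <- lm(mpg ~ wt + hp, data = mtcars)
m_drop <- lm(mpg ~ hp, data = mtcars) # refit without the "wt" column

# Semi-partial (part) R-squared for "wt":
summary(m_full)$r.squared - summary(m_drop)$r.squared
# compare: r2_semipartial(m_full)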
A data frame with the effect size.
Confidence intervals are based on the normal approximation as provided by Alf and Graf (1999). An adjustment to the lower bound of the CI is used, to improve the coverage properties of the CIs, according to Algina et al. (2008): if the F test associated with the ΔR² is significant (at the 1 - ci level), but the lower bound of the CI is 0, it is set to a small value (arbitrarily, a 10th of the estimated ΔR²); if the F test is not significant, the lower bound is set to 0. (Additionally, the lower and upper bounds are constrained so that they cannot be smaller than 0 or larger than 1.)
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100 (1 - )% confidence
interval contains all of the parameter values for which p >
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provided clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Alf Jr, E. F., & Graf, R. G. (1999). Asymptotic confidence limits for the difference between two squared multiple correlations: A simplified approach. Psychological Methods, 4(1), 70-75. doi:10.1037/1082-989X.4.1.70
Algina, J., Keselman, H. J., & Penfield, R. D. (2008). Confidence intervals for the squared multiple semipartial correlation coefficient. Journal of Modern Applied Statistical Methods, 7(1), 2-10. doi:10.22237/jmasm/1209614460
eta_squared()
, cohens_f()
for comparing two models,
parameters::dominance_analysis()
and
parameters::standardize_parameters()
.
data("hardlyworking") m <- lm(salary ~ factor(n_comps) + xtra_hours * seniority, data = hardlyworking) r2_semipartial(m) r2_semipartial(m, type = "parameters") # Compare to `eta_squared()` # -------------------------- npk.aov <- lm(yield ~ N + P + K, npk) # When predictors are orthogonal, # eta_squared(partial = FALSE) gives the same effect size: performance::check_collinearity(npk.aov) eta_squared(npk.aov, partial = FALSE) r2_semipartial(npk.aov) # Compare to `dominance_analysis()` # --------------------------------- m_full <- lm(salary ~ ., data = hardlyworking) r2_semipartial(m_full) # Compare to last column of "Conditional Dominance Statistics": parameters::dominance_analysis(m_full)
data("hardlyworking") m <- lm(salary ~ factor(n_comps) + xtra_hours * seniority, data = hardlyworking) r2_semipartial(m) r2_semipartial(m, type = "parameters") # Compare to `eta_squared()` # -------------------------- npk.aov <- lm(yield ~ N + P + K, npk) # When predictors are orthogonal, # eta_squared(partial = FALSE) gives the same effect size: performance::check_collinearity(npk.aov) eta_squared(npk.aov, partial = FALSE) r2_semipartial(npk.aov) # Compare to `dominance_analysis()` # --------------------------------- m_full <- lm(salary ~ ., data = hardlyworking) r2_semipartial(m_full) # Compare to last column of "Conditional Dominance Statistics": parameters::dominance_analysis(m_full)
Compute the rank-biserial correlation (r_rb) and Cliff's delta (δ) effect sizes for non-parametric (rank sum) differences. These effect sizes of dominance are closely related to the Common Language Effect Sizes. Pair with any reported stats::wilcox.test().
rank_biserial(
  x, y = NULL, data = NULL, mu = 0, paired = FALSE, ci = 0.95,
  alternative = "two.sided", verbose = TRUE, ...
)

cliffs_delta(
  x, y = NULL, data = NULL, mu = 0, ci = 0.95, alternative = "two.sided",
  verbose = TRUE, ...
)
x , y
|
A numeric or ordered vector, or a character name of one in data. |
data |
An optional data frame containing the variables. |
mu |
a number indicating the value around which (a-)symmetry (for one-sample or paired samples) or shift (for independent samples) is to be estimated. See stats::wilcox.test. |
paired |
If |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods. When |
The rank-biserial correlation is appropriate for non-parametric tests of differences, both for the one-sample or paired-samples case, which would normally be tested with Wilcoxon's Signed Rank Test (giving the matched-pairs rank-biserial correlation), and for the two independent samples case, which would normally be tested with Mann-Whitney's U Test (giving Glass' rank-biserial correlation). See stats::wilcox.test. In both cases, the correlation represents the difference between the proportion of favorable and unfavorable pairs / signed ranks (Kerby, 2014). Values range from -1 (complete dominance of the second sample: all values of the second sample are larger than all values of the first sample) to +1 (complete dominance of the first sample: all values of the second sample are smaller than all values of the first sample).

Cliff's delta is an alias for the rank-biserial correlation in the two-sample case.
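Kerby's (2014) simple difference formula can be checked by hand for two independent samples; a sketch (pairs tied across groups contribute zero to the numerator):

x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]

diffs <- outer(x, y, "-") # all n1 * n2 pairwise differences
(sum(diffs > 0) - sum(diffs < 0)) / length(diffs) # favorable minus unfavorable
# compare: rank_biserial(mpg ~ am, data = mtcars)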
A data frame with the effect size r_rank_biserial
and its CI
(CI_low
and CI_high
).
When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. This results in an effect size of reduced magnitude. A correction has been applied for Kendall's W.
Confidence intervals for the rank-biserial correlation (and Cliff's delta) are estimated using the normal approximation (via Fisher's transformation).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100 (1 - )% confidence
interval contains all of the parameter values for which p >
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provided clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Cureton, E. E. (1956). Rank-biserial correlation. Psychometrika, 21(3), 287-290.
Glass, G. V. (1965). A ranking variable analogue of biserial correlation: Implications for short-cut item analysis. Journal of Educational Measurement, 2(1), 91-95.
Kerby, D. S. (2014). The simple difference formula: An approach to teaching nonparametric correlation. Comprehensive Psychology, 3, 11-IT.
King, B. M., & Minium, E. W. (2008). Statistical reasoning in the behavioral sciences. John Wiley & Sons Inc.
Cliff, N. (1993). Dominance statistics: Ordinal analyses to answer ordinal questions. Psychological bulletin, 114(3), 494.
Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size.
Other standardized differences:
cohens_d()
,
mahalanobis_d()
,
means_ratio()
,
p_superiority()
,
repeated_measures_d()
Other rank-based effect sizes:
p_superiority()
,
rank_epsilon_squared()
data(mtcars)
mtcars$am <- factor(mtcars$am)
mtcars$cyl <- factor(mtcars$cyl)

# Two Independent Samples ----------
(rb <- rank_biserial(mpg ~ am, data = mtcars))
# Same as:
# rank_biserial("mpg", "am", data = mtcars)
# rank_biserial(mtcars$mpg[mtcars$am == "0"], mtcars$mpg[mtcars$am == "1"])
# cliffs_delta(mpg ~ am, data = mtcars)

# More options:
rank_biserial(mpg ~ am, data = mtcars, mu = -5)
print(rb, append_CLES = TRUE)

# One Sample ----------
# from help("wilcox.test")
x <- c(1.83, 0.50, 1.62, 2.48, 1.68, 1.88, 1.55, 3.06, 1.30)
y <- c(0.878, 0.647, 0.598, 2.05, 1.06, 1.29, 1.06, 3.14, 1.29)
depression <- data.frame(first = x, second = y, change = y - x)

rank_biserial(change ~ 1, data = depression)
# same as:
# rank_biserial("change", data = depression)
# rank_biserial(mtcars$wt)

# More options:
rank_biserial(change ~ 1, data = depression, mu = -0.5)

# Paired Samples ----------
(rb <- rank_biserial(Pair(first, second) ~ 1, data = depression))
# same as:
# rank_biserial(depression$first, depression$second, paired = TRUE)

interpret_rank_biserial(0.78)
interpret(rb, rules = "funder2019")
Compute rank epsilon squared (ε²) or rank eta squared (η²) (to accompany stats::kruskal.test()), and Kendall's W (to accompany stats::friedman.test()) effect sizes for non-parametric (rank sum) one-way ANOVAs.
rank_epsilon_squared(
  x, groups, data = NULL, ci = 0.95, alternative = "greater",
  iterations = 200, verbose = TRUE, ...
)

rank_eta_squared(
  x, groups, data = NULL, ci = 0.95, alternative = "greater",
  iterations = 200, verbose = TRUE, ...
)

kendalls_w(
  x, groups, blocks, data = NULL, blocks_on_rows = TRUE, ci = 0.95,
  alternative = "greater", iterations = 200, verbose = TRUE, ...
)
x |
Can be one of:
|
groups , blocks
|
A factor vector giving the group / block for the
corresponding elements of |
data |
An optional data frame containing the variables. |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
iterations |
The number of bootstrap replicates for computing confidence
intervals. Only applies when |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods. When |
blocks_on_rows |
Are blocks on rows ( |
The rank epsilon squared and rank eta squared are appropriate for non-parametric tests of differences between 2 or more samples (a rank-based ANOVA). See stats::kruskal.test. Values range from 0 to 1, with larger values indicating larger differences between groups.

Kendall's W is appropriate for non-parametric tests of differences between 2 or more dependent samples (a rank-based repeated-measures ANOVA), where each group (e.g., experimental condition) was measured for each block (e.g., subject). This measure is also commonly used as a measure of reliability of the rankings of the groups between raters (blocks). See stats::friedman.test. Values range from 0 to 1, with larger values indicating larger differences between groups / higher agreement between raters.
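Rank epsilon squared can be computed by hand from the Kruskal-Wallis H statistic, assuming the Tomczak & Tomczak (2014) definition ε² = H / ((n² − 1) / (n + 1)); a sketch:

kt <- kruskal.test(mpg ~ cyl, data = mtcars)
H <- unname(kt$statistic)
n <- nrow(mtcars)

H / ((n^2 - 1) / (n + 1)) # rank epsilon squared
# compare: rank_epsilon_squared(mpg ~ cyl, data = mtcars)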
A data frame with the effect size and its CI.
Confidence intervals for rank epsilon squared, rank eta squared, and Kendall's W are estimated using the bootstrap method (using the {boot} package).
When tied values occur, they are each given the average of the ranks that would have been given had no ties occurred. This results in an effect size of reduced magnitude. A correction has been applied for Kendall's W.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100 (1 - )% confidence
interval contains all of the parameter values for which p >
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provided clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
Some effect sizes are directionless: they do have a minimum value that would be interpreted as "no effect", but they cannot cross it. For example, a null value of Kendall's W is 0, indicating no difference between groups, but W can never have a negative value. The same goes for U2 and Overlap: the null value of U2 is 0.5, but it can never be smaller than 0.5; an Overlap of 1 means "full overlap" (no difference), but it cannot be larger than 1.
When bootstrapping CIs for such effect sizes, the bounds of the CIs will
never cross (and often will never cover) the null. Therefore, these CIs
should not be used for statistical inference.
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Kendall, M.G. (1948) Rank correlation methods. London: Griffin.
Tomczak, M., & Tomczak, E. (2014). The need to report effect size estimates revisited. An overview of some recommended measures of effect size. Trends in sport sciences, 1(21), 19-25.
Other rank-based effect sizes:
p_superiority()
,
rank_biserial()
Other effect sizes for ANOVAs:
eta_squared()
# Rank Eta/Epsilon Squared
# ========================
rank_eta_squared(mpg ~ cyl, data = mtcars)
rank_epsilon_squared(mpg ~ cyl, data = mtcars)

# Kendall's W
# ===========
dat <- data.frame(
  cond = c("A", "B", "A", "B", "A", "B"),
  ID = c("L", "L", "M", "M", "H", "H"),
  y = c(44.56, 28.22, 24, 28.78, 24.56, 18.78)
)

(W <- kendalls_w(y ~ cond | ID, data = dat, verbose = FALSE))

interpret_kendalls_w(0.11)
interpret(W, rules = "landis1977")
Fictional Results from a Workers' Randomized Control Trial
A 2-by-2 table, with a column for each group and a row for each diagnosis.
data("RCT_table") RCT_table #> Group #> Diagnosis Treatment Control #> Sick 71 30 #> Recovered 50 100
Other effect size datasets:
Music_preferences
,
Music_preferences2
,
Smoking_FASD
,
food_class
,
hardlyworking
,
rouder2016
,
screening_test
Compute effect size indices for standardized mean differences in repeated
measures data. Pair with any reported stats::t.test(paired = TRUE)
.
In a repeated-measures design, the same subjects are measured in multiple conditions or time points. Unlike the case of independent groups, there are multiple sources of variation that can be used to standardize the differences between the means of the conditions / times.
repeated_measures_d(
  x, y, data = NULL, mu = 0,
  method = c("rm", "av", "z", "b", "d", "r"),
  adjust = TRUE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ...
)

rm_d(
  x, y, data = NULL, mu = 0,
  method = c("rm", "av", "z", "b", "d", "r"),
  adjust = TRUE, ci = 0.95, alternative = "two.sided", verbose = TRUE, ...
)
x , y
|
Paired numeric vectors, or names of ones in data. |
data |
An optional data frame containing the variables. |
mu |
a number indicating the true value of the mean (or difference in means if you are performing a two sample test). |
method |
Method of repeated measures standardized differences. See details. |
adjust |
Apply Hedges' small-sample bias correction? See |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods. When |
A data frame with the effect size and their CIs (CI_low
and
CI_high
).
Unlike Cohen's d for independent groups, where standardization is naturally done by the (pooled) population standard deviation (cf. Glass' delta), when measures across two conditions are dependent, there are many more options for what error term to standardize by. Additionally, some options allow for data to be replicated (many measurements per condition per individual), while others require a single observation per condition per individual (i.e., paired data; so replications are aggregated).

(It should be noted that all of these have awful and confusing notations.)
Standardize by...

Difference Score Variance (method = "z"): d_z (requires paired data). This is akin to computing difference scores for each individual and then computing a one-sample Cohen's d (Cohen, 1988, pp. 48; see examples).

Within-Subject Variance (method = "rm"): d_rm (requires paired data). Cohen suggested adjusting d_z to estimate the "standard" between-subjects d by a factor of sqrt(2(1 - r)), where r is the Pearson correlation between the paired measures (Cohen, 1988, pp. 48).

Control Variance (method = "b"; aka Becker's d): d_b (requires paired data). Standardized by the variance of the control condition (or, in a pre-/post-treatment setting, the pre-treatment condition). This is akin to Glass' delta (glass_delta()) (Becker, 1988). Note that the control is taken here as the second condition (y).

Average Variance (method = "av"): d_av (requires paired data). Instead of standardizing by the variance of the control (or pre) condition, Cumming suggests standardizing by the average variance of the two paired conditions (Cumming, 2013, pp. 291).

All Variance (method = "d"): just d. This is the same as computing a standard independent-groups Cohen's d (Cohen, 1988). Note that CIs do account for the dependence, and so are typically narrower (see examples).

Residual Variance (method = "r"): d_r (requires data with replications). Divide by the pooled variance after all individual differences have been partialled out (i.e., the residual / level-1 variance in an ANOVA or MLM setting). In between-subjects designs where each subject contributes a single response, this is equivalent to the classical Cohen's d. Priors in the BayesFactor package are defined on this scale (Rouder et al., 2012).

Note that for paired data, when the two conditions have equal variance, d_av, d_rm, and d_b are equal to d.
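For example, the d_z ("z") method reduces to a one-sample standardization of the difference scores; a minimal hand check (a sketch, without the small-sample adjustment):

data("sleep")
d_i <- sleep$extra[sleep$group == 1] - sleep$extra[sleep$group == 2]

mean(d_i) / sd(d_i) # d_z: mean difference over SD of the differences
# compare:
# repeated_measures_d(extra ~ group | ID, data = sleep,
#                     method = "z", adjust = FALSE)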
Confidence intervals are estimated using the standard normal parametric method (see Algina & Keselman, 2003; Becker, 1988; Cooper et al., 2009; Hedges & Olkin, 1985; Pustejovsky et al., 2014).
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100 (1 - )% confidence
interval contains all of the parameter values for which p >
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provided clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
rm_d()
is an alias for repeated_measures_d()
.
Algina, J., & Keselman, H. J. (2003). Approximate confidence intervals for effect sizes. Educational and Psychological Measurement, 63(4), 537-553.
Becker, B. J. (1988). Synthesizing standardized mean‐change measures. British Journal of Mathematical and Statistical Psychology, 41(2), 257-278.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Cooper, H., Hedges, L., & Valentine, J. (2009). Handbook of research synthesis and meta-analysis. Russell Sage Foundation, New York.
Cumming, G. (2013). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. Routledge.
Hedges, L. V. & Olkin, I. (1985). Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Pustejovsky, J. E., Hedges, L. V., & Shadish, W. R. (2014). Design-comparable effect sizes in multiple baseline designs: A general modeling framework. Journal of Educational and Behavioral Statistics, 39(5), 368-393.
Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of mathematical psychology, 56(5), 356-374.
cohens_d()
, and lmeInfo::g_mlm()
and emmeans::effsize()
for
more flexible methods.
Other standardized differences:
cohens_d()
,
mahalanobis_d()
,
means_ratio()
,
p_superiority()
,
rank_biserial()
# Paired data -------
data("sleep")
sleep2 <- reshape(sleep,
  direction = "wide",
  idvar = "ID", timevar = "group"
)

repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2)

# Same as:
# repeated_measures_d(sleep$extra[sleep$group == 1],
#                     sleep$extra[sleep$group == 2])
# repeated_measures_d(extra ~ group | ID, data = sleep)

# More options:
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, mu = -1)
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, alternative = "less")

# Other methods
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "av")
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "b")
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "d")
repeated_measures_d(Pair(extra.1, extra.2) ~ 1, data = sleep2, method = "z", adjust = FALSE)

# d_z is the same as Cohen's d for one sample (of individual differences):
cohens_d(extra.1 - extra.2 ~ 1, data = sleep2)

# Repetition data -----------
data("rouder2016")

# For rm, av, z, b, data is aggregated
repeated_measures_d(rt ~ cond | id, data = rouder2016)
# same as:
rouder2016_wide <- tapply(rouder2016[["rt"]], rouder2016[1:2], mean)
repeated_measures_d(rouder2016_wide[, 1], rouder2016_wide[, 2])

# For r or d, data is not aggregated:
repeated_measures_d(rt ~ cond | id, data = rouder2016, method = "r")
repeated_measures_d(rt ~ cond | id, data = rouder2016, method = "d", adjust = FALSE)

# d is the same as Cohen's d for two independent groups:
cohens_d(rt ~ cond, data = rouder2016, ci = NULL)
A dataset "with 25 people each observing 50 trials in 2 conditions",
published as effectSizePuzzler.txt
by Jeff Rouder on March 24, 2016
(http://jeffrouder.blogspot.com/2016/03/the-effect-size-puzzler.html).
The data is used in examples and tests of rm_d()
.
A data frame with 2500 rows and 3 variables:
id: participant (1...25)
cond: condition (1, 2)
rt: response time in seconds
data("rouder2016") head(rouder2016, n = 5) #> id cond rt #> 1 1 1 0.560 #> 2 1 1 0.930 #> 3 1 1 0.795 #> 4 1 1 0.615 #> 5 1 1 1.028
Other effect size datasets:
Music_preferences
,
Music_preferences2
,
RCT_table
,
Smoking_FASD
,
food_class
,
hardlyworking
,
screening_test
Create a container for interpretation rules of thumb. Usually used in conjunction with interpret.
rules(values, labels = NULL, name = NULL, right = TRUE)

is.rules(x)
values |
Vector of reference values (edges defining categories or critical values). |
labels |
Labels associated with each category. If |
name |
Name of the set of rules (will be printed). |
right |
logical, for threshold-type rules, indicating if the thresholds themselves should be included in the interval to the right (lower values) or in the interval to the left (higher values). |
x |
An arbitrary R object. |
rules(c(0.05), c("significant", "not significant"), right = FALSE)

rules(c(0.2, 0.5, 0.8), c("small", "medium", "large"))

rules(c("small" = 0.2, "medium" = 0.5), name = "Cohen's Rules")
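A rules object is typically passed to interpret(); a small sketch (the rule set here is made up, with three edges defining four categories):

my_rules <- rules(c(0.2, 0.5, 0.8),
  c("very small", "small", "medium", "large"),
  name = "made-up rules"
)

interpret(0.63, rules = my_rules)
# interpret(cohens_d(mpg ~ am, data = mtcars), rules = my_rules) # also works on tables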
A sample (simulated) dataset, used in tests and some examples.
A data frame with 1600 rows and 3 variables:
Diagnosis: ground truth
Test1: results given by the 1st test
Test2: results given by the 2nd test
data("screening_test") head(screening_test, n = 5) #> Diagnosis Test1 Test2 #> 1 Neg "Neg" "Neg" #> 2 Neg "Neg" "Neg" #> 3 Neg "Neg" "Neg" #> 4 Neg "Neg" "Neg" #> 5 Neg "Neg" "Neg"
Other effect size datasets:
Music_preferences
,
Music_preferences2
,
RCT_table
,
Smoking_FASD
,
food_class
,
hardlyworking
,
rouder2016
The Pooled Standard Deviation is a weighted average of standard deviations for two or more groups, assumed to have equal variance. It represents the common deviation among the groups, around each of their respective means.
sd_pooled(x, y = NULL, data = NULL, verbose = TRUE, ...) mad_pooled(x, y = NULL, data = NULL, constant = 1.4826, verbose = TRUE, ...) cov_pooled(x, y = NULL, data = NULL, verbose = TRUE, ...)
sd_pooled(x, y = NULL, data = NULL, verbose = TRUE, ...) mad_pooled(x, y = NULL, data = NULL, constant = 1.4826, verbose = TRUE, ...) cov_pooled(x, y = NULL, data = NULL, verbose = TRUE, ...)
x , y
|
A numeric vector, or a character name of one in data. |
data |
An optional data frame containing the variables. |
verbose |
Toggle warnings and messages on or off. |
... |
Arguments passed to or from other methods. When |
constant |
scale factor. |
The standard version is calculated as:

$$SD_{pooled} = \sqrt{\frac{\sum_i (x_i - \bar{x})^2 + \sum_j (y_j - \bar{y})^2}{n_x + n_y - 2}}$$

The robust version is calculated as:

$$MAD_{pooled} = \text{constant} \times \text{Median}\big(\big|\{x_i - \text{Median}(x),\ y_j - \text{Median}(y)\}\big|\big)$$
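A hand computation matching the standard (df-weighted) formula above (a sketch):

x <- mtcars$mpg[mtcars$am == 0]
y <- mtcars$mpg[mtcars$am == 1]

sqrt((sum((x - mean(x))^2) + sum((y - mean(y))^2)) /
  (length(x) + length(y) - 2))
# compare: sd_pooled(mpg ~ am, data = mtcars)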
Numeric, the pooled standard deviation. For cov_pooled()
a matrix.
sd_pooled(mpg ~ am, data = mtcars)
mad_pooled(mtcars$mpg, factor(mtcars$am))
cov_pooled(mpg + hp + cyl ~ am, data = mtcars)
Fictional data.
A 1-by-3 table, with a column for each diagnosis.
data("Smoking_FASD") Smoking_FASD #> FAS PFAS TD #> 17 11 640
Other effect size datasets:
Music_preferences
,
Music_preferences2
,
RCT_table
,
food_class
,
hardlyworking
,
rouder2016
,
screening_test
These functions are convenience functions to convert t, z and F test statistics to Cohen's d and partial r. These are useful in cases where the data required to compute these are not easily available or their computation is not straightforward (e.g., in linear mixed models, contrasts, etc.).

See the Effect Size from Test Statistics vignette.
t_to_d(t, df_error, paired = FALSE, ci = 0.95, alternative = "two.sided", ...)

z_to_d(z, n, paired = FALSE, ci = 0.95, alternative = "two.sided", ...)

F_to_d(f, df, df_error, paired = FALSE, ci = 0.95, alternative = "two.sided", ...)

t_to_r(t, df_error, ci = 0.95, alternative = "two.sided", ...)

z_to_r(z, n, ci = 0.95, alternative = "two.sided", ...)

F_to_r(f, df, df_error, ci = 0.95, alternative = "two.sided", ...)
t , f , z
|
The t, the F or the z statistics. |
paired |
Should the estimate account for the t-value testing a difference between dependent means? |
ci |
Confidence Interval (CI) level |
alternative |
a character string specifying the alternative hypothesis;
Controls the type of CI returned: |
... |
Arguments passed to or from other methods. |
n |
The number of observations (the sample size). |
df , df_error
|
Degrees of freedom of numerator or of the error estimate (i.e., the residuals). |
These functions use the following formulae to approximate r and d:

$$r_{partial} = \frac{t}{\sqrt{t^2 + df_{error}}}$$

$$d = \frac{2t}{\sqrt{df_{error}}}$$

(The z- and F-based versions substitute the corresponding statistic analogously, with n in place of df_error for z, and t = sqrt(F) for F tests with 1 numerator df.)
The resulting d
effect size is an approximation to Cohen's d, and
assumes two equal group sizes. When possible, it is advised to directly
estimate Cohen's d, with cohens_d()
, emmeans::eff_size()
, or similar
functions.
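A quick hand check of these approximations against a t test (a sketch):

res <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)
t_val <- unname(res$statistic)
df <- unname(res$parameter)

2 * t_val / sqrt(df)       # approximate Cohen's d
t_val / sqrt(t_val^2 + df) # partial r
# compare: t_to_d(t_val, df); t_to_r(t_val, df)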
A data frame with the effect size(s)(r
or d
), and their CIs
(CI_low
and CI_high
).
Unless stated otherwise, confidence (compatibility) intervals (CIs) are estimated using the noncentrality parameter method (also called the "pivot method"). This method finds the noncentrality parameter ("ncp") of a noncentral t, F, or χ² distribution that places the observed t, F, or χ² test statistic at the desired probability point of the distribution. For example, if the observed t statistic is 2.0, with 50 degrees of freedom, for which noncentral t distribution is t = 2.0 the .025 quantile? (Answer: approximately the noncentral t distribution with ncp ≈ 4.) After estimating these confidence bounds on the ncp, they are converted into the effect size metric to obtain a confidence interval for the effect size (Steiger, 2004).
For additional details on estimation and troubleshooting, see effectsize_CIs.
"Confidence intervals on measures of effect size convey all the information
in a hypothesis test, and more." (Steiger, 2004). Confidence (compatibility)
intervals and p values are complementary summaries of parameter uncertainty
given the observed data. A dichotomous hypothesis test could be performed
with either a CI or a p value. The 100 (1 - )% confidence
interval contains all of the parameter values for which p >
for the current data and model. For example, a 95% confidence interval
contains all of the values for which p > .05.
Note that a confidence interval including 0 does not indicate that the null
(no effect) is true. Rather, it suggests that the observed data together with
the model and its assumptions combined do not provided clear evidence against
a parameter value of 0 (same as with any other value in the interval), with
the level of this evidence defined by the chosen level (Rafi &
Greenland, 2020; Schweder & Hjort, 2016; Xie & Singh, 2013). To infer no
effect, additional judgments about what parameter values are "close enough"
to 0 to be negligible are needed ("equivalence testing"; Bauer & Kiesser,
1996).
see
The see
package contains relevant plotting functions. See the plotting vignette in the see
package.
Friedman, H. (1982). Simplified determinations of statistical power, magnitude of effect and research sample sizes. Educational and Psychological Measurement, 42(2), 521-526. doi:10.1177/001316448204200214
Wolf, F. M. (1986). Meta-analysis: Quantitative methods for research synthesis (Vol. 59). Sage.
Rosenthal, R. (1994) Parametric measures of effect size. In H. Cooper and L.V. Hedges (Eds.). The handbook of research synthesis. New York: Russell Sage Foundation.
Steiger, J. H. (2004). Beyond the F test: Effect size confidence intervals and tests of close fit in the analysis of variance and contrast analysis. Psychological Methods, 9, 164-182.
Cumming, G., & Finch, S. (2001). A primer on the understanding, use, and calculation of confidence intervals that are based on central and noncentral distributions. Educational and Psychological Measurement, 61(4), 532-574.
Other effect size from test statistic:
F_to_eta2()
,
chisq_to_phi()
## t Tests
res <- t.test(1:10, y = c(7:20), var.equal = TRUE)
t_to_d(t = res$statistic, res$parameter)
t_to_r(t = res$statistic, res$parameter)
t_to_r(t = res$statistic, res$parameter, alternative = "less")

res <- with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))
t_to_d(t = res$statistic, res$parameter, paired = TRUE)
t_to_r(t = res$statistic, res$parameter)
t_to_r(t = res$statistic, res$parameter, alternative = "greater")

## Linear Regression
model <- lm(rating ~ complaints + critical, data = attitude)
(param_tab <- parameters::model_parameters(model))

(rs <- t_to_r(param_tab$t[2:3], param_tab$df_error[2:3]))

# How does this compare to actual partial correlations?
correlation::correlation(attitude,
  select = "rating",
  select2 = c("complaints", "critical"),
  partial = TRUE
)
Enables a conversion between different indices of effect size, such as Cohen's w to פ (Fei), and Cramer's V to Tschuprow's T.
w_to_fei(w, p)
w_to_v(w, nrow, ncol)
w_to_t(w, nrow, ncol)
w_to_c(w)

fei_to_w(fei, p)
v_to_w(v, nrow, ncol)
t_to_w(t, nrow, ncol)
c_to_w(c)

v_to_t(v, nrow, ncol)
t_to_v(t, nrow, ncol)
w , c , v , t , fei
|
Effect size to be converted |
p |
Vector of expected values. See |
nrow , ncol
|
The number of rows/columns in the contingency table. |
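The w-to-Fei conversion follows from the maximal χ² attainable under a given expected distribution; assuming the Ben-Shachar et al. (2023) relation Fei = w / sqrt(1/min(p) − 1), a hand check (a sketch, reusing the values from the examples below):

w <- 0.11
p <- c(0.015, 0.010, 0.975)

w / sqrt(1 / min(p) - 1)
# compare: w_to_fei(0.11, p = c(0.015, 0.010, 0.975))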
Ben-Shachar, M.S., Patil, I., Thériault, R., Wiernik, B.M., Lüdecke, D. (2023). Phi, Fei, Fo, Fum: Effect Sizes for Categorical Data That Use the Chi‑Squared Statistic. Mathematics, 11, 1982. doi:10.3390/math11091982
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd Ed.). New York: Routledge.
Other convert between effect sizes:
d_to_r()
,
diff_to_cles
,
eta2_to_f2()
,
odds_to_probs()
,
oddsratio_to_riskratio()
library(effectsize)

## 2D tables
## ---------
data("Music_preferences2")
Music_preferences2

cramers_v(Music_preferences2, adjust = FALSE)

v_to_t(0.80, 3, 4)
tschuprows_t(Music_preferences2)

## Goodness of fit
## ---------------
data("Smoking_FASD")
Smoking_FASD

cohens_w(Smoking_FASD, p = c(0.015, 0.010, 0.975))

w_to_fei(0.11, p = c(0.015, 0.010, 0.975))
fei(Smoking_FASD, p = c(0.015, 0.010, 0.975))

## Power analysis
## --------------
# See https://osf.io/cg64s/
p0 <- c(0.35, 0.65)
Fei <- 0.3

pwr::pwr.chisq.test(
  w = fei_to_w(Fei, p = p0),
  df = length(p0) - 1,
  sig.level = 0.01, power = 0.85
)