Title: | Linear Regression Stability to Significance Reversal |
---|---|
Description: | Tests linear regressions for significance reversal through leave-one(multiple)-out. |
Authors: | Andrej-Nikolai Spiess [aut, cre], Michal Burdukiewicz [aut], Stefan Roediger [aut] |
Maintainer: | Andrej-Nikolai Spiess <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2 |
Built: | 2025-02-04 05:57:42 UTC |
Source: | https://github.com/anspiess/reverser |
Nonparametric and parametric bootstrap methods (sampling cases, residuals, or from a parametric distribution, with replacement) for parameter estimation and confidence intervals of a linear model.
bootLM(model, type = c("cases", "residuals", "residuals2", "parametric"), R = 10000, alpha = 0.05, ret.models = FALSE)
model | an `lm` object. |
type | what to bootstrap. See 'Details'. |
R | number of bootstrap samples. |
alpha | the $\alpha$-level for the confidence interval. |
ret.models | logical. If `TRUE`, all `R` fitted models are returned as a list. |
If type = "cases"
, for all () datapoints, linear models are created by sampling
R
times - with replacement - from and building models
. This is also known as the .632-bootstrap, because the samples will, on average, contain
unique elements.
If
type = "residuals"
, for all residuals (), linear models are created by sampling
R
times - with replacement - from and building models
. If
type = "residuals2"
is selected, scaled and centered residuals according to Davison & Hinkley are used. In the
"parametric"
bootstrap, values drawn from a normal distribution
, where
, are added to the fitted values, and linear models are created
.
Parameter estimates are obtained from each sampling, from which the average
and standard error
is calculated as well as a quantile based confidence interval. p-values are calculated through inversion of the confidence interval.
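To make the case-bootstrap mechanics concrete, here is a minimal base-R sketch; `caseBoot` is a hypothetical name for this illustration and not the actual `bootLM` implementation:

```r
## Minimal sketch of a nonparametric case bootstrap for lm;
## an illustration, not the actual bootLM code.
caseBoot <- function(model, R = 1000, alpha = 0.05) {
  DATA <- model.frame(model)
  n <- nrow(DATA)
  ## resample rows (cases) with replacement and refit
  coefs <- t(replicate(R, {
    idx <- sample(n, n, replace = TRUE)
    coef(update(model, data = DATA[idx, ]))
  }))
  ## average, standard error and quantile-based CI per coefficient
  data.frame(
    Estimate = colMeans(coefs),
    SE       = apply(coefs, 2, sd),
    Lower    = apply(coefs, 2, quantile, alpha / 2),
    Upper    = apply(coefs, 2, quantile, 1 - alpha / 2)
  )
}

set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
caseBoot(lm(b ~ a), R = 200)
```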
A dataframe containing the estimated coefficients, their standard errors, lower and upper confidence values and p-values. If `ret.models = TRUE`, a list with all `R` models is returned instead.
Andrej-Nikolai Spiess
An Introduction to the Bootstrap.
Efron B, Tibshirani R.
Chapman & Hall (1993).
The Bootstrap and Edgeworth Expansion.
Hall P.
Springer, New York (1992).
Modern Statistics with R.
Thulin M.
Eos Chasma Press, Uppsala (2021).
Bootstrap methods and their application.
Davison AC, Hinkley DV.
Cambridge University Press (1997).
## Example with single influencer (#18) and insignificant model (p = 0.115),
## using case bootstrap.
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM <- lm(b ~ a)
bootLM(LM, R = 100)
## Using residuals bootstrap.
bootLM(LM, R = 100, type = "residuals")
Two different plot types that visualize p-value influencers.
1. inflPlot: plots the linear regression, marks the reverser(s) in darkred and displays trend lines for the full and leave-reversers-out data sets (black and darkred, respectively).
2. pvalPlot: plots the p-values for each leave-one-out data point, displaying the (log) p-values as an index plot with reverser points in darkred, together with the $\alpha$-border as defined in `lmInfl` and the original model's p-value.
inflPlot(infl, measure, ...)
pvalPlot(infl, ...)
infl | an object obtained from `lmInfl`. |
measure | which influence measure to use, see 'Details'. |
... | other plotting parameters. |
The influence measures to use are those listed in `lmInfl`, with the following syntax: `"dfb.Slope"`, `"dffit"`, `"cov.r"`, `"cook.d"`, `"hat"`, `"sR"`, `"hadi"`, `"cdr"`, `"Si"`.
The corresponding plot.
Andrej-Nikolai Spiess
Regression diagnostics: Identifying influential data and sources of collinearity.
Belsley DA, Kuh E, Welsch RE.
John Wiley, New York (2004).
Applied Regression Analysis: A Research Tool.
Rawlings JO, Pantula SG, Dickey DA.
Springer; 2nd Corrected ed. 1998. Corr. 2nd printing 2001.
Applied Regression Analysis and Generalized Linear Models.
Fox J.
SAGE Publishing, 3rd ed, 2016.
Residuals and Influence in Regression.
Cook RD & Weisberg S.
Chapman & Hall, 1st ed, New York, USA (1982).
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM1 <- lm(b ~ a)
res1 <- lmInfl(LM1)
inflPlot(res1)
pvalPlot(res1)
Jackknife (Leave-One-Out) method for parameter estimation and confidence interval of a linear model, according to Quenouille (1956).
jackLM(model, alpha = 0.05)
model | an `lm` object. |
alpha | the $\alpha$-level for the confidence interval. |
For all ($n$) datapoints, a linear model is created by leaving out each entry successively, yielding $\hat{\beta}_{-i}$. Pseudovalues from the obtained and original coefficients are then created, $\hat{\beta}_i^{\ast} = n\hat{\beta} - (n-1)\hat{\beta}_{-i}$, from which the average $\bar{\hat{\beta}}^{\ast}$ and standard error $\mathrm{s.e.}(\hat{\beta}^{\ast})$ are calculated to obtain the classical confidence interval $\bar{\hat{\beta}}^{\ast} \pm t_{1-\alpha/2,\, n-1} \cdot \mathrm{s.e.}(\hat{\beta}^{\ast})$.
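A minimal base-R sketch of the pseudovalue computation described above; `jackCoef` is a hypothetical name for this illustration, not the actual `jackLM` code:

```r
## Minimal sketch of the jackknife with pseudovalues;
## an illustration, not the actual jackLM code.
jackCoef <- function(model, alpha = 0.05) {
  DATA <- model.frame(model)
  n <- nrow(DATA)
  beta <- coef(model)
  ## leave-one-out estimates and pseudovalues: n*beta - (n-1)*beta_(-i)
  pseudo <- t(sapply(seq_len(n), function(i) {
    n * beta - (n - 1) * coef(update(model, data = DATA[-i, ]))
  }))
  est <- colMeans(pseudo)
  se  <- apply(pseudo, 2, sd) / sqrt(n)
  tcrit <- qt(1 - alpha / 2, n - 1)
  data.frame(Estimate = est, SE = se,
             Lower = est - tcrit * se, Upper = est + tcrit * se)
}

set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
jackCoef(lm(b ~ a))
```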
A dataframe containing the estimated coefficients, their standard errors, lower and upper confidence values and p-values.
Andrej-Nikolai Spiess
Notes on bias in estimation.
Quenouille MH.
Biometrika, 43, 1956, 353-360.
## Example with single influencer (#18) and insignificant model (p = 0.115).
## Jackknife estimates are robust w.r.t. outlier #18.
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM1 <- lm(b ~ a)
jackLM(LM1)
Takes user-supplied x/y values, or x and random values, and transforms these so as to deliver linear regressions (with potential replicates) with either
1) exact slope $\beta_1$ and intercept $\beta_0$,
2) exact p-value and intercept $\beta_0$, or
3) exact $R^2$ and intercept $\beta_0$.
Intended for testing and education, not for cheating! ;-)
lmExact(x = 1:20, y = NULL, ny = 1, intercept = 0, slope = 0.1, error = 0.1, seed = NULL, pval = NULL, rsq = NULL, plot = TRUE, verbose = FALSE, ...)
x | the predictor values. |
y | `NULL`. Optional user-supplied response values; if `NULL`, they are generated from `intercept`, `slope` and `error`. |
ny | the number of replicate response values per predictor value. |
intercept | the desired intercept $\beta_0$. |
slope | the desired slope $\beta_1$. |
error | if a single value, the standard deviation $\sigma$ of the added normal noise; alternatively, a user-supplied vector or matrix of error values, e.g. for heteroscedastic noise. |
seed | optional. The random generator seed for reproducibility. |
pval | the desired p-value of the slope. |
rsq | the desired $R^2$. |
plot | logical. If `TRUE`, the data and regression line are plotted. |
verbose | logical. If `TRUE`, additional information is printed to the console. |
... | other parameters to be passed to downstream methods. |
For case 1), the `error` values are added to the exact response values $\beta_0 + \beta_1 x$, the linear model $y = \hat{\beta}_0 + \hat{\beta}_1 x + \varepsilon$ is fit, and the residuals $\varepsilon$ are re-added to the exact values.
For case 2), the same as in 1) is conducted; however, the slope delivering the desired p-value is found by an optimization algorithm.
Finally, for case 3), a QR reconstruction, rescaling and refitting is conducted, using the code found under 'References'.
If `y` is supplied, changes in slope, intercept and p-value will deliver the same residuals as the linear regression through `x` and `y`. A different `rsq`, however, will change the response value structure.
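To illustrate case 3), here is a minimal sketch of generating a response with an exact $R^2$ by orthogonalization and rescaling, along the lines of the code linked under 'References'; this is an illustration, not the actual `lmExact` implementation:

```r
## Sketch of case 3): generate y with an exact R-squared of rho^2;
## an illustration, not the actual lmExact code.
set.seed(123)
x   <- 1:20
rho <- sqrt(0.85)                   # desired R-squared = rho^2 = 0.85
z   <- rnorm(length(x))             # random start vector
e   <- residuals(lm(z ~ x))         # component of z orthogonal to x
y   <- drop(rho * scale(x) + sqrt(1 - rho^2) * scale(e))
summary(lm(y ~ x))$r.squared        # => 0.85 (exact, up to rounding)
```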
A list with the following items:
lm |
the linear model of class `lm`. |
x |
the predictor values. |
y |
the (random) response values. |
summary |
the model summary for quick checking of obtained parameters. |
Using both x
and y
will give a linear regression with the desired parameter values when refitted.
Andrej-Nikolai Spiess
For method 3):
http://stats.stackexchange.com/questions/15011/generate-a-random-variable-with-a-defined-correlation-to-an-existing-variable.
## No replicates, intercept = 3, slope = 2, sigma = 2, n = 20.
res1 <- lmExact(x = 1:20, ny = 1, intercept = 3, slope = 2, error = 2)
## Same as above, but with 3 replicates, sigma = 1, n = 20.
res2 <- lmExact(x = 1:20, ny = 3, intercept = 3, slope = 2, error = 1)
## No replicates, intercept = 2 and p-value = 0.025, sigma = 3, n = 50.
## => slope = 0.063
res3 <- lmExact(x = 1:50, ny = 1, intercept = 2, pval = 0.025, error = 3)
## 5 replicates, intercept = 1, R-square = 0.85, sigma = 2, n = 10.
## => slope = 0.117
res4 <- lmExact(x = 1:10, ny = 5, intercept = 1, rsq = 0.85, error = 2)
## Heteroscedastic (magnitude-dependent) noise.
error <- sapply(1:20, function(x) rnorm(3, 0, x/10))
res5 <- lmExact(x = 1:20, ny = 3, intercept = 1, slope = 0.2, error = error)
## Supply own x/y values; residuals are similar to those of an
## initial linear regression.
X <- c(1.05, 3, 5.2, 7.5, 10.2, 11.7)
set.seed(123)
Y <- 0.5 + 2 * X + rnorm(6, 0, 2)
res6 <- lmExact(x = X, y = Y, intercept = 1, slope = 0.2)
all.equal(residuals(lm(Y ~ X)), residuals(res6$lm))
This function calculates leave-one-out (LOO) p-values for all data points and identifies those resulting in "significance reversal", i.e. the p-value of the model's slope traversing the user-defined $\alpha$-level. It also extends the classical influence measures from `influence.measures` with a few newer ones (e.g., 'Hadi's measure', 'Coefficient of determination ratio' and 'Pena's Si'), within an output format where each outlier is marked when exceeding the measure's specific threshold, as defined in the literature. Belsley, Kuh & Welsch's dfstat criterion is also included.
lmInfl(model, alpha = 0.05, cutoff = c("BKW", "R"), verbose = TRUE, ...)
model | the linear model of class `lm`. |
alpha | the $\alpha$-level for significance. |
cutoff | use the cutoff values from Belsley, Kuh & Welsch ("BKW") or from R's `influence.measures` ("R"). See 'Details'. |
verbose | logical. If `TRUE`, results are displayed on the console. |
... | other arguments to be passed to downstream methods. |
The algorithm
1) calculates the p-value of the full model (all points),
2) calculates a LOO-p-value for each point removed,
3) checks for significance reversal in all data points and
4) returns all models as well as the classical `influence.measures`,
with LOO p-values, p-values, slopes and standard errors attached.
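The following minimal base-R sketch illustrates steps 1) to 3); `looPval` is a hypothetical name for this illustration, and the actual `lmInfl` implementation may differ:

```r
## Sketch of the LOO p-value scan for the slope;
## an illustration, not the actual lmInfl code.
looPval <- function(model, alpha = 0.05) {
  DATA <- model.frame(model)
  p0 <- summary(model)$coefficients[2, 4]          # full-model slope p-value
  pLOO <- sapply(seq_len(nrow(DATA)), function(i) {
    summary(update(model, data = DATA[-i, ]))$coefficients[2, 4]
  })
  ## significance reversal: alpha lies between full and LOO p-value
  reversers <- which((p0 <= alpha & pLOO > alpha) | (p0 > alpha & pLOO <= alpha))
  list(origP = p0, looP = pLOO, sel = reversers)
}

set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
looPval(lm(b ~ a))$sel    # => 18, as in Example #2 below
```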
The idea of p-value influencers was first introduced by Belsley, Kuh & Welsch, and described as an influence measure pertaining directly to the change in t-statistics, that will "show whether the conclusions of hypothesis testing would be affected", termed dfstat in [1, 2, 3] or dfstud in [4]:

$$\mathrm{dfstat}_{ij} = \frac{\hat{\beta}_j - \hat{\beta}_{j(i)}}{s_{(i)}\sqrt{(X^{\top}X)^{-1}_{jj}}}$$

where $\hat{\beta}_j$ is the j-th estimate, $s$ is the residual standard error, $X$ is the design matrix and $(i)$ denotes the i-th observation deleted.
dfstat, which for the regression's slope $\beta_1$ is the difference of t-statistics, $\Delta t = t - t_{(i)}$, is inextricably linked to the change in p-value, $\Delta p$, calculated from

$$\Delta p = p - p_{(i)} = 2\left(1 - F_t(|t|,\ n-2)\right) - 2\left(1 - F_t(|t_{(i)}|,\ n-3)\right)$$

where $F_t$ is the Student's t cumulative distribution function with $n-2$ (full model) or $n-3$ (one observation deleted) degrees of freedom, and where significance reversal is attained when $\alpha$ lies between $p$ and $p_{(i)}$.
Interestingly, the seemingly mandatory check of the influence of single data points on statistical inference has fallen into oblivion: apart from [1-4], there is, to the best of our knowledge, no reference to dfstat or $\Delta p$ in the current literature on influence measures.
Cut-off values for the different influence measures are per default (`cutoff = "BKW"`) those defined in Belsley, Kuh & Welsch (1980) and additional literature, with $p$ = number of parameters and $n$ = number of observations:
dfbeta slope: $|\mathrm{dfbeta}| > 2/\sqrt{n}$ (page 28)
dffits: $|\mathrm{dffits}| > 2\sqrt{p/n}$ (page 28)
covratio: $|\mathrm{covratio} - 1| > 3p/n$ (page 23)
Cook's D: $D > 1$ (Cook & Weisberg, 1982)
leverage: $h_{ii} > 2p/n$ (page 17)
studentized residual: $|t_i| > 2$ (page 20)
If `cutoff = "R"`, the criteria from `influence.measures` are employed:
dfbeta slope: $|\mathrm{dfbeta}| > 1$
dffits: $|\mathrm{dffits}| > 3\sqrt{p/(n-p)}$
covratio: $|1 - \mathrm{covratio}| > 3p/(n-p)$
Cook's D: $D > F_{0.5}(p,\, n-p)$ (the 50% quantile of the F distribution)
leverage: $h_{ii} > 3p/n$
The influence output also includes the following more "recent" measures:
Hadi's measure (column "hadi"):

$$H_i = \frac{h_{ii}}{1 - h_{ii}} + \frac{p}{1 - h_{ii}} \cdot \frac{d_i^2}{1 - d_i^2}$$

where $h_{ii}$ are the diagonals of the hat matrix (leverages), $p = 2$ in univariate linear regression, and $d_i = e_i/\sqrt{\mathrm{SSE}}$ are the normalized residuals, with a threshold value as defined in Hadi (1992).
Coefficient of Determination Ratio (column "cdr"):

$$\mathrm{CDR}_i = \frac{R_{(i)}^2}{R^2}$$

with $R_{(i)}^2$ being the coefficient of determination without value $i$, and a threshold as given in the references below.
Pena's Si (column "Si"):

$$S_i = \frac{\mathbf{s}_i^{\top}\mathbf{s}_i}{p\,\hat{s}^2\,h_{ii}}$$

where $\mathbf{s}_i$ is the vector of the $i$-th fitted value from the original model, $\hat{y}_i$, subtracted with all fitted values after 1-deletion, $\mathbf{s}_i = (\hat{y}_i - \hat{y}_{i(1)}, \ldots, \hat{y}_i - \hat{y}_{i(n)})^{\top}$, $p$ = number of parameters, $\hat{s}^2 = \mathbf{e}^{\top}\mathbf{e}/(n-p)$, $h_{ii}$ the leverages, and $e_i$ being the residuals. In this package, a cutoff value of 0.9 is used, as the published criterion seemed too conservative. Results from this function were verified by Prof. Daniel Pena through personal communication.
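As an illustration, a minimal sketch of Hadi's measure as reconstructed above, assuming the textbook form with $d_i = e_i/\sqrt{\mathrm{SSE}}$; `hadiMeasure` is a hypothetical name and this is not necessarily the exact `lmInfl` code:

```r
## Sketch of Hadi's measure, assuming d_i = e_i/sqrt(SSE);
## an illustration, not necessarily the exact lmInfl code.
hadiMeasure <- function(model) {
  e <- residuals(model)
  h <- hatvalues(model)
  p <- length(coef(model))           # number of parameters (2 here)
  d <- e / sqrt(sum(e^2))            # normalized residuals
  h / (1 - h) + p / (1 - h) * d^2 / (1 - d^2)
}

set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
round(hadiMeasure(lm(b ~ a)), 3)
```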
A list with the following items:
origModel |
the original model with all data points. |
finalModels |
a list of final models with the influencer(s) removed. |
infl | a matrix with the original data, the classical `influence.measures` and the newer measures, with LOO p-values, slopes and standard errors attached and threshold exceedances marked. |
raw | same as `infl`, but with the raw, unmarked values. |
sel |
a vector with the influencers' indices. |
alpha | the selected $\alpha$-level. |
origP |
the original model's p-value. |
Andrej-Nikolai Spiess
For dfstat / dfstud:
Regression diagnostics: Identifying influential data and sources of collinearity.
Belsley DA, Kuh E, Welsch RE.
John Wiley, New York, USA (2004).
Econometrics, 5ed.
Baltagi B.
Springer-Verlag Berlin, Germany (2011).
Growth regressions and what the textbooks don't tell you.
Temple J.
Bull Econom Res, 52, 2000, 181-205.
Robust Regression and Outlier Detection.
Rousseeuw PJ & Leroy AM.
John Wiley & Sons, New York, NY (1987).
Hadi's measure:
A new measure of overall potential influence in linear regression.
Hadi AS.
Comp Stat & Data Anal, 14, 1992, 1-27.
Coefficient of determination ratio:
On the detection of influential outliers in linear regression analysis.
Zakaria A, Howard NK, Nkansah BK.
Am J Theor Appl Stat, 3, 2014, 100-106.
On the Coefficient of Determination Ratio for Detecting Influential Outliers in Linear Regression Analysis.
Zakaria A, Gordor BK, Nkansah BK.
Am J Theor Appl Stat, 11, 2022, 27-35.
Pena's measure:
A New Statistic for Influence in Linear Regression.
Pena D.
Technometrics, 47, 2005, 1-12.
## Example #1 with single influencer and significant model (p = 0.0089).
## Removal of #21 results in p = 0.115!
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
a <- c(a, 25); b <- c(b, 10)
LM1 <- lm(b ~ a)
lmInfl(LM1)
## Example #2 with single influencer and insignificant model (p = 0.115).
## Removal of #18 results in p = 0.0227!
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM2 <- lm(b ~ a)
lmInfl(LM2)
## Example #3 with multiple influencers and significant model (p = 0.0269).
## Removal of #2, #17, #18 or #20 results in crossing p = 0.05!
set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
LM3 <- lm(b ~ a)
lmInfl(LM3)
## Large Example #4 with top 10 influencers and significant model (p = 6.72E-8).
## Not possible to achieve a crossing of alpha with any point despite strong noise.
set.seed(123)
a <- 1:100
b <- 5 + 0.08 * a + rnorm(100, 0, 5)
LM4 <- lm(b ~ a)
lmInfl(LM4)
This function calculates p-values from a variety of methods, specifically:
1) standard linear model
2) standard linear model with highest p-influencer removed
3) robust regression with MM-estimators
4) Theil-Sen regression
5) least absolute deviations regression
6) quantile regression
7) weighted regression with isolation forest scores as inverse weights
8) bootstrap linear model, see `bootLM`
9) jackknife linear model, see `jackLM`
pcomp(x, y = NULL, R = 1000, alpha = 0.05, ...)
x | either a linear model of class `lm`, or a vector of predictor values. |
y | the optional y-values. |
R | the number of bootstrap resamples, see `bootLM`. |
alpha | the $\alpha$-level for significance. |
... |
further arguments to be passed to downstream methods. |
This function is meant to provide a swift overview on the sensitivity of the p-values to different (mostly robust) linear regression methods, which correlates to a large extent with the presence of influential / outlying data points, see 'Examples'.
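As a base-R illustration of this idea, the following sketch compares only methods 1) and 2); `pSubset` is a hypothetical name, not the actual `pcomp` code, which additionally wraps the robust methods listed above:

```r
## Sketch comparing methods 1) and 2): full-model p-value vs. p-value
## with the strongest p-influencer removed; not the actual pcomp code.
pSubset <- function(x, y) {
  LM <- lm(y ~ x)
  pFull <- summary(LM)$coefficients[2, 4]
  ## remove the point whose deletion changes the p-value the most
  pLOO <- sapply(seq_along(x), function(i)
    summary(lm(y[-i] ~ x[-i]))$coefficients[2, 4])
  pInfl <- pLOO[which.max(abs(pLOO - pFull))]
  c(lm = pFull, lm.infl.removed = pInfl)
}

set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
pSubset(a, b)
```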
A vector of p-values from the nine methods listed above, in that order.
Andrej-Nikolai Spiess
Robust Regression and Outlier Detection.
Rousseeuw PJ & Leroy AM.
John Wiley & Sons, New York, NY (1987).
A rank-invariant method of linear and polynomial regression analysis.
Theil H.
I. Nederl. Akad. Wetensch. Proc, 53, 1950, 386-392.
Estimates of the regression coefficient based on Kendall's tau.
Sen PK.
J Am Stat Assoc, 63, 1968, 1379-1389.
Least absolute deviations estimation via the EM algorithm.
Phillips RF.
Statistics and Computing, 12, 2002, 281-285.
Quantile Regression.
Koenker R.
Cambridge University Press, Cambridge, New York (2005).
Isolation-based anomaly detection.
Liu FT, Ting KM, Zhou ZH.
ACM Transactions on Knowledge Discovery from Data, 6.1, 2012, 3.
## Example with influencer (#18, see 'lmInfl').
## => a few methods indicate a significant
## downward drop of the p-value.
set.seed(123)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(20, 0, 1)
pcomp(a, b)
The data were acquired by digitization of a graph from a 2015 PNAS paper. The dataset contains three datapoints that cause significance reversal.
data(PNAS2015)
Andrej-Nikolai Spiess
## See examples in 'lmInfl' and 'lmThresh'.
LM <- lm(y ~ x, data = PNAS2015)
lmInfl(LM)
Identifies regions of a (univariate) linear model in which a future data point would result in either
a) significance reversal, or
b) any selected influence measure, as given in `crit`, exceeding its threshold value.
This is intended mainly for visual/didactical purposes.
regionInfl(model, div.x = 20, div.y = 20, grid = TRUE, pred.int = TRUE, crit = c("P", "dfb.Slope", "dffit", "cov.r", "cook.d", "hat", "hadi", "sR", "cdr", "Si"), cex.grid = 0.5, alpha = 0.05, xlim = NULL, ylim = NULL, ...)
model | the linear model of class `lm`. |
div.x | the number of grid divisions for the x-axis. |
div.y | the number of grid divisions for the y-axis. |
grid | logical. Show the grid lines on the plot or not. |
pred.int | logical. Show the 95% prediction interval on the plot or not. |
crit | the criterion to use. Either "P" for significance reversal, or one of the influence measures given in `lmInfl`. |
cex.grid | size of the grid points. |
alpha | the $\alpha$-level for significance. |
xlim | similar to `xlim` in `plot`; defines the x-range of the grid. |
ylim | similar to `ylim` in `plot`; defines the y-range of the grid. |
... | other plotting parameters. |
For a given linear model $y = \beta_0 + \beta_1 x + \varepsilon$, each $(x_i, y_j)$ pair from a `div.x` $\times$ `div.y` grid of values is added to the data, and an updated model is created. If the updated model's p-value or any of the influence measures does not exceed its published threshold, the grid point is plotted in green, otherwise in orange. If `outlier = TRUE`, a possible reverser is eliminated prior to analysis but visualized in the plot.
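A minimal base-R sketch of the grid scan for `crit = "P"`, reusing the data from the example below; this is an illustration, not the actual `regionInfl` code:

```r
## Sketch of the grid scan for significance reversal (crit = "P");
## an illustration, not the actual regionInfl code.
set.seed(7)
N <- 20
x <- runif(N, 1, 100)
y <- 0.05 * x + rnorm(N, 0, 2)
alpha <- 0.05
GRID <- expand.grid(x = seq(-20, 120, length.out = 20),   # div.x = 20
                    y = seq(-5, 10, length.out = 20))     # div.y = 20
GRID$crit <- apply(GRID, 1, function(pt) {
  ## add the candidate point, refit, and check the slope's p-value
  p <- summary(lm(c(y, pt[2]) ~ c(x, pt[1])))$coefficients[2, 4]
  as.numeric(p <= alpha)            # 1 = stays significant (green)
})
plot(GRID$x, GRID$y, col = ifelse(GRID$crit == 1, "green3", "orange"),
     pch = 16, cex = 0.5, xlab = "x", ylab = "y")
points(x, y, pch = 21, bg = "grey")
```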
A plot with the regions marked in orange or green, and the grid matrix (`grid`), including the criterion outcome as 1 (green) or 0 (orange).
Andrej-Nikolai Spiess
## Model with p = 0.014.
set.seed(7)
N <- 20
x <- runif(N, 1, 100)
y <- 0.05 * x + rnorm(N, 0, 2)
LM1 <- lm(y ~ x)
summary(LM1)
regionInfl(LM1, crit = "P", div.x = 20, div.y = 20, cex.grid = 1,
           xlim = c(-20, 120), ylim = c(-5, 10))
This function uses a bootstrap approach to calculate the replication probability of significance, which answers the question "if we repeat this linear regression under identical conditions (similar sample size, similar residual variance), what is the probability of observing significance (or non-significance) similar to the original data?".
rpLM(model, alpha = 0.05, R = 10000, plot = TRUE, verbose = TRUE, ...)
model | a linear model of class `lm`. |
alpha | the $\alpha$-level for significance. |
R | the number of bootstrap resamples, see `bootLM`. |
plot | logical. If `TRUE`, the results are displayed graphically. |
verbose | logical. If `TRUE`, information on the subroutines is printed to the console. |
... | other parameters to be supplied to `bootLM`. |
The approach here is along the lines of Boos & Stefanski (2011), who investigated the replication probability of the p-value, as opposed to the works of Goodman (1992), Shao & Chow (2002) and Miller (2009), where the effect size is used. In our context, for a given linear model and using a bootstrap approach, the replication probability is the proportion of bootstrap p-values with $p^{\ast} \le \alpha$ when the original model is significant, or $p^{\ast} > \alpha$ when not. Hence, we employ the bootstrap to assess the sampling variability of the p-value, but not its sampling variability under $H_0$, as is common, thereby preserving the non-null property of the data.
Bootstrap results are obtained from non-parametric cases bootstrapping ("np.cases"), non-parametric residuals bootstrapping ("np.resid") and parametric residuals bootstrapping ("p.resid"), see `bootLM`.
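A minimal base-R sketch of the replication probability using only the case bootstrap; this is an illustration, not the actual `rpLM` code, which uses the three bootstrap flavors above:

```r
## Sketch of the replication probability via case bootstrap;
## an illustration, not the actual rpLM code.
repProb <- function(model, R = 1000, alpha = 0.05) {
  DATA <- model.frame(model)
  n <- nrow(DATA)
  p0 <- summary(model)$coefficients[2, 4]
  pBoot <- replicate(R, {
    idx <- sample(n, n, replace = TRUE)
    summary(update(model, data = DATA[idx, ]))$coefficients[2, 4]
  })
  ## proportion of bootstrap p-values on the same side of alpha as p0
  if (p0 <= alpha) mean(pBoot <= alpha) else mean(pBoot > alpha)
}

set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(length(a), 0, 1)
repProb(lm(b ~ a), R = 200)
```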
A vector with the three different bootstrap results as described above.
Andrej-Nikolai Spiess
Ecological Models and Data in R.
Chapter 5: Stochastic simulation and power analysis.
Benjamin M. Bolker.
Princeton University Press (2008).
P-Value Precision and Reproducibility.
Boos DD & Stefanski LA.
Am Stat, 65, 2011, 213-221.
A comment on replication, p-values and evidence.
Goodman SN.
Stat Med, 11, 1992, 875-879.
Reproducibility probability in clinical trials.
Shao J & Chow SC.
Stat Med, 21, 2002, 1727-1742.
What is the probability of replicating a statistically significant effect?
Miller J.
Psych Bull & Review, 16, 2009, 617-640.
set.seed(125)
a <- 1:20
b <- 5 + 0.08 * a + rnorm(length(a), 0, 1)
LM1 <- lm(b ~ a)
summary(LM1)
rpLM(LM1, R = 100)