smooth.spline {modreg}    R Documentation
Fit a Smoothing Spline
Description
Fits a cubic smoothing spline to the supplied data.
Usage
smooth.spline(x, y = NULL, w = NULL, df, spar = NULL,
cv = FALSE, all.knots = FALSE, df.offset = 0, penalty = 1,
control.spar = list())
Arguments
x: a vector giving the values of the predictor variable, or a list or a two-column matrix specifying x and y.

y: responses. If y is missing or NULL, the responses are assumed to be specified by x, with x the index vector.

w: optional vector of weights of the same length as x; defaults to all 1.

df: the desired equivalent number of degrees of freedom (trace of the smoother matrix).

spar: smoothing parameter, typically (but not necessarily) in (0,1]. The coefficient \lambda of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar, see the Details below.

cv: ordinary (TRUE) or "generalized" cross-validation (GCV) when FALSE.

all.knots: if TRUE, all points in x are used as knots. If FALSE (default), a suitably fine grid of knots is used; see the Note below.

df.offset: allows the degrees of freedom to be increased by df.offset in the GCV criterion.

penalty: the coefficient of the penalty for degrees of freedom in the GCV criterion.

control.spar: optional list with named components controlling the root finding when the smoothing parameter spar is computed, i.e., when it is missing or NULL. Components used in the Examples below include low (lower bound of the spar search interval), tol (absolute tolerance) and trace (logical: trace the iterations). Note that this is partly experimental and may change with general spar computation improvements!
Details
The x vector should contain at least four distinct values.
Distinct here means “distinct after rounding to 6 significant
digits”, i.e., x will be transformed to
unique(sort(signif(x, 6))), and y and w are
pooled accordingly.
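For example, the pooling of near-duplicate predictor values can be seen directly from the transformation just described:

  x <- c(1.0000001, 1.0000002, 2:5)  # first two values agree to 6 significant digits
  unique(sort(signif(x, 6)))         # 1 2 3 4 5 -- the near-duplicates are pooled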
The computational \lambda used (as a function of
s=spar) is
\lambda = r * 256^{3 s - 1}
where
r = tr(X' W X) / tr(\Sigma),
\Sigma is the matrix given by
\Sigma_{ij} = \int B_i''(t) B_j''(t) dt,
X is given by X_{ij} = B_j(x_i),
W is the diagonal matrix of weights (scaled such that
its trace is n, the original number of observations)
and B_k(.) is the k-th B-spline.
Note that with these definitions, f_i = f(x_i), and the B-spline
basis representation f = X c (i.e. c is
the vector of spline coefficients), the penalized log likelihood is
L = (y - f)' W (y - f) + \lambda c' \Sigma c, and hence
c is the solution of the (ridge regression) equation
(X' W X + \lambda \Sigma) c = X' W y.
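Since r depends only on the data and the knots, not on spar, the ratio of the lambda values returned for two fits of the same data should follow the formula above. A minimal sketch (artificial data, illustrative only):

  set.seed(1)
  x <- sort(runif(60)); y <- sin(2*pi*x) + rnorm(60, sd = 0.2)
  f1 <- smooth.spline(x, y, spar = 0.5)
  f2 <- smooth.spline(x, y, spar = 0.8)
  f2$lambda / f1$lambda    # observed ratio of lambdas
  256^(3 * (0.8 - 0.5))    # predicted by lambda = r * 256^(3 s - 1), ~ 147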
If spar is missing or NULL, the value of df is used to
determine the degree of smoothing. If both are missing, leave-one-out
cross-validation is used to determine \lambda.
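A small sketch of the three possibilities, reusing the artificial y18 data from the Examples below:

  y18 <- c(1:3, 5, 4, 7:3, 2*(2:5), rep(10, 4))
  smooth.spline(y18)              # spar and df missing: cross-validation picks lambda
  smooth.spline(y18, df = 8)      # df given: lambda chosen to match the trace
  smooth.spline(y18, spar = 0.5)  # spar given: lambda fixed by the formula above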
Note that from the above relation,
spar is s = s0 + 0.0601 * \log \lambda,
which is intentionally different from the S-plus implementation
of smooth.spline (where spar is proportional to
\lambda). In R's (\log \lambda) scale, it makes more
sense to vary spar linearly.
Note however that currently the results may become very unreliable
for spar values smaller than about -1 or -2. The same may
happen for values larger than about 2. Do not set
spar or the controls low and high outside such a
safe range unless you know what you are doing!
The “generalized” cross-validation method will work correctly when
there are duplicated points in x. However, it is ambiguous what
leave-one-out cross-validation means with duplicated points, and the
internal code uses an approximation that involves leaving out groups
of duplicated points. cv=TRUE is best avoided in that case.
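A sketch with every predictor value duplicated, where GCV (the default cv = FALSE) is the safe choice:

  set.seed(7)
  x <- rep(1:10, each = 2)                 # duplicated predictor values
  y <- sin(x) + rnorm(length(x), sd = 0.1)
  fit <- smooth.spline(x, y, cv = FALSE)   # GCV copes with the ties
  fit$x                                    # only the 10 distinct x values are kept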
Value
An object of class "smooth.spline" with components
x: the distinct x values in increasing order; see Details above.

y: the fitted values corresponding to x.

w: the weights used at the unique values of x.

yin: the y values used at the unique x values.

lev: leverages, the diagonal values of the smoother matrix.

cv.crit: cross-validation score, "generalized" or true, depending on cv.

pen.crit: the penalized criterion.

crit: the criterion value minimized in the underlying .Fortran routine sslvrg.

df: equivalent degrees of freedom used. Note that (currently) this value may become quite imprecise when the true df is between 1 and 2.

spar: the value of spar computed or given.

lambda: the value of \lambda corresponding to spar; see Details above.

iparms: named integer(3) vector where ..$ipars["iter"] gives the number of spar-computing iterations used.

fit: list for use by predict.smooth.spline, with components knot, nk, min, range and coef.

call: the matched call.
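A short sketch of accessing these components; the leverage identity holds because df is the trace of the smoother matrix whose diagonal lev contains:

  data(cars)
  cars.spl <- smooth.spline(cars$speed, cars$dist)
  cars.spl$df          # equivalent degrees of freedom
  cars.spl$lambda      # smoothing parameter on the lambda scale
  sum(cars.spl$lev)    # leverages sum to (approximately) df
  str(cars.spl$fit)    # the list used by predict.smooth.spline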
Note
The default all.knots = FALSE entails using only O(n^{0.2})
knots instead of n for n > 49. This cuts the memory
requirement which is O({n_k} ^ 2) + O(n) where
n_k is the number of knots.
In this case where not all unique x values are
used as knots, the result is not a smoothing spline in the strict
sense, but very close unless a small smoothing parameter (or large
df) is used.
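A sketch contrasting the two knot strategies on n = 200 > 49 points; the fit component nk (see Value above) reflects the size of the B-spline basis:

  set.seed(2)
  x <- sort(runif(200)); y <- x^2 + rnorm(200, sd = 0.05)
  f.few <- smooth.spline(x, y)                    # default: O(n^0.2) knots
  f.all <- smooth.spline(x, y, all.knots = TRUE)  # one knot per distinct x
  c(f.few$fit$nk, f.all$fit$nk)                   # far fewer basis functions by default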
Author(s)
B.D. Ripley and Martin Maechler (spar/lambda, etc).
References
Green, P. J. and Silverman, B. W. (1994) Nonparametric Regression and Generalized Linear Models: A Roughness Penalty Approach; Chapman and Hall.
See Also
predict.smooth.spline for evaluating the spline
and its derivatives.
Examples
data(cars)
attach(cars)
plot(speed, dist, main = "data(cars) & smoothing splines")
cars.spl <- smooth.spline(speed, dist)
(cars.spl)
## This example has duplicate points, so avoid cv=TRUE
lines(cars.spl, col = "blue")
lines(smooth.spline(speed, dist, df=10), lty=2, col = "red")
legend(5,120,c(paste("default [C.V.] => df =",round(cars.spl$df,1)),
"s( * , df = 10)"), col = c("blue","red"), lty = 1:2,
bg='bisque')
detach()
##-- artificial example
y18 <- c(1:3,5,4,7:3,2*(2:5),rep(10,4))
xx <- seq(1,length(y18), len=201)
(s2 <- smooth.spline(y18)) # GCV
(s02 <- smooth.spline(y18, spar = 0.2))
plot(y18, main=deparse(s2$call), col.main=2)
lines(s2, col = "gray"); lines(predict(s2, xx), col = 2)
lines(predict(s02, xx), col = 3); mtext(deparse(s02$call), col = 3)
## The following shows the problematic behavior of `spar' searching:
(s2 <- smooth.spline(y18, con=list(trace=TRUE,tol=1e-6, low= -1.5)))
(s2m <- smooth.spline(y18, cv=TRUE, con=list(trace=TRUE,tol=1e-6, low= -1.5)))
## both above do quite similarly (Df = 8.5 +- 0.2)
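## A sketch of derivative evaluation via predict.smooth.spline
## (see the See Also section), reusing s2 and xx from above:
pd1 <- predict(s2, xx, deriv = 1)  # first derivative of the fitted spline
pd2 <- predict(s2, xx, deriv = 2)  # second derivative
plot(pd1, type = "l", main = "first derivative of s2")
abline(h = 0, lty = 3)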