smooth.spline {modreg}                                R Documentation

Fit a Smoothing Spline

Description:

Fits a cubic smoothing spline to the supplied data.

Usage:
smooth.spline(x, y = NULL, w = NULL, df, spar = NULL,
              cv = FALSE, all.knots = FALSE, df.offset = 0, penalty = 1,
              control.spar = list())
Arguments:

x: a vector giving the values of the predictor variable, or a list
   or a two-column matrix specifying x and y.

y: responses. If y is missing, the responses are assumed to be
   specified by x, with x the index vector.

w: optional vector of weights of the same length as x; defaults to
   all 1.

df: the desired equivalent number of degrees of freedom (trace of
   the smoother matrix).

spar: smoothing parameter, typically (but not necessarily) in
   (0,1]. The coefficient lambda of the penalty term in the fit
   criterion is a monotone function of spar, see the details below.

cv: ordinary (TRUE) or "generalized" cross-validation (GCV) when
   FALSE.

all.knots: if TRUE, all points in x are used as knots. If FALSE
   (default), a suitably fine grid of knots is used.

df.offset: allows the degrees of freedom to be increased by
   df.offset in the GCV criterion.

penalty: the coefficient of the penalty for degrees of freedom in
   the GCV criterion.

control.spar: optional list with named components (such as low,
   high, tol and trace; see the examples) controlling the root
   finding when the smoothing parameter spar is computed, i.e.,
   missing or NULL. Note that this is partly experimental and may
   change with general spar computation improvements! spar is only
   searched for in the interval [low, high].
Details:

The x vector should contain at least four distinct values.
"Distinct" here means distinct after rounding to 6 significant
digits, i.e., x is transformed to unique(sort(signif(x, 6))), and y
and w are pooled accordingly.
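For illustration, the pooling works roughly as follows (a sketch
with unit weights, combining tied responses by a plain mean):

x0 <- c(1, 1 + 1e-8, 2, 3, 4)       # first two values collide at 6 digits
y0 <- c(0, 10, 1, 2, 3)
unique(sort(signif(x0, 6)))         # the distinct x values actually used
tapply(y0, signif(x0, 6), mean)     # pooled y at each distinct x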
The computational lambda used (as a function of s = spar) is

    lambda = r * 256^(3*s - 1),   with   r = tr(X' W X) / tr(Sigma),

where Sigma is the matrix given by Sigma[i,j] = Integral B_i''(t)
B_j''(t) dt, X is given by X[i,j] = B_j(x_i), W is the diagonal
matrix of weights (scaled such that its trace is n, the original
number of observations), and B_k(.) is the k-th B-spline.

Note that with these definitions, f_i = f(x_i), and with the
B-spline basis representation f = X c (i.e., c is the vector of
spline coefficients), the penalized log likelihood is

    L = (y - f)' W (y - f) + lambda * c' Sigma c,

and hence c is the solution of the (ridge regression) system

    (X' W X + lambda * Sigma) c = X' W y.
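This system can be mimicked with the splines package; the sketch
below uses an ad hoc knot vector and a Riemann-sum approximation of
Sigma, so it illustrates the algebra only, not the internal code:

library(splines)
set.seed(1)
x <- sort(runif(30)); y <- sin(2 * pi * x) + rnorm(30, sd = 0.1)
kn <- c(rep(min(x), 4), quantile(x, (1:5)/6), rep(max(x), 4))
X  <- splineDesign(kn, x)                          # X[i,j] = B_j(x_i)
tg <- seq(min(x), max(x), length = 1000)
B2 <- splineDesign(kn, tg, derivs = rep(2, 1000))  # B_j''(t) on a fine grid
Sigma <- crossprod(B2) * (tg[2] - tg[1])           # ~ Integral B_i'' B_j''
W <- diag(length(x))                               # unit weights, tr(W) = n
lambda <- 1e-3
c.hat <- solve(crossprod(X, W %*% X) + lambda * Sigma,
               crossprod(X, W %*% y))              # ridge-regression solution
f.hat <- drop(X %*% c.hat)                         # fitted values f = X c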
If spar is missing or NULL, the value of df is used to determine the
degree of smoothing. If both are missing, leave-one-out
cross-validation (ordinary or "generalized", as determined by cv) is
used to determine lambda.
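For example, on the cars data the three ways of fixing the amount of
smoothing can be compared directly:

data(cars)
f.spar <- smooth.spline(cars$speed, cars$dist, spar = 0.8) # spar given
f.df   <- smooth.spline(cars$speed, cars$dist, df = 5)     # df given
f.gcv  <- smooth.spline(cars$speed, cars$dist)             # GCV chooses lambda
round(c(f.spar$df, f.df$df, f.gcv$df), 2)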
Note that from the above relation,

    spar = s0 + 0.0601 * log(lambda)

(the constant being 1 / (3 * log(256)) ~ 0.0601), which is
intentionally different from the S-PLUS implementation of
smooth.spline (where spar is proportional to lambda). In R's
log(lambda) scale it makes more sense to vary spar linearly.
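Since r depends only on the data and weights, the relation can be
checked numerically: a step of 0.1 in spar scales lambda by
256^(3 * 0.1), about 5.28:

data(cars)
f1 <- smooth.spline(cars$speed, cars$dist, spar = 0.5)
f2 <- smooth.spline(cars$speed, cars$dist, spar = 0.6)
f2$lambda / f1$lambda    # equals 256^0.3 ~ 5.28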
Note however that currently the results may become very unreliable
for spar
values smaller than about -1 or -2. The same may
happen for values larger than 2 or so. Don't think of setting
spar
or the controls low
and high
outside such a
safe range, unless you know what you are doing!
The “generalized” cross-validation method will work correctly when
there are duplicated points in x
. However, it is ambiguous what
leave-one-out cross-validation means with duplicated points, and the
internal code uses an approximation that involves leaving out groups
of duplicated points. cv=TRUE
is best avoided in that case.
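For instance, cars$speed contains tied values, so the default
cv = FALSE (GCV) is the safe choice there:

data(cars)
any(duplicated(cars$speed))                  # TRUE: duplicated x values
fit <- smooth.spline(cars$speed, cars$dist)  # GCV handles the ties
fit$cv.crit                                  # the GCV score minimized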
Value:

An object of class "smooth.spline" with components

x: the distinct x values in increasing order, see the Details above.

y: the fitted values corresponding to x.

w: the weights used at the unique values of x.

yin: the y values used at the unique x values.

lev: leverages, the diagonal values of the smoother matrix.

cv.crit: the cross-validation score, "generalized" or true,
   depending on cv.

pen.crit: the penalized criterion.

crit: the criterion value minimized in the underlying .Fortran
   routine sslvrg.

df: equivalent degrees of freedom used. Note that (currently) this
   value may become quite imprecise when the true df is between 1
   and 2.

spar: the value of spar computed or given.

lambda: the value of lambda corresponding to spar, see the details
   above.

iparms: named integer(3) vector, where ..$ipars["iter"] gives the
   number of spar computing iterations used.

fit: list for use by predict.smooth.spline.

call: the matched call.
Note:

The default all.knots = FALSE entails using only O(n^0.2) knots
instead of n for n > 49. This cuts the memory requirement, which is
O(n_k^2) + O(n), where n_k is the number of knots. In this case,
where not all unique x values are used as knots, the result is not a
smoothing spline in the strict sense, but it is very close unless a
small smoothing parameter (or large df) is used.
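The effect can be checked by comparing the default fit with
all.knots = TRUE on simulated data (a sketch; the discrepancy
depends on the data and the selected smoothing):

set.seed(7)
x <- sort(runif(200)); y <- sin(3 * x) + rnorm(200, sd = 0.2)
f.few <- smooth.spline(x, y)                    # default reduced knot set
f.all <- smooth.spline(x, y, all.knots = TRUE)  # all distinct x as knots
max(abs(f.few$y - f.all$y))                     # typically small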
Author(s):

B. D. Ripley and Martin Maechler (spar/lambda, etc).
References:

Green, P. J. and Silverman, B. W. (1994) Nonparametric Regression
and Generalized Linear Models: A Roughness Penalty Approach.
Chapman and Hall.
See Also:

predict.smooth.spline for evaluating the spline and its derivatives.
Examples:

data(cars)
attach(cars)
plot(speed, dist, main = "data(cars) & smoothing splines")
cars.spl <- smooth.spline(speed, dist)
(cars.spl)
## This example has duplicate points, so avoid cv=TRUE
lines(cars.spl, col = "blue")
lines(smooth.spline(speed, dist, df=10), lty=2, col = "red")
legend(5,120,c(paste("default [C.V.] => df =",round(cars.spl$df,1)),
"s( * , df = 10)"), col = c("blue","red"), lty = 1:2,
bg='bisque')
detach()
##-- artificial example
y18 <- c(1:3,5,4,7:3,2*(2:5),rep(10,4))
xx <- seq(1,length(y18), len=201)
(s2 <- smooth.spline(y18)) # GCV
(s02 <- smooth.spline(y18, spar = 0.2))
plot(y18, main=deparse(s2$call), col.main=2)
lines(s2, col = "gray"); lines(predict(s2, xx), col = 2)
lines(predict(s02, xx), col = 3); mtext(deparse(s02$call), col = 3)
## The following shows the problematic behavior of `spar' searching:
(s2 <- smooth.spline(y18, con=list(trace=TRUE,tol=1e-6, low= -1.5)))
(s2m <- smooth.spline(y18, cv=TRUE, con=list(trace=TRUE,tol=1e-6, low= -1.5)))
## both above do quite similarly (Df = 8.5 +- 0.2)