R: Fit a Smoothing Spline

smooth.spline {modreg}

R Documentation

Fit a Smoothing Spline

Description

Fits a cubic smoothing spline to the supplied data.

Usage

smooth.spline(x, y, w = rep(1, length(x)), df = 5, spar = 0,
              cv = FALSE, all.knots = FALSE, df.offset = 0, penalty = 1)

Arguments

x

a vector giving the values of the predictor variable, or a list or a two-column matrix specifying x and y.

y

responses. If y is missing, the responses are assumed to be specified by x.

w

optional vector of weights

df

the desired equivalent number of degrees of freedom (trace of the smoother matrix).

spar

smoothing parameter, typically in (0,1]. The coefficient \lambda of the integral of the squared second derivative in the fit (penalized log likelihood) criterion is a monotone function of spar, see the details below.

cv

ordinary (TRUE) or ‘generalized’ (FALSE) cross-validation.

all.knots

if TRUE, all points in x are used as knots. If FALSE, a suitably fine grid of knots is used.

df.offset

allows the degrees of freedom to be increased by df.offset in the GCV criterion.

penalty

the coefficient of the penalty for degrees of freedom in the GCV criterion.

Details

The x vector should contain at least ten distinct values.

The computational \lambda used (as a function of s=spar) is \lambda = r * 256^{3 s - 1} where r = tr(X' W^2 X) / tr(\Sigma), \Sigma is the matrix given by \Sigma_{ij} = \int B_i''(t) B_j''(t) dt, X is given by X_{ij} = B_j(x_i), W^2 is the diagonal matrix of scaled weights, W = diag(w)/n (i.e., the identity for default weights), and B_k(.) is the k-th B-spline.
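As a small illustration of the mapping from spar to \lambda, the sketch below evaluates \lambda = r * 256^{3 s - 1} for a few values of s. The helper name lambda_from_spar is hypothetical (it is not part of the package), and r is set to 1 purely for illustration; in a real fit r depends on the data through X, W and \Sigma.

```r
# Hypothetical helper (not part of the package): lambda as a function of spar
# for a given ratio r = tr(X' W^2 X) / tr(Sigma).
lambda_from_spar <- function(spar, r = 1) {
  r * 256^(3 * spar - 1)
}

# lambda grows monotonically with spar; spar = 1/3 gives lambda = r.
spar_grid <- c(0, 1/3, 0.5, 1)
data.frame(spar = spar_grid, lambda = lambda_from_spar(spar_grid))
```

Because the base is 256 and the exponent is 3 s - 1, moving spar across (0,1] changes \lambda by many orders of magnitude, which is why spar is the more convenient scale to tune.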

Note that with these definitions, f_i = f(x_i), and the B-spline basis representation f = X c (i.e. c is the vector of spline coefficients), the penalized log likelihood is L = (y - f)' W^2 (y - f) + \lambda c' \Sigma c, and hence c is the solution of the (ridge regression) system (X' W^2 X + \lambda \Sigma) c = X' W^2 y.
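The linear system above can be sketched directly in R. This is an illustration under stated assumptions, not the internal code of smooth.spline: the simulated data, the knot layout, the midpoint-rule approximation of \Sigma, and the use of unscaled weights for W^2 are all choices made here for the example. It relies on splineDesign from the splines package to evaluate the B-splines and their second derivatives.

```r
library(splines)

set.seed(1)
n <- 50
x <- sort(runif(n))
y <- sin(2 * pi * x) + rnorm(n, sd = 0.2)
w <- rep(1, n)                                 # default weights

# Cubic B-spline basis (order 4) on [0, 1]; this knot layout is an assumption.
knots <- c(rep(0, 4), seq(0.1, 0.9, by = 0.1), rep(1, 4))
X <- splineDesign(knots, x, ord = 4)           # X[i, j] = B_j(x_i)

# Sigma[i, j] approximates the integral of B_i''(t) B_j''(t) dt (midpoint rule).
grid <- seq(0, 1, length.out = 2001)
mid  <- (head(grid, -1) + tail(grid, -1)) / 2
h    <- diff(grid)
B2   <- splineDesign(knots, mid, ord = 4, derivs = rep(2L, length(mid)))
Sigma <- crossprod(B2 * sqrt(h))

W2   <- diag(w)                                # weights taken unscaled here
XtWX <- t(X) %*% W2 %*% X
r      <- sum(diag(XtWX)) / sum(diag(Sigma))
spar   <- 0.5
lambda <- r * 256^(3 * spar - 1)

# Solve (X' W^2 X + lambda Sigma) c = X' W^2 y for the coefficients c.
A <- XtWX + lambda * Sigma
b <- t(X) %*% W2 %*% y
c_hat <- solve(A, b)
f_hat <- drop(X %*% c_hat)                     # fitted values f = X c

# Penalized criterion L, to check that c_hat is a minimizer.
penalized <- function(cf) {
  res <- y - drop(X %*% cf)
  sum(w * res^2) + lambda * drop(t(cf) %*% Sigma %*% cf)
}
```

Since \Sigma only penalizes curvature, the system stays solvable for any \lambda > 0 as long as x has at least two distinct values; penalized(c_hat) should not exceed the criterion at any perturbed coefficient vector.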

If spar is missing or 0, the value of df is used to determine the degree of smoothing. If both are missing, cross-validation (ordinary leave-one-out or ‘generalized’, as determined by cv) is used to determine \lambda.

The ‘generalized’ cross-validation method will work correctly when there are duplicated points in x. However, it is ambiguous what leave-one-out cross-validation means with duplicated points, and the internal code uses an approximation that involves leaving out groups of duplicated points. cv=TRUE is best avoided in that case.
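A minimal sketch of checking for duplicated x values before choosing the cross-validation method, using the cars data from the Examples. It assumes a current R installation in which smooth.spline is available without loading an extra package; the variable names are illustrative.

```r
data(cars)

# TRUE when at least one speed value occurs more than once.
has_dups <- anyDuplicated(cars$speed) > 0

# With duplicated points, stay with generalized cross-validation (cv = FALSE).
fit_gcv <- smooth.spline(cars$speed, cars$dist, cv = FALSE)

has_dups
fit_gcv$cv.crit        # the GCV score of the selected fit
```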

Value

An object of class "smooth.spline" with components

x

the distinct x values in increasing order.

y

the fitted values corresponding to x.

w

the weights used at the unique values of x.

yin

the y values used at the unique x values.

lev

leverages, the diagonal values of the smoother matrix.

cv.crit

(generalized) cross-validation score.

pen.crit

penalized criterion

df

equivalent degrees of freedom used.

spar

the value of \lambda chosen.

fit

list for use by predict.smooth.spline.

call

the matched call.
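The components listed above can be inspected directly on a fitted object. The sketch below assumes a current R installation and uses the cars data; it also checks the relationship between lev and df described above (df is the trace of the smoother matrix, i.e. the sum of the leverages) and uses predict to evaluate the fit at new points.

```r
data(cars)
fit <- smooth.spline(cars$speed, cars$dist, df = 10)

fit$df                 # equivalent degrees of freedom actually used
sum(fit$lev)           # trace of the smoother matrix
length(fit$x)          # number of distinct speed values
head(cbind(x = fit$x, fitted = fit$y, yin = fit$yin, w = fit$w))

# Evaluate the fitted spline at new speeds; returns a list with x and y.
pred <- predict(fit, x = c(10, 15, 20))
pred$y
```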

Author(s)

B.D. Ripley

Examples

data(cars)
attach(cars)
plot(speed, dist, main = "data(cars)  &  smoothing splines")
cars.spl <- smooth.spline(speed, dist)
(cars.spl)
## This example has duplicate points, so avoid cv=TRUE

lines(cars.spl, col = "blue")
lines(smooth.spline(speed, dist, df=10), lty=2, col = "red")
legend(5,120,c(paste("default [C.V.] => df =",round(cars.spl$df,1)),
               "s( * , df = 10)"), col = c("blue","red"), lty = 1:2,
       bg='bisque')
detach()