cdplot {graphics} | R Documentation |
Computes and plots conditional densities describing how the conditional distribution of a categorical variable y changes over a numerical variable x.
cdplot(x, ...)
## Default S3 method:
cdplot(x, y,
       plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...)
## S3 method for class 'formula'
cdplot(formula, data = list(),
       plot = TRUE, tol.ylab = 0.05, ylevels = NULL,
       bw = "nrd0", n = 512, from = NULL, to = NULL,
       col = NULL, border = 1, main = "", xlab = NULL, ylab = NULL,
       yaxlabels = NULL, xlim = NULL, ylim = c(0, 1), ...,
       subset = NULL)
x |
an object; the default method expects a single numerical variable. |
y |
a factor interpreted as the dependent variable. |
formula |
a formula of type y ~ x with a single dependent factor and a single numerical explanatory variable. |
data |
an optional data frame. |
plot |
logical. Should the computed conditional densities be plotted? |
tol.ylab |
convenience tolerance parameter for y-axis annotation. If the distance between two labels drops under this threshold, they are plotted equidistantly. |
ylevels |
a character or numeric vector specifying in which order the levels of the dependent variable should be plotted. |
bw , n , from , to , ... |
arguments passed to density. |
col |
a vector of fill colors of the same length as levels(y). |
border |
border color of shaded polygons. |
main , xlab , ylab |
character strings for annotation. |
yaxlabels |
character vector for annotation of y axis, defaults to levels(y). |
xlim , ylim |
the range of x and y values with sensible defaults. |
subset |
an optional vector specifying a subset of observations to be used for plotting. |
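As an illustration of how several of these arguments combine, the following sketch uses the built-in iris data rather than anything from this page. The colors, labels, and the cutoff of 7.5 are arbitrary choices made for the example, and the labels in yaxlabels are assumed to follow the order of levels(y).

```r
## Formula method with a data frame; iris ships with R (datasets package).
res <- cdplot(Species ~ Sepal.Length, data = iris,
              subset = Sepal.Length < 7.5,            # use only part of the data
              col = c("grey30", "grey60", "grey90"),  # one fill color per level
              yaxlabels = c("set.", "ver.", "vir."),  # assumed order: levels(Species)
              main = "Iris species by sepal length",
              xlab = "Sepal length (cm)", ylab = "Species")
```

The result is assigned to res only so that the invisibly returned functions are kept for later use.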
cdplot computes the conditional densities of x given the levels of y weighted by the marginal distribution of y. The densities are derived cumulatively over the levels of y.
This visualization technique is similar to spinograms (see spineplot) and plots P(y | x) against x. The conditional probabilities are not derived by discretization (as in the spinogram), but using a smoothing approach via density.
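The idea can be sketched by hand for a two-level response. This is a minimal illustration and not the function's actual source: it assumes that the marginal density and the within-level density share one bandwidth and one evaluation grid, and the object names (dx, dx_no, p_no) are made up for the example. The data are the o-ring data from the examples on this page.

```r
## o-ring data, as in the examples on this page
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)

## marginal kernel density estimate of x
dx <- density(temperature, bw = "nrd0", n = 512)

## density of x within the first level of y,
## reusing the bandwidth and grid of the marginal estimate (assumption)
is_no <- fail == "no"
dx_no <- density(temperature[is_no], bw = dx$bw, n = 512,
                 from = min(dx$x), to = max(dx$x))

## weight by the marginal proportion of the level and divide by the
## marginal density: an estimate of P(y = "no" | x) on the grid dx$x
p_no <- dx_no$y / dx$y * mean(is_no)
```

With more than two levels, the same ratio would be formed for the first level, then the first two levels together, and so on, which gives the cumulative curves.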
Note that the estimates of the conditional densities are more reliable for high-density regions of x. Conversely, they are less reliable in regions with only few x observations.
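One way to see where x is sparse is to mark the observations on the plot and to count them per interval. The sketch below uses the built-in iris data; the unit-wide intervals are an arbitrary choice for illustration.

```r
## CD plot with a rug of the observed x values along the x axis
cdplot(Species ~ Sepal.Length, data = iris)
rug(iris$Sepal.Length)

## number of observations in each unit-wide interval of x;
## intervals with small counts are where the curves deserve less trust
counts <- table(cut(iris$Sepal.Length, breaks = 4:8))
counts
```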
The conditional density functions (cumulative over the levels of y) are returned invisibly.
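Because the returned functions are cumulative, probabilities for individual levels can be recovered by differencing. The following is a sketch on the built-in iris data, assuming the list is ordered like levels(y), as the use of cdens[[1]] in the examples on this page suggests. The evaluation points in x0 are arbitrary.

```r
## compute the functions without drawing the plot
cd <- cdplot(Species ~ Sepal.Length, data = iris, plot = FALSE)

x0 <- c(5, 6, 7)                      # sepal lengths at which to evaluate

p_setosa     <- cd[[1]](x0)           # P(setosa | x)
p_upto_versi <- cd[[2]](x0)           # P(setosa or versicolor | x)

p_versicolor <- p_upto_versi - p_setosa   # difference of cumulative curves
p_virginica  <- 1 - p_upto_versi          # remainder for the last level
```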
Achim Zeileis Achim.Zeileis@R-project.org
Hofmann, H., Theus, M. (2005), Interactive graphics for visualizing conditional distributions, Unpublished Manuscript.
spineplot, density
## NASA space shuttle o-ring failures
fail <- factor(c(2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 1,
                 1, 2, 1, 1, 1, 1, 1),
               levels = 1:2, labels = c("no", "yes"))
temperature <- c(53, 57, 58, 63, 66, 67, 67, 67, 68, 69, 70, 70,
                 70, 70, 72, 73, 75, 75, 76, 76, 78, 79, 81)
## CD plot
cdplot(fail ~ temperature)
cdplot(fail ~ temperature, bw = 2)
cdplot(fail ~ temperature, bw = "SJ")
## compare with spinogram
(spineplot(fail ~ temperature, breaks = 3))
## highlighting for failures
cdplot(fail ~ temperature, ylevels = 2:1)
## scatter plot with conditional density
cdens <- cdplot(fail ~ temperature, plot = FALSE)
plot(I(as.numeric(fail) - 1) ~ jitter(temperature, factor = 2),
     xlab = "Temperature", ylab = "Conditional failure probability")
lines(53:81, 1 - cdens[[1]](53:81), col = 2)