merge {base} | R Documentation |
Merge two data frames by common columns or row names, or do other versions of database join operations.
merge(x, y, ...)
## Default S3 method:
merge(x, y, ...)
## S3 method for class 'data.frame'
merge(x, y, by = intersect(names(x), names(y)),
      by.x = by, by.y = by, all = FALSE, all.x = all, all.y = all,
      sort = TRUE, suffixes = c(".x",".y"), incomparables = NULL, ...)
x, y |
data frames, or objects to be coerced to one. |
by, by.x, by.y |
specifications of the common columns. See ‘Details’. |
all |
logical; all = L is shorthand for all.x = L and all.y = L, where L
is either TRUE or FALSE. |
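all.x |
logical; if TRUE, then extra rows will be added to the output, one
for each row in x that has no matching row in y. These rows will
have NAs in those columns that are usually filled with values from
y. The default is FALSE, so that only rows with data from both x
and y are included in the output. |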
all.y |
logical; analogous to all.x. |
sort |
logical. Should the results be sorted on the by columns? |
suffixes |
character(2) specifying the suffixes to be used for
making non-by names() unique. |
incomparables |
values which cannot be matched. See match. |
... |
arguments to be passed to or from methods. |
By default the data frames are merged on the columns with names they
both have, but separate specifications of the columns can be given by
by.x and by.y. Columns can be specified by name, number or by a
logical vector: the name "row.names" or the number 0 specifies the
row names. The rows in the two data frames that match on the
specified columns are extracted, and joined together. If there is
more than one match, all possible matches contribute one row each.
For the precise meaning of ‘match’, see match.
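The matching rules above can be sketched with a small example. The data
frames left, right, d1 and d2 are hypothetical and exist only for
illustration.

```r
# Hypothetical toy data; all object names here are illustrative only.
left  <- data.frame(id  = c(1, 2, 2), a = c("p", "q", "r"))
right <- data.frame(key = c(2, 2, 3), b = c("u", "v", "w"))

# The key columns have different names, so give them via by.x and by.y.
m <- merge(left, right, by.x = "id", by.y = "key")
# id = 2 occurs twice on each side, so every pairing contributes a row.
nrow(m)

# Merging on row names: by = 0 (equivalently by = "row.names").
d1 <- data.frame(a = 1:3, row.names = c("r1", "r2", "r3"))
d2 <- data.frame(b = 4:6, row.names = c("r2", "r3", "r4"))
rn <- merge(d1, d2, by = 0)
names(rn)
```

Here m should have four rows (two rows with id = 2 on each side), and rn
keeps only the row names present in both inputs.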
If by or both by.x and by.y are of length 0 (a length zero vector or
NULL), the result, r, is the Cartesian product of x and y, i.e.,
dim(r) = c(nrow(x)*nrow(y), ncol(x) + ncol(y)).
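A minimal check of the Cartesian-product case, assuming two small
hypothetical data frames with no shared column names:

```r
# Hypothetical inputs with no columns in common.
x <- data.frame(a = 1:3)
y <- data.frame(b = c("u", "v"), c = c(TRUE, FALSE))

# A zero-length 'by' requests the Cartesian product of the rows.
r <- merge(x, y, by = NULL)

# Each row of x is paired with each row of y.
dim(r)
all(dim(r) == c(nrow(x) * nrow(y), ncol(x) + ncol(y)))
```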
If all.x is true, all the non-matching cases of x are appended to the
result as well, with NA filled in the corresponding columns of y;
analogously for all.y.
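The effect of all.x, all.y and all can be seen on a small hypothetical
pair of data frames whose keys overlap only partly:

```r
# Hypothetical inputs: keys 2 and 3 are shared, 1 is only in x, 4 only in y.
x <- data.frame(k = c(1, 2, 3), xv = c("a", "b", "c"))
y <- data.frame(k = c(2, 3, 4), yv = c("B", "C", "D"))

inner <- merge(x, y, by = "k")                # matching rows only
lj    <- merge(x, y, by = "k", all.x = TRUE)  # keeps k = 1, with yv = NA
rj    <- merge(x, y, by = "k", all.y = TRUE)  # keeps k = 4, with xv = NA
fj    <- merge(x, y, by = "k", all = TRUE)    # keeps both unmatched rows

sapply(list(inner = inner, lj = lj, rj = rj, fj = fj), nrow)
```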
If the remaining columns in the data frames have any common names,
these have suffixes (".x" and ".y" by default) appended to make the
names of the result unique.
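As a short illustration of the suffix rule, assume two hypothetical data
frames that share a non-key column called value:

```r
# Hypothetical inputs: 'id' is the key, 'value' clashes between x and y.
x <- data.frame(id = 1:2, value = c(10, 20))
y <- data.frame(id = 1:2, value = c(0.1, 0.2))

# Default suffixes disambiguate the clashing column.
default_names <- names(merge(x, y, by = "id"))

# Custom suffixes; "_left" and "_right" are arbitrary choices.
custom_names <- names(merge(x, y, by = "id",
                            suffixes = c("_left", "_right")))
default_names
custom_names
```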
The complexity of the algorithm used is proportional to the length of the answer.
In SQL database terminology, the default value of all = FALSE gives a
natural join, a special case of an inner join. Specifying
all.x = TRUE gives a left (outer) join, all.y = TRUE a right (outer)
join, and both (all = TRUE) a (full) outer join. DBMSes do not match
NULL records, equivalent to incomparables = NA in R.
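The difference between the default NA handling and the DBMS-like
behaviour can be sketched on a single key column. The data are
hypothetical, and the example deliberately merges on one column only.

```r
# Hypothetical inputs with missing keys on both sides.
x <- data.frame(k = c(1, NA, 3),  xv = c("a", "b", "c"))
y <- data.frame(k = c(1, NA, NA), yv = c("A", "B", "C"))

# Natural join on the shared column 'k'; by default NA keys match each
# other, so the single NA in x pairs with both NAs in y.
n_default <- nrow(merge(x, y))

# Treat NA as incomparable, as a DBMS treats NULL: only k = 1 matches.
n_sql_like <- nrow(merge(x, y, incomparables = NA))

c(default = n_default, sql_like = n_sql_like)
```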
A data frame. The rows are by default lexicographically sorted on the
common columns, but for sort = FALSE are in an unspecified order. The
columns are the common columns followed by the remaining columns in x
and then those in y. If the matching involved row names, an extra
character column called Row.names is added at the left, and in all
cases the result has ‘automatic’ row names.
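A brief sketch of the shape of the result, using hypothetical inputs
whose key column is neither first nor sorted:

```r
# Hypothetical inputs: x has custom row names and an unsorted key 'k'.
x <- data.frame(xv = c("c", "a", "b"), k = c(3, 1, 2),
                row.names = c("z", "y", "w"))
y <- data.frame(yv = c("A", "B", "C"), k = c(1, 2, 3))

r <- merge(x, y)  # default: merge on the shared column 'k', sort = TRUE

names(r)     # common column first, then the rest of x, then the rest of y
r$k          # sorted on the common column
rownames(r)  # automatic row names; the row names of x are not kept
```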
data.frame, by, cbind
## use character columns of names to get sensible sort order
authors <- data.frame(
    surname = I(c("Tukey", "Venables", "Tierney", "Ripley", "McNeil")),
    nationality = c("US", "Australia", "US", "UK", "Australia"),
    deceased = c("yes", rep("no", 4)))
books <- data.frame(
    name = I(c("Tukey", "Venables", "Tierney",
               "Ripley", "Ripley", "McNeil", "R Core")),
    title = c("Exploratory Data Analysis",
              "Modern Applied Statistics ...",
              "LISP-STAT",
              "Spatial Statistics", "Stochastic Simulation",
              "Interactive Data Analysis",
              "An Introduction to R"),
    other.author = c(NA, "Ripley", NA, NA, NA, NA,
                     "Venables & Smith"))
(m1 <- merge(authors, books, by.x = "surname", by.y = "name"))
(m2 <- merge(books, authors, by.x = "name", by.y = "surname"))
stopifnot(as.character(m1[,1]) == as.character(m2[,1]),
          all.equal(m1[, -1], m2[, -1][ names(m1)[-1] ]),
          dim(merge(m1, m2, by = integer(0))) == c(36, 10))
## "R Core" is missing from authors and appears only here:
merge(authors, books, by.x = "surname", by.y = "name", all = TRUE)
## example of using 'incomparables'
x <- data.frame(k1=c(NA,NA,3,4,5), k2=c(1,NA,NA,4,5), data=1:5)
y <- data.frame(k1=c(NA,2,NA,4,5), k2=c(NA,NA,3,4,5), data=1:5)
merge(x, y, by=c("k1","k2")) # NA's match
merge(x, y, by=c("k1","k2"), incomparables=NA)
merge(x, y, by="k1") # NA's match, so 6 rows
merge(x, y, by="k2", incomparables=NA) # 2 rows