Syntax
One-to-one merge on specified key variables
merge 1:1 varlist using filename [, options]
Many-to-one merge on specified key variables
merge m:1 varlist using filename [, options]
One-to-many merge on specified key variables
merge 1:m varlist using filename [, options]
Many-to-many merge on specified key variables
merge m:m varlist using filename [, options]
One-to-one merge by observation
merge 1:1 _n using filename [, options]
options Description
—————————————————————————-
Options
keepusing(varlist) variables to keep from using data; default is all
generate(newvar) name of new variable to mark merge results; default is
_merge
nogenerate do not create _merge variable
nolabel do not copy value-label definitions from using
nonotes do not copy notes from using
update update missing values of same-named variables in
master with values from using
replace replace all values of same-named variables in master
with nonmissing values from using (requires update)
noreport do not display match result summary table
force allow string/numeric variable type mismatch without
error
Results
assert(results) specify required match results
keep(results) specify which match results to keep
sorted do not sort; datasets already sorted
—————————————————————————-
sorted does not appear in the dialog box.
Menu
Data > Combine datasets > Merge two datasets
Description
merge joins corresponding observations from the dataset currently in memory
(called the master dataset) with those from filename.dta (called the using
dataset), matching on one or more key variables. merge can perform match
merges (one-to-one, one-to-many, many-to-one, and many-to-many), which are
often called ‘joins’ by database people. merge can also perform sequential
merges, which have no equivalent in the relational database world.
merge is for adding new variables from a second dataset to existing
observations. You use merge, for instance, when combining hospital patient
and discharge datasets. If you wish to add new observations to existing
variables, then see [D] append. You use append, for instance, when adding
current discharges to past discharges.
By default, merge creates a new variable, _merge, containing numeric codes
concerning the source and the contents of each observation in the merged
dataset. These codes are explained below in the match results table.
If filename is specified without an extension, then .dta is assumed.
Options
+———+
—-+ Options +————————————————————-
keepusing(varlist) specifies the variables from the using dataset that are
kept in the merged dataset. By default, all variables are kept. For
example, if your using dataset contains 2,000 demographic
characteristics but you want only sex and age, then type merge …,
keepusing(sex age) ….
generate(newvar) specifies that the variable containing match results
information should be named newvar rather than _merge.
nogenerate specifies that _merge not be created. This would be useful if
you also specified keep(match), because keep(match) ensures that all
values of _merge would be 3.
nolabel specifies that value-label definitions from the using file be
ignored. This option should be rare, because definitions from the
master are already used.
nonotes specifies that notes in the using dataset not be added to the merged
dataset; see [D] notes.
update and replace both perform an update merge rather than a standard
merge. In a standard merge, the data in the master are the authority
and inviolable. For example, if the master and using datasets both
contain a variable age, then matched observations will contain values
from the master dataset, while unmatched observations will contain
values from their respective datasets.
If update is specified, then matched observations will update missing
values from the master dataset with values from the using dataset.
Nonmissing values in the master dataset will be unchanged.
If replace is specified, then matched observations will contain values
from the using dataset, unless the value in the using dataset is
missing.
Specifying either update or replace affects the meanings of the match
codes. See Treatment of overlapping variables in [D] merge for details.
noreport specifies that merge not present its summary table of match
results.
force allows string/numeric variable type mismatches, resulting in missing
values from the using dataset. If omitted, merge issues an error; if
specified, merge issues a warning.
+———+
—-+ Results +————————————————————-
assert(results) specifies the required match results. The possible results
are
numeric equivalent
code word (results) description
——————————————————————-
1 master observation appeared in master only
2 using observation appeared in using only
3 match observation appeared in both
4 match_update observation appeared in both,
missing values updated
5 match_conflict observation appeared in both,
conflicting nonmissing values
——————————————————————-
Codes 4 and 5 can arise only if the update option is specified.
If codes of both 4 and 5 could pertain to an observation, then 5 is
used.
Numeric codes and words are equivalent when used in the assert() or
keep() options.
The following synonyms are allowed: masters for master, usings for
using, matches and matched for match, match_updates for match_update,
and match_conflicts for match_conflict.
Using assert(match master) specifies that the merged file is required to
include only matched master or using observations and unmatched master
observations, and may not include unmatched using observations.
Specifying assert() results in merge issuing an error if there are match
results among those observations you allowed.
The order of the words or codes is not important, so all the following
assert() specifications would be the same:
assert(match master)
assert(master matches)
assert(1 3)
When the match results contain codes other than those allowed, return
code 9 is returned, and the merged dataset with the unanticipated
results is left in memory to allow you to investigate.
keep(results) specifies which observations are to be kept from the merged
dataset. Using keep(match master) specifies keeping only matched
observations and unmatched master observations after merging.
keep() differs from assert() because it selects observations from the
merged dataset rather than enforcing requirements. keep() is used to
pare the merged dataset to a given set of observations when you do not
care if there are other observations in the merged dataset. assert() is
used to verify that only a given set of observations is in the merged
dataset.
You can specify both assert() and keep(). If you require matched
observations and unmatched master observations but you want only the
matched observations, then you could specify assert(match master)
keep(match).
assert() and keep() are convenience options whose functionality can be
duplicated using _merge directly.
. merge …, assert(match master) keep(match)
is identical to
. merge …
. assert _merge==1 | _merge==3
. keep if _merge==3
The following option is available with merge but is not shown in the dialog
box:
sorted specifies that the master and using datasets are already sorted by
varlist. If the datasets are already sorted, then merge runs a little
more quickly; the difference is hardly detectable, so this option is of
interest only where speed is of the utmost importance.
Prior syntax
Prior to Stata 11, merge had a more primitive syntax. Code using the old
syntax will run unmodified. To assist those attempting to understand or
debug out-of-date code, the original help file for merge can be found here.
Examples
——————————————————————————
Setup
. webuse autosize
. list
. webuse autoexpense
. list
Perform 1:1 match merge
. webuse autosize
. merge 1:1 make using http://www.stata-press.com/data/r12/autoexpense
. list
——————————————————————————
Perform 1:1 match merge, requiring there to be only matches
(The merge command intentionally causes an error message.)
. webuse autosize, clear
. merge 1:1 make using http://www.stata-press.com/data/r12/autoexpense,
assert(match)
. tab _merge
. list
——————————————————————————
Perform 1:1 match merge, keeping only matches and squelching the _merge
variable
. webuse autosize, clear
. merge 1:1 make using http://www.stata-press.com/data/r12/autoexpense,
keep(match) nogen
. list
——————————————————————————
Setup
. webuse dollars, clear
. list
. webuse sforce
. list
Perform m:1 match merge with sforce in memory
. merge m:1 region using http://www.stata-press.com/data/r12/dollars
. list
——————————————————————————
Setup
. webuse overlap1, clear
. list, sepby(id)
. webuse overlap2
. list
Perform m:1 match merge, illustrating update option
. webuse overlap1
. merge m:1 id using http://www.stata-press.com/data/r12/overlap2,
update
. list
——————————————————————————
Perform m:1 match merge, illustrating update replace option
. webuse overlap1, clear
. merge m:1 id using http://www.stata-press.com/data/r12/overlap2,
update replace
. list
——————————————————————————
Perform 1:m match merge, illustrating update replace option
. webuse overlap2, clear
. merge 1:m id using http://www.stata-press.com/data/r12/overlap1,
update replace
. list
——————————————————————————
Perform sequential merge
. webuse sforce, clear
. merge 1:1 _n using http://www.stata-press.com/data/r12/dollars
. list