One to one merge
The variable on which the merge is performed (
by num
) is unique in the merged
data sets and the same values occur in both data sets.
data english;
length en $10.;
num = 1; en = 'one' ; output;
num = 2; en = 'two' ; output;
num = 3; en = 'three'; output;
num = 4; en = 'four' ; output;
num = 5; en = 'five' ; output;
run;
data german;
length gr $10.;
num = 1; gr = 'eins' ; output;
num = 2; gr = 'zwei' ; output;
num = 3; gr = 'drei' ; output;
num = 4; gr = 'vier' ; output;
num = 5; gr = 'fünf' ; output;
run;
data translation;
merge english german;
by num;
run;
proc print data=translation noobs; run;
/*
en num gr
one 1 eins
two 2 zwei
three 3 drei
four 4 vier
five 5 fünf
*/
Missing observations
In the following example, the german translation for two
is missing and the english observation for five is missing.
data english;
length en $10.;
num = 1; en = 'one' ; output;
num = 2; en = 'two' ; output;
num = 3; en = 'three'; output;
num = 4; en = 'four' ; output;
run;
data german;
length gr $10.;
num = 1; gr = 'eins' ; output;
num = 3; gr = 'drei' ; output;
num = 4; gr = 'vier' ; output;
num = 5; gr = 'fünf' ; output;
run;
data translation;
merge english german;
by num;
run;
proc print data=translation noobs; run;
/*
en num gr
one 1 eins
two 2
three 3 drei
four 4 vier
5 fünf
*/
Duplicate values
The following example contains a duplicate value in the german translation: num=5
is translated with both: fünf and fuenf.
The merge produces what we'd expect from an
SQL join:
data english;
length en $10.;
num = 1; en = 'one' ; output;
num = 2; en = 'two' ; output;
num = 3; en = 'three'; output;
num = 4; en = 'four' ; output;
num = 5; en = 'five' ; output;
run;
data german;
length gr $10.;
num = 1; gr = 'eins' ; output;
num = 2; gr = 'zwei' ; output;
num = 3; gr = 'drei' ; output;
num = 4; gr = 'vier' ; output;
num = 5; gr = 'fünf' ; output;
num = 5; gr = 'fuenf'; output;
run;
data translation;
merge english german;
by num;
run;
proc print data=translation noobs; run;
/*
en num gr
one 1 eins
two 2 zwei
three 3 drei
four 4 vier
five 5 fünf
five 5 fuenf
*/
To make things worse, we add a num=5
observation to the english data set. Unlike SQL, the merge won't produce 4 records for all combinations where num=5
:
data english;
length en $10.;
num = 1; en = 'one' ; output;
num = 2; en = 'two' ; output;
num = 3; en = 'three'; output;
num = 4; en = 'four' ; output;
num = 5; en = 'five' ; output;
num = 5; en = 'FIVE' ; output;
run;
data german;
length gr $10.;
num = 1; gr = 'eins' ; output;
num = 2; gr = 'zwei' ; output;
num = 3; gr = 'drei' ; output;
num = 4; gr = 'vier' ; output;
num = 5; gr = 'fünf' ; output;
num = 5; gr = 'fuenf'; output;
run;
data translation;
merge english german;
by num;
run;
proc print data=translation noobs; run;
/*
en num gr
one 1 eins
two 2 zwei
three 3 drei
four 4 vier
five 5 fünf
FIVE 5 fuenf
*/
/* Join with ordinary SQL produces 8 records. */
proc sql;
select
en.en,
en.num,
gr.gr
from
english en join
german gr on en.num = gr.num;
quit;
/*
en num gr
--------------------------------
one 1 eins
two 2 zwei
three 3 drei
four 4 vier
five 5 fünf
five 5 fuenf
FIVE 5 fünf
FIVE 5 fuenf
*/
Sort order
The data sets must be sorted to merge them, otherwise, SAS will complain with something like BY variables are not proberly sorted:
data english;
length en $10.;
num = 4; en = 'four' ; output;
num = 3; en = 'three'; output;
num = 2; en = 'two' ; output;
num = 1; en = 'one' ; output;
num = 5; en = 'five' ; output;
run;
data german;
length gr $10.;
num = 1; gr = 'eins' ; output;
num = 2; gr = 'zwei' ; output;
num = 5; gr = 'fünf' ; output;
num = 4; gr = 'vier' ; output;
num = 3; gr = 'drei' ; output;
run;
data translation;
merge english german;
by num;
run;
/* ERROR: BY variables are not properly sorted on data set WORK.ENGLISH. */