![]() |
Don't Be a SAS® Dinosaur:
Modernize Your SAS Programs
Combine Data: Fuzzy Merges |
You need to combine two SAS data sets whose keys cannot be matched exactly. Specifically, the value of the key variable in the first data set differs from the corresponding variable in the second data set or falls between a range of values stored in two variables in the second data set.
A match merge only handles exact value matches.
|
The original approach is to perform a loop within the DATA step to read all potential matching observations from the secondary data set for each observation of the primary data set. Download this program
data pairs_dstep;
set sashelp.class(keep=Name Age);
do i=1 to Subjects;
set sashelp.class(keep=Name Age rename=(Name=NameOlder Age=AgeOlder))
nobs=Subjects point=i;
if Age lt AgeOlder then output;
end;
run;
proc sort data=pairs_dstep;
by Age AgeOlder Name NameOlder;
run;
proc print data=pairs_dstep(obs=20);
by Age AgeOlder;
id Age AgeOlder;
title1 "Student Mentor Pairings";
run;
SAS Log4817 data pairs_dstep; 4818 set sashelp.class(keep=Name Age); 4819 do i=1 to Subjects; 4820 set sashelp.class(keep=Name Age rename=(Name=NameOlder Age=AgeOlder)) 4821 nobs=Subjects point=i; 4822 if Age lt AgeOlder then output; 4823 end; 4824 run; NOTE: There were 19 observations read from the data set SASHELP.CLASS. NOTE: The data set WORK.PAIRS_DSTEP has 145 observations and 4 variables. 4825 proc sort data=pairs_dstep; 4826 by Age AgeOlder Name NameOlder; 4827 run; NOTE: There were 145 observations read from the data set WORK.PAIRS_DSTEP. NOTE: The data set WORK.PAIRS_DSTEP has 145 observations and 4 variables. 4828 proc print data=pairs_dstep(obs=20); 4829 by Age AgeOlder; 4830 id Age AgeOlder; 4831 title1 "Student Mentor Pairings"; 4832 run; NOTE: There were 20 observations read from the data set WORK.PAIRS_DSTEP. SAS Listing Output
Student Mentor Pairings
Age Name
Age Older Name Older
11 12 Joyce James
Joyce Jane
Joyce John
Joyce Louise
Joyce Robert
Thomas James
Thomas Jane
Thomas John
Thomas Louise
Thomas Robert
11 13 Joyce Alice
Joyce Barbara
Joyce Jeffrey
Thomas Alice
Thomas Barbara
Thomas Jeffrey
11 14 Joyce Alfred
Joyce Carol
Joyce Henry
Joyce Judy
|
![]() |
|
An alternate approach is to perform the match as an SQL join that produces a partial Cartesian product of the data set, then filters down to the matching rows. Download this program
proc sql;
create table pairs_sql as
select y.Name, y.Age, o.Name as NameOlder, o.Age as AgeOlder
from sashelp.class y, sashelp.class o
where y.Age lt o.Age
order by Age, AgeOlder, Name, NameOlder;
quit;
proc print data=pairs_sql(obs=20);
by Age AgeOlder;
id Age AgeOlder;
title1 "Student Mentor Pairings";
run;
SAS Log
4844 proc sql;
4845 create table pairs_sql as
4846 select y.Name, y.Age, o.Name as NameOlder, o.Age as AgeOlder
4847 from sashelp.class y, sashelp.class o
4848 where y.Age lt o.Age
4849 order by Age, AgeOlder, Name, NameOlder;
NOTE: The execution of this query involves performing one or more Cartesian product joins
that can not be optimized.
NOTE: Table WORK.PAIRS_SQL created, with 145 rows and 4 columns.
4850 quit;
4851 proc print data=pairs_sql(obs=20);
4852 by Age AgeOlder;
4853 id Age AgeOlder;
4854 title1 "Student Mentor Pairings";
4855 run;
NOTE: There were 20 observations read from the data set WORK.PAIRS_SQL.
SAS Listing OutputSame as for the old way Advantages
of the
alternate
approach:
|
|
Back to SAS Dinosaur home page Printable copy of this page (without sample output)
SAS and all other SAS Institute Inc. product or service names are registered trademarks
or trademarks of SAS Institute Inc. in the USA and other countries.
|