Warren Repole

Don't Be a SAS® Dinosaur: Modernize Your SAS Programs
by Warren Repole

Multiple Input Data Sets: Identifying the Source Data Set


Scenario:

You need to combine a group of data sets whose names are similar.

The combined data set must contain a variable that indicates which input data set contributed that observation.


The old way: Conditional Logic using IN= Variables

The identification of the input data set is performed by checking the IN= variables defined for each input data set in the SET statement.

Download this program
data combined;
  set sashelp.prdsale(in=in_prdsale)
      sashelp.prdsal2(in=in_prdsal2)
      sashelp.prdsal3(in=in_prdsal3);
  length Source $ 32;
  if in_prdsale then source="PRDSALE";
  else if in_prdsal2 then source="PRDSAL2";
  else if in_prdsal3 then source="PRDSAL3";
run;
proc freq data=combined;
  tables Source;
  title1 "Source Data Sets for Combined Data";
run;

SAS Log

6991   data combined;
6992     set sashelp.prdsale(in=in_prdsale)
6993         sashelp.prdsal2(in=in_prdsal2)
6994         sashelp.prdsal3(in=in_prdsal3);
6995     length Source $ 32;
6996     if in_prdsale then source="PRDSALE";
6997     else if in_prdsal2 then source="PRDSAL2";
6998     else if in_prdsal3 then source="PRDSAL3";
6999   run;
 
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: There were 11520 observations read from the data set SASHELP.PRDSAL3.
NOTE: The data set WORK.COMBINED has 36000 observations and 15 variables.
 
7000   proc freq data=combined;
7001     tables Source;
7002     title1 "Source Data Sets for Combined Data";
7003   run;
 
NOTE: There were 36000 observations read from the data set WORK.COMBINED.
 

SAS Listing Output

Source Data Sets for Combined Data
 
The FREQ Procedure
 
                                    Cumulative    Cumulative
Source     Frequency     Percent     Frequency      Percent
------------------------------------------------------------
PRDSAL2       23040       64.00         23040        64.00
PRDSAL3       11520       32.00         34560        96.00
PRDSALE        1440        4.00         36000       100.00
raptor

The new way: INDSNAME= Option in the SET Statement
(available in SAS 9.2)

An alternate approach is to use the INDSNAME= option in the SET statement, available starting in SAS 9.2.

The variable defined by the INDSNAME= option contains the complete input data set including the libref, for a maximum length of 41 characters.

This technique is particularly valuable when combined with the data set list syntax introduced in SAS 9.2.

Download this program
data combined;
  set sashelp.prdsal: indsname=DataSetName;
  length Source $ 32;
  Source=scan(DataSetName,2);
run;
proc freq data=combined;
  tables Source;
  title1 "Source Data Sets for Combined Data";
run;

SAS Log

7015   data combined;
7016     set sashelp.prdsal: indsname=DataSetName;
7017     length Source $ 32;
7018     Source=scan(DataSetName,2);
7019   run;
 
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: There were 11520 observations read from the data set SASHELP.PRDSAL3.
NOTE: There were 1440 observations read from the data set SASHELP.PRDSALE.
NOTE: The data set WORK.COMBINED has 36000 observations and 15 variables.
 
7020   proc freq data=combined;
7021     tables Source;
7022     title1 "Source Data Sets for Combined Data";
7023   run;
 
NOTE: There were 36000 observations read from the data set WORK.COMBINED.
 

SAS Listing Output

Same as for the old way


Advantages of the alternate approach: Good

  • No data set options are required for the input data sets.
  • No conditional logic is required to create the input source variable.

Disadvantages of the alternate approach: Bad

  • The value of the input data set obtained through the INDSNAME= option is always uppercase.

Additional documentation for this technique can be found in SAS® 9.2 Language Reference: Dictionary. Cary, NC: SAS Institute Inc.

Visit http://support.sas.com/documentation/onlinedoc/sas9doc.html for SAS 9 documentation.

The URL for this page is http://www.repole.com/dinosaur/setname.html

These techniques are mentioned in other SAS references and publications:


Back to SAS Dinosaur home page

Printable copy of this page (without sample output)

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

® indicates USA registration.