Warren Repole

Don't Be a SAS® Dinosaur: Modernize Your SAS Programs
by Warren Repole

Combine Data: Maintain a Cumulative Data Set


Scenario:

You have a SAS data set into which you must add new observations on a regular basis. We'll call this data set a cumulative data set. The structure of the new observations exactly matches the existing observations. The new observations must be added to the end of the cumulative data set.

Occasionally (perhaps each year) you must switch to a new cumulative data set name. In that situation, you must initialize the cumulative data set instead of modiyfing an existing data set.


The old way: Concatenate Data Sets in the DATA Step

The original approach is to use a SET statement to concatenate the observations in the cumulative data set with the new observations.

Download this program
* Initialize Cumulative Data Set ;
data prdsale_master(index=(ProdType Product Country State));
  set sashelp.prdsal2;
  rename monyr=date;
run;
* Add to Cumulative Data Set ;
data prdsale_master;
  set prdsale_master sashelp.prdsal3;
run;
* Check status of indexes ;
ods Select Attributes Indexes;
proc contents data=prdsale_master;
  title1 "CLASS data set attributes after concatenating";
run;

SAS Log

7035   * Initialize Cumulative Data Set ;
7036   data prdsale_master(index=(ProdType Product Country State));
7037     set sashelp.prdsal2;
7038     rename monyr=date;
7039   run;
 
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: The data set WORK.PRDSALE_MASTER has 23040 observations and 11 variables.
 
7040   * Add to Cumulative Data Set ;
7041   data prdsale_master;
7042     set prdsale_master sashelp.prdsal3;
7043   run;
 
NOTE: There were 23040 observations read from the data set WORK.PRDSALE_MASTER.
NOTE: There were 11520 observations read from the data set SASHELP.PRDSAL3.
NOTE: The data set WORK.PRDSALE_MASTER has 34560 observations and 11 variables.
 
7044   * Check status of indexes ;
7045   ods Select Attributes Indexes;
7046   proc contents data=prdsale_master;
7047     title1 "CLASS data set attributes after concatenating";
7048   run;
 
WARNING: Output 'Indexes' was not created.  Make sure that the output object name, label,
         or path is spelled correctly.  Also, verify that the appropriate procedure options
         are used to produce the requested output object.  For example, verify that the
         NOPRINT option is not used.
 

SAS Listing Output

CLASS data set attributes after concatenating
 
The CONTENTS Procedure
 
Data Set Name        WORK.PRDSALE_MASTER                    Observations          34560
Member Type          DATA                                   Variables             11
Engine               V9                                     Indexes               0
Created              Wednesday, May 14, 2008 09:08:53 PM    Observation Length    120
Last Modified        Wednesday, May 14, 2008 09:08:53 PM    Deleted Observations  0
Protection                                                  Compressed            NO
Data Set Type                                               Sorted                NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1  Western (Windows)
rex

The new way: Use PROC APPEND
(available in SAS Release 82)

An alternate approach is to use the APPEND procedure which was designed specifically for this scenario.

Download this program
* Initialize Cumulative Data Set ;
data prdsale_master(index=(ProdType Product Country State));
  set sashelp.prdsal2;
  rename monyr=date;
run;
* Add to Cumulative Data Set ;
proc append base=prdsale_master data=sashelp.prdsal3;
run;
* Check status of indexes ;
ods Select Attributes Indexes;
proc contents data=prdsale_master;
  title1 "CLASS data set attributes after appending";
run;

SAS Log

7060   * Initialize Cumulative Data Set ;
7061   data prdsale_master(index=(ProdType Product Country State));
7062     set sashelp.prdsal2;
7063     rename monyr=date;
7064   run;
 
NOTE: There were 23040 observations read from the data set SASHELP.PRDSAL2.
NOTE: The data set WORK.PRDSALE_MASTER has 23040 observations and 11 variables.
 
7065   * Add to Cumulative Data Set ;
7066   proc append base=prdsale_master data=sashelp.prdsal3;
7067   run;
 
NOTE: Appending SASHELP.PRDSAL3 to WORK.PRDSALE_MASTER.
NOTE: There were 11520 observations read from the data set SASHELP.PRDSAL3.
NOTE: 11520 observations added.
NOTE: The data set WORK.PRDSALE_MASTER has 34560 observations and 11 variables.
 
7068   * Check status of indexes ;
7069   ods Select Attributes Indexes;
7070   proc contents data=prdsale_master;
7071     title1 "CLASS data set attributes after appending";
7072   run;
 
 

SAS Listing Output

CLASS data set attributes after appending
 
The CONTENTS Procedure
 
Data Set Name        WORK.PRDSALE_MASTER                    Observations          34560
Member Type          DATA                                   Variables             11
Engine               V9                                     Indexes               4
Created              Wednesday, May 14, 2008 09:08:53 PM    Observation Length    120
Last Modified        Wednesday, May 14, 2008 09:08:54 PM    Deleted Observations  0
Protection                                                  Compressed            NO
Data Set Type                                               Sorted                NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1  Western (Windows)
 
 
Alphabetic List of Indexes and Attributes
 
                   # of
                 Unique
#    Index       Values
 
1    COUNTRY          3
2    PRODTYPE         2
3    PRODUCT          4
4    STATE           16

Advantages of the alternate approach: Good

  • The existing observations in the cumulative data set are never read. The SET statement reads all observations in both input data sets.
  • PROC APPEND maintains any existing indexes in the cumulative data set. The SET statement approach would require the index to be rebuilt.
  • PROC APPEND can initialize a BASE= data set that does not exist, so this approach can be used in the future to create the initial version of a new cumulative data set. To initialize a new cumulative data set with the SET statement approach, the cumulative data set name must be omitted from the SET statement.

Disadvantages of the alternate approach: Bad

  • New variables cannot be added to the cumulative data set. The DATA step permits the creation of new variables.
  • No additional processing is permitted while the new observations are being added. The DATA step supports additional processing.
  • New observations cannot be interleaved into a sorted cumulative data set. The DATA step can interleave observations when a BY statement is used.

Additional documentation for this technique can be found in Base SAS® 9.2 Procedures Guide. Cary, NC: SAS Institute Inc.

Visit http://support.sas.com/documentation/onlinedoc/sas9doc.html for SAS 9 documentation.

The URL for this page is http://www.repole.com/dinosaur/append.html

These techniques are mentioned in other SAS references and publications:


Back to SAS Dinosaur home page

Printable copy of this page (without sample output)

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

® indicates USA registration.