Warren Repole

Don't Be a SAS® Dinosaur: Modernize Your SAS Programs
by Warren Repole

Frequency Reports: Number of Levels of a Variable


Scenario:

You need to determine the number of distinct levels for several variables in a SAS data set.

If missing values are present, you wish to distinguish missing values (including regular and special missing values for numeric variables) from nonmissing values when the counts are generated.


The old way: Count Levels from PROC FREQ Output

The original approach is to collapse each variable down to unique values, then count the number of collapsed items. The counts for individual variables subsequently are concatenated to create a single result set.

Download this program
data Georgia;
  set sashelp.zipcode(where=(StateName="Georgia"));
  if msa=0 then msa=.;
run;
proc freq data=georgia;
  tables CountyNm / out=Counties noprint;
  tables AreaCode / out=AreaCodes noprint;
  tables ZIP_Class / out=ZIPclasses noprint;
  tables MSA / out=MSAs noprint;
run;
data counts;
  set Counties(in=county_level)
      AreaCodes(in=areacode_level)
      ZIPclasses(in=zipclass_level)
      MSAs(in=msa_level)
        end=no_more;
  if county_level then CountyCount+1;
  else if areacode_level then AreaCodeCount+1;
  else if zipclass_level then ZIPclassCount+1;
  else if msa_level then MSACount+1;
  if no_more then output;
  keep CountyCount AreaCodeCount ZIPclassCount MSACount;
run;
proc print data=counts noobs;
  title1 "Counts of Variable Levels";
run;

SAS Log

6798   data Georgia;
6799     set sashelp.zipcode(where=(StateName="Georgia"));
6800     if msa=0 then msa=.;
6801   run;
 
NOTE: There were 972 observations read from the data set SASHELP.ZIPCODE.
      WHERE StateName='Georgia';
NOTE: The data set WORK.GEORGIA has 972 observations and 18 variables.
 
6802   proc freq data=georgia;
6803     tables CountyNm / out=Counties noprint;
6804     tables AreaCode / out=AreaCodes noprint;
6805     tables ZIP_Class / out=ZIPclasses noprint;
6806     tables MSA / out=MSAs noprint;
6807   run;
 
NOTE: There were 972 observations read from the data set WORK.GEORGIA.
NOTE: The data set WORK.COUNTIES has 159 observations and 3 variables.
NOTE: The data set WORK.AREACODES has 6 observations and 3 variables.
NOTE: The data set WORK.ZIPCLASSES has 3 observations and 3 variables.
NOTE: The data set WORK.MSAS has 9 observations and 3 variables.
 
6808   data counts;
6809     set Counties(in=county_level)
6810         AreaCodes(in=areacode_level)
6811         ZIPclasses(in=zipclass_level)
6812         MSAs(in=msa_level)
6813           end=no_more;
6814     if county_level then CountyCount+1;
6815     else if areacode_level then AreaCodeCount+1;
6816     else if zipclass_level then ZIPclassCount+1;
6817     else if msa_level then MSACount+1;
6818     if no_more then output;
6819     keep CountyCount AreaCodeCount ZIPclassCount MSACount;
6820   run;
 
NOTE: There were 159 observations read from the data set WORK.COUNTIES.
NOTE: There were 6 observations read from the data set WORK.AREACODES.
NOTE: There were 3 observations read from the data set WORK.ZIPCLASSES.
NOTE: There were 9 observations read from the data set WORK.MSAS.
NOTE: The data set WORK.COUNTS has 1 observations and 4 variables.
 
6821   proc print data=counts noobs;
6822     title1 "Counts of Variable Levels";
6823   run;
 
NOTE: There were 1 observations read from the data set WORK.COUNTS.
 

SAS Listing Output

Counts of Variable Levels
 
           Area
County     Code    ZIPclass
 Count    Count      Count     MSACount
 
  159       6          3           9
raptor

The new way: Use the NLEVELS Option in PROC FREQ
(available in SAS 9)

An alternate approach is to apply the NLEVELS option in PROC FREQ, using the ODS OUTPUT statement to capture the counts into a SAS data set.

Download this program
data Georgia;
  set sashelp.zipcode(where=(StateName="Georgia"));
  if msa=0 then msa=.;
run;
ods output NLevels=CountsNLevels;
proc freq data=georgia nlevels;
  tables CountyNm AreaCode ZIP_Class MSA  / noprint missing;
  label ZIP_Class=" " MSA=" " CountyNm=" " AreaCode=" ";
  title1 "Counts of Variable Levels";
run;
proc print data=CountsNLevels;
  title1 "Counts of Variable Levels";
run;

SAS Log

6835   data Georgia;
6836     set sashelp.zipcode(where=(StateName="Georgia"));
6837     if msa=0 then msa=.;
6838   run;
 
NOTE: There were 972 observations read from the data set SASHELP.ZIPCODE.
      WHERE StateName='Georgia';
NOTE: The data set WORK.GEORGIA has 972 observations and 18 variables.
 
6839   ods output NLevels=CountsNLevels;
6840   proc freq data=georgia nlevels;
6841     tables CountyNm AreaCode ZIP_Class MSA
6842    / noprint missing;
6843     label ZIP_Class=" " MSA=" " CountyNm=" " AreaCode=" ";
6844     title1 "Counts of Variable Levels";
6845   run;
 
NOTE: The data set WORK.COUNTSNLEVELS has 4 observations and 4 variables.
NOTE: There were 972 observations read from the data set WORK.GEORGIA.
 
6846   proc print data=CountsNLevels;
6847     title1 "Counts of Variable Levels";
6848   run;
 
NOTE: There were 4 observations read from the data set WORK.COUNTSNLEVELS.
 

SAS Listing Output

Counts of Variable Levels
 
The FREQ Procedure
 
           Number of Variable Levels
 
                          Missing    Nonmissing
Variable       Levels      Levels        Levels
-----------------------------------------------
COUNTYNM          159           0           159
AREACODE            6           0             6
ZIP_CLASS           3           1             2
MSA                 9           1             8
========================================================================================
Counts of Variable Levels
 
                                                NNon
                                   NMiss        Miss
Obs    TableVar      NLevels      Levels      Levels
 
 1     COUNTYNM          159           0         159
 2     AREACODE            6           0           6
 3     ZIP_CLASS           3           1           2
 4     MSA                 9           1           8

Advantages of the alternate approach: Good

  • A single PROC step can analyze any number of variables.
  • In addition to determining the number of unique values, levels of missing values are identified.
  • If only a report is required, there is no need to create a data set to hold the results.

Disadvantages of the alternate approach: Bad

  • This approach produces a multiple-observation result set that must be transposed if one column per original variable is desired.

Additional documentation for this technique can be found in Base SAS® 9.2 Procedures Guide. Cary, NC: SAS Institute Inc.

Visit http://support.sas.com/documentation/onlinedoc/sas9doc.html for SAS 9 documentation.

The URL for this page is http://www.repole.com/dinosaur/nlevels.html

These techniques are mentioned in other SAS references and publications:


Back to SAS Dinosaur home page

Printable copy of this page (without sample output)

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

® indicates USA registration.