SAS: data step

Implied Loop of a Data Step

ILDS (Implied Loop of a Data Step): The data step performs an (implied) loop: for each observation, a set of statements is executed.

The automatically created variable _n_ stores the iteration number of the loop.
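
The following is a minimal sketch of the implied loop. The data set names three and numbered and the variable iteration are made up for this illustration. The second step has no explicit loop, yet its statements run once per observation of three, and _n_ counts the iterations.

```sas
/* Hypothetical source data set with three observations */
data three;
  input x;
datalines;
10
20
30
;

/* SET drives the implied loop: one iteration per observation read */
data numbered;
  set three;
  iteration = _n_;   /* copy _n_, because automatic variables are not written to the output */
  put _n_= x=;       /* one log line per iteration */
run;
```

The copy into iteration is only needed when the iteration number should end up in the output data set.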

Writing Output

By default, at the end of each iteration (after the last statement of the step has executed), the current observation is automatically written to the output data set.

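This can be overridden with an OUTPUT statement somewhere in the data step: if the step contains an explicit OUTPUT statement, the automatic output at the end of the iteration is suppressed and an observation is written only when OUTPUT executes.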

A DELETE statement returns to the start of the loop without writing any output.

If the data step starts with data _null_, no data set is created.
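
The three behaviors described above can be sketched as follows. The data set names nums, doubled and evens are assumptions chosen for this example.

```sas
/* Hypothetical input with four observations */
data nums;
  input n;
datalines;
1
2
3
4
;

/* Explicit OUTPUT: the automatic output at the end of the step is switched off,
   so every input observation is written exactly twice (8 observations) */
data doubled;
  set nums;
  output;
  output;
run;

/* DELETE: observations with an odd n never reach the end of the step,
   so only n=2 and n=4 are written */
data evens;
  set nums;
  if mod(n, 2) = 1 then delete;
run;

/* _null_: the statements are executed, but no data set is created */
data _null_;
  set nums;
  put n=;
run;
```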

Default naming of output data sets

If the output data set is not explicitly named, SAS names it automatically, using DATA1, DATA2, DATA3 … in sequence.
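
A short sketch of this naming rule, assuming a fresh session in which no DATAn data sets exist yet in the WORK library:

```sas
/* No name after DATA: SAS picks the next free DATAn name (here presumably WORK.DATA1) */
data;
  x = 1;
run;

/* Second unnamed step (presumably WORK.DATA2) */
data;
  x = 2;
run;

/* _last_ refers to the most recently created data set */
proc print data = _last_;
run;
```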

Input buffer

If the data step's data source is a text file (rather than a data set), the data is first read, one record (line) at a time, into the input buffer.

Thus, the input buffer contains raw (or unparsed) data from the text file.

The data in this buffer is then used to create the elements in the PDV.
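
The content of the input buffer can be inspected with the automatic variable _infile_. The sketch below uses in-stream data instead of an external file, and the names parsed, id and name are assumptions for this example. It writes the raw record and the values parsed from it to the log.

```sas
data parsed;
  infile datalines;
  input id name $;            /* reads one record into the input buffer and parses it */
  put 'raw:    ' _infile_;    /* _infile_ is the current content of the input buffer */
  put 'parsed: ' id= name=;   /* values as they are stored in the PDV */
datalines;
1 foo
2 bar
;
```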

Program Data Vector (PDV)

The PDV is an in-memory storage area for all named variables of a data step.

The PDV gets its data from the input buffer, from a data set, or from statements that create a variable (for example an assignment).

Additionally, there are the two variables _n_ and _error_ which are automatically generated for all data steps.

When the processing of an observation finishes, the values within the PDV are written to the output destination (except for the automatic variables and the variables marked with drop).
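
As an illustration, put _all_ writes the complete PDV, including _n_ and _error_, to the log. The names pdv_demo, tmp and total are made up for this sketch.

```sas
data pdv_demo (drop = tmp);
  input a b;
  tmp   = a * 10;   /* lives in the PDV, but is dropped from the output data set */
  total = tmp + b;
  put _all_;        /* log shows a, b, tmp, total as well as _error_ and _n_ */
datalines;
1 2
3 4
;
```

The resulting data set should contain only a, b and total: tmp is excluded by the drop option, and the automatic variables are never written.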

Compilation and execution

A data step is processed in two phases: first the compilation phase and then the execution phase.

The compilation phase checks the code for syntactic correctness and then creates the PDV and the descriptor portion of the output data set.

The input data is then read during the execution phase.
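
One way to observe the two phases is the sketch below, with the assumed names compile_demo and txt. The variable txt is already in the PDV (with a missing value) before its assignment has executed, and its length is fixed during compilation by the first assignment.

```sas
data compile_demo;
  put 'before assignment: ' _all_;   /* txt already exists in the PDV, but is blank */
  txt = 'abc';                       /* first reference: txt becomes a character variable of length 3 */
  txt = 'abcdefgh';                  /* truncated to 'abc' at execution time */
  put 'after assignment:  ' _all_;
run;
```

A LENGTH statement before the first assignment would avoid the truncation.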

Renaming a variable

A variable (column name) can be renamed with the rename option:

data orig;
  col_1 = 'hello';
  col_2 =  42;
  col_3 = 'world';
run;

data changed;
     set orig (
         rename=(
            col_1 = col_one
            col_3 = col_three
         )
     );
run;

proc sql;
  describe table changed;
quit;
/*
create table WORK.CHANGED( bufsize=65536 )
  (
   col_one char(5),
   col_2 num,
   col_three char(5)
  );
*/

Github repository about-SAS, path: /programming/data-step/set/rename/variables.sas

The rename option can be empty, in which case no variable is renamed:

data orig;

  foo = 'hello world';
  bar =  42;
  baz =  999;
  
run;

data rename_empty;
  set orig (rename=());
run;

proc sql;
  describe table rename_empty;
quit;
/*
create table WORK.RENAME_EMPTY( bufsize=65536 )
  (
   foo char(11),
   bar num,
   baz num
  );
*/

Github repository about-SAS, path: /programming/data-step/set/rename/empty.sas

datalines

data abc;
  infile datalines;
  input
    col_num
    col_txt $50.
  ;


datalines;
1 foo
2 bar
3 baz
4 MoreThanEightCharacters
;


data mult_2;
  set abc;
  col_num = col_num * 2;
run;

Github repository about-SAS, path: /programming/data-step/datalines.sas

by - first

data tq84_in;
  input
    txt $3.
    num 8.;

datalines;
foo 17
bar 9
bar 22
foo 86
baz 55
foo 6
bar 84
baz 21
bar 64
;

proc sort data=tq84_in;
     by   txt;
run;

data tq84_out;
     set tq84_in;
     by  txt;

     /* Create a variable, named group_nr,
        that increases for each new group of txt */
     if first.txt then group_nr + 1;
run;

proc print data = tq84_out;
run;

Github repository about-SAS, path: /programming/data-step/by/first.sas

view

libname tq84_lib 'P:\ath\to\a\directory';

data tq84_lib.dat; /* Creates P:\ath\to\a\directory\dat.sas7bdat */
  infile datalines;
  input
    col_num
    col_txt $20.
  ;

datalines;
1 foo
2 bar
3 baz
;



/* Create view: */
data tq84_lib.vw /view=tq84_lib.vw; /* Creates P:\ath\to\a\directory\vw.sas7bvew */
  set tq84_lib.dat;

  col_num = col_num * 2;
run;

/* Modify view's underlying data set: */
proc sql;
  update tq84_lib.dat
     set col_num = col_num * 10,
         col_txt = catx(',', col_txt, col_txt);
quit;

/* Print changed data in view */
proc print data=tq84_lib.vw;
run;

Github repository about-SAS, path: /programming/data-step/view.sas