
SAS: data step

A data step is a set of instructions (such as statements, functions and call routines) on how a data set is to be built. These instructions are evaluated by the data step compiler.
A data step starts with the data statement, optionally followed by one or more data set names.
The data step is used to perform data manipulation: an important purpose of the data step is to provide a means of reading external data and creating data sets for later use in procedures.
The data step language looks a bit like PL/I.
Names to avoid for data sets, because SAS gives them a special meaning: _NULL_, _DATA_, _LAST_.

Implied Loop of a Data Step

The data step performs an implied loop (ILDS, Implied Loop of a Data Step): the statements of the step are executed once for each observation.
The automatically created variable _n_ stores the iteration number of the loop.
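A minimal sketch of the implied loop, assuming a small inline data set (the names loop_demo and x are made up for illustration). The put statement is executed once per observation and writes the current values of _n_ and x to the log:
data loop_demo;
  input x;

  /* Executed once per iteration of the implied loop */
  put _n_= x=;

datalines;
10
20
30
;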

Writing Output

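By default, an observation is automatically written to the output data set when execution reaches the end of the step's statements.
If an OUTPUT statement appears anywhere in the data step, this automatic write is turned off and observations are written only where OUTPUT is executed.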
A DELETE statement returns to the start of the loop without writing any output.
If the data step starts with data _null_, no data set is created.
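The following sketch illustrates these rules with made-up names (out_demo, x). Because the step contains an explicit OUTPUT statement, nothing is written at the end of the step, so the change made to x after OUTPUT never reaches the data set. The second step uses data _null_ to print the result to the log without creating a data set:
data out_demo;
  input x;

  if x < 0 then delete;  /* Back to the start of the loop, nothing is written  */
  output;                /* Explicit output: x is written as it was read       */
  x = x * 100;           /* Not written: there is no automatic output any more */

datalines;
1
-2
3
;

/* Writes x=1 and x=3 to the log, creates no data set */
data _null_;
  set out_demo;
  put x=;
run;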

Default naming of output data sets

If the output data set is not explicitly named, SAS names it DATA1, DATA2, DATA3, and so on.
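As a sketch, the two steps below omit the data set name. In a fresh session they would be expected to create WORK.DATA1 and WORK.DATA2; the log reports the name that was actually chosen. The variable x is only an illustration:
data;        /* No name given: SAS picks the next DATAn name */
  x = 1;
run;

data;
  x = 2;
run;

/* _last_ refers to the most recently created data set */
proc print data=_last_;
run;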

Input buffer

If the data step's data source is a text file (rather than a data set), each record (line) is first read into the input buffer.
Thus, the input buffer contains raw (or unparsed) data from the text file.
The data in this buffer is then used to create the elements in the PDV.
See also the infile statement.
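One way to look at the input buffer is the automatic variable _infile_, which holds the buffer's current contents. The sketch below (buf_demo, num and txt are illustrative names) writes both the raw record and the values parsed from it to the log:
data buf_demo;
  infile datalines;
  input num txt $;

  /* _infile_ holds the current contents of the input buffer */
  put 'Raw record: ' _infile_;
  put 'Parsed:     ' num= txt=;

datalines;
1 foo
2 bar
;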

Program Data Vector (PDV)

The PDV is an in-memory storage area for all named variables of a data step.
The PDV gets its data from the input buffer, from a data set, or from statements (such as assignments) that create a variable.
Additionally, there are the two variables _n_ and _error_ which are automatically generated for all data steps.
When the processing of an observation finishes, the values within the PDV are written to the output destination (except for the automatic variables and the variables marked with drop).
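The contents of the PDV can be inspected with put _all_, which writes every variable, including _n_ and _error_, to the log. In this sketch (pdv_demo, a, b, tmp and total are made-up names), tmp lives in the PDV during execution but is not written to the output data set because of the drop option:
data pdv_demo (drop = tmp);
  input a b;

  tmp   = a + b;
  total = tmp * 2;

  /* Writes all PDV variables, including _n_ and _error_, to the log */
  put _all_;

datalines;
1 2
3 4
;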

Compilation and execution

A data step is processed in two phases: first the compilation phase and then the execution phase.
The compilation phase checks the syntactical correctness of the code and then creates the PDV and the descriptor portion of the output data set.
The input data is then read during the execution phase.
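One visible consequence of the two phases is that variable attributes are fixed during compilation. In the sketch below (compile_demo and txt are illustrative names), the first assignment determines that txt is a character variable of length 3, so the longer value assigned during execution is truncated:
data compile_demo;
  txt = 'abc';       /* Compilation: txt becomes a character variable of length 3 */
  txt = 'abcdefgh';  /* Execution: the value is truncated to 'abc'                */
  put txt=;
run;
A length statement before the first assignment would be the usual way to reserve more room.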

Renaming a variable

A variable (column name) can be renamed with the rename option:
data orig;
  col_1 = 'hello';
  col_2 =  42;
  col_3 = 'world';
run;

data changed;
     set orig (
         rename=(
            col_1 = col_one
            col_3 = col_three
         )
     );
run;

proc sql;
  describe table changed;
quit;
/*
create table WORK.CHANGED( bufsize=65536 )
  (
   col_one char(5),
   col_2 num,
   col_three char(5)
  );
*/
Github repository about-SAS, path: /programming/data-step/set/rename/variables.sas
The rename option can be empty, in which case no variable is renamed:
data orig;

  foo = 'hello world';
  bar =  42;
  baz =  999;
  
run;

data rename_empty;
  set orig (rename=());
run;

proc sql;
  describe table rename_empty;
quit;
/*
create table WORK.RENAME_EMPTY( bufsize=65536 )
  (
   foo char(11),
   bar num,
   baz num
  );
*/
Github repository about-SAS, path: /programming/data-step/set/rename/empty.sas

datalines

data abc;
  infile datalines;
  input
    col_num
    col_txt $50.
  ;


datalines;
1 foo
2 bar
3 baz
4 MoreThanEightCharacters
;


data mult_2;
  set abc;
  col_num = col_num * 2;
run;
Github repository about-SAS, path: /programming/data-step/datalines.sas

by - first

data tq84_in;
  input
    txt $3.
    num 8.;

datalines;
foo 17
bar 9
bar 22
foo 86
baz 55
foo 6
bar 84
baz 21
bar 64
run;

proc sort data=tq84_in;
     by   txt;
run;

data tq84_out;
     set tq84_in;
     by  txt;

  /* Create a variable, named group_nr,
     that increases for each new group of txt */
     if first.txt then group_nr + 1;
run;

proc print data = tq84_out;
run;
Github repository about-SAS, path: /programming/data-step/by/first.sas

view

libname tq84_lib 'P:\ath\to\a\directory';

data tq84_lib.dat; /* Creates P:\ath\to\a\directory\dat.sas7bdat */
  infile datalines;
  input
    col_num
    col_txt $20.
  ;

datalines;
1 foo
2 bar
3 baz
;



/* Create view: */
data tq84_lib.vw /view=tq84_lib.vw; /* Creates P:\ath\to\a\directory\vw.sas7bvew */
  set tq84_lib.dat;

  col_num = col_num * 2;
run;

/* Modify view's underlying data set: */
proc sql;
  update tq84_lib.dat
     set col_num = col_num * 10,
         col_txt = catx(',', col_txt, col_txt);
quit;

/* Print changed data in view */
proc print data=tq84_lib.vw;
run;
Github repository about-SAS, path: /programming/data-step/view.sas

See also

merge, update and merge join in data step
modifying a data set
SAS: Interaction between macros and data steps with execute, symget and symput
data step: set
SAS Programming Language
procedural step
Specifying an Oracle schema in a data step set statement
Convert between data types
Accessing files in a data step
data step boundaries
automatic macro variables