**Data Step**

Data <new name>;

Retain <variables>; can be used to set the order of variables in the table (must come before the set statement)

Set <libref.file>; libref not required if the file is in the working library

Length <variable><$> W or <variable> w.d; where W=number of print places for character variable or for numerical W=bites and .d = decimal places

Assign new variables; type new variable name and then define

Label;

Format;

Where; create a data subset by limiting the table to a particular level of a variable

Drop <variables>; if the drop statement is used within set statement then variables are not read into the PDV (drop= ) and are thus not available for processing

Keep <variables>; limits the variables that are saved to the new dataset

If condition;

run;

Multiple data sets an be created in one data step, which is best done with a select function:

**Select (<variable>);**

**When ('level1') output <data1>;**

**When ('level2') output <data2>;**

**Otherwise; **optional to otherwise output other

**end;**

To avoid errors due to capitalisation, use the **upcase** or **lowcase** function, e.g., **select (lowcase(<variable>))**

To collapse levels of a variable, e.g, to collapse value 3 into 2

**<variable2> = <variable1>;**

**if variable1=3 then variable2=2;**

Converting character to numerical variables:

**numvar = INPUT(charvar, best32.);**

or

**<variable> = <variable> + 0**

Conditional Processing

**If <argument>; **restricts data set to the condition

Use **else if** for mutually exclusive statements (more efficient). Only 1 executable statement is allowed per IF-THEN statement; to perform multiple arguments then use IF-DO statement:

**If <argument> then do; argument 1; Argument 2; argument 3 …; end;**

**If …. then do; arguments…..; end;**

**Else do….; end; run;**

Inclusive range for IF statement can be achieved with:** 5 le <variable> le 7;** gives observations between 5-7 inclusive

** **

Numerical operators

Mean | Arithmetic mean |

Sum | Sum or arguments |

Var | Variance |

Sin | Sine |

Log | Natural log |

SQRT | Square root |

ABS | Absolute value |

Logical operators

Modify where expressions:** not / and / or**

Eg **where city not in('London','Rome','Paris')** = city is not London, Rome or Paris

Comparison operators

Equal to | eq | = |

Not equal to | ne | |

Less than | lt | < |

Greater than or equal | ge | >= |

Less than or equal | le | <= |

Equal to one of a list | in |

Special where operators (can only be used for where statements)

Operator | Definition | Char | num |

Contains | Includes substring | x | |

Between and | Inclusive range | x | x |

Is null | A missing value | x | x |

Is missing | A missing value | x | x |

Like | Matches a pattern | x | |

Where same and | Augments original condition | x | x |