Licence update

Assign library

Data Step 

 

PROC SORT

Used to create sorted report or merge two data sets.

To avoid permanently writing to data set use out option in proc statement:

proc sort data=< . > out=<name temporary data set>;

by descending <variable> or ascending <variable> ; list variables in order for sorting

 

PROC CONTENTS

Used to show descriptor portion of data set:

proc contents data=< . > <options>;

 

Can also be used to display contents of a library:

Proc contents data=<libref>._all_ <options>;

 

Options

 

Varnum

List   variables in numerical order, otherwise listed as ascending

Position

Lists the   variables in their position in the data set

Short

List the   variables in a row by row format; useful to cut and paste into editor

Out=

Specify   name for output data set

Nods

Useful   when displaying contents of library; prints only the names of the datasets in   a library

Directory

Prints   full contents of every dataset in a library

 

 

PROC PRINT

Creates reports

Proc print data=< . > <options>;

Var <variables>; prints only listed variables

Sum <variable>; sums numerical variables

ID <numeric ID variable>; replaces the observation number with custom ID

Format <variable><$> <format command >; invokes a predefined format

Where;

 

Options

 

Noobs

Removes   the observation number column; useful if printing with custom ID

Split=' '

Invokes   labels defined in data step

 

 

FORMATTING

Global formatting

Title<n> "text of title";  where n=line

This is a global statement; to turn off insert title statement with no text;

Footnote uses same syntax as title

A programme can have up to 10 title rows and 10 footnotes rows in use at any one time

 

To display current date & time use the following macro statements at beginning of programme:

%let timenow=%sysfunc (time(), time.)

%let datenow=%sysfunc (date(), date9.)

Insert &timenow &datenow in title or footnote statement

 

Label statement

Used in data step to rename a variable

Label <variable>=' '  <variable>=' ' ;

Using the split=' ' option to invoke the label in a proc print step.

 

Formats

Invoke a SAS or user defined format in a data step or proc print step with the format statement

Format  <variable> <format name>.; format name must include $ prefix if the variable is a character variable, the dot after the format name allows SAS to recognise the format

 

Format statements have the form: <variable><$> format<w>.<d> {where $ = character variable, w=width, d=decimal}

 

User defined formats can be defined using proc format:

Proc format;

Value <$><format name> 'AU'='Australia' 'US'='United States' other='miscoded'; character format name must begin with $.

Proc format;

Value <format name>   low - 499='low'

 500 - 1000='medium'

1000 - high='high';

SAS formats

 

Comma<w.d>

Writes   numeric data with commas and decimals

Dollar<w.d>

Writes   numeric data with $ commas and decimals

Mmddyy10.

Converts   SAS date to 12/31/1990

Ddmmyy10.

Converts   SAS date to 31/12/1990

Ddmmyy8.

Converts   SAS date to 31/12/90

Date7.

Converts   SAS date to 31dec90

Date9.

Converts   SAS date to 31dec19990

Year4.

Converts   SAS date to year

Monyy7.

Converts   SAS date to dec1990

Date11

Converts   SAS date to 31-12-1990

 

 

Length

If reassigning variable names SAS truncates the character limit to length of first level specified

To change this use a length statement, which must come before the IF./THEN statements : length <variable> <$> w.d;

 

 

CONDITIONAL PROCESSING

IF <argument> {restricts data set to the condition; IF cannot be used in a procedure - used WHERE}

 

IF-THEN/ELSE

Use ELSE IF for mutually exclusive statements (more efficient)

Only 1 executable statement is allowed per IF-THEN statement; to perform multiple arguments then use IF-DO statement:

If <argument> then do; argument 1; Argument 2; argument …; end;

 

DO / END

If …. Then do; arguments…..; end;

Else do….; end; run;

 

 

IN operator  can be used to pick up equivalent character variables eg if x in('y', 'Y') then y='yes' means that y=yes if x is y or Y

 

Numerical operators

 

Mean

Arithmetic   mean

Sum

Sum or   arguments

Var

Variance

Sin

Sine

Log

Natural   log

SQRT

Square   root

ABS

Absolute   value

 

 

Logical  operators

Modify where expressions: not / and  / or

Eg where city not in ('London', 'Rome','Paris') = city is not London, Rome or Paris

 

Comparison operators

 

Equal to

eq

=

Not equal   to

ne

 

Less than

lt

<

Greater   than or equal

ge

>=

Less than   or equal

le

<=

Equal to   one of a list

in

 

 

Eg where country in ('AU','US') = country either AU or US / where order_type in (1,2,3)  = order type 1,2 or 3

NB inclusive range for IF statement can be achieved  with: 5 le <variable> le 7 {gives observations between 5-7 inclusive}

 

Where statements

Used in data and proc print steps; can only be used once in a step and only with permanent variables.

 

Special where operators (can only be used for where)

 

Operator

Definition

Char

num

Contains

Includes   substring

x

 

Between   and

Inclusive   range

x

x

Is null

A missing   value

x

x

Is   missing

A missing   value

x

x

Like

Matches a   pattern

x

 

Where   same and

Augments   original condition

x

x

 

? Can be used instead of contains

Like operator: '%N' = includes anything with N; 'T_m' = any word that begins with capital T and ends with small m;

 

 

FUNCTIONS

Dates & Time

SAS dates are calculated as days from 1Jan 1960

SAS times are calculated in seconds from midnight on a given day

 

Date functions

 

Month(date)

Extracts   month of a SAS date

Year(date)

Extracts   year of a SAS date

Qtr(date)

Extracts   quarter of SAS date

Weekdays(date)

Extracts   day of week of SAS date, where 1=Sunday

Today ()

Returns   current date as SAS Date

Mdy(month,   day, year)

Creates a   SAS date value from numeric month, day, year

 

 

Arithmetic operators

 

Symbol

Definition

priority

**

Exponentiation

1

      
         
    •  
    •   

multiply

2

/

Division

2

+

Addition

3

      
         
    •  
    •   

subtraction

3

 

 

Misc operators

 

Symbol

Definition

max   (var1, var2… )

use   largest value from either variable

min   (var1, var2… )

use min   value from either variable

sum   (var1, var2… )

use sum   of variables

 

**NB: if one of the variables has a missing value, arithmetic + will return a missing value but sum will return the value of the second variable**

 

Data cleaning functions

 

Upcase

Returns   upper case

Upcase(<variable>)

 

 

CLEANING

Use upcase or lowcase function to convert all levels of a variable to single case style

Eg country=upcase(country) makes all countries names uppercase

 

In function can be used to combine incorrect variable levels

Eg if sex in('male', 'M', 'Male') then sex='m'

 

OUTPUT

Create a data file, R click on the file in SAS explorer tab and view in excel.

 

To change output display select tools>options>preferences>results

To create reports: ods pdf file="<path>filename.ext " style=<style definition>; <sas code to generate report>;run; ods pdf close; {if more than one ODS statement used then use ods _all_ close} 

Common styles: printer, journal, ocean, rtf, sasweb

 

Destination

Extension

Viewed in

Listing

 

SAS   output window

Html

Html

Web   browser

Pdf

Pdf

Adode

Rft

Rft

Word

 

 

 

if missing(age) then adult=.;

if age > 18 and age < 40 then agegroup = 1;

if 18 < age < 40 then agegroup = 1;

* (multiply), and ** (exponentiate

The array is a shorthand way of referring to a group of variables. In effect,

it provides aliases for them so that each variable can be referred to by using

the name of the array and its position within the array in braces. For example,

q12 could be referred to as qall{12} or, when the variable i has the

value 12, as qall{i}. However, the array only lasts for the duration of the

data step in which it is defined.

 

Suppose we have 20 variables,

q1 to q20, for which ‘not applicable’ has been coded –1 and we wish to set

those to missing values, we might do it as follows:

array qall {20} q1-q20;

do i= 1 to 20;

if qall{i}=-1 then qall{i}=.;

end;

 

It may be necessary to delete observations from the data set—for example, if

they contain errors. Deleting erroneous observations is best done by using

the if then statement with the delete statement:

if weightloss > startweight then delete;

In a case like this, it would also be useful to write out a message giving more

information about the observation that contains the error:

if weightloss > startweight then do;

put 'Error in weight data' idno= startweight= weightloss=;

delete;

end;

The put statement writes text (in quotes) and the values of variables to

the log.