Licence
Confirm expiry date using proc setinit; run;
Details are written to the log
To update your SAS licence run the license tool from the SAS windows start menu
ASSIGN LIBRARY
Libname <libref> <path>;
Libref max 8 characters, either letters, underscore, or numbers but cannot start with number; no characters.
It is good to start a programme with path macro: %let path=<folder path>;
To invoke the macro: libname <libref> "&path";
MACRO VARIABLES REQUIRE DOUBLE QUOTES
To change a libref: libname <libref> clear;
DATA STEP
Data <new name>;
Retain <variables>; can be used to set order of variables (must come before set statement)
Set <libref.file>;
Length <variable><$> W or <variable> w.d; where W=number of print places for character variable or for numerical W=bites and .d = decimal places
Assign new variables ; type new variable name and then define
Label;
Format
Length;
Where; create a data subset by limiting to a particular level of a variable
Drop <variables>; if drop statement is used within set statement then variables are not read into the PDV (drop= ) and are not available for processing, otherwise they can be read in any statement from pdv
Keep <variables>; limits the variables that are saved to the new dataset
If condition;
Multiple data sets an be created in one data step, which is best done with a select function:
Select (<variable>);
When ('level1') output data1;
When ('level2') output data2;
Otherwise; optional to otherwise output other
End;
To avoid errors due to capitalisation, the upcase function can be inserted eg select (upcase(<variable>))
Do-end can be added to a select group to execute multiple statements
When ('level1') do;
Newvariable=1; to new data set 'newvariable' is created with value 1
Output data1;
End;
To collapse levels of a variable, e.g, if few observations in one cateory:
Define new variable to = old variable; follow with if statement
<new_variable> = <old_variable>;
if old_variable=3 then new_variable=2; this collapses 3 into 2.
Convert character to numerical variable:
numvar = INPUT(charvar, best32.);
comp_score = comp_score + 0
PROC SORT
Used to create sorted report or merge two data sets.
To avoid permanently writing to data set use out option in proc statement:
proc sort data=< . > out=<name temporary data set>;
by descending <variable> or ascending <variable> ; list variables in order for sorting
PROC CONTENTS
Used to show descriptor portion of data set:
proc contents data=< . > <options>;
Can also be used to display contents of a library:
Proc contents data=<libref>._all_ <options>;
Options
Varnum | List variables in numerical order, otherwise listed as ascending |
Position | Lists the variables in their position in the data set |
Short | List the variables in a row by row format; useful to cut and paste into editor |
Out= | Specify name for output data set |
Nods | Useful when displaying contents of library; prints only the names of the datasets in a library |
Directory | Prints full contents of every dataset in a library |
PROC PRINT
Creates reports
Proc print data=< . > <options>;
Var <variables>; prints only listed variables
Sum <variable>; sums numerical variables
ID <numeric ID variable>; replaces the observation number with custom ID
Format <variable><$> <format command >; invokes a predefined format
Where;
Options
Noobs | Removes the observation number column; useful if printing with custom ID |
Split=' ' | Invokes labels defined in data step |
FORMATTING
Global formatting
Title<n> "text of title"; where n=line
This is a global statement; to turn off insert title statement with no text;
Footnote uses same syntax as title
A programme can have up to 10 title rows and 10 footnotes rows in use at any one time
To display current date & time use the following macro statements at beginning of programme:
%let timenow=%sysfunc (time(), time.)
%let datenow=%sysfunc (date(), date9.)
Insert &timenow &datenow in title or footnote statement
Label statement
Used in data step to rename a variable
Label <variable>=' ' <variable>=' ' ;
Using the split=' ' option to invoke the label in a proc print step.
Formats
Invoke a SAS or user defined format in a data step or proc print step with the format statement
Format <variable> <format name>.; format name must include $ prefix if the variable is a character variable, the dot after the format name allows SAS to recognise the format
Format statements have the form: <variable><$> format<w>.<d> {where $ = character variable, w=width, d=decimal}
User defined formats can be defined using proc format:
Proc format;
Value <$><format name> 'AU'='Australia' 'US'='United States' other='miscoded'; character format name must begin with $.
Proc format;
Value <format name> low - 499='low'
500 - 1000='medium'
1000 - high='high';
SAS formats
Comma<w.d> | Writes numeric data with commas and decimals |
Dollar<w.d> | Writes numeric data with $ commas and decimals |
Mmddyy10. | Converts SAS date to 12/31/1990 |
Ddmmyy10. | Converts SAS date to 31/12/1990 |
Ddmmyy8. | Converts SAS date to 31/12/90 |
Date7. | Converts SAS date to 31dec90 |
Date9. | Converts SAS date to 31dec19990 |
Year4. | Converts SAS date to year |
Monyy7. | Converts SAS date to dec1990 |
Date11 | Converts SAS date to 31-12-1990 |
Length
If reassigning variable names SAS truncates the character limit to length of first level specified
To change this use a length statement, which must come before the IF./THEN statements : length <variable> <$> w.d;
CONDITIONAL PROCESSING
IF <argument> {restricts data set to the condition; IF cannot be used in a procedure - used WHERE}
IF-THEN/ELSE
Use ELSE IF for mutually exclusive statements (more efficient)
Only 1 executable statement is allowed per IF-THEN statement; to perform multiple arguments then use IF-DO statement:
If <argument> then do; argument 1; Argument 2; argument …; end;
DO / END
If …. Then do; arguments…..; end;
Else do….; end; run;
IN operator can be used to pick up equivalent character variables eg if x in('y', 'Y') then y='yes' means that y=yes if x is y or Y
Numerical operators
Mean | Arithmetic mean |
Sum | Sum or arguments |
Var | Variance |
Sin | Sine |
Log | Natural log |
SQRT | Square root |
ABS | Absolute value |
Logical operators
Modify where expressions: not / and / or
Eg where city not in ('London', 'Rome','Paris') = city is not London, Rome or Paris
Comparison operators
Equal to | eq | = |
Not equal to | ne |
|
Less than | lt | < |
Greater than or equal | ge | >= |
Less than or equal | le | <= |
Equal to one of a list | in |
|
Eg where country in ('AU','US') = country either AU or US / where order_type in (1,2,3) = order type 1,2 or 3
NB inclusive range for IF statement can be achieved with: 5 le <variable> le 7 {gives observations between 5-7 inclusive}
Where statements
Used in data and proc print steps; can only be used once in a step and only with permanent variables.
Special where operators (can only be used for where)
Operator | Definition | Char | num |
Contains | Includes substring | x |
|
Between and | Inclusive range | x | x |
Is null | A missing value | x | x |
Is missing | A missing value | x | x |
Like | Matches a pattern | x |
|
Where same and | Augments original condition | x | x |
? Can be used instead of contains
Like operator: '%N' = includes anything with N; 'T_m' = any word that begins with capital T and ends with small m;
FUNCTIONS
Dates & Time
SAS dates are calculated as days from 1Jan 1960
SAS times are calculated in seconds from midnight on a given day
Date functions
Month(date) | Extracts month of a SAS date |
Year(date) | Extracts year of a SAS date |
Qtr(date) | Extracts quarter of SAS date |
Weekdays(date) | Extracts day of week of SAS date, where 1=Sunday |
Today () | Returns current date as SAS Date |
Mdy(month, day, year) | Creates a SAS date value from numeric month, day, year |
Arithmetic operators
Symbol | Definition | priority |
** | Exponentiation | 1 |
multiply | 2 | |
/ | Division | 2 |
+ | Addition | 3 |
subtraction | 3 |
Misc operators
Symbol | Definition |
max (var1, var2… ) | use largest value from either variable |
min (var1, var2… ) | use min value from either variable |
sum (var1, var2… ) | use sum of variables |
**NB: if one of the variables has a missing value, arithmetic + will return a missing value but sum will return the value of the second variable**
Data cleaning functions
Upcase | Returns upper case | Upcase(<variable>) |
CLEANING
Use upcase or lowcase function to convert all levels of a variable to single case style
Eg country=upcase(country) makes all countries names uppercase
In function can be used to combine incorrect variable levels
Eg if sex in('male', 'M', 'Male') then sex='m'
OUTPUT
Create a data file, R click on the file in SAS explorer tab and view in excel.
To change output display select tools>options>preferences>results
To create reports: ods pdf file="<path>filename.ext " style=<style definition>; <sas code to generate report>;run; ods pdf close; {if more than one ODS statement used then use ods _all_ close}
Common styles: printer, journal, ocean, rtf, sasweb
Destination | Extension | Viewed in |
Listing |
| SAS output window |
Html | Html | Web browser |
Adode | ||
Rft | Rft | Word |
if missing(age) then adult=.;
if age > 18 and age < 40 then agegroup = 1;
if 18 < age < 40 then agegroup = 1;
* (multiply), and ** (exponentiate
The array is a shorthand way of referring to a group of variables. In effect,
it provides aliases for them so that each variable can be referred to by using
the name of the array and its position within the array in braces. For example,
q12 could be referred to as qall{12} or, when the variable i has the
value 12, as qall{i}. However, the array only lasts for the duration of the
data step in which it is defined.
Suppose we have 20 variables,
q1 to q20, for which ‘not applicable’ has been coded –1 and we wish to set
those to missing values, we might do it as follows:
array qall {20} q1-q20;
do i= 1 to 20;
if qall{i}=-1 then qall{i}=.;
end;
It may be necessary to delete observations from the data set—for example, if
they contain errors. Deleting erroneous observations is best done by using
the if then statement with the delete statement:
if weightloss > startweight then delete;
In a case like this, it would also be useful to write out a message giving more
information about the observation that contains the error:
if weightloss > startweight then do;
put 'Error in weight data' idno= startweight= weightloss=;
delete;
end;
The put statement writes text (in quotes) and the values of variables to
the log.