SAS Data Sets: Data arranged in a file, in a format that SAS software can understand.

SAS Libraries: Collections of SAS files. Data stored in a permanent library is kept from one session to the next.

SAS Program: You work with your data by writing and running SAS programs.


 


 

Sample SAS Program:

/* DATA step: keep only the SASBASE observations */
data sasuser.hadoopexam_subset;
    set sasuser.hadoopexam;
    where coursename='SASBASE';
run;

/* PROC step: print the subset as a report */
proc print data=sasuser.hadoopexam_subset;
run;

 

A SAS program is mainly made up of two kinds of steps: DATA steps and PROC steps.

DATA Step: Creates or modifies SAS data sets. In a DATA step you read data into a SAS data set and can perform calculations on it. In the example above, the DATA step creates a new data set that contains only the observations where the course name is SASBASE. The result is a subset (hadoopexam_subset) of the full data set (hadoopexam).

PROC Step: Also known as a procedure step. It calls prewritten routines that process the data in your data set. In a PROC step you can create graphs, charts, summary statistics, and so on from your data.
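
As a minimal sketch of a PROC step that produces summary statistics, the example below calls the MEANS procedure. It assumes the sashelp.class sample data set that ships with SAS; that data set is not part of the sample program above and is used here only for illustration.

```sas
/* Summary statistics from a prewritten routine (PROC MEANS).        */
/* sashelp.class is an assumed sample data set, for illustration.    */
proc means data=sashelp.class;
    var height weight;   /* analyze only these numeric variables */
run;
```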

SAS statement: It usually starts with a predefined SAS keyword and always ends with a semicolon. The program above contains 6 statements.

Important Points: Blanks or special characters separate the words in a SAS statement. A single line can hold several statements, and a single statement can span several lines. Keywords and names are not case sensitive, but text inside quotation marks is.
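
The free-format rules above can be sketched as follows. The data set names work.demo and work.case_demo and the variable names are made up for this illustration.

```sas
/* Several short statements can share one line. */
data work.demo; x = 1; y = 2; run;

/* One statement can span several lines; it ends at the semicolon. */
proc print
    data=work.demo
    noobs;
run;

/* Keywords and names ignore case, but quoted text does not. */
DATA work.case_demo;
    course = 'SASBASE';
    same_case  = (course = 'SASBASE');   /* 1 (true): exact match   */
    other_case = (course = 'sasbase');   /* 0 (false): case differs */
RUN;
```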

SAS step: When SAS encounters a DATA or PROC statement, it begins a new step. A RUN or QUIT statement ends the step. The program above therefore has two steps, and both are ended by a RUN statement.

RUN statement: It is not always required between steps, because a following DATA or PROC statement also ends the previous step. It is still good practice to include it, especially during development.
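
A small sketch of these step boundaries, assuming the sashelp.class sample data set (not used elsewhere in this text) and a hypothetical output name work.teens:

```sas
/* No RUN after the DATA step: the PROC statement below marks the  */
/* step boundary, so SAS executes the DATA step when it reaches it. */
data work.teens;
    set sashelp.class;
    where age > 13;

proc print data=work.teens;
run;   /* the final step still needs RUN to execute */
```

Writing an explicit RUN after every step makes it easier to match log messages to the step that produced them.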

Log Messages: SAS writes messages to the log for each step. The DATA step in our sample program creates a new data set (sasuser.hadoopexam_subset) and produces log messages. However, it does not create a report or any other output.

Procedure output:

data sasuser.hadoopexam_subset;
    set sasuser.hadoopexam;
    where coursename='SASBASE';
run;

proc print data=sasuser.hadoopexam_subset;
run;

 

In the program above, the PROC PRINT step writes log messages and also generates a report, in HTML format.

Report Window: Some SAS programs open an interactive window in which you can make changes as well. The REPORT procedure is one example, as shown below.

proc report data=sasuser.admit;   

      columns id name sex age actlevel;   

run;

 

Tabulate Procedure:

proc tabulate data=sasuser.admit;

   class sex;

   var height weight;

   table sex*(height weight),mean;

run;

 

Log messages only: Some programs produce only log messages and no report, such as the PROC COPY step below.

proc copy in=sasuser out=work;
    select admit;
run;

 

 

SAS Libraries

A SAS library is how SAS data sets and other SAS files are organized and stored. Every SAS file is stored in a SAS library. A library typically corresponds to a folder, so all the SAS files in the same folder form one library. Only files in that folder that have a SAS file extension are treated as part of the library; other files are not.

SAS Data Library: The top level of file organization in SAS.

Permanent or temporary SAS Data library: Whether a file is permanent or temporary depends on the library name used when the file is created.

Temporary Data Library: If you do not specify a library name for a file, or you specify the library name Work, the file is stored in the temporary library. The temporary library and all of its files are deleted as soon as you end the session.

Permanent Data Library: If you specify a library name other than Work, the file is stored permanently and is available in later sessions. You can still delete it explicitly whenever you want.
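
The difference can be sketched as below. The example assumes the sashelp.class sample data set and a writable Sasuser library (in some environments Sasuser is read-only); the output data set names are hypothetical.

```sas
/* One-level name: SAS assumes the temporary Work library. */
data class_copy;
    set sashelp.class;
run;

/* Same thing with the Work library written out explicitly. */
data work.class_copy2;
    set sashelp.class;
run;

/* Library other than Work: the data set is kept after the session ends. */
data sasuser.class_copy;
    set sashelp.class;
run;
```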

 

Referencing SAS files:

Permanent Files: Whenever you want to access data stored in a permanent SAS library, you use a two-level name, as below.

(name of the data library).(name of the file or data set)

 

libname clinic 'c:\Users\Name\sasuser';

data clinic.admit2;
    set clinic.admit;
    weight = round(weight);
run;

 

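In the example above, the LIBNAME statement defines the libref Clinic, which the two-level names clinic.admit and clinic.admit2 then refer to.

Temporary Files: You reference these in the same way as permanent files. The only difference is that you use the temporary library reference, Work. You can also use the file name alone, because a one-level name is always assumed to be in the temporary library. For example, the following name refers to the same temporary data set as the one-level name Test:

Work.Test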

 

Naming Convention:

  • A library reference name (libref) can be at most 8 characters long.
  • SAS data set and variable names:
    • Can be 1 to 32 characters long.
    • Must begin with a letter or an underscore (_).
    • Can continue with any combination of letters, digits, and underscores. Examples:
      • HadoopExam
      • _HadoopExam
      • _HadoopExam_SAS9
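
As a short illustration of these rules, the sketch below uses the example names from the list as a data set name and as variable names. The values assigned are arbitrary.

```sas
/* Data set and variable names that follow the naming rules above. */
data work._HadoopExam_SAS9;      /* starts with an underscore, under 32 characters */
    HadoopExam  = 'SASBASE';     /* starts with a letter */
    _HadoopExam = 1;             /* starts with an underscore */
run;
```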

 

 

SAS Data Set

To work with data using a SAS procedure, the data must be in SAS data set format. A SAS data set file contains two main parts:

  • Descriptor Portion: Contains information about the data set itself.
    • Name of the data set, the date and time when it was created, the total number of observations, and the total number of variables.
  • Data Portion: The actual data, arranged in a tabular format.
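
One way to look at the two portions, sketched with the sashelp.class sample data set (an assumption; any SAS data set would do):

```sas
/* Descriptor portion: data set name, creation date, number of     */
/* observations and variables, plus the attributes of each variable. */
proc contents data=sashelp.class;
run;

/* Data portion: the rows and columns themselves. */
proc print data=sashelp.class;
run;
```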

 

Observations: Also known as rows in the data set. Each observation generally holds the data for a single object. In practice there is no limit on the number of observations stored in a SAS data set.

Variables: Also known as columns. Each variable holds one characteristic of the objects described in the data set.

Type: A variable's type is either character or numeric.

  1. Character:
    1. Can contain any values.
    2. Missing value is represented as a blank.
    3. Can be up to 32767 bytes long, and occupies as many bytes as its defined length.
  2. Numeric:
    1. Can contain the digits 0 to 9, +, -, a decimal point (.), and E (scientific notation).
    2. Missing value is represented as a period.
    3. Stored by default as 8-byte floating-point numbers, regardless of how many digits the value has.
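
The two types and their missing values can be sketched as below. The data set name work.types_demo and its variables are hypothetical.

```sas
data work.types_demo;
    length name $ 12;            /* character variable, 12 bytes long */
    name = 'Asha'; score = 91; output;
    name = ' ';    score = .;  output;   /* blank and period mark missing values */
run;

/* PROC PRINT shows the missing character value as a blank */
/* and the missing numeric value as a period.              */
proc print data=work.types_demo;
run;
```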

Variable Format: A format controls how a value is displayed; the stored value itself does not change. For example, the value 1000 can be shown as $1000.00 using the format DOLLAR8.2, or as 1,000.00 using COMMA8.2. The format defines the total display width of the value: here a width of 8, including the decimal point and 2 decimal places.

If you assign a format to a variable when you define it in a DATA step, the format is assigned permanently. If you assign it in a PROC step, it applies only temporarily, for that step.
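
A minimal sketch of permanent versus temporary format assignment, using a hypothetical data set work.fees with a single variable fee:

```sas
/* FORMAT in the DATA step: stored with the data set (permanent). */
data work.fees;
    fee = 1000;
    format fee dollar8.2;
run;

/* Uses the permanent format: fee prints as $1000.00 */
proc print data=work.fees;
run;

/* FORMAT in the PROC step: applies to this step only (temporary). */
/* Here fee prints as 1,000.00                                     */
proc print data=work.fees;
    format fee comma8.2;
run;
```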

Informat: It helps you to define how data values are read into a SAS data set. You must use informats to read numeric values that contain letters or other special characters. For example, the numeric value $12,345.00 contains two special characters, a dollar sign ($) and a comma (,). You can use an informat to read the value while removing the dollar sign and comma, and then store the resulting value as a standard numeric value.
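
The $12,345.00 case can be sketched with the COMMAw.d informat, which strips dollar signs and commas while reading. The data set name work.amounts is hypothetical.

```sas
data work.amounts;
    input amount comma10.;   /* read 10 columns, dropping $ and commas */
    datalines;
$12,345.00
;
run;

/* amount is stored as the standard numeric value 12345 */
proc print data=work.amounts;
run;
```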

Label: A label gives a variable a more descriptive display name and can be up to 256 characters long. You can use it to display a name that is either shorter or longer than the variable name itself.
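
As a brief sketch, the LABEL statement below assigns display names in a PROC PRINT step. It assumes the sashelp.class sample data set; the label text is made up.

```sas
/* The LABEL option tells PROC PRINT to show labels as column headings. */
proc print data=sashelp.class label;
    label name   = 'Student Name'
          height = 'Height in Inches';
run;
```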

 

User Interface or Workspace:

  • In the SAS environment, if the menus are not displayed, type pmenus in the command line to display them.

Default SAS Libraries: SAS provides several libraries by default.

  • Sashelp: A permanent, read-only library that contains sample data and other files that control how SAS works at your site.
  • Sasuser: A permanent library that contains SAS files in the Profile catalog, which store your personal settings. This is also a convenient place to store your own files.
  • Work: A temporary library for files that do not need to be saved from session to session.
  • Deleting a SAS Library: When you delete a SAS library, the pointer to the library is deleted, and SAS no longer has access to the library. However, the contents of the library still exist in your operating environment.
  • Defining a SAS Library: You can define SAS libraries using programming statements, namely the LIBNAME statement, in which you can also specify an engine. Each engine enables you to read a different file format, including file formats from other software vendors.
  • External File: You can also create a file shortcut to an external file. An external file is a file that is created and maintained in your host operating environment. External files contain data or text, such as
    • SAS programming statements
    • records of raw data
    • procedure output.
    • External files are not managed by SAS. When you delete a file shortcut, only the shortcut is deleted, not the actual file.
  • When SAS encounters a subsequent DATA, PROC, or RUN statement, SAS executes the previous step in the program.
  • Some SAS data sets also contain one or more indexes, which enable SAS to locate records in the data set more efficiently.
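
Defining and deleting a library, as described in the list above, can be sketched with the LIBNAME statement. The libref mylib and the folder path are hypothetical; substitute a folder that exists on your system.

```sas
/* Define a library: associate the libref mylib with a folder. */
libname mylib 'c:\Users\Name\sasdata';   /* hypothetical path */

/* Delete the library reference. Only the pointer is removed;  */
/* the files in the folder are left untouched.                 */
libname mylib clear;
```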