Intro to R


This page is adapted from Prof. Koenker's e-TA pages and his rather concise yet hilarious Yet Another R FAQ, or How I Learned to Stop Worrying and Love Computing.

#First steps in R
Once R is installed, the next step is to learn the syntax of the language, that is, its rules. When you open the R GUI or RStudio you will see the R console, which displays the results of your analysis and any messages associated with the code you enter at the command line (after the prompt “>”).

For example, we can use R as a calculator. You can type arithmetical expressions at the prompt (“>”):

    2 + 2
[1] 4

or

    log(1)
[1] 0

The [1] indicates that this is the first element of the result, and in this case the only one. You can also type something with multiple values, for example a sequence of integers from 10 to 40:

    10:40
 [1] 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
[24] 33 34 35 36 37 38 39 40

The first line starts with the first return value, so is labeled [1]; the second line starts with the 24th, so is labeled [24].
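
As a small sketch of what else the calculator can do, the lines below try a few more built-in operators and the seq() function, which is not discussed above; the numbers are chosen only for illustration.

```r
# Ordinary arithmetic follows the usual precedence rules
7 / 2      # division: 3.5
7 %/% 2    # integer division: 3
7 %% 2     # remainder: 1
2^10       # exponentiation: 1024

# seq() generalises the colon operator by allowing a step size
seq(10, 40, by = 10)   # 10 20 30 40

# length() counts the values in the sequence printed above
length(10:40)          # 31
```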

To quit your session just type

    q()

##Scripting your work

Rather than saving the workspace, it is highly recommended that you keep a record of the commands you enter, so that you can reproduce your work at a later date. The easiest way to do this is to enter commands in R's script editor, available from the File menu. Commands are executed by highlighting them and hitting Ctrl-R. At the end of a session, save the final script as a permanent record of your work. You can also use any text editor to do so.
In RStudio the script editor opens next to the console and the mechanics are the same, except that commands are executed by highlighting them and hitting Ctrl-Enter.

A script is a text file that contains lines of R code that can be saved and used over and over again. This is the preferred method to save your work and guarantee reproducibility. To learn more about reproducible research you should read Professor Koenker's Reproducibility in Econometrics Research webpage.

A useful tip to keep in mind is that everything that is written after a # sign is assumed to be a comment and is ignored by R.
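
To make this concrete, here is a minimal sketch of what a short script might look like. The file name my_analysis.R and the variables radius and area are hypothetical, chosen only for illustration.

```r
# my_analysis.R (hypothetical file name)
# Lines that start with # are comments and are ignored by R.

radius <- 2              # a comment can also follow code on the same line
area <- pi * radius^2    # area of a circle with that radius
print(area)              # prints 12.56637
```

Saved to a file, a script like this can be run in one go with source("my_analysis.R").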

Assignment

R has a workspace known as the global environment where you can store your objects. For example, suppose we would like to store the calculation sqrt(2) for future use. To do this type:

    x <- sqrt(2)

Now x holds the result of that operation. To see this, type

    x
[1] 1.414214

Now we can use x to do any operations. For example

    x+x
[1] 2.828427
    x*x
[1] 2
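
The workspace itself can be inspected and cleaned up. The following sketch uses ls(), exists() and rm(), which are base R functions not mentioned above, and assumes nothing else in your session is called x.

```r
x <- sqrt(2)    # store the value in the global environment

ls()            # lists the names of the objects currently stored
exists("x")     # TRUE: an object called x is available

rm(x)           # remove x from the workspace
exists("x")     # FALSE, assuming no other object named x is around
```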

Class of an Object

All objects in R have a class, reported by the function class. For simple vectors this is just the
mode, for example numeric, logical, character or list, but matrix, array, factor and data.frame are other possible values.

A special attribute known as the class of the object is used to allow for an object-oriented
style of programming in R. For example if an object has class data.frame, it will be printed
in a certain way, the plot() function will display it graphically in a certain way, and other
so-called generic functions such as summary() will react to it as an argument in a way sensitive
to its class.
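
A quick way to get a feel for this is to call class() on a few objects and then pass objects of different classes to the same generic function. The objects below are made up for illustration.

```r
class(3.14)                    # "numeric"
class(TRUE)                    # "logical"
class("hello")                 # "character"
class(factor(c("a", "b")))     # "factor"
class(data.frame(a = 1:2))     # "data.frame"
class(matrix(1:4, nrow = 2))   # "matrix" "array" in R >= 4.0.0, "matrix" in older versions

# summary() is generic: its output depends on the class of its argument
v <- c(1, 2, 3, 10)
f <- factor(c("a", "b", "a"))
summary(v)   # quartiles and mean of a numeric vector
summary(f)   # counts per level of a factor
```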

#Vectors

You can also enter vectors. The c() function creates a vector. For example:

    weight <- c(65,45,67,78,56)

This creates a vector containing the numbers 65, 45, 67, 78 and 56. We can see its contents by typing

    weight
[1] 65 45 67 78 56

You can also check the length of the vector

    length(weight)
[1] 5

It is possible to do some arithmetic computations, for example multiply all elements by 3

    weight*3
[1] 195 135 201 234 168

or calculate a simple formula like

    height <- c(1.7,1.8,1.76,1.65,1.74)

    bmi <- weight/height^2

    bmi
[1] 22.49135 13.88889 21.62965 28.65014 18.49650

First we created a new vector that contains heights, and then we calculated the body mass index. Note that the division is done element-wise.
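
Many other functions work on whole vectors in the same way. As a short sketch, the lines below rebuild the two vectors so the example is self-contained and then apply mean(), round() and a comparison; the threshold of 25 is only an illustrative cut-off.

```r
weight <- c(65, 45, 67, 78, 56)
height <- c(1.7, 1.8, 1.76, 1.65, 1.74)
bmi <- weight / height^2    # element-wise division and power

mean(weight)     # 62.2
round(bmi, 1)    # 22.5 13.9 21.6 28.7 18.5
bmi > 25         # FALSE FALSE FALSE  TRUE FALSE
```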

Matrices and Arrays

To arrange numbers in a matrix, we can use the matrix function

    x<-matrix(1:12,nrow=3, ncol=4)
    x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

or we can create a sequence of numbers and assign dimensions to it

    x <- 1:12
    x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
    dim(x) <- c(3,4)
    x
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

Observe that either way R fills the matrix by column, not by row. We can change this with the option
byrow=TRUE. The result below is printed without being assigned, so x keeps its column-wise layout:

    matrix(1:12,nrow=3, ncol=4, byrow=TRUE)
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12

We can assign names to the rows. For example, we assign the three first letters

    rownames(x) <- LETTERS[1:3]
    x
  [,1] [,2] [,3] [,4]
A    1    4    7   10
B    2    5    8   11
C    3    6    9   12

Other useful operations are:

| Operator or Function | Description |
|---|---|
| A * B | Element-wise multiplication |
| A %*% B | Matrix multiplication |
| A %o% B | Outer product, AB' |
| t(A) | Transpose |
| diag(x) | Creates a diagonal matrix with the elements of x on the principal diagonal |
| solve(A, b) | Returns the vector x in the equation b = Ax (i.e., A^(-1) b) |
| solve(A) | Inverse of A, where A is a square matrix |
| cbind(A,B,...) | Combines matrices (vectors) horizontally. Returns a matrix. |
| rbind(A,B,...) | Combines matrices (vectors) vertically. Returns a matrix. |
| rowMeans(A) | Returns a vector of row means |
| rowSums(A) | Returns a vector of row sums |
| colMeans(A) | Returns a vector of column means |
| colSums(A) | Returns a vector of column sums |

(taken from Quick-R)
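
To see a few of these operations in action, here is a small sketch. The matrix A, the vector b and the name sol are made up for illustration.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # 2 x 2 matrix with rows (2, 1) and (1, 3)
b <- c(1, 2)

A * A        # element-wise product: rows (4, 1) and (1, 9)
A %*% A      # matrix product: rows (5, 5) and (5, 10)
t(A)         # transpose (A is symmetric, so it is unchanged)

sol <- solve(A, b)   # solves A %*% sol = b, giving 0.2 and 0.6
A %*% sol            # reproduces b, up to rounding error

solve(A) %*% A       # identity matrix, up to rounding error
cbind(A, b)          # A with b appended as a third column
colSums(A)           # 3 4
```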

Indexing

Individual elements of an array can be referenced by the name of the array followed by the subscripts in square brackets, and separated by commas. For example:

    x<-matrix(1:12,nrow=3,ncol=4)
    x[,1]
[1] 1 2 3

refers to the first column of x.

    x[1,]
[1]  1  4  7 10

refers to the first row. If we type

    x[,1:2]
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

we get the first two columns of x. But if we type

    x[,c(2,4)]
     [,1] [,2]
[1,]    4   10
[2,]    5   11
[3,]    6   12

we obtain the second and fourth columns of x. We can also subset using another vector, for example:

    weight[height>1.7]
[1] 45 67 56

gets those elements of weight whose corresponding element in height is greater than 1.7.
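
The comparison inside the brackets is itself a logical vector, which is what makes this work. The sketch below spells that out and adds two related idioms, which() and negative indices, that are not covered above; the matrix name m is chosen here only to keep the example self-contained.

```r
weight <- c(65, 45, 67, 78, 56)
height <- c(1.7, 1.8, 1.76, 1.65, 1.74)

height > 1.7           # FALSE  TRUE  TRUE FALSE  TRUE
which(height > 1.7)    # positions of the TRUE values: 2 3 5
weight[-1]             # a negative index drops an element: 45 67 78 56

m <- matrix(1:12, nrow = 3, ncol = 4)
m[2, 3]                # single element in row 2, column 3: 8
m[m > 10]              # logical indexing also works on matrices: 11 12
```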

Lists and Data Frames

An R list is an object consisting of an ordered collection of objects known as its components.

There is no particular need for the components to be of the same mode or type, and, for
example, a list could consist of a numeric vector, a logical value, a matrix, a complex vector, a
character array, a function, and so on. Here is a simple example of how to make a list:

    Lst <- list(ta.name="Mauricio", professor="Eunyi", no.students=3,
    stud.ages=c(24,21,25))

Components are always numbered and may always be referred to as such. Thus if Lst is
the name of a list with four components, these may be individually referred to as Lst[[1]],
Lst[[2]], Lst[[3]] and Lst[[4]]. If, further, Lst[[4]] is a vector, then
Lst[[4]][1] is its first entry:

    Lst[[4]][1]
[1] 24

If Lst is a list, then the function length(Lst) gives the number of (top level) components
it has.

Components of lists may also be named. This is a very useful convention as it makes it easier to get the right component if you forget the number. For instance

    Lst$ta.name
[1] "Mauricio"

is the same as

    Lst[[1]]
[1] "Mauricio"
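
A few more ways of looking inside a list are sketched below, using the same Lst as above (recreated so the example runs on its own). The functions names() and mean() are base R.

```r
Lst <- list(ta.name = "Mauricio", professor = "Eunyi", no.students = 3,
            stud.ages = c(24, 21, 25))

length(Lst)             # 4 top-level components
names(Lst)              # "ta.name" "professor" "no.students" "stud.ages"

Lst[["stud.ages"]]      # double brackets with a name: same as Lst$stud.ages
Lst["ta.name"]          # single brackets return a list of length one, not the string
mean(Lst$stud.ages)     # components can be used like any other object: 23.33333
```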

Finally, a data frame is a list with class data.frame. Data frames can easily be made using the function data.frame

    my.frame <- data.frame(name=c("Mau","Eunyi"),grade=c(59,99))
    my.frame
   name grade
1   Mau    59
2 Eunyi    99
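
Because a data frame is a list of columns, the list and matrix indexing shown earlier both apply to it. Here is a short sketch on the same my.frame; the cut-off of 60 is arbitrary.

```r
my.frame <- data.frame(name = c("Mau", "Eunyi"), grade = c(59, 99))

my.frame$grade                      # a column, extracted as in a list: 59 99
my.frame[2, "grade"]                # row 2 of the grade column: 99
my.frame[my.frame$grade > 60, ]     # rows whose grade exceeds 60 (here only Eunyi)

nrow(my.frame)                      # 2
mean(my.frame$grade)                # 79
str(my.frame)                       # compact overview of the columns and their types
```

Note that before R 4.0.0 the name column would be stored as a factor by default, so the output of str() differs across versions.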

Isn’t there more to R? Packages

The potential of R is driven by its large collection of packages (10,197 available packages on CRAN as of March 2017). A package is a collection of R software that augments the basic functionality of R in some way; it is a way of going “beyond R.” For instance, the quantreg package is a collection of functions for quantile regression. Try downloading quantreg using

    install.packages("quantreg")
also installing the dependencies 'SparseM', 'MatrixModels'

The downloaded binary packages are in
   /var/folders/93/cz86n55x0qq2_m040089scdc0000gn/T//RtmpYb8ruW/downloaded_packages

Downloading and installing a package isn’t enough: you also need to tell R that you would like to use it. For this you can use either require() or library(); for example, type:

    require(quantreg)
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'
The following object is masked from 'package:base':

    backsolve
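
If you are not sure whether a package is already installed, you can check before loading it. The sketch below is one possible way to do this; have_package is a hypothetical helper name, while requireNamespace(), library() and message() are base R functions.

```r
# Hypothetical helper: TRUE if the package is installed, FALSE otherwise
have_package <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

have_package("stats")    # TRUE, since stats ships with R

# Load quantreg only if it is available; otherwise say what to do
if (have_package("quantreg")) {
  library(quantreg)
} else {
  message("quantreg is not installed; run install.packages(\"quantreg\") first")
}
```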

Manuals and Illustrations

Needless to say, there are tons of manuals and illustrations for R functions, as well as a strong and large community of users. The left side of the CRAN website has links to manuals, FAQs and contributed
documentation.

A strength of R is the fact that most of the documentation files for R functions have example code that can be easily executed. Thus, for example, if you would like to see how to use the command rq in the quantreg package you can type example(rq) and you will see some examples of its use. Similarly, many packages have demo files that act as auxiliary documentation. To see what demos are available for currently loaded packages, just try demo().

Finally, many packages have vignettes, short overviews of various aspects of the functionality of the
package, usually with explicit examples of how to do things. For example, the quantreg package has three vignettes: one basic, one about survival modeling, and one about additive nonparametric models. Vignettes can be accessed from R by simply typing vignette("vname"). For instance, to access the quantreg vignette, just type

    vignette("rq")

The names of the various package vignettes can be found by typing vignette().
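
The same tools work for functions that ship with R, so you can try them without installing anything. This is a minimal sketch using base functions only; mean and matrix are picked simply because every installation has them, and it assumes the standard help files are installed.

```r
example("mean")     # runs the examples from the help page of mean()
apropos("Means")    # objects whose names contain "means" (case is ignored by default)
args(matrix)        # shows the arguments of matrix(), including byrow
```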

Community

At some point in your work you are going to have questions, but R is extremely well supported. If you have a question you can just google it, post it to StackOverflow or Cross Validated, or consult R-bloggers. If you are not convinced yet, just type “why use the R language” into Google and I think the results will speak for themselves.

I encourage you to visit these outlets. You can learn quite a bit by surfing these forums and reading the questions and answers of others. Just one final thought on etiquette: when posting to the forums you can save yourself from being flamed by following a few simple rules. If you need help with code, post the code and the output. Also, given the geeky topic, this is not the place for open-ended “which do you like best…” type posts. Stay on topic.