Practice 1

50 points

Author

Esteban Montenegro-Montenegro

Published

January 29, 2024

What is R programming language?

R is a programming language used mostly in statistics. It was created by programmers who were also statisticians. As Matloff (2011) mentions in his book, R was inspired by the statistical language S, developed at AT&T. S stands for “statistics”, and its one-letter name alludes to the C language, which was also developed at AT&T. After S was sold to a small company, S-Plus was created with a graphical interface.

Why should we use R?

There are many reasons, but I’ll list just a few of them:

  • R is an open-source language, so you can always check what is behind the code. You could even create your own language based on R if you needed to.

  • R is free and freely distributed. You don’t have to pay for anything. You just download the installer, and then you start playing with data!

  • R is more powerful than many commercial software packages.

  • You can install R on multiple operating systems such as Windows, macOS, Linux, and Chrome OS (you need Linux running behind the scenes).

  • R is not only useful for data analysis. You can also generate automatic reports in PDF or Word, or create a webpage or dashboard to display your results. In fact, this document and my presentations were all created using R.

  • The R community is one of the biggest communities of users in statistics. You can search the Internet for almost any problem, and you will find many possible answers shared for free by other users.

  • R has one of the largest package repositories, with more than 19,000 free packages!

  • If you learn R you will feel more comfortable learning new scripting software or languages.

Let’s jump into R!

That was just a tiny explanation to introduce R. In this part, I’ll ask you to replicate some exercises. Don’t feel like you are navigating alone; I’ll create videos to show you how to solve them. Also, I’m going to assume that you already know how to install R and RStudio. If not, this link will help.

R is an object-oriented language

Programming languages such as Python and R work by creating objects. Every element in R is a virtual object. Check the following case:

names_of_people <- c("Karla", "Pedro", "Andrea", "Esteban") 

In this chunk of R code, I created an object called names_of_people. This object will have properties, similar to physical objects in real life.

Important

Notice the presence of the characters <-. This arrow assigns information to the object. In RStudio, you can press ALT + - on your keyboard to insert this arrow; Mac users should press OPTION + -.

One of the properties of this type of object is the ability to print its contents in your console by typing its name and then running the code:

names_of_people
[1] "Karla"   "Pedro"   "Andrea"  "Esteban"

You can run the code pressing CTRL + Enter on Windows or CMD + Enter on Mac.

Exercise 1
  1. Now it is your turn to create your own object. Copy my code and replace the names with names of countries. Then, call the object to print its content in the console. Copy your answer into a Word document or Google document. (8.33 points)

I hope that was easy to do. We are going to walk slowly when learning R. The next property of our object is sub-setting: you can select one or more elements inside your object and print only those elements:

names_of_people[1]
[1] "Karla"

Notice that I’m indicating that I want to print only the first element inside my object. I’m using square brackets [] to indicate I want a “slice” of my object. I can do the same and indicate I want to print two elements:

names_of_people[c(1,3)]
[1] "Karla"  "Andrea"

In this example I’m printing the elements located in the first position and the third position.
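The sub-setting ideas above can be sketched with a small made-up vector. The object name fruits and its values are hypothetical and are not part of the assignment; the sketch also shows two things not covered above (a negative position and the function length()), both of which are part of base R.

```r
# Hypothetical example vector (invented for illustration)
fruits <- c("apple", "banana", "cherry", "mango")

# A single position returns one element
fruits[2]         # "banana"

# A vector of positions returns several elements, in the order requested
fruits[c(4, 1)]   # "mango" "apple"

# A negative position drops that element instead of selecting it
fruits[-1]        # "banana" "cherry" "mango"

# length() reports how many elements the vector holds
length(fruits)    # 4
```

Note that positions in R start at 1, so fruits[1] is the first element.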

Exercise 2
  1. Do the same with your object containing names of countries. Print only the first and the third element of your object. (8.33 points)

I haven’t told you yet what this type of object is called. As in real life, objects can be classified into categories. In this case, the example is a “character vector”. In R, text must be wrapped in quotes. Also, by using the command c(), you are creating a vector. Do you remember the concept of a vector in physics? This is something similar: a vector can be pictured as a vertical or a horizontal sequence of values. In this case, it is just a horizontal vector with characters inside.
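If you want to confirm what type of object you have, base R can tell you. The following is a minimal sketch that reuses the names_of_people vector from above; class(), is.character(), and is.vector() are base R functions.

```r
names_of_people <- c("Karla", "Pedro", "Andrea", "Esteban")

# class() reports the category of the object
class(names_of_people)          # "character"

# These helpers answer yes/no questions about the object
is.character(names_of_people)   # TRUE
is.vector(names_of_people)      # TRUE

# Without the quotes, R would look for objects named Karla, Pedro, ...
# and stop with an "object not found" error.
```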

Vectors can also contain numbers. I’ll create a numeric vector containing the release years of the main Star Wars movies:

star_wars_years <- c(1999,2002,2005,1977,1980,1983,2015,2017,2019)

Nice! Now we have a vector with numbers. We can also print only a few elements if we need to:

star_wars_years[c(1,2,5,7)]
[1] 1999 2002 1980 2015

In this example I’m printing only the elements located in positions 1, 2, 5, and 7.
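Numeric vectors allow a few extra tricks that go slightly beyond what is shown above. This sketch reuses star_wars_years and shows the colon shortcut for consecutive positions, the base R function sort(), and selecting elements with a condition instead of positions.

```r
star_wars_years <- c(1999, 2002, 2005, 1977, 1980, 1983, 2015, 2017, 2019)

# Numbers are stored as a "numeric" vector
class(star_wars_years)                    # "numeric"

# 1:3 is a shortcut for c(1, 2, 3)
star_wars_years[1:3]                      # 1999 2002 2005

# sort() returns the years in chronological order
sort(star_wars_years)

# A condition inside the brackets keeps only the elements where it is TRUE
star_wars_years[star_wars_years < 2000]   # 1999 1977 1980 1983
```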

Exercise 3
  1. Create a numeric vector with the years of the Marvel movies corresponding to The Infinity Saga reported in this link CLICK HERE. After that, print only the elements located in positions 5, 8, and 9. (8.33 points)

Operations on objects

Objects in R can be manipulated and transformed, much like objects in real life. For instance, pay attention to the following example:

math_score <- c(50,86,96,87)

english_score <- c(10,25,36,56)

english_score + math_score
[1]  60 111 132 143

In the code above, I created two vectors reflecting the academic scores of four students. The first student had a score of 50 in math, whereas the score in English was 10. You may have noticed that I added both vectors. The final result is the first math_score plus the first english_score, and then R does the same with the other elements in the vectors.
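The same element-by-element logic applies to other arithmetic. As a small sketch using the vectors from above (the object name average_score is my own addition), you can compute each student's average, add a single number to every element, or summarise a whole vector with mean():

```r
math_score <- c(50, 86, 96, 87)
english_score <- c(10, 25, 36, 56)

# Element-wise average of the two scores for each student
average_score <- (math_score + english_score) / 2
average_score      # 30.0 55.5 66.0 71.5

# A single number is reused for every element of the vector
math_score + 5     # 55 91 101 92

# mean() collapses the whole vector into one number
mean(math_score)   # 79.75
```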

Exercise 4
  1. Copy the code above and replace the + sign with a - (minus) sign. Then run the code. What happened when you did that? (8.33 points)

We need to study more important objects

R has several types of objects. We will not study all of them because this is not a computer programming class (I wish!). Instead, I’ll introduce the most important objects to understand my assignments and code.

Data frames

Data frames are the most useful objects in this class. Please read the information about data frame objects on this link.

Exercise 5
  1. Create a data frame object by copying the code below. Change the object’s name (you may name it “expenses”), then change the variable names in the example (e.g., variable1). Finally, run the code. How many rows does this data frame have? How many columns does it have? Can you tell what happened after running the function head()? (8.33 points)
Example <- data.frame(variable1 = c(30, 63, 96),
                      variable2 = c(63, 25, 45),
                      variable3 = c(78, 100, 100),
                      variable4 = c(56, 89, 88))

head(Example)

You probably noticed that objects can have any name, right? The human language you use for the name doesn’t matter.
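Once a data frame exists, base R offers several functions to inspect it. The sketch below uses a hypothetical data frame called pets, with invented names and values, so it does not replace the exercise above.

```r
# Hypothetical data frame (names and values invented for illustration)
pets <- data.frame(name = c("Luna", "Max", "Coco"),
                   age = c(3, 7, 2),
                   weight_kg = c(4.2, 30.5, 1.1))

nrow(pets)    # number of rows: 3
ncol(pets)    # number of columns: 3
names(pets)   # "name" "age" "weight_kg"

# The $ sign extracts one column as a vector
pets$age      # 3 7 2

# str() gives a compact overview of every column and its type
str(pets)
```

Each column of a data frame is a vector, so everything you learned about vectors also applies to pets$age.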

Everything is an object… and everything is a function

It can be convenient to revisit this topic on this link. But, if you don’t have time, I’ll explain the concept of functions. A function is an object that performs an operation based on an input. The function will ask for input information and after that, the function will give an output.

Functions can be created with the command function(), which is, in simple words, a function that creates other functions. It sounds redundant, but it is an accurate statement!

For instance, we can create a function that calculates your age:

estimateAge <- function(myBirthday){

  ## The argument is called "myBirthday": just type your date of birth
  ## as text in the format "YYYY-MM-DD"
  myBirthday2 <- as.Date(myBirthday)
  today <- Sys.Date()

  ## The function difftime() does the magic for us: it counts the days
  ## between the two dates, and dividing by 365 approximates the years
  age <- difftime(today,
                  myBirthday2,
                  units = "days")/365

  message("Your age is"," ", age)
}

The new function estimateAge() needs only one argument: a date of birth. That’s the input the function needs to give you an output, in this case a message with your estimated age.

## Let's enter my date of birth
estimateAge("1986-01-28") 
Your age is 38.0602739726027
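The function above prints a message, which is handy for reading but cannot be reused in later calculations. As a sketch of an alternative design, the hypothetical function ageInYears() below returns the age as a number. Its name, the extra argument today, and the division by 365.25 (to roughly account for leap years) are my own assumptions, not part of the original example.

```r
# Hypothetical variant of estimateAge(): returns a number instead of a message
ageInYears <- function(myBirthday, today = Sys.Date()) {
  # Count the days between the two dates as a plain number
  days <- as.numeric(difftime(as.Date(today),
                              as.Date(myBirthday),
                              units = "days"))
  # 365.25 is an approximation that accounts for leap years
  days / 365.25
}

# Fixing "today" makes the result reproducible
age <- ageInYears("1986-01-28", today = "2024-01-28")
round(age, 1)   # 38
```

Because the result is stored in the object age, you can round it, compare it, or use it in other operations.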

I hope you are feeling fine. If not, please feel free to insert a meme expressing how you feel in your answers for 3 extra points.

You might be thinking: Wait a second! Do we have to create our own functions all the time? The answer is NO! R already provides tons of functions that are programmed and ready to be used. If the function you need is not available in base R, you can download a package and install it on your computer.

Important

You can find more information about functions and packages by CLICKING HERE.
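Before installing a package, it can be useful to check whether it is already on your computer. The sketch below uses the base R function requireNamespace() with the stats package, which ships with R, so nothing is downloaded. The object name has_stats is my own, and the installation step is left as a comment on purpose.

```r
# requireNamespace() gives TRUE if a package is installed, FALSE otherwise
has_stats <- requireNamespace("stats", quietly = TRUE)
has_stats                    # TRUE, because stats ships with R

# The same idea can guard an installation (commented out so that
# nothing is downloaded when you run this sketch):
# if (!requireNamespace("tidyverse", quietly = TRUE)) {
#   install.packages("tidyverse")
# }

# package::function() calls a function without attaching the whole package
stats::median(c(3, 1, 2))    # 2
```

You only need to install a package once, but you need to call library() in every new R session where you want to use it.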

Exercise 6 (8.33 points)
  1. You probably visited the link I recommended before; if not, go and read it here. After reading the explanation about packages, install the package tidyverse and then call the package. You can copy the following code and paste it into your RStudio session. Remember to install the package first:
## Installs the package tidyverse (you only need to do this once)
install.packages("tidyverse")
## Calls the package; the function summarise_all(), used further below,
## comes from the tidyverse package
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.2     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.2     ✔ tibble    3.2.1
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
var1 <- rnorm(200)
var2 <- rnorm(200)
var3 <- rnorm(200)

data.frame(var1, 
           var2,
           var3) |>
  summarise_all(mean)
         var1         var2       var3
1 -0.08712924 -0.009195269 0.08312085
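Your numbers will differ from mine because rnorm() draws new random values every time. As a hedged sketch of how to make the result repeatable and double-check it without tidyverse, the code below fixes the random seed with set.seed() and computes the same column means with the base R function sapply(). The seed value 123 and the object names simulated and col_means are arbitrary choices of mine.

```r
# Fix the seed so the random numbers repeat (123 is arbitrary)
set.seed(123)

var1 <- rnorm(200)
var2 <- rnorm(200)
var3 <- rnorm(200)

simulated <- data.frame(var1, var2, var3)

# sapply() applies mean() to every column of the data frame
col_means <- sapply(simulated, mean)
col_means
```

By default rnorm() simulates values from a normal distribution with mean 0 and standard deviation 1, which is why every column mean lands close to zero.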

Copy your code and output in a Word document or Google document.

Check Practice 2 very soon for more R exercises!

References

Matloff, N. (2011). The art of R programming: A tour of statistical software design. No Starch Press.