Flow structures: if-else and loops. Functions in R

Practical workbooks of Data Programming in Master in Computational Social Science (2024-2025)

Author

Javier Álvarez Liébana

1 Flow structures

A flow or control structure consists of a series of commands oriented to decide the path that your code must follow

  • If condition A is met, what happens?

  • What if B happens?

  • How can I repeat the same expression (depending on a variable)?

If you have programmed before, you may be familiar with what are known as conditional structures such as if (bla bla) {...} else {...} or loops for/while (to be avoided whenever possible).

1.1 Conditional structures

1.1.1 If

One of the most famous control structures are those known as conditional structures if.

IF a set of conditions is met (TRUE), then execute whatever is inside the curly brackets.

For example, the structure if (x == 1) { code A } what it will do is execute code A in braces but ONLY IF the condition in brackets is true (only if x is 1). In any other case, it will do nothing

Let’s define a vector of ages of 8 people

ages <- c(14, 17, 24, 56, 31, 20, 87, 73)
ages < 18
[1]  TRUE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE

Our conditional structure will do the following: if there is a minor, it will print a message.

if (any(ages < 18)) { 
  
  print("There is a minor")
  
}
[1] "There is a minor"
if (any(ages < 18)) { 
  
  print("There is a minor")
  
}

In case the conditions are not true inside if() (FALSE), nothing happens.

if (all(ages >= 18)) { 
  
  print("All of them are of legal age")
  
}

We get no message because the condition all(ages >= 18) is not TRUE, so it does not execute anything.

1.1.2 If-else

The structure if (condition) { code A } can be combined with an else { code B }: when the condition is not checked, it will [execute the alternative code B]{. hl-yellow} inside else { }, allowing us to decide what happens when it is satisfied and when it is not

For example, if (x == 1) { code A } else { code B } will execute A if x is equal to 1 and B in any other case.

if (all(ages >= 18)) { 
  
  print("All of them are of legal age")
  
} else {
  
  print("There is a minor")
}
[1] "There is a minor"

Esta estructura if - else puede ser anidada: imagina que queremos ejecutar un código si todos son menores; si no sucede, pero todos son mayores de 16, hacer otra cosa; en cualquier otra cosa, otra acción.

if (all(ages >= 18)) { 
  
  print("All of them are of legal age")
  
} else if (all(ages >= 16)) {
  
  print("There is a minor but all of them are greater or equal to 16 years old")
  
} else { print("There are any persons under 16 years of age") }
[1] "There are any persons under 16 years of age"
Tip

You can collapse the structures by clicking on the left arrow in your script.

1.1.3 If-else vectorized

This conditional structure can be vectorized (in a single line) with if_else() (from the {dplyr} package), whose arguments are

  • the condition to evaluate

  • what happens when it is met and when not

  • an optional argument for when the condition to evaluate is NA

We will label without are greater/lesser and an unknown when we don’t know.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
ages <- c(NA, ages)
if_else(ages >= 18, "legal age", "minor", missing = "unknown")
[1] "unknown"   "minor"     "minor"     "legal age" "legal age" "legal age"
[7] "legal age" "legal age" "legal age"

In R base there is ifelse(): it does not let you specify what to do with the absent ones but allows you to specify different types of data in TRUE and FALSE.

1.2 💻 It’s your turn

Try to perform the following exercises without looking at the solutions

📝 What will be the output of the following code?

if_else(sqrt(9) < 2, sqrt(9), 0)
Code
The output is 0 since sqrt(9) equals 3, and since it is not less than 2, it returns the second argument which is 0.

📝 What will be the output of the following code?

x <- c(1, NA, -1, 9)
if_else(sqrt(x) < 2, 0, 1)
Code
The output is the vector c(0, NA, NA, 1) since sqrt(1) is less than 2, sqrt(9) is not, and in the case of both sqrt(NA) (root of absent) and sqrt(-1) (returns NaN, not a number), its square root cannot be checked whether it is less than 2 or not, so the output is NA.

📝 Modify the code below so that, when the square root of a number cannot be verified to be less than 2, it returns -1.

x <- c(1, NA, -1, 9)
if_else(sqrt(x) < 2, 0, 1)
Code
x <- c(1, NA, -1, 9)
if_else(sqrt(x) < 2, 0, 1, missing = -1)

📝 What are the values of x and y of the lower code for z <- 1, z <- -1 and z <- -5?

z <- -1
if (z > 0) {
  
  x <- z^3
  y <- -sqrt(z)
  
} else if (abs(z) < 2) {
  
  x <- z^4
  y <- sqrt(-z)
  
} else {
  
  x <- z/2
  y <- abs(z)
  
}
Code
In the first case x = 1 and y = -1. In the second case x = 1 and y = 1. In the third case -1 and 2.

📝 What will happen if we execute the code below?

z <- "a"
if (z > 0) {
  
  x <- z^3
  y <- -sqrt(z)
  
} else if (abs(z) < 2) {
  
  x <- z^4
  y <- sqrt(-z)
  
} else {
  
  x <- z/2
  y <- abs(z)
  
}
Code
# will give error since it is not a numeric argument
Error in z^3 : non-numeric argument to binary operator

📝 From the {lubridate} package, the hour() function returns the time of a given date, and the now() function returns the date and time of the current time. With both functions, have cat() (cat()) print “good night” only after 21:00.

Code
# loading library
library(lubridate)

# Current date-time
current_dt <- now()

# If structure
if (hour(current_dt) > 21) {
  
  cat("Good night") # print or cat (two ways of printing)
}

1.3 Loops

Although in most occasions they can be replaced by other more efficient and readable structures, it is important to know one of the most famous control expressions: the loops.

  • for { }: allows [repeating the same code]{. hl-yellow} in a prefixed and known number of times.

  • while { }: allows repeating the same code but in an undetermined number of times (until a condition is no longer fulfilled).

1.3.1 For loop

A for loop is a structure that allows to repeat a set of commands a finite, prefixed and known number of times given a set of indices.

Let’s define a vector x <- c(0, -7, 1, 4) and another empty variable y. After that we will define a for loop with for () { }: inside the brackets we will indicate an index and some values to traverse, inside the braces the code to execute in each iteration (in this case, fill y as x + 1).

x <- c(0, -7, 1, 4)
y <- c()

for (i in 1:4) {
  
  y[i] <- x[i] + 1
  
}

Note that because R works in a default vector manner, the loop is the same as doing x + 1 directly.

x <- c(0, -7, 1, 4)
y <- c()

for (i in 1:4) {
  
  y[i] <- x[i] + 1
  
}
y
[1]  1 -6  2  5
y2 <- x + 1
y2
[1]  1 -6  2  5

Another common option is to indicate the indexes “automatically”: from the first 1 to the last (corresponding to the length of x length(x)).

x <- c(0, -7, 1, 4)
y <- c()

for (i in 1:length(x)) {
  
  y[i] <- x[i] + 1
  
}
y
[1]  1 -6  2  5

Thus the general structure of a for-loop will always be as follows

for (index in set) { 
  
  código (usually depending on index)
  
}

In the case of for loops ALWAYS we know how many iterations we have (as many as there are elements in the set to be indexed). We can see another example of a combining numbers and text loop: we define a vector of ages and names, and print the i-th name and age.

library(glue)
names <- c("Javi", "Sandra", "Carlos", "Marcos", "Marta")
ages <- c(33, 27, 18, 43, 29)

for (i in 1:5) { 
  
  print(glue("{names[i]} are {ages[i]} old")) 
  
}
Javi are 33 old
Sandra are 27 old
Carlos are 18 old
Marcos are 43 old
Marta are 29 old

Although they are usually indexed with numeric vectors, loops can be indexed on any vector structure, regardless of the type of the set.

library(stringr)
week_days <- c("monday", "tuesday", "wednesday", "thursday",
               "friday", "saturday", "sunday")

for (days in week_days) {
  
  print(days)
}
[1] "monday"
[1] "tuesday"
[1] "wednesday"
[1] "thursday"
[1] "friday"
[1] "saturday"
[1] "sunday"
1.3.1.1 for + if-else

Let’s combine conditional structures and loops: using the swiss set of the {datasets} package, let’s assign NA if the fertility values are greater than 80.

for (i in 1:nrow(swiss)) {
  
  if (swiss$Fertility[i] > 80) { 
    
    swiss$Fertility[i] <- NA
    
  }
}

This is «the same» as a vectorized if_else().

data("swiss")
swiss$Fertility <- if_else(swiss$Fertility > 80, NA, swiss$Fertility)

1.3.2 While loop

Another way to create a loop is with the while { } structure, which will loop an unknown number of times, until a condition stops being met (in fact it may never end). For example, we will inialize a variable times <- 1, which we will increment at each step, and we will not exit the loop until times > 3.

times <- 1
while(times <= 3) {
  
  print(glue("Not yet, we are in the {times}-th iteration")) 
  times <- times + 1
  
}
Not yet, we are in the 1-th iteration
Not yet, we are in the 2-th iteration
Not yet, we are in the 3-th iteration
print(glue("Now! We are in the {times}-th iteration")) 
Now! We are in the 4-th iteration

A while loop will always look like this

while(condition) {
  
  code to be executed while condition is TRUE
  # usually some variable is updated here
  
}

What happens when the condition is never FALSE? Try it yourself

while (1 > 0) {
  
  print("Press ESC to exit")
  
}

 

Warning

A while { } loop can be quite “dangerous” if we do not control well how to stop it.

1.3.2.1 break and next

We have two reserved commands to abort a loop or force it forward:

  • break: allows abort a loop even if its end has not been reached
for(i in 1:10) {
  if (i == 3) {
    
    break # if i = 3, we abort
    
  }
  print(i)
}
[1] 1
[1] 2
  • next: forces a loop to advance to the next iteration
for(i in 1:5) {
  if (i == 3) {
    
    next # if i = 3, we advance to the next iteration
    
  }
  print(i)
}
[1] 1
[1] 2
[1] 4
[1] 5

1.4 💻 It’s your turn

Try to perform the following exercises without looking at the solutions

📝 Modify the code below to print a message on the screen if and only if all the data in airquality is for a month other than January.

library(datasets)
months <- airquality$Month

if (months == 2) {
  print("No data in January")
}
Code
library(datasets)
months <- airquality$Month

if (all(months != 1)) {
  print("No data in January")
}

📝 Modify the code below to store in a variable called temp_high a TRUE if any of the records has a temperature above 90 degrees Fahrenheit and FALSE in any other case.

temp <- airquality$Temp

if (temp == 100) {
  print("Some of the records have temperatures in excess of 90 degrees Fahrenheit")
}
Code
# Option 1
temp <- airquality$Temp
temp_high <- FALSE
if (any(temp > 90)) {
   temp_high <- TRUE
}

# Option 2
temp_high <- any(airquality$Temp > 90)

📝 Modify the code below to design a for loop of 5 iterations that only loops through the first 5 odd (and at each step of the loop prints them)

for (i in 1:5) {
  
  print(i)
}
Code
for (i in c(1, 3, 5, 7, 9)) {
  
  print(i)
}

📝 Modify the code below to design a while loop that starts with a counter count <- 1 and stops when it reaches 6

count <- 1
while (count == 2) {
  
  print(count)
}
Code
count <- 1
while (count < 6) {
  
  print(count)
  count <- count + 1
  
}

2 🐣 Case study I: simulation study

To practice control structures we are going to perform a simulation exercise

2.1 Question 1

Define a variable called amount that starts at 100. Design a loop of 20 iterations where on each iteration, amount is reduced to half its value. Think about what kind of loop structure you should use. The final value of amount should be 0.000095367 (approx).

Code
# We use a for since we know the number of iterations
# by default (and it does not depend on anything).

# we initially define amount in 100
amount <- 100 

# for the loop we use e.g. i as index, ranging from 1 to 20
for (i in 1:20) {
  
  # the code is the same and does not depend on i
  amount <- amount/2
}
amount

2.2 Question 2

Design a loop structure so that you find the iteration where amount is less than 0.001 for the first time. Once found save it in iter and stop the loop.

Code
# two ways: for and while

# for
amount <- 100 

# we already know that in 20 it is less than 0.001 so we can set
# that amount as a ceiling knowing that it will not be reached
for (i in 1:20) {
  
  # if it is still not less, we continue dividing
  if (amount >= 0.001) {
    
    amount <- amount/2
    
  } else {
    
    # if it is already smaller, we save the iteration (think why i - 1)
    iter <- i - 1 
    
    # and abort it
    break
  }
  
}

# while
amount <- 100 

iter <- 0 # we must initialize the iterations

# we don't know how many iterations, only that it should stop when
# amount is below that amount
while (amount >= 0.001) {
  
  amount <- amount/2
    
  # classic while structure: if iteration runs
  # we update a value (in this case to count one iteration)
  iter <- iter + 1
}

iter

2.3 Question 3

In R we have the function %%: if we put a %% b it returns the remainder that would give the division \(a/b\). For example, 4 %%% 2 gives 0 since 4 is an even number (that is, its remainder when dividing by 2 is 0). If we put 13 %% 5 we get 3, since the remainder of dividing 13 by 5 is 3.

# Remainder by dividing by 2
3 %% 2
[1] 1
4 %% 2
[1] 0
5 %% 2
[1] 1
6 %% 2
[1] 0
# Remainder by dividing by 3
9 %% 3
[1] 0
10 %% 3
[1] 1
11 %% 3
[1] 2
12 %% 3
[1] 0

Starting at an initial amount initial_amount of 100 (euros), design a loop that adds 3€ plus the iteration you are on if the current amount is even and subtracts 5€ minus the iteration you are on if it is odd, UNLESS the amount is already equal or below 0 (in that case it should neither add nor subtract). Example: if amount is 50 euros and you are in iteration 13, it will add 3 + 13 (66 in total); if amount is 51 euros and you are in iteration 13, it will subtract 5 + 13 (33 in total); if amount is -2 euros and you are in iteration 13, it will add 3 + 13 (14 in total); if amount is -1 euros and you are in iteration 13, it will do nothing. Save the resulting amounts for each iteration (maximum of 150 iterations). Start from iteration 2

Code
initial_amount <- 100
amount <- c(initial_amount, rep(NA, 149))
for (i in 2:150) {
  
  if (amount[i - 1] %% 2 == 0) {
    
    amount[i] <- amount[i - 1] + 3 + i
    
  } else if (amount[i - 1] > 0) {
    
    amount[i] <- amount[i - 1] - (5 + i)
    
  } else {
    
    amount[i] <- amount[i - 1]
    
  }
}

What happened?

3 Functions in R

Not only can we use default functions that come already loaded in packages, we can also create our own functions to automate tasks. How to create our own function?

3.1 Basic scheme

Let’s look at its basic scheme:

  • Name: for example name_fun (no spaces or strange characters). To the name we assign the reserved word function().

  • Define input arguments (inside function()).

  • Body of the function inside { }.

  • We end the function with the output arguments with return().

name_fun <- function(arg1, arg2, ...) {
  
  code to be executed
  
  return(var_output)
  
}
  • arg1, arg2, ...: will be the input arguments, the arguments that the function takes to execute the code inside.

  • code: lines of code that we want to execute the function.

  • return(var_output): the output arguments will be entered.

name_fun <- function(arg1, arg2, ...) {
  
  # Code to be executed
  code
  
  # Output
  return(var_output)
  
}
Important

All variables that we define inside the function are LOCAL variables: they will only exist inside the function unless we specify otherwise.

Let’s look at a very simple example of a function for calculating the area of a rectangle.

Since the area of a rectangle is calculated as the product of its sides, we will need just that, its sides: those will be the input arguments and the value to return will be just its area (\(side_1 * side_2\)).

# We define the name of function and input arguments
compute_area <- function(side_1, side_2) {
  
  area <- side_1 * side_2
  return(area)
  
}

We can also make a direct definition of variables without storing along the way.

# We define the name of function and input arguments
compute_area <- function(side_1, side_2) {
  
  return(side_1 * side_2)
  
}

 

How to apply our function?

compute_area(5, 3) # area of 5 x 3 rectangle
[1] 15
compute_area(1, 5) # area of 1 x 5 rectangle
[1] 5
Tip

Although it is not necessary, it is recommendable to make explicit the calling of the arguments, specifying in the code what value is for each argument so that it does not depend on its order, making the code more readable.

compute_area(side_1 = 5, side_2 = 3) # area of 5 x 3 rectangle
[1] 15
compute_area(side_2 = 3, side_1 = 5) # area of 5 x 3 rectangle
[1] 15

3.2 Default arguments

Imagine now that we realize that 90% of the time we use such a function to default calculate the area of a square (i.e., we only need one side). To do this, we can define default arguments in the function: they will take that value unless we assign another one.

Why not assign side_2 = side_1 default, to save lines of code and time?

compute_area <- function(side_1, side_2 = side_1) {
  
  # Code to be executed
  area <- side_1 * side_2
  
  # Output
  return(area)
  
}

Now default the second side will be equal to the first (if added it will use both).

compute_area(side_1 = 5) # square
[1] 25
compute_area(side_1 = 5, side_2 = 7) # rectangle
[1] 35

3.3 Multiple outputs

Let’s complicate the function a bit and add in the output the values of each side, labeled side_1 and side_2, packing the output in a vector.

compute_area <- function(side_1, side_2 = side_1) {
  
  # Code
  area <- side_1 * side_2
  
  # Output
  return(c("area" = area, "side_1" = side_1, "side_2" = side_2))
  
}

We can complicate the output a little more by adding a fourth variable that tells us, depending on the arguments, whether rectangle or square, having to add a character (or logic) variable in the output.

compute_area <- function(side_1, side_2 = side_1) {
  
  # Code
  area <- side_1 * side_2
  
  # Output
  return(c("area" = area, "side_1" = side_1, "side_2" = side_2,
           "type" = if_else(side_1 == side_2, "square", "rectangle")))
  
}
compute_area(5, 3)
       area      side_1      side_2        type 
       "15"         "5"         "3" "rectangle" 

Problem: when trying to put numbers and text together, it converts everything to numbers. We could store it all in a tibble() as we have learned or in an object known in R as lists (we will see it later).

3.4 Order of arguments

Before we did not care about the order of the arguments, but now the order of the input arguments matters, since we include side_1 and side_2 in the output.

Tip

As mentioned, it is highly recommended to make the function call explicitly setting the arguments to improve legibility and interpretability.

# Equivalent to compute_area(5, 3)
compute_area(side_1 = 5, side_2 = 3)
       area      side_1      side_2        type 
       "15"         "5"         "3" "rectangle" 

It seems silly what we have done but we have crossed an important frontier: we have gone from consuming knowledge (code from other packages, elaborated by others), to generating knowledge, creating our own functions.

Functions are going to be key in your day-to-day work because they will allow you to automate code that you are going to repeat over and over again: by packaging that code under an alias (function name) you will be able to use it over and over again without programming it (so doing twice as much work will not imply working twice as much)

3.5 Local vs global variables

An important aspect to think about with functions: what happens if we name a variable inside a function to which we have forgotten to assign a value inside the function.

We must be cautious when using functions in R, since due to the “lexicographic rule”, if a variable is not defined inside the function, R will look for that variable in the environment of variables.

x <- 1
fun_example <- function() {
    
  print(x) # No output, just doing an action
}
fun_example()
[1] 1

If a variable is already defined outside the function (global environment), and is also used inside changing its value, the value only changes inside but not in the global environment.

x <- 1
fun_example <- function() {
    
  x <- 2
  print(x) # value inside of function
}
# value inside of function (local)
fun_example()
[1] 2
# value output of function (global)
print(x)
[1] 1

If we want it to change locally as well as globally we must use the double assignment (<<-).

x <- 1
y <- 2
fun_example <- function() {
  
  # no change in a global way, just locally
  x <- 3 
  # change in a global way
  y <<- 0 #<<
  
  print(x)
  print(y)
}

fun_example() # value inside function (local)
[1] 3
[1] 0
x # global value
[1] 1
y # global value
[1] 0

3.6 💻 It’s your turn

Try to perform the following exercises without looking at the solutions

📝 Modify the code below to define a function called sum_function, so that given two elements, it returns their sum.

name <- function(x, y) {
  sum_output <- # code
  return()
}
# we apply the function
sum_function(3, 7)
Code
sum_function<- function(x, y) {
  sum_output <- x + y
  return(sum_output)
}
sum_function(3, 7)

📝 Modify the code below to define a function called product_function, so that given two elements, it returns their product, but by default it calculates the square

name <- function(x, y) {
  prod_output <- # code
  return()
}
product_function(3)
product_function(3, -7)
Code
product_function <- function(x, y = x) {
  
  prod_output <- x * y
  return(prod_output)
  
}
product_function(3)
product_function(3, -7)

📝 Define a function called equal_names that, given two names, tells us if they are equal or not. Do this by considering case-sensitive, and case-insensitive. Use the {stringr} package.

Code
# Case-sensitive
equal_names <- function(person_1, person_2) {
  
  return(person_1 == person_2)
  
}
equal_names("Javi", "javi")
equal_names("Javi", "Lucía")

# Case-insensitive
equal_names <- function(person_1, person_2) {
  
  return(toupper(person_1) == toupper(person_2))
  
}
equal_names("Javi", "javi")
equal_names("Javi", "Lucía")

📝 Create a function called compute_BMI that, given two arguments (weight and height in meters) and a name, returns a list with the BMI (\(weight/(height^2)\)) and the name.

Code
compute_BMI <- function(name, weight, height) {
  
  return(list("name" = name, "BMI" = weight/(height^2)))
  
}

📝 Repeat the previous exercise but with another optional argument called units (by default, units = “meters”). Develop the function so that it does the right thing if units = “meters” and if units = “centimeters”.

Code
compute_BMI <- function(name, weight, height, units = "meters") {
  
  return(list("name" = name,
              "BMI" = weight / (if_else(units == "meters", height, height/100)^2)))
  
}

📝 Create a fictitious tibble of 7 persons, with three variables (invent name, and simulate weight, height in centimeters), and apply the defined function so that we obtain a fourth column with their BMI.

Code
data <-
  tibble("name" = c("javi", "sandra", "laura",
                       "ana", "carlos", "leo", NA),
         "weight" = rnorm(n = 7, mean = 70, sd = 1),
         "height" = rnorm(n = 7, mean = 168, sd = 5))

data |> 
  mutate(BMI = compute_BMI(name, weight, height, units = "centimeters")$BMI)

📝 Create a function called shortcut that has two numeric arguments x and y. If both are equal, you should return equal and have the function terminate automatically (think about when a function exits). WARNING: x and y could be vectors. If they are different (of equal length) calculate the proportion of different elements. If they are different (being of different length), it returns the elements that are not common.

Code
shortcut <- function(x, y) {
  
  if (all(x == y) & length(x) == length(y)) { return("equal") }
  else {
   
    if (length(x) == length(y)) {
      
      n_diff <- sum(x != y) / length(x)
      return(n_diff)
      
    } else {
      
      diff_elem <- unique(c(setdiff(x, y), setdiff(y, x)))
      return(diff_elem)
    }
    
  }
}

4 🐣 Case study II: temperature converter

To practice using functions we are going to create a temperature converter. Let’s start simple. Try to conceptualize the idea on paper first.

4.1 Question 1

Define a function called celsius_to_kelvin that, given a temperature in Celsius (e.g. temp as argument) converts it to Kelvin according to the conversion formula below. After defining the function apply it to a vector of temperatures.

\[K = °C + 273.15\]

Code
# define function name and arguments
celsius_to_kelvin <- function(temp) {
  
  # convert
  kelvin <- temp + 273.15
  
  # output
  return(kelvin)
  
}

x <- c(-15, -3, 0, 15, 27.5)
celsius_to_kelvin(x)

4.2 Question 2

Create the inverse function kelvin_to_celsius and apply it to another vector of temperatures. You will have to make sure that the temperature in Kelvin does not take negative values (since it is an absolute scale). In case this is not true, return NA.

Code
# define function name and arguments
kelvin_to_celsius <- function(temp) {
  
  # if negative in Kelvin, we stop and return absent
  # otherwise, we convert
  celsius <- if_else(temp < 0, NA, temp - 273.15)
  
  # Think why we haven't done it with an if (...) else (...)

  # output
  return(celsius)
  
}

y <- c(0, 250, 300, 350)
kelvin_to_celsius(y)

4.3 Question 3

Create a joint function converter_temp that has two arguments: temperature and a text argument that tells us if it is kelvin or celsius (and that by default the input temperature is Celsius). The function must use that string to decide in which direction it converts (check that the text argument does not have an option other than the two allowed; otherwise, return error using the stop(“error message…”) command). Apply it to the previous vectors and check that it gives the same.

Code
# define function name and arguments
# default, units in celsius
conversor_temp <- function(temp, units = "celsius") {
  
  # we check that units are correct
  # within the allowed values
  if (units %in% c("celsius", "kelvin")) {
    
    if (units == "celsius") {
      
      temp_out <- celsius_to_kelvin(temp) 
      
    } else {
      
      temp_out <- kelvin_to_celsius(temp)
      
    }
    
  } else {
    
    # otherwise we stop the function with an error message
    stop("Error: just 'celsius' or 'kelvin' as units")
  }
  
  # output
  return(temp_out)
  
}

# Notice that we have not used `if_else()` because the number of elements
# to evaluate in the condition must be equal to the number of elements that
# it returns, by doing it vectorially.
conversor_temp(x)
conversor_temp(y, units = "kelvin")

4.4 Question 4

Repeats the previous function but regardless of whether units are in upper or lower case.

conversor_temp(y, units = "Kelvin")
Error in conversor_temp(y, units = "Kelvin"): could not find function "conversor_temp"
Code
# define function name and arguments
# default, units in celsius
library(stringr)
conversor_temp <- function(temp, units = "celsius") {
  
  # we use str_to_lower to make everything lowercase
  if (str_to_lower(units) %in% c("celsius", "kelvin")) {
    
    if (units == "celsius") {
      
      temp_out <- celsius_to_kelvin(temp) 
      
    } else {
      
      temp_out <- kelvin_to_celsius(temp)
      
    }
    
  } else {
    
    # otherwise we stop the function with an error message
    stop("Error: just 'celsius' or 'kelvin' units")
  }
  
  # devolvemos
  return(temp_out)
  
}

conversor_temp(y, units = "Kelvin")

4.5 Question 5

Repeat all the above process creating converter_temp2 but to convert between Celsius and Fahrenheit following the formula below

\[ºC = (ºF − 32) * \frac{5}{9}, \quad ºF = 32 + ºC * \frac{9}{5}\]

Code
celsius_to_fahr <- function(temp) {

  fahr <- 32 + temp * (9/5)
  return(fahr)
  
}
celsius_to_fahr(x)

fahr_to_celsius <- function(temp) {
  
  celsius <- (temp - 32) * (5/9)
  return(celsius)
  
}

z <- c(40, 60, 80, 100)
fahr_to_celsius(z)

conversor_temp2 <- function(temp, units = "celsius") {
  
  if (str_to_lower(units) %in% c("celsius", "fahr")) {
    
    if (units == "celsius") {
      
      temp_out <- celsius_to_fahr(temp) 
      
    } else {
      
      temp_out <- fahr_to_celsius(temp)
      
    }
    
  } else {
    
    stop("Error: just 'celsius' or 'fahr' units")
  }
  
  return(temp_out)
  
}

conversor_temp2(x)
conversor_temp2(z, units = "fahr")

4.6 Question 6

Finally, create the superfunction converter_temp_total that allows as input argument a temperature in one of the 3 units, a text indicating in which units it comes and another one indicating in which units it is to be output. By default it converts from celsius to kelvin.

Code
converter_temp_total <-
  function(temp, units_input = "celsius",
           units_output = "kelvin") {
  
  if (str_to_lower(units_input) %in% c("celsius", "fahr", "kelvin") &
      str_to_lower(units_output) %in% c("celsius", "fahr", "kelvin")) {
    
    if (units_input == units_output) {
      
      return(temp)
      
    }
    
    else if (units_input == "celsius") {
      
      if (units_output == "kelvin") {
        
        temp_out <- celsius_to_kelvin(temp) 
        
      } else { 
        
        temp_out <- celsius_to_fahr(temp) 
      }
      
    } else if (units_input == "kelvin") {
      
      if (units_output== "celsius") {
        
        temp_out <- kelvin_to_celsius(temp) 
    
      } else { 
        
        temp_out <- celsius_to_fahr(kelvin_to_celsius(temp))
      }
      
    } else {
      
      if (units_output == "celsius") {
        
        temp_out <- fahr_to_celsius(temp) 
    
      } else { 
        
        temp_out <- celsius_to_kelvin(fahr_to_celsius(temp))
      }
      
    }
    
  } else {
    
    stop("Error: just 'celsius', 'kelvin' or 'fahr'")
  }
  
  return(temp_out)
  
}

converter_temp_total(x, units_input = "celsius",
                     units_output = "celsius")
converter_temp_total(y, units_input = "kelvin",
                     units_output = "kelvin")
converter_temp_total(y, units_input = "kelvin",
                     units_output = "celsius")
converter_temp_total(z, units_input = "fahr",
                     units_output = "celsius")

converter_temp_total(z, units_input = "fahr",
                     units_output = "celsius")
converter_temp_total(converter_temp_total(z, units_input = "fahr",
                                          units_output = "kelvin"),
                     units_input = "kelvin",
                     units_output = "celsius")

5 🐣 Case study III: Monty Hall problem

In R the function sample(x = ..., size = ...) will be very useful: from a collection of x elements, it selects a random size number of them. For example, if we want to simulate 3 times the throw of a die we have 6 possible elements (x = 1:6) and we select it 3 times (size = 3).

sample(x = 1:6, size = 3)
[1] 1 3 5

Since it is random, each time you run it, something different will come out.

sample(x = 1:6, size = 3)
[1] 6 2 3

What if we want to throw it 10 times?

sample(x = 1:6, size = 10)
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'

Having only 6 possible elements and choosing 10, it cannot, so we have to indicate that we want a sample with replacement (as with the die, each face can be repeated when re-rolled).

sample(x = 1:6, size = 10, replace = TRUE)
 [1] 1 4 1 3 5 1 6 1 2 6

5.1 Question 1

With the above, imagine that you are in a TV contest where you are given a choice of 3 doors: in one there is a millionaire prize and in the other 2 an oreo cookie. Design the simulation study with for loops to approximate the probability that you get the prize (obviously it has to give you approx 0.333333333). Perform the experiment for 10, 50 trials, 100 trials, 500 trials, 1000 trials, 10 000 trials and 25 000 trials (hint: you need a loop within a loop). What do you observe?

Code
library(dplyr)

# Possible doors
doors <- c(1, 2, 3)

# Possible trials
trials <- c(10, 50, 100, 500, 1000, 10000, 25000)

# For scenario, we define the number of times in which we win prize
n_prizes <- rep(0, length(trials))

# first loop: scenarios
for (i in 1:length(trials)) {
  
  # second loop: for scenario, the number of trials
  for (j in 1:trials[i]) {
    
    prize <- sample(x = doors, size = 1)
    choice <- sample(x = doors, size = 1)

    n_prizes[i] <- if_else(choice == prize, n_prizes[i] + 1, n_prizes[i])
    
  }
  # in proportion
  n_prizes[i] <- n_prizes[i] / trials[i]
}
n_prizes

5.2 Question 2

What if, in each round, one of the non-winning doors that you have not chosen was opened for you, would you change doors or would you stay? Simulate both cases and find out which is the correct strategy (this problem is known as the Monty Hall problem and even appears in movies such as 21 Black Jack).

Code
doors <- c(1, 2, 3)
trials <- c(10, 50, 100, 500, 1000, 10000, 25000)
n_prizes_nochange <- n_prizes_change <- rep(0, length(trials))

for (i in 1:length(trials)) {
  for (j in 1:trials[i]) {
    
    init_choice <- sample(x = doors, size = 1)
    
    prize <- sample(x = doors, size = 1)
    
    open_door <-
      doors[doors != init_choice & doors != prize]
    
    if (length(open_door) > 1) {
      
      open_door <- sample(x = open_door, size = 1)
    }
      
    n_prizes_nochange[i] <-
      if_else(init_choice == prize, n_prizes_nochange[i] + 1,
              n_prizes_nochange[i])

    changed_door <- doors[doors != init_choice & doors != open_door]
    n_prizes_change[i] <-
      if_else(changed_door == prize, n_prizes_change[i] + 1,
              n_prizes_change[i])
    
  }
  n_prizes_nochange[i] <- n_prizes_nochange[i] / trials[i]
  n_prizes_change[i] <- n_prizes_change[i] / trials[i]
}
n_prizes_nochange
n_prizes_change