Practical workbooks of Data Programming in Master in Computational Social Science (2024-2025)
A flow or control structure is a set of commands that decides the path your code must follow:
If condition A is met, what happens?
What if B happens?
How can I repeat the same expression (depending on a variable)?
If you have programmed before, you may be familiar with conditional structures such as if (bla bla) {...} else {...} or with for/while loops (to be avoided whenever possible).
One of the best-known control structures is the conditional structure if.
IF a set of conditions is met (TRUE), then execute whatever is inside the curly braces.
For example, the structure if (x == 1) { code A } executes code A in braces ONLY IF the condition in parentheses is true (only if x is 1). In any other case, it does nothing.
Let’s define a vector of ages of 8 people
ages <- c(14, 17, 24, 56, 31, 20, 87, 73)
ages < 18
[1] TRUE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
Our conditional structure will do the following: if there is a minor, it will print a message.
if (any(ages < 18)) {
print("There is a minor")
}
[1] "There is a minor"
If the condition inside if() is not true (FALSE), nothing happens.
if (all(ages >= 18)) {
print("All of them are of legal age")
}
We get no message because the condition all(ages >= 18) is not TRUE, so nothing is executed.
The structure if (condition) { code A } can be combined with an else { code B }: when the condition is not met, it will execute the alternative code B inside else { }, allowing us to decide what happens when the condition is satisfied and when it is not.
For example, if (x == 1) { code A } else { code B } will execute A if x is equal to 1 and B in any other case.
if (all(ages >= 18)) {
  print("All of them are of legal age")
} else {
  print("There is a minor")
}
[1] "There is a minor"
This if - else structure can be nested: imagine that we want to run some code if everyone is of legal age; if that is not the case but everyone is at least 16, do something else; in any other case, take a third action.
if (all(ages >= 18)) {
  print("All of them are of legal age")
} else if (all(ages >= 16)) {
  print("There is a minor but all of them are at least 16 years old")
} else {
  print("There is at least one person under 16 years of age")
}
[1] "There is at least one person under 16 years of age"
You can collapse the structures by clicking on the left arrow in your script.
This conditional structure can be vectorized (applied to a whole vector in a single line) with if_else() (from the {dplyr} package), whose arguments are
the condition to evaluate
what to return when it is met and when it is not
an optional argument for when the condition evaluates to NA
We will label each person as of legal age or minor, and as unknown when the age is missing.
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
ages <- c(NA, ages)
if_else(ages >= 18, "legal age", "minor", missing = "unknown")
[1] "unknown" "minor" "minor" "legal age" "legal age" "legal age"
[7] "legal age" "legal age" "legal age"
Base R provides ifelse(): it has no argument to specify what to do with missing values, but it allows the results for TRUE and FALSE to be of different types.
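As a minimal sketch of that difference, assuming a small made-up age vector (the names ages_demo, labels and mixed are illustrative only): ifelse() propagates NA, so missing values have to be relabeled in a second step, and it accepts results of different types at the cost of coercing them to a common one.

```r
# Hypothetical ages with one missing value
ages_demo <- c(NA, 14, 17, 24)

# ifelse() has no `missing` argument: an NA in the condition stays NA
labels <- ifelse(ages_demo >= 18, "legal age", "minor")

# so the missing cases are relabeled in a separate step
labels[is.na(labels)] <- "unknown"
labels

# Different types are accepted, but the result is coerced to character
mixed <- ifelse(ages_demo >= 18, "legal age", 0)
class(mixed)
```

The silent coercion is the reason if_else() is stricter about types: it trades flexibility for predictability.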
Try to perform the following exercises without looking at the solutions
📝 What will be the output of the following code?
if_else(sqrt(9) < 2, sqrt(9), 0)
The output is 0: sqrt(9) equals 3, which is not less than 2, so if_else() returns the value given for the false case, which is 0.
📝 What will be the output of the following code?
x <- c(1, NA, -1, 9)
if_else(sqrt(x) < 2, 0, 1)
The output is the vector c(0, NA, NA, 1): sqrt(1) is less than 2, sqrt(9) is not, and for both sqrt(NA) (root of a missing value) and sqrt(-1) (which returns NaN, not a number) we cannot check whether the square root is less than 2, so the output is NA.
📝 Modify the code below so that, when the square root of a number cannot be verified to be less than 2, it returns -1.
x <- c(1, NA, -1, 9)
if_else(sqrt(x) < 2, 0, 1)
x <- c(1, NA, -1, 9)
if_else(sqrt(x) < 2, 0, 1, missing = -1)
📝 What are the values of x and y in the code below for z <- 1, z <- -1 and z <- -5?
z <- -1
if (z > 0) {
  x <- z^3
  y <- -sqrt(z)
} else if (abs(z) < 2) {
  x <- z^4
  y <- sqrt(-z)
} else {
  x <- z/2
  y <- abs(z)
}
In the first case x = 1 and y = -1. In the second case x = 1 and y = 1. In the third case x = -2.5 and y = 5.
📝 What will happen if we execute the code below?
z <- "a"
if (z > 0) {
  x <- z^3
  y <- -sqrt(z)
} else if (abs(z) < 2) {
  x <- z^4
  y <- sqrt(-z)
} else {
  x <- z/2
  y <- abs(z)
}
# will give error since it is not a numeric argument
Error in z^3 : non-numeric argument to binary operator
📝 From the {lubridate} package, the hour() function returns the hour of a given date-time, and the now() function returns the current date and time. With both functions, have cat() print “good night” only after 21:00.
# loading library
library(lubridate)
# Current date-time
current_dt <- now()
# If structure: hour() returns 21 from 21:00 onwards, hence >= 21
if (hour(current_dt) >= 21) {
  cat("Good night") # print or cat (two ways of printing)
}
Although in most occasions they can be replaced by other more efficient and readable structures, it is important to know one of the most famous control expressions: the loops.
for { }: allows repeating the same code a fixed and known number of times.
while { }: allows repeating the same code an undetermined number of times (until a condition is no longer fulfilled).
A for loop is a structure that repeats a set of commands a finite, fixed and known number of times, given a set of indices.
Let’s define a vector x <- c(0, -7, 1, 4) and another empty variable y. After that we will define a for loop with for () { }: inside the parentheses we indicate an index and the values to traverse, and inside the braces the code to execute in each iteration (in this case, fill y as x + 1).
x <- c(0, -7, 1, 4)
y <- c()
for (i in 1:4) {
  y[i] <- x[i] + 1
}
Note that because R is vectorized by default, the loop is equivalent to computing x + 1 directly.
x <- c(0, -7, 1, 4)
y <- c()
for (i in 1:4) {
  y[i] <- x[i] + 1
}
y
[1] 1 -6 2 5
y2 <- x + 1
y2
[1] 1 -6 2 5
Another common option is to indicate the indexes “automatically”: from the first (1) to the last (the length of x, length(x)).
x <- c(0, -7, 1, 4)
y <- c()
for (i in 1:length(x)) {
  y[i] <- x[i] + 1
}
y
[1] 1 -6 2 5
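One caveat of 1:length(x) is worth a small sketch: when the vector is empty, 1:0 counts backwards instead of producing an empty set. The base function seq_along() avoids this. The variable names below (x_empty, runs_colon, runs_seq) are illustrative only.

```r
# An empty vector: there is nothing to traverse
x_empty <- c()

# 1:length(x_empty) is 1:0, i.e. the vector c(1, 0), so the body runs twice
runs_colon <- 0
for (i in 1:length(x_empty)) {
  runs_colon <- runs_colon + 1
}

# seq_along(x_empty) is empty, so the body never runs
runs_seq <- 0
for (i in seq_along(x_empty)) {
  runs_seq <- runs_seq + 1
}

runs_colon # 2
runs_seq   # 0
```

For non-empty vectors both forms traverse exactly the same indices.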
Thus the general structure of a for loop will always be as follows
for (index in set) {
  code (usually depending on index)
}
In a for loop we ALWAYS know how many iterations there are (as many as there are elements in the set to be traversed). Let’s see another example of a loop, combining numbers and text: we define a vector of ages and another of names, and print the i-th name and age.
library(glue)
names <- c("Javi", "Sandra", "Carlos", "Marcos", "Marta")
ages <- c(33, 27, 18, 43, 29)
for (i in 1:5) {
  print(glue("{names[i]} is {ages[i]} years old"))
}
Javi is 33 years old
Sandra is 27 years old
Carlos is 18 years old
Marcos is 43 years old
Marta is 29 years old
Although they are usually indexed with numeric vectors, loops can be indexed on any vector structure, regardless of the type of the set.
library(stringr)
week_days <- c("monday", "tuesday", "wednesday", "thursday",
               "friday", "saturday", "sunday")
for (days in week_days) {
print(days)
}
[1] "monday"
[1] "tuesday"
[1] "wednesday"
[1] "thursday"
[1] "friday"
[1] "saturday"
[1] "sunday"
Let’s combine conditional structures and loops: using the swiss dataset from the {datasets} package, let’s assign NA when the fertility value is greater than 80.
for (i in 1:nrow(swiss)) {
  if (swiss$Fertility[i] > 80) {
    swiss$Fertility[i] <- NA
  }
}
This is «the same» as a vectorized if_else().
data("swiss")
swiss$Fertility <- if_else(swiss$Fertility > 80, NA, swiss$Fertility)
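The same replacement can also be sketched in base R with logical indexing, without any package. This is only an illustration on copies of the column (fert_loop and fert_vec are made-up names), so the original swiss data is left untouched.

```r
# Loop version, on a copy of the column
fert_loop <- datasets::swiss$Fertility
for (i in seq_along(fert_loop)) {
  if (fert_loop[i] > 80) {
    fert_loop[i] <- NA
  }
}

# Vectorized version: select the positions that meet the condition
# and overwrite all of them at once
fert_vec <- datasets::swiss$Fertility
fert_vec[fert_vec > 80] <- NA

# Both approaches give the same result
identical(fert_loop, fert_vec)
```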
Another way to create a loop is with the while { } structure, which loops an unknown number of times, until a condition stops being met (in fact it may never end). For example, we will initialize a variable times <- 1, which we will increment at each step, and we will not exit the loop until times > 3.
times <- 1
while (times <= 3) {
  print(glue("Not yet, we are in the {times}-th iteration"))
  times <- times + 1
}
Not yet, we are in the 1-th iteration
Not yet, we are in the 2-th iteration
Not yet, we are in the 3-th iteration
print(glue("Now! We are in the {times}-th iteration"))
Now! We are in the 4-th iteration
A while loop will always look like this
while (condition) {
  # while condition is TRUE
  code to be executed # usually some variable is updated here
}
What happens when the condition is never FALSE? Try it yourself
while (1 > 0) {
print("Press ESC to exit")
}
A while { } loop can be quite “dangerous” if we do not control well how to stop it.
We have two reserved commands to abort a loop or force it forward:
break: aborts a loop even if its end has not been reached
for (i in 1:10) {
  if (i == 3) {
    break # if i = 3, we abort
  }
  print(i)
}
[1] 1
[1] 2
next: forces a loop to advance to the next iteration
for (i in 1:5) {
  if (i == 3) {
    next # if i = 3, we advance to the next iteration
  }
  print(i)
}
[1] 1
[1] 2
[1] 4
[1] 5
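One way to keep a while loop from running forever is to combine it with break and a cap on the number of iterations. The sketch below is only an illustration: max_iter, value and iter are hypothetical names, and the cap of 1000 is an arbitrary assumption.

```r
max_iter <- 1000 # hypothetical safety cap
value <- 1
iter <- 0

while (value > 1e-6) {
  value <- value / 2
  iter <- iter + 1
  # safety net: leave the loop even if the condition is still TRUE
  if (iter >= max_iter) {
    warning("Maximum number of iterations reached")
    break
  }
}
iter
```

Here the condition stops the loop on its own well before the cap; the cap only matters if the update inside the loop is wrong and the condition never becomes FALSE.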
Try to perform the following exercises without looking at the solutions
📝 Modify the code below to print a message on the screen if and only if all the data in airquality is for a month other than January.
library(datasets)
months <- airquality$Month
if (months == 2) {
  print("No data in January")
}
library(datasets)
months <- airquality$Month
if (all(months != 1)) {
  print("No data in January")
}
📝 Modify the code below to store in a variable called temp_high a TRUE if any of the records has a temperature above 90 degrees Fahrenheit and FALSE in any other case.
temp <- airquality$Temp
if (temp == 100) {
  print("Some of the records have temperatures in excess of 90 degrees Fahrenheit")
}
# Option 1
temp <- airquality$Temp
temp_high <- FALSE
if (any(temp > 90)) {
  temp_high <- TRUE
}
# Option 2
temp_high <- any(airquality$Temp > 90)
📝 Modify the code below to design a for loop of 5 iterations that only loops through the first 5 odd numbers (and prints them at each step of the loop)
for (i in 1:5) {
print(i)
}
for (i in c(1, 3, 5, 7, 9)) {
print(i)
}
📝 Modify the code below to design a while loop that starts with a counter count <- 1 and stops when it reaches 6
count <- 1
while (count == 2) {
  print(count)
}
count <- 1
while (count < 6) {
  print(count)
  count <- count + 1
}
To practice control structures we are going to perform a simulation exercise
Define a variable called amount that starts at 100. Design a loop of 20 iterations where, on each iteration, amount is reduced to half its value. Think about what kind of loop structure you should use. The final value of amount should be approximately 0.000095367.
# We use a for since we know the number of iterations
# beforehand (and it does not depend on anything).
# we initially set amount to 100
amount <- 100
# for the loop we use e.g. i as index, ranging from 1 to 20
for (i in 1:20) {
  # the code is the same and does not depend on i
  amount <- amount/2
}
amount
Design a loop structure to find the first iteration at which amount is less than 0.001. Once found, save it in iter and stop the loop.
# two ways: for and while
# for
amount <- 100
# we already know that after 20 iterations it is less than 0.001, so we can
# use 20 as a ceiling knowing that it will not be reached
for (i in 1:20) {
  # if it is still not less, we continue dividing
  if (amount >= 0.001) {
    amount <- amount/2
  } else {
    # if it is already smaller, we save the iteration (think why i - 1)
    iter <- i - 1
    # and abort it
    break
  }
}
# while
amount <- 100
iter <- 0 # we must initialize the iterations
# we don't know how many iterations, only that it should stop when
# amount is below that threshold
while (amount >= 0.001) {
  amount <- amount/2
  # classic while structure: each time the iteration runs
  # we update a value (in this case to count one iteration)
  iter <- iter + 1
}
iter
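As a cross-check of both loops, the answer can also be worked out directly: after k halvings the amount is 100 / 2^k, which drops below 0.001 once k exceeds log2(100 / 0.001). The sketch below assumes illustrative names (start, threshold, k); ceiling() gives the right answer here because the logarithm is not a whole number.

```r
start <- 100
threshold <- 0.001

# smallest k with start / 2^k < threshold
k <- ceiling(log2(start / threshold))
k

# one halving earlier we are still above the threshold, at k we are below it
start / 2^(k - 1) >= threshold
start / 2^k < threshold

# and the value after the 20 halvings of the first exercise
start / 2^20
```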
In R we have the operator %%: a %% b returns the remainder of the division \(a/b\). For example, 4 %% 2 gives 0 since 4 is an even number (that is, its remainder when dividing by 2 is 0). If we compute 13 %% 5 we get 3, since the remainder of dividing 13 by 5 is 3.
# Remainder by dividing by 2
3 %% 2
[1] 1
4 %% 2
[1] 0
5 %% 2
[1] 1
6 %% 2
[1] 0
# Remainder by dividing by 3
9 %% 3
[1] 0
10 %% 3
[1] 1
11 %% 3
[1] 2
12 %% 3
[1] 0
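Like most R operators, %% works element by element on vectors, which makes it handy for filtering. A short sketch with made-up names (values, evens, quotient), which also uses the companion operator %/% for the integer quotient:

```r
values <- 1:10

# remainder of each element when dividing by 2
values %% 2

# keep only the even numbers
evens <- values[values %% 2 == 0]
evens

# %/% gives the integer quotient, so quotient * 5 + remainder recovers 13
quotient <- 13 %/% 5
remainder <- 13 %% 5
quotient * 5 + remainder
```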
Starting at an initial amount initial_amount of 100 (euros), design a loop that adds 3€ plus the number of the iteration you are on if the current amount is even, and subtracts 5€ plus the number of the iteration you are on if it is odd, UNLESS the amount is odd and already equal to or below 0 (in that case it should neither add nor subtract). Example: if amount is 50 euros and you are in iteration 13, it will add 3 + 13 (66 in total); if amount is 51 euros and you are in iteration 13, it will subtract 5 + 13 (33 in total); if amount is -2 euros and you are in iteration 13, it will add 3 + 13 (14 in total); if amount is -1 euros and you are in iteration 13, it will do nothing. Save the resulting amounts for each iteration (maximum of 150 iterations). Start from iteration 2.
initial_amount <- 100
amount <- c(initial_amount, rep(NA, 149))
for (i in 2:150) {
  if (amount[i - 1] %% 2 == 0) {
    amount[i] <- amount[i - 1] + 3 + i
  } else if (amount[i - 1] > 0) {
    amount[i] <- amount[i - 1] - (5 + i)
  } else {
    amount[i] <- amount[i - 1]
  }
}
What happened?
Not only can we use the functions that come already loaded in packages, we can also create our own functions to automate tasks. How do we create our own function?
Let’s look at its basic scheme:
Name: for example name_fun (no spaces or strange characters). To the name we assign the reserved word function().
Define the input arguments (inside function()).
Body of the function inside { }.
We end the function with the output values inside return().
name_fun <- function(arg1, arg2, ...) {
  code to be executed
  return(var_output)
}
arg1, arg2, ...: the input arguments, i.e. the values the function takes to execute the code inside.
code: the lines of code that we want the function to execute.
return(var_output): the output values the function returns.
name_fun <- function(arg1, arg2, ...) {
  # Code to be executed
  code
  # Output
  return(var_output)
}
All variables that we define inside the function are LOCAL variables: they will only exist inside the function unless we specify otherwise.
Let’s look at a very simple example of a function for calculating the area of a rectangle.
Since the area of a rectangle is calculated as the product of its sides, we will need just that, its sides: those will be the input arguments and the value to return will be just its area (\(side_1 * side_2\)).
# We define the name of function and input arguments
compute_area <- function(side_1, side_2) {
  area <- side_1 * side_2
  return(area)
}
We can also return the result directly, without storing it in an intermediate variable.
# We define the name of function and input arguments
compute_area <- function(side_1, side_2) {
  return(side_1 * side_2)
}
How to apply our function?
compute_area(5, 3) # area of 5 x 3 rectangle
[1] 15
compute_area(1, 5) # area of 1 x 5 rectangle
[1] 5
Although it is not necessary, it is advisable to name the arguments explicitly in the call, specifying which value goes to each argument so that the result does not depend on their order and the code is more readable.
compute_area(side_1 = 5, side_2 = 3) # area of 5 x 3 rectangle
[1] 15
compute_area(side_2 = 3, side_1 = 5) # area of 5 x 3 rectangle
[1] 15
Imagine now that we realize that 90% of the time we use this function to calculate the area of a square (i.e., we only need one side). To do this, we can define default arguments in the function: they will take that value unless we assign another one.
Why not set side_2 = side_1 by default, to save lines of code and time?
compute_area <- function(side_1, side_2 = side_1) {
  # Code to be executed
  area <- side_1 * side_2
  # Output
  return(area)
}
Now, by default, the second side will be equal to the first (if a second side is supplied, both are used).
compute_area(side_1 = 5) # square
[1] 25
compute_area(side_1 = 5, side_2 = 7) # rectangle
[1] 35
Let’s complicate the function a bit and add the value of each side to the output, labeled side_1 and side_2, packing the output in a vector.
We can complicate the output a little more by adding a fourth element that tells us, depending on the arguments, whether it is a rectangle or a square, which means adding a character (or logical) variable to the output.
compute_area <- function(side_1, side_2 = side_1) {
# Code
area <- side_1 * side_2
# Output
return(c("area" = area, "side_1" = side_1, "side_2" = side_2,
"type" = if_else(side_1 == side_2, "square", "rectangle")))
}
compute_area(5, 3)
area side_1 side_2 type
"15" "5" "3" "rectangle"
Problem: when numbers and text are put together in the same vector, everything is converted to text (note the quotes in the output). We could store it all in a tibble(), as we have learned, or in an R object known as a list (we will see it later).
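As a minimal sketch of the list option, assuming a hypothetical variant called compute_area_list (not part of the original material): a list keeps each element with its own type, so the area stays numeric while the type stays text.

```r
# Hypothetical variant of compute_area that returns a list instead of a vector
compute_area_list <- function(side_1, side_2 = side_1) {
  area <- side_1 * side_2
  # side_1 and side_2 are single values here, so a plain if is enough
  type <- if (side_1 == side_2) "square" else "rectangle"
  return(list("area" = area, "side_1" = side_1, "side_2" = side_2,
              "type" = type))
}

out <- compute_area_list(5, 3)
out$area        # still a number
class(out$area)
out$type        # text
```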
Before, we did not care about the order of the arguments, but now the order of the input arguments matters, since we include side_1 and side_2 in the output.
As mentioned, it is highly recommended to make the function call explicitly setting the arguments to improve legibility and interpretability.
# Equivalent to compute_area(5, 3)
compute_area(side_1 = 5, side_2 = 3)
area side_1 side_2 type
"15" "5" "3" "rectangle"
It seems silly what we have done but we have crossed an important frontier: we have gone from consuming knowledge (code from other packages, elaborated by others), to generating knowledge, creating our own functions.
Functions are going to be key in your day-to-day work because they will allow you to automate code that you are going to repeat over and over again: by packaging that code under an alias (function name) you will be able to use it over and over again without programming it (so doing twice as much work will not imply working twice as much)
An important aspect to think about with functions: what happens if we use a variable inside a function but forget to assign it a value inside the function?
We must be cautious when using functions in R: because of lexical scoping, if a variable is not defined inside the function, R will look for it in the environment where the function was defined (typically the global environment).
x <- 1
fun_example <- function() {
  print(x) # No output, just doing an action
}
fun_example()
[1] 1
If a variable is already defined outside the function (global environment), and is also used inside changing its value, the value only changes inside but not in the global environment.
x <- 1
fun_example <- function() {
  x <- 2
  print(x) # value inside of function
}
# value inside of function (local)
fun_example()
[1] 2
# value outside of function (global)
print(x)
[1] 1
If we want it to change locally as well as globally we must use the double assignment (<<-).
x <- 1
y <- 2
fun_example <- function() {
  # no change in a global way, just locally
  x <- 3
  # change in a global way
  y <<- 0
  print(x)
  print(y)
}
fun_example() # value inside function (local)
[1] 3
[1] 0
x # global value
[1] 1
y # global value
[1] 0
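A common alternative to <<- is to let the function return the new value and have the caller decide whether to overwrite the global variable. The sketch below uses a hypothetical helper, reset_to_zero, purely for illustration.

```r
y <- 2

# Hypothetical helper: computes the new value without touching
# anything outside the function
reset_to_zero <- function(value) {
  value <- 0 # local change only
  return(value)
}

reset_to_zero(y) # returns 0, but the global y is still 2
y

y <- reset_to_zero(y) # the caller explicitly overwrites the global y
y
```

This keeps every change to the global environment visible at the place where the function is called.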
Try to perform the following exercises without looking at the solutions
📝 Modify the code below to define a function called sum_function, so that given two elements, it returns their sum.
name <- function(x, y) {
  sum_output <- # code
  return()
}
# we apply the function
sum_function(3, 7)
sum_function <- function(x, y) {
  sum_output <- x + y
  return(sum_output)
}
sum_function(3, 7)
📝 Modify the code below to define a function called product_function, so that given two elements, it returns their product, but by default it calculates the square
name <- function(x, y) {
  prod_output <- # code
  return()
}
product_function(3)
product_function(3, -7)
product_function <- function(x, y = x) {
  prod_output <- x * y
  return(prod_output)
}
product_function(3)
product_function(3, -7)
📝 Define a function called equal_names that, given two names, tells us if they are equal or not. Do this in two versions: case-sensitive and case-insensitive. Use the {stringr} package.
# Case-sensitive
equal_names <- function(person_1, person_2) {
  return(person_1 == person_2)
}
equal_names("Javi", "javi")
equal_names("Javi", "Lucía")
# Case-insensitive
equal_names <- function(person_1, person_2) {
  return(toupper(person_1) == toupper(person_2))
}
equal_names("Javi", "javi")
equal_names("Javi", "Lucía")
📝 Create a function called compute_BMI that, given two arguments (weight and height in meters) and a name, returns a list with the BMI (\(weight/(height^2)\)) and the name.
compute_BMI <- function(name, weight, height) {
  return(list("name" = name, "BMI" = weight/(height^2)))
}
📝 Repeat the previous exercise but with another optional argument called units (by default, units = "meters"). Develop the function so that it does the right thing if units = "meters" and if units = "centimeters".
compute_BMI <- function(name, weight, height, units = "meters") {
  # units is a single string, so a plain if also works when height is a vector
  if (units == "centimeters") {
    height <- height / 100
  }
  return(list("name" = name, "BMI" = weight / (height^2)))
}
📝 Create a fictitious tibble of 7 persons, with three variables (invent name, and simulate weight, height in centimeters), and apply the defined function so that we obtain a fourth column with their BMI.
data <-
  tibble("name" = c("javi", "sandra", "laura",
                    "ana", "carlos", "leo", NA),
         "weight" = rnorm(n = 7, mean = 70, sd = 1),
         "height" = rnorm(n = 7, mean = 168, sd = 5))
data |>
  mutate(BMI = compute_BMI(name, weight, height, units = "centimeters")$BMI)
📝 Create a function called shortcut that has two numeric arguments x and y. If both are equal, it should return "equal" and terminate immediately (think about when a function exits). WARNING: x and y could be vectors. If they are different but of equal length, calculate the proportion of different elements. If they are of different lengths, return the elements that are not common to both.
shortcut <- function(x, y) {
  # we check the lengths first so that x == y is only evaluated
  # when both vectors have the same length
  if (length(x) == length(y) && all(x == y)) {
    return("equal")
  } else {
    if (length(x) == length(y)) {
      n_diff <- sum(x != y) / length(x)
      return(n_diff)
    } else {
      diff_elem <- unique(c(setdiff(x, y), setdiff(y, x)))
      return(diff_elem)
    }
  }
}
To practice using functions we are going to create a temperature converter. Let’s start simple. Try to conceptualize the idea on paper first.
Define a function called celsius_to_kelvin that, given a temperature in Celsius (e.g. temp as argument), converts it to Kelvin according to the conversion formula below. After defining the function, apply it to a vector of temperatures.
\[K = °C + 273.15\]
# define function name and arguments
celsius_to_kelvin <- function(temp) {
  # convert
  kelvin <- temp + 273.15
  # output
  return(kelvin)
}
x <- c(-15, -3, 0, 15, 27.5)
celsius_to_kelvin(x)
Create the inverse function kelvin_to_celsius and apply it to another vector of temperatures. You will have to make sure that the temperature in Kelvin does not take negative values (since it is an absolute scale). If it does, return NA.
# define function name and arguments
kelvin_to_celsius <- function(temp) {
  # if negative in Kelvin, we return a missing value
  # otherwise, we convert
  celsius <- if_else(temp < 0, NA, temp - 273.15)
  # Think why we haven't done it with an if (...) else (...)
  # output
  return(celsius)
}
y <- c(0, 250, 300, 350)
kelvin_to_celsius(y)
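The question in the comment above can be explored with a small sketch. A plain if expects ONE logical value, whereas the temperature is a vector; the base function ifelse() is used below as a stand-in for if_else() so the example needs no packages. The vector temp and the name outcome are illustrative assumptions.

```r
temp <- c(-5, 0, 300) # hypothetical temperatures in Kelvin, one of them invalid

# A plain if with a vector condition fails: it is an error in recent
# R versions and a warning in older ones, so we capture whichever occurs
outcome <- tryCatch(
  {
    if (temp < 0) NA else temp - 273.15
    "ran without complaint"
  },
  warning = function(w) "warning",
  error = function(e) "error"
)
outcome

# The vectorized alternative checks every element separately
celsius <- ifelse(temp < 0, NA, temp - 273.15)
celsius
```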
Create a joint function conversor_temp that has two arguments: the temperature and a text argument that tells us whether it is in kelvin or celsius (by default, the input temperature is in Celsius). The function must use that string to decide in which direction to convert (check that the text argument is one of the two allowed options; otherwise, raise an error with the stop("error message…") command). Apply it to the previous vectors and check that it gives the same results.
# define function name and arguments
# default, units in celsius
conversor_temp <- function(temp, units = "celsius") {
  # we check that units are correct
  # within the allowed values
  if (units %in% c("celsius", "kelvin")) {
    if (units == "celsius") {
      temp_out <- celsius_to_kelvin(temp)
    } else {
      temp_out <- kelvin_to_celsius(temp)
    }
  } else {
    # otherwise we stop the function with an error message
    stop("Error: just 'celsius' or 'kelvin' as units")
  }
  # output
  return(temp_out)
}
# Notice that we have not used `if_else()` because the number of elements
# to evaluate in the condition must be equal to the number of elements that
# it returns, by doing it vectorially.
conversor_temp(x)
conversor_temp(y, units = "kelvin")
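Base R also offers match.arg() for this kind of check: the allowed values are listed as the default of the argument and anything else raises an error automatically. The sketch below is a hypothetical variant (celsius_kelvin is a made-up name, with the conversions written inline), not the solution expected by the exercise.

```r
# Hypothetical variant: the allowed values are listed in the signature
celsius_kelvin <- function(temp, units = c("celsius", "kelvin")) {
  # match.arg() picks the first value by default and errors on anything else
  units <- match.arg(units)
  if (units == "celsius") {
    temp_out <- temp + 273.15
  } else {
    temp_out <- temp - 273.15
  }
  return(temp_out)
}

celsius_kelvin(c(0, 15))
celsius_kelvin(300, units = "kelvin")

# An invalid option raises an error; we capture its message to inspect it
res <- tryCatch(celsius_kelvin(10, units = "fahr"),
                error = function(e) conditionMessage(e))
res
```

Note that match.arg() also accepts unambiguous abbreviations such as "kel", which may or may not be desirable.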
Repeat the previous function, but make it work regardless of whether units is given in upper or lower case.
conversor_temp(y, units = "Kelvin")
Error in conversor_temp(y, units = "Kelvin"): Error: just 'celsius' or 'kelvin' as units
# define function name and arguments
# default, units in celsius
library(stringr)
conversor_temp <- function(temp, units = "celsius") {
  # we use str_to_lower to make everything lowercase
  # (and keep the lowercase version for the comparisons below)
  units <- str_to_lower(units)
  if (units %in% c("celsius", "kelvin")) {
    if (units == "celsius") {
      temp_out <- celsius_to_kelvin(temp)
    } else {
      temp_out <- kelvin_to_celsius(temp)
    }
  } else {
    # otherwise we stop the function with an error message
    stop("Error: just 'celsius' or 'kelvin' units")
  }
  # output
  return(temp_out)
}
conversor_temp(y, units = "Kelvin")
Repeat the whole process above, creating conversor_temp2 to convert between Celsius and Fahrenheit following the formulas below
\[°C = (°F - 32) * \frac{5}{9}, \quad °F = 32 + °C * \frac{9}{5}\]
celsius_to_fahr <- function(temp) {
  fahr <- 32 + temp * (9/5)
  return(fahr)
}
celsius_to_fahr(x)
fahr_to_celsius <- function(temp) {
  celsius <- (temp - 32) * (5/9)
  return(celsius)
}
z <- c(40, 60, 80, 100)
fahr_to_celsius(z)
conversor_temp2 <- function(temp, units = "celsius") {
  # lowercase once, so that the comparisons below ignore case
  units <- str_to_lower(units)
  if (units %in% c("celsius", "fahr")) {
    if (units == "celsius") {
      temp_out <- celsius_to_fahr(temp)
    } else {
      temp_out <- fahr_to_celsius(temp)
    }
  } else {
    stop("Error: just 'celsius' or 'fahr' units")
  }
  return(temp_out)
}
conversor_temp2(x)
conversor_temp2(z, units = "fahr")
Finally, create the superfunction converter_temp_total, which takes as input arguments a temperature in one of the 3 units, a text indicating the units of the input and another one indicating the units of the output. By default it converts from celsius to kelvin.
converter_temp_total <-
  function(temp, units_input = "celsius",
           units_output = "kelvin") {
    # lowercase once, so that all the comparisons below ignore case
    units_input <- str_to_lower(units_input)
    units_output <- str_to_lower(units_output)
    if (units_input %in% c("celsius", "fahr", "kelvin") &
        units_output %in% c("celsius", "fahr", "kelvin")) {
      if (units_input == units_output) {
        return(temp)
      } else if (units_input == "celsius") {
        if (units_output == "kelvin") {
          temp_out <- celsius_to_kelvin(temp)
        } else {
          temp_out <- celsius_to_fahr(temp)
        }
      } else if (units_input == "kelvin") {
        if (units_output == "celsius") {
          temp_out <- kelvin_to_celsius(temp)
        } else {
          temp_out <- celsius_to_fahr(kelvin_to_celsius(temp))
        }
      } else {
        if (units_output == "celsius") {
          temp_out <- fahr_to_celsius(temp)
        } else {
          temp_out <- celsius_to_kelvin(fahr_to_celsius(temp))
        }
      }
    } else {
      stop("Error: just 'celsius', 'kelvin' or 'fahr'")
    }
    return(temp_out)
  }
converter_temp_total(x, units_input = "celsius",
units_output = "celsius")
converter_temp_total(y, units_input = "kelvin",
units_output = "kelvin")
converter_temp_total(y, units_input = "kelvin",
units_output = "celsius")
converter_temp_total(z, units_input = "fahr",
units_output = "celsius")
converter_temp_total(z, units_input = "fahr",
units_output = "celsius")
converter_temp_total(converter_temp_total(z, units_input = "fahr",
units_output = "kelvin"),
units_input = "kelvin",
units_output = "celsius")
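The nested conditions above grow quickly with each new unit. One possible alternative design, sketched here under the assumption that Celsius is used as a pivot, converts the input to Celsius first and then from Celsius to the output unit with switch(). The function name convert_via_celsius is hypothetical, the formulas are written inline so the block runs on its own, and the negative-Kelvin check of the earlier exercise is left out for brevity.

```r
# Hypothetical alternative: every conversion goes through Celsius
convert_via_celsius <- function(temp, units_input = "celsius",
                                units_output = "kelvin") {
  units_input <- tolower(units_input)
  units_output <- tolower(units_output)
  allowed <- c("celsius", "kelvin", "fahr")
  if (!(units_input %in% allowed) || !(units_output %in% allowed)) {
    stop("Error: just 'celsius', 'kelvin' or 'fahr'")
  }
  # step 1: whatever the input, express it in Celsius
  celsius <- switch(units_input,
                    celsius = temp,
                    kelvin = temp - 273.15,
                    fahr = (temp - 32) * (5/9))
  # step 2: from Celsius to the requested output
  temp_out <- switch(units_output,
                     celsius = celsius,
                     kelvin = celsius + 273.15,
                     fahr = 32 + celsius * (9/5))
  return(temp_out)
}

convert_via_celsius(c(0, 15))                    # celsius to kelvin (default)
convert_via_celsius(212, "fahr", "celsius")      # boiling point of water
convert_via_celsius(273.15, "Kelvin", "fahr")    # case-insensitive
```

With this design, adding a fourth unit only requires one new line in each switch() instead of a new branch for every pair of units.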
In R, the function sample(x = ..., size = ...) will be very useful: from a collection of elements x, it randomly selects size of them. For example, to simulate 3 throws of a die we have 6 possible elements (x = 1:6) and we draw 3 times (size = 3).
sample(x = 1:6, size = 3)
[1] 1 3 5
Since it is random, each time you run it, something different will come out.
sample(x = 1:6, size = 3)
[1] 6 2 3
What if we want to throw it 10 times?
sample(x = 1:6, size = 10)
Error in sample.int(length(x), size, replace, prob): cannot take a sample larger than the population when 'replace = FALSE'
Having only 6 possible elements and choosing 10, it cannot, so we have to indicate that we want a sample with replacement (as with the die, each face can be repeated when re-rolled).
sample(x = 1:6, size = 10, replace = TRUE)
[1] 1 4 1 3 5 1 6 1 2 6
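If a simulation needs to be reproducible, the random number generator can be fixed with set.seed() before sampling. A minimal sketch (the seed 123 and the names first and second are arbitrary choices):

```r
# Same seed, same sequence of "random" throws
set.seed(123)
first <- sample(x = 1:6, size = 10, replace = TRUE)

set.seed(123)
second <- sample(x = 1:6, size = 10, replace = TRUE)

identical(first, second)
```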
With the above, imagine that you are in a TV contest where you are given a choice of 3 doors: in one there is a millionaire prize and in the other 2 an oreo cookie. Design the simulation study with for loops to approximate the probability that you get the prize (obviously it has to give you approx 0.333333333). Perform the experiment for 10, 50 trials, 100 trials, 500 trials, 1000 trials, 10 000 trials and 25 000 trials (hint: you need a loop within a loop). What do you observe?
library(dplyr)
# Possible doors
doors <- c(1, 2, 3)
# Possible trials
trials <- c(10, 50, 100, 500, 1000, 10000, 25000)
# For each scenario, we define the number of times in which we win the prize
n_prizes <- rep(0, length(trials))
# first loop: scenarios
for (i in 1:length(trials)) {
  # second loop: for each scenario, the number of trials
  for (j in 1:trials[i]) {
    prize <- sample(x = doors, size = 1)
    choice <- sample(x = doors, size = 1)
    n_prizes[i] <- if_else(choice == prize, n_prizes[i] + 1, n_prizes[i])
  }
  # in proportion
  n_prizes[i] <- n_prizes[i] / trials[i]
}
n_prizes
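Although the exercise asks for loops, the inner loop can also be sketched in vectorized form: draw all the prizes and all the choices at once and take the proportion of matches. The names below (n_trials, prizes, choices, prop_win) and the seed are illustrative assumptions.

```r
set.seed(2024) # arbitrary seed, only for reproducibility
doors <- c(1, 2, 3)
n_trials <- 25000

# all the draws of a scenario at once (with replacement: each trial is independent)
prizes <- sample(x = doors, size = n_trials, replace = TRUE)
choices <- sample(x = doors, size = n_trials, replace = TRUE)

# the mean of a logical vector is the proportion of TRUE values
prop_win <- mean(choices == prizes)
prop_win
```

With this many trials the proportion should land close to 1/3, in line with what the nested loops show as the number of trials grows.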
What if, in each round, one of the non-winning doors that you have not chosen was opened for you, would you change doors or would you stay? Simulate both cases and find out which is the correct strategy (this problem is known as the Monty Hall problem and even appears in movies such as 21 Black Jack).
doors <- c(1, 2, 3)
trials <- c(10, 50, 100, 500, 1000, 10000, 25000)
n_prizes_nochange <- n_prizes_change <- rep(0, length(trials))
for (i in 1:length(trials)) {
  for (j in 1:trials[i]) {
    init_choice <- sample(x = doors, size = 1)
    prize <- sample(x = doors, size = 1)
    open_door <-
      doors[doors != init_choice & doors != prize]
    if (length(open_door) > 1) {
      open_door <- sample(x = open_door, size = 1)
    }
    n_prizes_nochange[i] <-
      if_else(init_choice == prize, n_prizes_nochange[i] + 1,
              n_prizes_nochange[i])
    changed_door <- doors[doors != init_choice & doors != open_door]
    n_prizes_change[i] <-
      if_else(changed_door == prize, n_prizes_change[i] + 1,
              n_prizes_change[i])
  }
  n_prizes_nochange[i] <- n_prizes_nochange[i] / trials[i]
  n_prizes_change[i] <- n_prizes_change[i] / trials[i]
}
n_prizes_nochange
n_prizes_change
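The simulation above can be condensed with one observation: since the door that is opened is never the winning one nor the one you chose, switching wins exactly when the initial choice was wrong. Under that reasoning, a vectorized sketch (with illustrative names and an arbitrary seed) needs no opened door at all:

```r
set.seed(2024) # arbitrary seed, only for reproducibility
doors <- c(1, 2, 3)
n_trials <- 25000

init_choice <- sample(x = doors, size = n_trials, replace = TRUE)
prize <- sample(x = doors, size = n_trials, replace = TRUE)

# staying wins when the first choice was right...
p_stay <- mean(init_choice == prize)
# ...and switching wins when the first choice was wrong
p_switch <- mean(init_choice != prize)

p_stay
p_switch
```

The two proportions add up to 1 by construction, with staying close to 1/3 and switching close to 2/3.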