Getting Started with Coding in R (Part 2)
In our previous post on Getting Started with Coding in R (Part 1), we covered how to download and install R and RStudio. In this post, we’ll dive right into coding in R. We’ll do that by quickly building up some skills to complete the following: Write a function to count the total number of heads and tails in a series of coin flips.
In the process, we’ll learn about creating variables, basic arithmetic functions, conditional statements, for loops, and writing functions in R.
How to follow along in this tutorial
In the rest of this post, I’ll present snippets of R code followed by the output generated from those snippets. You can follow along by launching the RStudio software you installed in Getting Started with Coding in R (Part 1) and typing each snippet into the Console window. Typing directly into the console is a great way to learn and work through the code on your own.
Making a coding plan
Let’s start by breaking down our task into smaller steps. We’ll need to figure out how to do the following.
- Model a single coin flip in R
- Create variables to count the total number of heads and tails flipped
- Conditionally increase the count for the heads or tails variables depending on whether we flipped a head or a tail on the last coin flip
- Combine these into a function that takes in the number of coin flips and outputs the number of heads and tails flipped
If you’re new to coding in R, this might sound like a lot, so let’s go over each of these steps one by one.
Step 1 - How do you model a coin flip in R?
This is probably the hardest part of this task but it won’t be hard for us because R already has a built-in function for doing this! In fact, R has many built-in functions and packages for implementing popular methods for data analysis and machine learning. This is why R is such a great language to start with when learning and exploring statistics and data science!
The built-in function we’ll use to model a single coin flip is rbinom, called as rbinom(1, 1, 0.5). If you want to know more about the rbinom function, you can read its documentation in the Help tab. To pull up the documentation for rbinom, type the following into the console.
?rbinom
In RStudio, the documentation for a function shows up in the Help tab whenever you type ? followed by the function name into the console.
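If you only want a quick reminder of a function’s inputs without opening the full help page, base R’s args function prints the argument list. This is a small sketch and not part of the original walkthrough.

```r
# Print the argument list of rbinom without opening the Help tab
args(rbinom)

# help("rbinom") opens the same page as typing ?rbinom
```

The printed signature lists n, size, and prob, the same three inputs shown in the Usage section of the help page.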
How to use the rbinom function
The rbinom function simulates a random variable from the binomial distribution. We won’t go into the technical details in this post but it’s still helpful to walk through how to use this function.
Under the Usage section of its documentation, we see rbinom(n, size, prob). This means that the rbinom function takes in three inputs: n, size, and prob. The Arguments section of the documentation provides details on each of these inputs.
We see that n refers to the number of observations. In our context, this is the number of coins we want to flip. Since we want to flip a single coin, we’ll set n to \(1\).
Next, we see that size indicates the number of trials. For us, this is the number of times we want to flip that one coin. For now, let’s say we want to flip that one coin once. So we’ll also set size to \(1\).
Finally, we see that prob refers to the probability of success on each trial. In our context, we could define success as either flipping a head or flipping a tail. This is completely up to us. To keep things simple, we’ll follow convention for now and say that success is defined by flipping a head. Since we want to flip a fair coin (that means that we’re equally likely to flip a head or a tail), we’ll set prob to \(0.5\). That way, we have a \(50\) percent chance of flipping a head and a \(50\) percent chance of flipping a tail.
Using these input arguments, we arrive at rbinom(n=1, size=1, prob=0.5). We can also type this into the console as rbinom(1, 1, 0.5). This works because unnamed inputs are matched in the order that they appear in the Usage section of the function documentation.
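To see the difference between n and size in practice, here is a small sketch. The variable names five_coins and one_coin_five_flips are made up for illustration, and your values will differ because the flips are random.

```r
# Five coins, each flipped once: a vector of five 0/1 outcomes
five_coins <- rbinom(n = 5, size = 1, prob = 0.5)
five_coins

# One coin flipped five times: a single count of heads between 0 and 5
one_coin_five_flips <- rbinom(n = 1, size = 5, prob = 0.5)
one_coin_five_flips
```

In the first call, n controls how many values come back. In the second call, size controls how many flips are summed into each value.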
Trying out the rbinom function
Let’s try it out! What do you get when you type rbinom(1, 1, 0.5) in the console?
rbinom(1, 1, 0.5)
#> [1] 1
What if you type it in again? If you try this several times, do you get different answers?
rbinom(1, 1, 0.5)
#> [1] 0
rbinom(1, 1, 0.5)
#> [1] 1
rbinom(1, 1, 0.5)
#> [1] 1
rbinom(1, 1, 0.5)
#> [1] 1
Since we defined success as flipping a head, an output value of 1 means that we flipped a head and 0 means that we flipped a tail. Additionally, since we’re flipping a fair coin, it doesn’t matter whether we define 1 as flipping a head or a tail. Once we start counting how many heads and tails we’re getting, however, we’ll need to make sure we’re using the same definition each time we count.
If we want to get the same sequence of heads and tails the next time we run rbinom(1, 1, 0.5) several times in a row, we can do that by setting a seed before the first flip.
set.seed(123)
rbinom(1, 1, 0.5)
#> [1] 0
rbinom(1, 1, 0.5)
#> [1] 1
rbinom(1, 1, 0.5)
#> [1] 0
rbinom(1, 1, 0.5)
#> [1] 1
You can read more about setting seeds for generating random numbers in the help documentation with the following.
?set.seed
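One way to convince yourself that a seed makes the flips repeatable is to reset it and compare two runs. This is a sketch with two assumptions of mine: the seed value 123 is arbitrary, and I draw four flips at once with rbinom(4, 1, 0.5) to keep the code short.

```r
# First run: set the seed, then flip four coins
set.seed(123)
first_run <- rbinom(4, 1, 0.5)

# Second run: reset the same seed, then flip four coins again
set.seed(123)
second_run <- rbinom(4, 1, 0.5)

# The two sequences match exactly
identical(first_run, second_run)
#> [1] TRUE
```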
Step 2 - Create variables to keep track of the number of heads and tails flipped
Now that we know how to model a single coin flip in R, we’ll need to know how to create variables to keep track of the number of heads and tails we flip. A variable is just a named object that we use to store information. Since we’re going to be storing count information, our variables will be numeric and we’ll assign number values to them.
Before we start counting, we’ll want both of these variables to be set to 0. We can do this with the following.
heads <- 0
tails <- 0
We use the assignment operator <- to assign values to variables. Now when we call the heads and tails variables, we’ll see that they have value 0.
heads
#> [1] 0
tails
#> [1] 0
We can name our variables almost anything we like. However, in R, variable names cannot begin with numbers. Also, capitalization makes a difference so “Heads” is not the same variable as “heads”. We can test this by typing Heads into the console, which produces an error because no object with that name exists yet.
Numeric variables are objects that hold numbers, so they behave just like numbers. We can add other numbers to them or multiply them by other numbers, and we can also add and multiply them with each other. To illustrate this, let’s make some new variables below.
a <- 100
b <- 5
a + 200
#> [1] 300
a*b
#> [1] 500
a/b
#> [1] 20
c <- a + b
c
#> [1] 105
c - a
#> [1] 5
You can try your own variations by typing them into the console!
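Here are a few more arithmetic operators you could try. This is just a sketch: it reuses the values of a and b from above and introduces the integer division and remainder operators, which are not covered elsewhere in this post.

```r
a <- 100
b <- 5

a - b     # subtraction
#> [1] 95
b^2       # exponentiation
#> [1] 25
a %/% 7   # integer division: how many whole 7s fit into 100
#> [1] 14
a %% 7    # remainder left over after dividing 100 by 7
#> [1] 2
```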
Step 3 - Counting the number of heads and tails
Now we’re ready to start counting the number of heads and tails we flip. To do that, we’ll use conditional statements, which typically follow some variation of the form: If (X happens), then (do Y). Otherwise, (do Z). In R, these statements will look something like this.
if (X) {
  # do Y
} else {
  # do Z
}
Since we’re counting heads and tails, we want to increase the value of the heads variable by \(1\) if we flip a head. Otherwise, we will increase the value of the tails variable by \(1\). So our code will look something like the following.
# Initialize heads and tails variables to 0
heads <- 0
tails <- 0
# Flip a coin once
flip <- rbinom(1, 1, 0.5)
# If we flip a head, increase the heads variable by 1
if (flip == 1) { # <------------------------------- conditional statement
  heads <- heads + 1
# Otherwise, increase the tails variable by 1
} else {
  tails <- tails + 1
}
In the code above, I’ve added comments that describe what each portion of the code does. A comment is any text that follows a #. R ignores comments, so you can use them to leave notes for yourself and for future readers of your code. In RStudio, these comments typically appear in a different color in your editor so you can easily distinguish them from your code.
I’ve also added a new variable flip in the code above. The flip variable stores the outcome of the coin I just flipped. We can look at the new values of the flip, heads, and tails variables below.
flip
#> [1] 1
heads
#> [1] 1
tails
#> [1] 0
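Because the flip is random, you might not see the else branch run on your first try. One way to check it, sketched here, is to set flip by hand instead of calling rbinom. Fixing flip to 0 is my own shortcut for testing, not part of the coin flip model.

```r
# Start the counters at 0
heads <- 0
tails <- 0

# Pretend the coin came up tails instead of flipping it
flip <- 0

if (flip == 1) {
  heads <- heads + 1
} else {
  tails <- tails + 1
}

heads
#> [1] 0
tails
#> [1] 1
```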
Counting the outcome of many coin flips
If we want to model the outcome of multiple coin flips performed one at a time, we can combine our conditional statements with for loops. A for loop contains a set of instructions that get repeated over some sequence of numbers. The following is an example.
start_num <- 1
end_num <- 100
for (index in start_num:end_num) {
  print(index)
}
In the above example, print(index) is repeated over the sequence of numbers beginning with start_num and ending with end_num. We can replace start_num and end_num with other variable names as well as actual values. We can also replace index with other variable names. Some common variable names used to denote indices are i, j, k, and a but we can also use any other variable name we prefer.
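A loop becomes more useful when each pass updates a variable. As a small sketch of that idea, the loop below adds up the numbers from 1 to 10. The variable name total is just an illustrative choice.

```r
# Running total, starting at 0
total <- 0

# Add each number from 1 to 10 to the running total
for (i in 1:10) {
  total <- total + i
}

total
#> [1] 55
```

This is the same pattern we’ll use for counting: start a variable at 0, then update it inside the loop.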
Let’s say we want to model \(1000\) coin flips. Each time we flip our coin, we want to check whether we flipped a head or a tail, and increment the head or tail variable accordingly to keep track of our counts. Below is code to do that in R.
# Variable for number of times we want to flip the coin
num_flips <- 1000
# Initialize heads and tails variables for counting
heads <- 0
tails <- 0
# Repeat everything in the outer brackets num_flips times
for (i in 1:num_flips) { # <------------------------------- for loop
  # Flip the coin
  flip <- rbinom(1, 1, 0.5)
  # Check if we flipped a head or tail, increment accordingly
  if (flip == 1) {
    heads <- heads + 1
  } else {
    tails <- tails + 1
  }
}
Try this and see what you get! What fraction of the flips do you think will be heads? What fraction will be tails? To see the fraction of heads and tails we flipped, we can just type the following into the console.
heads/num_flips
#> [1] 0.493
tails/num_flips
#> [1] 0.507
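As an aside, the same counts can be obtained without a loop by asking rbinom for all the flips at once and summing the result. This is an alternative sketch rather than the approach this post builds on, and the variable name flips is my own.

```r
num_flips <- 1000

# Flip num_flips coins in a single call: a vector of 0s and 1s
flips <- rbinom(num_flips, 1, 0.5)

# Each head is a 1, so the sum of the vector is the number of heads
heads <- sum(flips)
tails <- num_flips - heads

heads / num_flips
tails / num_flips
```

The loop version is still worth writing out because it makes each step explicit.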
Step 4 - Writing a coin flipping function
The final step in this task is to combine everything into a function that takes in the number of coin flips we want, and outputs the number of heads and tails we flipped.
Functions store instructions that depend on a particular set of input values. These are very useful in coding because we can use them to easily re-run a set of instructions for different inputs.
Functions in R look something like the following.
function_name <- function(input) {
  # STUFF THE FUNCTION DOES
  return(output)
}
Each function needs a name, which we use to call it. A function also takes in inputs, carries out some instructions based on those inputs, and returns some outputs.
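To make the template concrete, here is a small sketch of a complete function. The name proportion and its inputs count and total are hypothetical choices for illustration.

```r
# Hypothetical example: convert a count into a proportion of a total
proportion <- function(count, total) {
  output <- count / total
  return(output)
}

# Call the function by name with different inputs
proportion(48, 100)
#> [1] 0.48
proportion(1, 4)
#> [1] 0.25
```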
Let’s take the code we made above for counting the number of heads and tails over \(1000\) coin flips. We’ll let num_flips be our input argument. This means that the user can specify the number of times we want to flip our coin.
For our outputs, let’s say we want to return the number of heads, the number of tails, the proportion of heads, and the proportion of tails. Give this a try before scrolling down to see the answer! What instructions do you think belong in the body of the function?
coinflips <- function(num_flips) {
  # What will you put here?
  return(list(heads=heads, tails=tails, prop_heads=heads/num_flips,
              prop_tails=tails/num_flips))
}
Below is how I’ve written the function. There are often multiple ways to code the same thing so your code might look different and still achieve the same goal!
coinflips <- function(num_flips) {
  # Initialize heads and tails counters
  heads <- 0
  tails <- 0
  # Loop over num_flips
  for (i in 1:num_flips) {
    # Flip the coin
    flip <- rbinom(1, 1, 0.5)
    # Check whether we flipped a head or tail, increment accordingly
    if (flip == 1) {
      heads <- heads + 1
    } else {
      tails <- tails + 1
    }
  }
  return(list(heads=heads, tails=tails, prop_heads=heads/num_flips,
              prop_tails=tails/num_flips))
}
The coinflips function above returns its results as a list with named elements. We can access the different parts of that list as follows.
ex1 <- coinflips(100)
ex1$heads
#> [1] 48
ex1$tails
#> [1] 52
ex1$prop_heads
#> [1] 0.48
ex1$prop_tails
#> [1] 0.52
Let’s model 100 coin flips using our function. What proportion of heads and tails do you get?
coinflips(100)
#> $heads
#> [1] 50
#>
#> $tails
#> [1] 50
#>
#> $prop_heads
#> [1] 0.5
#>
#> $prop_tails
#> [1] 0.5
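One subtlety of looping over 1:num_flips is worth knowing about, although it doesn’t affect any of the examples in this post. If num_flips were 0, the sequence 1:0 would count down from 1 to 0, so the loop body would run twice instead of not at all. Base R’s seq_len function avoids this. The counter names in the sketch below are made up for illustration.

```r
# Count how many times a loop over 1:0 runs
runs_colon <- 0
for (i in 1:0) {
  runs_colon <- runs_colon + 1
}
runs_colon
#> [1] 2

# Count how many times a loop over seq_len(0) runs
runs_seq_len <- 0
for (i in seq_len(0)) {
  runs_seq_len <- runs_seq_len + 1
}
runs_seq_len
#> [1] 0
```

For any num_flips of 1 or more, 1:num_flips and seq_len(num_flips) produce the same sequence.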
How does the proportion of heads vary with the number of coin flips?
Let’s say we want to compare the proportion of heads over num_flips ranging from \(50\) to \(250{,}000\). To do that, we’ll create a vector called nvec containing the different values of num_flips. In R, we can do that with the seq function.
The following code creates a vector named nvec that contains \(10\) values beginning with \(50\) and ending with \(250{,}000\) with equal spacing between the values. In order to get integer values, we round all values to the nearest whole number.
nvec <- round(seq(50,250000, length.out=10))
nvec
#> [1] 50 27822 55594 83367 111139 138911 166683 194456 222228 250000
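If the seq function is new to you, it may help to try it on smaller numbers first. In the sketch below, length.out asks for a fixed number of evenly spaced values, while by asks for a fixed step size.

```r
# Five evenly spaced values from 0 to 1
seq(0, 1, length.out = 5)
#> [1] 0.00 0.25 0.50 0.75 1.00

# Values from 0 to 10 in steps of 5
seq(0, 10, by = 5)
#> [1]  0  5 10
```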
Now we can use the sapply function to apply the coinflips function to each entry in nvec. The first input in sapply is the vector of values. The second input is the function to apply to each element in the vector. For examples of how to use the sapply function, you can type ?sapply into the console.
outcomes <- sapply(nvec, coinflips)
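To see what sapply does on a smaller scale, here is a sketch that applies a simple function to three numbers. The square function is a hypothetical helper written for this example.

```r
# Hypothetical helper: square a number
square <- function(x) {
  return(x^2)
}

# Apply square to each element of the vector
sapply(c(1, 2, 3), square)
#> [1] 1 4 9
```

When the function returns a single number, sapply gives back a vector. When the function returns several named values, as coinflips does, sapply arranges the results in a table with one column per input.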
Let’s look at the results of our experiment. Each column in outcomes below corresponds to the entry in the same position in nvec.
outcomes
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7]
#> heads 23 13740 27643 41838 55533 69570 83186
#> tails 27 14082 27951 41529 55606 69341 83497
#> prop_heads 0.46 0.4938538 0.4972299 0.5018533 0.4996716 0.5008243 0.4990671
#> prop_tails 0.54 0.5061462 0.5027701 0.4981467 0.5003284 0.4991757 0.5009329
#> [,8] [,9] [,10]
#> heads 97028 110755 125200
#> tails 97428 111473 124800
#> prop_heads 0.4989715 0.4983845 0.5008
#> prop_tails 0.5010285 0.5016155 0.4992
We can also plot the number of coin flips contained in nvec against the proportion of heads. Since the proportion of heads is contained in the third row in outcomes, we can access it with outcomes[3,]. This tells R that we want the entries corresponding to the 3rd row of outcomes and all its columns.
prop_heads <- outcomes[3,]
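Rows can also be selected by name, which can be easier to read than a row number. Because each call returns a list, a row selected this way is itself a list, and unlist turns it into a plain numeric vector. The sketch below uses a hypothetical function toy as a stand-in for coinflips so that the numbers are predictable.

```r
# Hypothetical stand-in for coinflips(): returns a named list
toy <- function(n) {
  return(list(half = n / 2, double = n * 2))
}

# One column per input value, one row per named element
res <- sapply(c(2, 4, 6), toy)

# Select a row by name, then flatten it into a numeric vector
unlist(res["double", ])
#> [1]  4  8 12
```

Applied to our experiment, the same idea would read unlist(outcomes["prop_heads", ]).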
We can plot nvec against prop_heads with the plot function. The first input in plot corresponds to the x-axis and the second input corresponds to the y-axis.
plot(nvec, prop_heads, xlab="Number of coin flips", ylab="Proportion of heads",
     pch=18, col="blue")
From this plot, we see that the proportion of heads hovers around \(50\) percent for larger numbers of coin flips. This is what we expect since each call to rbinom(1, 1, 0.5) simulates a fair coin flip.
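A dashed reference line at \(0.5\) can make it easier to see how close each point is to the expected proportion. The sketch below is self-contained, so it recomputes the proportions with a shortcut of my own (the mean of a vector of flips) instead of calling coinflips. The abline function is part of base R graphics.

```r
# Same values of num_flips as before
nvec <- round(seq(50, 250000, length.out = 10))

# Shortcut stand-in for coinflips(): proportion of heads in n flips
prop_heads <- sapply(nvec, function(n) mean(rbinom(n, 1, 0.5)))

plot(nvec, prop_heads, xlab = "Number of coin flips",
     ylab = "Proportion of heads", pch = 18, col = "blue")

# Add a horizontal dashed line at the expected proportion of 0.5
abline(h = 0.5, lty = 2)
```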
Wrap up
Great job! In this post, you learned the basics of coding in R including how to
- read help documentation for functions,
- create numeric variables and vectors,
- model coin flips in R,
- write conditional statements,
- write for-loops,
- create functions,
- apply a function to each element in a vector, and
- make simple plots.
That’s a lot of ground you covered! In future posts, we’ll revisit these skills as we explore new projects and ideas.