PSY 203: Statistics for Psychology: Introduction to R (Lab 02) Exercise Answers

Justin Dainer-Best

This is an R Markdown document. When you execute code within the notebook, the results appear beneath the code. I’ll upload a version of the correct answer after each class.

You can find this R Markdown file on the lab class website.

This is the set of exercises for you to complete after the first lab tutorial. Please read the text and try the code as it follows!

These documents are the second way (after tutorials like the one you’ve already completed) that you will interact with data in RStudio during our lab classes. The “markdown” here is similar to what you may have seen on web forums, on Wikipedia, or elsewhere online. It is pretty straightforward, and mostly just involves basic word processing. Fortunately, you can also find some excellent “cheatsheets” by searching online for an R Markdown cheatsheet, or start here on the RStudio website.

Want to learn more about Markdown? I recommend spending 15 minutes completing this tutorial: https://commonmark.org/help/tutorial/

You’ll also see that wrapping text in asterisks makes it italic (two on each side make it bold) in the knitted output; we’ll talk more about that below. Similarly, when I put text between tick marks like this, it usually means that it’s code!

How to run code in these documents

You can try executing the “chunk” below by clicking the Run button within the chunk, or by placing your cursor inside it and pressing (on a Mac) Cmd+Shift+Enter. (On a PC, try instead pressing Ctrl+Shift+Enter. Many shortcuts in R will use the Command key on a Mac and the Control key on a PC.) If you only want to execute (run) one line at a time, you can always just hit Cmd+Enter (Ctrl+Enter) while the cursor is on that line—or just a selection of text.


# Load the relevant packages
library(tidyverse)

The code that you just ran loads the tidyverse package that we installed last lab. You need to load some packages every time you open RStudio; this is one of them.

When you ran that line, it may have “minimized” the Console window and made this pane larger. That’s fine! It’s still running the code in the Console, though. If you click on the word Console below, you’ll see that it loaded several packages.

Loading a package lets you use the functions (commands) that the package has. Functions like the chain operator %>% or select() that we talked about in the tutorial come from a package called dplyr; that package is loaded by running the above command, but we could also load it explicitly with:


library(dplyr)

You can add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I. (Ctrl+Alt+I on a PC.) Hit enter twice (from the end of this line) and give it a try! Write something like 5+5 in the code chunk you create, and try running it. Make sure the code works!

Knitting

In a code chunk, any code you write will “run” in the console when you run it, or when you Knit the document. Knitting an R Markdown document will create an HTML file (a web page) with all of your code and its output. Try hitting the Knit button (it has a ball of yarn next to it) at the top of this pane. A window should pop up showing the whole document. (You can close it to return here.)

If there’s an error, that’s okay—it might be because your working directory isn’t quite right.

These HTML files contain all of the code and its output, and are saved in the same folder as the .Rmd (R Markdown) file. Again: click the Knit button or press Cmd+Shift+K [Ctrl+Shift+K] to see the HTML file.

When you are asked to submit the results of your work, that HTML file is what you should submit. (But I’ll also always accept the .Rmd version if the code won’t knit.)

A general rule about knitting R Markdown: do it early and often. Much better to find errors right away, rather than figuring them out when you thought you were done.

If you have any questions at this point, ask me or a course assistant.

Importing data

You can import data from the File menu in RStudio. However, most of the time you’ll want to do so using code.

When you ran the line that downloaded this file, it also downloaded a CSV (comma-separated value) file with data about penguins. You can read more about that data here. (That will be a link when you knit this—either knit it and click, or copy and paste the URL into your web browser if you’d like to read it.) The data is from Horst, Hill, & Gorman (2020).

You can use the function read_csv() to, well, read a CSV file. Try running the following code. (How? Scroll back up and read the section “How to run code in these documents”.)


read_csv("penguins.csv")


# A tibble: 344 x 8
   species island bill_length_mm bill_depth_mm flipper_length_…
   <chr>   <chr>           <dbl>         <dbl>            <dbl>
 1 Adelie  Torge…           39.1          18.7              181
 2 Adelie  Torge…           39.5          17.4              186
 3 Adelie  Torge…           40.3          18                195
 4 Adelie  Torge…           NA            NA                 NA
 5 Adelie  Torge…           36.7          19.3              193
 6 Adelie  Torge…           39.3          20.6              190
 7 Adelie  Torge…           38.9          17.8              181
 8 Adelie  Torge…           39.2          19.6              195
 9 Adelie  Torge…           34.1          18.1              193
10 Adelie  Torge…           42            20.2              190
# … with 334 more rows, and 3 more variables: body_mass_g <dbl>,
#   sex <chr>, year <dbl>

It should just work—this is the great thing about having your working directory set! You might see some language about how it’s been “Parsed with column specification”—great! It’s telling you the types (classes) of variables.

Now, you may remember from the tutorial that if R is only printing the data to the screen, it isn’t actually saving it. To save it, we need to assign it to a variable. Run this code:


penguins <- read_csv("penguins.csv")

You should see the variable penguins appear in the Environment pane. (If you don’t see the Environment pane, it’s one of the tabs in the top right!) It’ll tell you that there are 344 observations of 8 variables—i.e., the data frame has 344 rows and 8 columns.
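If you’d rather confirm those numbers with code than read them off the Environment pane, base R’s nrow(), ncol(), and dim() report the same information. The sketch below uses a small made-up data frame called toy (a stand-in, not the penguins data) so that it runs without any file:

```r
# A tiny, made-up data frame (hypothetical stand-in for penguins)
toy <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  bill_length_mm = c(39.1, 46.1, 46.5)
)

nrow(toy)  # number of rows (observations)
ncol(toy)  # number of columns (variables)
dim(toy)   # both at once: rows first, then columns
```

Running dim(penguins) after the import should report 344 and 8, matching what the Environment pane shows.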

(Aside: You don’t have to use code to import data. You could also [a] click on the “Import Dataset” button in the Environment pane, or [b] in the Files pane, click on the filename of the file when you’re in the right folder.)

Exercises

What class is this variable penguins? In the code chunk below, use the class() function to find out! (It should give you more than one answer, including that it’s a “tbl” [table] and a data.frame)


class(penguins)


[1] "spec_tbl_df" "tbl_df"      "tbl"         "data.frame"

Using both the select() function from the dplyr package and the $ operator (which is built into base R), print the column species from penguins. (Do one on the first line and the other on the second.)

Remember that you should use the chain %>% with select(). Feel free to refer back to your notes or the tutorial!


# with select()
penguins %>% select(species)


# A tibble: 344 x 1
   species
   <chr>  
 1 Adelie 
 2 Adelie 
 3 Adelie 
 4 Adelie 
 5 Adelie 
 6 Adelie 
 7 Adelie 
 8 Adelie 
 9 Adelie 
10 Adelie 
# … with 334 more rows


# or:
select(penguins, species)


# A tibble: 344 x 1
   species
   <chr>  
 1 Adelie 
 2 Adelie 
 3 Adelie 
 4 Adelie 
 5 Adelie 
 6 Adelie 
 7 Adelie 
 8 Adelie 
 9 Adelie 
10 Adelie 
# … with 334 more rows


# with the $
penguins$species


  [1] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
  [6] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [11] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [16] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [21] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [26] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [31] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [36] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [41] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [46] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [51] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [56] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [61] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [66] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [71] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [76] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [81] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [86] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [91] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
 [96] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[101] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[106] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[111] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[116] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[121] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[126] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[131] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[136] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[141] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[146] "Adelie"    "Adelie"    "Adelie"    "Adelie"    "Adelie"   
[151] "Adelie"    "Adelie"    "Gentoo"    "Gentoo"    "Gentoo"   
[156] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[161] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[166] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[171] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[176] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[181] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[186] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[191] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[196] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[201] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[206] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[211] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[216] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[221] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[226] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[231] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[236] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[241] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[246] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[251] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[256] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[261] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[266] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[271] "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"    "Gentoo"   
[276] "Gentoo"    "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[281] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[286] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[291] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[296] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[301] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[306] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[311] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[316] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[321] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[326] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[331] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[336] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[341] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
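Notice that the two approaches print differently: select() returns a (one-column) data frame, while $ pulls the column out as a plain vector. Here is a minimal sketch of that difference, using a small made-up tibble named toy rather than the real data, and assuming dplyr is installed (it comes with the tidyverse):

```r
library(dplyr)

# Hypothetical three-row stand-in for penguins
toy <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  year = c(2007, 2008, 2009)
)

with_select <- toy %>% select(species)  # still a data frame, with one column
with_dollar <- toy$species              # a character vector

is.data.frame(with_select)  # TRUE
is.data.frame(with_dollar)  # FALSE
is.character(with_dollar)   # TRUE
```

Which one you want depends on what comes next: functions that expect a data frame work with the select() result, while functions that expect a vector work with the $ result.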

Using the slice() function from dplyr, print the 20th row of penguins.


penguins %>% slice(20)


# A tibble: 1 x 8
  species island bill_length_mm bill_depth_mm flipper_length_…
  <chr>   <chr>           <dbl>         <dbl>            <dbl>
1 Adelie  Torge…             46          21.5              194
# … with 3 more variables: body_mass_g <dbl>, sex <chr>, year <dbl>

Okay, now combine them! Use a chain %>% between the commands. First, select() the column species from penguins and second, slice() the 20th row.


penguins %>% 
  select(species) %>%
  slice(20)


# A tibble: 1 x 1
  species
  <chr>  
1 Adelie

In words, describe what is going on in the previous exercise (#4). (Just type it in the space below.)

We take the data frame “penguins” and select the column “species” from it, and then we slice the 20th row, so we get the value “Adelie” (which is the 20th value in that column).
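One way to see what the chain is doing: %>% hands whatever is on its left to the function on its right as that function’s first argument. So a chained version and a nested version (read from the inside out) should give the same result. A sketch with a made-up tibble named toy, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical three-row stand-in for penguins
toy <- tibble(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  year = c(2007, 2008, 2009)
)

# Chained: read top to bottom
chained <- toy %>%
  select(species) %>%
  slice(2)

# Nested: the same steps, read from the inside out
nested <- slice(select(toy, species), 2)

identical(chained, nested)  # TRUE
```

The chained form is usually easier to read because the steps appear in the order they happen.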

Try knitting the document. Does everything work? If there’s an error, is it one you can address? Try to fix it.

You can also use square brackets [like these] after the name of a variable to subset part of that variable, as you learned in the tutorial. With a data frame, if you put any number inside the brackets, it will select the column that corresponds to that number. Use this method to select the 4th column of penguins in the code chunk below.

Technically, there’s a comma that comes before the 4, inside the brackets. (Nothing comes before the comma.) Try that out—you should get the same result.


penguins[4]


# A tibble: 344 x 1
   bill_depth_mm
           <dbl>
 1          18.7
 2          17.4
 3          18  
 4          NA  
 5          19.3
 6          20.6
 7          17.8
 8          19.6
 9          18.1
10          20.2
# … with 334 more rows


penguins[,4] # same thing


# A tibble: 344 x 1
   bill_depth_mm
           <dbl>
 1          18.7
 2          17.4
 3          18  
 4          NA  
 5          19.3
 6          20.6
 7          17.8
 8          19.6
 9          18.1
10          20.2
# … with 334 more rows


identical(penguins[4], penguins[,4]) # tests if they're identical---TRUE


[1] TRUE
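One caveat: that TRUE depends on penguins being a tibble (which is what read_csv() creates). With a plain base-R data frame, using the comma form on a single column simplifies the result to a vector unless you add drop = FALSE. A short sketch with a made-up data frame named plain_df:

```r
# Hypothetical base-R data frame (not a tibble)
plain_df <- data.frame(
  id = 1:3,
  bill_depth_mm = c(18.7, 17.4, 18.0)
)

is.data.frame(plain_df[2])                  # TRUE: still a data frame
is.data.frame(plain_df[, 2])                # FALSE: simplified to a vector
is.data.frame(plain_df[, 2, drop = FALSE])  # TRUE: drop = FALSE keeps it a data frame
```

Data you import with read_csv() is a tibble, so both forms behave the same way there.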

If you put a comma after the number in the square brackets, it selects that row from a data frame (e.g., dataframe[number,]). Try this with the penguins data: select the fifth row below.


penguins[5,]


# A tibble: 1 x 8
  species island bill_length_mm bill_depth_mm flipper_length_…
  <chr>   <chr>           <dbl>         <dbl>            <dbl>
1 Adelie  Torge…           36.7          19.3              193
# … with 3 more variables: body_mass_g <dbl>, sex <chr>, year <dbl>

Okay, now combine them: select the fifth row of the fourth column in penguins by selecting [5,4] after the variable penguins.


penguins[5,4]


# A tibble: 1 x 1
  bill_depth_mm
          <dbl>
1          19.3

This is actually similar to how Excel thinks of cells: it’s called “row, column” notation.

Next step: Filtering

What about filtering data? Imagine that we were only interested in Chinstrap penguins (not Adelie or Gentoo). Well, we can keep only those rows with the filter() function from the dplyr package.

You can click on the Help pane at the lower right of the RStudio window, or type ?filter in the Console, to find out more about filter. Essentially, it does a logical check for some condition that you provide. What’s that mean?

Well, filter() wants something that results in either TRUE or FALSE—a logical (or Boolean) response. Let’s check that out: just run the following code:


penguins$species == "Chinstrap"


  [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [11] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [21] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [31] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [41] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [51] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [71] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[101] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[131] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[141] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[151] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[171] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[191] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[201] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[211] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[231] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[251] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[261] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[271] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE
[281]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[291]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[301]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[311]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[321]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[331]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
[341]  TRUE  TRUE  TRUE  TRUE

What you’ll get is a series of TRUE and FALSE values, as R tests every one of the 344 rows of the species column against the name “Chinstrap”. Looks like the Chinstrap penguins are the last of the set.

The double equal sign can be read as “is this equal?” or a test for equivalence. If we put that inside of a filter() function, we’ll get only the rows where it’s true.
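Because TRUE counts as 1 and FALSE as 0, you can also add up a logical vector to count how many values passed the test. A small sketch with a made-up vector (the names here are only for illustration):

```r
# Hypothetical vector of species names
species <- c("Adelie", "Adelie", "Gentoo", "Chinstrap", "Chinstrap")

is_chinstrap <- species == "Chinstrap"  # one TRUE/FALSE per element
is_chinstrap
sum(is_chinstrap)  # how many are TRUE
```

On the real data, sum(penguins$species == "Chinstrap") should match the number of rows that filter() keeps for the same test.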

So: filter() works like select() or slice(). Name the variable, use the %>% chain, and then use the filter() function with this code inside the parentheses: species == "Chinstrap". Give that a try below. (Remember not to include the tick marks ` themselves in the code you write.)


penguins %>%
  filter(species == "Chinstrap")


# A tibble: 68 x 8
   species island bill_length_mm bill_depth_mm flipper_length_…
   <chr>   <chr>           <dbl>         <dbl>            <dbl>
 1 Chinst… Dream            46.5          17.9              192
 2 Chinst… Dream            50            19.5              196
 3 Chinst… Dream            51.3          19.2              193
 4 Chinst… Dream            45.4          18.7              188
 5 Chinst… Dream            52.7          19.8              197
 6 Chinst… Dream            45.2          17.8              198
 7 Chinst… Dream            46.1          18.2              178
 8 Chinst… Dream            51.3          18.2              197
 9 Chinst… Dream            46            18.9              195
10 Chinst… Dream            51.3          19.9              198
# … with 58 more rows, and 3 more variables: body_mass_g <dbl>,
#   sex <chr>, year <dbl>

You should get 68 rows—all of which are of the Chinstrap species.

You’ll note that I didn’t need to specify the name of the data frame again inside of the filter() function. In fact, you should not name the data frame again.

Last two pieces: use filter() again to only select rows from penguins where the bill length in millimeters (column name: bill_length_mm) is more than 40.


penguins %>%
  filter(bill_length_mm > 40)


# A tibble: 242 x 8
   species island bill_length_mm bill_depth_mm flipper_length_…
   <chr>   <chr>           <dbl>         <dbl>            <dbl>
 1 Adelie  Torge…           40.3          18                195
 2 Adelie  Torge…           42            20.2              190
 3 Adelie  Torge…           41.1          17.6              182
 4 Adelie  Torge…           42.5          20.7              197
 5 Adelie  Torge…           46            21.5              194
 6 Adelie  Biscoe           40.6          18.6              183
 7 Adelie  Biscoe           40.5          17.9              187
 8 Adelie  Biscoe           40.5          18.9              180
 9 Adelie  Dream            40.9          18.9              184
10 Adelie  Dream            42.2          18.5              180
# … with 232 more rows, and 3 more variables: body_mass_g <dbl>,
#   sex <chr>, year <dbl>

Then, copy that code below—and after the 40, add a comma, and a second thing to filter by. You want to filter() only rows from penguins where (again) the bill length in millimeters (column name: bill_length_mm) is more than 40, and where the species is Adelie.


penguins %>%
  filter(bill_length_mm > 40, 
         species == "Adelie")


# A tibble: 51 x 8
   species island bill_length_mm bill_depth_mm flipper_length_…
   <chr>   <chr>           <dbl>         <dbl>            <dbl>
 1 Adelie  Torge…           40.3          18                195
 2 Adelie  Torge…           42            20.2              190
 3 Adelie  Torge…           41.1          17.6              182
 4 Adelie  Torge…           42.5          20.7              197
 5 Adelie  Torge…           46            21.5              194
 6 Adelie  Biscoe           40.6          18.6              183
 7 Adelie  Biscoe           40.5          17.9              187
 8 Adelie  Biscoe           40.5          18.9              180
 9 Adelie  Dream            40.9          18.9              184
10 Adelie  Dream            42.2          18.5              180
# … with 41 more rows, and 3 more variables: body_mass_g <dbl>,
#   sex <chr>, year <dbl>
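Two details are worth seeing in miniature. Separating conditions with a comma inside filter() behaves like joining them with & (both must be TRUE), and rows where a condition comes out as NA (like the penguin with a missing bill length) are dropped rather than kept. A sketch with a made-up tibble named toy, assuming dplyr is installed:

```r
library(dplyr)

# Hypothetical four-row stand-in for penguins, including one missing value
toy <- tibble(
  species = c("Adelie", "Adelie", "Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 42.0, NA, 46.1)
)

with_comma <- toy %>% filter(bill_length_mm > 40, species == "Adelie")
with_and   <- toy %>% filter(bill_length_mm > 40 & species == "Adelie")

identical(with_comma, with_and)  # TRUE
nrow(with_comma)                 # 1: only the 42.0 Adelie row passes both tests
```

The 39.1 row fails the length test, the Gentoo row fails the species test, and the NA row is dropped because NA > 40 is neither TRUE nor FALSE.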

When that’s all done, Knit the document again. If there are errors, try to solve them. And then upload the HTML file to Brightspace under Lab Exercises: Practicing with R (this link should get you there). I’ll note again: if the file doesn’t knit and you can’t get it to work, that’s okay! You can upload this .Rmd file.

Have feedback for me? Send it to me at the same exit survey link—select Lab 2.
