Completed exercises for the second lab
This is an R Markdown document. When you execute code within the notebook, the results appear beneath the code. I’ll upload a version of the correct answer after each class.
You can find this R Markdown file on the lab class website.
This is the set of exercises for you to complete after the first lab tutorial. Please read the text and try the code as it follows!
These documents are the second way (after tutorials like the one you’ve already completed) that you will interact with data in RStudio during our lab classes. The “markdown” here is similar to what you may have seen on web forums, on Wikipedia, or elsewhere online. It is pretty straightforward, and mostly just involves basic word processing. Fortunately, you can also find some excellent “cheatsheets” by searching online for an R Markdown cheatsheet, or start here on the RStudio website.
Want to learn more about Markdown? I recommend spending 15 minutes completing this tutorial: https://commonmark.org/help/tutorial/
You’ll also see that things like asterisks make text italic (two make it bold) in the knitted output, but we’ll talk more about that below. Similarly, when I put text between tick marks `like this`, it usually means that it’s code!
You can try executing the “chunk” below by clicking the Run button within the chunk, or by placing your cursor inside it and pressing (on a Mac) Cmd+Shift+Enter. (On a PC, try instead pressing Ctrl+Shift+Enter. Many shortcuts in R will use the Command key on a Mac and the Control key on a PC.) If you only want to execute (run) one line at a time, you can always just hit Cmd+Enter (Ctrl+Enter) while the cursor is on that line—or just a selection of text.
# Load the relevant packages
library(tidyverse)
The code that you just ran loads the tidyverse package that we installed last lab. You need to load some packages every time you open RStudio; this is one of them.
When you ran that line, it may have “minimized” the Console window and made this pane larger. That’s fine! It’s still running the code in the Console, though. If you click on the word Console below, you’ll see that it loaded several packages.
Loading a package lets you use the functions (commands) that the package has. Functions like the chain operator %>% or select() that we talked about in the tutorial come from a package called dplyr; that package is loaded by running the above command, but we could also load it explicitly with:
library(dplyr)
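Here is a small sketch of what "loading" changes. It assumes dplyr is already installed (it comes with the tidyverse), and `toy` is a made-up data frame used only for illustration.

```r
# Attach dplyr so its functions can be called by bare name.
# (Assumes dplyr is already installed, e.g. as part of the tidyverse.)
library(dplyr)

# After library(), the package appears on R's search path.
dplyr_attached <- "package:dplyr" %in% search()
dplyr_attached

# A function can also be called with an explicit package prefix.
# `toy` is a made-up data frame, not part of the lab data.
toy <- data.frame(id = 1:3, score = c(10, 20, 30))
picked <- dplyr::select(toy, score)
picked
```

The `package::function()` form works even when the package has not been attached with library(), which can be handy when two packages have functions with the same name.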
You can add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I. (Ctrl+Alt+I on a PC.) Hit enter twice (from the end of this line) and give it a try! Write something like 5+5 in the code chunk you create, and try running it. Make sure the code works!
In a code chunk, any code you write will “run” in the console when you run it, or when you Knit the document. Knitting an R Markdown document will create an HTML file (a website) with all of your code. Try hitting the Knit button (it has a ball of yarn next to it) at the top of this pane. You should get a browser that pops up with the whole document. (You can close it to return here.)
If there’s an error, that’s okay—it might be because your working directory isn’t quite right.
These HTML files contain all of the code and its output, and are saved in the same folder as the .Rmd (R Markdown) file. Again: click the Knit button or press Cmd+Shift+K [Ctrl+Shift+K] to see the HTML file.
When you are asked to submit the results of your work, that HTML file is what you should submit. (But I’ll also always accept the .Rmd version if the code won’t knit.)
A general rule about knitting R Markdown: do it early and often. Much better to find errors right away, rather than figuring them out when you thought you were done.
If you have any questions at this point, ask me or a course assistant.
You can import data from the File menu in RStudio. However, most of the time you’ll want to do so using code.
When you ran the line that downloaded this file, it also downloaded a CSV (comma-separated value) file with data about penguins. You can read more about that data here. (That will be a link when you knit this—either knit it and click, or copy and paste the URL into your web browser if you’d like to read it.) The data is from Horst, Hill, & Gorman (2020).
You can use the function read_csv() to, well, read a CSV file. Try running the following code. (How? Scroll back up and read the section “How to run code in these documents”.)
read_csv("penguins.csv")
# A tibble: 344 x 8
species island bill_length_mm bill_depth_mm flipper_length_…
<chr> <chr> <dbl> <dbl> <dbl>
1 Adelie Torge… 39.1 18.7 181
2 Adelie Torge… 39.5 17.4 186
3 Adelie Torge… 40.3 18 195
4 Adelie Torge… NA NA NA
5 Adelie Torge… 36.7 19.3 193
6 Adelie Torge… 39.3 20.6 190
7 Adelie Torge… 38.9 17.8 181
8 Adelie Torge… 39.2 19.6 195
9 Adelie Torge… 34.1 18.1 193
10 Adelie Torge… 42 20.2 190
# … with 334 more rows, and 3 more variables: body_mass_g <dbl>,
# sex <chr>, year <dbl>
It should just work—this is the great thing about having your working directory set! You might see some language about how it’s been “Parsed with column specification”—great! It’s telling you the types (classes) of variables.
Now, you may remember from the tutorial, that if it’s printing to the screen, it’s not actually saving this data. To do that, we need to assign it to a variable. Run this code:
penguins <- read_csv("penguins.csv")
You should see the variable penguins appear in the Environment pane. (If you don’t see the Environment pane, it’s one of the tabs in the top right!) It’ll tell you that there are 344 observations of 8 variables, i.e., the data frame has 344 rows and 8 columns.
(Aside: You don’t have to use code to import data. You could also [a] click on the “Import Dataset” button in the Environment pane, or [b] in the Files pane, click on the filename of the file when you’re in the right folder.)
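If you'd like to see the difference between printing and saving without touching the real data, here is a minimal sketch. The file and its three rows are made up for illustration, and `toy_path` and `toy_penguins` are hypothetical names.

```r
library(readr)

# Write a tiny, made-up CSV to a temporary file so the example is self-contained.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("species,bill_length_mm",
             "Adelie,39.1",
             "Gentoo,46.1",
             "Chinstrap,46.5"),
           toy_path)

# Without assignment: the data is printed, but nothing is saved.
read_csv(toy_path)

# With assignment: the data is stored in a variable we can reuse.
toy_penguins <- read_csv(toy_path)

# dim() reports the number of rows, then the number of columns
# (what the Environment pane calls observations and variables).
dim(toy_penguins)
```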
What class of variable is penguins? In the code chunk below, use the class() function to find out! (It should give you more than one answer, including that it’s a “tbl” [table] and a data.frame.)
class(penguins)
[1] "spec_tbl_df" "tbl_df" "tbl" "data.frame"
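An object in R can have several classes at once, which is why class() gives more than one answer here. The sketch below compares a base R data frame with a tibble, using made-up values; `base_df` and `tidy_df` are hypothetical names. (The extra "spec_tbl_df" class in the output above is added by read_csv().)

```r
library(tibble)

# Two made-up tables with the same contents, built two different ways.
base_df <- data.frame(species = c("Adelie", "Gentoo"), year = c(2007, 2008))
tidy_df <- tibble(species = c("Adelie", "Gentoo"), year = c(2007, 2008))

class(base_df)   # a single class: "data.frame"
class(tidy_df)   # several classes, ending in "data.frame"

# inherits() asks whether an object has a given class anywhere in that list.
inherits(base_df, "data.frame")
inherits(tidy_df, "data.frame")
inherits(base_df, "tbl_df")
```

Because a tibble is also a data.frame, functions written for data frames will generally accept it.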
Using both the select() function from the dplyr package and the $ operator (which is built into base R), print the column species from penguins. (Do one on the first line and the other on the second.) Remember that you should use the chain %>% with select(). Feel free to refer back to your notes or the tutorial!
# with select()
penguins %>% select(species)
# A tibble: 344 x 1
species
<chr>
1 Adelie
2 Adelie
3 Adelie
4 Adelie
5 Adelie
6 Adelie
7 Adelie
8 Adelie
9 Adelie
10 Adelie
# … with 334 more rows
# or:
select(penguins, species)
# A tibble: 344 x 1
species
<chr>
1 Adelie
2 Adelie
3 Adelie
4 Adelie
5 Adelie
6 Adelie
7 Adelie
8 Adelie
9 Adelie
10 Adelie
# … with 334 more rows
# with the $
penguins$species
[1] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[6] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[11] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[16] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[21] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[26] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[31] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[36] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[41] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[46] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[51] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[56] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[61] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[66] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[71] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[76] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[81] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[86] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[91] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[96] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[101] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[106] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[111] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[116] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[121] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[126] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[131] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[136] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[141] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[146] "Adelie" "Adelie" "Adelie" "Adelie" "Adelie"
[151] "Adelie" "Adelie" "Gentoo" "Gentoo" "Gentoo"
[156] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[161] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[166] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[171] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[176] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[181] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[186] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[191] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[196] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[201] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[206] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[211] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[216] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[221] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[226] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[231] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[236] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[241] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[246] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[251] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[256] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[261] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[266] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[271] "Gentoo" "Gentoo" "Gentoo" "Gentoo" "Gentoo"
[276] "Gentoo" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[281] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[286] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[291] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[296] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[301] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[306] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[311] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[316] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[321] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[326] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[331] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[336] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
[341] "Chinstrap" "Chinstrap" "Chinstrap" "Chinstrap"
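The two outputs above look different for a reason: select() gives back a table with one column, while $ gives back the column's values as a plain vector. A minimal sketch with a made-up three-row table (`toy` is a hypothetical name):

```r
library(dplyr)

# A made-up table standing in for penguins.
toy <- tibble(species = c("Adelie", "Gentoo", "Chinstrap"),
              year    = c(2007, 2008, 2009))

as_table  <- toy %>% select(species)  # still a table, with one column
as_vector <- toy$species              # a plain character vector

is.data.frame(as_table)
is.data.frame(as_vector)
is.character(as_vector)
```

Which one you want depends on what comes next: more dplyr steps usually expect a table, while many base R functions expect a vector.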
Using the slice() function from dplyr, print the 20th row of penguins.
penguins %>% slice(20)
# A tibble: 1 x 8
species island bill_length_mm bill_depth_mm flipper_length_…
<chr> <chr> <dbl> <dbl> <dbl>
1 Adelie Torge… 46 21.5 194
# … with 3 more variables: body_mass_g <dbl>, sex <chr>, year <dbl>
Now combine the two steps, using the chain %>% between the commands. First, select() the column species from penguins and second, slice() the 20th row.
penguins %>%
  select(species) %>%
  slice(20)
# A tibble: 1 x 1
species
<chr>
1 Adelie
We take the data frame “penguins” and select the column “species” from it, and then we slice the 20th row, so we get the value “Adelie” (which is the 20th value in that column).
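One way to think about the chain: each %>% hands the result on its left to the function on its right as that function's first argument. The sketch below writes the same two steps both ways, on a made-up table (`toy` is a hypothetical name), and checks that they agree.

```r
library(dplyr)

# A made-up table standing in for penguins.
toy <- tibble(species = c("Adelie", "Gentoo", "Chinstrap"),
              year    = c(2007, 2008, 2009))

# Chained: read top to bottom as "take toy, then select, then slice".
chained <- toy %>%
  select(species) %>%
  slice(2)

# Nested: the same calls, written inside out.
nested <- slice(select(toy, species), 2)

identical(chained, nested)
chained$species
```

The chained version tends to be easier to read once there are more than two steps.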
Try knitting the document. Does everything work? If there’s an error, is it one you can address? Try to fix it.
You can also use square brackets [like these] after the name of a variable to subset part of that variable, as you learned in the tutorial. With a data frame, if you put any number inside the brackets, it will select the column that corresponds to that number. Use this method to select the 4th column of penguins in the code chunk below.
Technically, there’s a comma that comes before the 4, inside the brackets. (Nothing comes before the comma.) Try that out—you should get the same result.
penguins[4]
# A tibble: 344 x 1
bill_depth_mm
<dbl>
1 18.7
2 17.4
3 18
4 NA
5 19.3
6 20.6
7 17.8
8 19.6
9 18.1
10 20.2
# … with 334 more rows
penguins[,4] # same thing
# A tibble: 344 x 1
bill_depth_mm
<dbl>
1 18.7
2 17.4
3 18
4 NA
5 19.3
6 20.6
7 17.8
8 19.6
9 18.1
10 20.2
# … with 334 more rows
identical(penguins[4], penguins[,4]) # tests if they're identical---TRUE
[1] TRUE
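A caution worth knowing: the two forms are identical here because penguins is a tibble. With a plain base R data frame, the comma form on a single column simplifies the result to a vector. A minimal sketch with made-up values (`base_df` and `tidy_df` are hypothetical names):

```r
library(tibble)

# The same made-up data as a base data frame and as a tibble.
base_df <- data.frame(a = 1:3, b = c(2.5, 3.5, 4.5))
tidy_df <- as_tibble(base_df)

# For a tibble, both forms return a one-column table.
identical(tidy_df[2], tidy_df[, 2])

# For a base data frame, the comma form simplifies to a plain vector.
is.data.frame(base_df[2])
is.data.frame(base_df[, 2])

# drop = FALSE asks base R to keep the table structure.
is.data.frame(base_df[, 2, drop = FALSE])
```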
You can select a row in the same way, by putting the number before the comma (i.e., dataframe[number,]). Try this with the penguins data: select the fifth row below.
penguins[5,]
# A tibble: 1 x 8
species island bill_length_mm bill_depth_mm flipper_length_…
<chr> <chr> <dbl> <dbl> <dbl>
1 Adelie Torge… 36.7 19.3 193
# … with 3 more variables: body_mass_g <dbl>, sex <chr>, year <dbl>
Now select a single value from penguins, the one in the fifth row and fourth column, by selecting [5,4] after the variable penguins.
penguins[5,4]
# A tibble: 1 x 1
bill_depth_mm
<dbl>
1 19.3
This is actually similar to how Excel thinks of cells: it’s called “row, column” notation.
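Notice that [5,4] on a tibble still returns a small 1 x 1 table rather than a bare number. If you want just the value, one option is to pull the column out first and then index into it. A sketch with made-up values (`toy` is a hypothetical name):

```r
library(tibble)

# A made-up table; the numbers are for illustration only.
toy <- tibble(species = c("Adelie", "Gentoo", "Chinstrap"),
              bill_depth_mm = c(18.7, 15.0, 17.9))

cell  <- toy[2, 2]              # row 2, column 2: still a 1 x 1 table
value <- toy$bill_depth_mm[2]   # the bare number from the same cell

dim(cell)
value
```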
Next, we’ll use the filter() function from the dplyr package. You can click on the Help pane at the lower right of the RStudio window, or type ?filter in the Console, to find out more about filter(). Essentially, it does a logical check for some condition that you provide. What does that mean?
Well, filter() wants something that results in either TRUE or FALSE, a logical (or Boolean) response. Let’s check that out: just run the following code:
penguins$species == "Chinstrap"
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[11] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[21] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[31] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[41] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[51] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[71] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[91] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[101] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[111] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[131] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[141] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[151] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[171] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[191] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[201] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[211] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[221] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[231] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[251] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[261] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[271] FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
[281] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[291] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[301] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[311] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[321] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[331] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
[341] TRUE TRUE TRUE TRUE
What you’ll get is a series of TRUE and FALSE values, as R tests every one of the 344 rows of the species column against the name “Chinstrap”. Looks like the Chinstrap penguins are the last of the set.
The double equal sign can be read as “is this equal?” or a test for equivalence. If we put that inside of a filter() function, we’ll get only the rows where it’s true.
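To make the TRUE/FALSE idea concrete, here is a small base R sketch on a made-up vector of five species names. The names `species`, `is_chinstrap`, and `toy` are hypothetical.

```r
# A made-up species column, standing in for penguins$species.
species <- c("Adelie", "Gentoo", "Chinstrap", "Chinstrap", "Adelie")

# == compares every element and returns one TRUE/FALSE per element.
is_chinstrap <- species == "Chinstrap"
is_chinstrap

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches.
sum(is_chinstrap)

# Base R can use the same logical vector to keep only the matching rows.
toy <- data.frame(species = species, id = 1:5)
toy[is_chinstrap, ]
```

Putting the logical vector before the comma keeps the rows where it is TRUE, which is essentially the job filter() does.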
So: filter() works like select() or slice(). Name the variable, use the %>% chain, and then use the filter() function with this code inside the parentheses: species == "Chinstrap". Give that a try below. (Remember not to include the tick marks ` themselves in the code you write.)
penguins %>%
  filter(species == "Chinstrap")
# A tibble: 68 x 8
species island bill_length_mm bill_depth_mm flipper_length_…
<chr> <chr> <dbl> <dbl> <dbl>
1 Chinst… Dream 46.5 17.9 192
2 Chinst… Dream 50 19.5 196
3 Chinst… Dream 51.3 19.2 193
4 Chinst… Dream 45.4 18.7 188
5 Chinst… Dream 52.7 19.8 197
6 Chinst… Dream 45.2 17.8 198
7 Chinst… Dream 46.1 18.2 178
8 Chinst… Dream 51.3 18.2 197
9 Chinst… Dream 46 18.9 195
10 Chinst… Dream 51.3 19.9 198
# … with 58 more rows, and 3 more variables: body_mass_g <dbl>,
# sex <chr>, year <dbl>
You should get 68 rows—all of which are of the Chinstrap species.
You’ll note that I didn’t need to specify the name of the data frame again inside of the filter() function. In fact, you should not name the data frame again.
Use filter() again to select only rows from penguins where the bill length in millimeters (column name: bill_length_mm) is more than 40.
penguins %>%
  filter(bill_length_mm > 40)
# A tibble: 242 x 8
species island bill_length_mm bill_depth_mm flipper_length_…
<chr> <chr> <dbl> <dbl> <dbl>
1 Adelie Torge… 40.3 18 195
2 Adelie Torge… 42 20.2 190
3 Adelie Torge… 41.1 17.6 182
4 Adelie Torge… 42.5 20.7 197
5 Adelie Torge… 46 21.5 194
6 Adelie Biscoe 40.6 18.6 183
7 Adelie Biscoe 40.5 17.9 187
8 Adelie Biscoe 40.5 18.9 180
9 Adelie Dream 40.9 18.9 184
10 Adelie Dream 42.2 18.5 180
# … with 232 more rows, and 3 more variables: body_mass_g <dbl>,
# sex <chr>, year <dbl>
Then, copy that code below, and after the 40, add a comma and a second thing to filter by. You want to filter() only rows from penguins where (again) the bill length in millimeters (column name: bill_length_mm) is more than 40, and where the species is Adelie.
penguins %>%
  filter(bill_length_mm > 40,
         species == "Adelie")
# A tibble: 51 x 8
species island bill_length_mm bill_depth_mm flipper_length_…
<chr> <chr> <dbl> <dbl> <dbl>
1 Adelie Torge… 40.3 18 195
2 Adelie Torge… 42 20.2 190
3 Adelie Torge… 41.1 17.6 182
4 Adelie Torge… 42.5 20.7 197
5 Adelie Torge… 46 21.5 194
6 Adelie Biscoe 40.6 18.6 183
7 Adelie Biscoe 40.5 17.9 187
8 Adelie Biscoe 40.5 18.9 180
9 Adelie Dream 40.9 18.9 184
10 Adelie Dream 42.2 18.5 180
# … with 41 more rows, and 3 more variables: body_mass_g <dbl>,
# sex <chr>, year <dbl>
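Inside filter(), a comma between conditions means "and": a row is kept only if every condition is TRUE. The sketch below checks this against the & operator on a made-up table (`toy` is a hypothetical name), and includes one missing value to show that rows where the test comes out as NA are dropped too.

```r
library(dplyr)

# A made-up table with one missing bill length.
toy <- tibble(species = c("Adelie", "Adelie", "Gentoo", "Adelie"),
              bill_length_mm = c(39.1, 42.0, 46.1, NA))

# Two conditions separated by a comma...
with_comma <- toy %>% filter(bill_length_mm > 40, species == "Adelie")

# ...and the same two conditions joined with & ("and").
with_and <- toy %>% filter(bill_length_mm > 40 & species == "Adelie")

identical(with_comma, with_and)
nrow(with_comma)
```

Only the second row survives: the first is too short, the third is the wrong species, and the fourth has no bill length to compare.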
Have feedback for me? Send it to me at the same exit survey link—select Lab 2.
For attribution, please cite this work as
Dainer-Best (2020, Sept. 10). PSY 203: Statistics for Psychology: Introduction to R (Lab 02) Exercise Answers. Retrieved from https://faculty.bard.edu/jdainerbest/psy-203/posts/2020-09-11-lab-2-answers/
BibTeX citation
@misc{dainer-best2020introduction,
  author = {Dainer-Best, Justin},
  title = {PSY 203: Statistics for Psychology: Introduction to R (Lab 02) Exercise Answers},
  url = {https://faculty.bard.edu/jdainerbest/psy-203/posts/2020-09-11-lab-2-answers/},
  year = {2020}
}