Completed exercises for the tenth lab
This document is meant to be used for practice after you have completed the tutorial for today’s lab. Make sure to put your name as the author of the document, above!
If you intend to work on these exercises while referring to the tutorial, there are instructions on the wiki on how to do so. You may also want to refer to past labs. Don’t forget that previous labs are linked to on the main labs website.
Important reminder: as with every time you open RStudio, don’t forget to load the libraries, below.
In the tutorial, we learned about using lm() and summary() for regressions, and cor() and cor.test() for correlations. You’ll use those functions, along with ggplot2, to plot the predictions data and make further sense of it, including adding regression lines. You’ll also practice (briefly) filter() and a few other functions to clean up the data as provided.
As always, a version of these exercises with my answers will be posted at the end of the week to the lab website: https://faculty.bard.edu/~jdainerbest/psy-203/labslist.html
Don’t forget to (a) save and (b) knit the document frequently, so you’ll keep track of your work and also know where you run into errors.
As always, you must load packages if you intend to use their functions. Run the following code chunk to load necessary packages for these exercises.
library(tidyverse)
As discussed in the tutorial, we’re using data from Beall, Hofer, & Schaller (2016).
Beall, A. T., Hofer, M. K., & Schaller, M. (2016). Infections and elections: Did an Ebola outbreak influence the 2014 U.S. federal elections (and if so, how)? Psychological Science, 27, 595-605. https://doi.org/10.1177/0956797616628861
Make sure you read the description of the study in the tutorial—it’s important for thinking about what we’re doing in these exercises.
In the tutorial, we used a “cleaned-up” version of the data. (It’s in a folder called /www.) But let’s actually use the raw data here: that file is called beall_untidy.csv and should be in the same folder as this document.
The data was downloaded with this file. Load it using the read_csv() command—probably with the code below:
predictions <- read_csv("beall_untidy.csv")
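If you want a quick look at what needs cleaning before you start, an optional check (nothing in the exercises requires this) might be:
# Optional: peek at the raw data to see what will need cleaning
glimpse(predictions)   # column names and types
head(predictions)      # first few rows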
For the questions below, create your own code chunks and insert all code into them.
1. Clean up the data using the filter() function: remove the rows that have missing (NA) values in the Date and Month columns. Then remove the column DJIA, with either select() (putting a - in front of the name will remove it) or by assigning predictions$DJIA to the value NULL.
predictions <- tibble(predictions) %>%
  filter(!is.na(Month)) %>%   # drop rows with no Month recorded
  select(-DJIA)               # drop the DJIA column
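The exercise also mentions assigning predictions$DJIA to NULL. A sketch of that alternative, done on a copy so the cleaned predictions above is left alone (the name predictions_alt is just for illustration):
# Alternative sketch: drop the DJIA column by assigning it NULL
predictions_alt <- read_csv("beall_untidy.csv") %>%
  filter(!is.na(Month))
predictions_alt$DJIA <- NULL   # removes the column in place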
2. Run a correlation using cor.test() and your predictions data. (You’ll use the columns Ebola.Search.Volume.Index and LexisNexisNewsVolumeWeek.) Then, briefly report the correlation. Is it significant?
cor.test(predictions$Ebola.Search.Volume.Index, predictions$LexisNexisNewsVolumeWeek)
Pearson's product-moment correlation
data: predictions$Ebola.Search.Volume.Index and predictions$LexisNexisNewsVolumeWeek
t = 11.759, df = 63, p-value < 2.2e-16
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.7331684 0.8923563
sample estimates:
cor
0.8288528
There was a significant positive relationship between the Ebola-search-volume index and the LexisNexis news-volume index, \(r(63) = .83\), 95% CI \([.73, .89]\), \(p < .001\).
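If you’d rather pull those numbers out of the test than copy them by hand, one optional approach (the name ct is just illustrative) is to save the result and extract its pieces:
# Optional: save the test object and extract values for reporting
ct <- cor.test(predictions$Ebola.Search.Volume.Index,
               predictions$LexisNexisNewsVolumeWeek)
ct$estimate   # the correlation, r
ct$conf.int   # the 95% confidence interval
ct$p.value    # the p-value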
3. Plot the correlation from #2 using ggplot() + geom_point(). Add a theme and label the axes. Add a regression line using geom_smooth() or geom_abline() (you’ll get the data for that line in the next question).
ggplot(predictions, aes(x = Ebola.Search.Volume.Index,
y = LexisNexisNewsVolumeWeek)) +
geom_point() +
theme_classic() +
geom_smooth(method = "lm", se = FALSE, formula = "y ~ x") +
labs(x = "Ebola-search-volume index", y = "LexisNexis index")
4. Use the lm() function to create a regression model of the same relationship. Then use summary() to get the results. Report them succinctly below. Also report what parallels exist between the numbers from this regression and the correlation.
model <- lm(Ebola.Search.Volume.Index ~ LexisNexisNewsVolumeWeek,
data = predictions)
summary(model)
Call:
lm(formula = Ebola.Search.Volume.Index ~ LexisNexisNewsVolumeWeek,
data = predictions)
Residuals:
Min 1Q Median 3Q Max
-20.615 -7.050 -1.244 9.823 24.349
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.91116 2.73372 -0.699 0.487
LexisNexisNewsVolumeWeek 0.15516 0.01319 11.759 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.88 on 63 degrees of freedom
Multiple R-squared: 0.687, Adjusted R-squared: 0.682
F-statistic: 138.3 on 1 and 63 DF, p-value: < 2.2e-16
There was a statistically significant relationship between the two indexes, \(b = 0.16\), \(p < .001\), with an \(R^2\) of .69. The parallels with the correlation: \(R^2\) is the square of the Pearson correlation (\(.83^2 \approx .69\)), and the \(t\)- and \(p\)-values for the slope are identical to those from cor.test().
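A quick, optional way to see that parallel in the numbers themselves (this assumes the model and predictions objects created above):
# The square root of R-squared should match the Pearson r reported above
sqrt(summary(model)$r.squared)
# Compare with cor(); use = "complete.obs" drops any remaining missing pairs
cor(predictions$Ebola.Search.Volume.Index,
    predictions$LexisNexisNewsVolumeWeek,
    use = "complete.obs")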
5. Use filter() to select only the scores from the two-week period including the last week of September and the first week of October. You could look at the Month and Date columns… but the third column might be more helpful. Don’t forget to assign this to a new data frame so we can use it.
highanxtime <- filter(predictions, Two.weeks.prior.to.outbreak.only == 1)
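As an optional sanity check (not required by the exercise), you could confirm how much data survived the filter:
# Optional: how many rows (days) remain after filtering to the two-week window?
nrow(highanxtime)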
6. On the full dataset, run the correlation analyses we did in the tutorial, for the association between Ebola search volume index and voter intention index.
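A sketch of that call on the full cleaned dataset (using the Voter.Intention.Index column, as in the filtered analysis below) might look like:
# Sketch: the same kind of cor.test(), now between Ebola search volume and
# voter intention, on the full cleaned dataset
cor.test(predictions$Ebola.Search.Volume.Index,
         predictions$Voter.Intention.Index)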
7. With the filtered data from #5, re-run the correlation analyses for the association between Ebola search volume index and voter intention index. Is the correlation higher or lower?
cor.test(highanxtime$Voter.Intention.Index,
highanxtime$Ebola.Search.Volume.Index)
Pearson's product-moment correlation
data: highanxtime$Voter.Intention.Index and highanxtime$Ebola.Search.Volume.Index
t = 15.975, df = 6, p-value = 3.821e-06
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.9351079 0.9979890
sample estimates:
cor
0.9884478
It’s much higher—although note that there are many fewer data points!
ggplot(highanxtime,
aes(x = Ebola.Search.Volume.Index, y = Voter.Intention.Index)) +
geom_point() +
theme_classic() +
geom_smooth(method = "lm", se = FALSE, formula = "y ~ x") +
labs(x = "Ebola-search-volume index", y = "LexisNexis index")
Have any feedback about the exercises? Let me know in the exit survey and select Lab 10.
For attribution, please cite this work as
Dainer-Best (2020, Nov. 6). PSY 203: Statistics for Psychology: Correlation and Regression (Lab 10) Exercises, Completed. Retrieved from https://faculty.bard.edu/jdainerbest/psy-203/posts/2020-11-06-lab-10-correg-completed/
BibTeX citation
@misc{dainer-best2020correlation,
  author = {Dainer-Best, Justin},
  title = {PSY 203: Statistics for Psychology: Correlation and Regression (Lab 10) Exercises, Completed},
  url = {https://faculty.bard.edu/jdainerbest/psy-203/posts/2020-11-06-lab-10-correg-completed/},
  year = {2020}
}