Final Exam (Dec 9): review and practice materials

Format

The format is similar to Exams 1 and 2. There are two parts: the first is pen-and-paper, and the second is open-computer.

  • Part 1:
    • Closed computer and closed notes
    • Handwritten answers
  • Part 2:
    • Open computer. Any online resource that is “not alive” may be used (i.e. you may freely use online resources, but you may not communicate with any other person).
    • Electronic submission on Moodle: you will fill in a Quarto script with R code and written answers

Once you turn in Part 1, you may open your computer and download the blank Quarto script for Part 2. You may not return to Part 1 after turning it in and beginning the open-computer portion.

Topics

The exam is cumulative: it is based on all material covered so far in the class.

  • R Programming Fundamentals
    • Variable assignment and basic operations
    • Data types and classes (typeof(), class(), str())
    • Type coercion (implicit and explicit)
    • Vectors: creation, subsetting, and operations
    • Data frames: structure, column extraction, subsetting
    • Factors for categorical data
    • Missing values (NA)
  • Descriptive Statistics
    • Measures of center: mean, median
    • Measures of spread: standard deviation, variance, IQR, range
    • Outlier-resistant vs. outlier-sensitive statistics
    • Frequency tables and proportions for categorical data
    • Data visualization: histograms, boxplots
  • Statistical Inference Concepts
    • Populations, samples, parameters, and statistics
    • Hypothesis testing framework
    • P-values and their interpretation
    • Confidence intervals and their interpretation
    • Statistical significance (α = 0.05)
  • Hypothesis Tests and Confidence Intervals in R
    • One-sample t-test for means: t.test(x, mu = .)
    • Two-sample t-test for comparing means: t.test(y ~ group)
    • One-sample proportion test: prop.test(x, n, p = .)
    • Two-sample proportion test: prop.test()
    • Chi-squared test of independence: chisq.test()
    • Constructing and interpreting confidence intervals
  • Data Visualization
    • Creating plots with ggplot2: ggplot(), aes(), geom_histogram(), etc.
    • Basic plot customization (titles, axis labels)
  • Linear Regression
    • Simple linear regression: fitting, interpreting, predicting
    • The lm() function and formula syntax
    • Regression equation: \(Y = \beta_0 + \beta_1 X + \varepsilon\)
    • Interpreting coefficients (intercept and slope)
    • Understanding residuals
    • Regression diagnostics
    • Hypothesis tests and confidence intervals for model coefficients
    • F-tests for comparing nested models (anova())
    • R-squared and adjusted R-squared
    • Making predictions
    • Confidence intervals and prediction intervals for model predictions
    • Multiple linear regression
  • Functions and Function-Oriented Programming
    • Writing functions in R: syntax and structure
    • Function arguments and default values
    • Return values (implicit and explicit)
  • Logistic Regression for Binary Outcomes
    • Basic syntax with glm()
    • Inference for coefficients (hypothesis tests)
    • Making predictions
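
As a quick refresher on several of the R functions named in the topic list above, here is a minimal sketch. It is not taken from the exam or the practice materials. It assumes the built-in mtcars data set purely for illustration, and the names center_summary, new_car, conf_band, and pred_band are hypothetical.

```r
# Assumption: the built-in mtcars data stands in for whatever data you are given.
data(mtcars)

# A function with a default argument and an explicit return value
center_summary <- function(x, na.rm = TRUE) {
  result <- c(mean = mean(x, na.rm = na.rm), median = median(x, na.rm = na.rm))
  return(result)
}
center_summary(mtcars$mpg)

# One-sample t-test: the null hypothesis is that the mean mpg equals 20
one_sample <- t.test(mtcars$mpg, mu = 20)
one_sample$p.value   # compare with alpha = 0.05
one_sample$conf.int  # 95% confidence interval for the mean

# Two-sample t-test: compare mean mpg between automatic (am = 0) and manual (am = 1)
two_sample <- t.test(mpg ~ am, data = mtcars)

# One-sample proportion test: is the proportion of manual cars equal to 0.5?
prop_result <- prop.test(sum(mtcars$am == 1), nrow(mtcars), p = 0.5)

# Simple linear regression: mpg as a function of weight
fit <- lm(mpg ~ wt, data = mtcars)
summary(fit)   # coefficient tests, R-squared, adjusted R-squared
confint(fit)   # confidence intervals for the intercept and slope

# Confidence interval (for the mean response) vs. prediction interval (for a new car)
new_car <- data.frame(wt = 3)
conf_band <- predict(fit, newdata = new_car, interval = "confidence")
pred_band <- predict(fit, newdata = new_car, interval = "prediction")

# Multiple regression and an F-test comparing the nested models
fit_multi <- lm(mpg ~ wt + hp, data = mtcars)
anova(fit, fit_multi)

# Logistic regression for a binary outcome, with a predicted probability
logit_fit <- glm(am ~ wt, data = mtcars, family = binomial)
prob_manual <- predict(logit_fit, newdata = new_car, type = "response")
```

Note that type = "response" makes predict() return a probability rather than a value on the log-odds scale, and that the prediction interval is wider than the confidence interval at the same weight because it also accounts for the variability of an individual observation.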

Review materials

For Part 1 (on paper, closed computer), I have created a practice exam to give you a sense of the format and topics: part_1_practice.pdf.

The solution is here: part_1_practice_solution.pdf.

Additional practice questions, used for live competition on Menti: link. Downloadable version: extra_practice.md