# Exploratory and Descriptive Analyses {#sec-eda}

```{r}
#| label: setup_01_data
#| echo: false
#| message: false
#| warning: false
#| include: false

here::i_am("quarto/02_explore.qmd")
library(here)
library(tidyverse)
library(gt)

analysis_data <- readRDS(here("data", "derived", "student_performance.rda"))
source(here("R", "02_explore.R"))
```

::: {.callout-note collapse="false" icon="false" .eda-instructions}
<!-- BEGIN_INSTRUCTION_BLOCK -->

This section is an incomplete template. Replace it with your own text and concise code chunks that describe and justify your exploratory data analysis (EDA). Keep Quarto code minimal by defining reusable functions in `R/02_explore.R` and calling them here.

**Checklist**

1. Delete everything inside this yellow callout (including BEGIN/END comments).
2. Write 2–4 short narrative paragraphs explaining EDA goals & key findings.
3. Use small, focused code chunks that call functions you implement in `R/02_explore.R`.
4. Prefer clarity over volume: a few well-chosen tables/plots > many unfocused figures.

**Suggested exploration topics**

- Descriptive summaries of grades (`G1`, `G2`, `G3`).
- Study time (`studytime`) vs. final grade (`G3`).
- Differences in `G3` by school (`school`) or sex (`sex`).
- Absences (`absences`) and association with `G3`.
- Correlations among numeric variables (`G1`, `G2`, `G3`, `absences`, `failures`, `Dalc`, `Walc`).

*After removal:* Begin with a short paragraph motivating the EDA, then present concise tables/plots and brief interpretations.

<!-- END_INSTRUCTION_BLOCK -->
:::

## Examples: calling your functions

The first two summaries are fully implemented for you in `R/02_explore.R` as examples.

### Grade summaries

```{r}
#| label: summarize_grades
grade_summary <- summarize_grades(analysis_data)
if (!is.null(grade_summary)) {
  gt(grade_summary) |>
    fmt_number(c(mean, sd), decimals = 2)
}
```

### Study time vs. final grade

```{r}
#| label: studytime_vs_grades
plot_studytime_vs_grade(analysis_data)
```

## Your turn: implement and use these

Fill in the bodies of the next two functions in `R/02_explore.R`, then call them below.

### Final grade by school

```{r}
#| label: summarize_by_school
# EXPECTED: a tibble with mean/sd/n by school
by_school_summary <- summarize_by_school(analysis_data)
if (!is.null(by_school_summary)) {
  gt(by_school_summary)
}
```

### Correlation plot for selected numeric variables

```{r}
#| label: correlation_plot
# EXPECTED: a heatmap (geom_tile) of correlations among chosen numeric columns
correlation_plot(analysis_data)
```

## Open-ended exploration

Extend your analysis with 1–3 additional functions and calls to provide descriptive, exploratory summaries of the data. Possibilities:

- Compare grade distributions by `failures`.
- Visualize `absences` vs. `G3` (scatter with smoothing).
- Tabulate `Dalc`/`Walc` vs. `G3` quartiles.
- Explore `Medu`/`Fedu` vs. `G3`.

Briefly interpret each result in a few sentences: What do you learn? How does it inform modeling choices in @sec-modelling?
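
As one illustration of the possibilities listed above, the sketch below shows how an additional helper for the `absences` vs. `G3` scatter plot might look. The function name `plot_absences_vs_g3` is hypothetical and not part of the template, and the `toy` data frame is made up purely to show the call. The sketch assumes only that the data contain numeric `absences` and `G3` columns. In practice the function would live in `R/02_explore.R` and be called on `analysis_data` from an executable chunk.

```r
library(ggplot2)

# Hypothetical helper (not provided by the template): scatter plot of
# absences against the final grade with a loess smoother overlaid.
# Assumes `data` has numeric columns named `absences` and `G3`.
plot_absences_vs_g3 <- function(data) {
  # Fail early with a clear error if the expected columns are missing.
  stopifnot(all(c("absences", "G3") %in% names(data)))

  ggplot(data, aes(x = absences, y = G3)) +
    # Jitter slightly because both variables are integer-valued and overlap.
    geom_jitter(width = 0.2, height = 0.2, alpha = 0.4) +
    geom_smooth(method = "loess", formula = y ~ x, se = TRUE) +
    labs(x = "Number of absences", y = "Final grade (G3)")
}

# Made-up toy data, for illustration only; replace with `analysis_data`.
toy <- data.frame(
  absences = c(0, 2, 4, 6, 10, 14, 20, 30),
  G3       = c(15, 14, 12, 13, 10, 11, 8, 9)
)
p <- plot_absences_vs_g3(toy)
```

Returning the plot object rather than printing it inside the function keeps the Quarto chunk to a single call and lets the caller add layers or themes afterwards.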