3  Exploratory and Descriptive Analyses

Note

This section is an incomplete template. Replace it with your own text and concise code chunks that describe and justify your exploratory data analysis (EDA). Keep Quarto code minimal by defining reusable functions in R/02_explore.R and calling them here.

Checklist

  1. Delete everything inside this yellow callout (including BEGIN/END comments).
  2. Write 2–4 short narrative paragraphs explaining EDA goals & key findings.
  3. Use small, focused code chunks that call functions you implement in R/02_explore.R.
  4. Prefer clarity over volume: a few well-chosen tables and plots are better than many unfocused figures.

Suggested exploration topics

  • Descriptive summaries of grades (G1, G2, G3).
  • Study time (studytime) vs. final grade (G3).
  • Differences in G3 by school (school) or sex (sex).
  • Absences (absences) and association with G3.
  • Correlations among numeric variables (G1, G2, G3, absences, failures, Dalc, Walc).

After removal: Begin with a short paragraph motivating the EDA, then present concise tables/plots and brief interpretations.

3.1 Examples: calling your functions

The first two summaries are fully implemented for you in R/02_explore.R as examples.

3.1.1 Grade summaries

# Summarize G1, G2, and G3; render a table only if a summary was returned.
grade_summary <- summarize_grades(analysis_data)
if (!is.null(grade_summary)) {
  gt(grade_summary) |>
    fmt_number(columns = c(mean, sd), decimals = 2)
}
grade    n   mean    sd  min  q1  median  q3  max
G1     395  10.91  3.32    3   8      11  13   19
G2     395  10.71  3.76    0   9      11  13   19
G3     395  10.42  4.58    0   8      11  14   20

3.1.2 Study time vs. final grade

plot_studytime_vs_grade(analysis_data)

3.2 Your turn: implement and use these

Fill in the bodies of the next two functions, summarize_by_school() and correlation_plot(), in R/02_explore.R, then call them below.

3.2.1 Final grade by school

# EXPECTED: a tibble with the mean, sd, and n of G3 for each school
by_school_summary <- summarize_by_school(analysis_data)
if (!is.null(by_school_summary)) {
  gt(by_school_summary)
}
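
As a starting point, here is a minimal sketch of one way the body of summarize_by_school() could look. It assumes dplyr is available and that the data have columns named school and G3. The output column names (n, mean_G3, sd_G3) and the toy rows are illustrative assumptions, not part of the template or the real data.

```r
library(dplyr)

# Made-up toy rows standing in for analysis_data (not real observations).
toy_data <- tibble::tibble(
  school = c("GP", "GP", "GP", "MS", "MS"),
  G3     = c(10, 14, 12, 8, 12)
)

# Sketch of one possible body: one row per school with n, mean, and sd of G3.
# Column names are assumptions; adapt them to your own conventions.
summarize_by_school <- function(data) {
  data |>
    group_by(school) |>
    summarise(
      n       = n(),                      # rows per school (counts NA grades too)
      mean_G3 = mean(G3, na.rm = TRUE),   # mean final grade, ignoring NA
      sd_G3   = sd(G3, na.rm = TRUE),     # sample standard deviation
      .groups = "drop"                    # return an ungrouped tibble
    )
}

by_school_toy <- summarize_by_school(toy_data)
print(by_school_toy)
```

Returning an ungrouped tibble keeps the result easy to pass straight to gt(). Reporting n next to the mean and sd makes it clear when a group is small enough that its summary should be read with caution.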

3.2.2 Correlation plot for selected numeric variables

# EXPECTED: a heatmap (geom_tile) of correlations among chosen numeric columns
correlation_plot(analysis_data)
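
The heatmap described above could be sketched roughly as follows, assuming dplyr, tidyr, and ggplot2 are available. The helper correlation_table(), the vars argument and its default, the colour choices, and the toy rows are all assumptions made for illustration. The template only specifies that correlation_plot() returns a geom_tile() heatmap.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Made-up toy rows standing in for analysis_data (not real observations).
toy_data <- tibble::tibble(
  G1       = c(10, 12, 8, 15, 9, 14),
  G2       = c(11, 12, 7, 16, 9, 13),
  G3       = c(11, 13, 6, 17, 8, 14),
  absences = c(4, 0, 10, 2, 6, 1),
  failures = c(0, 0, 2, 0, 1, 0),
  Dalc     = c(1, 1, 3, 1, 2, 1),
  Walc     = c(2, 1, 4, 1, 3, 2)
)

# Hypothetical helper: pairwise Pearson correlations in long format,
# one row per (var1, var2) pair, which is the shape geom_tile() expects.
correlation_table <- function(data, vars) {
  data |>
    select(all_of(vars)) |>
    cor(use = "pairwise.complete.obs") |>
    as.data.frame() |>
    tibble::rownames_to_column("var1") |>
    pivot_longer(-var1, names_to = "var2", values_to = "r") |>
    mutate(
      var1 = factor(var1, levels = vars),  # keep the requested variable order
      var2 = factor(var2, levels = vars)
    )
}

# Sketch of correlation_plot(): a tile per variable pair, filled by r.
correlation_plot <- function(data,
                             vars = c("G1", "G2", "G3", "absences",
                                      "failures", "Dalc", "Walc")) {
  ggplot(correlation_table(data, vars), aes(x = var1, y = var2, fill = r)) +
    geom_tile() +
    geom_text(aes(label = sprintf("%.2f", r)), size = 3) +
    scale_fill_gradient2(low = "steelblue", mid = "white", high = "firebrick",
                         midpoint = 0, limits = c(-1, 1)) +
    labs(x = NULL, y = NULL, fill = "Pearson r")
}

p_cor <- correlation_plot(toy_data)
```

Splitting the computation from the plotting keeps the correlation values available for a table or a quick check, and fixing the fill limits to [-1, 1] makes colours comparable across different variable selections.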

3.3 Open-ended exploration

Extend your analysis with 1–3 additional functions and calls to provide descriptive, exploratory summaries of the data. Possibilities:

  • Compare grade distributions by failures.
  • Visualize absences vs. G3 (scatter with smoothing).
  • Tabulate Dalc/Walc vs. G3 quartiles.
  • Explore Medu/Fedu vs. G3.
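
For instance, the absences vs. G3 idea might be sketched as below, assuming ggplot2 is available. The function name plot_absences_vs_grade() is hypothetical, and the data are simulated purely so the example runs on its own. Neither reflects the real dataset.

```r
library(ggplot2)

# Simulated stand-in for analysis_data; the relationship here is invented.
set.seed(1)
toy_absences <- rpois(60, lambda = 5)
toy_data <- data.frame(
  absences = toy_absences,
  G3 = pmin(20, pmax(0, round(12 - 0.3 * toy_absences + rnorm(60, sd = 2))))
)

# Hypothetical function: jittered scatter of G3 against absences with a
# loess smoother to show the overall trend.
plot_absences_vs_grade <- function(data) {
  ggplot(data, aes(x = absences, y = G3)) +
    geom_jitter(width = 0.2, height = 0.2, alpha = 0.6) +  # reduce overplotting of integer values
    geom_smooth(method = "loess", formula = y ~ x, se = TRUE) +
    labs(x = "Number of absences", y = "Final grade (G3)")
}

p_abs <- plot_absences_vs_grade(toy_data)
```

Because both variables are integers, many points would overlap exactly without jitter. The loess curve is one reasonable smoother choice; a linear fit (method = "lm") is an alternative if a straight-line summary is preferred.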

Briefly interpret each result in a few sentences: What do you learn? How does it inform modeling choices in Chapter 4?
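
As one more possible starting point, the Dalc vs. G3 quartile tabulation could look roughly like this, assuming dplyr is available. The function name tabulate_dalc_by_g3_quartile(), the quartile labels, and the toy rows are assumptions for illustration. The sketch also assumes the four quartile break points of G3 are distinct, since cut() fails otherwise.

```r
library(dplyr)

# Made-up toy rows standing in for analysis_data (not real observations):
# every Dalc level 1-5 is paired once with each of eight G3 values.
toy_data <- tibble::tibble(
  Dalc = rep(1:5, each = 8),
  G3   = rep(c(2, 6, 9, 10, 11, 13, 15, 19), times = 5)
)

# Hypothetical function: bin G3 into quartiles, then count students per
# (Dalc, quartile) cell and compute the within-Dalc proportion.
tabulate_dalc_by_g3_quartile <- function(data) {
  breaks <- quantile(data$G3, probs = seq(0, 1, by = 0.25), na.rm = TRUE)
  data |>
    mutate(G3_quartile = cut(G3, breaks = breaks, include.lowest = TRUE,
                             labels = c("Q1", "Q2", "Q3", "Q4"))) |>
    count(Dalc, G3_quartile, name = "n") |>
    group_by(Dalc) |>
    mutate(prop = n / sum(n)) |>   # share of each quartile within a Dalc level
    ungroup()
}

dalc_tab <- tabulate_dalc_by_g3_quartile(toy_data)
print(dalc_tab)
```

Proportions within each Dalc level are easier to compare than raw counts when the levels differ in size. The same function body could be reused for Walc by swapping the grouping column.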