Week 2: Objects, Types, Classes, and Collections

Variables

What is a variable?

  • A variable is a named object that stores a value or other data in memory
x <- 4 
name <- "Andrew"
  • The code above shows two examples of assigning a value to a variable (4 to x, and "Andrew" to name)
  • Having made such an assignment, we have created an object which is stored in memory in the R session

Operations

  • Once an object has been defined in this way, it may be used, modified, or operated on in downstream parts of your script
x ^ 2 
[1] 16
toupper(name)
[1] "ANDREW"

Types and Classes

“Basic” or “Atomic” types include

  • double (aka numeric; e.g., 3.14)
  • integer (e.g., 2L)
  • character (e.g., "hello")
  • logical (e.g., TRUE, FALSE)

Another special type

  • list (collection of objects allowing mixed types)

Key classes

  • factor (categorical data, built on integer)
  • data.frame (tabular data, list of columns)
  • matrix (2D array, single type)
  • array (generalization of matrix to more dimensions)
  • Many others: packages can define additional classes

Difference:

  • Basic types are the fundamental building blocks for data storage in R.

  • Classes (e.g. S3) add structure and behavior to objects, enabling custom methods and more complex data structures.

Numerics and integers

  • Numeric objects represent real numbers (in the mathematical sense) and are stored as double-precision floating point numbers (in computer memory)
    • Example: x <- 3.14
    • Question: having assigned x this value, what does typeof(x) return? (this function tells us the basic type of an object)
  • Integer objects are whole numbers
    • Example: y <- 5L (L suffix makes it an integer)
    • Question: what is typeof(y)?

Arithmetic operations

  • Arithmetic operations work on both types:
a <- 7
b <- 2L
a + b
[1] 9
a / b
[1] 3.5
a * b
[1] 14
a ^ b
[1] 49
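
A minimal sketch of what happens to the type when doubles and integers are mixed, reusing a and b from above. The operators %/% (integer division) and %% (remainder) are base R operators that do not appear on the slide and are included here only for illustration.

```r
a <- 7    # double
b <- 2L   # integer

typeof(a + b)    # "double": mixing double and integer gives a double
typeof(b + 1L)   # "integer": integer arithmetic stays integer
typeof(b / 1L)   # "double": the / operator always returns a double

a %/% b          # integer division: 3
a %% b           # remainder: 1
```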

Characters

  • Character objects store text (strings).
    • Example: name <- "Alice"
  • You can combine strings with paste() or paste0():
first <- "Data"
second <- "Science"
paste(first, second)   # "Data Science"
[1] "Data Science"
  • Functions like nchar() (number of characters), toupper() (uppercase), and tolower() (lowercase) operate on character objects
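
A short sketch of the functions named above, reusing first and second. The sep argument of paste() is standard base R but is not shown on the slide.

```r
first <- "Data"
second <- "Science"

paste0(first, second)             # "DataScience" (no separator)
paste(first, second, sep = "-")   # "Data-Science"

nchar(first)      # 4
toupper(first)    # "DATA"
tolower(second)   # "science"
```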

Logicals

  • Logical objects represent TRUE or FALSE values.
    • Example: flag <- TRUE
  • Relational operators such as those below return logical values
x <- 5
x > 3      # TRUE
[1] TRUE
x == 5     # TRUE
[1] TRUE
x != 2     # TRUE
[1] TRUE

Factors

  • Factors are used for categorical data (groups/levels).
    • Example: a factor named species (its levels are shown below)
    • Question: what is class(species)?
    • Question: what is typeof(species)? (And why?)
  • Factors have levels (the possible values which the variable can take)
levels(species)   
[1] "Cat"    "Dog"    "Rabbit"
  • Useful for statistical analysis of categorical data (nominal or ordinal variables)
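
A sketch of how a factor like the one above could be built. The data values and the sizes variable are assumptions made for illustration; only the three level names come from the slide.

```r
# Hypothetical data: these values are assumed for illustration
species <- factor(c("Dog", "Cat", "Rabbit", "Dog", "Cat", "Dog"))

levels(species)    # "Cat" "Dog" "Rabbit" (sorted alphabetically by default)
nlevels(species)   # 3
table(species)     # counts per level: Cat 2, Dog 3, Rabbit 1

# An ordinal variable: levels are given in order and ordered = TRUE
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
sizes < "large"    # TRUE FALSE TRUE
```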

Collections of objects

Vectors

  • A vector is the most basic data structure in R: a sequence of elements of the same type.
    • Example: scores <- c(88, 92, 79, 85)
    • All elements must be of the same atomic type (numeric, character, logical, etc.); if you mix types, R coerces the elements to a common type rather than raising an error
  • You can create vectors with c(), seq(), or rep()
letters_vec <- c("A", "B", "C")
seq_vec <- seq(1, 5)
  • Vectors support element-wise operations:
scores + 5   # adds 5 to each score
[1] 93 97 84 90

Matrices I

  • A matrix is a two-dimensional array of elements of the same basic type.
    • Example:
mat <- matrix(1:6, nrow = 2, ncol = 3)
mat
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

Matrices II

  • Access elements by row and column:
mat[1, 2]   # element in row 1, column 2
[1] 3
  • Useful for mathematical and statistical computations.
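
A small sketch extending the indexing above to whole rows and columns, using the same mat. The calls dim() and t() are base R functions not shown on the slide.

```r
mat <- matrix(1:6, nrow = 2, ncol = 3)

dim(mat)    # 2 3 (rows, columns)
mat[1, ]    # first row: 1 3 5
mat[, 2]    # second column: 3 4
mat * 10    # element-wise multiplication of every entry
t(mat)      # transpose: a 3 x 2 matrix
```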

Arrays

  • An array generalizes matrices to more than two dimensions.
    • Example: arr <- array(1:8, dim = c(2, 2, 2))
    • All elements must be of the same atomic type.
  • Arrays are useful for representing multi-dimensional data (e.g., images, time series).
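
Array indexing works like matrix indexing with one index per dimension. A minimal sketch using the arr defined above:

```r
arr <- array(1:8, dim = c(2, 2, 2))

dim(arr)       # 2 2 2
arr[1, 2, 1]   # 3: row 1, column 2 of the first slice
arr[1, 2, 2]   # 7: the same position in the second slice
arr[, , 2]     # the second 2 x 2 slice, holding the values 5 to 8
```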

Lists I

  • A list is a collection of objects that can be of different types and structures.
    • Example:
my_list <- list(student = "Alice", age = 30, scores = c(88, 92, 79))
my_list
$student
[1] "Alice"

$age
[1] 30

$scores
[1] 88 92 79

Lists II

  • Lists can contain vectors, matrices, other lists, or any R object.
  • Access elements by name or position:
my_list$student
[1] "Alice"
my_list[[2]]
[1] 30
my_list[["scores"]]
[1] 88 92 79
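
A sketch of the difference between single and double brackets on a list, reusing my_list from the previous slide. The passed element is a hypothetical addition for illustration.

```r
my_list <- list(student = "Alice", age = 30, scores = c(88, 92, 79))

class(my_list["age"])     # "list": single brackets return a smaller list
class(my_list[["age"]])   # "numeric": double brackets return the element itself

my_list$scores[2]         # 92: second entry of the scores vector
names(my_list)            # "student" "age" "scores"

my_list$passed <- TRUE    # hypothetical new element, added by assignment
length(my_list)           # 4
```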

Data frames I

  • A data frame is a table where each column is a vector (different columns may have different types).
    • Columns can be numeric, character, logical, or factor.
    • Example:
df <- data.frame(
  name = c("Alice", "Bob"),
  age = c(30, 25),
  likes_ice_cream = c(TRUE, FALSE)
)
df
   name age likes_ice_cream
1 Alice  30            TRUE
2   Bob  25           FALSE

Data frames II

  • Data frames are the primary structure for storing and analyzing tabular data in R.
  • A data frame is a list in disguise, with the additional constraint that every list element (column) must have the same length: rectangular data
    • Question: what do class(df) and typeof(df) return for df as defined on the last slide?
  • Access to columns of data frames follows the same syntax as access to elements of lists:
df$name
[1] "Alice" "Bob"  
df[["age"]]
[1] 30 25
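
Because each column is a list element, list tools and table tools both apply. A minimal sketch, using df as defined on the previous slide:

```r
df <- data.frame(
  name = c("Alice", "Bob"),
  age = c(30, 25),
  likes_ice_cream = c(TRUE, FALSE)
)

length(df)     # 3: the number of columns (list elements)
names(df)      # "name" "age" "likes_ice_cream"
nrow(df)       # 2
df[2, "age"]   # 25: row 2 of the age column
df$age * 2     # 60 50: a column is an ordinary vector
```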

Exploring types

typeof() for basic type

  • The typeof() function tells you how R stores an object internally.
    • Example:
x <- 3.14
typeof(x)   # "double"
[1] "double"
species <- factor(c("Dog", "Cat"))
typeof(species)   # "integer"
[1] "integer"
  • Useful for understanding the underlying representation of your data.

class() for higher level class

  • The class() function tells you the object’s class, which determines how R treats it in generic functions.
    • Example:
x <- 3.14
class(x)   # "numeric"
[1] "numeric"
species <- factor(c("Dog", "Cat"))
class(species)   # "factor"
[1] "factor"
df <- data.frame(name = c("Alice", "Bob"))
class(df)   # "data.frame"
[1] "data.frame"
typeof(df)   # "list"
[1] "list"
  • Many objects have both a basic type and a class attribute.
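
One way to see the two layers is to strip the class and look at what remains. A sketch under the assumption that attributes(), unclass(), and a Date object (not used elsewhere in these slides) make suitable illustrations:

```r
species <- factor(c("Dog", "Cat"))

attributes(species)   # $levels: "Cat" "Dog"; $class: "factor"
unclass(species)      # the integer codes 2 1, with the levels attribute kept

# Another example: a Date is a double with a class attribute
d <- as.Date("2024-01-15")
typeof(d)    # "double"
class(d)     # "Date"
unclass(d)   # the underlying number of days since 1970-01-01
```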

str() to inspect objects

  • The str() function gives a compact summary of an object’s structure.
    • Example:
str(df)
'data.frame':   2 obs. of  1 variable:
 $ name: chr  "Alice" "Bob"
str(species)
 Factor w/ 2 levels "Cat","Dog": 2 1
str(x)
 num 3.14
  • Shows type, class, length, and a preview of the contents.
  • Very useful for quickly understanding unfamiliar objects.

Type coercion

Implicit coercion

  • R will automatically convert (coerce) types in certain situations, especially when combining different types in a vector.
    • Example:
c(1, "a", TRUE)   # All elements become character: "1" "a" "TRUE"
[1] "1"    "a"    "TRUE"
c(TRUE, 2)        # All elements become numeric: 1 2
[1] 1 2
  • The rule: R converts to the “least restrictive” type that can represent all elements (character > double > integer > logical).
  • Be aware: implicit coercion can lead to unexpected results if you mix types!
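
A sketch of the coercion order in practice. The passed vector is hypothetical; the point is that logical values become 1 and 0 when used in arithmetic.

```r
typeof(c(TRUE, 2L))    # "integer"
typeof(c(1L, 2.5))     # "double"
typeof(c(2.5, "a"))    # "character"

# Hypothetical data: TRUE counts as 1 and FALSE as 0
passed <- c(TRUE, FALSE, TRUE, TRUE)
sum(passed)    # 3: the number of TRUE values
mean(passed)   # 0.75: the proportion of TRUE values
```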

Explicit coercion

  • You can manually convert (coerce) an object to another type using functions like:
    • as.numeric()
    • as.character()
    • as.logical()
    • as.factor()
  • Example:
x <- "123"
as.numeric(x)   # 123
[1] 123
y <- c("TRUE", "FALSE", "TRUE")
as.logical(y)   # TRUE FALSE TRUE
[1]  TRUE FALSE  TRUE
  • If coercion fails, R will return NA and may give a warning
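
A sketch of what failed coercion looks like, plus one common pitfall with factors. The variable f is hypothetical.

```r
as.numeric("abc")               # NA, with a warning
as.numeric(c("1", "2", "x"))    # 1 2 NA, with a warning
as.logical("yes")               # NA: "yes" is not a recognised logical string

# Pitfall: converting a factor directly returns its integer codes
f <- factor(c("10", "20", "10"))   # hypothetical data
as.numeric(f)                      # 1 2 1
as.numeric(as.character(f))        # 10 20 10
```

Converting through as.character() first recovers the values that the factor labels represent.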

Vectors in depth

Creating vectors

  • Use c() to combine values into a vector:
nums <- c(1, 2, 3, 4)
chars <- c("A", "B", "C")
  • Use seq() to generate sequences:
seq(1, 10, by = 2)   # 1 3 5 7 9
[1] 1 3 5 7 9
  • Use rep() to repeat values:
rep(5, times = 3)    # 5 5 5
[1] 5 5 5
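
A few more variants, as a sketch. The colon operator, seq_len(), and the length.out and each arguments are standard base R but are not shown on the slide.

```r
1:5                           # 1 2 3 4 5
seq(0, 1, length.out = 5)     # 0.00 0.25 0.50 0.75 1.00
seq_len(3)                    # 1 2 3

rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"
rep(c("a", "b"), each = 2)    # "a" "a" "b" "b"
```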

Subsetting

  • Access elements of a vector by position (indexing starts at 1):
nums <- c(10, 20, 30, 40)
nums[2]      # 20
[1] 20
nums[1:3]    # 10 20 30
[1] 10 20 30
  • Use logical vectors for subsetting:
nums[nums > 15]   # 20 30 40
[1] 20 30 40
  • Negative indices remove elements:
nums[-1]   # 20 30 40
[1] 20 30 40
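
A sketch of some further subsetting patterns on the same nums vector. The which() call and assignment into a position go slightly beyond the slide.

```r
nums <- c(10, 20, 30, 40)

nums[c(1, 4)]                       # 10 40: several positions at once
nums[-c(1, 2)]                      # 30 40: drop the first two elements
nums[c(TRUE, FALSE, TRUE, FALSE)]   # 10 30: an explicit logical vector
which(nums > 15)                    # 2 3 4: positions where the condition holds

nums[2] <- 25                       # subsetting also works for assignment
nums                                # 10 25 30 40
```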

Vectorized operations

  • Most operations in R are vectorized: they apply to each element automatically.
x <- c(1, 2, 3)
y <- c(10, 20, 30)
x + y      # 11 22 33
[1] 11 22 33
x * 2      # 2 4 6
[1] 2 4 6
  • Functions also operate element-wise:
sqrt(x)    # 1 1.414214 1.732051
[1] 1.000000 1.414214 1.732051
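
When two vectors differ in length, R repeats ("recycles") the shorter one. This behavior is not named on the slide; the sketch below uses hypothetical vectors v and w to show it alongside a vectorized comparison.

```r
v <- c(1, 2, 3, 4)   # hypothetical vectors for illustration
w <- c(10, 20)

v + w     # 11 22 13 24: w is recycled as 10 20 10 20
v > 2     # FALSE FALSE TRUE TRUE: comparisons are vectorized too
sum(v)    # 10
mean(v)   # 2.5
```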

Missing values

  • In R, missing data is represented by NA.
  • NA can appear in any type of object: numeric, character, logical, etc.
    • Example:
scores <- c(88, NA, 79, 85)
names <- c("Alice", NA, "Bob")
  • You can check for missing values with is.na():
scores
[1] 88 NA 79 85
is.na(scores)   # TRUE for missing values
[1] FALSE  TRUE FALSE FALSE
  • Be aware: calculations involving NA usually return NA unless you explicitly remove or otherwise handle the missing values (to be discussed next week)
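
A sketch of how NA propagates through calculations and how to count missing values, using the scores vector above. anyNA() is a base R function not shown on the slide.

```r
scores <- c(88, NA, 79, 85)

scores + 1           # 89 NA 80 86: NA stays NA
mean(scores)         # NA: one missing value makes the result missing
sum(is.na(scores))   # 1: the number of missing values
anyNA(scores)        # TRUE

NA == NA             # NA, which is why is.na() is used instead of ==
```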

Lab 2

lab_2_blank.qmd lab_2.pdf