Skip to contents
library(retention.helpers)
library(tidyverse)
#> Warning: package 'tidyverse' was built under R version 4.2.2
#> ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
#>  ggplot2 3.4.0       purrr   1.0.0 
#>  tibble  3.1.8       dplyr   1.0.10
#>  tidyr   1.2.1       stringr 1.5.0 
#>  readr   2.1.3       forcats 0.5.2
#> Warning: package 'ggplot2' was built under R version 4.2.2
#> Warning: package 'tibble' was built under R version 4.2.2
#> Warning: package 'tidyr' was built under R version 4.2.2
#> Warning: package 'readr' was built under R version 4.2.2
#> Warning: package 'purrr' was built under R version 4.2.2
#> Warning: package 'dplyr' was built under R version 4.2.2
#> Warning: package 'stringr' was built under R version 4.2.2
#> Warning: package 'forcats' was built under R version 4.2.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#>  dplyr::filter() masks stats::filter()
#>  dplyr::lag()    masks stats::lag()

Overview

Enrolments come in two levels at Charles Sturt University; course and subject. Course enrolments refer to the students’ enrolment in a particular program of study, such as a Bachelor of Nursing. These are stored in the student_course table. Subject level enrolments are stored in the enrolments table, and correspond to the enrolment records for units of study in a particular session. These should match to the academic table that stores the grades for each subject taken (if the student remained past census date).

Quirks

The enrolments tables also includes other pseudo-subjects, that are place holders for a Leave of Absence. These ‘subjects’ have the code XLV0000 and should not be treated like a normal subject. Subjects beginning with the code SSS… might also be needed to be treated differently, as they are free subjects and exhibit very different behaviour from students.

Counting enrolments

For this example, we set up a toy version of the student_course and enrolments tables. We rename them to match the names used in the (private) data.csu.retention package.

student_course <- retention.helpers::toy_student_course
enrolments <- retention.helpers::toy_enrolments

glimpse(student_course)
#> Rows: 11
#> Columns: 11
#> $ id                      <chr> "Student1", "Student2", "Student3", "Student1"…
#> $ course                  <chr> "Bachelor of Nursing", "Bachelor of Nursing", …
#> $ course_code             <chr> "4415NS", "4415NS", "4415NS", "4415NS", "4421F…
#> $ course_level            <chr> "Bachelors Pass", "Bachelors Pass", "Bachelors…
#> $ course_faculty          <chr> "FOSH", "FOSH", "FOSH", "FOSH", "FOSH", "FOSH"…
#> $ basis_of_admission      <chr> "A VET award course or VET delivered enabling …
#> $ admit_session           <dbl> 202130, 202130, 202130, 202130, 202230, 202160…
#> $ catalog_year            <dbl> 2021, 2021, 2021, 2021, 2022, 2021, 2021, 2022…
#> $ last_registered_session <dbl> NA, 202260, NA, NA, NA, NA, NA, 202230, 202260…
#> $ points_att              <dbl> 0, 192, 0, 0, 0, 0, 0, 72, 40, 0, 144
#> $ points_comp             <dbl> 0, 176, 0, 0, 0, 0, 0, 72, 40, 0, 72
glimpse(enrolments)
#> Rows: 65
#> Columns: 6
#> $ id            <chr> "Student1", "Student1", "Student2", "Student2", "Student…
#> $ session       <dbl> 202160, 202160, 202230, 202230, 202230, 202260, 202130, …
#> $ subject       <chr> "NRS160", "NRS163", "NRS312", "NRS312", "NRS313", "NRS32…
#> $ offering      <chr> "NRS160_202160_B_D", "NRS163_202160_B_D", "NRS312_202230…
#> $ enrol_date    <dttm> 2021-05-06, 2021-05-06, 2021-12-08, 2021-12-08, 2021-11…
#> $ withdraw_date <dttm> 2021-06-09, 2021-06-09, NA, 2021-12-14, NA, NA, NA, 202…

Looking at subject enrolments

We might want to look at the active enrolments in a particular session.

# gets all enrolments, and counts by student
enrolments |> 
    filter(session == 202230) |> 
    count(id, sort = T) |> 
    rename(n_subjects = n)
#> # A tibble: 3 × 2
#>   id       n_subjects
#>   <chr>         <int>
#> 1 Student2          3
#> 2 Student4          2
#> 3 Student5          2

# gets only those who remained in the subject, then counts by student
enrolments |> 
    filter(session == 202230, 
           is.na(withdraw_date) # exluding those that have withdrawn
           ) |> 
    count(id, sort = T) |> 
    rename(n_subjects = n)
#> # A tibble: 1 × 2
#>   id       n_subjects
#>   <chr>         <int>
#> 1 Student2          2

Joining subject enrolments with course enroments

We might also be interested in what course these students are in. The toy data is primarily for the Bachelor of Nursing, however joining directly to the student_course table can produce errors if a student has been in multiple courses.

enrolments |> 
    filter(session == 202230) |> 
    distinct(id, session) |> 
    inner_join(
        student_course |> 
            select(id, course, course_code, 
                   admit_session, last_registered_session))
#> Joining, by = "id"
#> # A tibble: 6 × 6
#>   id       session course                                cours…¹ admit…² last_…³
#>   <chr>      <dbl> <chr>                                 <chr>     <dbl>   <dbl>
#> 1 Student2  202230 Bachelor of Nursing                   4415NS   202130  202260
#> 2 Student4  202230 Bachelor of Nursing                   4415NS   202130      NA
#> 3 Student4  202230 Bachelor of Nursing                   4421NS   202230  202230
#> 4 Student5  202230 Bachelor of Business (with specialis… 2421BU   202260  202260
#> 5 Student5  202230 Bachelor of Social Science (Psycholo… 2421SS   202230      NA
#> 6 Student5  202230 Bachelor of Nursing                   4415NS   202130  202130
#> # … with abbreviated variable names ¹​course_code, ²​admit_session,
#> #   ³​last_registered_session

We need to select the most appropriate course, based on the admit_session (when the student entered the course - however be warned this is sometimes later than their first subjects in the course!), and their last_registered_session. Then it might be necessary to ignore the course_code field as there can be multiple course codes for a single course.

students_in_202230 <- 
    enrolments |> 
    filter(session == 202230) |> 
    distinct(id, session) |> 
    inner_join(
        student_course |> 
            select(id, course, course_code, 
                   admit_session, last_registered_session)) |> 
    group_by(id, session) |>
    filter(
        # exclude any courses if already completed
        last_registered_session >= session | is.na(last_registered_session),
        # exclude if they entered the course after the session of interest
        admit_session <= session
        ) |> 
    distinct(id, session, course) |> 
    ungroup()
#> Joining, by = "id"

students_in_202230
#> # A tibble: 3 × 3
#>   id       session course                                 
#>   <chr>      <dbl> <chr>                                  
#> 1 Student2  202230 Bachelor of Nursing                    
#> 2 Student4  202230 Bachelor of Nursing                    
#> 3 Student5  202230 Bachelor of Social Science (Psychology)