The Data
data.Rmd
This article describes data available, for internal Charles Sturt
University use, in the following packages:
data.csu.retention
, data.csu.activity
and
data.csu.exit
.
The retention dataset : data.csu.retention
This includes data on student demographics, enrolments, academic results, as well as data on the Retention Teams interventions.
The activity dataset : data.csu.activity
This includes LMS activity data on students. It is in a separate dataset due to its large size.
The exit survey dataset : data.csu.exit
This includes data on the exit survey results. It is not commonly used with the other data and is sometimes sensitive so is in a separate dataset.
Common design features of the data set
There are essentially four families of tables in the data:
-
Learning structure tables, containing data on the
subjects, sessions and other details. These tables include data on the
teaching
sessions
,campus_codes
, and whatofferings
andsubjects
are available. -
Student enrolment tables. These detail details
about the student and their enrolment at the university. Student
characteristics are contained in
student_ids
andstudent_demographics
, details on their course and subject enrolments are contained instudent_course
andenrolments
, respectively. -
Student learning tables. These record what the
student does once they are enrolled. Some detail the students
activity
on the subject sites through the LMS trace data, and their performance viaassessment_marks
and overall subject [grades
]. Additionally there is some data on if the students decide toexit
the university as to why they have done so.
-
Retention team campaign tables. These tables record
data pertinent to the activities and
interventions
of the Retention Team. It includes lists of which students utilised theembedded_tutors
, which raisedflags
for being at-risk, whattriggers
were used to identify at-risk students, and what kind ofcontact
was made with students. The key table here is theinterventions
table, which attempts to summarise all the other tables into one, hopefully practical, place.
Common variables
A key part of the cleaning and processing of the data is consistent names for variables throughout the tables. These will be commonly used for as keys for making joins.
-
id
: Unique student ID (character). Sometimes a student will have multiple ID’s, if they have re-enrolled. In this case the numeric part of the ID is the same, but there is anA
or aB
appended at the end. -
session
: Unique ID for the teaching session (numeric). The first four digits are the year, the last two correspond to when in the year the session runs. The main three teaching sessions end in 30, 60, 90. E.g. 202230 is the first main teaching session in 2022. -
year
: The calendar year. -
subject
: The six character subject code (for a unit of study), in the form ofABC123
. -
offering
: The offering code. An offering is a subject (unit of study) taught in a particularsession
at a particularcampus
delivered in a particularmode
(Internal or Distinace). The code is in the formABC123_202230_PT_I
, denoting the subject, session, campus and mode, separated by_
. -
campaign
: The particular intervention ‘campaign’ that this data relates to. Details of these values are outlined in the notes on theflags
table.
Retention data package tables
The tables in the
academic
This table gives academic results by student (id
),
subject
and session
.
data.csu.retention::academic |>
slice(1:10) |>
mutate( # anonymising
id = "student_id", subject = "ABC123",
offering = "ABC123_202230_W_D") |>
glimpse()
#> Rows: 10
#> Columns: 8
#> $ id <chr> "student_id", "student_id", "student_id", "student_id",…
#> $ session <dbl> 202230, 202260, 202230, 202230, 202230, 202260, 202230,…
#> $ subject <chr> "ABC123", "ABC123", "ABC123", "ABC123", "ABC123", "ABC1…
#> $ offering <chr> "ABC123_202230_W_D", "ABC123_202230_W_D", "ABC123_20223…
#> $ grade <fct> DI, LV, CR, CR, CR, CR, CR, DI, TA, DI
#> $ mark <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ mark_bb <dbl> 0.8000, NA, 0.7370, 0.7100, 0.7100, NA, 0.7382, 0.7620,…
#> $ grade_original <fct> DI, LV, CR, CR, CR, CR, CR, DI, TA, DI
-
grade
is the final grade the student received for the subject. -
mark
is the percentage score the student achieved. Lots of this data is missing (97%) -
mark_bb
is an attempt to get an estimate the student’s mark from the LMS data. A lot of this data is also missing (42%). If you are looking for a best guess of the mark it is worth coalescingmark
andmark_bb
, but so far this data is very incomplete -
grade_original
indicates the first grade the student achieved in the subject. Sometimes there is an ‘admin override’ grade, so ifgrade_original
is different tograde
this indicates that an override has occurred.
academic_details
This is historic data, only available for the three main sessions in 2019, and only for HEPPP subjects in those sessions.
data.csu.retention::academic_details |>
slice(1:10) |>
mutate( # anonymising
id = "student_id", subject = "ABC123") |>
glimpse()
#> Rows: 10
#> Columns: 7
#> $ id <chr> "student_id", "student_id", "student_id", "student_id", "…
#> $ subject <chr> "ABC123", "ABC123", "ABC123", "ABC123", "ABC123", "ABC123…
#> $ eai <dbl> 0.570, 0.570, 0.825, 0.300, 0.600, 0.600, 0.800, 0.800, 1…
#> $ cm <dbl> 0.63800, 0.63800, 0.51425, 0.03000, 0.54600, 0.54600, 0.3…
#> $ mean_mark <dbl> 0.5922222, 0.5922222, 0.7287500, 0.3000000, 0.4475000, 0.…
#> $ submit_ratio <dbl> 1.0000000, 1.0000000, 0.8000000, 0.3333333, 1.0000000, 1.…
#> $ session <dbl> 201960, 201990, 201990, 201990, 201930, 201990, 201930, 2…
-
eai
indicates the mark the student received for their Early Assessment Item. -
cm
indicates the students’ course mark. -
mean_mark
indicates the average mark the student received in all their assessments (unweighted). -
submit_ratio
indicates the ratio of assessments submitted to assessments required in the subject.
assessment_marks
This data collects together assessments, grades and cumulative marks
for selected subjects. Each row is a result for a particular assessment
item (title
), for a student id
, in a given
subject
, in a given session
. Some marks are
numeric and some are categorical (grades, or pass / fail items).
data.csu.retention::assessment_marks |>
slice(1:10) |>
mutate( # anonumising
subject = "ABC123", id = "student_id", bb_pk1 = "bb_pk") |>
glimpse()
#> Rows: 10
#> Columns: 15
#> $ subject <chr> "ABC123", "ABC123", "ABC123", "ABC123", "ABC123", "ABC1…
#> $ session <dbl> 202030, 202030, 202030, 202030, 202030, 202030, 202030,…
#> $ id <chr> "student_id", "student_id", "student_id", "student_id",…
#> $ title <chr> "Calculated Grade", "Calculated Grade", "Calculated Gra…
#> $ week_due <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ value <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ eai <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ threshold_mark <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ metric <chr> "Total Pts: up to 100 Letter", "Total Pts: up to 100 Le…
#> $ grade <chr> "FL", "DI", "DI", "DI", "CR", "DI", "DI", "PS", "HD", "…
#> $ bb_pk1 <chr> "bb_pk", "bb_pk", "bb_pk", "bb_pk", "bb_pk", "bb_pk", "…
#> $ score <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ out_of <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ score_p <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ sy_us <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
-
title
is the name of the assessment. There are three special items, Calculated Grade, Administrative Override and Cumulative Mark. Calculated Grade is not an assessment mark, but rather the students’ final grade, as it appears in the LMS (this may not always match the official grade). If the grade was overwritten, this is stored as an Administrative Override. Cumulative Mark is the numeric mark that the student received, from their weighted assessment marks, used to calculate their grade. -
week_due
is the week the assessment was due. -
value
is the value of the assessment (its contribution to a total of 100 marks for the Cumulative Mark). Some assessments are required to pass, but have no weighting, and will have avalue
of 0. -
eai
indicated if the assessment item was used as an Early Assessment Item as part of a pre census campaign to identify disengaged students. -
threshold_mark
includes a value if the particular assessment item requires a minimum mark in order for the student to pass the subject. Athreshold_mark
of 0 indicates that the item must be submitted, but their is no required numeric value to achieve. -
mertric
indicates how the assessment item is assessed; for instance if it is a numeric score and if so what is it out of. This is taken directly from the assessment item header in the LMS.
-
grade
is the assessments grade (if categorical, blank otherwise) -
bb_pk1
is the primary key to the LMS data base (Blackboard) that matches the assessment item. It is possible that this will not always match if an academic deletes the item and recreates it. -
score
is the numeric score for the assessment item (if numeric, blank otherwise). -
out_of
is the maximum score that could be achieved for the assessment item (if numeric, blank otherwise). -
score_p
is the % score of the assessment item (if numeric, blank otherwise). -
sy_us
indicates a Satisfactory (SY) or Unsatisfactory (US) grade, if this is how the assessment was marked.
campus_codes
In the offering codes (formatted like ABC123_202230_W_I) the third component (the W in the example here) indicates the campus that the offering is taught from. This table matches these one or two character codes to the campus name.
data.csu.retention::campus_codes |> glimpse()
#> Rows: 25
#> Columns: 2
#> $ campus <chr> "Bathurst", "Albury-Wodonga", "Orange", "Wagga Wagga", "Du…
#> $ campus_code <chr> "B", "A", "OA", "W", "D", "UC", "WT", "C", "PT", "G", "HS"…
embedded_tutors
Part of the Retention Teams interventions is the embedded tutors
program. This table is a record of students (id
) who have
attended a tutorial session, for a particular subject
, in a
particular session
. It does not take into account multiple
tutorial sessions for the same subject in the same session.
data.csu.retention::embedded_tutors |>
mutate( # anonymising
id = "student_id", subject = "ABC123"
) |>
glimpse()
#> Rows: 1,637
#> Columns: 3
#> $ id <chr> "student_id", "student_id", "student_id", "student_id", "stude…
#> $ session <dbl> 202130, 202130, 202130, 202130, 202130, 202130, 202130, 202130…
#> $ subject <chr> "ABC123", "ABC123", "ABC123", "ABC123", "ABC123", "ABC123", "A…
contact
A component of the various intervention campaigns run by the
Retention Team involve communication with students. This table attempts
to draw together records of the different types of contact that have
happened, when, with who, and how they went. It is one row per contact
attempt, for a particular student id
, in a
session
, for a campaign
, at a particular time
(contact_timestamp
).
data.csu.retention::contact |>
mutate(id = "student_id") |>
glimpse()
#> Rows: 24,756
#> Columns: 9
#> $ id <chr> "student_id", "student_id", "student_id", "student_i…
#> $ session <dbl> 202060, 202060, 202060, 202060, 202060, 202060, 2020…
#> $ campaign <chr> "pre census", "pre census", "pre census", "pre censu…
#> $ dialogue <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
#> $ contact_type <chr> "call", "call", "call", "call", "call", "call", "cal…
#> $ contact_date <date> 2020-07-27, 2020-07-27, 2020-07-27, 2020-07-27, 202…
#> $ contact_timestamp <dttm> 2020-07-27 10:48:39, 2020-07-27 14:35:41, 2020-07-2…
#> $ earliest_flag <dttm> 2020-07-27, 2020-07-27, 2020-07-27, 2020-07-27, 202…
#> $ details <chr> "outreach_campaign: HEPPP - Previous fails * * * cal…
-
dialogue
indicates if the outreach team had a meaningful conversation with the student. -
contact_type
is the method of contact, usually call, sms or email. -
contact_date
the day the contact was made. -
contact_timestamp
is the timestamp of when the contact was recorded (the data is filled out at the end of the conversation, so their might be a little delay). -
earliest_flag
indicates the earliest flag that the student was contacted for. Sometimes students raise several flags in a given campaign, however the contact is usually initiated for the first flag only. -
details
eleborate why the contact was made, and any other notes that were made by the outreach team.
enrolments
A table of the subject enrolment records for each student. It
includes when they enrolled in the subject and when they withdrew. This
data generally takes about a week to align itself with what actually
happened - so beware enrolment movements that are recent. One row per
student id
, per session
, per
offering
, per subject
- although it is
technically possible that students could be moving in and out of the
same subject on the same day.
data.csu.retention::enrolments |>
slice(1:10) |>
mutate(
id = "student_id", subject = "ABC123",
offering = str_c("ABC123", session, "W", "D", sep = "_")
) |>
glimpse()
#> Rows: 10
#> Columns: 6
#> $ id <chr> "student_id", "student_id", "student_id", "student_id", …
#> $ session <dbl> 202360, 202360, 202334, 202344, 202374, 202330, 202330, …
#> $ subject <chr> "ABC123", "ABC123", "ABC123", "ABC123", "ABC123", "ABC12…
#> $ offering <chr> "ABC123_202360_W_D", "ABC123_202360_W_D", "ABC123_202334…
#> $ enrol_date <date> 2022-09-04, 2022-09-04, 2022-09-04, 2022-09-04, 2022-09…
#> $ withdraw_date <date> 2022-09-06, NA, 2022-09-06, 2022-09-06, 2022-09-06, NA,…
flags
As part of some of the intervention campaigns students may be
flagged as ‘at-risk’ for some reason. This is recorded
in the flags
table. Each row is for a particular
concern
, by student id
, in a
subject
/ offering
, in a particular
session
for a particular campaign
. The is also
identified by when it occured, however this data has been inconsistently
recorded over the years and may be in any of week
,
flag_timestamp
or trigger_date
.
data.csu.retention::flags |>
filter(session == 202230) |>
slice(1:10) |>
mutate(id = "student_id", subject = "ABC123",
offering = str_c("ABC123", session, "W", "D", sep = "_"),
concern_detail = "detailed comments") |>
glimpse()
#> Rows: 10
#> Columns: 11
#> $ campaign <chr> "pre census", "pre census", "pre census", "pre ce…
#> $ session <dbl> 202230, 202230, 202230, 202230, 202230, 202230, 2…
#> $ id <chr> "student_id", "student_id", "student_id", "studen…
#> $ subject <chr> "ABC123", "ABC123", "ABC123", "ABC123", "ABC123",…
#> $ offering <chr> "ABC123_202230_W_D", "ABC123_202230_W_D", "ABC123…
#> $ trigger_date <dttm> 2022-03-07, 2022-03-07, 2022-03-07, 2022-03-07, …
#> $ concern <chr> "non submission", "non submission", "non submissi…
#> $ flag_timestamp <dttm> 2022-03-10, 2022-03-10, 2022-03-10, 2022-03-10, …
#> $ week <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ concern_detail <chr> "detailed comments", "detailed comments", "detail…
#> $ reason_for_no_upload <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
-
campaign
indicates which of the Retention Team campaigns this flag belongs to. The early low engagement campaign is run in week two of session and identifies students with no subject site access since beginning of session, and a friendly email is sent. The pre census campaign is run in weeks 3 and 4, before census date, and is identifying students at higher risk of failure in a select number of subjects. Students are then contacted by the outreach team. A similar campaign was run in 2020 after census date, and was called the post census campaign. The persistent low engagement campaign is run just before census date, and is a more strongly worded contact for students with still no subject site access in the days before census date. The non genuine students campaign is run after census, and is for very disengaged students who did not withdraw from subjects prior to census. The former fail campaign is no longer run by the retention team, but was an early contact of students who had performed poorly in a prior session. -
trigger_date
indicates when the flag was intended to be triggered, such as the due date for an early assessment item. -
concern
indicates why the flag was raised. This can depend on the campaign. For the pre census campaign it is usually low activity for a period of inactivity on the subject site, or non submission for not submitting an early assessment item. For the former fail campaign (no longer running) it is always prior performance. For the non genuine students campaign there are variety of concerns, hopefully explaining clearly why the student was put on the list. -
flag_timestamp
indicates when the flag happened, where data available (78.8% complete) -
week
indicates the week of the session that the flag was raised. Whereflag_timestamp
is missing this is often there instead. -
concern_detail
is any other information pertinent to the flag. -
reason_for_no_upload
indicates a reason why this particular flag was not forwarded onto the outreach team. This is usually because the student has already been flagged in this campaign as at risk.
flags_unticked
When students are deemed at risk due to missing an early assessment
item, this is checked with the academic. Some students are removed from
the flag list (perhaps they had already organised an extension). These
students do not appear on the flag list, but do appear on the
flags_unticked
table. This is a work in progress and
currently only has data for the sessions: 202230.
data.csu.retention::flags_unticked |>
mutate(id = "student_id", offering = "ABC123_202230_W_D") |>
glimpse()
#> Rows: 599
#> Columns: 4
#> $ campaign <chr> "pre census", "pre census", "pre census", "pre census", "pre …
#> $ session <dbl> 202230, 202230, 202230, 202230, 202230, 202230, 202230, 20223…
#> $ offering <chr> "ABC123_202230_W_D", "ABC123_202230_W_D", "ABC123_202230_W_D"…
#> $ id <chr> "student_id", "student_id", "student_id", "student_id", "stud…
interventions
This table aggregates data from flags
,
contact
, embedded_tutors
and other sources to
summarise what interventions have been made by the Retention Team and
how they went. It is organised as one row per student id
,
per session
, per campaign
.
data.csu.retention::interventions |>
mutate(id = "student_id", intervention_target = "ABC123_202090_W_D") |>
glimpse()
#> Rows: 27,690
#> Columns: 9
#> $ id <chr> "student_id", "student_id", "student_id", "stud…
#> $ campaign <chr> "early low engagement", "early low engagement",…
#> $ session <dbl> 202090, 202090, 202090, 202090, 202090, 202090,…
#> $ intervention_timestamp <dttm> 2020-11-24 22:39:30, 2020-11-24 22:39:30, 2020…
#> $ intervention_target <chr> "ABC123_202090_W_D", "ABC123_202090_W_D", "ABC1…
#> $ concerns <chr> "early low engagement", "early low engagement",…
#> $ intervention <chr> "email", "email", "email", "email", "email", "e…
#> $ intervention_result <chr> "one-way communication", "one-way communication…
#> $ intervention_details <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
-
intervention_timestamp
is a timestamp of when the intervention occurred, where available. -
intervention_target
indicates the target of the intervention. This is different for different campaigns. For the pre census campaign, the target will be a list of subjects / offerings that the student was flagged in. For the non genuine students campaign the target will be that students course enrolment, for instance. -
concerns
details a list of the concerns that pertain to this intervention. -
intervention
details what action was taken by the retention team, such as email or call. -
intervention_result
details a short outline of what the result of the intervention was, such as whether or not the outreach team talked to the student. -
intervention_details
includes in other information, such as notes that the outreach team may have made.
offerings
This includes data on the individual subject offerings. It is one row
per offering
.
data.csu.retention::offerings |>
slice(1:5) |>
mutate(academic_name = "Bob Katter", academic_email = "bob@csu.edu.au",
academic_id = "bkatter27") |>
glimpse()
#> Rows: 5
#> Columns: 8
#> $ offering <chr> "ACC100_201990_B_D", "ACC110_201990_B_D", "ECO13…
#> $ subject <chr> "ACC100", "ACC110", "ECO130", "ITC105", "JST110"
#> $ offering_subject_name <chr> "Accounting 1", "Accounting 2", "Business Econom…
#> $ session <dbl> 201990, 201990, 201990, 201990, 201990
#> $ pre_census_focus <lgl> TRUE, TRUE, TRUE, TRUE, TRUE
#> $ academic_name <chr> "Bob Katter", "Bob Katter", "Bob Katter", "Bob K…
#> $ academic_email <chr> "bob@csu.edu.au", "bob@csu.edu.au", "bob@csu.edu…
#> $ academic_id <chr> "bkatter27", "bkatter27", "bkatter27", "bkatter2…
-
offering_subject_name
is the name of the subject. -
pre_census_focus
indicates if the offering was part of the pre census campaign. -
academic_name
is the name of the teaching academic. -
academic_name
is the email of the teaching academic. -
academic_id
is the email of the teaching academic. This is sometimes more reliable for matching than the name.
student_ids
Data on student identifying variables. One row per student
id
, which is the university id.
data.csu.retention::student_ids |>
names() # this is all identifiable data
#> [1] "id" "firstname" "lastname" "user_id" "email" "phone"
#> [7] "first_id" "pidm"
-
firstname
andlastname
are the first and last names of the student. -
user_id
is the students LMS login id. -
email
is the students email. -
phone
their phone number. Format varies. -
first_id
is the firstid
that the student had at Charles Sturt University. -
pidm
is a unique id that should be unique for a given student. This is used to match whereid
variables are different for the same student (due to re-enrolling at a later point in their academic career).
student_demographics
This table includes the most recent student demographic data, one row
per student id
.
data.csu.retention::student_demographics |>
slice(1:10) |>
mutate(id = "student_id", firstname = "firstname", lastname = "lastname") |>
glimpse()
#> Rows: 10
#> Columns: 13
#> $ id <chr> "student_id", "student_id", "student_id", "s…
#> $ firstname <chr> "firstname", "firstname", "firstname", "firs…
#> $ lastname <chr> "lastname", "lastname", "lastname", "lastnam…
#> $ age <dbl> 52, 54, 47, 55, 47, 51, 53, 45, 47, 36
#> $ gender <chr> "Male", "Female", "Male", "Male", "Male", "F…
#> $ domesticity <chr> "Domestic", "Domestic", "Domestic", "Domesti…
#> $ atsi <chr> "Not Australian First Nations", "Not Austral…
#> $ nesb <chr> "Not NESB", "Not NESB", "Not NESB", "Not NES…
#> $ atar_group <chr> "Not Recent School Leaver", "Not Recent Scho…
#> $ parental_education <chr> "University Level", "Unknown highest parenta…
#> $ ses <chr> "Low SES", "High SES", "High SES", "Low SES"…
#> $ disability_support_status <chr> "no Disability", "no Disability", "no Disabi…
#> $ remoteness <chr> "(2) Outer Regional Australia", "(0) Major C…
-
age
is the students age at the last time they were an active student (well, a best guess at that). Age demographic data should really be done as a snapshot from active students with data from OPA.
-
gender
- students gender. -
domesticity
indicates if the student is regestered as a domestic student. -
atsi
indicates the students First Nation status. -
nesb
indicates if the student comes from a Non English Speaking Background. -
atar_group
, where available, indicates which band of marks the student received for their ATAR in regards to university entrance. This is patchy due to a large proportion of Charles Sturt students coming into university from alternative pathways later in life. -
parental_education
indicates the highest level of education the students parents received. If it is Not University Level then the student is regarded as First in Family. -
ses
is the students Social Economic Status, based on their postcode. -
disability_support_status
indicates if the student has requested support for their disability. -
remoteness
indicates how close to an urban centre the student lived, prior to university. The levels are, from more urban to more remote: Major Cities, Inner Regional, Outer Regional, Remote, Very Remote.
student_course
This table includes data on the students enrolment in a particular
course (program of study). It is one row per student id
,
per course
, per admit_session
.
data.csu.retention::student_course |>
filter(admit_session == 202030) |>
slice(89:99) |>
mutate(
id = "student_id", course = "Course Name", course_code = "10000AB"
) |>
glimpse()
#> Rows: 11
#> Columns: 16
#> $ id <chr> "student_id", "student_id", "student_id", "stu…
#> $ course <chr> "Course Name", "Course Name", "Course Name", "…
#> $ course_code <chr> "10000AB", "10000AB", "10000AB", "10000AB", "1…
#> $ course_level <chr> "Graduate Certificate", "Graduate Certificate"…
#> $ course_faculty <chr> "FOBJBS", "FOBJBS", "FOBJBS", "FOBJBS", "FOBJB…
#> $ mode <chr> "D", "D", "D", "D", "D", "D", "D", "D", "D", "…
#> $ stud_fee_type <chr> "D", "D", "P", "D", "P", "P", "D", "P", "P", "…
#> $ stud_rate_code <chr> "HXCSS", "HXCSD", "PC44D", "HXCSD", "PC44D", "…
#> $ basis_of_admission <chr> "Work and life experience", "A higher educatio…
#> $ admit_session <dbl> 202030, 202030, 202030, 202030, 202030, 202030…
#> $ catalog_year <dbl> 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020…
#> $ last_registered_session <dbl> 202090, 202060, NA, 202090, 202030, 202060, 20…
#> $ latest_leave_session <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, 202160, 20…
#> $ gpa <dbl> 6.25, 4.75, 0.00, 5.75, 4.75, 4.00, 4.67, 4.75…
#> $ points_att <dbl> 32, 32, 0, 32, 32, 48, 32, 32, 0, 32, 16
#> $ points_comp <dbl> 32, 32, 0, 32, 32, 16, 32, 32, 0, 16, 8
-
course
is the name of the course. -
course_code
is the (usually 6 character) course code. Sometimes this changes for the same course socourse
name is usually better for matching. -
course_level
is the level of the course, the most common being Bachelor Pass level. -
course_faculty
is the (abbriviated) faculty that the course is run by. -
mode
is the delivery mode for the course, D for distance, I for internal and M for mixed. stud_fee_type
stud_rate_code
-
basis_of_admission
indicates how the student was admitted to the course. -
catalog_year
relates to the commencing (check this) cohort year that the student belongs to. -
last_registered_session
is the last session the student was registered in. Should be blank for continuing students. -
latest_leave_session
records the last session the student took a leave of absence (I think). -
gpa
is the students Grade Point Average in this course. -
points_att
is the number of credit points the student has attempted. -
points_comp
is the number of credit points the student has successfully completed.
sessions
The sessions table details important dates for the teaching session.
One row per session
.
data.csu.retention::sessions |>
glimpse()
#> Rows: 120
#> Columns: 8
#> $ session <dbl> 201815, 201830, 201845, 201860, 201875, 201890, 2…
#> $ teaching_period_name <chr> "Term 1", "Session 1", "Term 2", "Session 2", "Te…
#> $ tp <chr> "T1", "S1", "T2", "S2", "T3", "S3", "T1", "S1", "…
#> $ start_date <date> 2018-01-08, 2018-02-26, 2018-05-21, 2018-07-09, …
#> $ census_date <date> 2018-02-02, 2018-03-23, 2018-06-08, 2018-08-03, …
#> $ end_date <date> 2018-04-27, 2018-06-15, 2018-08-10, 2018-10-26, …
#> $ results_release_date <date> NA, 2018-07-06, NA, 2018-11-09, NA, 2019-03-01, …
#> $ length_in_weeks <chr> "16 weeks", "16 weeks", "12 weeks", "16 weeks", "…
-
teaching_period_name
is the common name for the teaching session. -
tp
is the shortened version of the common name. -
start_date
is the date that teaching commences in the session. -
census_date
is the date that students must unenrol from a subject before accruing a debt. -
end_date
is the last teaching date of the session. -
results_release_date
is the date when results are released, where available. -
length_in_weeks
is the length of the session.
subjects
The subjects
table contains the latest subject level
data. Whilst the offering table is tied to a session, this table only
grabs the latest data, one row per subject
.
data.csu.retention::subjects |>
glimpse()
#> Rows: 2,803
#> Columns: 5
#> $ subject <chr> "EMG201", "EMG208", "MGT579", "PHL402", "PHL502", "MGM505…
#> $ subject_name <chr> "Standing Operating Procedures: Training and Exercise Dev…
#> $ faculty <chr> "FoBJaBS", "FoBJaBS", "FoBJaBS", "FoAaE", "FoAaE", "FoBJa…
#> $ school <chr> "Australian Graduate School of Policing and Security", "A…
#> $ faculty_name <chr> "Faculty of Business, Justice and Behavioural Sciences", …
-
subject_name
is the name of the subject. -
faculty_name
is the full name of the faculty that teaches the subject. -
faculty
is the abbreviated name of the faculty that teaches the subject. -
school
is the school that teaches the subject.
triggers
This table attempts to store data on the triggers used to flag
‘at-risk’ students in several of the campaigns. It is one row per
offering
per campaign
per
session
.
data.csu.retention::triggers |>
mutate(offering = "ABC123_201930_W_D", subject = "ABC123") |>
glimpse()
#> Rows: 3,175
#> Columns: 11
#> $ offering <chr> "ABC123_201930_W_D", "ABC123_201930_W_D", "ABC123_2…
#> $ campaign <chr> "pre census", "pre census", "pre census", "pre cens…
#> $ trigger_date <dttm> NA, NA, 2019-03-29, NA, 2019-03-20, NA, 2019-03-22…
#> $ subject <chr> "ABC123", "ABC123", "ABC123", "ABC123", "ABC123", "…
#> $ session <dbl> 201930, 201930, 201930, 201930, 201930, 201930, 201…
#> $ trigger <chr> "course requirement", "non submission", "non submis…
#> $ trigger_detail <chr> "did not attend res school", "did not submit, no i2…
#> $ trigger_title <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
#> $ trigger_notes <chr> NA, NA, NA, NA, "EASTS", "Online", "EASTS", "EASTS"…
#> $ value <dbl> NA, NA, 0.20, NA, 0.30, 0.10, 0.10, 0.20, 0.15, 0.1…
#> $ imputed_from_flags <lgl> FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FA…
-
trigger_date
is the date that the trigger was instigated. -
trigger
is the type of condition used to see if a student was at risk. The most common types in the pre census campaign are low activity for inactive use of the LMS and non submission for the non submission of an early assessment item. -
trigger_detail
includes more detail on the trigger item, where available. -
trigger_title
is the title of the assessment if used. -
trigger_notes
outlines any additional information relevant to the trigger used. -
value
is the % weighting of the assessment item, if that is what was used as a trigger. -
imputed_from_flags
indicates if the trigger information was missing from the Retention Team’s campaign data, and was instead imputed from the flag list. This should not happen, but…we work with the data we have.
Activity data package tables
activity
This table is an aggregate of the trace data from the Learning
Management System. It aggregates per student id
, per day
(date
), per subject offering
site.
data.csu.activity::activity |>
slice(1:10) |>
mutate(id = "student_id", offering = "ABC123_202290_W_D") |>
glimpse()
#> Rows: 10
#> Columns: 6
#> $ id <chr> "student_id", "student_id", "student_id", "student_id", "stud…
#> $ offering <chr> "ABC123_202290_W_D", "ABC123_202290_W_D", "ABC123_202290_W_D"…
#> $ date <date> 2022-11-11, 2022-11-15, 2022-11-16, 2022-11-27, 2022-11-11, …
#> $ logins <dbl> 1, 1, 3, 1, 2, 1, 1, 1, 1, 1
#> $ clicks <dbl> 1, 3, 24, 6, 18, 2, 5, 3, 10, 4
#> $ views <dbl> 0, 2, 5, 0, 7, 2, 4, 1, 3, 4
-
date
is the day the data was aggregated. The day changes at midnight, so a student working late one night into the early hours of the next morning would spread that activity of two adjacent days. -
logins
measures how many times the student logged into the subject site.
-
clicks
indicates how many times the student clicked on anything in the subject site. -
views
counts how many different pages (items) the student viewed in the subject site.
SQL for activity aggregation
This query is run on the Blackboard DDA public schema table.
/* aa by-day */
select id as id,
replace(coalesce(child_course_id, course_id), 'S-', '') as offering,
date as date,
count(distinct session_id) as logins,
count(session_id) as clicks,
count(distinct content_pk1) as views
from (select u.student_id as id,
as course_id,
cm.course_id as child_course_id,
cmchild.course_id as session_id,
aa.session_id timestamp::date as date,
aa.as content_pk1
aa.content_pk1 from activity_accumulator aa
inner join users u on u.pk1 = aa.user_pk1
inner join course_main cm on aa.course_pk1 = cm.pk1
inner join course_users cu on cu.crsmain_pk1 = cm.pk1 and u.pk1 = cu.users_pk1
left join course_main cmchild on cmchild.pk1 = cu.child_crsmain_pk1
where u.student_id is not null
and cu.role = 'S'
and u.lastname not like '%PreviewUser'
and aa.event_type != 'SESSION_TIMEOUT'
and cm.course_id similar to 'S-%202290%'
and aa.timestamp::date >= make_date(2023, 2, 1) -- one month at a time
and aa.timestamp::date < make_date(2023, 3, 1)
as tab1
) group by id, offering, date;
Exit survey data package tables
exit
data.csu.exit::exit |>
slice(1:10) |>
mutate(id = "student_id", name = "name", other_info = "other info") |>
glimpse()
#> Rows: 10
#> Columns: 19
#> $ id <chr> "student_id", "student_id", "student_id", …
#> $ name <chr> "name", "name", "name", "name", "name", "n…
#> $ date <dttm> 2016-01-14 22:56:13, 2016-01-20 22:04:46,…
#> $ program <chr> "3409BC", "1416SW", "4311HN", "9001YW", "4…
#> $ faculty_program <chr> "Faculty of Education", "Faculty of Arts",…
#> $ study_mode <chr> "I was a distance education student", "I w…
#> $ future_study <chr> "I plan to return to the same course in th…
#> $ contact_request <chr> "Yes", NA, NA, NA, NA, "No", NA, "No", NA,…
#> $ discuss_further <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ discuss_csu <chr> "No", "No", "No", "No", "No", "No", "No", …
#> $ who_discuss_csu <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA
#> $ discuss_outside <chr> "No", "Yes", "No", "Yes", "No", "Yes", "No…
#> $ who_discuss_outside <chr> NA, "Family (including partner, parents, e…
#> $ withdraw_prior_commencement <chr> "No", "No", "Yes", "Yes", "Yes", "Yes", "Y…
#> $ factors_not_to_commence <chr> NA, NA, "I was offered a position at anoth…
#> $ factors_leave <chr> NA, "Time - My family life doesn't leave m…
#> $ factors_withdraw <chr> "Time - Time required to attend Residentia…
#> $ factors <chr> "Time - Time required to attend Residentia…
#> $ other_info <chr> "other info", "other info", "other info", …
-
date
is the day the form was submitted. -
program
is the course code (yes program of study and course are the same at Charles Sturt.) -
faculty_program
is the faculty that runs the course. -
study_mode
is the self reported mode of study. -
contact_request
- did the student want to be contacted regarding their decision? -
discuss_further
- only available in more recent surveys, answers the question Are you willing to be contacted to discuss your CSU experience? -
discuss_csu
- did the student discuss their decision with anyone at CSU? -
who_discuss_csu
- who did they discuss their decision with at CSU? -
discuss_outside
- did the student discuss their decision with anyone outside of CSU? -
who_discuss_outside
- who else did they discuss their decision with? -
withdraw_prior_commencement
- did the student withdraw prior to commencing study? -
factors_not_to_commence
- reasons for not commencing, selected from a drop down list. -
factors_leave
- reasons for leaving, selected from a drop down list. -
factors_withdraw
- reasons for withdrawing, selected from a drop down list. -
factors
- all of the factors above, collected together. -
other_info
contains free text comments.