One Year Later: Using R to Automate the Tedious Stuff
Robert Marsh, Ph.D., P.E.
North Central Michigan College
MI/AIR November 8, 2018
To talk about
• A little bit about R (not a programming demonstration)
• Reasons to use programming
• IR/Assessment work, before and after R
  • Class cancellation break even
  • Assessment and LMS
  • Perkins Core Indicators
  • Enrollment patterns
• Discussion
R programming language
• Freely available
• Supported by the R Foundation for Statistical Computing
• Ranked 18th most popular programming language!
• Wide variety of statistical functions and libraries (which I haven’t really used)
• Has relatively low overhead on a computer
• Flexible
• Not too hard to learn basics
• Completely REPEATABLE (like all languages)
• My interest in R’s statistics led to attending last year’s and this year’s workshop
• Took additional Coursera courses (Data Science sequence from Johns Hopkins)
• Hacked my way into some usefulness
  • (Hint: stackoverflow.com + Google)
https://en.wikipedia.org/wiki/R_(programming_language)
Class cancellation break-even analysis
• Looks at each section’s direct costs and revenues
  • Tuition, adjusted by in/out district and/or discounting
• Looks at FT or PT faculty expenses
  • Direct per-credit overload or adjunct rate
    • If above load = overload rate
    • If below load = adjunct rate (bumped)
  • Adjusts for type of retirement contribution and FICA
  • Adjusts for staffing company premium
  • Adjusts for piggy-back courses
• Looked at the impact of cancelling each section; positive or negative?
• Previously done within Excel with heavy reliance on pivot tables and look-ups
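The per-section calculation described above can be sketched in base R roughly as follows. This is only an illustration: the column names, the two per-credit rates, and the benefit multipliers are made-up assumptions, not the college's figures, and the staffing company premium and piggy-back adjustments are left out.

```r
# Hypothetical inputs; all names and numbers below are illustrative only.
sections <- data.frame(
  section       = c("AH 101 A", "AH 102 A", "AH 103 A"),
  tuition       = c(7800, 2400, 1100),   # tuition after discounts
  credit_hrs    = c(4, 3, 3),
  above_load    = c(TRUE, FALSE, TRUE),  # is the instructor above load?
  benefits_mult = c(1.30, 1.08, 1.30),   # assumed retirement + FICA multiplier
  stringsAsFactors = FALSE
)

overload_rate <- 700  # assumed per-credit overload rate
adjunct_rate  <- 750  # assumed per-credit adjunct rate

# Above load: overload rate; below load: adjunct rate.
rate <- ifelse(sections$above_load, overload_rate, adjunct_rate)

# Direct instructional expense, adjusted for benefits.
sections$expense <- sections$credit_hrs * rate * sections$benefits_mult

# Net impact of running the section; negative means it costs more than it earns.
sections$net <- sections$tuition - sections$expense

print(sections[, c("section", "tuition", "expense", "net")])
```

Because the rates and multipliers live in ordinary variables and columns, rerunning the script after enrollment changes needs no formula or range adjustments.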
Excel action
Section   Tuition after discounts  TUITION_HRS  Count  INSTRUCTR_ID  FacType  Salary  Net
AH 107 A  $7,824                   4            12     69600         P        $3,426  $4,398
AH 116 A  $12,642                  2            43     68254         F        $2,124  $10,518
AH 130 A  $2,370                   3            5      212           F        $2,913  -$543
AH 130 B  $3,387                   3            8      68321         P        $2,569  $818
AH 130 C  $1,128                   3            2      212           F        $2,913  -$1,785

Pivot table from student enrollment listing
Look up from faculty table
Look up and calculate from several sheets and pivot tables

Section    Tuition  Tuition hrs  Count  Instructor  FacType  Salary   Net      Adj Cost total  Adj net     Cum
NUR 110 A  96320    16           41     68254       F        $15,832  $80,488  $96,320.00      $80,487.78  $80,487.78
NUR 210 A  49770    9            37     107870      F        $1,502   $48,268  $49,770.00      $48,267.74  $128,755.52
NUR 230 A  49770    9            37     82500       F        $9,557   $40,213  $49,770.00      $40,212.74  $168,968.26
BIO 235 F  28575    5            32     108539      P        $4,282   $24,293  $28,575.00      $22,492.80  $191,461.06
BIO 235 C  25955    5            32     105257      P        $4,282   $21,673  $25,955.00      $21,672.80  $213,133.86
Copied and pasted into new sheet
• As the number of students enrolled or the number of sections changed, so did the rows in the pivot and other tables. Had to adjust formulas and extend columns.
Using R- some key queries
ID_NUM Section Fac ID Fac type Hrs Tuition code Discount code Tuition Dual Major Code
10580 AH 116 A 68254 F 2 OD ADN Accepted Nursing 257
11171 AH 116 A 68254 F 2 OD ADN Accepted Nursing 257
11750 AH 116 A 68254 F 2 OD ADN Accepted Nursing 257
19103 AH 116 A 68254 F 2 OD ADN Accepted Nursing 257
46613 AH 116 A 68254 F 2 ID ADN Accepted Nursing 257
52724 AH 116 A 68254 F 2 OD ADN Accepted Nursing 257
60071 AH 116 A 68254 F 2 OD ADN Accepted Nursing 257
109548 AH 154 A 35441 E 7 OD T Phlebotomy 178
112177 AH 154 A 35441 E 7 OD Nursing 256
113211 AH 154 A 35441 E 7 ID DI C General Studies 600
Section      Count  Status  Fac ID  Fac type  Hrs  Fac sal
AH 116 A     44     O       68254   F         2    O
AH 130 A     4      O       68321   E         3    M
AH 154 A     8      O       35441   E         7    B
AH 180 A     7      O       95965   P         3    M
AH 240 OL A  3      O       110541  F         4    O
AH 285 A     4      O       58491   E         4    A
ANP 110 A    16     O       12082   F         3    O
Fac ID  Multiplier
114     1.3265
165     1.3265
212     1.3265
230     1.3265
242     1.3265
305     1.3265
345     1.3265
1381    1.3265
4526    1.3265
11749   1.3265
12082   1.1865
12083   1.1865
MPSERS
ORP
Code  Per hr  X/hr
OSNA  257     38
OST   0       0
DODO  160     38
ODNA  198     38
IDNA  119     33
IDDI  119     33
Student enrollment
FT faculty adjustments
Tuition rates
Sections
Imported as .csv files
Some adjuncts in MPSERS, some through staffing company
Resulting file (after R)
Section   Tuition  Expense  Net       Emp  Senior  MITW  Total  ID  OD  Programs
AH 116 A  13790    2254.35  11535.65  0    0       0     44     23  21  ADN Accepted Nursing(44);
AH 130 A  1902     2757.66  -855.66   1    0       0     4      2   2   General Studies(1); Medical Assistant-Accepted(1); Medical Billing and Coding(1); Phlebotomy(1);
AH 154 A  7490     6062.84  1427.16   0    1       0     8      4   4   Certified Nurse Aide(1); General Studies(3); Nursing(3); Phlebotomy(1);
AH 180 A  3684     2757.66  926.34    0    0       0     7      2   5   Medical Assistant(2); Medical Billing and Coding(2); Phlebotomy-Accepted(3);
Discounts In/out district
• After decisions are made, this file with cancelled annotations is run through the program
• Results in “savings” graph
Before, after cancellation meeting
[Figure: F18 class cancellation, cum + adjusted. Series: Cum, Adj Cum. Annotation: ~ $74,000]
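A rough sketch of how the two cumulative curves and the "savings" figure could be produced in base R. The data are invented, and the rule that a cancelled section simply contributes zero to the adjusted net is an assumption; the actual program may treat cancelled sections differently.

```r
# Illustrative section-level results; 'cancelled' marks the meeting's decisions.
results <- data.frame(
  section   = c("S1", "S2", "S3", "S4", "S5"),
  net       = c(11500, 1400, 900, -850, -1800),
  cancelled = c(FALSE, FALSE, FALSE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Order sections from highest to lowest net, as in a cumulative graph.
results <- results[order(-results$net), ]

# Cumulative net if every section runs.
results$cum <- cumsum(results$net)

# Adjusted net: a cancelled section contributes nothing (assumption).
results$adj_net <- ifelse(results$cancelled, 0, results$net)
results$adj_cum <- cumsum(results$adj_net)

# "Savings" is the gap between the two curves at the last section.
savings <- tail(results$adj_cum, 1) - tail(results$cum, 1)
print(results)
print(savings)
```

Plotting `results$cum` and `results$adj_cum` against the row index (for example with `plot(..., type = "l")` and `lines()`) would give a before/after chart of the same general shape.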
Using Brightspace LMS gradebook for assessment
• Per section, faculty identify
  • Course outcome(s) to evaluate
  • BS gradebook item to use as measurement (quiz, paper, exam, presentation, etc.)
• Establish cut scores for assessment levels. Ex:
  • 0 – 60% = Beginning
  • 61 – 85% = Developing
  • 86 – 100% = Advanced
  • (or by points)
• Create a spreadsheet
Yr    Trm  Section      Outcome_num  Assign_num  Denom  Beg  Dev  Adv  Artifact
2017  50   PHL 102 OLA  16123        15909       100    70   80   90   Essay, logical thinking
2017  50   B 161 B      14563        20553       50     1    30   45   Business paper
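As a small illustration of the percent cut scores in the example above, base R's `cut()` can assign a level to each score. The function name `assign_level` and the sample scores are hypothetical; scores falling between the stated bands (for example 60.5%) go to the higher band here, which is a choice made for this sketch.

```r
# Percent cut scores from the example:
# 0-60% Beginning, 61-85% Developing, 86-100% Advanced.
assign_level <- function(points, denom) {
  pct <- 100 * points / denom
  cut(pct,
      breaks = c(-Inf, 60, 85, Inf),   # intervals: (-Inf,60], (60,85], (85,Inf)
      labels = c("Beginning", "Developing", "Advanced"))
}

# Illustrative scores out of 30 points.
scores <- c(28, 26, 30, 24, 21, 15)
levels_out <- assign_level(scores, denom = 30)

print(data.frame(points = scores, level = levels_out))
print(table(levels_out))
```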
Brightspace gradebook- download as .csv
• Through “Data Hub” option within Brightspace
• Approximately 120,000 rows x 35 columns (~40 MB)
  (Not huge, but very cumbersome with Excel)
• Key columns used
Stu ID  Course ID  Course Offering Name                               Grade Item Id  Grade Item Name   Points Numerator  Points Denominator
92592   9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  28                30
101856  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30
104220  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  30                30
105288  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  30                30
105330  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  28                30
105930  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  24                30
108575  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  28                30
108615  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  21                30
109662  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30
104282  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  30                30
104365  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30
107966  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30

Yr    Trm  Section      Outcome_num  Assign_num  Denom  Beg  Dev  Adv  Artifact
2017  50   PHL 102 OLA  16123        15909       30     1    21   27   Essay, logical thinking
2017  50   B 161 B      14563        20553       50     1    30   45   Business paper
Main tasks for R
• Compare sections assessed listing to BS gradebook
• Assign outcome number & level of achievement to each student
• 120,000 x 100 = 12 million loops
• Initially took 15-20 minutes
• Was able to split gradebook by Grade item ID
• Reduced time to 1 – 2 minutes
• Put into usable form (for Jenzabar upload)
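The matching step can also be written without an explicit row-by-row loop. The sketch below uses `merge()` on the grade item ID, which is a swapped-in technique and not necessarily what the talk's script does (the talk split the gradebook by Grade item ID). Column names are simplified stand-ins, and reading Beg/Dev/Adv as the minimum points for each level is an assumption.

```r
# Tiny stand-ins for the two inputs; column names are simplified assumptions.
gradebook <- data.frame(
  stu_id        = c(92592, 101856, 108615, 200001),
  grade_item_id = c(15909, 15909, 15909, 99999),  # 99999 is not being assessed
  points        = c(28, 26, 21, 10)
)
plan <- data.frame(
  section     = "PHL 102 OLA",
  outcome_num = 16123,
  assign_num  = 15909,
  beg = 1, dev = 21, adv = 27,   # assumed: minimum points for each level
  stringsAsFactors = FALSE
)

# Inner join keeps only gradebook rows whose grade item appears in the plan.
assessed <- merge(gradebook, plan,
                  by.x = "grade_item_id", by.y = "assign_num")

# Vectorized rating: 1 = Beginning, 2 = Developing, 3 = Advanced.
assessed$rating <- with(assessed,
                        (points >= beg) + (points >= dev) + (points >= adv))

out <- assessed[order(assessed$stu_id), c("stu_id", "outcome_num", "rating")]
print(out)
```

A data frame like `out` could then be written with `write.csv()` in whatever column layout the SIS upload expects.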
Resulting tables for SIS
ID_Num  Year  Term  Out_Num  Seq_Num  Pre_Post  Dimen  Rating  Rate_scale  CrsComp1  CrsComp2  CrsComp3  CrscComp4
109934  2017  50    16123                              3       6           PHL       102       OL        A
108638  2017  50    16123                              1       6           PHL       102       OL        A
106998  2017  50    16123                              3       6           PHL       102       OL        A
103132  2017  50    16123                              3       6           PHL       102       OL        A
112138  2017  50    16123                              2       6           PHL       102       OL        A

Year  Term  CrsComp1  CrsComp2  CrsComp3  CrsComp4  CrsComp5  CrsComp6  Assign_num  Out_Num  Artifact
2017  50    PHL       102       OL        A                             15909       16123    Logical Thinking
2017  50    B         291       OL        A                             29274       21013    Writing assignment
2017  50    B         291       OL        A                             29274       21021    Writing assignment
2017  50    PSY       255       OL        A                             27670       15559    Developmental Assessment paper
Student records
Section records
via .csv
To Jenzabar
Perkins Core Indicator and various other State reports (michigancc.net)
• Ideal application for programming
  • Same year to year (so far)
  • Essentially counting categories
  • All have same requirements (race/ethnicity, gender, special pops)
  • All share CIP codes, degree levels and non-trad-for
  • Almost all have csv upload capability to michigancc.net
Excel based
• Always started with listing of students
• Ran through database (Helix for Mac)
• Posted to .csv file aligned with michigancc.net upload file format
• Worked OK. Difficult to make adjustments, hard to debug.
ID_NUM GENDER IPEDS_VALUE_DESC NonTrad_for Deg_date PCode DegLvl CIP_Code Ind_Dis Ec_Dis Non_Trad Single_Par Disp_HM LEP
65943 F Race and Ethnicity unknown M 8/25/17 181 2 51.0710-181 Y Y
85137 F Race and Ethnicity unknown M 12/15/17 280 3 13.1210-280 Y
87477 F Race and Ethnicity unknown M 5/4/18 181 2 51.0710-181 Y Y
104099 M Race and Ethnicity unknown F 5/4/18 113 1 50.0409-113 Y
47531 F American Indian or Alaska Native M 7/28/17 181 2 51.0710-181 Y Y Y
Field                     Description
CIPCODE                   CIP Code
DegLvl                    Degree Level Identifier
PROGNAME                  Name of Program
ALIENM                    Non-Resident Alien Men
ALIENF                    Non-Resident Alien Women
HISPANICM                 Hispanic/Latino Men
HISPANICF                 Hispanic/Latino Women
NATIVEM                   American Indian/Alaskan Native Men
NATIVEF                   American Indian/Alaskan Native Women
ASIANM                    Asian American Men
ASIANF                    Asian American Women
BLACKM                    Black, Non-Hispanic Men
BLACKF                    Black, Non-Hispanic Women
HAWAIIANPACIFICISLANDERM  Native Hawaiian/Other Pacific Islander Men
HAWAIIANPACIFICISLANDERF  Native Hawaiian/Other Pacific Islander Women
WHITEM                    White, Non-Hispanic Men
CIPCODE  DegLvl  PROGNAME             ALIENM  ALIENF  HISPANICM  HISPANICF  NATIVEM  NATIVEF  ASIANM  ASIANF  BLACKM  BLACKF
52.0201  3       Business Management  0       1       3          5          10       12       1       0       2       3

PWDM  PWDF  ECDISM  ECDISF  NONTRADM  NONTRADF
0     1     2       4       0         20
• Upload format
Main steps with R
• SQL query, generate csv of student listing, read into R
• Read special pops file, CIP listing, michigancc.net listing of labels
• Deduplicate students, if necessary
• Create Race-Gender and SpPops-Gender tags
  • Wh-M = White male
  • Hi-F = Hispanic female
  • ID-F = female with individual disability
  • ED-M = Economically disadvantaged male (could be multiple per student)
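The deduplication and tagging steps might look something like this in base R. The data frames, column names, and the prefix lookup tables (`race_prefix`, `flag_prefix`) are hypothetical stand-ins for the real student listing and special pops file; only the tag style (Wh-M, Hi-F, ID-F, ED-M) comes from the slide.

```r
# Illustrative student listing and special-populations file.
students <- data.frame(
  id     = c(1, 2, 3),
  gender = c("F", "M", "F"),
  race   = c("White", "Hispanic/Latino", "White"),
  stringsAsFactors = FALSE
)
sp_pops <- data.frame(
  id      = c(1, 3),
  ind_dis = c("Y", ""),
  ec_dis  = c("Y", "y"),
  stringsAsFactors = FALSE
)

# Deduplicate students, if necessary.
students <- students[!duplicated(students$id), ]

# Assumed lookup from race/ethnicity description to a two-letter prefix.
race_prefix <- c("White" = "Wh", "Hispanic/Latino" = "Hi")
race_tags <- data.frame(
  id  = students$id,
  tag = paste(race_prefix[students$race], students$gender, sep = "-"),
  stringsAsFactors = FALSE
)

# Special-population tags: one row per flagged category, so a student
# can carry several tags. toupper() tolerates lower-case "y" flags.
sp <- merge(students[, c("id", "gender")], sp_pops, by = "id")
flag_prefix <- c(ind_dis = "ID", ec_dis = "ED")
sp_tags <- do.call(rbind, lapply(names(flag_prefix), function(col) {
  hit <- toupper(sp[[col]]) == "Y"
  data.frame(id  = sp$id[hit],
             tag = sprintf("%s-%s", flag_prefix[[col]], sp$gender[hit]),
             stringsAsFactors = FALSE)
}))

tags <- rbind(race_tags, sp_tags)
print(tags)
```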
ID_NUM  GENDER  IPEDS_VALUE_DESC                  NonTrad_for  Deg_date  PCode  DegLvl  CIP_Code
65943   F       Race and Ethnicity unknown        M            8/25/17   181    2       51.0710-181
85137   F       Race and Ethnicity unknown        M            12/15/17  280    3       13.1210-280
87477   F       Race and Ethnicity unknown        M            5/4/18    181    2       51.0710-181
104099  M       Race and Ethnicity unknown        F            5/4/18    113    1       50.0409-113
47531   F       American Indian or Alaska Native  M            7/28/17   181    2       51.0710-181

ID_Num Ind_Dis Ec_Dis Non_Trad Single_Par Disp_HM LEP
8699 Y Y
10200 Y Y
10225 y y y

CIP_CODE     Program                       Deg_Lvl  NT_for
51.3801-295  LPN to RN Transition Program  3
51.3902-197  Certified Nurse Aide          1
52.0201-149  Management                    2        F
52.0201-150  Small Business Management     2        F
52.0201-220  Business Management           3        F

Field
CIPCODE
DegLvl
PROGNAME
ALIENM
ALIENF
HISPANICM
Perkins via R
• Create upload file
• Cumulatively add by column number (place)
Code  Place/Col
No-M  4
No-F  5
Hi-M  6
Hi-F  7
Ra-M  20
Ra-F  21
ID-M  22
ID-F  23
ED-M  24
CIPCODE  DegLvl  PROGNAME                      ALIENM  ALIENF  HISPANICM  HISPANICF  NATIVEM  NATIVEF  ASIANM  ASIANF  BLACKM  BLACKF
11.0301  3       Computer Information Systems  0       0       0          0          0        0        0       0       0       0
11.0901  3       Computer Networking           0       0       0          0          0        1        0       0       0       0
13.1210  3       Early Childhood Education     0       0       0          0          0        1        0       0       0       0
15.1301  2       Computer Aided Design         0       0       0          0          0        0        0       0       0       0
• Download as csv (write.csv), upload to michigancc.net
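"Cumulatively add by column number" can be sketched as a lookup from tag to column position followed by an increment. In this illustration the `tags` rows are invented, the upload frame is cut down to its first seven columns, and the output file name in the last comment is hypothetical; the column positions 4 to 7 follow the Place/Col listing on the slide.

```r
# Tags per student together with the student's program (illustrative values).
tags <- data.frame(
  cipcode = c("11.0901", "13.1210", "13.1210", "13.1210"),
  tag     = c("Hi-F", "Hi-M", "Hi-F", "Hi-F"),
  stringsAsFactors = FALSE
)

# Code-to-column lookup, using Place/Col values shown on the slide.
place <- c("No-M" = 4, "No-F" = 5, "Hi-M" = 6, "Hi-F" = 7)

# One row per program; columns 1-3 identify the program, 4+ hold counts.
upload <- data.frame(
  CIPCODE  = c("11.0901", "13.1210"),
  DegLvl   = c(3, 3),
  PROGNAME = c("Computer Networking", "Early Childhood Education"),
  ALIENM = 0, ALIENF = 0, HISPANICM = 0, HISPANICF = 0,
  stringsAsFactors = FALSE
)

# Cumulatively add 1 to the cell at (program row, tag's column number).
for (i in seq_len(nrow(tags))) {
  r <- match(tags$cipcode[i], upload$CIPCODE)
  k <- place[[tags$tag[i]]]
  upload[r, k] <- upload[r, k] + 1
}

print(upload)

# write.csv(upload, "perkins_upload.csv", row.names = FALSE)  # hypothetical file name
```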
Enrollment patterns- compare year-to-year
• Important to track progress during enrollment and application periods
  • Total credit (tuition) hours
  • Demographic splits (gender, age, dual status)
  • Unduplicated students
• Track application pipeline
• Wish to create a more up-to-date dashboard of enrollment
Enrollment patterns- compare year-to-year: Excel
ID_NUM  YR_TRM  CREDIT HRS  TUITION_HRS  Section       Course  REG_DTE        DROP_DTE      GENDER  BIRTH_DTE  COUNTY  DUAL  LOC    BLDG_CDE
54850   201830  1           1            OAS 103 OL A  OAS103  8/4/18 12:44   8/4/18 12:45  F       2/28/85    EM            I-NET  I-NET
54850   201830  1           1            OAS 103 OL A  OAS103  8/24/18 13:05                F       2/28/85    EM            I-NET  I-NET
91616   201830  4           5            BIO 133 A     BIO133  7/23/18 16:10                M       11/24/92   CR            PET    HESC
91616   201830  6           10           EMS 120 A     EMS120  8/3/18 9:49                  M       11/24/92   CR            PET    HESC
77461   201830  4           5            ESC 101 A     ESC101  4/11/18 10:50                F       8/1/90     EM            PET    HESC
77461   201830  3           3            ECE 225 A     ECE225  4/11/18 11:03  8/22/18 7:08  F       8/1/90     EM            PET    ECE
Date     Added  Dropped  Net     Cum
4/2/18   27     0        27      27
4/3/18   6      1        5       28
4/4/18   1976   23       1953    1958
4/5/18   850    231.5    618.5   2603.5
4/6/18   550    11.5     538.5   3142
4/7/18   120    36       84      3226
4/8/18   250    160.5    89.5    3315.5
4/9/18   520    171      349     3664.5
4/10/18  355    179      176     3840.5
SUMIFS(TUITION_HRS, Date = REG_DTE)
SUMIFS(TUITION_HRS, Date = DROP_DTE)
Create list with no gaps
Query for enrollment to date, with demographics, location
• Change date to 00/00/00 format
• Split by date, cum added and dropped by date
• Draw cum graph
• Do further splits by gender, age, location, etc.
• Adds, drops, net, cums done inside program
• Also produce a cum of unduplicated students
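A minimal base R sketch of these steps, with invented records and simplified column names. The per-day sums play the role of the two SUMIFS formulas from the Excel version, and the unduplicated count here treats a student as enrolled from their first registration date regardless of later drops, which is an assumption.

```r
# Illustrative registration records; dates use the m/d/yy style of the query output.
enroll <- data.frame(
  id_num      = c(1, 1, 2, 3, 3),
  tuition_hrs = c(3, 4, 5, 3, 1),
  reg_dte     = c("4/2/18", "4/2/18", "4/3/18", "4/5/18", "4/5/18"),
  drop_dte    = c(NA, "4/5/18", NA, NA, NA),
  stringsAsFactors = FALSE
)
enroll$reg  <- as.Date(enroll$reg_dte,  format = "%m/%d/%y")
enroll$drop <- as.Date(enroll$drop_dte, format = "%m/%d/%y")

# Calendar with no gaps, so days without activity still get a row.
days <- seq(min(enroll$reg),
            max(c(enroll$reg, enroll$drop), na.rm = TRUE),
            by = "day")

# Sum tuition hours whose date falls on each calendar day (SUMIFS equivalent).
sum_on <- function(hrs, dates) {
  vapply(seq_along(days),
         function(i) sum(hrs[which(dates == days[i])]),
         numeric(1))
}

daily <- data.frame(date    = days,
                    added   = sum_on(enroll$tuition_hrs, enroll$reg),
                    dropped = sum_on(enroll$tuition_hrs, enroll$drop))
daily$net <- daily$added - daily$dropped
daily$cum <- cumsum(daily$net)

# Cumulative unduplicated students: count each ID from its first registration date.
first_reg <- tapply(as.numeric(enroll$reg), enroll$id_num, min)
daily$cum_students <- sapply(as.numeric(days), function(d) sum(first_reg <= d))

print(daily)
```

Running the same function on a subset (for example, only rows for one gender) gives the split curves.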
DATE     Cum_credits
4/2/18   27
4/3/18   28
4/4/18   1958
4/5/18   2603.5
4/6/18   3142
4/7/18   3226
4/8/18   3315.5
4/9/18   3664.5
4/10/18  3840.5

DATE     Cum_M   Cum_F
4/2/18   9       18
4/3/18   9       19
4/4/18   721     1237
4/5/18   895     1708.5
4/6/18   1018    2124
4/7/18   1038    2188
4/8/18   1066.5  2249
4/9/18   1215.5  2449
4/10/18  1296.5  2544
Cum graphs developed
[Figure: Cum_credits by date, 3/21/18 to 10/24/18 (weekly ticks); y-axis 0 to 20,000]
[Figure: Cum credits by gender. Series: M_Cum, F_Cum. Dates 3/21/18 to 10/24/18 (weekly ticks); y-axis 0 to 20,000]