Page 1: One Year Later: Using R to Automate the Tedious Stuff

One Year Later: Using R to Automate the Tedious Stuff

Robert Marsh, Ph.D., P.E.
North Central Michigan College

MI/AIR November 8, 2018

Page 2: One Year Later: Using R to Automate the Tedious Stuff

To talk about

• A little bit about R (not a programming demonstration)
• Reasons to use programming
• IR/Assessment work, before and after R
  • Class cancellation break even
  • Assessment and LMS
  • Perkins Core Indicators
  • Enrollment patterns

• Discussion

Page 3: One Year Later: Using R to Automate the Tedious Stuff

R programming language

• Freely available

• Supported by the R Foundation for Statistical Computing

• Ranked 18th most popular programming language!

• Wide variety of statistical functions and libraries (which I haven’t really used)

• Has relatively low overhead on a computer

• Flexible

• Not too hard to learn basics

• Completely REPEATABLE (like all languages)

• My interest in R’s statistics led to attending last year’s and this year’s workshop

• Took additional Coursera courses (Data Science sequence from Johns Hopkins)

• Hacked my way into some usefulness
  • (Hint: stackoverflow.com + Google)

https://en.wikipedia.org/wiki/R_(programming_language)

Page 4: One Year Later: Using R to Automate the Tedious Stuff

Class cancellation break-even analysis

• Looks at each section’s direct costs and revenues
  • Tuition, adjusted by in/out district and/or discounting
  • Looks at FT or PT faculty expenses
  • Direct per-credit overload or adjunct rate
    • If above load = overload rate
    • If below load = adjunct rate (bumped)
  • Adjusts for type of retirement contribution and FICA
  • Adjusts for staffing company premium
  • Adjusts for piggy-back courses

• Looked at the impact of cancelling each section; positive or negative?

• Previously done within Excel with heavy reliance on pivot tables and look-ups
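A minimal sketch of the per-section arithmetic described above, in base R. The formula (credit hours x per-credit rate x benefit multiplier), the function name and the argument names are assumptions, not the presenter's code. The 950 overload rate is inferred from the AH 116 A figures shown later in the deck; the 750 adjunct rate is a made-up placeholder.

```r
# Net effect of running one section: tuition revenue minus instructor expense.
# ASSUMPTION: expense = credit hours x per-credit rate x benefit multiplier.
# The default rates are illustrative placeholders, not college figures.
section_net <- function(tuition, credit_hrs, above_load,
                        overload_rate = 950, adjunct_rate = 750,
                        benefit_multiplier = 1.3265) {
  # Above load pays the overload rate; below load pays the adjunct rate
  rate <- ifelse(above_load, overload_rate, adjunct_rate)
  expense <- credit_hrs * rate * benefit_multiplier
  tuition - expense
}

# AH 116 A: $13,790 tuition, 2 credit hours, overload, 1.1865 multiplier
ah116 <- section_net(13790, 2, above_load = TRUE, benefit_multiplier = 1.1865)
ah116
```

A negative result means the section costs more to run than it brings in, which is the "positive or negative?" question asked of each section.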

Page 5: One Year Later: Using R to Automate the Tedious Stuff

Excel action

Section   Tuition after discounts  TUITION_HRS  Count  INSTRUCTR_ID  FacType  Salary  Net
AH 107 A  $7,824                   4            12     69600         P        $3,426  $4,398
AH 116 A  $12,642                  2            43     68254         F        $2,124  $10,518
AH 130 A  $2,370                   3            5      212           F        $2,913  -$543
AH 130 B  $3,387                   3            8      68321         P        $2,569  $818
AH 130 C  $1,128                   3            2      212           F        $2,913  -$1,785

Pivot table from student enrollment listing
Look up from faculty table
Look up and calculate from several sheets and pivot tables

Section    Tuition  Tuition hrs  Count  Instructor  FacType  Salary   Net      Adj Cost total Adj net Cum
NUR 110 A  96320    16           41     68254       F        $15,832  $80,488  $96,320.00  $80,487.78  $80,487.78
NUR 210 A  49770    9            37     107870      F        $1,502   $48,268  $49,770.00  $48,267.74  $128,755.52
NUR 230 A  49770    9            37     82500       F        $9,557   $40,213  $49,770.00  $40,212.74  $168,968.26
BIO 235 F  28575    5            32     108539      P        $4,282   $24,293  $28,575.00  $22,492.80  $191,461.06
BIO 235 C  25955    5            32     105257      P        $4,282   $21,673  $25,955.00  $21,672.80  $213,133.86

Copied and pasted into new sheet

• As the number of students enrolled or the number of sections changed, so did the rows in the pivot and other tables. Had to adjust formulas and extend columns.

Page 6: One Year Later: Using R to Automate the Tedious Stuff

Using R: some key queries

Student enrollment
ID_NUM  Section  Fac ID  Fac type  Hrs  Tuition code  Discount code  Tuition  Dual  Major  Code
10580  AH 116 A  68254  F  2  OD  ADN Accepted Nursing  257
11171  AH 116 A  68254  F  2  OD  ADN Accepted Nursing  257
11750  AH 116 A  68254  F  2  OD  ADN Accepted Nursing  257
19103  AH 116 A  68254  F  2  OD  ADN Accepted Nursing  257
46613  AH 116 A  68254  F  2  ID  ADN Accepted Nursing  257
52724  AH 116 A  68254  F  2  OD  ADN Accepted Nursing  257
60071  AH 116 A  68254  F  2  OD  ADN Accepted Nursing  257
109548  AH 154 A  35441  E  7  OD  T  Phlebotomy  178
112177  AH 154 A  35441  E  7  OD  Nursing  256
113211  AH 154 A  35441  E  7  ID  DI  C  General Studies  600

Sections
Section      Count  Status  Fac ID  Fac type  Hrs  Fac sal
AH 116 A     44     O       68254   F         2    O
AH 130 A     4      O       68321   E         3    M
AH 154 A     8      O       35441   E         7    B
AH 180 A     7      O       95965   P         3    M
AH 240 OL A  3      O       110541  F         4    O
AH 285 A     4      O       58491   E         4    A
ANP 110 A    16     O       12082   F         3    O

FT faculty adjustments
Fac ID  Multiplier
114     1.3265
165     1.3265
212     1.3265
230     1.3265
242     1.3265
305     1.3265
345     1.3265
1381    1.3265
4526    1.3265
11749   1.3265
12082   1.1865
12083   1.1865

MPSERS
ORP

Tuition rates
Code  Per hr  X/hr
OSNA  257     38
OST   0       0
DODO  160     38
ODNA  198     38
IDNA  119     33
IDDI  119     33

Imported as .csv files
Some adjuncts in MPSERS, some through staffing company
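The look-ups between these tables can be sketched with base R's merge() and aggregate(). This is a hedged illustration, not the presenter's script: the column names are invented, and the combined rate code (tuition code plus discount code, with "NA" for no discount) is an assumption based on the codes in the tuition rates table. Only the Per hr rate is applied here.

```r
# Toy enrollment rows shaped like the student enrollment table above.
# ASSUMPTION: rate_code = tuition code + discount code ("NA" = no discount).
enroll <- data.frame(
  id        = c(10580, 11171, 46613, 112177),
  section   = c("AH 116 A", "AH 116 A", "AH 116 A", "AH 154 A"),
  hrs       = c(2, 2, 2, 7),
  rate_code = c("ODNA", "ODNA", "IDNA", "ODNA"),
  stringsAsFactors = FALSE
)

# Per-hour rates copied from the tuition rates table
rates <- data.frame(
  rate_code = c("ODNA", "IDNA", "IDDI"),
  per_hr    = c(198, 119, 119),
  stringsAsFactors = FALSE
)

# Look up each student's rate (replaces the Excel look-up)
enroll <- merge(enroll, rates, by = "rate_code", all.x = TRUE)
enroll$tuition <- enroll$hrs * enroll$per_hr
enroll$count <- 1

# Total tuition and head count per section (replaces the pivot table)
by_section <- aggregate(cbind(tuition, count) ~ section, data = enroll, FUN = sum)
by_section
```

Because the join and the totals are recomputed from the raw .csv files on every run, adding students or sections does not require adjusting any formulas.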

Page 7: One Year Later: Using R to Automate the Tedious Stuff

Resulting file (after R)

Section   Tuition  Expense  Net       Emp  Senior  MITW  Total  ID  OD  Programs
AH 116 A  13790    2254.35  11535.65  0    0       0     44     23  21  ADN Accepted Nursing(44);
AH 130 A  1902     2757.66  -855.66   1    0       0     4      2   2   General Studies(1); Medical Assistant-Accepted(1); Medical Billing and Coding(1); Phlebotomy(1);
AH 154 A  7490     6062.84  1427.16   0    1       0     8      4   4   Certified Nurse Aide(1); General Studies(3); Nursing(3); Phlebotomy(1);
AH 180 A  3684     2757.66  926.34    0    0       0     7      2   5   Medical Assistant(2); Medical Billing and Coding(2); Phlebotomy-Accepted(3);

Column groups: Discounts (Emp, Senior, MITW); In/out district (ID, OD)

• After decisions are made, this file with cancelled annotations is run through the program
• Results in “savings” graph
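One way the "savings" figure could be computed, as a rough sketch. The `cancelled` column and the definition of the adjusted cumulative (a cancelled section contributes zero) are assumptions; the deck does not spell out how the adjusted curve is built. Net values are taken from the resulting file above.

```r
# Section nets from the resulting file; the cancelled flags are hypothetical.
sections <- data.frame(
  section   = c("AH 116 A", "AH 130 A", "AH 154 A", "AH 180 A"),
  net       = c(11535.65, -855.66, 1427.16, 926.34),
  cancelled = c(FALSE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Rank sections from most to least profitable, then accumulate
sections <- sections[order(-sections$net), ]
sections$cum <- cumsum(sections$net)

# ASSUMPTION: a cancelled section adds nothing to the adjusted cumulative
sections$adj_cum <- cumsum(ifelse(sections$cancelled, 0, sections$net))

# Savings = gap between the two curves at the last section
savings <- tail(sections$adj_cum, 1) - tail(sections$cum, 1)
savings
```

Plotting `cum` and `adj_cum` against the section rank would give two curves that separate wherever a money-losing section was cancelled.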

Page 8: One Year Later: Using R to Automate the Tedious Stuff

Before, after cancellation meeting

[Figure: F18 class cancellation, cum + adjusted. Series: Cum, Adj Cum. x-axis: 1 to 433; y-axis: 50,000 to 1,850,000. Annotation: ~ $74,000]

Page 9: One Year Later: Using R to Automate the Tedious Stuff

Using Brightspace LMS gradebook for assessment

• Per section, faculty identify
  • Course outcome(s) to evaluate
  • BS gradebook item to use as measurement (quiz, paper, exam, presentation, etc.)
• Establish cut scores for assessment levels. Ex:
  • 0 – 60% = Beginning
  • 61 – 85% = Developing
  • 86 – 100% = Advanced
  • (or by points)
• Create a spreadsheet

Yr    Trm  Section      Outcome_num  Assign_num  Denom  Beg  Dev  Adv  Artifact
2017  50   PHL 102 OLA  16123        15909       100    70   80   90   Essay, logical thinking
2017  50   B 161 B      14563        20553       50     1    30   45   Business paper
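The percentage cut scores in the example can be applied with base R's findInterval(). A small sketch; the function name and arguments are hypothetical, and the cuts are treated as lower bounds (61% and up is Developing, 86% and up is Advanced).

```r
# Map a gradebook score to an assessment level using percentage cut scores.
# ASSUMPTION: dev_cut and adv_cut are the lowest percentages for each level.
assess_level <- function(numerator, denominator, dev_cut = 61, adv_cut = 86) {
  pct <- 100 * numerator / denominator
  level_names <- c("Beginning", "Developing", "Advanced")
  # findInterval() returns 0, 1 or 2 depending on how many cuts pct has reached
  level_names[findInterval(pct, c(dev_cut, adv_cut)) + 1]
}

# Three scores out of 30 points: about 93%, 80% and 57%
assess_level(c(28, 24, 17), 30)
```

The same function works with point-based cuts if the scores are passed through unchanged and the cuts are given in points.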

Page 10: One Year Later: Using R to Automate the Tedious Stuff

Brightspace gradebook- download as .csv

• Through “Data Hub” option within Brightspace

• Approximately 120,000 rows x 35 columns (~40 MB). Not huge, but very cumbersome with Excel.

• Key columns used

Stu ID  Course ID  Course Offering Name                               Grade Item Id  Grade Item Name   Points Numerator  Points Denominator
92592   9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  28                30
101856  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30
104220  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  30                30
105288  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  30                30
105330  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  28                30
105930  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  24                30
108575  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  28                30
108615  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  21                30
109662  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30
104282  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  30                30
104365  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30
107966  9657       INTRODUCTION TO LOGIC - PHL-102-OLA (Winter 2018)  15909          Logical Thinking  26                30

Yr    Trm  Section      Outcome_num  Assign_num  Denom  Beg  Dev  Adv  Artifact
2017  50   PHL 102 OLA  16123        15909       30     1    21   27   Essay, logical thinking
2017  50   B 161 B      14563        20553       50     1    30   45   Business paper

Page 11: One Year Later: Using R to Automate the Tedious Stuff

Main tasks for R

• Compare sections assessed listing to BS gradebook
• Assign outcome number & level of achievement to each student


• 120,000 x 100 = 12 million loops
• Initially took 15-20 minutes
• Was able to split gradebook by Grade item ID
• Reduced time to 1 – 2 minutes

• Put into usable form (for Jenzabar upload)
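The speed-up from splitting by Grade item ID can be sketched as follows. This is an illustration under assumptions, not the presenter's code: the column names are invented, the 1/2/3 rating codes are a guess at the scale, and the Dev and Adv values are treated as the lowest points earning each level.

```r
# Toy gradebook rows shaped like the Brightspace export (last row is unrelated).
gradebook <- data.frame(
  stu_id        = c(92592, 101856, 108615, 555001),
  grade_item_id = c(15909, 15909, 15909, 99999),
  points        = c(28, 26, 21, 10)
)

# Sections-assessed listing with point cut scores from the spreadsheet
assessed <- data.frame(
  assign_num  = c(15909, 20553),
  outcome_num = c(16123, 14563),
  dev         = c(21, 30),
  adv         = c(27, 45)
)

# Split once by grade item ID, then visit only the assessed items,
# instead of comparing every gradebook row with every listing row.
by_item <- split(gradebook, gradebook$grade_item_id)

rated <- lapply(seq_len(nrow(assessed)), function(i) {
  rows <- by_item[[as.character(assessed$assign_num[i])]]
  if (is.null(rows)) return(NULL)  # item not present in this download
  rows$outcome_num <- assessed$outcome_num[i]
  # ASSUMPTION: rating 1 = Beginning, 2 = Developing, 3 = Advanced
  rows$rating <- findInterval(rows$points,
                              c(assessed$dev[i], assessed$adv[i])) + 1
  rows
})
rated <- do.call(rbind, Filter(Negate(is.null), rated))
rated
```

Each assessed item now touches only its own rows, so the work scales with the number of matching grades rather than with rows x listing entries.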

Page 12: One Year Later: Using R to Automate the Tedious Stuff

Resulting tables for SIS

Student records
ID_Num  Year  Term  Out_Num  Seq_Num  Pre_Post  Dimen  Rating  Rate_scale  CrsComp1  CrsComp2  CrsComp3  CrscComp4
109934  2017  50  16123  3  6  PHL  102  OL  A
108638  2017  50  16123  1  6  PHL  102  OL  A
106998  2017  50  16123  3  6  PHL  102  OL  A
103132  2017  50  16123  3  6  PHL  102  OL  A
112138  2017  50  16123  2  6  PHL  102  OL  A

Section records
Year  Term  CrsComp1  CrsComp2  CrsComp3  CrsComp4  CrsComp5  CrsComp6  Assign_num  Out_Num  Artifact
2017  50  PHL  102  OL  A  15909  16123  Logical Thinking
2017  50  B  291  OL  A  29274  21013  Writing assignment
2017  50  B  291  OL  A  29274  21021  Writing assignment
2017  50  PSY  255  OL  A  27670  15559  Developmental Assessment paper

Via .csv to Jenzabar

Page 13: One Year Later: Using R to Automate the Tedious Stuff

Perkins Core Indicator and various other State reports (michigancc.net)

• Ideal application for programming
  • Same year to year (so far)
  • Essentially counting categories
  • All have same requirements (race/ethnicity, gender, special pops)
  • All share CIP codes, degree levels and non-trad-for
  • Almost all have csv upload capability to michigancc.net

Page 14: One Year Later: Using R to Automate the Tedious Stuff

Excel based

• Always started with listing of students

• Ran through database (Helix for Mac)
• Posted to .csv file aligned with michigancc.net upload file format
• Worked OK. Difficult to make adjustments, hard to debug.

ID_NUM  GENDER  IPEDS_VALUE_DESC  NonTrad_for  Deg_date  PCode  DegLvl  CIP_Code  Ind_Dis  Ec_Dis  Non_Trad  Single_Par  Disp_HM  LEP
65943  F  Race and Ethnicity unknown  M  8/25/17  181  2  51.0710-181  Y  Y
85137  F  Race and Ethnicity unknown  M  12/15/17  280  3  13.1210-280  Y
87477  F  Race and Ethnicity unknown  M  5/4/18  181  2  51.0710-181  Y  Y
104099  M  Race and Ethnicity unknown  F  5/4/18  113  1  50.0409-113  Y
47531  F  American Indian or Alaska Native  M  7/28/17  181  2  51.0710-181  Y  Y  Y

• Upload format

Field                     Description
CIPCODE                   CIP Code
DegLvl                    Degree Level Identifier
PROGNAME                  Name of Program
ALIENM                    Non-Resident Alien Men
ALIENF                    Non-Resident Alien Women
HISPANICM                 Hispanic/Latino Men
HISPANICF                 Hispanic/Latino Women
NATIVEM                   American Indian/Alaskan Native Men
NATIVEF                   American Indian/Alaskan Native Women
ASIANM                    Asian American Men
ASIANF                    Asian American Women
BLACKM                    Black, Non-Hispanic Men
BLACKF                    Black, Non-Hispanic Women
HAWAIIANPACIFICISLANDERM  Native Hawaiian/Other Pacific Islander Men
HAWAIIANPACIFICISLANDERF  Native Hawaiian/Other Pacific Islander Women
WHITEM                    White, Non-Hispanic Men

CIPCODE  DegLvl  PROGNAME             ALIENM  ALIENF  HISPANICM  HISPANICF  NATIVEM  NATIVEF  ASIANM  ASIANF  BLACKM  BLACKF
52.0201  3       Business Management  0       1       3          5          10       12       1       0       2       3

PWDM  PWDF  ECDISM  ECDISF  NONTRADM  NONTRADF
0     1     2       4       0         20

Page 15: One Year Later: Using R to Automate the Tedious Stuff

Main steps with R

• SQL query, generate csv of student listing, read into R

• Read special pops file, CIP listing, michigancc.net listing of labels

• Deduplicate students, if necessary

• Create Race-Gender and SpPops-Gender tags
  • Wh-M = White male
  • Hi-F = Hispanic female
  • ID-F = female with individual disability
  • ED-M = Economically disadvantaged male (could be multiple per student)

Student listing
ID_NUM  GENDER  IPEDS_VALUE_DESC                  NonTrad_for  Deg_date  PCode  DegLvl  CIP_Code
65943   F       Race and Ethnicity unknown        M            8/25/17   181    2       51.0710-181
85137   F       Race and Ethnicity unknown        M            12/15/17  280    3       13.1210-280
87477   F       Race and Ethnicity unknown        M            5/4/18    181    2       51.0710-181
104099  M       Race and Ethnicity unknown        F            5/4/18    113    1       50.0409-113
47531   F       American Indian or Alaska Native  M            7/28/17   181    2       51.0710-181

Special pops file
ID_Num  Ind_Dis  Ec_Dis  Non_Trad  Single_Par  Disp_HM  LEP
8699  Y  Y
10200  Y  Y
10225  y  y  y

CIP listing
CIP_CODE     Program                       Deg_Lvl  NT_for
51.3801-295  LPN to RN Transition Program  3
51.3902-197  Certified Nurse Aide          1
52.0201-149  Management                    2        F
52.0201-150  Small Business Management     2        F
52.0201-220  Business Management           3        F

michigancc.net listing of labels
Field
CIPCODE
DegLvl
PROGNAME
ALIENM
ALIENF
HISPANICM
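The deduplicate-and-tag steps might look like this in base R. A sketch only: the toy data, the column names and the abbreviation lookups are illustrative assumptions (only Wh, Hi, ID and ED appear in the deck), and flags are matched case-insensitively because the special pops file mixes "Y" and "y".

```r
# Toy student listing with one duplicated student (id 2).
students <- data.frame(
  id     = c(1, 2, 2, 3),
  gender = c("M", "F", "F", "F"),
  race   = c("White", "Hispanic/Latino", "Hispanic/Latino", "White"),
  stringsAsFactors = FALSE
)
special <- data.frame(
  id      = c(2, 3),
  Ind_Dis = c("Y", ""),
  Ec_Dis  = c("Y", "y"),
  stringsAsFactors = FALSE
)

# Deduplicate students on ID
students <- students[!duplicated(students$id), ]

# Race-Gender tag, e.g. "Wh-M"
race_abbr <- c("White" = "Wh", "Hispanic/Latino" = "Hi")
students$race_tag <- paste(unname(race_abbr[students$race]),
                           students$gender, sep = "-")

# SpPops-Gender tags: one row per flagged category, so a student can have several
sp_abbr <- c(Ind_Dis = "ID", Ec_Dis = "ED")
merged <- merge(students, special, by = "id", all.x = TRUE)
sp_tags <- do.call(rbind, lapply(names(sp_abbr), function(col) {
  hit <- toupper(merged[[col]]) %in% "Y"
  data.frame(id  = merged$id[hit],
             tag = sprintf("%s-%s", sp_abbr[[col]], merged$gender[hit]),
             stringsAsFactors = FALSE)
}))
sp_tags
```

Keeping the special-population tags in a long table (one row per student per category) makes the later counting step a simple tally.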

Page 16: One Year Later: Using R to Automate the Tedious Stuff

Perkins via R

• Create upload file

• Cumulatively add by column number (place)

Code  Place/Col
No-M  4
No-F  5
Hi-M  6
Hi-F  7
Ra-M  20
Ra-F  21
ID-M  22
ID-F  23
ED-M  24

CIPCODE  DegLvl  PROGNAME                      ALIENM  ALIENF  HISPANICM  HISPANICF  NATIVEM  NATIVEF  ASIANM  ASIANF  BLACKM  BLACKF
11.0301  3       Computer Information Systems  0       0       0          0          0        0        0       0       0       0
11.0901  3       Computer Networking           0       0       0          0          0        1        0       0       0       0
13.1210  3       Early Childhood Education     0       0       0          0          0        1        0       0       0       0
15.1301  2       Computer Aided Design         0       0       0          0          0        0        0       0       0       0

• Download as csv (write.csv), upload to michigancc.net
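A compact sketch of "cumulatively add by column number" followed by write.csv(). The toy programs and tagged students are hypothetical, only the first seven upload columns are modeled, and matching on CIPCODE alone (ignoring degree level) is a simplifying assumption.

```r
# Tag-to-column lookup (a subset of the places listed above)
place <- c("No-M" = 4, "No-F" = 5, "Hi-M" = 6, "Hi-F" = 7)

# Empty upload table: one row per program, counts start at zero
upload <- data.frame(
  CIPCODE  = c("11.0901", "52.0201"),
  DegLvl   = c(3, 3),
  PROGNAME = c("Computer Networking", "Business Management"),
  ALIENM = 0, ALIENF = 0, HISPANICM = 0, HISPANICF = 0,
  stringsAsFactors = FALSE
)

# One row per student tag (hypothetical data)
tagged <- data.frame(
  CIPCODE = c("52.0201", "52.0201", "11.0901", "52.0201"),
  tag     = c("Hi-F", "Hi-F", "No-M", "Hi-M"),
  stringsAsFactors = FALSE
)

# Add 1 to the cell at (program row, column number for the tag)
for (i in seq_len(nrow(tagged))) {
  row <- match(tagged$CIPCODE[i], upload$CIPCODE)
  col <- place[[tagged$tag[i]]]
  upload[row, col] <- upload[row, col] + 1
}

# Write the upload file; row.names = FALSE keeps R's row index out of the csv
out_file <- tempfile(fileext = ".csv")
write.csv(upload, out_file, row.names = FALSE)
```

Driving the counts from a lookup of column numbers means the same loop serves every report that shares the michigancc.net column layout.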


Page 17: One Year Later: Using R to Automate the Tedious Stuff

Enrollment patterns- compare year-to-year

• Important to track progress during enrollment and application periods
  • Total credit (tuition) hours
  • Demographic splits (gender, age, dual status)
  • Unduplicated students
• Track application pipeline
• Wish to create a more up-to-date dashboard of enrollment

Page 18: One Year Later: Using R to Automate the Tedious Stuff

Enrollment patterns: compare year-to-year (Excel)

ID_NUM  YR_TRM  CREDIT HRS  TUITION_HRS  Section       Course  REG_DTE        DROP_DTE      GENDER  BIRTH_DTE  COUNTY  DUAL  LOC    BLDG_CDE
54850   201830  1           1            OAS 103 OL A  OAS103  8/4/18 12:44   8/4/18 12:45  F       2/28/85    EM            I-NET  I-NET
54850   201830  1           1            OAS 103 OL A  OAS103  8/24/18 13:05                F       2/28/85    EM            I-NET  I-NET
91616   201830  4           5            BIO 133 A     BIO133  7/23/18 16:10                M       11/24/92   CR            PET    HESC
91616   201830  6           10           EMS 120 A     EMS120  8/3/18 9:49                  M       11/24/92   CR            PET    HESC
77461   201830  4           5            ESC 101 A     ESC101  4/11/18 10:50                F       8/1/90     EM            PET    HESC
77461   201830  3           3            ECE 225 A     ECE225  4/11/18 11:03  8/22/18 7:08  F       8/1/90     EM            PET    ECE

Date     Added  Dropped  Net    Cum
4/2/18   27     0        27     27
4/3/18   6      1        5      28
4/4/18   1976   23       1953   1958
4/5/18   850    231.5    618.5  2603.5
4/6/18   550    11.5     538.5  3142
4/7/18   120    36       84     3226
4/8/18   250    160.5    89.5   3315.5
4/9/18   520    171      349    3664.5
4/10/18  355    179      176    3840.5

Date: create list with no gaps
Added: SUMIFS(TUITION_HRS, Date = REG_DTE)
Dropped: SUMIFS(TUITION_HRS, Date = DROP_DTE)

Page 19: One Year Later: Using R to Automate the Tedious Stuff

Query for enrollment to date, with demographics, location

• Change date to 00/00/00 format
• Split by date, cum added and dropped by date
• Draw cum graph
• Do further splits by gender, age, location, etc.
• Adds, drops, net, cums done inside program

• Also produce a cum of unduplicated students
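The steps above can be sketched in base R roughly as follows. The data are made up, the column names are placeholders, and counting a student as "unduplicated" on the first day they register (ignoring later drops) is an assumption about the intended definition.

```r
# Hypothetical registration records with date-time stamps
reg <- data.frame(
  id       = c(101, 102, 101, 103),
  hrs      = c(3, 4, 1, 3),
  reg_dte  = c("4/2/18 9:15", "4/2/18 10:30", "4/4/18 8:00", "4/5/18 13:05"),
  drop_dte = c("", "4/4/18 16:20", "", ""),
  stringsAsFactors = FALSE
)

# Keep only the date part; blank drop dates become NA
to_date <- function(x) as.Date(sub(" .*$", "", x), format = "%m/%d/%y")
reg$reg_day  <- to_date(reg$reg_dte)
reg$drop_day <- to_date(reg$drop_dte)

# Calendar with no gaps, from first registration to last activity
days <- seq(min(reg$reg_day),
            max(c(reg$reg_day, reg$drop_day), na.rm = TRUE), by = "day")

# Sum hours per calendar day; days with no activity count as 0
sum_by_day <- function(day, hrs) {
  totals <- tapply(hrs, factor(as.character(day), levels = as.character(days)), sum)
  totals[is.na(totals)] <- 0
  as.numeric(totals)
}

daily <- data.frame(date    = days,
                    added   = sum_by_day(reg$reg_day, reg$hrs),
                    dropped = sum_by_day(reg$drop_day, reg$hrs))
daily$net <- daily$added - daily$dropped
daily$cum <- cumsum(daily$net)

# Unduplicated students: count each ID once, on its first registration day
by_day <- reg[order(reg$reg_day), ]
first_reg <- by_day[!duplicated(by_day$id), ]
daily$cum_students <- cumsum(sum_by_day(first_reg$reg_day, rep(1, nrow(first_reg))))
daily
```

Running the same code on a subset of `reg` (for example, one gender or one location) would give the further splits without any new formulas.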


DATE     Cum_credits
4/2/18   27
4/3/18   28
4/4/18   1958
4/5/18   2603.5
4/6/18   3142
4/7/18   3226
4/8/18   3315.5
4/9/18   3664.5
4/10/18  3840.5

DATE     Cum_M   Cum_F
4/2/18   9       18
4/3/18   9       19
4/4/18   721     1237
4/5/18   895     1708.5
4/6/18   1018    2124
4/7/18   1038    2188
4/8/18   1066.5  2249
4/9/18   1215.5  2449
4/10/18  1296.5  2544

Page 20: One Year Later: Using R to Automate the Tedious Stuff

Cum graphs developed

[Figure: Cum_credits by date, 3/21/18 to 10/24/18; y-axis 0 to 20,000]

[Figure: Cum credits by gender, series M_Cum and F_Cum, 3/21/18 to 10/24/18; y-axis 0 to 20,000]