Educational Question Routing in Online Student Communities
Jakub Macina Slovak University of Technology
Ivan Srba Slovak University of Technology
Joseph Jay Williams Harvard University / National University of Singapore
Maria Bielikova Slovak University of Technology
11th ACM Conference on Recommender Systems, Como, Italy, 27th-31st August 2017
2 / 31
Online Student Communities
• Massive Open Online Courses (MOOCs) with dropout rates of up to 94%
• Community Question Answering (CQA) systems
• Discussion forums
3 / 31
Challenge for MOOC Discussions
• Up to 50% of questions remain unanswered
• Course instructors are overloaded with many students to serve
• Low participation of students in question answering:
1. Lurkers who do not contribute
2. Students willing to participate but overloaded with many questions
4 / 31
Our Idea: Question Routing
• Recommendation of new questions to users who are suitable to answer them
• Well-known research task from CQA systems
What is the capital city of Italy?
6 / 31
Related Work
• Question routing in standard CQA
• Asker-oriented approaches
• Overloading small group of experts
• Based mainly on QA data
• Question recommendation in MOOCs
• Constraint-optimization framework (Yang et al., 2014)
• Routes any question beneficial to the user
• With a significant time delay
Not appropriate for MOOCs
8 / 31
Educational Question Routing Task
Given a new question 𝑞, find an ordered list of users 𝑢1, … , 𝑢𝑛 who are most suitable to answer question 𝑞
Opportunities:
• Data from the MOOC course (grades, accomplished exercises)
Constraints:
• Appropriate knowledge
• Willingness to answer
• Working capacity
9 / 31
Goals of Educational Question Routing
• G1: Decrease users' information load with accurate recommendations
• G2: Engage a greater part of the community in question answering
• G3: Increase the average number of contributions
13 / 31
1. Construction of Question Profile
• Question text profile 𝜃𝑞
• Captures question’s content
• Text pre-processing
• Bag-of-words model (tf-idf weights)
• Metadata
• Asker, category, etc.
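A minimal sketch of the text-profile step using scikit-learn's TfidfVectorizer; the library choice and the toy questions are assumptions, not from the talk:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy question texts; the real corpus comes from the course forum
questions = [
    "How does BB84 quantum key distribution work?",
    "What is the no-cloning theorem?",
    "Where do I submit the week 3 exercise?",
]

# Bag-of-words with tf-idf weights; lowercasing and stop-word
# removal stand in for the text pre-processing step
vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
theta_q = vectorizer.fit_transform(questions)  # one row per question profile

print(theta_q.shape)  # (number of questions, vocabulary size)
```

Metadata such as asker and category would be kept alongside this vector rather than folded into it.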
14 / 31
2. Construction of User Profile
• User text profile 𝜃𝑢
• Captures topics of questions the user previously answered (user’s interests)
𝜃𝑢 = ∑𝑞 ∈ 𝑄𝑢 (𝜃𝑞 + 𝜃𝑎,𝑞)
• Metadata about previous user activities
• Quantity, quality and time distribution
• In CQA and MOOC
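The aggregation above, summing for every question the user answered both the question vector and the user's answer vector, can be sketched with plain NumPy (the toy vectors are invented):

```python
import numpy as np

def user_text_profile(answered):
    """theta_u = sum over answered questions of (theta_q + theta_{a,q})."""
    return sum(theta_q + theta_a for theta_q, theta_a in answered)

# Two answered questions with toy 4-dimensional tf-idf vectors:
# first element is the question vector, second the user's answer vector
answered = [
    (np.array([0.5, 0.0, 0.2, 0.0]), np.array([0.1, 0.0, 0.3, 0.0])),
    (np.array([0.0, 0.4, 0.0, 0.0]), np.array([0.0, 0.2, 0.0, 0.1])),
]
theta_u = user_text_profile(answered)
print(theta_u)
```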
15 / 31
3. Matching of Questions and Users
• Ranking of users given new question
• Ensemble of two classification tasks:
• Appropriate expertise to answer a new question
• Willingness to answer a new question
• Combination:
𝑃(𝑦 = 1) = 𝑃(𝑒𝑥𝑝𝑒𝑟𝑡𝑖𝑠𝑒 = 1) ∗ 𝑃(𝑤𝑖𝑙𝑙𝑖𝑛𝑔𝑛𝑒𝑠𝑠 = 1)
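The combination rule is a simple product of the two classifiers' probabilities, which means both factors must be high for a user to rank well:

```python
def route_score(p_expertise, p_willingness):
    """Combined probability that a user both can and will answer:
    P(y=1) = P(expertise=1) * P(willingness=1)."""
    return p_expertise * p_willingness

# A knowledgeable but inactive user ranks below a moderately
# knowledgeable but willing one (numbers are illustrative):
expert_lurker = route_score(0.9, 0.1)
active_novice = route_score(0.6, 0.5)
print(expert_lurker < active_novice)  # True
```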
16 / 31
3. Matching of Questions and Users
• Features derived from text and metadata comparison between question and user profile
• Features for expertise classification (# of features = 11)
• Level of difficulty for a user to answer a new question (knowledge gap)
• Portion of related lectures watched
• Grades
• Features for willingness classification (# of features = 14)
• Overall count of answers, questions and comments
• Amount of latest activity
• Response time on recommendations
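A hedged sketch of how such question-user features might be computed; the variable names and numbers are hypothetical stand-ins for the 11 expertise and 14 willingness features:

```python
import numpy as np

def cosine_similarity(theta_q, theta_u):
    """Text similarity between question and user profiles; a low
    value suggests a large knowledge gap for this user."""
    denom = np.linalg.norm(theta_q) * np.linalg.norm(theta_u)
    return float(theta_q @ theta_u) / denom if denom else 0.0

# One (question, user) pair with invented values
theta_q = np.array([0.5, 0.0, 0.2])
theta_u = np.array([0.6, 0.6, 0.5])
expertise_features = [
    cosine_similarity(theta_q, theta_u),  # knowledge-gap proxy
    3 / 4,                                # portion of related lectures watched
    0.85,                                 # course grade
]
willingness_features = [
    12,    # overall count of answers
    5,     # amount of latest activity (e.g. posts this week)
    36.0,  # response time on recommendations, in hours
]
```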
18 / 31
Experiments – CQA system
• Educational and organizational CQA system Askalot
• Open source, developed at Slovak University of Technology
• Builds on diversity in students’ knowledge and educational/organizational specifics
• University/MOOC variant
github.com/AskalotCQA/askalot
19 / 31
Experiments - MOOC
• QuCryptox Quantum Cryptography at edX
• Offered by Caltech and TU Delft
• 10 weeks (Sept. 2016 – Dec. 2016)
https://courses.edx.org/courses/course-v1:CaltechDelftX+QuCryptox+3T2016
20 / 31
Course Statistics
Metric | Quantity
Students enrolled in the course | 8115
Students started the course | 4618
Users participating in CQA (contributors + lurkers) | 1098 (24%)
Users contributing in CQA | 377 (8%)
Questions | 361
Answers | 386
Comments | 476
21 / 31
Evaluation Methodology
• Offline experiment
• Online experiment
• Very rare in the context of CQA systems
• Ecologically valid
• Measures total impact on the student community
• Baseline: a non-educational, asker-oriented question routing method with optimization
22 / 31
Offline Experiment Setup
• Standard ML pipeline including:
• Feature transformation
• Feature selection
• Chi square selection
• Model selection
• SVM, Random forest, Logistic regression
• Hyper-parameter tuning
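The offline pipeline can be sketched with scikit-learn; the synthetic data, the k=8 selection, and the grids are assumptions, only the chi-square selection and the three model families come from the slide:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

# Synthetic stand-in for the question-user feature matrix
X, y = make_classification(n_samples=300, n_features=14, random_state=0)

candidates = [
    ("logistic regression", LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1, 10]}),
    ("svm", SVC(), {"clf__C": [0.1, 1, 10]}),
    ("random forest", RandomForestClassifier(random_state=0), {"clf__n_estimators": [50, 100]}),
]

best_name, best_score = None, -1.0
for name, clf, grid in candidates:
    pipe = Pipeline([
        ("scale", MinMaxScaler()),           # feature transformation (chi2 needs non-negative input)
        ("select", SelectKBest(chi2, k=8)),  # chi-square feature selection
        ("clf", clf),
    ])
    search = GridSearchCV(pipe, grid, cv=3)  # hyper-parameter tuning
    search.fit(X, y)
    if search.best_score_ > best_score:
        best_name, best_score = name, search.best_score_

print(best_name, round(best_score, 3))
```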
24 / 31
Online Experiment Setup
• A/B test over 7 weeks
• Stratified random assignment to three groups:
1. Educational (n=1306)
2. Baseline (n=1306)
3. Control (n=1306)
• Each question recommended to the top 10 users
• Workload constraint 𝐿𝑢: a maximum of 4 recommendations per user per 7 days
• Real-time profile updates, model re-training each day
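The routing-with-workload step can be sketched as follows; the top-10 cutoff and the 4-per-7-days limit come from the slide, while the function and data shapes are assumptions:

```python
from datetime import datetime, timedelta

TOP_K = 10                  # recommend each question to the top 10 users
MAX_RECS = 4                # workload constraint L_u: at most 4 recs ...
WINDOW = timedelta(days=7)  # ... per 7 days

def route_question(scored_users, rec_log, now):
    """Return up to TOP_K users, best score first, skipping anyone who
    already received MAX_RECS recommendations within WINDOW."""
    selected = []
    for user, _ in sorted(scored_users, key=lambda pair: -pair[1]):
        recent = [t for t in rec_log.get(user, []) if now - t <= WINDOW]
        if len(recent) < MAX_RECS:
            selected.append(user)
            rec_log.setdefault(user, []).append(now)
        if len(selected) == TOP_K:
            break
    return selected

# Toy example: alice has the best score but is at her weekly limit
now = datetime(2016, 11, 1)
rec_log = {"alice": [now - timedelta(days=1)] * 4}
scored = [("alice", 0.9), ("bob", 0.5), ("carol", 0.4)]
print(route_question(scored, rec_log, now))  # ['bob', 'carol']
```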
26 / 31
Online Experiment Results
• 132 new questions were routed to potential answerers
• Resulting in 2640 recommendations
27 / 31
G1: Accurate Recommendations Decreased Information Load
Metric | Our method | Baseline | Statistical significance
CTR | 23.25% | 18.29% | χ²(1, N = 2640) = 10.03, p < 0.01
Success@10 | 15.91% | 10.61% | χ²(1, N = 264) = 1.61, p = 0.20
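The CTR test can be roughly reproduced; the click counts below are reconstructed from the reported percentages assuming the 2640 recommendations were split evenly between the two groups, so they are approximations:

```python
from scipy.stats import chi2_contingency

# Approximate click counts: ~23.3% and ~18.3% of 1320 recommendations each
ours_clicks, base_clicks = 307, 241
table = [
    [ours_clicks, 1320 - ours_clicks],
    [base_clicks, 1320 - base_clicks],
]
chi2, p, dof, _ = chi2_contingency(table, correction=False)
print(round(chi2, 2), p < 0.01)  # close to the reported chi2 = 10.03
```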
28 / 31
G2: Greater Part of the Community Got Involved
Period | Our method | Baseline | Control
Before | 7.60% (62/816) | 8.99% (73/812) | 9.12% (74/811)
During | 13.16% (40/304) | 9.35% (26/278) | 8.72% (28/321)
(Active CQA users / active MOOC users)
29 / 31
G3: Average Number of Contributions Increased
[Chart: average number of contributions, before the experiment vs. during the experiment (with recommendations)]
30 / 31
Possible Improvements
• Duplicate question identification
• Question retrieval (another well-known task in CQA)
• Question type identification
• Some questions can be answered only by instructors
• Scalability
31 / 31
Educational Question Routing in Online Student Communities
1. An answerer-oriented question routing framework that considers not only expertise, but also answerers' willingness and workload
2. Incorporation of additional MOOC data beyond CQA activity
3. Real-world effectiveness demonstrated by an online experiment with more than 4600 MOOC students. Code available at:
https://github.com/dmacjam/dp-analysis-evaluation