paul wang soed 2016
Video-based Big Data Analytics in Cyberlearning
Shuangbao (Paul) Wang, Ph.D. Professor, Director
Center for Security Studies
1913
1920 * 1080 * 50 * 60 -- 3000 * 2M -- 6G
2016
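The figures on this slide appear to estimate the raw size of one minute of full-HD video. The sketch below reproduces that arithmetic. It assumes that 1920 × 1080 is the frame size in pixels, 50 is the frame rate, and 60 is the number of seconds; the slide does not label the factors, and the function and parameter names are illustrative.

```python
def frames_per_minute(fps: int, seconds: int = 60) -> int:
    """Number of frames in a clip of the given length (assumed: 50 fps, 60 s)."""
    return fps * seconds


def pixels_per_frame(width: int, height: int) -> int:
    """Pixel count of one frame (assumed: 1920 x 1080)."""
    return width * height


def pixels_per_minute(width: int, height: int, fps: int) -> int:
    """Total pixels in one minute of uncompressed video."""
    return pixels_per_frame(width, height) * frames_per_minute(fps)


if __name__ == "__main__":
    print(frames_per_minute(50))              # the slide's "3000"
    print(pixels_per_frame(1920, 1080))       # roughly the slide's "2M"
    print(pixels_per_minute(1920, 1080, 50))  # roughly the slide's "6G"
```

Under these assumptions the exact product is slightly above the rounded "6G" on the slide, because 1920 × 1080 is a little more than 2 million.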
Videos in Cyberlearning – big data
• Video use is growing rapidly in education (and elsewhere)
• MOOCs (e.g., edX, Coursera, Udacity) rely on videos
• Huge repositories (e.g., RBDIL, NSDL) contain extraordinary amounts of valuable video data
• Videos are big, unstructured data
– Hardly analyzed by current data analytics tools
• Cyberlearning requires more interaction
RBDIL – Rutgers University
NSDL – National Science Digital Library
Using Videos Effectively in Learning
• Interactive? (DoD)
• Assessments? (adaptive)
• Easy for instructors to use? (course development)
• Accessibility? (Mac, mobile)
• Track students’ growth? (over multiple years)
• Long videos vs. short ones? (crop)
• Recording methodology? (noise and echo)
inVideo - A Novel Big Data Analytics Tool for Video Data Analytics
• Analyzing Video by Keywords
• Content Based Image Retrieval (CBIR)
• Pattern Recognition (PR)
• Multiple Languages
inVideo: Analyzing Video by Keywords
• Audio is stripped and used to generate a transcript
• Transcript is indexed back to original media
• Video is now searchable/mineable by keyword
The result shows that seven video clips from three videos were retrieved for the keyword “online”.
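Once a transcript is indexed back to the media, keyword search reduces to a lookup over timestamped transcript segments. The sketch below assumes each segment carries a video name, start and end times, and its text. The `Segment` type, `search_keyword` function, and sample data are hypothetical and are not taken from inVideo.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    video: str    # name of the source video
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    text: str     # transcript text spoken during the segment


def search_keyword(segments: List[Segment], keyword: str) -> List[Tuple[str, float, float]]:
    """Return (video, start, end) for every segment whose text contains the keyword."""
    needle = keyword.casefold()
    return [(s.video, s.start, s.end) for s in segments if needle in s.text.casefold()]


# Hypothetical sample transcript, for illustration only.
SAMPLE = [
    Segment("intro", 0.0, 12.5, "Welcome to the online course"),
    Segment("intro", 12.5, 30.0, "Today we cover port scanning"),
    Segment("shopping", 4.0, 9.0, "Online shopping needs encryption"),
]

if __name__ == "__main__":
    for video, start, end in search_keyword(SAMPLE, "online"):
        print(f"{video}: {start:.1f}s to {end:.1f}s")
```

The match is a plain case-insensitive substring test, so it does not depend on whitespace-delimited words. A production system would more likely use an inverted index.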
inVideo: Content Based Image Retrieval (CBIR)
• Provide a picture reference
• Search video content (frames) that contains the reference picture
• Return the video clips
The result shows that the match is at 0.05 seconds in the video named “student”.
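Searching frames for a reference picture can be illustrated with exhaustive template matching. The slides do not say how inVideo performs the match, so this sketch swaps in a simple sum-of-absolute-differences comparison over small grayscale frames represented as nested lists. All names and the `max_distance` threshold are assumptions.

```python
from typing import List, Optional

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def best_distance(frame: Image, ref: Image) -> Optional[int]:
    """Smallest sum of absolute differences between ref and any same-size patch of frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for top in range(len(frame) - h + 1):
        for left in range(len(frame[0]) - w + 1):
            d = sum(
                abs(frame[top + r][left + c] - ref[r][c])
                for r in range(h)
                for c in range(w)
            )
            if best is None or d < best:
                best = d
    return best  # None when the reference is larger than the frame


def find_reference(frames: List[Image], ref: Image, fps: float, max_distance: int = 0) -> List[float]:
    """Return the timestamps (in seconds) of frames that contain the reference picture."""
    hits = []
    for index, frame in enumerate(frames):
        d = best_distance(frame, ref)
        if d is not None and d <= max_distance:
            hits.append(index / fps)
    return hits


# Hypothetical data: three 4x4 frames, the second one containing the 2x2 reference.
REF = [[9, 8], [7, 6]]
BLANK = [[0] * 4 for _ in range(4)]
WITH_REF = [[0, 0, 0, 0], [0, 0, 9, 8], [0, 0, 7, 6], [0, 0, 0, 0]]
FRAMES = [BLANK, WITH_REF, BLANK]
```

With `fps=20`, the matching frame (index 1) maps to a timestamp of 1/20 second. Real video would need a tolerance above zero and a faster, more robust matcher than this brute-force loop.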
inVideo: Pattern Recognition (PR)
• Provide a keyword reference
• Search video content (frames) that contains the object described by the keyword
The result shows that three videos were retrieved that contain objects resembling the keyword “credit card”.
inVideo: Analyzing Different Languages
• Input keywords in other languages
• Search transcript for keywords in that language
• Retrieve video clips that match
The result shows that two video clips in one video contain the keyword “学生” (“student” in Chinese).
Clip #0: Introduction
Clip #1: Pwd cracking
Clip #2: Port scan
Clip #3: Encryption
Clip #4: Forensics
Clip #5: Cyber weapon
Video: Linear to Interactive
Before: A 46-minute video
After: 2- to 3-minute video clips with assessments in between and at the end
Assessments
• Learning objects composed of short video clips
• Assessment of learning outcomes of studying video content
• Teachers select a video segment and assign Q&As
Drag the stage bar and click “From” button; continue dragging and then click “To” button. Add a question and answers.
Define “Learning Objects” (Instructors)
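A learning object, as described above, pairs a video segment (the “From” and “To” positions) with a question and its answers. One way to model this is shown below. The class, field names, and sample values are hypothetical and do not reflect inVideo's internal data format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LearningObject:
    start: float        # "From" position in seconds
    end: float          # "To" position in seconds
    question: str       # question assigned to the segment
    answers: List[str]  # candidate answers shown to the student
    correct: int        # index of the correct answer in `answers`

    def __post_init__(self) -> None:
        # Reject segments and answer keys that cannot be valid.
        if not 0 <= self.start < self.end:
            raise ValueError("segment must satisfy 0 <= start < end")
        if not 0 <= self.correct < len(self.answers):
            raise ValueError("correct must index into answers")

    def check(self, choice: int) -> bool:
        """True when the student's chosen answer index is the correct one."""
        return choice == self.correct

    def review_range(self) -> Tuple[float, float]:
        """Segment to replay when the student asks to review the clip."""
        return (self.start, self.end)


# Hypothetical example: a clip from 2:00 to 4:45 with one question.
EXAMPLE = LearningObject(
    start=120.0,
    end=285.0,
    question="Which activity probes a host for open services?",
    answers=["Encryption", "Port scan", "Forensics"],
    correct=1,
)
```

Keeping the segment boundaries on the object lets a quiz replay exactly the clip that a question refers to.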
Learning and Assessments
• View the whole video, and take a quiz
• Review the video clip corresponding to the question
Click the “Review” button to review the video clip; click the speaker icon to have the question read aloud; click “Confirm” to check your answer
Case Study: Cybersecurity Program
Student Engagement for the 24 Classrooms
inVideo: Turn videos into interactive learning content
Low Accuracy
Video1: 45 Video Clips
Video1: 29 Video Clips
Video3: 29 Video Clips
Video 1: Individual
Video 2: Small Class
Video 3: Full Classroom
Accuracy of transcripts of 9 video clips from three original videos
SDLC
Accuracy comparison:
• “Hit rate” before: standard parameters
• “Hit rate” after: revised parameters
No improvement!
Correlations?
Low accuracy vs. recording methods
• Low accuracy
– 10% or less
– Individual22 (45+31+29 video clips)
• Medium accuracy
– 40 to 60%
– EdX_EDM, EdX_ajax (20 video clips)
• High accuracy
– 90% or higher
– Phone.p2, online_shopping (30 video clips)
Online Shopping A=90%
Individual 22 A=10%
rfeb07 A=10%
phone.p2 A=90%
Phone.p2 A=90%
r002 A=10%
edX. ajax A=60%
edX.EDM A=50%
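The slides do not define how transcript accuracy (A) was measured. One plausible reading is the share of reference words that the automatic transcript reproduces in order. The sketch below computes that share and maps it to the low, medium, and high bands listed above. The metric and all names are assumptions.

```python
import difflib


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Fraction of reference words matched, in order, by the hypothesis transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # autojunk=False keeps frequent words from being ignored in long transcripts.
    matcher = difflib.SequenceMatcher(None, ref, hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


def accuracy_band(accuracy: float) -> str:
    """Map an accuracy in [0, 1] to the bands named on the slide."""
    if accuracy <= 0.10:
        return "low"
    if 0.40 <= accuracy <= 0.60:
        return "medium"
    if accuracy >= 0.90:
        return "high"
    return "unclassified"  # the slide leaves the ranges in between unnamed


if __name__ == "__main__":
    score = word_accuracy(
        "the port scan finds open ports",
        "the port scan finds open sports",
    )
    print(f"{score:.2f} -> {accuracy_band(score)}")
```

Note that `SequenceMatcher` uses a greedy block-matching heuristic, so this is an approximation and not a true word error rate.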
Voice-over re-Recording
• Re-recorded voices on videos
• Merge audio track with original videos
• Signal analysis while recording
• Accuracy significantly improved!
Correlations
• Low accuracy is expressed in high quefrency
– A measurement of ambient noise
– Echo
• Recording methods
– One microphone (per person)
– Used a condenser microphone instead of a dynamic one
• Recording settings could affect the audio quality (for digital processing)
– Experiments
– Guide to digital recording
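Quefrency is the independent variable of the cepstrum, the inverse transform of the log-magnitude spectrum. An echo with a delay of d samples produces a cepstral peak at quefrency d. The sketch below shows this on a synthetic signal using a naive DFT. The signal, echo parameters, and function names are assumptions chosen for illustration and are not the analysis used in the study.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    """Naive O(n^2) discrete Fourier transform, adequate for short signals."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(signal: List[float]) -> List[float]:
    """Inverse DFT of the log-magnitude spectrum; index q is the quefrency in samples."""
    n = len(signal)
    log_mag = [math.log(abs(v) + 1e-12) for v in dft(signal)]  # epsilon avoids log(0)
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n)
    ]


def add_echo(signal: List[float], delay: int, gain: float) -> List[float]:
    """Add a single delayed, attenuated copy of the signal to itself."""
    return [
        s + (gain * signal[i - delay] if i >= delay else 0.0)
        for i, s in enumerate(signal)
    ]


def dominant_quefrency(signal: List[float], min_q: int = 5) -> int:
    """Quefrency (in samples) of the largest cepstral value, skipping the lowest bins."""
    c = real_cepstrum(signal)
    return max(range(min_q, len(signal) // 2), key=lambda q: c[q])


# Synthetic test signal: a decaying pulse plus an echo 20 samples later.
DRY = [0.5 ** i for i in range(128)]
WET = add_echo(DRY, delay=20, gain=0.6)
```

For this synthetic signal, `dominant_quefrency(WET)` recovers the echo delay of 20 samples. Real recordings would call for an FFT library and windowed frames in place of the naive transform.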
Transcript Time-stamping System (TTS)
• Adding timestamps for already transcribed videos
• Fuzzy Search
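Fuzzy search over a time-stamped transcript can tolerate misspelled queries or transcription errors. The sketch below uses `difflib.get_close_matches` from the Python standard library. The `(start_time, text)` transcript layout, the function name, and the 0.8 cutoff are assumptions and do not describe how TTS is implemented.

```python
import difflib
import re
from typing import List, Tuple


def fuzzy_find(query: str, lines: List[Tuple[float, str]], cutoff: float = 0.8) -> List[float]:
    """Return start times of transcript lines containing a word similar to the query."""
    needle = query.lower()
    hits = []
    for start, text in lines:
        words = re.findall(r"\w+", text.lower())
        # get_close_matches returns words whose similarity ratio is at least `cutoff`.
        if difflib.get_close_matches(needle, words, n=1, cutoff=cutoff):
            hits.append(start)
    return hits


# Hypothetical time-stamped transcript: (start time in seconds, text).
TRANSCRIPT = [
    (0.0, "Introduction to the course"),
    (95.5, "Encryption protects data in transit"),
    (210.0, "Port scan basics"),
]

if __name__ == "__main__":
    print(fuzzy_find("encription", TRANSCRIPT))
```

Lowering the cutoff returns more candidates at the cost of more false matches.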
Further Discussions
• Web API
• Search progression (over the years)
• Voice cancellation/reduction
• Automatic Time-stamping
• Curriculum Development
• Build community - collaborations
Publications
Shuangbao (Paul) Wang, Ph.D.
William Kelly, Metonymy Corporation
Xiaolong Cheng, Doctoral Candidate, George Washington University