paul wang soed 2016

Video-based Big Data Analytics in Cyberlearning

Shuangbao (Paul) Wang, Ph.D. Professor, Director

Center for Security Studies

1913

1920 × 1080 × 50 × 60 ≈ 3000 × 2M ≈ 6G
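The slide does not label the factors in this product. A minimal check of the arithmetic, assuming they are frame width, frame height, frames per second, and seconds (all four labels are assumptions, as is the function name):

```python
def raw_pixel_count(width, height, frames_per_second, seconds):
    """Number of raw pixels in an uncompressed clip (assumed interpretation)."""
    frames = frames_per_second * seconds   # 50 * 60 = 3000 frames
    pixels_per_frame = width * height      # 1920 * 1080, about 2M pixels
    return frames * pixels_per_frame


total = raw_pixel_count(1920, 1080, 50, 60)
print(f"{total:,} pixels, about {total / 1e9:.1f}G")
```

Under these assumptions, one minute of full-HD video already holds roughly six billion raw pixel values before compression.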

2016

Videos in Cyberlearning – big data

• Video use is growing rapidly in education (and elsewhere)
• MOOCs (e.g., edX, Coursera, Udacity) rely on videos
• Huge repositories (e.g., RBDIL, NSDL) contain extraordinary amounts of valuable video data
• Videos are big data, unstructured
– Hardly analyzed by current data analytics tools
• Cyberlearning requires more interaction

RBDIL – Rutgers University

NSDL – National Science Digital Library

Using Videos Effectively in Learning

• Interactive? (DoD)
• Assessments? (adaptive)
• Easy for instructors to use? (course development)
• Accessibility? (Mac, mobile)
• Track students’ growth? (over multiple years)
• Long videos vs. short ones? (crop)
• Recording methodology? (noise and echo)

inVideo - A Novel Big Data Analytics Tool for Video Data Analytics

• Analyzing Video by Keywords
• Content Based Image Retrieval (CBIR)
• Pattern Recognition (PR)
• Multiple Languages

inVideo: Analyzing Video by Keywords

• Audio is stripped and used to generate a transcript
• Transcript is indexed back to original media
• Video is now searchable/mineable by keyword

The result shows that seven video clips from three videos were retrieved for the keyword “online”.
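The keyword workflow above can be sketched as an inverted index over timestamped transcript segments. This is a minimal illustration, not inVideo's implementation; the segment layout `(video, start_seconds, text)` and the function names are assumptions.

```python
from collections import defaultdict

PUNCTUATION = '.,!?;:"()'


def build_index(segments):
    """Map each lowercased word to the (video, start_seconds) clips containing it."""
    index = defaultdict(list)
    for video, start, text in segments:
        # Strip punctuation before de-duplicating so "online." and "online" merge.
        words = {w.strip(PUNCTUATION) for w in text.lower().split()}
        for word in words:
            if word:
                index[word].append((video, start))
    return index


def search(index, keyword):
    """Return the clips whose transcript contains the keyword, sorted by video and time."""
    return sorted(index.get(keyword.lower(), []))


segments = [
    ("intro", 0.0, "Welcome to online learning."),
    ("intro", 30.0, "Port scans and passwords"),
    ("shopping", 5.0, "Online shopping is online."),
]
index = build_index(segments)
print(search(index, "online"))
```

Because each entry keeps the clip's start time, a hit can be played back directly from the original media.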

inVideo: Content Based Image Retrieval (CBIR)

• Provide a picture reference
• Search the video content (frames) for the reference picture
• Return the video clips

The result shows a match at 0.05 s in the video named “student”.
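The slides do not say how frames are compared with the reference picture. As a toy sketch only, the following compares small grayscale frames pixel by pixel and reports the time of the closest frame; real CBIR systems typically use more robust image features. The frame representation (nested lists of intensities) and all names are assumptions.

```python
def mean_abs_diff(frame, reference):
    """Average absolute intensity difference between two equally sized frames."""
    total = 0
    count = 0
    for frame_row, ref_row in zip(frame, reference):
        for a, b in zip(frame_row, ref_row):
            total += abs(a - b)
            count += 1
    return total / count


def find_best_match(frames, reference, fps):
    """Return (time_in_seconds, distance) of the frame closest to the reference."""
    best = min(range(len(frames)), key=lambda i: mean_abs_diff(frames[i], reference))
    return best / fps, mean_abs_diff(frames[best], reference)


frames = [
    [[0, 0], [0, 0]],
    [[10, 20], [30, 40]],
    [[255, 255], [255, 255]],
]
reference = [[12, 18], [30, 41]]
print(find_best_match(frames, reference, fps=20))
```

A production version would also need a distance threshold so that videos without the picture return no match at all.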

inVideo: Pattern Recognition (PR)

• Provide a keyword reference
• Search the video content (frames) for objects described by the keyword

The result shows that three videos were retrieved containing objects that look like the keyword “credit card”.

inVideo: Analyzing Different Languages

• Input keywords in other languages
• Search transcript for keywords in that language
• Retrieve video clips that match

The result shows that two video clips in one video contain the keyword “学生” (the word “student” in Chinese).
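One practical difference for languages such as Chinese is that words are not separated by spaces, so whitespace tokenization does not isolate a keyword. A hedged sketch that uses Unicode-normalized substring matching instead; the segment layout and function names are assumptions, not inVideo's API.

```python
import unicodedata


def normalize(text):
    """Fold compatibility forms and case so visually equivalent text compares equal."""
    return unicodedata.normalize("NFKC", text).casefold()


def find_keyword(segments, keyword):
    """Return the start times of transcript segments that contain the keyword."""
    needle = normalize(keyword)
    return [start for start, text in segments if needle in normalize(text)]


segments = [
    (3.0, "今天学生很多"),
    (10.0, "欢迎来到课堂"),
    (25.0, "每个学生都要参加"),
]
print(find_keyword(segments, "学生"))
```

Substring matching can over-match inside longer words, so a real system might add language-specific word segmentation.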

Clip #0 Introduction
Clip #1 Pwd cracking
Clip #2 Port scan
Clip #3 Encryption
Clip #4 Forensics
Clip #5 Cyber weapon

Video: Linear to Interactive

Before: a 46-minute video

After: 2-3 minute video clips with assessments in between and at the end

Assessments

• Learning objects composed of short video clips

• Assessment of learning outcomes of studying video content

• Teachers select a video segment and assign Q&As

Drag the stage bar and click “From” button; continue dragging and then click “To” button. Add a question and answers.

Define “Learning Objects” (Instructors)
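The “From”/“To” selection plus a question and answers suggests a simple record per learning object. A minimal sketch of such a record, with hypothetical field names that are not taken from inVideo:

```python
from dataclasses import dataclass


@dataclass
class LearningObject:
    """A video segment paired with one multiple-choice question (hypothetical model)."""
    video: str
    start: float      # seconds, set by the "From" button
    end: float        # seconds, set by the "To" button
    question: str
    answers: list
    correct: int      # index into answers

    def __post_init__(self):
        # Reject segments that are empty or reversed, and answer keys out of range.
        if not 0 <= self.start < self.end:
            raise ValueError("segment must satisfy 0 <= start < end")
        if not 0 <= self.correct < len(self.answers):
            raise ValueError("correct must index one of the answers")

    def duration(self):
        return self.end - self.start


clip = LearningObject(
    video="cybersecurity.mp4",
    start=120.0,
    end=270.0,
    question="Which tool finds open services on a host?",
    answers=["Password cracker", "Port scanner"],
    correct=1,
)
print(clip.duration())
```

Validating on construction keeps a mis-dragged “From”/“To” pair from producing an unplayable clip.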

Learning and Assessments

• View the whole video, and take a quiz

• Review the video clip corresponding to the question

Click the “Review” button to review the video clip; click the speaker icon to hear the question read aloud; click “Confirm” to check your answer.
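The quiz-then-review flow can be sketched as grading each response and collecting the clips tied to missed questions. The item layout (a `clip` time range and a `correct` answer index) is an assumption made for illustration.

```python
def grade_quiz(items, responses):
    """Return (score, clips_to_review) for a list of quiz items.

    items: dicts with "clip" as (start, end) seconds and "correct" as an answer index.
    responses: chosen answer indices; a missing response counts as wrong.
    """
    clips_to_review = []
    for i, item in enumerate(items):
        choice = responses[i] if i < len(responses) else None
        if choice != item["correct"]:
            clips_to_review.append(item["clip"])
    score = (len(items) - len(clips_to_review)) / len(items) if items else 0.0
    return score, clips_to_review


items = [
    {"clip": (0.0, 150.0), "correct": 1},
    {"clip": (150.0, 300.0), "correct": 2},
    {"clip": (300.0, 420.0), "correct": 2},
]
score, review = grade_quiz(items, [1, 0, 2])
print(score, review)
```

Returning the clip ranges rather than question numbers lets a player jump straight to the material a student needs to revisit.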

Case Study: Cybersecurity Program

Student Engagement for the 24 Classrooms

inVideo: Turn videos into interactive learning content

Low Accuracy

Video 1: 45 Video Clips
Video 2: 29 Video Clips
Video 3: 29 Video Clips

Video 1: Individual
Video 2: Small Class
Video 3: Full Classroom

Accuracy of transcripts of 9 video clips from three original videos

SDLC

Accuracy comparison:
• “Hit rate” before: standard parameters
• “Hit rate” after: revised parameters

No improvement!
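The slides do not define “hit rate”. One plausible reading, assumed here purely for illustration, is the fraction of words in a reference transcript that also appear in the automatically generated transcript:

```python
from collections import Counter


def hit_rate(reference, transcript):
    """Fraction of reference words matched in the transcript (assumed definition)."""
    reference_words = reference.lower().split()
    available = Counter(transcript.lower().split())
    hits = 0
    for word in reference_words:
        # Each transcript word can satisfy at most one reference word.
        if available[word] > 0:
            available[word] -= 1
            hits += 1
    return hits / len(reference_words) if reference_words else 0.0


reference = "the port scan found open ports"
transcript = "the port can found open sports"
print(round(hit_rate(reference, transcript), 2))
```

This measure ignores word order; an edit-distance metric such as word error rate would be stricter.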

Correlations?

Low accuracy vs. recording methods
• Low accuracy
– 10% or less
– Individual22 (45+31+29 video clips)
• Medium accuracy
– 40 to 60%
– EdX_EDM, EdX_ajax (20 video clips)
• High accuracy
– 90% or higher
– Phone.p2, online_shopping (30 video clips)
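The three bands above can be written as a small classifier. The slide leaves the ranges between the bands undefined, so this sketch labels them “unclassified” rather than guessing; the function name and the 0-to-1 scale are assumptions.

```python
def accuracy_band(accuracy):
    """Map a transcript accuracy in [0, 1] to the band named on the slide."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0 and 1")
    if accuracy <= 0.10:
        return "low"
    if 0.40 <= accuracy <= 0.60:
        return "medium"
    if accuracy >= 0.90:
        return "high"
    return "unclassified"  # the slide does not cover 10-40% or 60-90%


for name, accuracy in [("Individual22", 0.10), ("EdX_EDM", 0.50), ("Phone.p2", 0.90)]:
    print(name, accuracy_band(accuracy))
```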

Online Shopping A=90%
Individual 22 A=10%
rfeb07 A=10%
phone.p2 A=90%
Phone.p2 A=90%
r002 A=10%
edX.ajax A=60%
edX.EDM A=50%

Voice-over re-Recording

• Re-recorded voices on videos

• Merged audio track with original videos

• Signal analysis while recording

• Accuracy significantly improved!
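The slides do not name the tool used to merge the re-recorded voice with the original video. One common way, shown here as an assumption, is the FFmpeg command line with stream mapping; the helper below only builds the argument list, and the file names are hypothetical.

```python
def build_merge_command(video_path, voice_path, output_path):
    """Build an ffmpeg command that keeps the original picture and swaps in a new voice track."""
    return [
        "ffmpeg",
        "-i", video_path,    # input 0: original video
        "-i", voice_path,    # input 1: re-recorded voice-over
        "-map", "0:v:0",     # take the first video stream from input 0
        "-map", "1:a:0",     # take the first audio stream from input 1
        "-c:v", "copy",      # copy the video stream without re-encoding
        "-shortest",         # stop at the end of the shorter stream
        output_path,
    ]


command = build_merge_command("lecture.mp4", "voice_over.wav", "lecture_clean.mp4")
print(" ".join(command))
```

With FFmpeg installed, the list can be passed to `subprocess.run(command, check=True)`.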

Correlations

• Low accuracy is expressed in high quefrency
– A measurement of ambient noise
– Echo

• Recording methods
– One microphone (per person)
– Used a condenser microphone instead of a dynamic one

• Recording settings can affect the audio quality (for digital processing)
– Experiments
– Guide to digital recording
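Quefrency is the independent variable of the cepstrum, where an echo shows up as a peak at its delay. The slides do not describe how the measurement was made; the following is a standard real-cepstrum sketch using a naive DFT from the standard library, with all function names assumed.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive O(n^2) discrete Fourier transform; fine for short test signals."""
    n = len(values)
    sign = 1 if inverse else -1
    result = []
    for k in range(n):
        total = 0j
        for t, value in enumerate(values):
            total += value * cmath.exp(sign * 2j * math.pi * k * t / n)
        result.append(total / n if inverse else total)
    return result


def real_cepstrum(signal, eps=1e-12):
    """Inverse DFT of the log magnitude spectrum."""
    log_magnitude = [math.log(abs(s) + eps) for s in dft(signal)]
    return [c.real for c in dft(log_magnitude, inverse=True)]


def dominant_quefrency(signal, min_q=1):
    """Quefrency (in samples) of the largest cepstral peak in the first half."""
    cepstrum = real_cepstrum(signal)
    return max(range(min_q, len(signal) // 2), key=lambda q: cepstrum[q])


# A click followed by a half-amplitude echo 32 samples later.
signal = [0.0] * 256
signal[0] = 1.0
signal[32] = 0.5
print(dominant_quefrency(signal))
```

For real recordings a fast FFT and windowed frames would be needed; the naive transform is used only to keep the sketch dependency-free.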

Transcript Time-stamping System (TTS)

• Adding timestamps for already transcribed videos

• Fuzzy Search
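Fuzzy search helps when a query or a transcript word is slightly misspelled. A minimal sketch with the standard library's `difflib`, assuming the transcript is already available as `(seconds, word)` pairs; the 0.8 cutoff and the function name are illustrative choices, not values from the slides.

```python
from difflib import SequenceMatcher


def fuzzy_find(timestamped_words, query, cutoff=0.8):
    """Return timestamps of transcript words whose similarity to the query meets the cutoff."""
    query = query.lower()
    matches = []
    for seconds, word in timestamped_words:
        similarity = SequenceMatcher(None, query, word.lower()).ratio()
        if similarity >= cutoff:
            matches.append(seconds)
    return matches


timestamped_words = [(0.0, "port"), (12.5, "encryption"), (30.0, "forensics")]
print(fuzzy_find(timestamped_words, "encription"))
```

Lowering the cutoff finds more misspellings at the cost of more false matches.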

Further Discussions

• Web API
• Search progression (over the years)
• Voice cancellation/reduction
• Automatic Time-stamping
• Curriculum Development
• Build community - collaborations

Publications

Shuangbao (Paul) Wang, Ph.D.

William Kelly, Metonymy Corporation

Xiaolong Cheng, Doctoral Candidate, George Washington University
