Multi Speaker Detection Using Audio and Video Sensors
Multi Speaker Detection And Tracking Using Audio And Video Sensor Using Gesture Analysis
By: Abhishek M K
Under the guidance of: Manjunath Raikar, Asst. Prof., Dept. of CSE
CONTENTS
• Introduction
• What is E-Learning class?
• Working
• Block diagram
• Types of virtualization
• Conclusion
• References
INTRODUCTION
• E-learning uses the concept of video conferencing for interaction between students and tutors in different locations.
• The tutor is physically present in a real classroom, and students in a virtual classroom view the tutor over video.
• Audio and video sensors are used to make the E-learning classroom more efficient.
• Audio sensors such as microphones are used to receive audio input, and video sensors such as cameras are used to receive video signals.
• Gestures are used as a form of non-verbal communication.
• Gesture analysis makes it possible to handle multiple students asking questions at the same time.
What is E-Learning class?
• The main objective of our work is to make E-learning classrooms as similar as possible to normal classrooms.
• Multi-speaker detection is enabled in the system, and the tutor’s gestures are used to make decisions.
• Both the real and the virtual classroom have cameras as well as audio sensors.
CONTINUED…
• Students who have questions will either raise their hand or talk.
• The audio and video sensors work together to detect the first event, whether it occurs in the virtual or the real classroom.
• The PTZ (pan-tilt-zoom) camera zooms in on a particular location so that the focus is on a specific student.
Working
• The speaker is identified by using a microphone array and a PTZ camera.
• The speaker who talks first is identified, in either the virtual or the real classroom, using audio/video signals.
• The PTZ camera and the audio sensors are used to track the students who want to speak.
• Students who gesture or speak are put in a queue, with priority given to whoever gestured or spoke first.
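The queueing rule above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the presenters' implementation: the `Request` and `RequestQueue` names, the use of a detection timestamp as the priority key, and the tie-breaking sequence number are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    """One detected event (hypothetical structure): a student spoke or gestured."""
    timestamp: float                        # detection time in seconds; earliest wins
    seq: int                                # insertion counter, breaks timestamp ties
    student: str = field(compare=False)
    classroom: str = field(compare=False)   # "real" or "virtual"
    kind: str = field(compare=False)        # "speech" or "gesture"


class RequestQueue:
    """Min-heap of requests ordered by detection time (assumed priority rule)."""

    def __init__(self):
        self._heap = []
        self._seq = 0

    def add(self, timestamp, student, classroom, kind):
        heapq.heappush(
            self._heap, Request(timestamp, self._seq, student, classroom, kind)
        )
        self._seq += 1

    def next_student(self):
        """Return the earliest pending request, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None
```

Events from both classrooms go into the same queue, so a gesture in the virtual classroom that is detected before speech in the real classroom is served first.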
CONTINUED…
• The student who first gestures or speaks becomes the focus of the camera.
• In the virtual classroom, the students need a screen to view the professor.
• Three cameras are needed to capture images.
• The students are localized using audio and video sensors.
Fig 1: The tutor is taking a class. His video is displayed in the remote classroom, and the remote students' video is displayed in the real classroom.
Fig 2: A student in the remote classroom raises his hand to ask a question. His face is brought into focus in the real classroom because he produced the first interrupt.
Block diagram
Real Classroom
• Audio-sensor → Human voice detector
• Video-sensor → Detecting hand gesture
Virtual Classroom
• Audio-sensor → Human voice detector
• Video-sensor → Detecting hand gesture
Stages after the two classrooms (in the order they appear in the diagram)
• Priority Detection System
• Localization
• Tutor’s Gesture Analysis
• Video Sensor Focus
• The audio sensors pick up the students who are asking questions, and the video sensors capture images of the students.
• The audio sensor output is fed to a human voice detection system, and the video sensor output is used to detect students raising their hands.
• The priority detection system then determines which event happened first.
CONTINUED…
• Once the events are prioritized, the camera focuses on the student who asked first.
• The real and remote classrooms are connected via the internet.
TYPES OF VIRTUALIZATION
• Audio Virtualization
• Video Virtualization
Audio virtualization
• Audio localization is based on estimating the time delay between a pair of microphones.
• Cross-correlation between the audio signals is used to obtain the time delay.
• Steps for audio localization:
  1. Obtain the audio signals.
  2. Convert them to frames.
  3. Calculate the average energy of each frame.
  4. If the energy is above a threshold, the frame is treated as speech.
  5. Cross-correlate the signals to find the time delay.
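The steps above can be sketched in Python with NumPy. This is a minimal sketch rather than the presenters' code: the function names, the frame length, and the energy threshold are assumptions, and the final conversion from delay to angle (a far-field source, two microphones a known distance apart, speed of sound taken as 343 m/s) is an added illustration that the slides do not describe.

```python
import numpy as np


def frame_signal(x, frame_len):
    """Split a 1-D signal into non-overlapping frames, dropping the remainder."""
    n_frames = len(x) // frame_len
    return np.asarray(x[: n_frames * frame_len]).reshape(n_frames, frame_len)


def is_speech(frames, threshold):
    """Mark each frame as speech if its average energy exceeds the threshold."""
    energy = np.mean(frames.astype(float) ** 2, axis=1)
    return energy > threshold


def estimate_delay(sig_a, sig_b, sample_rate):
    """Delay of sig_b relative to sig_a in seconds (positive: sig_b lags)."""
    corr = np.correlate(sig_b, sig_a, mode="full")
    # Index 0 of the full correlation corresponds to lag -(len(sig_a) - 1).
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / sample_rate


def delay_to_angle(delay_s, mic_distance_m, speed_of_sound=343.0):
    """Assumed far-field model: angle (degrees) from broadside for a mic pair."""
    ratio = np.clip(speed_of_sound * delay_s / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))
```

In practice the threshold would be tuned to the room's background noise, and only frames marked as speech would be passed to `estimate_delay`.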
Video virtualization
• The students' hand-raise gesture and the professor's gestures need to be detected in order to make decisions in the E-class.
• The gesture analysis algorithm works by comparing the frame to be checked against the reference frames.
• To create the reference images, gestures of different categories are trained and saved in a database.
• The captured image is compared with each of the reference frames.
• The reference frame with the maximum correlation is taken as the match.
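The comparison against reference frames can be sketched as follows. The slides do not say which correlation measure is used, so this sketch assumes zero-mean normalized correlation over equally sized grayscale images; the function names and the dictionary standing in for the gesture database are hypothetical.

```python
import numpy as np


def normalized_correlation(a, b):
    """Zero-mean normalized correlation of two equally sized images (-1 to 1)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def match_gesture(frame, references):
    """Return (label, score) of the reference frame that correlates best.

    `references` maps a gesture label to a reference image; it stands in
    for the trained gesture database mentioned in the slides.
    """
    scores = {
        label: normalized_correlation(frame, ref)
        for label, ref in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

A real system would likely also apply a minimum score below which no gesture is reported, so that an unrelated frame is not forced onto the closest reference.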
Conclusion
• The main purpose of the project is to make the E-Learning classroom more natural by effectively using gesture analysis of the tutor.
• Building an E-learning classroom is a challenge, but this approach makes it more similar to a real classroom.
References
• [1] Remote Student Localization using Audio and Video Processing for Synchronous Interactive E-Learning. Balaji Hariharan, Aparna Vadakkepatt, Sangeeth Kumar. Amrita Centre for Wireless Networks and Applications, Amrita Vishwa Vidyapeetham, Kerala, India.
• [2] Sensors for Gesture Recognition Systems. IEEE. Sigal Berman, Member, IEEE, and Helman Stern, Member, IEEE.
• [3] Robust Joint Audio-Video Localization in Video Conferencing Using Reliability Information. David Lo, Rafik A. Goubran, Member, IEEE, Richard M. Dansereau, Member, IEEE, Graham Thompson, and Dieter Schulz.
THANK YOU…..