![Page 1: Revolutionary Voice Enhancement in Real-Time Communications · Revolutionary Voice Enhancement in Real-Time Communications with GPU Davit Baghdasaryan, CEO, 2Hz Arto Minasyan, CTO,](https://reader035.vdocuments.site/reader035/viewer/2022070909/5f90e9de84b6960479188b28/html5/thumbnails/1.jpg)
Revolutionary Voice Enhancement in Real-Time Communications
with GPU
Davit Baghdasaryan, CEO, 2Hz
Arto Minasyan, CTO, 2Hz
Mute Background Noises
Voice Quality with Deep Learning
• Mute Background Noise
• Mute Everyone Except Me
• Remove Room Echo
• High Resolution Voice Everywhere
Real-Time Noise Suppression with Deep Learning
Traditional Noise Cancellation
- Requires 2-4 mics
- Runs on edge device
- Cancels only limited noises
- Outbound only
Deep Learning powered Noise Cancellation
Diagram: Background Noises + Clean Human Speech → Train krispNet Deep Neural Network
- No dependency on mics
- Bi-directional
- Cancels all noise types
- Runs everywhere: on device and in the cloud
How to Measure Voice Quality?
Industry Standards
- Academia: PESQ, Subjective
- Industry: 3QUEST (Speech MOS, Noise MOS, Global MOS)
- Skype Audio Test and 3GPP TS 26.131 specifications
Audio Lab
krisp.ai
Seamlessly Integrates in Conferencing Apps
Supports any Microphone or Headset
krisp.ai
Best Product in Audio/Voice 2018
Training and Inference
Training Process
Training Data
- 2K distinct speakers: gender and age diverse distribution
- >10K distinct noises: babble, construction, traffic, cafeteria, office, etc.
- 2000+ hours
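One common way to turn clean speech and noise recordings into training pairs is to mix them at a chosen signal-to-noise ratio. The slides do not say how the krispNet training set was assembled, so the sketch below is only an illustration of that general technique. The function names, the 16 kHz sample rate, the synthetic signals, and the 5 dB target are all assumptions.

```python
import math
import random

def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mix_at_snr(speech, noise, snr_db):
    """Return speech plus scaled noise so the mix has the requested SNR in dB.

    Assumes equal-length lists of floats and that neither signal is silent.
    """
    # SNR_dB = 20 * log10(rms(speech) / rms(scaled_noise))
    target_noise_rms = rms(speech) / (10 ** (snr_db / 20))
    gain = target_noise_rms / rms(noise)
    return [s + gain * n for s, n in zip(speech, noise)]

# Hypothetical stand-ins for one clean-speech clip and one noise clip
# (one second each at an assumed 16 kHz sample rate).
random.seed(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [random.uniform(-1.0, 1.0) for _ in range(16000)]

noisy = mix_at_snr(speech, noise, snr_db=5.0)

# One training pair: the network sees `noisy` and is asked to recover `speech`.
training_pair = (noisy, speech)
```

Sweeping `snr_db` and drawing random speaker and noise clips would give many pairs from the same source recordings.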
Training on GPUs
- All in Python
- Distributed TensorFlow
- Multiple in-house NVIDIA 1080 Ti. Takes a full week.
- p2.16xlarge in AWS. 16x NVIDIA K80
Inference
- Supports NVIDIA, Intel and ARM platforms
- All in C/C++. Sometimes ASM
- Smaller network (5x boost with some quality penalty)
- TensorRT boosts ~2x
Moving to the Cloud
Server-side Noise Cancellation
Latency Constraints
- 200 ms end-to-end latency
- Codecs and other DSP: 10-80 ms
- Network: varies
- DNN: < 20 ms
  - DNN Compute: < 5 ms
  - DNN Algorithmic: 15 ms
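The budget above can be checked with simple arithmetic. The sketch below assumes the components add linearly and that 200 ms is the total allowance; the slide does not state either explicitly. It uses the upper bound of 5 ms for DNN compute, and the function and constant names are illustrative.

```python
# Figures taken from the slide, in milliseconds.
END_TO_END_BUDGET_MS = 200
DNN_COMPUTE_MS = 5          # slide says "< 5ms"; the upper bound is used here
DNN_ALGORITHMIC_MS = 15
CODEC_DSP_RANGE_MS = (10, 80)

def network_headroom_ms(codec_dsp_ms):
    """Budget left for the network after DNN and codec/DSP delays.

    Assumes the delays simply add up, which is a simplification.
    """
    dnn_ms = DNN_COMPUTE_MS + DNN_ALGORITHMIC_MS
    return END_TO_END_BUDGET_MS - dnn_ms - codec_dsp_ms

best_case = network_headroom_ms(CODEC_DSP_RANGE_MS[0])
worst_case = network_headroom_ms(CODEC_DSP_RANGE_MS[1])
print(best_case, worst_case)  # prints: 170 100
```

Under these assumptions the network is left with roughly 100 to 170 ms, depending on the codec and DSP chain.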
How do you scale to 100K+ concurrent streams with such latency constraints?
Example: Discord processes 2.5M concurrent audio streams
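To get a feel for what these stream counts mean in hardware, a rough sizing calculation helps. The stream counts below come from the slide, but the per-server capacity is a purely hypothetical placeholder, not a measured figure, and `servers_needed` is an illustrative helper.

```python
import math

def servers_needed(concurrent_streams: int, streams_per_server: int) -> int:
    """Servers required if each one handles a fixed number of streams."""
    return math.ceil(concurrent_streams / streams_per_server)

# Hypothetical capacity, chosen only to make the arithmetic concrete.
HYPOTHETICAL_STREAMS_PER_SERVER = 1_000

target = servers_needed(100_000, HYPOTHETICAL_STREAMS_PER_SERVER)
discord_scale = servers_needed(2_500_000, HYPOTHETICAL_STREAMS_PER_SERVER)
print(target, discord_scale)  # prints: 100 2500
```

The server count falls in direct proportion to per-server capacity, which is why streams per server is the number to optimize.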
Figure: CPU Servers … compared with GPU Servers; caption: 10x-20x less costly
Scalability with Batching
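Batching here means gathering the current audio frame from many streams and running the network once over all of them, rather than once per stream. The slide gives no implementation details, so the following is a minimal sketch of the idea. `process_batch`, `toy_model`, and the stream names are hypothetical, and the toy model merely halves each sample in place of a real network.

```python
from typing import Callable, Dict, List

Frame = List[float]

def process_batch(
    pending: Dict[str, Frame],
    model: Callable[[List[Frame]], List[Frame]],
) -> Dict[str, Frame]:
    """Run one inference call over the latest frame of every stream."""
    stream_ids = list(pending)                    # fix an order for the batch
    batch = [pending[sid] for sid in stream_ids]  # one row per stream
    outputs = model(batch)                        # single call for all streams
    return dict(zip(stream_ids, outputs))         # route results back by stream

def toy_model(batch: List[Frame]) -> List[Frame]:
    """Hypothetical stand-in for the network: halves every sample."""
    return [[0.5 * x for x in frame] for frame in batch]

pending = {"caller-a": [0.2, 0.4], "caller-b": [1.0, -1.0]}
cleaned = process_batch(pending, toy_model)
```

In a real deployment the batch would have to be assembled and processed within the per-frame compute budget, so batch size trades throughput against latency.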
Ultimate Quality
Diagram: Audio Frame → Remove Noise, Remove Room Echo, Expand Voice HD (5 ms) → Ultimate Quality Audio Frame
Maximum Quality and Scale with NVIDIA Tensor Cores
TensorRT is pretty awesome
Chart: TensorFlow Batching vs. TensorRT Batching on P100, V100, K80 and T4 (y-axis 0 to 3000; bar values not recoverable)
T4 and V100 are both awesome
Chart: FP32 vs. FP16 on P100, V100 and T4 (y-axis 0 to 5000; bar values not recoverable)
Key Takeaways
1. Voice Quality Enhancement is moving to the Cloud
2. For large-scale deployments we need GPUs
3. T4 and V100 GPUs are the most efficient for this
Thank You!
Booth #247