audio synchronization...

33
MPEG-4 Audio Synchronization Masayuki Nishiguchi, Shusuke Takahashi, Akira Inoue Oct 22, 2014 Sony Corporation

Upload: others

Post on 05-Jul-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

MPEG-4 Audio Synchronization

Masayuki Nishiguchi, Shusuke Takahashi, Akira Inoue

Oct 22, 2014

Sony Corporation

Page 2: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Use case

Synchronization Scheme

Audio Feature Extraction tool (Normative)

Audio Feature Similarity Calculation Tool (Informative)

Performance evaluation

Conclusion

Agenda

Page 3: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Audio Synchronization Use case of “Second Screen” Application

Audio Signal of Main Media Stream

Receiver

Sub Device(2nd screen)

Audio FeatureOf Main Media Stream

Main Media Stream

Sub Media Stream

Transmitter

Internet (IP)

Main Device(1st screen)

• foreign language audio tracks• audio commentary• closed caption information• audio/visual contents recorded from various angles• high quality audio/visual contents• advertisement

Synchronization using Audio Feature !

1st screen 2nd screen

Page 4: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

(Synchronous in time)

Synchronization Scheme

DeMUX & Playback

Main Device

DeMUX

Sub Device

Audio Feature Extraction

Audio Feature Similarity

Calculation

Timing Adjustment& Playback

Audio Signal of Main Media Stream

Audio Playback device(such as speaker)

Audio Recording Device(such as microphone)

Audio Feature ofMain Media Stream

(Transmitted)

Sub Media Stream

Audio Feature ofMain Media Stream(Extracted)

Sub MediaPresentation

Main MediaPresentation

Synchronization Information

noise

Main Media Stream

Sub Media Stream

MUX

Audio Feature Extraction

Audio Feature of Main Media StreamAudio Signal

Multiplexed Data Stream- Sub Media Stream- Audio Feature of Main Media Stream

Receiver (Sub device)Transmitter

1st screen

2nd screen

Page 5: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Audio Feature Extraction tool (Normative)

Page 6: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram of Audio Feature Extraction tool

Framing

PeakDetection

Audio Signal(fs=8kHz)

Filt

er

bank

Auto-correlation

Inte

gra

tion

Audio Feature…

Audio FeatureFrame Rate Conversion

Pre Emphasis

Page 7: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Overall Signal Flow

Input Signal(After Pre emphasis filter)

Auto Correlation

Confidence Measure for thisband is less than threshold

Integrated Auto Correlation

Band split signal

Integrate together into singleAuto Correlation

split the audio signalsinto 5 equally spacedfrequency bands in logfrequency domain

1 0.97 ∙

converted into a 128-bit length feature vector

Prominent peak : 1Otherwise : 0

Page 8: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram of Audio Feature Extraction tool

Framing

PeakDetection

Audio Signal(fs=8kHz)

Filt

er

bank

Auto-correlation

Inte

gra

tion

Audio Feature…

Audio FeatureFrame Rate Conversion

Pre Emphasis

Page 9: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Framing

Input

Output

……

……

Feature Extraction

Audio FeatureFrame Rate Conversion

Input frame length: 32msec (256samples), Hamming Window

Input frame interval: 8msec (64sample)

Output frame interval: 8msec or 32msec (audio_sync_feature_time_resolution)

Page 10: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram of Audio Feature Extraction tool

Framing

PeakDetection

Audio Signal(fs=8kHz)

Filt

er

bank

Auto-correlation

Inte

gra

tion

Audio Feature…

Audio FeatureFrame Rate Conversion

Pre Emphasis

Page 11: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Filter Bank

For each audio frame, a pre-emphasis filter is applied to emphasize the high frequency, then band pass filtering is applied in order to split the audio signals into 5 equally spaced frequency bands in log frequency domain.

Page 12: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram of Audio Feature Extraction tool

Framing

PeakDetection

Audio Signal(fs=8kHz)

Filt

er

bank

Auto-correlation

Inte

gra

tion

Audio Feature…

Audio FeatureFrame Rate Conversion

Pre Emphasis

Page 13: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Auto-correlation

∙ , 0 , 0

0 0 , 0

For each band, Auto-correlation is calculated using:

The Auto-correlation is normalized using:

For each frequency band , confidence measure is calculated based on the auto-correlation value.

max , 0

: input frame length, : index of frequency band: index of lag for autocorrelation : order of auto-correlation and is set to 128,: index of the input audio signal. : number of frequency bands and is set 5

Page 14: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram of Audio Feature Extraction tool

Framing

PeakDetection

Audio Signal(fs=8kHz)

Filt

er

bank

Auto-correlation

Inte

gra

tion

Audio Feature…

Audio FeatureFrame Rate Conversion

Pre Emphasis

Page 15: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Integration

∑ ∙∑

, 0

0, 0.31, 0.3

The normalized auto-correlation function values derived from each sub-band are summed together into a single integrated auto-correlation function.

where is defined as following

Page 16: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram of Audio Feature Extraction tool

Framing

PeakDetection

Audio Signal(fs=8kHz)

Filt

er

bank

Auto-correlation

Inte

gra

tion

Audio Feature…

Audio FeatureFrame Rate Conversion

Pre Emphasis

Page 17: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Peak Detection

The integrated auto-correlation function is converted into a 128-bit length feature vector and each bit position corresponds to the lag of the auto-correlation function.

(lag)

……

0 0 1 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0……

Page 18: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Audio Feature Similarity Calculation Tool (Informative)

Page 19: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram Audio Feature Similarity Calculation Tool (Informative)

Time difference between

audio signals

Audio Feature Sequence #1

Audio FeatureBlock Extraction

Block SimilarityCalculation

Audio Feature Sequence #2

Audio Feature Frame Rate Conversion

Audio FeatureBlock Extraction

Audio Feature Frame Rate Conversion

Page 20: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram Audio Feature Similarity Calculation Tool (Informative)

Time difference between

audio signals

Audio Feature Sequence #1

Audio FeatureBlock Extraction

Block SimilarityCalculation

Audio Feature Sequence #2

Audio Feature Frame Rate Conversion

Audio FeatureBlock Extraction

Audio Feature Frame Rate Conversion

Page 21: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Extraction

Audio Feature sequence #1

Audio Feature sequence #2

, , , … . , , 0 N N

, , , … . , , 0 N N

The blocks are generated by concatenating the consecutive audio features

……

……

Block Similarity Calculation is performed between two blocks of audio features.

N

N

N

Page 22: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Diagram Audio Feature Similarity Calculation Tool (Informative)

Time difference between

audio signals

Audio Feature Sequence #1

Audio FeatureBlock Extraction

Block SimilarityCalculation

Audio Feature Sequence #2

Audio Feature Frame Rate Conversion

Audio FeatureBlock Extraction

Audio Feature Frame Rate Conversion

Page 23: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Block Similarity Calculation

J A , B ∩

time

τ

A B ∩ ∪

∪410 0.4

Example

Block Similarity between and is calculated as follows:

Page 24: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Time Difference Estimation

i

j

τ1 Ng’τ0 τ2

Score(τ0) Score(τ1) Score(τ2)

Ng’-τ2

Nf’

-τ0

argmax Score

The time difference which has the largest score is regarded as the time difference between two audio feature sequences:

Each box represents block similarity J ,

the summations for each time difference is performed along the arrows.

For each time difference , a score is calculated by using the block similarity as follows:

Score1

min ′ , ′ max , 0 J ,,

,

Page 25: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Time Difference Estimation

i

delay

j

0 20 40 60

-20-40 80

Example

Page 26: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Performance evaluation

Capturing the 1st screen content and additive noise sound at the 2nd screen.→ The noise contaminated 1st screen content files

Line-out of the 1st screen and the 2nd screen are captured as a single stereo wave file.→ Time difference between the L-ch and

the R-ch in the file is measured

Page 27: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

1st Screen content and additive noise files

Filename of 1st

screen content files Description

1st_betty 5.1down mix (according to ARIB STD-B32) version

of CO_11_Betty3b_output

1st_speech 2 Speech (German Male, SQAM track 54)

1st_music 2 Music (Wind ensemble, SQAM track 67)

Filename of additive noise sound files

Description

File4 noise_pinknoise

File5 noise_speech Speech (English Female, SQAM track 49)

File6 noise_music Music (Eddie Rabbitt, SQAM track 70)

The 1st Screen Content Files

Additive Noise Sound Files

Page 28: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Result

Filename of 1st screen content files

Filename of additive noise sound files

Signal level of the 1st screen content (dB)

0 ‐6 ‐12 ‐18 ‐24 ‐30 ‐36

1st_bettynoise_pinknoise 0.003 0.003 0.007 N/A N/A N/A N/Anoise_speech 0.019 ‐0.009 ‐0.014 ‐0.014 N/A N/A N/Anoise_music ‐0.002 0.001 0.018 0.007 N/A N/A N/A

1st_speechnoise_pinknoise ‐0.004 0.007 0.012 ‐0.001 N/A N/A N/Anoise_speech 0.002 0.014 ‐0.013 0.003 0.008 N/A N/Anoise_music ‐0.005 0.018 0.018 0.000 0.014 ‐0.011 ‐0.009

1st_musicnoise_pinknoise 0.007 ‐0.007 0.015 N/A N/A N/A N/Anoise_speech 0.002 0.008 ‐0.004 0.007 ‐0.016 N/A N/Anoise_music 0.002 0.016 ‐0.007 0.015 0.018 N/A N/A

Time Difference between 1st Screen and 2nd Screen Line-Out Signals (sec)

The figures with orange background is approximately within 1frme length (32ms). → Synchronization is successful !

Page 29: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Result (cont.)Synchronization robustness against interference noise

Allo

wable

SNR for

synchro

niz

ation

measu

red

at 2

nd

scre

en(dB)

pinknoise

‐24

‐21

‐18

‐15

‐12

‐9

‐6

‐3

0

speech music pinknoise speech music pinknoise speech music

betty speech music

Additive noise

1st screencontent

Page 30: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

MPEG-4 Audio Object Type (ISO/IEC 14496-3:2009)

Obj

ect T

ype

ID

Aud

io O

bjec

t Typ

e

gain

con

trol

[…]

Rem

ark

0 Null[..] […]43 SAOC44 LD MPEG Surround45 SAOC-DE46 Audio Sync

47 -95 (reserved)

Page 31: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Demonstration

1st screen(blue walkman): Instrument only

2nd screen(my note PC): Vocal only

Noise(white walkman): Female speech

Same song

Page 32: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

Conclusion

• MPEG-4 Audio Synchronization standard defines:

Audio Feature Extraction tool and syntax of the feature stream(Normative) Feature Similarity Calculation Tool (Informative) The Audio Object Type (AOT=46) “Audio Sync”

to allow transmission of audio feature for synchronization as elementary stream

• The MPEG-4 synchronization mechanism works with highly noisy environment and proven that the scheme is useful under practical conditions.

Page 33: Audio Synchronization Workshop20141022mpeg.chiariglione.org/sites/default/files/files/standards... · 2016-02-05 · Audio Synchronization Use case of “Second Screen” Application

End