new life for old media (nem presentation)

New Life for Old Media: Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive. Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen

Upload: victor-de-boer

Post on 21-Jan-2018






New Life for Old Media

Investigations into Speech Synthesis and Deep Learning-based Colorization for Audiovisual Archive

Rudy Marsman, Victor de Boer, Themistoklis Karavellas, Johan Oomen

Netherlands Institute for Sound and Vision (NISV)

70% audio-visual heritage material

More than 1,000,000 hours of TV (public broadcasters), radio, music, documentaries, film, commercials

Photographs, objects, …

CC BY-SA as the preferred license

3000 items “Internet Quality”

Polygoon newsreels

Supporting a National and European Audiovisual Commons

Public outreach by embracing new technologies and ‘participatory culture’

Explore AI techniques to enrich this archival material to allow for new types of engagement

1. Text-to-speech engine based on a limited corpus from a single narrator

2. Colorization of old black-and-white video footage

Philip Bloemendal

Famous anchorman

Iconic voice

(not a virus)

Limited Domain Speech Synthesis

Can the current corpus of audio recordings of Bloemendal be used to construct a TTS engine?

• What percentage of the Dutch language can be generated with the current corpus?

• What can we do to improve this coverage?

• How recognizable is the text-to-speech engine as Philip Bloemendal?

• How understandable are the constructed audio files?



The Dutch football played Germany

the.wav dutch.wav football.wav

Spoken Language Elements Repository (35,000 words)


Slot-and-filler Text-to-speech

3,300 newsreels, speech recognition
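
The slot-and-filler idea shown above (look up each word of the input sentence in the repository of recorded words, then play the clips back to back) could be sketched roughly as follows. This is only an illustration: the `index` dictionary, the file names, and both function names are assumptions and not taken from the presentation, and the audio step assumes every clip shares the same WAV format.

```python
import wave


def plan_utterance(sentence, index):
    """Return (clips, missing) for a sentence.

    index: hypothetical dict mapping a lowercase word to the path of a
    recording of that word.
    """
    clips, missing = [], []
    for word in sentence.lower().split():
        word = word.strip(".,;:!?\"'")
        if not word:
            continue
        if word in index:
            clips.append(index[word])
        else:
            missing.append(word)
    return clips, missing


def concatenate(clips, out_path):
    """Write the clips back to back into one WAV file.

    Assumes all clips have the same channels, sample width and rate.
    """
    if not clips:
        raise ValueError("no clips to concatenate")
    with wave.open(out_path, "wb") as out:
        for i, path in enumerate(clips):
            with wave.open(path, "rb") as clip:
                if i == 0:
                    out.setparams(clip.getparams())
                out.writeframes(clip.readframes(clip.getnframes()))


# Toy index for illustration only.
index = {"the": "the.wav", "dutch": "dutch.wav", "football": "football.wav"}
clips, missing = plan_utterance("The Dutch football played Germany", index)
print(clips)    # ['the.wav', 'dutch.wav', 'football.wav']
print(missing)  # ['played', 'germany']
```

Words that end up in `missing` are the ones the coverage-expansion strategies would need to handle.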

How to expand the coverage of the index?

• Many (contemporary) words have not been pronounced by Philip Bloemendal

• Multiple strategies
  – Change format (lowercase, diaeresis)
  – Numbers
  – Finding synonyms
  – Decompounding
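
As a minimal sketch of the "change format" strategy, the lookup below tries a word as written, then lowercased, then with diacritics removed. The function names and the toy index are hypothetical, and the code strips all combining marks rather than only the diaeresis, which is a simplification. Number handling is not covered here.

```python
import unicodedata


def strip_diacritics(word):
    """Remove combining marks, so 'geëvacueerd' becomes 'geevacueerd'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def format_variants(word):
    """Yield spellings to try against the index, most faithful first."""
    seen = set()
    for candidate in (word, word.lower(), strip_diacritics(word.lower())):
        if candidate not in seen:
            seen.add(candidate)
            yield candidate


def lookup(word, index):
    """Return the clip for the first variant found in the index, else None."""
    for candidate in format_variants(word):
        if candidate in index:
            return index[candidate]
    return None


# Toy index (assumption): the corpus only has the spelling without diaeresis.
toy_index = {"geevacueerd": "geevacueerd.wav"}
print(lookup("Geëvacueerd", toy_index))
```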

Finding Synonyms

• Open Dutch WordNet: a Dutch lexical semantic database (Postma et al. 2016)

• Yields synsets (e.g. Hoofdmeester -> Rector, Schoolhoofd)

• Computationally expensive lookup
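
The synonym fallback could look roughly like the sketch below. It does not use the Open Dutch WordNet API; instead it assumes the synsets have already been exported to plain lists of words, and all names here are hypothetical. Building the word-to-synonyms map once up front is one possible way to keep the per-word lookup cheap.

```python
def build_synonym_map(synsets):
    """Map each word to the other members of every synset it belongs to."""
    synonyms = {}
    for synset in synsets:
        members = [w.lower() for w in synset]
        for word in members:
            others = synonyms.setdefault(word, [])
            for other in members:
                if other != word and other not in others:
                    others.append(other)
    return synonyms


def find_spoken_synonym(word, synonyms, index):
    """Return the first synonym of `word` that has a recording, or None."""
    for candidate in synonyms.get(word.lower(), []):
        if candidate in index:
            return candidate
    return None


# Toy data based on the example synset on the slide.
synsets = [["hoofdmeester", "rector", "schoolhoofd"]]
synonyms = build_synonym_map(synsets)
spoken = {"rector": "rector.wav"}
print(find_spoken_synonym("Hoofdmeester", synonyms, spoken))  # rector
```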


Decompounding

• Dutch allows compound words, and each compound counts as a distinct word in the corpus

• Decompounding is computationally expensive (for large corpora, long words)

• Constructed Bigrams and Trigrams

School, hoofd -> Schoolhoofd

Regen, water -> regenwater

Staat, hoofd -> StaatShoofd
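
One way to read the examples above is as a search for two or three known words that join into the unknown word, optionally with a linking "s". The sketch below follows that reading; it is a stand-in for illustration and not confirmed to be the authors' implementation. The `vocabulary` set and the `linkers` parameter are assumptions.

```python
def split_compound(word, vocabulary, max_parts=3, linkers=("", "s")):
    """Split `word` into at most `max_parts` known words, or return None.

    vocabulary: set of lowercase words that have a recording.
    linkers: linking elements allowed between parts ("" means none).
    """
    def search(rest, parts_left):
        if rest in vocabulary:
            return [rest]
        if parts_left <= 1:
            return None
        for cut in range(1, len(rest)):
            head = rest[:cut]
            if head not in vocabulary:
                continue
            tail = rest[cut:]
            for linker in linkers:
                if tail.startswith(linker):
                    found = search(tail[len(linker):], parts_left - 1)
                    if found:
                        return [head] + found
        return None

    return search(word.lower(), max_parts)


vocabulary = {"school", "hoofd", "staat", "regen", "water"}
print(split_compound("Schoolhoofd", vocabulary))  # ['school', 'hoofd']
print(split_compound("Staatshoofd", vocabulary))  # ['staat', 'hoofd']
```

Limiting the search to two or three parts mirrors the bigrams and trigrams mentioned on the slide and keeps the cost bounded for long words.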

4 corpora to test against

• News articles (same domain, different time) | 50 articles | 2,743 unique words

• 1970s news articles (same domain, same time) | 50 articles | 16,191 words

• E-books (different domain, various times) | 6 books | 2,657 words

• Tweets (different domain, different time) | 1,000 tweets | 27,180 words

• Evaluation

– Number of distinct words

– Number of sentences
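
The two evaluation measures could be computed along these lines: the share of distinct words that have a recording, and the share of distinct sentences in which every word has one. The tokenizer and function names are assumptions made for this sketch, not the evaluation code used in the study.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters only."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_coverage(texts, index):
    """Return (distinct words found, distinct words in total)."""
    words = {w for text in texts for w in tokenize(text)}
    found = {w for w in words if w in index}
    return len(found), len(words)


def sentence_coverage(sentences, index):
    """Return (distinct sentences fully covered, distinct sentences)."""
    unique = {tuple(tokenize(s)) for s in sentences}
    unique.discard(())
    found = sum(1 for s in unique if all(w in index for w in s))
    return found, len(unique)


# Toy index and sentences for illustration.
known = {"de", "man", "loopt"}
sample = ["De man loopt.", "De vrouw loopt."]
print(word_coverage(sample, known))      # (3, 4)
print(sentence_coverage(sample, known))  # (1, 2)
```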


Results (words): coverage

• 8 people tested the software

• Philip was recognized (or ‘that news guy’)

• Words with more consonants were easier to recognize

• When users entered their own sentences, recognition was higher

• When sentences were demonstrated without subtitles, recognition was lower

• Speed of software / GUI limited testing capabilities

How recognisable are sentences?

The use of Deep Neural Networks in colorizing video

Neural Networks

Recent progress in computational power has made the implementation of Deep Neural Nets possible

Neural Networks trained on a large training set can make accurate predictions on real-world examples

Zhang et al. (2016) trained a neural net on over a million images for colorization

Existing Literature

• Extract individual frames from video using FFMPEG

• Colorize each individual frame

• Re-compile video and attach original audio file

Outcome

Extract 200x200 frames at 24 fps (ffmpeg)

Zhang et al. implemented in

Combine into videos (ffmpeg)

Implementation on Video
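
The three steps (extract frames, colorize each frame, re-compile with the original audio) might be wired together as in the sketch below. The ffmpeg options used are standard ones, but the exact commands from the project are not given in the slides, so the file layout, the codec choice (`libx264`, which assumes an ffmpeg build that includes it), and the `colorize_frame` callback standing in for the Zhang et al. model are all assumptions.

```python
import subprocess
from pathlib import Path


def extract_cmd(video, frames_dir, size=200, fps=24):
    """ffmpeg command that writes numbered PNG frames at the given size."""
    return ["ffmpeg", "-i", str(video),
            "-vf", f"fps={fps},scale={size}:{size}",
            str(Path(frames_dir) / "%06d.png")]


def combine_cmd(colored_dir, original_video, output, fps=24):
    """ffmpeg command that joins frames and copies in the original audio."""
    return ["ffmpeg",
            "-framerate", str(fps),
            "-i", str(Path(colored_dir) / "%06d.png"),
            "-i", str(original_video),
            "-map", "0:v", "-map", "1:a?",  # "?" tolerates a silent source
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-shortest", str(output)]


def colorize_video(video, workdir, output, colorize_frame):
    """Run the pipeline; colorize_frame(src_png, dst_png) is hypothetical."""
    frames = Path(workdir) / "frames"
    colored = Path(workdir) / "colored"
    frames.mkdir(parents=True, exist_ok=True)
    colored.mkdir(parents=True, exist_ok=True)
    subprocess.run(extract_cmd(video, frames), check=True)
    for frame in sorted(frames.glob("*.png")):
        # Each frame is colorized independently of its neighbours.
        colorize_frame(frame, colored / frame.name)
    subprocess.run(combine_cmd(colored, video, output), check=True)
```

Keeping the command construction separate from `subprocess.run` makes it possible to inspect or log the commands without having ffmpeg installed.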

• Colorized videos are more ‘tangible’ and ‘alive’ than black/white

• Showing colorized Polygoon newsreels can augment the TTS engine

• Generally positive responses to the technology may increase attention to the NISV collection


• Each frame is considered independent and is colorized as such

--> Artifacts appear between frames

• Slow performance without use of Nvidia GPU

• Low resolution

• Predicted colors still far from perfect


Hosted on Openbeelden platform

One of the colorized videos received 61,000+ views, 1,700 likes and was shared 521 times, illustrating the potential to engage new audiences.

• Collection-specific TTS systems for audio-enrichments of archive material or multimedia applications.

• Colorization of old media allows for a new view on existing images

• NISV will continue investigating these emerging technologies to enable new types of interaction and to further engage new audiences with archival material in unexpected ways.
  – In the media museum
  – On its public-facing online channels

Take home


Thank you

Annex: Results (sentences)

Dataset            Unique sentences  Unique sentences found  After synsets  After decompounding
Contemporary news              1022                     106            110                  186
Old news                       2626                     183            190                  301
Tweets                         8937                     174            181                  296
Books                         56106                    9387          11385                18271
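
To compare the datasets it can help to express these counts as percentages of the unique sentences. The short sketch below does that arithmetic on the figures from the table; the variable and function names are illustrative only.

```python
# (unique sentences, found, after synsets, after decompounding)
rows = {
    "Contemporary news": (1022, 106, 110, 186),
    "Old news": (2626, 183, 190, 301),
    "Tweets": (8937, 174, 181, 296),
    "Books": (56106, 9387, 11385, 18271),
}


def percentages(total, *found):
    """Express each count as a percentage of `total`, to one decimal."""
    return [round(100 * f / total, 1) for f in found]


for name, (total, *found) in rows.items():
    print(name, percentages(total, *found))
```

For contemporary news, for example, this gives roughly 10.4% of sentences found directly, 10.8% after synsets, and 18.2% after decompounding.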