ai based character recognition and speech synthesis

Seminar on

“AI Based Character Recognition and Speech Synthesis”

Developed By:

Kalyani Hadke Rani Kubetkar

Shreya Surjuse Ankita Jadhao

Kruttika Sorte

Guided By

Prof. H. N. Datir

Need

• We face many problems in daily life. For example, when we capture an image of text, the image is sometimes poor and the words cannot be recognized.

• Many people are affected by illiteracy, so we want the text in an image to be converted into machine-readable text for various purposes.

• While studying, we do not always read text as a regular practice, so we want this text to be converted into audio.

• Beyond that, we want text captured in an image to be converted into audio, just as we generally prefer listening to songs.

Introduction to CR and SS

• Optical Character Recognition (OCR) is an electronic or mechanical conversion process.

• OCR converts scanned images of text into machine-encoded text.

• Speech synthesis is the artificial production of human speech.

• Speech synthesizer: a computer system used for this purpose.

• A TTS (text-to-speech) engine converts:

• Normal language text into speech

• Symbolic linguistic representations into speech

[Figure: Image → Recognized text → Speech engine → Speech]

Overview

DFD for Character Recognition System

[Figure: data flow diagram with the stages Pre-processing, Segmentation, Image digitization, Network implementation, Training of learning network, Recognition, Network testing]

Pre-processing

• De-noising

• De-skewing

• Binarization
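
Two of the pre-processing steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the fixed threshold of 128, the "isolated pixel" noise rule, and the function names are not from the slides, and de-skewing is left out because it needs a rotation estimate.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0 = white, 1 = black.

    The global threshold of 128 is an assumption chosen for illustration.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


def remove_isolated_pixels(binary):
    """De-noise: clear black pixels that have no black 8-neighbour."""
    height, width = len(binary), len(binary[0])
    cleaned = [row[:] for row in binary]
    for y in range(height):
        for x in range(width):
            if binary[y][x] != 1:
                continue
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        neighbours += binary[ny][nx]
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


# Hypothetical 4x5 grayscale patch: a small dark stroke plus one dark speck.
gray = [
    [255, 255, 255, 255, 255],
    [255,  20,  30, 255, 255],
    [255,  25, 255, 255, 255],
    [255, 255, 255, 255,  10],
]
binary = binarize(gray)
clean = remove_isolated_pixels(binary)
```

In this patch the speck in the bottom-right corner is removed, while the three connected stroke pixels are kept.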

Image segmentation

• Decomposes a sequence of characters into individual symbols.

• Directly affects the recognition rate of the script.

• Locates and identifies the boundaries of the image.

1. External segmentation

2. Internal segmentation

SEGMENTATION

Image segmentation is the process of partitioning an image into multiple segments, so as to change the representation of the image into something that is more meaningful and easier to analyze.

• External segmentation: determines the character lines in the text.

• Internal segmentation: decomposes an image of a sequence of characters into images of individual symbols.
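
One simple way to realise both kinds of segmentation is with projection profiles: rows with no black pixels separate text lines, and columns with no black pixels separate symbols. The slides do not name a method, so this technique and the helper names below are assumptions used only for illustration.

```python
def split_runs(profile):
    """Return (start, end) index pairs for each run of non-zero values."""
    runs, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def external_segmentation(page):
    """Find text lines: runs of rows that contain at least one black pixel."""
    row_profile = [sum(row) for row in page]
    return [page[top:bottom] for top, bottom in split_runs(row_profile)]


def internal_segmentation(line):
    """Find symbols in one line: runs of columns that contain black pixels."""
    col_profile = [sum(col) for col in zip(*line)]
    return [
        [row[left:right] for row in line]
        for left, right in split_runs(col_profile)
    ]


# Hypothetical binary page (1 = black): two symbols on line 1, one on line 2.
page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
lines = external_segmentation(page)
symbols_per_line = [len(internal_segmentation(line)) for line in lines]
first_symbol = internal_segmentation(lines[0])[0]
```

This sketch cannot separate characters that touch each other, which is one reason segmentation quality directly affects the recognition rate.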

• Mapping of the symbol image into a corresponding two-dimensional binary matrix

• Issue: deciding the size of the matrix

• Sampling strategy for mapping the symbol

Image Digitization - Matrix matching

[Figure: input alphabet ‘a’ → segmented grid → digitization]

• To feed the matrix data to the network, it must be linearized into a single dimension.
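
The mapping and linearization steps can be sketched as follows. Nearest-neighbour sampling and the 2 x 2 target size are assumptions chosen to keep the example small; the slides leave both the matrix size and the sampling strategy open.

```python
def digitize(symbol, rows, cols):
    """Sample a binary symbol image onto a fixed rows x cols grid.

    Uses nearest-neighbour sampling, which is only one possible strategy.
    """
    src_rows, src_cols = len(symbol), len(symbol[0])
    return [
        [symbol[r * src_rows // rows][c * src_cols // cols] for c in range(cols)]
        for r in range(rows)
    ]


def linearize(matrix):
    """Flatten a two-dimensional matrix row by row into one dimension."""
    return [bit for row in matrix for bit in row]


# Hypothetical 4x4 symbol image reduced to a 2x2 matrix.
symbol = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
grid = digitize(symbol, 2, 2)
vector = linearize(grid)
```

Every symbol is brought to the same grid size so that the network always receives an input vector of the same length.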

[Figure: the rows of the binary matrix (001110100…, 111010011…, 11001100…, 000111101…) are fed as input to the neural network]

• Image of the scanned document.

• Sub-images of individual letters from the document.

• Binary representation of the sub-images, e.g. 0 is white and 1 is black.

• A supervised neural network that has been trained to recognize images of characters.

• The neural network outputs numeric values corresponding to the recognized characters.

• A file contains the text of the scanned document.

An artificial neural network consists of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems, analogous to the biological neurons in the brain. Neurons communicate through weighted links.

[Figure: two neurons joined by a weighted link; each neuron applies a summation followed by a sigmoid function to produce its output]
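
A single neuron of this kind, a weighted summation followed by a sigmoid function, can be written as a minimal sketch. The bias term and the example weights are assumptions added for illustration.

```python
import math


def sigmoid(x):
    """Logistic sigmoid: squashes any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus a bias, passed through the sigmoid."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(total)


# Hypothetical inputs and weights: the weighted sum is 0.5 - 0.5 = 0.
activation = neuron_output([1, 0, 1], [0.5, -0.5, -0.5])
```

With a weighted sum of zero the neuron sits exactly in the middle of the sigmoid, so its output is 0.5.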

• Feed-forward neural network

• A multilayer perceptron

• Teaching and adaptation of the ANN

• Implementation of the ANN

Neural Network

[Figure: input signal → input layer → first hidden layer → second hidden layer → output layer → output signal]
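
The layered structure above can be illustrated with a feed-forward pass through a small multilayer perceptron. The layer sizes (9 inputs for a linearized 3 x 3 matrix, hidden layers of 6 and 4 neurons, 2 outputs) and the random initial weights are assumptions; an untrained network like this produces meaningless outputs until it is taught.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_inputs, n_neurons, rng):
    """One weight row per neuron: n_inputs weights plus a trailing bias."""
    return [
        [rng.uniform(-1.0, 1.0) for _ in range(n_inputs + 1)]
        for _ in range(n_neurons)
    ]


def forward(layers, signal):
    """Propagate the input signal through each layer in turn."""
    for layer in layers:
        signal = [
            sigmoid(sum(w * x for w, x in zip(neuron[:-1], signal)) + neuron[-1])
            for neuron in layer
        ]
    return signal


rng = random.Random(0)
sizes = [9, 6, 4, 2]  # input, first hidden, second hidden, output
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Hypothetical linearized 3x3 binary matrix as the input signal.
output = forward(layers, [0, 1, 0, 1, 1, 1, 0, 1, 0])
```

Each output value lies between 0 and 1; after training, the largest output would indicate the recognized character class.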

Recognition Network Testing

[Figure: binary converted image → neural network (input signal → output signal) → obtained text of scanned image; the error is calculated by back-propagation]
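
How back-propagation uses the output error to adjust the weights can be shown with a tiny network that has one hidden layer and one output. The two 3 x 3 training patterns, the learning rate of 0.5, the 500 epochs, and the network size are assumptions for illustration, not values from the slides.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_hidden, w_out, x):
    """Return the hidden activations and the single output for input x."""
    xb = x + [1.0]  # constant 1.0 acts as the bias input
    hidden = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    output = sigmoid(sum(w * v for w, v in zip(w_out, hidden + [1.0])))
    return hidden, output


def train_step(w_hidden, w_out, x, target, rate):
    """One back-propagation update; returns the squared error before it."""
    hidden, output = forward(w_hidden, w_out, x)
    error = target - output
    delta_out = error * output * (1.0 - output)
    # Hidden deltas are computed with the output weights before they change.
    delta_hidden = [
        delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
        for j in range(len(hidden))
    ]
    hb = hidden + [1.0]
    for j in range(len(w_out)):
        w_out[j] += rate * delta_out * hb[j]
    xb = x + [1.0]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += rate * delta_hidden[j] * xb[i]
    return error ** 2


# Hypothetical training set: linearized 3x3 patterns and their targets.
samples = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], 1.0),  # vertical bar
    ([0, 0, 0, 1, 1, 1, 0, 0, 0], 0.0),  # horizontal bar
]
rng = random.Random(1)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in range(3)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(4)]

errors = []
for epoch in range(500):
    errors.append(
        sum(train_step(w_hidden, w_out, x, t, rate=0.5) for x, t in samples)
    )
```

The list `errors` records the total squared error per epoch, so comparing its first and last entries shows whether training reduced the error.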

Speech Synthesis

[Figure: Input image → File containing text of scanned document → TTS engine (NLP → DSP) → Speech]

• TTS: text-to-speech engine

• A computer-based system that reads any text aloud.

• A TTS engine consists of a front end (NLP) and a back end (DSP).

Modules of Text-to-Speech

Figure 1. A simple but general functional diagram of a TTS system: text (input) → natural language processing (text preprocessing, text analysis, linguistic analysis) → phonemes and prosody → digital signal processing (speech synthesizer) → speech (output)

NLP Module

• This step is also called the high-level, front-end, or text-to-phoneme stage.

• It consists of the following parts: text analysis, automatic phonetization, and prosody generation.

Text Analysis

• Pre-processing

• Morphological analysis

• Contextual analysis

• Syntactic-prosodic parsing
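
The pre-processing part of text analysis usually normalizes raw text into plain words before any further analysis. The sketch below is a toy version: the abbreviation table, the digit-by-digit reading of numbers, and the function name are all assumptions made for illustration.

```python
import re

# Hypothetical lookup tables; a real system would use far larger ones.
ABBREVIATIONS = {"dr.": "doctor", "prof.": "professor", "etc.": "et cetera"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def preprocess(text):
    """Lower-case the text, expand abbreviations, and spell out digits."""
    tokens = []
    for raw in text.lower().split():
        if raw in ABBREVIATIONS:
            tokens.append(ABBREVIATIONS[raw])
            continue
        word = raw.strip(".,;:!?\"'()")
        if re.fullmatch(r"[0-9]+", word):
            # Numbers are read digit by digit to keep the sketch short.
            tokens.extend(DIGITS[int(d)] for d in word)
        elif word:
            tokens.append(word)
    return tokens


words = preprocess("Prof. Datir scanned 25 pages.")
```

Here `words` becomes a list of plain lower-case words, with "Prof." expanded and "25" spelled out as "two", "five".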

Automatic Phonetization

• Rule-based

• Dictionary-based

• Hybrid approach
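
The three approaches can be combined in a few lines: look the word up in a pronunciation dictionary first, and fall back to letter-to-sound rules when it is missing. The lexicon entries, the rule table, and the ARPAbet-style phoneme symbols below are hypothetical and far too small for real use.

```python
# Hypothetical pronunciation dictionary (dictionary-based approach).
LEXICON = {
    "speech": ["S", "P", "IY", "CH"],
    "text": ["T", "EH", "K", "S", "T"],
}

# Hypothetical letter-to-sound rules (rule-based approach).
GRAPHEME_RULES = {
    "ch": "CH", "sh": "SH", "th": "TH", "ee": "IY", "oo": "UW",
    "a": "AE", "b": "B", "c": "K", "d": "D", "e": "EH", "f": "F",
    "g": "G", "h": "HH", "i": "IH", "j": "JH", "k": "K", "l": "L",
    "m": "M", "n": "N", "o": "AA", "p": "P", "q": "K", "r": "R",
    "s": "S", "t": "T", "u": "AH", "v": "V", "w": "W", "x": "K S",
    "y": "Y", "z": "Z",
}


def rule_based(word):
    """Greedy left-to-right matching, trying two-letter rules first."""
    phonemes, i = [], 0
    while i < len(word):
        for size in (2, 1):
            chunk = word[i:i + size]
            if chunk in GRAPHEME_RULES:
                phonemes.extend(GRAPHEME_RULES[chunk].split())
                i += len(chunk)
                break
        else:
            i += 1  # skip characters that have no rule
    return phonemes


def phonetize(word):
    """Hybrid approach: dictionary lookup with a rule-based fallback."""
    word = word.lower()
    return LEXICON.get(word) or rule_based(word)


from_dictionary = phonetize("Speech")
from_rules = phonetize("ship")
```

The dictionary handles irregular words exactly, while the rules give a rough pronunciation for any word that is not listed.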

Prosody Generation

• Intonation

• Rhythm

DSP component

• Low level: phoneme to speech

• There are two main technologies used for generating synthetic speech waveforms:

• Concatenative synthesis

• Formant synthesis

Formant Synthesis

• Formant synthesis is rule-based synthesis.

• It does not use any human speech samples at runtime.

• The waveform is created using an acoustic model of the human vocal tract.

• It generates artificial, somewhat robotic speech.
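
A very reduced picture of formant synthesis is an impulse train (standing in for the vocal folds) passed through a cascade of second-order resonators, one per formant. The resonator form, the formant frequencies (approximate textbook values for an "ah" vowel), the bandwidths, the pitch, and the sample rate are all assumptions for illustration; real formant synthesizers use many more parameters and rules.

```python
import math


def resonator(signal, freq, bandwidth, rate):
    """Second-order digital resonator centred on one formant frequency."""
    t = 1.0 / rate
    c = -math.exp(-2.0 * math.pi * bandwidth * t)
    b = 2.0 * math.exp(-math.pi * bandwidth * t) * math.cos(2.0 * math.pi * freq * t)
    a = 1.0 - b - c
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(formants, pitch=120, duration=0.2, rate=16000):
    """Excite a cascade of formant resonators with a periodic impulse train."""
    n = round(duration * rate)
    period = rate // pitch
    signal = [1.0 if i % period == 0 else 0.0 for i in range(n)]
    for freq, bandwidth in formants:
        signal = resonator(signal, freq, bandwidth, rate)
    return signal


# Assumed (frequency in Hz, bandwidth in Hz) pairs for an "ah"-like vowel.
AH_FORMANTS = [(730, 90), (1090, 110), (2440, 170)]
samples = synthesize_vowel(AH_FORMANTS)
```

No recorded speech is involved at any point, which is why the result sounds artificial but needs very little storage.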

Concatenative Synthesis

• Based on the concatenation of segments of recorded speech.

• Gives the most natural-sounding synthesized speech.

Subtypes

• Diphone concatenation synthesis: somewhat robotic speech, sonic glitches.

• Unit concatenation synthesis: natural speech.

Approaches to Waveform Generation: Concatenative

• Unit concatenation synthesis: algorithm

• Break the language down into small units (phonemes, syllables, etc.).

• Create a large database of recorded speech.

• Label each unit: pitch, duration, prosody, position in syllable, etc. Labeling is synthesizer-dependent.

• Select the target utterance at runtime by determining the best chain of units (HMM, decision tree).

• Use DSP to smooth transitions between units.
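
Choosing the "best chain of units" can be illustrated with a small dynamic-programming search. This is a stand-in for the HMM or decision-tree selection named above, not an implementation of either: the cost functions (pitch distance to the target and pitch jump between neighbouring units), the unit database, and all names are hypothetical.

```python
def select_units(targets, database):
    """Pick one recorded unit per target, minimising target + join cost.

    targets:  list of (phoneme, desired_pitch) pairs
    database: phoneme -> list of units, each a dict with "id" and "pitch"
    Returns (total_cost, chosen_units).
    """
    first_phoneme, first_pitch = targets[0]
    chains = [
        (abs(unit["pitch"] - first_pitch), [unit])
        for unit in database[first_phoneme]
    ]
    for phoneme, pitch in targets[1:]:
        new_chains = []
        for unit in database[phoneme]:
            target_cost = abs(unit["pitch"] - pitch)
            # Cheapest earlier chain once the join cost to this unit is added.
            cost, path = min(
                ((c + abs(p[-1]["pitch"] - unit["pitch"]), p) for c, p in chains),
                key=lambda item: item[0],
            )
            new_chains.append((cost + target_cost, path + [unit]))
        chains = new_chains
    return min(chains, key=lambda item: item[0])


# Hypothetical database with two recordings per phoneme.
database = {
    "HH": [{"id": "hh_1", "pitch": 100}, {"id": "hh_2", "pitch": 140}],
    "AY": [{"id": "ay_1", "pitch": 105}, {"id": "ay_2", "pitch": 150}],
}
cost, chain = select_units([("HH", 120), ("AY", 120)], database)
chosen_ids = [unit["id"] for unit in chain]
```

For this data the cheapest chain is `hh_1` followed by `ay_1`, because their pitches are close to each other as well as to the target. The final DSP smoothing between units is not shown.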

Advantages

• Machine language translation

• Information retrieval

• Visual issues (difficulty seeing text)

• Motor issues (difficulty handling a book or paper)

Questions?
