detection and extraction of artificial text from videos
Post on 05-Jan-2016
38 Views
Preview:
DESCRIPTION
TRANSCRIPT
Detection and Extraction of Artificial Text from Videos
PROJECT France Télécom Research & Development 001B575
Laboratoire de Reconnaissance de Formes et VisionBât. Jules Verne INSA
69621 Villeurbanne CEDEX
10th July 2001
Christian Wolf and Jean-Michel Jolion
http://rfv.insa-lyon.fr/~{wolf,jolion}
Plan of the presentationIntroductionDetection Image enhancement - multiple frame
integrationBinarisation of the text boxesSetup of the experimentsResults
Detection Binarisation OCR
Conclusion and outlook
683
1011
6
246
Slides:
Intro Detection Enhancement Binarisation ResultsExperiments
Content based image retrieval
SimilarityFunction
ResultExample image
Indexing phase
Detection Enhancement Binarisation ResultsExperimentsIntro
Similarity measures
similar
similar
Not similar
Detection Enhancement Binarisation ResultsExperimentsIntro
Indexing using Text
Keyword basedSearch
Patrick Mayhew
Patrick MayhewMin. chargé de l´irlande de NordISRAELJerusalemmontageT.Nouel...............
ResultKey word
Indexing phase
Detection Enhancement Binarisation ResultsExperimentsIntro
Video properties
80 px
12 px 8 px
Detection Enhancement Binarisation ResultsExperimentsIntro
Text extraction: general scheme
TrackingDetection of the text in single frames
Image enhancement - Multiple frame integration
Segmentation/Binarisation
OCR
"EVENEMENT""ACTU""SPELEOS""Gouffre Berger (Isére)""aujourd'hui""France 3 Alpes""un spéléologue sauveteur"
Video
Intro Detection Enhancement Binarisation ResultsExperiments
Detection in single frames
Calculation of the gradient
Accumulation
Binarisation
MathematicalMorphology
Connected componentsAnalysis
Verification of geometric constraints
Combination of the rectangles
Verification of special cases
Video
List ofrectangles
Intro Detection Enhancement Binarisation ResultsExperiments
Detection in single frames: examples
Intro Detection Enhancement Binarisation ResultsExperiments
A filter for text detection
SG yx
1,
2S
x
2S
x
dxxI
Accumulation of horizontal gradients. Justification: Text forms a regular texture containing vertical edges which are aligned horizontally. W
0m 1m
M-W
fk
1
2
01 ))(1( mmMW
MW
S
Intro Detection Enhancement Binarisation ResultsExperiments
Mathematical morphology
Close
Deletion of small bridgesbetween the components
dilate (special) toconnect characters
erode (special) toconnect characters
erode horizontally
dilate horizontally
Intro Detection Enhancement Binarisation ResultsExperiments
Detection in video sequences
Detection per single frame
List of rectanglesper frame
Tracking -keeping track of text occurrences
Suppression offalse alarms
Image Enhancement -Multiple frame integration
Text occurrences
Frame nr.(time)
Intro Detection Enhancement Binarisation ResultsExperiments
Integration of the rectangles occurrences
At every new frame, the detected rectangles must be matched with the stored text occurrences
List of rectangles detected for the current frame
Text occurrences
Frame nr.(time)
List containing the most recent rectangleof each text occurrence
The integration is done using overlap information (overlap matrix)
Intro Detection Enhancement Binarisation ResultsExperiments
Suppression of false alarms: Examples
All detections
Aftersuppression
of false alarms
Intro Detection Enhancement Binarisation ResultsExperiments
Image enhancementSuper-resolution(interpolation)
Multiple frame integration:Averaging
Intro Detection Enhancement Binarisation ResultsExperiments
Integration of multiple frames to create a single image of higher quality.
)(
)()(1
1
k
kk
i
i
k
MV
MMMFg
M1
M4
M2
M3
Fi ith imageM Mean imageV Std.deviation image
An additional weight is included into the interp.scheme:
Robust bi-linear Robust bi-cubic
Interpolation: Examples
Bi-linear interpolation
Robust bi-linear interpolation
Robust bi-cubic interpolation
Intro Detection Enhancement Binarisation ResultsExperiments
Interpolation: thresholded examples
Bi-linear interpolation
Robust bi-linear interpolation
Robust bi-cubic interpolation
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation
Different Binarisation algorithms have been implemented and evaluated:
• Fisher/Otsu and windowed Fisher/Otsu algorithm
• Yanowitz-Bruckstein• Niblack, Sauvola• Our adaptive version of Niblack/Sauvola´s
method.
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation methodsYanowitz Bruckstein: The threshold surface is calculated from the edge information.
Windowed-Fisher, Niblack-Sauvola: The threshold surface is calculated from the statistics collected in a window which is shifted across the image.
Threshold surface
Threshold surface
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation by Niblack
Niblack proposed a method which calculates a threshold surface by gliding a rectangular window over the image and calculating statistics on this window:
skmT m mean s standard deviationk parameter, = -0.2
0
50
100
150
200
250
300
value
mean
std.dev.
niblack
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation by Niblack: ProblemsProblems are light textures in the background, which are considered as text with small contrast:
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation: Improvement by Sauvola
m mean s standard deviationk parameter, = 0.5R parameter (dynamic
range of std.dev.), R = 128
To overcome these problems, Sauvola et al. proposed a new improved formula to calculate the threshold:
))1(1(Rs
kmT
Reformulation shows, that a hypothesis on the gray values of text and non-text are used to remove the noise produced by background textures:
kmmT Rs1
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation by Sauvola, examples
Original image
Binarised using Niblack´s method
Binarised using Sauvola et al.´s method
Intro Detection Enhancement Binarisation ResultsExperiments
Improvement: Adaptive dynamic range
Nib
Sauv.R=128
R ad.
Fixing the dynamic range R=128 might be ok for document images, but not for text boxes taken from videos. Binarisation will not be correct, if the contrast of the image is smaller. We therefore set the parameter R to the
maximum standard deviation for all windows calculated:
)max(sR
To avoid two passes of the windowing algorithm, the mean and standard deviation can be stored in a table during the first pass and the threshold
surface calculated on this data.
Intro Detection Enhancement Binarisation ResultsExperiments
Improvement: Shift of the image rangeThe strong hypothesis on the gray values (text pixels must be near zero) is not justified for some video text boxes:
Gray value histogram
Niblack
SauvolaR=128
R ad.
kmmT
Intro Detection Enhancement Binarisation ResultsExperiments
Improvement: Shift of the image rangeA correction of the image´s histogram resolves this problem:
Original image Corrected image binarised, R adaptive
Intro Detection Enhancement Binarisation ResultsExperiments
)max(sR
)min(IM
m mean s standard deviationk parameter, = 0.5R = maximum of the
std.dev. of all windows
M = minimum gray value of the text box
The same effect can also be achieved by changing the threshold formula:
Rs1
)( MmkmT
Fast incremental calculationMean and variance can be calculated in one pass:
)(1 2
2
Na
bN
s aN
m1
ixa 2
ixb
RyxLyx
tt yxIyxIaa),(),(
1 ),(),(
2
),(),(
21 ),(),(
RyxLyx
tt yxIyxIbb
L
R
At the beginning of each line, the full window is calculated and the variables a and b kept. After each shift, a and b are calculated incrementally by subtracting the column of pixels which left the window and adding the column which entered the window.
Mean and standard deviation are stored in 2d tables, then the maximum R=max(s) is computed before calculating the threshold surface
Intro Detection Enhancement Binarisation ResultsExperiments
The experimentsDescription of the experimentsThe videos used in the experiments.Description of the evaluation process (OCR Evaluation).
Results for:Text detection BinarisationOCR
Intro Detection Enhancement Binarisation ResultsExperiments
Test videos
We performed experiments on 5 different MPEG 1 videos of resolution 384x288:
Video Length(min) Length(frames) OccurencesAIM2,AIM3,AIM4,AIM5 provided by INA. We used the first 15000 frames of each file
40 60000 322
Video provided by France Télécom, used in its entire length
22 33000 91
Total 62 93000 413
Intro Detection Enhancement Binarisation ResultsExperiments
AIM3News
AIM4Cartoon, News
AIM5News
AIM2Commercials
Intro Detection Enhancement Binarisation ResultsExperiments
Video example - France Télécom
~22 minutes of video~33000 frames
Intro Detection Enhancement Binarisation ResultsExperiments
The interface to the OCR softwareIdeal situation:Pass individual (binarised) text boxes to an OCR software which recognises the contents box after box.
In reality:We used standard commercial OCR software for our tests. This software has been designed to recognise scanned A4 or US letter pages and cannot directly process text boxes.
A4 page
Intro Detection Enhancement Binarisation ResultsExperiments
OCR Page - Manual An input image,ready for the OCR
Intro Detection Enhancement Binarisation ResultsExperiments
OCR Output051Q07Ô7 N*Verf 05JQ0707PUBLICITE IPUBIIÏITE IPUBLICITEprenez prenez prenezboyard boyard boyard^française ^française ^françaiseFRANCE FRANCE FRANCE FRANCE FRANCEc'est plus musclé c'est plus muscléiï 'Jfort fort fort fort fort .fort .fort .fortcotHfUet blé cotHfUet blé cQ#tfUet bléuutàfruuk On va beaucoup {&*$ loin avec Itineris.Partout Partout Partout Partout Partout Partout Partout Partout Partout PartoutI22h35 I22h35 I22h35 I22h35 I22h35PUBLICITE \PUBLICITE \PUBLICITE>3h55l23h55l23h55l23h55l23h55l23h55 20h.50120h50 |20h50120h50 |20h50120h50,f ort boyard ,f ort boyard ,f ort boyard ,f ort boyard2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg J 2,4 Kg g 2,4 Kg JII II II II II II II II IIgà dents gà dents gà dentsIIH r Lessive classique lljir Lessive classique I[HT Lessive classiquele temps le temps le temps le temps le temps ^PUBLICITE ^PUBLICITE ^PUBLICITEI Par Amour du Goût. Il Par Amour du Goût. Ien en en en en en en en enrévolution révolution révolution
Intro Detection Enhancement Binarisation ResultsExperiments
Post processing of OCR output
23h55051Q07Ô7
PUBLICITEprenez
boyard^françaiseFRANCEc'est plus musclé
fort
blé cotHfUetuutàfruukOn va beaucoup {&*$ loin avec Itineris.
PartoutI22h35PUBLICITE \>3h55l20h.50
,f ort boyard
dimanche23h55N Vert 05100707BerlingoPUBLICITEprenezdiffusion simultanée en stéréo surboyardfrançaiseFRANCEc'est plus muscléPUBLICITEfortCoralblé completfruitsOn va beaucoup Plus loin avec Itineris.BohêmePartout22h35PUBLICITE23h5520h50fortfort boyard
Post processed OCR output Ground truth
Intro Detection Enhancement Binarisation ResultsExperiments
Automatic evaluation using markersThe manual processing of the OCR output (separation of the output strings and search of the corresponding input box) is time consuming and error prone, especially in cases where the quality of the OCR output is very poor.
Automatic OCR output processing can be achieved by placing marker images between the text boxes. The marker boxes contain text which is easily recognised by the OCR software.
In the results section we will present results for both types of evaluation.
Intro Detection Enhancement Binarisation ResultsExperiments
An input image with markers,ready for the OCR
Intro Detection Enhancement Binarisation ResultsExperiments
OCR Evaluation
Tkenchar 037 Tkenchar 037'gfrançaise 'gfrançaise 'gfrançaise 'gfrançaiseTkenchar 038 Tkenchar 038Mpe pire de| fj^e pire de| fj^e pire de|Tkenchar 039 Tkenchar 039
@S Par Amour du Goût.@S en@S révolution@S la@S française@S le pire de@S 20H45
OCR output Raw ground truth
Search output for individual text boxes
List of strings, each corres-ponding to the output for a text box, but eventually multiple times
# Page 1:P 1T 1 2M 1 2T 2 3M 2 2T 3 2
Structure log
Prepare ground truth
List of strings, each corresponding to the ground truth for a text box. Each string is repeated the same number of times as the corresponding text image in the OCR input image
Evaluation
Transformation costRecallPrecision
Intro Detection Enhancement Binarisation ResultsExperiments
OCR Evaluation: Wagner & Fischer
ba ),( bacost
b ),( bcost
a ),( acost
Substitution:
Insertion:
Deletion:
AirbagGtroônn
Airbag Citroën
A measure for resemblance of two character strings. The cost to transform string A into string B is calculated. Basic transformation operations are used, which correspond to a certain cost. The cost function is minimised.
Intro Detection Enhancement Binarisation ResultsExperiments
Detection results - INA VideosAIM2 % AIM3 % AIM4 % AIM5 % Total %
Pred. Text 102 93,6 80 94,1 59 93,7 60 92,3 301 93,5Pred. Non-Text 7 5 4 5 21Total 109 85 63 65 322
Positives 114 78 72 86 350False alarms 138 185 374 250 947Logos 12 0 34 29 75Scene text 22 5 28 17 72Pos+Log+Scene 148 51,7 83 31,0 134 26,4 132 34,6 497 34,4Total 286 268 508 382 1444
Video FT AIM2-5 TotalPred. Text 82 90,1 301 93,5 383 92,7Pred. Non-Text 9 21 30Total 91 322 413Positives 189 350 539False alarms 482 947 1429Logos 21 75 96Scene text 50 72 122Pos+Log+Szene 260 35,0 497 34,4 757 34,6Total 742 1444 2186
85 93,416
911898362252
263 23,91099
Video FT
No suppression of false alarms
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation methods: Examples
Original image
Fisher
Fisher (windowed)
Yanowitz B.
Yanowitz B. + PP
Niblack
Sauvola et al.
Our method
Intro Detection Enhancement Binarisation ResultsExperiments
Binarisation methods: ExamplesOriginal image
Fisher
Fisher (windowed)
Yanowitz B.
Yanowitz B. + PP
Niblack
Sauvola et al.
Our method
Intro Detection Enhancement Binarisation ResultsExperiments
OCR Results - Classification by binarisation method
Input Bin. method Recall Precision CostAIM2 Niblack 67,4 87,5 499
Sauvola R=128 53,8 87,6 616,5R=ad 75,0 87,8 384,5R=ad, shift 78,4 90,4 344,5
AIM3 Niblack 92,5 78,1 196Sauvola R=128 69,9 89,6 206R=ad 85,3 92,5 110R=ad, shift 96,2 95,3 51,00
AIM4 Niblack 78,5 92,0 252,00Sauvola R=128 48,6 87,7 490,50R=ad 69,8 84,8 360,50R=ad, shift 80,1 90,4 211,50
AIM5 Niblack 62,1 71,4 501,50Sauvola R=128 66,7 89,3 324,50R=ad 64,8 90,1 328,00R=ad, shift 69,0 91,0 294,50
Total Niblack 73,1 82,6 1448,5Sauvola R=128 58,4 88,5 1637,5R=ad 73,0 88,4 1183R=ad, shift 79,6 91,5 901,5
Results obtained using the manual evaluation method (no markers in the input page).
44 pages
Robust bi-cubic interpolation
Intro Detection Enhancement Binarisation ResultsExperiments
OCR Results: Interpolation methods
Input Chars gt Chars output Recognized Recall Precision CostAIM2 1216,7 931,0 732,4 60,2 78,7 687,3AIM3 765,6 732,3 577,9 75,5 78,9 365,0AIM4-french 568,1 466,0 438,7 77,2 94,1 188,5AIM4-german 279,9 285,3 255,4 91,3 89,5 65,7AIM5 840,8 552,1 448,6 53,4 81,3 524,6Total 3671,0 2966,8 2453,0 66,8 82,7 1831,0
Input Chars gt Chars output Recognized Recall Precision CostAIM2 1216,7 1057,1 805,0 66,2 76,1 672,0AIM3 765,6 659,6 540,5 70,6 81,9 340,4AIM4-french 568,1 484,7 428,1 75,4 88,3 236,7AIM4-german 279,9 240,5 210,0 75,1 87,4 101,2AIM5 840,8 540,4 386,3 45,9 71,5 642,8Total 3671,0 2982,3 2369,9 64,6 79,5 1993,1
Robust bi-linear interpolation
Robust bi-cubic interpolation
97 pages
Results obtained using the automatic evaluation method (including markers in the input page).
Input Chars gt Chars output Recognized Recall Precision CostVideo FT 1626,8 878,3 677,8 41,7 77,2 1199,6
Robust bi-cubic interpolation
Intro Detection Enhancement Binarisation ResultsExperiments
ConclusionWe developed a system for detection, tracking,
enhancement and binarisation of text.A detection performance of 93.5% is obtained.We derived a new binarisation method adapted to the type
of text found in videos.The total recognition rate is surprisingly high, given the
quality of the text, but not yet good enough for indexation purposes.
OCR integration problem: No software development kits for direct access to the recognition functions available. A collaboration with an OCR company seems to be inevitable.
Intro Detection Enhancement Binarisation ResultsExperiments
OutlookThe perspectives of our work are situated in the extension of the existing algorithms to text with more difficult properties, and the enhancement and deeper studies of the existing techniques:
Scene text: The binarisation techniques developed in the last 30 years are aimed either at document images or images from computer vision. The method we introduced in the framework of this project is an improvement of the work already presented, but the quality of the text is not yet satisfying enough. Especially the binarisation of scene text will demand the development of new methods.
Detection recall: We are convinced, that the recall of the detection system can still be increased by further research, e.g. on the binarisation technique applied to the map of accumulated gradients.
Intro Detection Enhancement Binarisation ResultsExperiments
top related