Text Detection in Video
Min Cai
2002.3.13
Background
Video OCR: Text detection, extraction and recognition
Detection Target: Artificial text
Text detection:
  Detect the region from a single frame
  Refine the region by combining consecutive frames
Existing Work
Feature extraction    Text detection based on feature
Color                 Connected-component
Texture               Texture segmentation
Edge                  Top-down, bottom-up
Connected-component-based methods
Basic idea:
  Treat text as having a uniform color (color level) and classify each pixel as text or non-text according to its color value.
  Combine connected text pixels into connected components.
  Group collinear connected components into a text string.
Advantage:
  Can detect text of arbitrary orientation, provided the text has a similar color throughout and the background is simple.
Disadvantage:
  Sensitive to color variance.
  Lossy compression of video introduces color bleeding.
  Struggles with complex backgrounds.
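The first two steps of the basic idea (pixel classification by color value, then grouping into connected components) can be sketched as follows. This is a minimal illustration, not the method of any particular system: the function names, the `text_level` and `tolerance` parameters, and the choice of 4-connectivity are all assumptions made for the example.

```python
from collections import deque

def text_mask(gray, text_level, tolerance):
    """Mark a pixel as text when its gray value is close to the assumed text level."""
    return [[abs(p - text_level) <= tolerance for p in row] for row in gray]

def connected_components(mask):
    """Return bounding boxes (top, left, bottom, right), inclusive, of 4-connected regions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited text pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Grouping the resulting boxes into collinear text strings is left out of this sketch. A single fixed `text_level` also illustrates the disadvantage listed above: color bleeding shifts pixel values out of the tolerance band.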
Texture Segmentation method
Basic idea:
  Treat text as a type of texture.
  Use texture segmentation algorithms to detect text, for example Gabor filters or Gaussian derivatives.
Advantage:
  Can efficiently segment text areas and graphic areas in a simple background. It is usually used in document analysis.
Disadvantage:
  Time-consuming.
  Cannot handle text embedded in varied backgrounds well.
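To make the texture idea concrete, the sketch below builds a real-valued Gabor kernel from the standard formula (a Gaussian envelope multiplied by a cosine carrier) and evaluates its response at one pixel. The function names and all parameter values are illustrative assumptions. A practical texture segmenter would apply a bank of such filters at several orientations and scales, which is where the time cost comes from.

```python
import math

def gabor_kernel(size, wavelength, theta, sigma, gamma=1.0, psi=0.0):
    """Real part of a Gabor filter on a size x size grid (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates by theta so the carrier runs along xr.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + gamma * gamma * yr * yr) / (2 * sigma * sigma))
            row.append(envelope * math.cos(2 * math.pi * xr / wavelength + psi))
        kernel.append(row)
    return kernel

def response_at(image, kernel, cy, cx):
    """Correlate the kernel with the image at (cy, cx); pixels outside the image count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            y, x = cy + dy, cx + dx
            if 0 <= y < len(image) and 0 <= x < len(image[0]):
                total += image[y][x] * kernel[dy + half][dx + half]
    return total
```

A vertical stroke centered under the kernel produces a larger response than a flat area of the same brightness, which is the property a texture-based detector relies on.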
Bottom-Up method
Basic idea:
  A seed region is defined as a small region with high edge density.
  Grow each seed region into successively larger components until all seed regions in the image have been reached.
Advantage:
  It is a generic method for detecting homogeneous objects of various shapes. That is, it can detect not only rectangular objects but also other shapes.
Disadvantage:
  Sensitive to noise.
  Cannot handle a large range of font sizes.
  Sensitive to stroke density (which differs between languages).
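One possible reading of the bottom-up scheme is sketched below: the edge map is divided into square blocks, blocks whose edge density reaches `seed_thr` become seeds, and each seed grows into neighboring blocks whose density reaches a lower `grow_thr`. The two-threshold growth rule, the block grid, and every name here are assumptions made for illustration; the slides do not specify the growth criterion.

```python
from collections import deque

def block_density(edge_map, block):
    """Fraction of edge pixels in each block x block tile (partial border tiles are dropped)."""
    rows, cols = len(edge_map) // block, len(edge_map[0]) // block
    return [[sum(edge_map[r * block + dy][c * block + dx]
                 for dy in range(block) for dx in range(block)) / (block * block)
             for c in range(cols)]
            for r in range(rows)]

def grow_seeds(density, seed_thr, grow_thr):
    """Label regions grown from seed blocks; 0 means background."""
    rows, cols = len(density), len(density[0])
    label = [[0] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if density[r][c] < seed_thr or label[r][c]:
                continue
            next_label += 1
            label[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not label[ny][nx] and density[ny][nx] >= grow_thr):
                        label[ny][nx] = next_label
                        queue.append((ny, nx))
    return label
```

Because both thresholds are fixed, the sketch inherits the weaknesses listed above: a noisy block can become a seed, and a threshold tuned for one stroke density or font size will miss others.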
Top-Down method
Basic idea:
  Based on the run-length smoothing algorithm.
  Analyze horizontal and vertical projection profiles.
Advantage:
  Can detect the boundaries of horizontally aligned text strings quickly and correctly.
  Insensitive to noise.
Disadvantage:
  Cannot handle diagonally aligned text.
  One pass of horizontal and vertical projection cannot handle a complex layout.
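The two building blocks named above, run-length smoothing and projection profiles, can be sketched on a binary edge map as follows. The helper names, the `max_gap` parameter, and the profile threshold are illustrative assumptions.

```python
def smooth_runs(row, max_gap):
    """Run-length smoothing: fill runs of zeros no longer than max_gap that lie between ones."""
    out = list(row)
    last_one = None
    for i, v in enumerate(row):
        if v:
            if last_one is not None and 0 < i - last_one - 1 <= max_gap:
                for j in range(last_one + 1, i):
                    out[j] = 1
            last_one = i
    return out

def horizontal_profile(edge_map):
    """Number of edge pixels in each row."""
    return [sum(row) for row in edge_map]

def vertical_profile(edge_map):
    """Number of edge pixels in each column."""
    return [sum(col) for col in zip(*edge_map)]

def runs_above(profile, threshold):
    """Inclusive (start, end) index ranges where the profile exceeds the threshold."""
    runs, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(profile) - 1))
    return runs
```

Runs in the horizontal profile give candidate text rows, and runs in the vertical profile give candidate columns. Because both profiles sum along image axes, diagonally aligned text smears across many rows and columns, which matches the first disadvantage above.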
Analysis (1)
A certain contrast against the background:
  Artificial text strings are designed to be read easily
A certain stroke density
Text strings always appear horizontally
Spatial cohesion:
  Characters of the same text string have similar heights, orientation and spacing
Size constraint:
  Text strings have certain size restrictions
A text string appears in multiple consecutive frames at a similar position.
Analysis (2)
Problems                                          Resolutions
How to extract more useful edges?                 Local thresholding
How to highlight text areas?                      Text area recovery
How to detect text regions fast and correctly?    Coarse-to-fine detection
Single Threshold
Local threshold (1)
Use a small kernel (red) to scan the whole image. In a bigger window (gray) surrounding the kernel, calculate the local threshold from the window's local histogram.
[Figure: a. Window move. b. Local threshold selection: histogram of count against edge strength, ranging from MIN to MAX and split at T-local into a low half and a high half.]
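The local thresholding scheme can be sketched as below. The slides do not state how T-local is chosen from the local histogram, so this example assumes one plausible rule: split the edge-strength range at the midpoint between MIN and MAX, then place the threshold halfway between the mean of the low half and the mean of the high half. The function names and the `kernel` and `margin` parameters are also assumptions.

```python
def local_threshold(values):
    """Assumed rule: midpoint between the means of the low and high halves of the range."""
    lo, hi = min(values), max(values)
    if lo == hi:
        return float(hi)  # flat window: nothing will exceed the threshold
    mid = (lo + hi) / 2
    low = [v for v in values if v <= mid]
    high = [v for v in values if v > mid]
    return (sum(low) / len(low) + sum(high) / len(high)) / 2

def local_threshold_edges(strength, kernel, margin):
    """Binarize an edge-strength map, one kernel tile at a time.

    Each kernel x kernel tile is thresholded with a value computed from the
    larger window that extends the tile by `margin` pixels on every side.
    """
    rows, cols = len(strength), len(strength[0])
    out = [[0] * cols for _ in range(rows)]
    for r0 in range(0, rows, kernel):
        for c0 in range(0, cols, kernel):
            window = [strength[y][x]
                      for y in range(max(0, r0 - margin), min(rows, r0 + kernel + margin))
                      for x in range(max(0, c0 - margin), min(cols, c0 + kernel + margin))]
            t = local_threshold(window)
            for y in range(r0, min(rows, r0 + kernel)):
                for x in range(c0, min(cols, c0 + kernel)):
                    out[y][x] = 1 if strength[y][x] > t else 0
    return out
```

Compared with a single global threshold, the window-based value adapts to low-contrast areas, so weak text edges are kept where the surrounding edge strengths are weak too.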
Local threshold (2)
Text-like area recovery (1)
[Figure panels: Before recovery, After recovery]
Text-like area recovery (2)
[Figure panels: Before recovery, After recovery]
High pass filter
Use a top-down scheme to detect text-like areas
Coarse-to-Fine detection
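[Flowchart]
Initial: add the whole image to the processing array.
1) Take the first region from the processing array.
2) Compute its horizontal projection.
3) Compute its vertical projection.
4) Can the region be divided?
   Yes: add the sub-regions to the processing array.
   No: add the region to the result array.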
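The processing-array loop can be sketched as follows. It assumes a binary edge map, regions stored as half-open `(top, left, bottom, right)` tuples, and a region counting as divisible whenever projection splits it into anything other than itself. These conventions and all names are assumptions for illustration.

```python
def runs(profile, threshold=0):
    """Half-open (start, end) ranges where the profile exceeds the threshold."""
    out, start = [], None
    for i, v in enumerate(profile):
        if v > threshold and start is None:
            start = i
        elif v <= threshold and start is not None:
            out.append((start, i))
            start = None
    if start is not None:
        out.append((start, len(profile)))
    return out

def coarse_to_fine(edge_map, threshold=0):
    """Repeatedly split regions by horizontal, then vertical, projection."""
    processing = [(0, 0, len(edge_map), len(edge_map[0]))]  # start with the whole image
    results = []
    while processing:
        top, left, bottom, right = processing.pop(0)  # first region from the array
        row_profile = [sum(edge_map[y][left:right]) for y in range(top, bottom)]
        parts = []
        for r0, r1 in runs(row_profile, threshold):
            col_profile = [sum(edge_map[y][x] for y in range(top + r0, top + r1))
                           for x in range(left, right)]
            for c0, c1 in runs(col_profile, threshold):
                parts.append((top + r0, left + c0, top + r1, left + c1))
        if parts == [(top, left, bottom, right)]:
            results.append(parts[0])   # cannot divide further
        else:
            processing.extend(parts)   # divided (or empty): keep refining
    return results
```

Repeating the projections on each sub-region is what lets this loop handle layouts that a single horizontal and vertical pass cannot separate, for example a short text line beside a taller text block.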
[Figure: Detect text-like areas. b. Coarse vertical projection, panels 1) to 4).]
Refinement
Combine neighboring text areas that have similar heights
Use size constraints to remove areas that do not satisfy them
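A minimal sketch of both refinement steps is given below, using half-open `(top, left, bottom, right)` boxes. The merge criteria (`max_gap`, `height_ratio`, and a required vertical overlap) and the size limits are hypothetical parameters chosen for the example, not values from the slides.

```python
def merge_neighbors(boxes, max_gap, height_ratio):
    """Merge horizontally adjacent boxes of similar height that share some rows."""
    merged = []
    for box in sorted(boxes, key=lambda b: (b[1], b[0])):  # left to right
        for i, m in enumerate(merged):
            h_box, h_m = box[2] - box[0], m[2] - m[0]
            similar = min(h_box, h_m) / max(h_box, h_m) >= height_ratio
            overlap = min(box[2], m[2]) > max(box[0], m[0])  # rows in common
            close = box[1] - m[3] <= max_gap                 # small horizontal gap
            if similar and overlap and close:
                merged[i] = (min(m[0], box[0]), m[1],
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged

def size_filter(boxes, min_height, max_height, min_width):
    """Keep only boxes whose height and width satisfy the size constraints."""
    return [b for b in boxes
            if min_height <= b[2] - b[0] <= max_height and b[3] - b[1] >= min_width]
```

For example, two word-sized boxes on the same line with a 4-pixel gap merge into one text string, while a 40-pixel-tall box is then rejected by a 30-pixel height limit.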
Multi-frame analysis
Text region matching:
  Find all the regions corresponding to the same text
Text region enhancement:
  Enhance the text image quality by multi-frame integration
Repetitive text elimination:
  Only record the text at its first appearance.
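The three multi-frame steps can be sketched as below. The slides do not specify the matching criterion or the integration operator, so this example assumes intersection-over-union between boxes in consecutive frames for matching and a pixel-wise mean for integration. All names and the `min_overlap` default are illustrative.

```python
def overlap_ratio(a, b):
    """Intersection over union of two half-open (top, left, bottom, right) boxes."""
    h = min(a[2], b[2]) - max(a[0], b[0])
    w = min(a[3], b[3]) - max(a[1], b[1])
    if h <= 0 or w <= 0:
        return 0.0
    inter = h * w
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def integrate(crops):
    """Pixel-wise mean of equally sized grayscale crops of the same text region."""
    n = len(crops)
    return [[sum(c[y][x] for c in crops) / n for x in range(len(crops[0][0]))]
            for y in range(len(crops[0]))]

def first_appearances(frames, min_overlap=0.5):
    """Return (frame_index, box) for boxes that match no box in the previous frame."""
    firsts, previous = [], []
    for index, boxes in enumerate(frames):
        for box in boxes:
            if not any(overlap_ratio(box, p) >= min_overlap for p in previous):
                firsts.append((index, box))
        previous = boxes
    return firsts
```

Averaging relies on the text staying at a similar position across frames while the background changes, so the background noise is attenuated relative to the static text.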
Thank you!
End