
High-level Component Filtering for Robust Scene Text Detection. Weilin Huang (黄韡林), Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences


Upload: barry-walsh

Posted on 18-Dec-2015


TRANSCRIPT

  • Slide 1
  • High-level Component Filtering for Robust Scene Text Detection. Weilin Huang (黄韡林), Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences; Multimedia Laboratory, The Chinese University of Hong Kong
  • Slide 2
  • Outline. Introduction: connected-component and sliding-window methods; Stroke Width Transform (SWT); SWT-based text detection. Stroke Feature Transform: colour information on text stroke detection. Text Covariance Descriptor (TCD): TCD for component filtering; TCD for text-line filtering. Convolution Neural Network Induced MSER Trees: Maximally Stable Extremal Regions (MSERs); CNN for component classification; component splitting.
  • Slide 3
  • I. Introduction: Text Detection Methods. Connected-component methods: Step 1, separate text and non-text information at the pixel level; Step 2, group text pixels to construct character components. Advantage: fast computation. Limitations: not robust, producing erroneous components and many false alarms. Examples: SWT, MSERs. Sliding-window methods: Step 1, train a text classifier; Step 2, scan a sliding sub-window through the image. Advantage: high-level text classification. Limitations: computationally costly, and features are difficult to design.
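To make the cost argument concrete, here is a minimal single-scale sketch of the sliding-window scan. The function names, the 32-pixel window, the stride of 8, and the `classify` callback are illustrative assumptions, not details from the slides; a practical detector would also repeat the scan over several image scales, multiplying the number of classifier calls.

```python
from typing import Callable, Iterator, List, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


def sliding_windows(img_w: int, img_h: int, win_w: int, win_h: int,
                    stride: int) -> Iterator[Window]:
    """Yield every sub-window position, left to right, top to bottom."""
    for y in range(0, img_h - win_h + 1, stride):
        for x in range(0, img_w - win_w + 1, stride):
            yield (x, y, win_w, win_h)


def scan(img_w: int, img_h: int, classify: Callable[[Window], float],
         win: int = 32, stride: int = 8, threshold: float = 0.5) -> List[Window]:
    """Keep windows whose text score exceeds the threshold.

    `classify` stands in for a trained text classifier (hypothetical here);
    it is called once per window, which is where the cost comes from.
    """
    return [w for w in sliding_windows(img_w, img_h, win, win, stride)
            if classify(w) > threshold]


if __name__ == "__main__":
    windows = list(sliding_windows(100, 64, 32, 32, 8))
    print(len(windows))  # 9 x-positions * 5 y-positions = 45
```

For a 100 x 64 image this already means 45 classifier calls at a single scale, whereas connected-component methods only classify the components they extract.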
  • Slide 4
  • I. Introduction: Stroke Width Transform (1). SWT is a low-level pixel filter: it follows the gradient orientation at each edge pixel to trace a ray and computes the stroke width between paired edge pixels. Example: the SWT operator turns Canny edges into an SWT map. Problem 1: erroneous connections, which either connect multiple characters or separate a single character. Problem 2: many non-text components. Stroke width constraint: |O_p - O_q| must stay below a threshold.
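The ray-tracking step can be sketched as a simplified SWT in plain Python, assuming an edge map and a per-edge-pixel ray direction (for example derived from Canny edges and image gradients) are already available. The function name, the 30-degree opposition tolerance, and the maximum ray length are illustrative choices; this is not the implementation used in the talk.

```python
import math
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, col)


def stroke_width_transform(edges: List[List[bool]],
                           ray_dir: Dict[Pixel, Tuple[float, float]],
                           max_len: int = 50,
                           angle_tol: float = math.pi / 6) -> List[List[float]]:
    """Simplified SWT: shoot a ray from each edge pixel p; if it meets an edge
    pixel q whose direction is roughly opposite, write ||p - q|| to every pixel
    on the ray (keeping the minimum). Unassigned pixels stay at inf.

    ray_dir[p] = (dy, dx) points into the stroke (for dark text on a light
    background, the negated image gradient).
    """
    h, w = len(edges), len(edges[0])
    swt = [[math.inf] * w for _ in range(h)]
    min_opposition = math.cos(angle_tol)
    for (r, c), (dy, dx) in ray_dir.items():
        norm = math.hypot(dy, dx)
        if norm == 0:
            continue
        dy, dx = dy / norm, dx / norm
        ray = [(r, c)]
        for step in range(1, max_len):
            rr, cc = round(r + dy * step), round(c + dx * step)
            if not (0 <= rr < h and 0 <= cc < w):
                break  # ray left the image without finding a partner
            if (rr, cc) != ray[-1]:
                ray.append((rr, cc))
            if (rr, cc) != (r, c) and edges[rr][cc]:
                q_dir = ray_dir.get((rr, cc))
                if q_dir is not None:
                    qy, qx = q_dir
                    q_norm = math.hypot(qy, qx)
                    # cosine between d_p and -d_q: 1 means exactly opposite
                    if q_norm > 0 and -(dy * qy + dx * qx) / q_norm >= min_opposition:
                        width = math.hypot(rr - r, cc - c)
                        for pr, pc in ray:
                            swt[pr][pc] = min(swt[pr][pc], width)
                break  # stop at the first edge pixel either way
    return swt


if __name__ == "__main__":
    # A dark vertical bar covering columns 3-5 of a 5 x 9 image; its boundary
    # columns 3 and 5 are edges whose rays point inward.
    edges = [[c in (3, 5) for c in range(9)] for _ in range(5)]
    dirs = {}
    for r in range(5):
        dirs[(r, 3)] = (0.0, 1.0)
        dirs[(r, 5)] = (0.0, -1.0)
    print(stroke_width_transform(edges, dirs)[2])
    # [inf, inf, inf, 2.0, 2.0, 2.0, inf, inf, inf]
```

On this toy bar, the three pixels across the stroke receive width 2.0 (the distance between the two edge columns) and every other pixel stays at infinity.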
  • Slide 5
  • I. Introduction: SWT-based Text Detection. Complete processing: SWT → component filtering → text components → grouping (GP) → grouped text lines → text-line (TL) filtering → final text lines. C. Yao, X. Bai, W. Liu, Y. Ma, Z. Tu, Detecting texts of arbitrary orientations in natural images, CVPR, 2012. Existing filters: heuristic filtering and a Random Forest classifier (heuristic and geometric features). Our improvements: more powerful high-level filters.
  • Slide 6
  • II. Stroke Feature Transform (SFT) (1). SWT applies a single stroke width constraint, |O_p - O_q| below a first threshold. SFT keeps it and adds a stroke color constraint, |C_p - C_q| below a second threshold, together with a neighborhood coherency constraint. Outputs: a stroke width map and a stroke color map.
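The difference between the two pairwise tests can be sketched as follows. This assumes O_p and O_q are scalar values and C_p and C_q are RGB triples compared by Euclidean distance; the threshold names `t1` and `t2` are hypothetical, and the neighborhood coherency constraint is left out because the slide does not specify it.

```python
import math
from typing import Sequence


def swt_accepts(o_p: float, o_q: float, t1: float) -> bool:
    """SWT-style check: only the stroke width constraint |O_p - O_q| < t1."""
    return abs(o_p - o_q) < t1


def sft_accepts(o_p: float, o_q: float,
                c_p: Sequence[float], c_q: Sequence[float],
                t1: float, t2: float) -> bool:
    """SFT-style check: stroke width constraint AND stroke color constraint.

    |C_p - C_q| is taken as the Euclidean distance between RGB triples
    (an assumption); the neighborhood coherency constraint is not modelled.
    """
    return swt_accepts(o_p, o_q, t1) and math.dist(c_p, c_q) < t2


if __name__ == "__main__":
    dark, dark2, yellow = (20, 20, 20), (25, 22, 18), (200, 180, 30)
    print(swt_accepts(1.0, 1.1, 0.5))                      # True
    print(sft_accepts(1.0, 1.1, dark, yellow, 0.5, 40.0))  # False: colors differ
    print(sft_accepts(1.0, 1.1, dark, dark2, 0.5, 40.0))   # True
```

The second call shows the point of the color constraint: a pair that passes the width test is still rejected when the two ends of the ray lie on differently colored surfaces.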
  • Slide 7
  • II. Stroke Feature Transform (SFT) (2). SFT vs. SWT: SFT mitigates inter-component connections and enhances intra-component connections, giving better character candidate detection and higher recall.
  • Slide 8
  • II. Stroke Feature Transform (SFT) (3). Limitation: as a low-level operation, SFT is not robust. Text-like outliers such as bricks, windows, and leaves produce many false alarms and hence low precision. Heuristic filters do not work well, so high-level, learning-based filtering is required.
  • Slide 9
  • III. Text Covariance Descriptor (TCD) (1). Each pixel is represented by a d-dimensional feature vector, and the TCD of a given region U is the covariance matrix of these vectors, so multiple features are incorporated in a single matrix.
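Assuming the TCD follows the standard region covariance descriptor, the computation for a region U containing n pixels can be written as below, where f_k is the d-dimensional feature vector of the k-th pixel and μ is the mean feature vector over U:

```latex
C_U = \frac{1}{n-1} \sum_{k=1}^{n} (\mathbf{f}_k - \boldsymbol{\mu})(\mathbf{f}_k - \boldsymbol{\mu})^{\top},
\qquad
\boldsymbol{\mu} = \frac{1}{n} \sum_{k=1}^{n} \mathbf{f}_k
```

Because C_U is a symmetric d x d matrix, only d(d+1)/2 of its entries are distinct; for d = 9 that gives 45.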
  • Slide 10
  • III. Text Covariance Descriptor (TCD) (2). TCD for components uses 9 per-pixel features: pixel coordinates along the X- and Y-axes; pixel intensity and RGB values; stroke width and distance values; and edge information from the Canny detector. They form a 9 x 9 covariance matrix, transformed into a 45-dimensional feature vector that encodes spatial information, color uniformity, stroke width/distance consistency, and stroke spatial layout. Component confidence maps are obtained with a Random Forest (RF) classifier.
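Following the covariance formulation, a component's 45-dimensional vector can be obtained by taking the sample covariance of its per-pixel feature vectors and keeping the upper triangle, since the matrix is symmetric. This plain-Python sketch assumes the nine features are already extracted; the pixel values in the example are made up, and any normalization used in the original work is not reproduced.

```python
from typing import List, Sequence


def covariance_matrix(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (divisor n - 1) of per-pixel feature vectors."""
    n = len(features)
    if n < 2:
        raise ValueError("need at least two pixels")
    d = len(features[0])
    mu = [sum(f[i] for f in features) / n for i in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        diff = [f[i] - mu[i] for i in range(d)]
        for i in range(d):
            for j in range(d):
                cov[i][j] += diff[i] * diff[j]
    return [[v / (n - 1) for v in row] for row in cov]


def tcd_vector(features: Sequence[Sequence[float]]) -> List[float]:
    """Flatten the upper triangle (diagonal included) of the covariance matrix.
    A 9 x 9 matrix yields 9 * 10 / 2 = 45 values."""
    cov = covariance_matrix(features)
    d = len(cov)
    return [cov[i][j] for i in range(d) for j in range(i, d)]


if __name__ == "__main__":
    # Hypothetical pixels: x, y, I, R, G, B, stroke width, distance, edge
    pixels = [
        [0, 0, 0.20, 0.21, 0.20, 0.19, 3.0, 1.0, 1],
        [1, 0, 0.22, 0.23, 0.21, 0.20, 3.0, 1.5, 0],
        [0, 1, 0.19, 0.20, 0.19, 0.18, 2.0, 1.0, 1],
        [1, 1, 0.21, 0.22, 0.20, 0.21, 3.0, 0.5, 0],
    ]
    print(len(tcd_vector(pixels)))  # 45
```

The resulting vector has a fixed length regardless of component size, which is what allows a single RF classifier to score components of any shape.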
  • Slide 11
  • III. Text Covariance Descriptor (TCD) (3). TCD for text lines: features include the mean properties of the component features, the coordinates of component centers, the heights of components, and the horizontal distances between components, which encode spatial information, consistency, uniformity, and text spatial layout; a 16-bin HOG on edge pixels adds oriented spatial features. Covariance features: 12 x 12 and 16 x 16. Text-line confidence maps are obtained with an RF classifier.
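One plausible reading of the 16-bin HOG on edge pixels is a histogram of gradient orientations over the edge pixels of a text line. The sketch below assumes magnitude weighting, a full 0 to 2π orientation range, and L1 normalization; all three are assumptions rather than details given on the slide.

```python
import math
from typing import Iterable, List, Tuple


def edge_orientation_histogram(gradients: Iterable[Tuple[float, float]],
                               bins: int = 16) -> List[float]:
    """Magnitude-weighted histogram of gradient orientations over edge pixels.

    `gradients` holds (gx, gy) at each edge pixel of the text line.
    Orientations span [0, 2*pi); the histogram is L1-normalized.
    """
    hist = [0.0] * bins
    for gx, gy in gradients:
        mag = math.hypot(gx, gy)
        if mag == 0:
            continue
        angle = math.atan2(gy, gx) % (2 * math.pi)
        # clamp guards against the angle rounding up to exactly 2*pi
        idx = min(int(angle / (2 * math.pi) * bins), bins - 1)
        hist[idx] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


if __name__ == "__main__":
    h = edge_orientation_histogram([(1, 0), (0, 1), (-1, 0)])
    print([round(v, 3) for v in h])  # bins 0, 4 and 8 each hold 1/3
```

Normalizing the histogram keeps the feature comparable between short and long text lines.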
  • Slide 12
  • III. Text Covariance Descriptor (TCD) (4) Component and text-line confidence maps
  • Slide 13
  • III. Text Covariance Descriptor (TCD) (5) Top: TCD for component; Middle: TCD for text-line; Bottom: detection
  • Slide 14
  • III. Text Covariance Descriptor (TCD) (6). Results and failure cases. W. Huang, Z. Lin, J. Yang and J. Wang, Text localization in natural images using stroke feature transform and text covariance descriptors, ICCV, 2013.
  • Slide 15
  • V. Convolution Neural Network Induced MSER Trees (1). Maximally Stable Extremal Region (MSER) trees detect low-quality text, giving higher recall, but they also generate more non-text components, which lowers precision; a more powerful classifier/filter is therefore required. MSER vs. SWT. L. Neumann and J. Matas, Text localization in real-world images using efficiently pruned exhaustive search, ICDAR, 2011.
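To illustrate what "maximally stable" means, here is a minimal sketch of the standard MSER stability criterion, assuming one branch of the component tree is summarized by the area of its nested region at each gray-level threshold. A threshold is kept where the relative area change q(i) = (|R_{i+Δ}| - |R_{i-Δ}|) / |R_i| reaches a local minimum. The `areas` values are made up for illustration.

```python
from typing import List, Sequence


def stability(areas: Sequence[int], delta: int) -> List[float]:
    """MSER stability q(i) = (|R_{i+delta}| - |R_{i-delta}|) / |R_i|.

    `areas[i]` is the area of one nested extremal region at threshold i
    (regions grow as the threshold rises). Defined for delta <= i < len - delta.
    """
    return [(areas[i + delta] - areas[i - delta]) / areas[i]
            for i in range(delta, len(areas) - delta)]


def maximally_stable(areas: Sequence[int], delta: int) -> List[int]:
    """Thresholds at which q has a strict local minimum."""
    q = stability(areas, delta)
    return [k + delta for k in range(1, len(q) - 1)
            if q[k] < q[k - 1] and q[k] < q[k + 1]]


if __name__ == "__main__":
    # Made-up growth curve: the region barely changes around thresholds 2-5
    areas = [10, 12, 40, 42, 43, 44, 90, 200]
    print(maximally_stable(areas, delta=1))  # [4]
```

Every branch of the tree can contribute such regions, which is why MSER trees yield many candidates, text and non-text alike.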
  • Slide 16
  • V. Convolution Neural Network Induced MSER Trees (2). A two-layer convolutional neural network (CNN). T. Wang, D. J. Wu, A. Coates and A. Y. Ng, End-to-end text recognition with convolutional neural networks, ICPR, 2012.
  • Slide 17
  • V. Convolution Neural Network Induced MSER Trees (3). Data transformation: fixed size of 32 x 32, horizontal warp, and additional image context included. Training data: 15,000 synthetic samples.
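The data transformation can be sketched as two steps: enlarge the component's bounding box to include surrounding context, then resample it to a fixed 32 x 32 patch without preserving the aspect ratio. The 25% context margin, nearest-neighbor sampling, and the helper names are assumptions for illustration; the slide does not give these details.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def expand_box(box: Box, margin: float, img_w: int, img_h: int) -> Box:
    """Grow a component box by `margin` of its size on each side, clipped
    to the image, so the patch includes surrounding context."""
    x, y, w, h = box
    mx, my = round(w * margin), round(h * margin)
    x0, y0 = max(0, x - mx), max(0, y - my)
    x1, y1 = min(img_w, x + w + mx), min(img_h, y + h + my)
    return (x0, y0, x1 - x0, y1 - y0)


def warp_patch(img: Sequence[Sequence[float]], box: Box,
               size: int = 32) -> List[List[float]]:
    """Nearest-neighbor resample of the box to size x size. The aspect ratio
    is not preserved: the component is stretched or squeezed horizontally
    to fill the square."""
    x, y, w, h = box
    return [[img[y + (r * h) // size][x + (c * w) // size] for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    img = [[r * 100 + c for c in range(100)] for r in range(100)]
    box = expand_box((10, 20, 8, 16), 0.25, 100, 100)
    patch = warp_patch(img, box)
    print(box, len(patch), len(patch[0]), patch[31][31])  # (8, 16, 12, 24) 32 32 3919
```

A fixed input size is what lets every MSER component, regardless of its shape, be fed to the same CNN.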
  • Slide 18
  • V. Convolution Neural Network Induced MSER Trees (3). CNN confidence scores (panels: MSERs, CNN scores, detection, component splitting).
  • Slide 19
  • V. Convolution Neural Network Induced MSER Trees (4). Component splitting targets erroneously connected components: those with a high aspect ratio and a positive confidence score that are either leaves of the MSER tree or score higher than all of their children.
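The three splitting conditions can be expressed as a predicate over MSER tree nodes. In this sketch the node fields, the signed score convention (positive means text), and the aspect-ratio threshold of 2.0 are hypothetical; how a flagged component is then divided into characters is not shown.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MserNode:
    width: int
    height: int
    score: float  # signed CNN confidence; > 0 means "text" (assumed convention)
    children: List["MserNode"] = field(default_factory=list)


def needs_splitting(node: MserNode, max_aspect: float = 2.0) -> bool:
    """Flag a likely multi-character component using the slide's conditions."""
    high_aspect = node.width / node.height > max_aspect
    positive = node.score > 0
    # all() over no children is True, so a leaf passes; otherwise the node must
    # outscore every child, i.e. the tree offers no better decomposition.
    no_better_child = all(node.score > ch.score for ch in node.children)
    return high_aspect and positive and no_better_child


def components_to_split(root: MserNode, max_aspect: float = 2.0) -> List[MserNode]:
    """Depth-first walk collecting every node that needs splitting."""
    found, stack = [], [root]
    while stack:
        node = stack.pop()
        if needs_splitting(node, max_aspect):
            found.append(node)
        stack.extend(node.children)
    return found


if __name__ == "__main__":
    root = MserNode(60, 20, 0.9, [MserNode(20, 20, 0.5), MserNode(50, 20, 0.2)])
    print(len(components_to_split(root)))  # 2: the root and the wide leaf
```

If a child outscores its wide parent, the tree itself already separates the characters, so no explicit splitting is needed for that parent.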
  • Slide 20
  • V. Convolution Neural Network Induced MSER Trees (5). Comparisons with SFT-TCD
  • Slide 21
  • V. Convolution Neural Network Induced MSER Trees (6). Results
  • Slide 22
  • V. Convolution Neural Network Induced MSER Trees (7). Results on the ICDAR 2011 database. W. Huang, Y. Qiao, and X. Tang, Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees, ECCV, 2014.
  • Slide 23
  • The End. Thank you!