FineReader Engine Overview & New Features in V10
Semyon Sergunin, ABBYY Headquarters, September 2010
Michael Fuchs, ABBYY Europe GmbH, September 2010
FineReader Engine – Processing Steps
Step 1: Image/Document Input
Step 2: Image Pre-processing Algorithms
Step 3: Document & Layout Analysis
Step 4: Recognition
Step 5: Verification of the Recognition Results
Step 6: Synthesis & Export
Step 1: Image Input
Step 1. Input: Opening existing images
● Load images from disc or memory
  ● BMP, PCX, DCX, GIF, PNG, DjVu
  ● JPEG and JPEG2000 (part 1)
  ● TIFF
    ● B&W (uncompressed, CCITT3, CCITT3FAX, CCITT4, PackBits, ZIP, LZW)
    ● Grayscale (uncompressed, PackBits, JPEG, ZIP, LZW)
    ● Colour (uncompressed, JPEG, ZIP, LZW)
  ● PDF
    ● Adobe PDF Library 9.0
    ● Access to internal data (Metadata, Annotations, Text Objects, etc.)
  ● Memory image formats: Raw, Bitmap [HBITMAP], DIB
● Load images from digital cameras
  ● Advanced image pre-processing algorithms in FRE available!
● Screenshot Reader
  ● Capture any area from the screen
  ● Any formats (including Flash)
Step 1. Input: Scanning documents (TWAIN)
Scanning via TWAIN Interface
● ADF (Automatic Document Feeder)
● Manual paper feeder
● Scanner settings
  ● Brightness
  ● Colour
  ● Resolution
  ● Image compression
  ● Define scanning area (zone)
  ● Simplex / Duplex
  ● Orientation / automatic rotation / manual rotation
  ● Paper format
  ● Paper Top/Bottom/Left/Right
  ● Etc.
Visual Component:
Alternatively, the original dialogue from the scanner driver can be used
Step 2: Image Pre-processing
Step 2. Image pre-processing: Available Options
● Noise removal
● Despeckling
● Scale images (i.e. interpolate images with low resolution)
● Rotation (90°, 180° and 270°)
● Automatic deskewing
● Automatic image splitting
● Straighten lines of text
● Cropping
● Automatic rotation
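The slide lists despeckling as an option but does not say how the engine does it. Purely as an illustration of the general idea, the sketch below clears black pixels that have no black neighbour in a binary image. The function name `despeckle` and the "isolated pixel" rule are assumptions made for this example and are not FineReader Engine API or ABBYY's algorithm.

```python
def despeckle(bitmap):
    """Remove isolated black pixels from a binary image.

    bitmap is a list of rows where 1 = black and 0 = white. A black pixel
    with no black pixel among its 8 neighbours is treated as a speckle.
    Generic illustration only, not ABBYY's despeckling algorithm.
    """
    height, width = len(bitmap), len(bitmap[0])

    def has_black_neighbour(y, x):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dy == 0 and dx == 0:
                    continue  # skip the pixel itself
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and bitmap[ny][nx]:
                    return True
        return False

    # Build a new image; the input is read but never modified.
    return [
        [1 if bitmap[y][x] and has_black_neighbour(y, x) else 0
         for x in range(width)]
        for y in range(height)
    ]


# Made-up test image: a small stroke plus two isolated specks in the corners.
noisy = [
    [1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1],
]
clean = despeckle(noisy)
```

In this example the two corner specks are cleared while the three connected stroke pixels are kept.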
Step 2. Image pre-processing: Binarisation Overview
Intelligent background filtering
Adaptive Binarisation
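The slide names adaptive binarisation without describing ABBYY's algorithm. As a rough illustration of the general technique only, the sketch below compares each pixel with the mean of its local window instead of one global threshold. The function name `adaptive_binarise` and the `radius` and `offset` parameters are assumptions made for this example, not FineReader Engine API.

```python
def adaptive_binarise(gray, radius=1, offset=30):
    """Binarise a grayscale image given as a list of rows (values 0-255).

    A pixel becomes black (0) when it is darker than the mean of its local
    window by more than `offset`; otherwise it becomes white (255).
    Generic local-mean thresholding, not ABBYY's algorithm.
    """
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the window around (y, x), clipped at the image borders.
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_mean = sum(window) / len(window)
            row.append(0 if gray[y][x] < local_mean - offset else 255)
        result.append(row)
    return result


# Made-up page: bright background (220) on the left, dark background (120)
# on the right, one text pixel in each half (130 and 40).
page = [
    [220, 220, 220, 120, 120, 120],
    [220, 130, 220, 120,  40, 120],
    [220, 220, 220, 120, 120, 120],
]
binary = adaptive_binarise(page)
```

No single global threshold separates both text pixels here, because the left text pixel (130) is brighter than the right background (120). The local comparison marks only the two text pixels as black.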
Step 2. Image pre-processing
New V10: New Binarisation
Original scan
Prev. binarisation
New binarisation
Step 2. Image pre-processing
New V10: Binarisation, Textured Background optimisations
Original scan
Prev. binarisation
New binarisation
Step 2. Image pre-processing
New V10: Binarisation for the IMPACT project
Original
Prev. binarisation
New
No text from the other page
Step 2. Image pre-processing
New V10: Colour Filtering (stamps and marks)
Step 2. Image pre-processing: Camera OCR
New V10: Automatic correction of 3D perspective distortions
Before
After
Step 2. Image pre-processing: Camera OCR
New V10: Blurred images correction
Before
After
Step 2. Image pre-processing: Camera OCR
New V10: ISO noise reduction
Before
After
Step 3: Document & Layout Analysis
Step 3. Document & Layout Analysis
Detect sections of a document, analyse the layout and find barcodes
Step 3. Document & Layout Analysis
3 layout analysis modes are available:
● Document Analysis – Normal
  ● Returns text, tables, graphics (pictures), barcodes & patchcodes, lines (separators)
● Document Analysis for full text indexing
  ● Graphics & pictures are OCRed as well
  ● Returns text, tables, graphics (pictures), text inside of pictures and diagrams, barcodes & patchcodes, lines (separators)
● Document Analysis for invoices (DAI)
  ● Optimized for small fonts
  ● Returns text, tables as plain text, text inside of pictures and diagrams, barcodes & patchcodes, lines (separators)
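To make the differences between the three modes easier to see, the sketch below records what each mode returns, as listed on the slide, and computes what a mode adds over Normal analysis. The dictionary keys and the helper `extra_blocks` are hypothetical names chosen for this example; they are not FineReader Engine identifiers.

```python
# Block types returned by every mode, transcribed from the slide.
COMMON = {"text", "barcodes & patchcodes", "lines (separators)"}

# Mode names are hypothetical labels, not engine constants.
ANALYSIS_MODES = {
    "normal": COMMON | {"tables", "graphics (pictures)"},
    "full_text_indexing": COMMON | {
        "tables",
        "graphics (pictures)",
        "text inside pictures and diagrams",
    },
    "invoices": COMMON | {
        "tables as plain text",
        "text inside pictures and diagrams",
    },
}


def extra_blocks(mode, baseline="normal"):
    """Return the block types `mode` yields that `baseline` does not."""
    return sorted(ANALYSIS_MODES[mode] - ANALYSIS_MODES[baseline])


full_text_extra = extra_blocks("full_text_indexing")
invoice_extra = extra_blocks("invoices")
```

Read this way, full text indexing differs from Normal only by also returning text found inside pictures and diagrams, while the invoice mode additionally flattens tables to plain text.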
Step 3. Document & Layout Analysis
New V10: Improved detection of charts and graphics
Improved detection of pictures (photographs)
[Two comparison figures: Old Technology vs. V10 Technology]
Step 3. Document & Layout Analysis
New V10: Improvements for magazine-style pages
Old Technology: wrong detection of image and text blocks
V10 Technology: correct detection of image and text blocks
Step 4: Recognition
Step 4. Recognition
After line detection, character recognition is applied with different classifiers:
● Raster classifier
● Contour classifier
● Feature differentiating classifier
● Structure classifier
Step 4. Recognition: Processing speed - Accuracy Balance
The “old conflict” of recognition accuracy vs. processing speed still exists.
Engine 10 “solves” this with different approaches!
● Significant speed increase on good quality images in a new enhanced Fast Mode
● Slightly improved accuracy in Normal Mode
● New Accurate Mode for low resolution/quality images (slightly slower)
Image quality does matter!
Step 4. Recognition
New V10: Accurate mode for low resolution scans
● Additional classifier trained on low resolution scans and faxes
● About 20% more accurate for low resolution scans
● About 10% slower than Normal mode
Step 4. Recognition: Accuracy Improvements, FRE10 Normal mode vs. FRE9 Normal mode*
*based on ABBYY internal tests; number of recognition errors normalized relative to FRE9_R1 values
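The footnote says error counts are normalized relative to the FRE9_R1 values. A minimal sketch of that kind of normalization is shown below, assuming it simply divides each count by the baseline count. The function name `normalise_errors` and the sample counts are invented for illustration and are not ABBYY test results.

```python
def normalise_errors(error_counts, baseline="FRE9_R1"):
    """Express each error count as a fraction of the baseline's count.

    A value below 1.0 means fewer recognition errors than the baseline.
    """
    base = error_counts[baseline]
    if base == 0:
        raise ValueError("baseline error count must be non-zero")
    return {name: count / base for name, count in error_counts.items()}


# Made-up error counts, NOT figures from the presentation.
sample_counts = {"FRE9_R1": 200, "FRE10_R1": 150}
relative = normalise_errors(sample_counts)
```

With these invented counts the baseline maps to 1.0 and the second entry to 0.75, which is how a normalized chart of this kind would be read: lower is better.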
Step 4. Recognition: Speed Improvements, important notes*
*based on ABBYY internal tests
Speed and accuracy values are meaningful only for comparing ABBYY OCR technologies under these particular conditions and on these particular test batches.
Please DO NOT USE these numbers as absolute values, or compare them with results of other OCR technologies obtained on different batches!
Background color keys:
Step 4. Recognition: Speed Comparison, FRE 8, 9, 10 modes*
*based on ABBYY internal tests
Step 4. Recognition: Increased speed for European languages*
*based on ABBYY internal tests
Chinese Simplified
Recognition test: Chinese Simplified, Books 79
[Chart comparing FRE9_R1, FRE9_R7 and FRE10_R1]
Step 4. Recognition: Speed improvements through Multi-Core Support* & tuned Profiles
● Built-in multi-core support for multi-page documents
  ● Added in V9
  ● Improvements in V10
● New V10: New tuned processing profiles increase the overall performance for specific scenarios
2 Sessions tomorrow!
*based on ABBYY internal tests
[Chart: Recognition performance increase rate for multi-core systems compared to a one-core system. X axis: pages in a document (2 to 20). Y axis: rate, times (1.0 to 4.0). Series: 2 cores, 4 cores]
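The chart on this slide plots the speed-up against the number of pages in a document for 2 and 4 cores. One simple way to see why the page count matters is an idealized model in which whole pages are distributed across cores and all pages take equal time. That model is an assumption made for this sketch; it ignores overhead, says nothing about how the engine actually schedules work, and does not reproduce ABBYY's measured values. The function name `ideal_speedup` is hypothetical.

```python
import math


def ideal_speedup(pages, cores):
    """Upper bound on speed-up when whole pages are processed in parallel.

    Assumes every page takes the same time and there is no overhead, so the
    job needs ceil(pages / cores) rounds instead of `pages` rounds.
    """
    if pages < 1 or cores < 1:
        raise ValueError("pages and cores must be positive")
    rounds = math.ceil(pages / cores)
    return pages / rounds


# Theoretical bounds for the page counts on the chart's X axis.
bounds_2_cores = {pages: ideal_speedup(pages, 2) for pages in range(2, 21)}
bounds_4_cores = {pages: ideal_speedup(pages, 4) for pages in range(2, 21)}
```

Under this model a 5-page document on 4 cores can gain at most a factor of 2.5, because the fifth page forces a second round, while an 8-page document can reach the full factor of 4.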
Step 6: Synthesis & Export
Step 6. Document Export: New API for PDF Export in FineReader Engine 10
Fast and easy adjustment of PDF export and ability to set up any or all parameters
FRE 9.0 – 25 parameters
● FRE 9.0 PDF Export Parameters: Author, Bw Format, Color Format, Creator, Embed Fonts, Encryption Info, Export Mode, Font Mode, Gray Format, Keep Text And Background Color, Keywords, Paper Height, Paper Width, PDF Version, Picture Format, Picture Resolution, Producer, Quality, Replace Uncertain Words With Image, Running Title Mode, Set Page Size By Layout Size, Subject, Title, Write Links, Write Tagged PDF
● MRC Params (READ ONLY)
FRE10 – 7 parameters
● FRE10 PDF Export Parameters: Scenario, MRC Mode, PDFA Compliance Mode, Resolution, Resolution Type, Colority, Text Export Mode
● PDF Features (READ ONLY), Picture Compression Params (READ ONLY)
● PDF Features: Embed Fonts, Encryption Info, Meta Data Writing Params, Paper Size, PDF Version, Replace Uncertain Words With Image, Running Title Mode, Write Links, Write Tagged PDF
Scenario profiles
● Max Quality: MAX PDF Quality
● Min Size: MIN PDF Size
● Max Speed: MAX Export Speed
● Balanced: Balanced Quality-Size-Speed
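The idea of choosing a scenario profile first and overriding individual parameters only when needed can be sketched as follows. This is not the FineReader Engine API: the class names `Scenario` and `PdfExportParams`, the field types and every default value are placeholders invented for this example. Only the seven parameter names and the four profile names come from the slide.

```python
from dataclasses import dataclass, fields, replace
from enum import Enum


class Scenario(Enum):
    """The four scenario profiles named on the slide."""
    MAX_QUALITY = "Max Quality"
    BALANCED = "Balanced"
    MIN_SIZE = "Min Size"
    MAX_SPEED = "Max Speed"


@dataclass(frozen=True)
class PdfExportParams:
    """Hypothetical stand-in for the seven FRE10 export parameters.

    Field names follow the slide; all types and defaults are invented
    placeholders, not values documented by ABBYY.
    """
    scenario: Scenario = Scenario.BALANCED
    mrc_mode: str = "auto"
    pdfa_compliance_mode: str = "none"
    resolution: int = 300
    resolution_type: str = "fixed"
    colority: str = "color"
    text_export_mode: str = "text under image"


# Start from a scenario, then override a single parameter if needed.
params = PdfExportParams(scenario=Scenario.MIN_SIZE)
tweaked = replace(params, resolution=150)
```

The frozen dataclass keeps each configuration immutable, so `replace` returns a modified copy and the original profile selection stays untouched.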
Step 6. Synthesis & Export: 2nd Generation of ADRT®
New elements and enhancements over the previous ADRT®
Engine 10 offers a new API to the internal ADRT results
● New elements
● Overall enhancement of ADRT 1.0 work
Step 6. Synthesis & Export: New XML Output Formats
● E-book Reader: FB2*, ePub*
  ● PDFs can be displayed, but the new formats allow much more flexible rendering when switching from portrait to landscape mode
● Libraries: AltoXML*
● Open Document Text format: .odt*
  ● ISO Standard, XML based export format
  ● More and more often required in public projects
*planned for a Maintenance Release of FRE 10
FineReader Engine 10 – Jumpstart Samples and Source Code for Developers
FineReader Engine 10 – The must-have SDK!
ABBYY made significant technology optimisations in Engine 10:
Image Pre-processing: New Binarisation = better OCR = better Results
Speed Improvements: New Fast Mode, improved Multi-core Support
Quality Improvements: New mode for low resolution images, improved Fraktur OCR
New and Improved Language Support
Improved Document Analysis and ADRT
New API Calls and Optimised Processing Profiles
New and Improved Export formats
Any questions?
Thank you for your attention!