TRANSCRIPT
Platinum Sponsor: Gold Sponsors:
Smarter Document Capture
This presentation will begin at 2:00 PM EDT
1 PM Central, 12 PM Mountain, 11 AM Pacific
Please check that the volume on your computer is on
This presentation runs through voice over IP
Until then, enjoy the sounds of silence…
Peggy Winton – VP, AIIM Market Access
Ari Gross – CEO, CVISION Technologies Inc.
Ralph Gammon – editor, Document Imaging Report
AIIM Presents:
Smarter Document Capture
About AIIM
AIIM is the community focused on providing education, research, and best practices to help organizations find, control, and optimize their information for maximum value.
Learn more about AIIM at www.aiim.org.
About AIIM
We offer year-round programming in:
• Market Education
• Peer Networking
• Industry Advocacy & Research
• Professional Development & Training
Smarter Document Capture
Ari Gross CEO – CVISION Technologies Inc.
Smart Captured Documents
• Web-optimization :: On demand access
• Recognition :: OCR, ICR, bar codes, form coding
• PDF/A :: Reproducibility, long-term archiving
• Compression :: Image files at electronic file sizes
• Metadata :: Embed field info, Database independence
• Color Imaging :: Improved appearance & recognition
Compression
• Significant progress in compression technology
• Scanned files can be compressed as small as the original generated files
• Amenable to web hosting, email & backups
• Print on demand
Word Document: 921 KB
Scanned TIFF: 13,124 KB
Standard PDF: 13,058 KB
Compressed PDF: 870 KB
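As a rough check on the figures above, the size ratios can be worked out directly. The sketch below only restates the four sizes from this slide; the dictionary and function names are illustrative and not part of the presentation.

```python
# File sizes in KB, copied from the slide.
sizes_kb = {
    "Word Document": 921,
    "Scanned TIFF": 13_124,
    "Standard PDF": 13_058,
    "Compressed PDF": 870,
}

def ratio(larger: str, smaller: str) -> float:
    """Return how many times the size of `smaller` the file `larger` is."""
    return sizes_kb[larger] / sizes_kb[smaller]

# Scanned TIFF relative to the compressed PDF (roughly 15x).
print(f"TIFF vs. compressed PDF: {ratio('Scanned TIFF', 'Compressed PDF'):.1f}x")
# Compressed PDF relative to the original Word file (slightly below 1x).
print(f"Compressed PDF vs. Word: {ratio('Compressed PDF', 'Word Document'):.2f}x")
```

On these numbers the compressed PDF is about one fifteenth the size of the scanned TIFF and slightly smaller than the original Word document.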
Recognition
• OCR (Optical Character Recognition): recognizes printed text
• ICR (Intelligent Character Recognition): recognizes handwritten text
• Barcode: identifies and reads barcodes
• Form recognition: identifies form type & extracts relevant database fields
Metadata
• Metadata insertion supports document portability, i.e., platform independence
• Makes documents self-aware, e.g., re-attach dead documents
• Consistent with ARMA & NARA recommendations
• Useful for encoding important document information, e.g., database field data, retention policy
• Automated insertion into document management system
Recognition (OCR, ICR): Advantage Color
[Chart: Color vs. B&W Recognition Rates. Number of words recognized (0 to 450) for each of 36 invoices; green = color invoices, blue = bitonal (B&W).]
Metrics
• 36 invoices
• Words recognized: color invoices 4,390; B&W invoices 2,824
• 55% improvement
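The 55% figure follows from the two word totals on this slide. A minimal sketch of that arithmetic, with variable names chosen here for illustration:

```python
# Totals reported on the slide for the 36-invoice test set.
color_words = 4390   # words recognized from color scans
bw_words = 2824      # words recognized from bitonal (B&W) scans
invoices = 36

# Relative improvement of color over B&W.
improvement = (color_words - bw_words) / bw_words
print(f"Improvement: {improvement:.1%}")

# Average words recognized per invoice in each mode.
print(f"Color: {color_words / invoices:.1f} words/invoice")
print(f"B&W:   {bw_words / invoices:.1f} words/invoice")
```

The relative improvement comes out at about 55.5%, which the slide rounds to 55%.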
Elements of Smart Captured Documents
• “Smart” captured documents increase the functionality of image files
• Smart documents include support for web-optimization, OCR, reproducibility, metadata, and auto-indexing
• PDF supports smart documents
• PDF/A is a restricted version of PDF (1.4), especially suited for document reproducibility & archiving
• Smart captured documents result in improved corporate ROI
• Smart captured documents are very compelling for Web-based, distributed database, and email applications
Smarter Document Capture
Exploring next-generation document images
Ralph Gammon – editor, Document Imaging Report
Traditional Document Images
• Captured in centralized environments with high-speed scanners
• Black-and-white, TIFF, Group 4 compressed
• Meta data managed through document management systems
• Not considered a long-term archiving format
Who am I?
• Editor of the Document Imaging Report since 1998
• Premier source of news and analysis in the document capture and imaging market
• Accept no advertising in print publication
• Paid subscription
• www.documentimagingreport.com
• Publisher RMG Enterprises
The Potential of Document Imaging
• Color scanners now available for the same price as black-and-white
• Distributed capture infrastructure in place
• Advanced compression methods increase usability of color images
• PDF/A approved as an ISO standard
PDF: A better file format?
• Stands for Portable Document Format
• Introduced by Adobe in 1993
• According to Adobe, more than 500 million free PDF readers have been downloaded
• In 2007, Adobe submitted the PDF specs to ISO for ratification as an international standard
• PDF/A (archiving) approved in 2005
PDF: A Versatile Document Format
• Supports both imaged and electronically-generated files
• Supports color and bi-tonal compression
– Group 4, JBIG2, JPEG, JPEG 2000
– Supports image segmentation and layering
• Self-describing image format
PDF: A self-defining image format
• Provides structured container for carrying important document information
– Full-text OCR results for searchability
– Meta data such as when the document was created, who the author is, when it was scanned, what type of document it is, etc.
• Historically, this information has been kept in a database separate from the image
– If you change image management systems this information needs to be transitioned
– Meta data is not portable
Capturing Meta Data
• Several options for capturing meta data
– Key entry
– Bar codes
– OCR/ICR/IDR
• Meta data entry increasingly automated
– Improvements in OCR/ICR: voting, database look-ups, better image quality
– Introduction of intelligent document recognition (IDR)
• More meta data means more options
– Data mining
– Records management
– Automated workflows
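The slide names voting only as a bullet point. One common reading is that several recognition engines read the same field and the majority answer wins. The sketch below illustrates that idea under stated assumptions: the three engine outputs, the field value, and the 0.6 review threshold are all hypothetical, and no particular product's voting scheme is implied.

```python
from collections import Counter

def vote(readings: list[str]) -> tuple[str, float]:
    """Return the most common reading and the share of engines that produced it."""
    counts = Counter(readings)
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(readings)

# Hypothetical outputs from three OCR engines for the same invoice-number field.
# The third engine misread the digit 0 as the letter O.
readings = ["INV-10452", "INV-10452", "INV-1O452"]

value, agreement = vote(readings)

# Hypothetical rule: route the field to manual key entry when agreement is low.
needs_review = agreement < 0.6

print(value, f"{agreement:.0%}", "review" if needs_review else "accept")
```

A low agreement score is a natural trigger for falling back to key entry, which is how voting can reduce manual work without removing it entirely.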
PDF Compression: Why Smaller is Better
• Cost of storage is falling, but still can be significant when talking about millions of document images
• When viewing on the Web, smaller files mean faster downloads and a better user experience
• In distributed scanning environments, smaller files are simpler to move around
• JBIG2 compression can create bi-tonal PDF files similar in size to the original electronically created files
• PDF color compression techniques can create file sizes up to 100 times smaller (and higher quality) than their JPEG counterparts
PDFs for Web Viewing
• PDF viewer is universal
• PDFs can be optimized for Web viewing
– This is helped by advanced compression that creates smaller files to download
– Also supports multi-page files and use of intelligent downloads
Why color images?
Why Color Document Images?
• Truer representation of the original
• Contains more information
• Adoption of color printing
• Better for Web viewing
• Improved recognition rates
Why not color scanning?
• Color file sizes can be very large
– Typical 300 dpi scanned color page represents 24 MB of raw data
– Even a JPEG-compressed document image can be more than 10 times the size of its bi-tonal counterpart (400 KB for color vs. 40 KB for bi-tonal)
• JPEG not optimized for image viewing
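The 24 MB figure can be reproduced with a short calculation. The slide gives only the resolution, so the sketch below assumes a US Letter page (8.5 x 11 inches) scanned at 24 bits per pixel; the function name is illustrative.

```python
def raw_size_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed size of a scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    return width_px * height_px * bits_per_pixel // 8

# Assumed page: US Letter at 300 dpi.
color = raw_size_bytes(8.5, 11, 300, 24)    # 24-bit RGB
bitonal = raw_size_bytes(8.5, 11, 300, 1)   # 1 bit per pixel

print(f"Color:   {color:,} bytes ({color / 2**20:.1f} MiB)")
print(f"Bitonal: {bitonal:,} bytes ({bitonal / 2**20:.1f} MiB)")
```

Under these assumptions the raw color page is 25,245,000 bytes, or about 24 MiB, which matches the slide. The raw bitonal page is 24 times smaller before any compression is applied.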
How advanced color compression works
• PDF supports MRC (Mixed Raster Content)
– Enables segmenting of document into layers
– Once segmented, those layers can be compressed separately
– Enables optimal compression of each layer and file sizes 10 to 100 times smaller
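The layering idea can be sketched in a few lines. The example below is a toy illustration, not a real codec: it splits a tiny page into a bitonal mask, a foreground layer, and a background layer using a simple luminance threshold, then recomposes them. The threshold of 128, the fill colors, and all names are assumptions made for illustration, and the per-layer compression step itself is omitted.

```python
# Minimal MRC-style segmentation sketch (illustrative only, not a real codec).
Pixel = tuple[int, int, int]   # (red, green, blue), each 0-255
Image = list[list[Pixel]]

WHITE: Pixel = (255, 255, 255)
BLACK: Pixel = (0, 0, 0)

def luminance(pixel: Pixel) -> float:
    """Approximate perceived brightness of an RGB pixel (0 = black, 255 = white)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b

def segment(image: Image, threshold: float = 128.0):
    """Split an image into (mask, foreground, background) layers.

    mask[y][x] is True where the pixel is dark enough to count as text.
    The foreground keeps the text colors; the background keeps everything else.
    """
    mask, foreground, background = [], [], []
    for row in image:
        mask_row = [luminance(p) < threshold for p in row]
        mask.append(mask_row)
        # Pixels not owned by a layer are filled with a flat placeholder color.
        foreground.append([p if m else BLACK for p, m in zip(row, mask_row)])
        background.append([WHITE if m else p for p, m in zip(row, mask_row)])
    return mask, foreground, background

def recompose(mask, foreground, background) -> Image:
    """Rebuild the page: take the foreground where the mask is set, else the background."""
    return [
        [f if m else b for m, f, b in zip(mask_row, fg_row, bg_row)]
        for mask_row, fg_row, bg_row in zip(mask, foreground, background)
    ]

# Tiny hypothetical page: dark blue ink on off-white paper.
PAPER: Pixel = (250, 248, 240)
INK: Pixel = (20, 20, 120)
page: Image = [
    [PAPER, INK, PAPER, PAPER],
    [PAPER, INK, INK, PAPER],
    [PAPER, PAPER, PAPER, PAPER],
]

mask, fg, bg = segment(page)
restored = recompose(mask, fg, bg)
print("Round trip exact:", restored == page)
```

The round trip is exact here only because nothing is compressed. In a real MRC workflow the mask would typically go through a bitonal codec such as JBIG2 and the color layers through JPEG or JPEG 2000, often at reduced resolution, which is where the size savings come from.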
Lossy vs. Lossless
• Lossless can be a misnomer, as any color document captured in black-and-white is losing information
• Perceptually lossless images are those that when viewed from a certain distance appear identical to human observers.
• While compression formats like JBIG2 and JPEG are not technically lossless, they can also be classified as perceptually lossless
• Best practices call for users to adjust their advanced compression settings until they are satisfied that images for a certain type of document are perceptually lossless
PDF/A: long-term archiving format
• Designed so that a PDF/A file created today will be able to be decoded by a PDF/A reader in perpetuity
• Internally contains all resources necessary to be rendered
– Contains provisions for meta data
• Approved as an ISO standard in 2005
– Has yet to gain widespread adoption, but we are starting to see some initiatives on the international and state gov. level
• Applicable across electronic files and images
– Applications available for testing validity of PDF/A files
– PDF/A Competence Center dedicated to developing best practices around PDF/A adoption (www.pdfa.org)
Levels of image enhancement
• Basic: deskewing, despeckling, auto-crop, blank-page removal, analog color dropout
• More advanced: line removal, electronic color dropout, auto-rotation based on text, multi-streaming
• Most advanced: grayscale thresholding, color segmentation
Grayscale can be as good as color
Example of Grayscale Thresholding
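The image from this slide did not survive in the transcript, and the presentation does not say which thresholding method was used. As a stand-in, the sketch below shows Otsu's method, one widely used way to pick a global threshold from the grayscale histogram. The sample pixel values are hypothetical.

```python
def otsu_threshold(pixels: list[int]) -> int:
    """Pick the 0-255 threshold that maximizes between-class variance (Otsu's method).

    Pixels with value <= threshold form the dark class; the rest form the light class.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(value * count for value, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    n_dark = 0      # number of pixels in the dark class so far
    sum_dark = 0    # sum of their gray values
    for t in range(256):
        n_dark += hist[t]
        sum_dark += t * hist[t]
        if n_dark == 0:
            continue            # no dark pixels yet
        n_light = total - n_dark
        if n_light == 0:
            break               # every pixel is already in the dark class
        mean_dark = sum_dark / n_dark
        mean_light = (sum_all - sum_dark) / n_light
        var_between = n_dark * n_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

# Hypothetical grayscale samples: three dark (ink) and three light (paper) pixels.
samples = [30, 35, 40, 200, 210, 220]
t = otsu_threshold(samples)

# Binarize: 0 = black (ink), 1 = white (paper).
bitonal = [0 if p <= t else 1 for p in samples]
print(t, bitonal)
```

Capturing in grayscale and thresholding afterwards lets the cut-off adapt to each page, instead of being fixed at scan time as in a bitonal capture.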
Summary
• PDF represents a more versatile file format than TIFF or JPEG
• PDF represents a smarter, self-contained file format
– PDF/A represents an ISO certified long-term image file format
• Technological advances in the following areas have combined to make PDF a more attractive imaging format
– Compression
– Display
– Meta data capture
– Scanning
Questions?
On the bottom left hand side of your screen, type your question in the white box and hit the “Submit Question” button.
Upcoming Webinars
April 23rd – Implement Your ECM Roadmap in 2008
May 7th – Finding Content: The best information in the world is worthless if you can’t access and use it.
May 14th – Shop Smart: Critical Buying Decisions for Capture
June 4th – Enterprise Report Management Can't be Overlooked
June 18th – Get Rid of Your Paper! Or not.
Register Today at www.aiim.org/webinars