10 tips for making your SharePoint scanning project a success
DESCRIPTION
PSIGEN presentation at SharePoint Intelligence Anaheim. During this QuickStart event presentation, we gave an overview of success factors and planning required t
TRANSCRIPT
10 Tips To Make Your SharePoint Scanning Project a Success
Stephen Boals, 949-916-7700 x230
Plan Your Storage
Description                               Number of Pages            Storage
1 Scanned Page – 8.5 x 11                 1                          30-50KB
1 Scanned Page – 11 x 17                  1                          100KB
1 File Cabinet – 4 drawers                10,000                     500MB
1 Box                                     2,500                      125MB
1 Linear Inch                             100                        5MB
1 E Size Engineering Drawing (48 x 36)    16 (8.5 x 11 equivalents)  800KB
How much storage?
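The rules of thumb in the table can be turned into a quick estimate. The sketch below is only an illustration: it assumes about 50 KB per letter-size page (the upper end of the 30-50KB range) and 1 MB = 1,000 KB, which is how the table's own figures work out. The function and dictionary names are made up for this example.

```python
# Rule-of-thumb page counts from the table above.
PAGES_PER_UNIT = {
    "file_cabinet": 10_000,  # 4-drawer cabinet
    "box": 2_500,
    "linear_inch": 100,
}
KB_PER_PAGE = 50  # assumption: upper end of the 30-50 KB range


def estimate_storage(inventory):
    """Return (pages, megabytes) for a dict such as {"box": 40}."""
    pages = sum(PAGES_PER_UNIT[unit] * count for unit, count in inventory.items())
    return pages, pages * KB_PER_PAGE / 1000


pages, mb = estimate_storage({"file_cabinet": 12, "box": 40})
print(f"{pages:,} pages, about {mb / 1000:.1f} GB")
```

Real numbers depend heavily on DPI, color mode and image format, so treat the result as a starting point for planning, not a measurement.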
Key Factors in Storage and Sizing
• DPI Setting
• Color/Black and White/Grayscale
• Image Format – PDF or TIFF?
• Image Processing technology can reduce file size by 10-30%
– Despeckle
– Border removal
– 3 hole punch removal
– Binarization
Scanning Mode/DPI              File Size
Black and White – 200 DPI      26K
Black and White – 300 DPI      38K
Black and White – 400 DPI      51K
Black and White – 600 DPI      80K
Greyscale – 300 DPI            301K
Color – 300 DPI                577K
File Size Comparison
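To see what these per-page sizes mean at volume, here is a small sketch that multiplies them out. The per-page values come straight from the comparison table; the function name, the mode labels and the 100,000-page volume are assumptions chosen for illustration.

```python
# Per-page sizes in KB, from the file size comparison above.
KB_PER_PAGE = {
    ("bw", 200): 26,
    ("bw", 300): 38,
    ("bw", 400): 51,
    ("bw", 600): 80,
    ("grayscale", 300): 301,
    ("color", 300): 577,
}


def storage_gb(pages, mode, dpi):
    """Approximate storage in GB (1 GB = 1,000,000 KB) for a page count."""
    return pages * KB_PER_PAGE[(mode, dpi)] / 1_000_000


for (mode, dpi), kb in KB_PER_PAGE.items():
    print(f"{mode:>9} @ {dpi} DPI: {storage_gb(100_000, mode, dpi):6.1f} GB")
```

At the same page count, 300 DPI color needs roughly 22 times the space of 200 DPI black and white (577K versus 26K per page).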
SharePoint Storage Architecture
• Image file sizes can lead to DB issues if proper planning does not take place and storage considerations are not examined.
• Consider the use of Remote BLOB Storage (RBS)
Latest Content Database Limitation
• Content databases of up to 4 TB are supported when the following requirements are met:
– Disk sub-system performance of 0.25 IOPs per GB. 2 IOPs per GB is recommended for optimal performance.
– You must have developed plans for high availability, disaster recovery, future capacity, and performance testing.
• http://technet.microsoft.com/en-us/library/cc298801.aspx
• http://sharepoint.microsoft.com/blog/pages/BlogPost.aspx?pID=988
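The IOPs-per-GB requirement translates directly into a disk performance target. A minimal sketch, assuming the two ratios quoted above (0.25 minimum, 2 recommended); the function name is hypothetical.

```python
MIN_IOPS_PER_GB = 0.25         # minimum quoted above
RECOMMENDED_IOPS_PER_GB = 2.0  # recommended for optimal performance


def required_iops(db_size_gb):
    """Return the minimum and recommended disk IOPs for a content database."""
    return {
        "minimum": db_size_gb * MIN_IOPS_PER_GB,
        "recommended": db_size_gb * RECOMMENDED_IOPS_PER_GB,
    }


print(required_iops(200))    # a database in the 100-200GB range
print(required_iops(4000))   # a database near the 4 TB limit
```

A 200 GB database works out to 50 IOPs minimum and 400 recommended; a 4,000 GB database needs 1,000 and 8,000.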
• Backup and restore.
• Skilled administrators.
• Complexity of customizations and configurations on SharePoint Server 2010 may necessitate refactoring (or splitting) of data into multiple content databases.
• 100-200GB is still the best size for backup and restore, and overall manageability.
Considerations on Content DB Size
Microsoft RBS Recommendations
• RBS provides benefits in the following cases:
– The content databases are larger than 500 gigabytes (GB).
– The BLOB data files are larger than 256 kilobytes (KB).
– The BLOB data files are at least 80 KB and the database server is a performance bottleneck. In this case, RBS reduces both the I/O and processing load on the database server.
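The three conditions above can be written as a simple check. This sketch assumes that meeting any one of them is enough to make RBS worth evaluating; the function and parameter names are invented for this example and are not part of any SharePoint API.

```python
def rbs_worth_evaluating(db_size_gb, typical_blob_kb, db_server_is_bottleneck=False):
    """Apply the three RBS conditions listed above (any one is sufficient)."""
    if db_size_gb > 500:
        return True
    if typical_blob_kb > 256:
        return True
    if typical_blob_kb >= 80 and db_server_is_bottleneck:
        return True
    return False


# 300 DPI color scans (about 577 KB each) in a 300 GB database
print(rbs_worth_evaluating(300, 577))
# 200 DPI black and white scans (about 26 KB each) in the same database
print(rbs_worth_evaluating(300, 26))
```

Note how the scan settings feed into this decision: small black and white images stay well under the BLOB thresholds, while color images cross them on their own.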
Use Folder and File Names
No More Folders!!
• Maybe not
• In the majority of our implementations, customers required folder naming in libraries
• Why?
Why Folders?
• Users are familiar with Folder structures
– Easier adoption
• Use of 3rd party tools
– Colligo Briefcase
– Access Tools
• WebDav applications
• Office
Folders and Filenames
• Search Aid
• Flexibility for migration
• Structured data for DR
• Overall Contingencies
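One way to get these benefits is to build folder and file names from the index fields at capture time. The sketch below is purely illustrative: the folder layout (document type / year / vendor), the field names and the sample values are all assumptions, and the character list reflects characters commonly rejected in SharePoint 2010 file and folder names, so verify it against your own environment.

```python
import re
from datetime import date

# Characters assumed to be invalid in file and folder names.
_INVALID = re.compile(r'[~"#%&*:<>?/\\{|}]')


def clean(part):
    """Replace invalid characters and trim stray spaces and periods."""
    return _INVALID.sub("_", str(part)).strip(" .")


def build_path(doc_type, vendor, invoice_no, doc_date, ext="pdf"):
    """Build a hypothetical '<type>/<year>/<vendor>/<file>' path from index fields."""
    folder = f"{clean(doc_type)}/{doc_date.year}/{clean(vendor)}"
    filename = f"{clean(vendor)}_{clean(invoice_no)}_{doc_date.isoformat()}.{ext}"
    return f"{folder}/{filename}"


print(build_path("Invoices", "ATT", "12332", date(2011, 3, 15)))
```

Because the same index values appear in the path, the file name and the SharePoint columns, a document can still be identified if it is migrated or restored outside SharePoint.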
Think Search
Capture drives Search
• How do you want to find your documents?
• Index fields (Columns in SharePoint) are the critical focus.
• Use Term Store and Managed Metadata
• Rules to live by:
– 5 <= defining fields per document type
– Always include dates
– Steer clear of field “overdrive”
• Automation and data sources can let you go beyond
OCR for Search
Full Text
• The Insurance Policy
• Adobe PDF Image + Hidden Text
– Industry Standard
– One “Package” for image and OCR text
– Portable
• Provide the ultimate in searchability with iFilter
Define Your Scanning Model
Scanning Models
• Centralized Capture – Documents are scanned at one location and in “batches” at a particular time or times
• De-centralized Capture – Documents are still scanned in batches at a particular time, but are now scanned at multiple locations
• Distributed Capture – Documents are scanned at the point of transaction and at multiple locations
Trend from Centralized to Distributed Scanning
Choose the Correct Scanners
Choosing your Weapon
• MFPs or Scanners??
MFPs – The Pros
• Leverage your existing investment in the MFP
• Most copier maintenance plans do not charge for scans
• MFP manufacturers are really focusing on scanning
• Network scanning functions:
– Scan to email
– Scan to Windows Folders
– Scan to FTP
• One-to-Many relationship: all workers can use one device.
MFPs – The Cons
• Contention – “line at the copier”
• Poor performance with differing paper sizes
• Lack of color dropout (Scanning blue or black backgrounds will result in a black page)
• Small Document Feeder sizes (50 – 100 pages)
• On average, file sizes are 10-20% larger
• Duplex scanning/DPI increase greatly slows down rated speed
• Black and White scanning only on some models
Scanners – The Pros
• Convenience – scan at your desk
• Duplexing does not slow down scanner
• Color dropout
• Superior image quality due to enhancement features
• Ease in handling differing paper sizes/types
• Larger document feeder selections (up to 1000+ pages)
Scanners – The Cons
• One to One relationship – directly connected to PC
• Additional Maintenance costs
• Can be quite expensive to outfit your whole organization.
When to use a Dedicated Scanner
• Scanning 10+ documents per day
• Workers that are constantly scanning throughout the day
• Mixed paper sizes, weights and colors
• Poor quality, older documents or when image enhancement is required
• OCR or ICR applications
• High volume copying and printing environments
• Large Document scanning
• High security environments
Key Points When Purchasing
• Scanning speed
• Document Feeder Capacity
• Daily Duty Cycle
• Scanning Mode
• Warranty and Service
Correctly Configure Devices
Too Many IT Killers
Focus
• Almost all MFPs Scan in Color by Default
• DPI is always set above 200 DPI
• Huge network impact
• Huge DB Impact
• Huge drain on resources
Recommendations-Default
• 200 DPI
• Black and White
• Only add color for specific departmental needs
• Use TIFF and PDF (compressed)
• Linearized PDF (WebFast)
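If device settings can be exported or inventoried, the defaults above can be checked automatically. This is a sketch only: the profile dictionary, its keys and the function name are hypothetical and do not correspond to any vendor's configuration format.

```python
# Assumed baseline, taken from the recommended defaults above.
MAX_DPI = 200
ALLOWED_FORMATS = {"tiff", "pdf"}


def audit_profile(profile):
    """Return warnings for a dict such as {"dpi": 300, "mode": "color", "format": "pdf"}."""
    warnings = []
    if profile.get("dpi", 0) > MAX_DPI:
        warnings.append(f"dpi {profile['dpi']} exceeds the default of {MAX_DPI}")
    if profile.get("mode") != "bw":
        warnings.append("default mode should be black and white")
    if profile.get("format") not in ALLOWED_FORMATS:
        warnings.append("format should be compressed TIFF or PDF")
    return warnings


print(audit_profile({"dpi": 300, "mode": "color", "format": "pdf"}))
```

Departments with a genuine need for color would be handled as documented exceptions to the default profile.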
Scan or Capture?
Scanning Challenges
• Basic capabilities
• No standardization
• Documents not searchable
• Time intensive
• Lack of integration into Enterprise Applications
Capture vs. Scanning
• A scanning application is simply a means of quickly and easily converting paper to digital form. Such applications are well suited to environments with very basic needs and to what I call "onesie-twosie" scanning, or low-volume environments.
• Capture software can be used for basic scanning needs, but it takes you to a whole new level from a "capture" perspective. These applications typically offer a number of ways to "slice and dice" documents, and they focus on efficiency and on minimizing the time required to scan, index and capture data.
Why capture?
• Reduce the required time for scanning and indexing documents = Efficiency
• Enable a standard process for scanning, capturing, indexing, naming, and processing = Standardization
• Provide numerous gateways to multiple repositories = Flexibility
Automation is Key
Extraction Technologies
Advanced Data Extraction (ADE)
Zone OCR
Manual Entry
What is ADE?
Automated Routing
Use Barcodes and OMR
Routing/Separator Sheets
• Utilize barcodes and/or Optical Mark Recognition (OMR)
• Capture reads and determines routing based on them
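The idea can be sketched as follows: each separator sheet carries a barcode, and every page after it belongs to the document routed by that barcode. The "library|folder" barcode layout, the input structure and the function name are assumptions made for this illustration; real capture applications define their own formats.

```python
def split_batch(pages):
    """Split a scanned batch into routed documents.

    pages: list of (barcode_value_or_None, page_id) in scan order.
    A page with a barcode is treated as a separator sheet and starts
    a new document. Pages scanned before the first separator are skipped.
    """
    documents = []
    current = None
    for barcode, page_id in pages:
        if barcode is not None:
            # Hypothetical barcode layout: "<library>|<folder>"
            library, folder = barcode.split("|", 1)
            current = {"library": library, "folder": folder, "pages": []}
            documents.append(current)
        elif current is not None:
            current["pages"].append(page_id)
    return documents


batch = [
    ("Accounting|Invoices", "sep1"),
    (None, "p1"),
    (None, "p2"),
    ("HR|Resumes", "sep2"),
    (None, "p3"),
]
for doc in split_batch(batch):
    print(doc["library"], doc["folder"], doc["pages"])
```

The separator sheets themselves are not stored with the documents, so the operator can drop a stack with mixed destinations into the feeder and scan it in one pass.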
Intelligent Routing
How are they Created?
• Most Capture Apps Include them
• Ad Hoc and Bulk Generation
• Excel and Word Macros
• Custom SP Apps
Summary: Plan, Plan, Plan
Keys to Project Success
• All items in this presentation are critical to overall planning
• Focus on meeting needs and driving users to proper use of technology
• Start small – POC
• Learn from smaller projects
• Expand
Who is PSIGEN?
• Founded 1995
• Mature capture company
• Innovative Capture
• Focus on Automation
• Integration with 56 ECM systems
Links
• www.psigen.com
• www.scanningwithsharepoint.com