UBIQ – Low Bandwidth Visual Communication
Exploratory Computer Vision Group
© 2007 IBM Corporation
Jonathan H. Connell
Exploratory Computer Vision Group
IBM T. J. Watson Research Center
UBIQ – Low Bandwidth Visual Communication
What is it?
Links camera phone to any PC
PC user can see video, snap pictures
Good for a quick “beam in”
UBIQ concept: The expert can be everywhere
Field service dilemma (e.g. repair):
– Most problems have simple solutions
– Some field-service problems require experts
– Experts are expensive, so their time must be used effectively
– Medium-skilled labor can fix many problems
Hybrid solution
– Send out medium-skilled person for quick fix in most cases
– Call back to main office for more difficult problems
“Beaming in” the expert
– Sometimes verbal communication is insufficient
– Pictures can be sent, but take a long time to transmit
– Person in the field might take picture of wrong aspect
Provide a real-time “viewfinder” mode to allow expert to quickly snap the right picture on a remote mobile phone
Scenario: fixing a copier
Customer calls in problem = streaks on paper
Local maintenance guy shows up promptly
– Checks for correct paper & toner level
Calls back to home office for advice
– Shows paper markings
Expert asks for view of “fuser roller”
– “What’s that?”
– Local person uses video mode to get to correct location
Expert snaps image and examines
– Fix problem by using alcohol wipe on this component (marked)
Open the side …
… closer …
… there.
Demo (video)
Other Scenarios
Specialized medical consultation
– Remote clinic in Botswana
– Experts can’t (or won’t) travel there quickly
– Check out foot rash without fear of contagion
Inspection at construction site
– Concrete slab slipping down hill in Brazil
– Fly in a civil engineer (while site idles)
– Problem really requires a hydrologist?
Value Proposition
Lower cost of operations
– No expense for cars, plane trips, lodging …
Right brain at the right place quickly
– Can easily change experts if needed
– No delays due to flights, visa approval …
– Increases customer satisfaction
Better leverage existing expertise
– No time lost on travel (or getting lost)
Bigger expert recruitment pool
– No onerous travel or relocation
– Social skills less important
Making the top of the skill pyramid virtually ubiquitous.
Critical Point: Designing around bandwidth
Verizon 3G EV-DO cites (uncompressed data):
– Rev A peak: down = 600-1400 kbaud, up = 500-800 kbaud
– Non Rev A peak: down = 400-700 kbaud, up = 60-80 kbaud
– Local test: uplink 200 KB in 25 sec → 8 KB / sec = 64 kbaud
Older CDPD / GPRS networks = 9.6-40 kbaud
– Remote areas in US (Nebraska)
– Developing countries (South Africa)
South Africa: blue = 30 kbaud
USA: blue = 400 kbaud, green = 50 kbaud
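The local uplink test quoted on this slide is a plain unit conversion. A minimal sketch of that arithmetic, assuming 1 KB = 1000 bytes and reading the slide's "kbaud" as kilobits per second (the function name is illustrative, not part of UBIQ):

```python
def uplink_kbits_per_sec(kilobytes: float, seconds: float) -> float:
    """Convert a measured transfer (KB moved in a given time) to kbit/s."""
    kb_per_sec = kilobytes / seconds   # 200 KB / 25 s = 8 KB/s
    return kb_per_sec * 8              # 8 bits per byte

print(uplink_kbits_per_sec(200, 25))   # 64.0
```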
Video transmission
Uplink bandwidth intrinsically limited
– Handset radiated power (batteries, FCC limits)
– Distance to base station
Generally assume 10-50 kbaud (like old dial-up)
Motion “video” requires 5-10 fps
– H.264 (MPEG-4) lowest = 64 kbaud for 176x144 @ 15 fps
– WMV for dialup = 38 kbaud for 160x120 @ 15 fps
Need very low-bandwidth codecs
– 350 bytes / frame @ 53 kbaud for 15 fps
– 100-200 bytes / frame @ 10 kbaud for 5-10 fps
Key technology
Low-bandwidth viewfinder suited to task
– WHY: Allows expert to guide image acquisition more effectively
– HOW: Use computer vision techniques to focus on “semantic” aspects
US patent 7,219,364 to IBM: “System and Method for Selectable Semantic Codec Pairs for Very Low Data-Rate Video Transmission”
Rudolf Bolle & Jonathan Connell (filed Feb. 2001, issued May 2007)
Claims:
1. A system for compressing one or more video streams comprising:
one or more image input devices creating the one or more video streams; and
a selector process that selects a semantic compression process out of a set of semantic compression processes,
the selected semantic compression process compressing the one or more video streams based on a task that required the compression of the one or more video streams and that utilizes content of the one or more video streams.
Codec 1: JPEG stills
Compression settings
– Moderate resolution
– Low quality (50)
Balance of clarity & speed
– Non-linear with resolution
– Network issues
32 x 24 = 812 bytes (0.8 secs): 4x fewer pixels, 1.5x faster
64 x 48 = 1242 bytes (1.2 secs)
128 x 96 = 2765 bytes (2.8 secs): 4x more pixels, 2.2x slower
Interaction with network
Ethernet TCP/IP packet structure:
– 8 bytes Ethernet framing
– 20 byte TCP header
– 14 bytes IPv4 MAC header
– 46-1500 bytes payload
– 4 bytes CRC check code
Effective bandwidth over raw 10 kbaud link:
– 100 bytes → 146 bytes = 8.6 fps (32% overhead)
– 200 bytes → 246 bytes = 5.1 fps (19% overhead)
– 1000 bytes → 1046 bytes = 1.2 fps (4% overhead)
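The effective-bandwidth figures above follow from adding the listed per-packet overhead (8 + 20 + 14 + 4 = 46 bytes) to each frame. A small sketch of that calculation, assuming one frame per packet and reading 10 kbaud as 10 kbit/s (the names are illustrative):

```python
HEADER_BYTES = 8 + 20 + 14 + 4     # overhead fields as listed on the slide = 46
LINK_BYTES_PER_SEC = 10_000 / 8    # raw 10 kbaud link, assumed to mean 10 kbit/s

def effective_rate(payload_bytes: int) -> tuple[int, float, float]:
    """Return (bytes on the wire, frames per second, overhead fraction)."""
    packet = payload_bytes + HEADER_BYTES
    fps = LINK_BYTES_PER_SEC / packet
    overhead = HEADER_BYTES / packet
    return packet, fps, overhead

for payload in (100, 200, 1000):
    packet, fps, overhead = effective_rate(payload)
    print(f"{payload} -> {packet} bytes: {fps:.1f} fps ({overhead:.0%} overhead)")
```

Under these assumptions the loop reproduces the three rows of the slide, which shows why very small frames lose roughly a third of the link to headers.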
Nagle algorithm in TCP
– Tries to combine small packets for better efficiency
– Need to disable for acceptable latency (and smoothness)
Delayed ACK in TCP
– Multi-packet transmit can be delayed 200ms if no down-linked command
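On most socket APIs the Nagle algorithm is turned off with the `TCP_NODELAY` option. The sketch below shows this with Python's standard `socket` module as an illustration only. The slides do not say how UBIQ itself sets the option, and the helper name is hypothetical.

```python
import socket

def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 sends small writes immediately instead of coalescing them
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_low_latency_socket()
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
sock.close()
```

With the option set, each small frame goes out as its own packet, trading header overhead for lower latency.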
Codec 2: Progressive gray
Low spatial and intensity resolution
– 16 x 12 pixels
– 4 bit gray scale
Image = 96 bytes
10fps @ 10 kbaud
No Huffman coding
– not effective on short messages
16 x 12 x 8 bits = 192 bytes vs. 16 x 12 x 4 bits = 96 bytes
Interpolated 8 bits vs. interpolated 4 bits: nearly identical
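The 96-byte figure comes from storing two 4-bit pixels in each byte. A minimal sketch of such packing, assuming the 4 high-order bits of each 8-bit gray value are kept (the slides do not specify the byte layout, so the nibble order here is an assumption):

```python
def pack_4bit(pixels: list[int]) -> bytes:
    """Quantize 8-bit gray values to 4 bits and pack two pixels per byte."""
    if len(pixels) % 2:
        raise ValueError("expected an even number of pixels")
    nibbles = [p >> 4 for p in pixels]   # keep the 4 high-order bits
    return bytes((nibbles[i] << 4) | nibbles[i + 1]
                 for i in range(0, len(nibbles), 2))

def unpack_4bit(data: bytes) -> list[int]:
    """Expand packed nibbles back to 8-bit gray (low-order bits are lost)."""
    out = []
    for b in data:
        out.append((b >> 4) * 17)        # scale 0..15 to 0..255
        out.append((b & 0x0F) * 17)
    return out

frame = [(x * 16) % 256 for x in range(16 * 12)]   # synthetic 16 x 12 image
packed = pack_4bit(frame)
print(len(packed))   # 96
```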
Algorithm
Progressive refinement
– Send very low resolution 4 bit base
– Send next resolution in 4 pieces
– Send best resolution in 16 pieces
– Add in low order bits in 16 pieces
Motion sensitivity
– If basic scene changes, start with new base image
– Add resolution from the center outward
Long term stability
– Don’t replace a good resolution image with a poorer one
– Send new best resolution in 32 pieces in background
Refinement sequence
Base 16 x 12 pixels
Central quarter 1
Central quarters 1 & 2
Central quarters 1 & 2 & 3
Resolution sequence
16 x 12 @ 4 bits (0.1 secs)
32 x 24 @ 4 bits (0.5 secs)
64 x 48 @ 4 bits (2.1 secs)
64 x 48 @ 8 bits (3.6 secs)
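These timings are roughly what one would expect if every refinement piece is the same size as the 96-byte base image and one piece is sent per frame at 10 fps. That is an inference from the piece counts on the Algorithm slide (1, 4, 16, 16), not something the slides state. A sketch of the resulting schedule:

```python
BASE_BYTES = 16 * 12 * 4 // 8    # 16 x 12 pixels at 4 bits = 96 bytes
PACKETS_PER_SEC = 10             # assumed: one 96-byte piece per frame at 10 fps

# (stage, number of pieces) taken from the progressive refinement list
STAGES = [
    ("16 x 12 @ 4 bits", 1),
    ("32 x 24 @ 4 bits", 4),
    ("64 x 48 @ 4 bits", 16),
    ("64 x 48 @ 8 bits", 16),    # low order bits added in 16 pieces
]

def cumulative_times(stages, packets_per_sec):
    """Return (stage, seconds until that stage is complete) for each stage."""
    sent = 0
    times = []
    for name, pieces in stages:
        sent += pieces
        times.append((name, sent / packets_per_sec))
    return times

for name, secs in cumulative_times(STAGES, PACKETS_PER_SEC):
    print(f"{name}: {secs:.1f} secs")
```

Under these assumptions the schedule gives 0.1, 0.5, 2.1 and 3.7 seconds. The first three match the slide and the last is within 0.1 seconds of the quoted 3.6.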
Codec 3: Prominent lines
Convolve with Sobel masks
– X mask:
1 0 -1
2 0 -2
1 0 -1
– Y mask:
1 2 1
0 0 0
-1 -2 -1
Y vs. X = angular direction
RMS value = magnitude
Images: Input, Edge Magnitude, Edge Direction (only 4 matter)
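A pure-Python sketch of this step, using the two masks from the slide. Two details are assumptions: magnitude is computed as the root of the sum of squares (which differs from a strict RMS only by a constant factor), and the direction is folded into four 45-degree bins, one plausible reading of "only 4 matter".

```python
import math

SOBEL_X = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
SOBEL_Y = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]

def apply_mask(img, mask, x, y):
    """Sum of mask weights times the 3 x 3 neighborhood centered on (x, y)."""
    return sum(mask[j][i] * img[y + j - 1][x + i - 1]
               for j in range(3) for i in range(3))

def sobel_edges(img):
    """Return {(x, y): (magnitude, direction bin 0..3)} for interior pixels."""
    h, w = len(img), len(img[0])
    result = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = apply_mask(img, SOBEL_X, x, y)
            gy = apply_mask(img, SOBEL_Y, x, y)
            mag = math.hypot(gx, gy)
            # Fold the angle into [0, 180) and quantize into four 45-degree bins
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            result[(x, y)] = (mag, int((angle + 22.5) // 45) % 4)
    return result

img = [[0, 0, 10, 10]] * 4       # vertical step edge: dark left, bright right
edges = sobel_edges(img)
print(edges[(1, 1)])
```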
Choosing edges
Separate into horizontal and vertical edges
Find connected components
Determine maximal length elements
Keep best N
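The selection steps can be sketched as follows for one orientation class of edge pixels. The 8-connectivity and the use of the larger bounding-box side as the length measure are assumptions. The slides do not say how length is measured.

```python
from collections import deque

def connected_components(pixels):
    """Group a set of (x, y) edge pixels into 8-connected components."""
    remaining = set(pixels)
    components = []
    while remaining:
        seed = remaining.pop()
        queue, blob = deque([seed]), [seed]
        while queue:
            x, y = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    n = (x + dx, y + dy)
                    if n in remaining:
                        remaining.remove(n)
                        queue.append(n)
                        blob.append(n)
        components.append(blob)
    return components

def extent(blob):
    """Length proxy: the larger side of the blob's bounding box."""
    xs = [x for x, _ in blob]
    ys = [y for _, y in blob]
    return max(max(xs) - min(xs), max(ys) - min(ys)) + 1

def keep_best(pixels, n):
    """Return the n longest components, longest first."""
    return sorted(connected_components(pixels), key=extent, reverse=True)[:n]

horizontal = {(x, 0) for x in range(6)} | {(x, 5) for x in range(3)} | {(9, 9)}
best = keep_best(horizontal, 2)
print([extent(b) for b in best])   # [6, 3]
```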
Approximating edges
Find blob parameters
– First order moments (centroid)
– Second order moments (inertia)
– Bounding box (max & min of x, y)
Get line endpoints
– Line passes through centroid
– Line is parallel to minimal axis
– Clip to bounding box
Better than least squares
– Not just minimum y error
Pixel pattern
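A sketch of the moment-based fit described above: the line passes through the centroid, follows the axis of least inertia obtained from the second-order central moments, and is clipped to the bounding box. The function name and the clipping details are assumptions, not the author's code.

```python
import math

def fit_line(blob):
    """Fit a segment to a blob of (x, y) pixels using its moments."""
    n = len(blob)
    xs = [x for x, _ in blob]
    ys = [y for _, y in blob]
    cx, cy = sum(xs) / n, sum(ys) / n                 # first order moments
    mu20 = sum((x - cx) ** 2 for x in xs)             # second order moments
    mu02 = sum((y - cy) ** 2 for y in ys)
    mu11 = sum((x - cx) * (y - cy) for x, y in blob)
    theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)   # axis of least inertia
    dx, dy = math.cos(theta), math.sin(theta)

    # Clip the infinite line through the centroid to the bounding box
    t_lo, t_hi = -math.inf, math.inf
    for c, d, lo, hi in ((cx, dx, min(xs), max(xs)), (cy, dy, min(ys), max(ys))):
        if abs(d) > 1e-12:                            # skip an axis the line never crosses
            t1, t2 = (lo - c) / d, (hi - c) / d
            t_lo = max(t_lo, min(t1, t2))
            t_hi = min(t_hi, max(t1, t2))
    return (cx + t_lo * dx, cy + t_lo * dy), (cx + t_hi * dx, cy + t_hi * dy)

p0, p1 = fit_line([(0, 0), (1, 1), (2, 2), (3, 3)])
print(p0, p1)   # approximately (0, 0) and (3, 3)
```

Because the fit minimizes perpendicular distance rather than vertical error, it also handles vertical blobs, where an ordinary least-squares fit of y on x breaks down.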
Final line version
Image: Input
Keep and code 50 best
– (x0, y0, x1, y1) in 240x180
– 200 bytes total → 5 fps
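The 200-byte total is consistent with one byte per coordinate, since every position on a 240 x 180 grid fits in 0..255 and 50 x 4 = 200. The sketch below packs lines that way. The actual wire format is not given in the slides, so this layout is an assumption.

```python
import struct

def encode_lines(lines):
    """Pack (x0, y0, x1, y1) segments, one unsigned byte per coordinate."""
    out = bytearray()
    for x0, y0, x1, y1 in lines:
        if not (0 <= x0 < 240 and 0 <= x1 < 240 and 0 <= y0 < 180 and 0 <= y1 < 180):
            raise ValueError("endpoint outside the 240 x 180 grid")
        out += struct.pack("4B", x0, y0, x1, y1)
    return bytes(out)

def decode_lines(data):
    """Inverse of encode_lines: read back 4-byte groups as coordinate tuples."""
    return [struct.unpack_from("4B", data, i) for i in range(0, len(data), 4)]

lines = [(i, i, 239 - i, 179 - i) for i in range(50)]   # 50 synthetic segments
payload = encode_lines(lines)
print(len(payload))   # 200
```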
Blend successive frames
Client side smoothing
– now + previous = mixed (grayed)
But only if low motion
– now - fattened previous = “extra” edges moved
Comparison of codecs
Different rates: 10 fps, 5 fps, 0.8 fps
Color vs. gray
Iconic vs. graphical
Demo (video) panels: Input, Progressive, Lines, JPEG
UBIQ summary
Enhances visual communication
– Multiple viewfinder codecs
– Remote acquisition controls
– Image mark-up possible
Fundamentals covered under US patent
Single platform implementation
– Windows XP (PC client)
– Windows Mobile 5.0 (Smartphone server)
Demo possible
– http://www.research.ibm.com/people/j/jhc/ubiq/
Future work
Field testing
– See which codecs are useful for which tasks
Porting to other phones
– Java, Symbian (camera access?)
Development of additional codecs
– Area based analog to lines
– Hybrid lines + blobs
– Spatially varying resolution
– Camera tracking partial stills
– Quick remote zoom refinement