Multimedia Retrieval Architecture
Anandi Giridharan
Electrical Communication Engineering,
Indian Institute of Science, Bangalore – 560012, India
Multimedia Storage Techniques
Media and Storage Requirements
Characteristics of multimedia data and their storage requirements.
Multimedia data tends to be voluminous. For example, 100 minutes of video compressed using the JPEG compression algorithm requires 9 GB of storage space. Most storage systems do not provide such large contiguous storage areas.
Continuous media data, such as video and audio, have timing characteristics associated with them.
Real-time data must be captured without losing any portion of it, which imposes timing constraints on multimedia data.
Media requirements of MM applications and storage space.
Multimedia Standards
• Standards imply consistency and conformity, which means they facilitate interoperability and compatibility.
• Standards in computing are developed to solve problems:
– Interoperability – allowing systems to communicate with each other (e.g., TCP/IP)
– Portability – allowing software to work on different systems (e.g., Java)
– Data exchange – allowing data to be transferred to different systems (e.g., JPEG)
• Factors to consider: Lifetime, Portability and Costs
Storage Structures of Video Data
• In digital video, four types of control information have to be considered for any multimedia information to run smoothly.
• Control Information
• Frame Rate:
• Video is made up of 30 (or 24) pictures, or frames, for every second of video.
• Frames are split in half (odd lines and even lines) to form what are called fields.
• Interlaced video: When a television set displays its analogue video signal, it displays the odd lines (the odd field) first. Then it displays the even lines (the even field).
• Non-interlaced video: A computer monitor uses "progressive scan" to update the screen. The computer displays each line in sequence, from top to bottom.
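The split of a frame into odd and even fields can be sketched in a few lines of Python. This is only an illustration: the frame is modelled as a list of scan lines, and the function names `split_fields` and `weave` are hypothetical, not part of any video library.

```python
def split_fields(frame):
    """Split a frame (a list of scan lines) into its odd and even fields.

    Lines are numbered from 1, so the odd field holds lines 1, 3, 5, ...
    """
    odd_field = frame[0::2]   # lines 1, 3, 5, ... (list indices 0, 2, 4, ...)
    even_field = frame[1::2]  # lines 2, 4, 6, ...
    return odd_field, even_field


def weave(odd_field, even_field):
    """Rebuild a progressive frame by interleaving the two fields."""
    frame = []
    for i in range(len(odd_field)):
        frame.append(odd_field[i])
        if i < len(even_field):
            frame.append(even_field[i])
    return frame


frame = ["line1", "line2", "line3", "line4", "line5", "line6"]
odd, even = split_fields(frame)

# An interlaced display draws the whole odd field, then the whole even field.
interlaced_order = odd + even
# A progressive display draws the lines top to bottom.
progressive_order = weave(odd, even)
```

Here `interlaced_order` lists the lines in the order a television would draw them, while `progressive_order` matches the original top-to-bottom frame.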
Interlaced video on the left, deinterlaced video on the right. Source: http://www.streaminglearningcenter.com/articles/shooting-for-streaming---progressive-or-interlaced.html
Storage Structures of Video Data
• Control Information
• Color Resolution:
– Color resolution refers to the number of colors that can be displayed on the screen at one time.
– Common color formats are RGB (red, green, blue) and YUV (a luminance component Y for brightness, plus U and V chrominance components for color).
• Spatial Resolution:
– Answers the question "How big is the picture?", that is, the picture size in pixels.
• Image Quality:
– Video should look acceptable for an application.
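Frame rate, color resolution, and spatial resolution together fix the raw (uncompressed) data rate of a video. The sketch below multiplies them out; the 640x480, 24 bits/pixel, 30 fps values are illustrative choices, and the function names are hypothetical.

```python
def raw_bit_rate(width, height, bits_per_pixel, frames_per_second):
    """Uncompressed video bit rate in bits per second."""
    return width * height * bits_per_pixel * frames_per_second


def raw_storage_bytes(width, height, bits_per_pixel, frames_per_second, seconds):
    """Uncompressed storage needed for a clip of the given duration, in bytes."""
    bits = raw_bit_rate(width, height, bits_per_pixel, frames_per_second) * seconds
    return bits // 8


# Spatial resolution 640x480, color resolution 24 bits/pixel, frame rate 30 fps.
rate = raw_bit_rate(640, 480, 24, 30)                 # 221,184,000 bits/s
one_minute = raw_storage_bytes(640, 480, 24, 30, 60)  # 1,658,880,000 bytes
```

Under these assumptions a single minute of uncompressed video already needs about 1.66 GB, which is why compression is unavoidable.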
Spatial resolution is a parameter that shows how many pixels are used to represent a real object in digital form. Fig. 2 shows the same color image represented at different spatial resolutions. The flower on the left has a much better resolution than the one on the right.
Video Data Compression
• Factors associated with compression
– Real-Time versus Non-Real-Time
• Some systems compress to disk, decompress, and play back video (30 fps) all in real time, with no delays. Other systems can capture only some of the 30 frames per second and can play back only some of the frames.
– Symmetrical versus Asymmetrical
• Symmetrical: if a 640x480 sequence can be played at 30 fps, then capturing, compressing, and storing are also possible at the same rate.
• Asymmetrical: the opposite of symmetrical. Compression takes much longer and is more elaborate than decompression.
Compression Ratios
The compression ratio is a numerical comparison of the size of the original video with the size of the compressed video. For example, a 200:1 compression ratio means that if the original video is represented by the number 200, the compressed video is represented by the smaller number 1; that is, the compressed video is 1/200 the size of the original.
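As a small worked example, the ratio can be computed directly from the two sizes. The byte counts below are made up for illustration, and the helper names are hypothetical.

```python
def compression_ratio(original_size, compressed_size):
    """Return how many times smaller the compressed data is."""
    return original_size / compressed_size


def ratio_label(original_size, compressed_size):
    """Format the ratio in the usual 'N:1' notation."""
    return f"{compression_ratio(original_size, compressed_size):g}:1"


# A 200 MB clip that compresses to 1 MB has a 200:1 compression ratio.
ratio = compression_ratio(200_000_000, 1_000_000)
label = ratio_label(200_000_000, 1_000_000)
```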
Lossless Versus Lossy
The loss factor determines whether there is a loss of quality between the original image and the image after it has been compressed.
With lossless compression, every single bit of data that was originally in the file remains after the file is uncompressed. All of the information is completely restored.
Lossy compression reduces a file's size by permanently eliminating certain information, especially redundant information. When the file is uncompressed, only a part of the original information is still there (although the user may not notice it).
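The difference can be seen in a toy experiment. The lossless half uses Python's standard `zlib` module; the lossy half is a deliberately crude stand-in (dropping the low 4 bits of each sample) chosen only to show that the discarded information cannot be recovered. It is not how any real image or video codec works.

```python
import zlib

# Sample data: a short repeating pattern of 8-bit values.
data = bytes([10, 11, 12, 13, 200, 201, 202, 203] * 64)

# Lossless: zlib (DEFLATE) restores every byte exactly.
packed = zlib.compress(data)
restored = zlib.decompress(packed)

# Lossy (toy example): keep only the top 4 bits of each sample.
# Each quantized value would need 4 bits instead of 8.
step = 16
quantized = bytes(b // step for b in data)
# Reconstruct using the middle of each quantization interval.
approx = bytes(q * step + step // 2 for q in quantized)
max_error = max(abs(a - b) for a, b in zip(data, approx))
```

After the round trip, `restored` is identical to `data`, whereas `approx` only resembles it, with an error of at most half the step size per sample.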
Examples of lossless and lossy (200:1) images decoded from the same file.
Video Data Compression
• Interframe versus Intraframe
– Intraframe method: each video frame is compressed and stored as a discrete picture.
– Interframe method: a reference frame is stored, and only the differences between frames are recorded.
• Bit Rate Control
– Parameters such as frame rate and image quality should be adjustable according to the application requirements.
• Selecting a Compression Technique
– Motion JPEG, MPEG-1, MPEG-2, MPEG-4, and Motion JPEG 2000 are internationally recognized standards for compression of moving pictures.
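The interframe idea above can be sketched with frames modelled as flat lists of pixel values. This is a minimal illustration assuming plain per-pixel differences against the previous frame; real codecs work on blocks and add motion compensation. The function names are hypothetical.

```python
def encode_interframe(frames):
    """Store the first frame whole, then per-pixel differences to the previous frame."""
    reference = list(frames[0])
    deltas = []
    previous = reference
    for frame in frames[1:]:
        deltas.append([cur - prev for cur, prev in zip(frame, previous)])
        previous = frame
    return reference, deltas


def decode_interframe(reference, deltas):
    """Rebuild every frame by adding each difference list to the previous frame."""
    frames = [list(reference)]
    for delta in deltas:
        frames.append([prev + d for prev, d in zip(frames[-1], delta)])
    return frames


frames = [
    [50, 50, 50, 50, 90, 90],
    [50, 50, 50, 50, 90, 91],
    [50, 50, 50, 52, 90, 91],
]
reference, deltas = encode_interframe(frames)
decoded = decode_interframe(reference, deltas)
```

Because consecutive frames are nearly identical, the difference lists are mostly zeros, which a later entropy-coding stage can store very compactly.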
Data compression
Data compression converts an input data stream into another stream of smaller size.
It is the process of reducing the amount of data needed for storage, typically by use of encoding techniques.
Compression helps to:
• Reduce storage space
• Reduce bandwidth
• Lower cost
• Enable new applications
Audio Compression
• Predictive encoding: The differences between successive samples are encoded instead of the absolute sample values, resulting in lower bit rates. The compression achieved is not very high.
• Perceptual encoding: It exploits the flaws in our auditory system, based on the study of how people perceive sound.
– The ear's sensitivity to sound is not uniform.
– The ear is most sensitive from 2 to 4 kHz; it is less sensitive in higher or lower ranges.
– Audio samples that are below the threshold of hearing can be deleted.
– Some sounds can mask other sounds.
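Predictive encoding in its simplest form stores each sample as the difference from the previous one. The sketch below assumes this plain "previous sample" predictor; the sample values are made up and the function names are hypothetical.

```python
def delta_encode(samples):
    """Keep the first sample, then store each sample as a difference from the previous one."""
    if not samples:
        return []
    encoded = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        encoded.append(cur - prev)
    return encoded


def delta_decode(encoded):
    """Recover the samples by accumulating the differences."""
    samples = []
    total = 0
    for value in encoded:
        total += value
        samples.append(total)
    return samples


samples = [1000, 1004, 1009, 1011, 1008, 1002]
encoded = delta_encode(samples)   # [1000, 4, 5, 2, -3, -6]
decoded = delta_decode(encoded)
```

The differences are much smaller numbers than the samples themselves, so they can be stored with fewer bits, which is where the lower bit rate comes from.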
Lossy Audio Compression
Sounds are masked by other sounds.
• Frequency masking: A loud sound in one frequency range can partially or fully mask another sound in a nearby frequency range.
• Temporal masking: A loud sound can numb our ears for a short duration even after the sound has stopped.
MPEG-1 Audio Compression
• Sampling is done at 32 kHz, 44.1 kHz, or 48 kHz.
• 44.1 kHz is used for CD-quality audio.
• The signal is converted from the time domain to the frequency domain using a fast Fourier transform.
• The resulting spectrum is divided into at most 32 frequency bands, each of which is processed separately.
• Frequency ranges that are completely masked are allocated zero bits.
• Ranges that are partially masked are allocated a small number of bits.
• Ranges that are not masked are allocated a larger number of bits.
• In the case of stereo, the redundancy between the two similar audio channels is exploited.
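The bit-allocation idea in the list above can be illustrated with a simplified rule: compare each band's signal level with its masking threshold and give bits in proportion to the difference (the signal-to-mask ratio), using the rule of thumb that each bit buys about 6 dB. This is an assumption made for illustration, not the allocation procedure defined by the MPEG-1 standard; the band levels are invented and `allocate_bits` is a hypothetical name.

```python
import math

DB_PER_BIT = 6.02  # rule of thumb: each extra bit lowers quantization noise by about 6 dB


def allocate_bits(signal_db, mask_db, max_bits=15):
    """Return bits per sample for each band from its signal-to-mask ratio (SMR)."""
    allocation = []
    for signal, mask in zip(signal_db, mask_db):
        smr = signal - mask
        if smr <= 0:
            # Band is completely masked: spend no bits on it.
            allocation.append(0)
        else:
            # Enough bits to push quantization noise below the masking threshold.
            allocation.append(min(max_bits, math.ceil(smr / DB_PER_BIT)))
    return allocation


# Four hypothetical bands: signal level and masking threshold in dB.
signal_db = [60.0, 35.0, 48.0, 20.0]
mask_db = [20.0, 40.0, 42.0, 20.0]
bits = allocate_bits(signal_db, mask_db)
```

In this example the first band is well above its threshold and receives many bits, the third is only partially masked and receives one bit, and the other two are fully masked and receive none.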
Video Compression
• Video is a temporal combination of frames.
• Each frame can be considered a still image consisting of a spatial combination of pixels.
• Two principles:
• Joint Photographic Experts Group (JPEG): used to compress images by removing the spatial redundancy that exists in each frame.
• Moving Picture Experts Group (MPEG): used to compress video by removing the temporal redundancy in a set of frames.
JPEG
• JPEG involves four steps
– Block preparation
– Discrete cosine transformation
– Quantization
– Compression
Phases of JPEG
Block preparation
• After the video signal is digitized, it is converted to an array of pixels, e.g. 640 x 480 pixels.
• Each pixel has R, G, and B components of 8 bits each, for a total of 24 bits/pixel.
• Before compression, the RGB values are converted to luminance (brightness) and chrominance (color).
• Our eyes are much more sensitive to luminance than to chrominance, so chrominance can be sent at a lower resolution. The luminance signal is also compatible with black-and-white pictures.
• The YUV representation therefore allows more compression.
• Y = 0.30R + 0.59G + 0.11B
• U = -0.15R - 0.29G + 0.44B
• V = 0.62R - 0.52G - 0.10B
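A minimal sketch of block preparation follows: convert RGB to YUV, then reduce the resolution of a chrominance plane by averaging 2x2 blocks. The coefficients are the commonly quoted rounded values, with the red weight of U taken as -0.15 (an assumption) so that the U and V weights each sum to zero and a gray pixel has no chrominance. The 2x2 averaging and the function names are also illustrative assumptions.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB pixel to luminance (Y) and chrominance (U, V)."""
    y = 0.30 * r + 0.59 * g + 0.11 * b
    u = -0.15 * r - 0.29 * g + 0.44 * b
    v = 0.62 * r - 0.52 * g - 0.10 * b
    return y, u, v


def subsample_2x2(plane):
    """Average each 2x2 block of a plane (width and height must be even)."""
    out = []
    for row in range(0, len(plane), 2):
        out_row = []
        for col in range(0, len(plane[0]), 2):
            total = (plane[row][col] + plane[row][col + 1]
                     + plane[row + 1][col] + plane[row + 1][col + 1])
            out_row.append(total / 4)
        out.append(out_row)
    return out


# A mid-gray pixel: all of its information ends up in Y.
gray = rgb_to_yuv(128, 128, 128)

# A 4x4 chrominance plane shrinks to 2x2 after subsampling.
u_plane = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [2, 4, 0, 0],
    [6, 8, 0, 0],
]
u_small = subsample_2x2(u_plane)   # [[1.0, 5.0], [5.0, 0.0]]
```

With both chrominance planes subsampled this way, a 640 x 480 frame drops from 921,600 samples to 460,800, halving the data before any further compression step.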
Discrete Cosine Transform
• Each block of 64 pixels goes through a transformation called the DCT.
• Example 1: a block with uniform intensity. It has only a DC component; all the other (AC) components are zero.
• Example 2: a block with two different intensities. It has one DC component and a few AC components; most of the coefficients are zero.
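Both examples can be checked numerically with a direct, unoptimized 8x8 DCT. The sketch assumes the orthonormal DCT-II form commonly used for JPEG; the pixel values (20 and 50) are made up, and production encoders use fast factorizations rather than this quadruple loop.

```python
import math

N = 8  # JPEG works on 8x8 blocks, i.e. 64 pixels


def dct_2d(block):
    """Orthonormal 2-D DCT-II of an N x N block (direct, unoptimized form)."""
    def c(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = c(u) * c(v) * total
    return out


def count_nonzero(coeffs, eps=1e-9):
    """Count coefficients that differ from zero by more than rounding noise."""
    return sum(abs(value) > eps for row in coeffs for value in row)


# Example 1: uniform intensity, so only the DC coefficient survives.
uniform = [[20] * N for _ in range(N)]
coeffs_uniform = dct_2d(uniform)

# Example 2: two intensities (left half 20, right half 50), DC plus a few AC terms.
two_level = [[20] * 4 + [50] * 4 for _ in range(N)]
coeffs_two_level = dct_2d(two_level)
nonzero_two_level = count_nonzero(coeffs_two_level)
```

For the uniform block the DC coefficient comes out as 160 and every AC coefficient is zero up to rounding; the two-level block keeps only a handful of nonzero coefficients out of 64.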
Quantization
Quantization divides each DCT coefficient by a step size and rounds the result, which further increases the number of zeros.
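A small sketch shows the effect. It assumes a single uniform step size and a 4x4 block of made-up coefficients for brevity; JPEG itself works on 8x8 blocks and uses a table with a separate step for each coefficient.

```python
def quantize(coeffs, step):
    """Divide each coefficient by the step size and round to the nearest integer."""
    return [[round(c / step) for c in row] for row in coeffs]


def dequantize(levels, step):
    """Approximate reconstruction: the rounding error is lost for good."""
    return [[q * step for q in row] for row in levels]


# Hypothetical DCT coefficients: one large DC term, small AC terms.
coeffs = [
    [160.0, 12.4, -3.1, 0.8],
    [-9.7, 4.2, 1.1, -0.4],
    [2.6, -1.3, 0.5, 0.2],
    [0.9, 0.3, -0.2, 0.1],
]
levels = quantize(coeffs, step=8)
zeros_before = sum(c == 0 for row in coeffs for c in row)
zeros_after = sum(q == 0 for row in levels for q in row)
```

None of the original coefficients is exactly zero, but after quantization 12 of the 16 values are. This is the lossy step of the pipeline, since `dequantize` cannot restore the rounded-off detail.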
Zig Zag scanning
• Zigzag scanning reorders the coefficients so that the zeros are grouped together and can be sent compactly, as a single count rather than one by one.
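The scan order and the resulting zero runs can be sketched as follows. The block is a made-up 4x4 example (JPEG uses 8x8), and the `(zero_run, value)` pairs with an `"EOB"` end-of-block marker are a simplified stand-in for JPEG's actual entropy coding.

```python
def zigzag_order(n):
    """Return the (row, col) positions of an n x n block in zigzag order."""
    order = []
    for s in range(2 * n - 1):           # s = row + col identifies one anti-diagonal
        diagonal = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        if s % 2 == 0:
            diagonal.reverse()           # even diagonals run from bottom-left to top-right
        order.extend(diagonal)
    return order


def run_length_zeros(values):
    """Encode as (zero_run, value) pairs; trailing zeros become one 'EOB' marker."""
    pairs = []
    run = 0
    for v in values:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    if run:
        pairs.append("EOB")
    return pairs


# Hypothetical quantized block: nonzero values cluster in the top-left corner.
block = [
    [20, 2, 0, 0],
    [-1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
scanned = [block[r][c] for r, c in zigzag_order(4)]
encoded = run_length_zeros(scanned)
```

The zigzag scan visits the low-frequency corner first, so the 16 values collapse to four pairs and one end-of-block marker.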
MPEG-1
• The first standard to be finalized for video compression, intended for interactive video on CD and digital audio broadcasting.
• VCR quality: 640 x 480 pixels, 24 bits/pixel, and 25 frames/sec give 184.32 Mbps uncompressed (UC).
• MPEG-1 compression brings this down to 1.5 Mbps.
• It is likely to dominate the encoding of CD-ROM based movies, since it gives good-quality movies.
• It can be transmitted over twisted pair for modest distances (5 km).
MPEG-1
• It has three components: audio, video, and system.
• A 90 kHz clock outputs the current time value (time stamps) to both encoders, and these time stamps are propagated all the way to the receiver.
[Figure: MPEG-1 synchronization diagram. Labels: audio signal, audio encoder, video signal, video encoder, 90 kHz clock, system multiplexer, MPEG-1 output.]
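To make the role of the 90 kHz clock concrete, the sketch below computes time stamps, in clock ticks, for successive video and audio frames. The 25 frames/sec video rate and 48 kHz audio rate are example values, the figure of 1152 samples per audio frame is an assumption for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

CLOCK_HZ = 90_000  # the MPEG-1 system clock rate


def video_timestamp(frame_index, frames_per_second):
    """Clock ticks at which the given video frame should be presented."""
    return Fraction(frame_index * CLOCK_HZ, frames_per_second)


def audio_timestamp(frame_index, sample_rate_hz, samples_per_frame=1152):
    """Clock ticks at which the given audio frame starts (frame size is an assumption)."""
    return Fraction(frame_index * samples_per_frame * CLOCK_HZ, sample_rate_hz)


video_ticks = [video_timestamp(n, 25) for n in range(4)]       # 0, 3600, 7200, 10800
audio_ticks = [audio_timestamp(n, 48_000) for n in range(4)]   # 0, 2160, 4320, 6480
```

Because both streams are stamped from the same clock, a receiver can line up audio and video by comparing tick values, even though the two kinds of frames have different durations.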
MPEG-1 Video Compression
• Encoding each frame separately with JPEG removes spatial redundancy.
• Additional compression can be achieved by taking advantage of the fact that consecutive frames are often almost identical.
• MPEG-1 has four kinds of frames for motion compensation.
• (The differences between two frames are computed.)
• P (Predictive) frames: use block-by-block differences with the preceding I or P frame.
• B (Bidirectional) frames: use differences with both the preceding and the following I or P frames, which serve as references.
• I (Intracoded) frames: self-contained, JPEG-encoded pictures that appear periodically and can be decoded independently.
• D (DC-coded) frames: block averages used for fast forward.
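One consequence of B frames depending on a following I or P frame is that frames cannot be sent in display order: the later reference has to arrive first. The sketch below reorders a display sequence accordingly. The frame labels (type letter plus display index) and the function name `coding_order` are hypothetical, and D frames are left out.

```python
def coding_order(display_order):
    """Reorder frames so every B frame follows both of its reference frames."""
    out = []
    pending_b = []
    for frame in display_order:
        if frame[0] == "B":
            # Hold B frames back until the next reference frame has been sent.
            pending_b.append(frame)
        else:
            # I or P frame: send it, then the B frames that were waiting for it.
            out.append(frame)
            out.extend(pending_b)
            pending_b = []
    out.extend(pending_b)  # trailing B frames with no later reference
    return out


display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
transmit = coding_order(display)
# transmit == ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The decoder reverses the shuffle, buffering each reference frame until the B frames that sit before it in display order have been shown.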
MPEG Frames
Frame construction
MPEG-2
• Similar to MPEG-1
• Developed for Digital TV
• No fast forward: D frames are not supported
• DCT-10*10 instead of 8 * 8
• For better quality
• Supports 4 resolutions and 5 profiles.
• Has a more general way of multiplexing
• Each stream is packetized with time stamps
MPEG-4
• Started as a standard for low bit rates
• For use in portable devices such as video phones
• The standard includes much more than just data compression
• Functionality: content-based MM access tools
• Manipulation and bit stream editing
• Improved temporal random access
• Robustness in error-prone environments
• Content-based scalability.
H.261
• H.261 is an ITU-T video coding standard.
• H.261 was originally designed for transmission over ISDN lines, on which data rates are multiples of 64 kbit/s.
• The coding algorithm was designed to operate at video bit rates between 40 kbit/s and 2 Mbit/s.
• The standard supports two video frame sizes, CIF (Common Intermediate Format) and QCIF (Quarter CIF), using a 4:2:0 sampling scheme.
• Both the encoder and the decoder must be very fast, because H.261 is used for interactive, real-time video conferencing (VC).
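The frame sizes and channel rates above can be turned into numbers with a short sketch. It assumes the usual luminance dimensions of 352 x 288 for CIF and 176 x 144 for QCIF and 8 bits per sample, none of which are stated in the slides; the function names are hypothetical.

```python
# Luminance dimensions (width, height); assumed values, not given in the slides.
FORMATS = {"CIF": (352, 288), "QCIF": (176, 144)}


def frame_bytes_420(width, height, bits_per_sample=8):
    """Raw size of one 4:2:0 frame: full-size Y plus two quarter-size chroma planes."""
    luma = width * height
    chroma = 2 * (width // 2) * (height // 2)
    return (luma + chroma) * bits_per_sample // 8


def channel_rate(p):
    """ISDN channel rate in bits per second for p multiples of 64 kbit/s."""
    return p * 64_000


cif_bytes = frame_bytes_420(*FORMATS["CIF"])     # 152,064 bytes per frame
qcif_bytes = frame_bytes_420(*FORMATS["QCIF"])   # 38,016 bytes per frame
two_channels = channel_rate(2)                   # 128,000 bits/s
```

A single raw CIF frame is already more than 1.2 million bits, far more than a 128 kbit/s channel carries in a second, which illustrates why the codec must compress heavily and quickly for interactive use.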