
Vidyalankar B.E. Sem. VII [INFT]

Digital Signal and Image Processing Prelim Question Paper Solution

1. (a) Classification of Signals

1. Continuous and discrete time signals

Continuous signal : A signal that is continuous in amplitude and time is known as a continuous signal or analog signal. It has a value at every instant of time.
Discrete time signal : Here the value of the signal is specified only at specific instants of time. A signal represented at "discrete intervals of time" is therefore called a discrete time signal.

2. Periodic signal
A signal which repeats itself after a fixed time period or interval is called a periodic signal. The periodicity of a continuous time signal is defined mathematically as,
x(t) = x(t + T0) … (1)
This is called the condition of periodicity. Here T0 is called the fundamental period; after this period the signal repeats itself. For a discrete time signal, the condition of periodicity is,
x(n) = x(n + N) … (2)
Here the number N is the period of the signal. The smallest value of N for which the condition of periodicity holds is called the fundamental period.

Fig. 1 : Examples of continuous signals
Fig. 2 : Discrete time signal; (a) continuous signal, (b) train of sampling pulses (period TS), (c) discrete signal


Periodic signals are shown in figure 3.

Non-periodic signal
A signal which does not repeat itself after a fixed time period, or does not repeat at all, is called a non-periodic or aperiodic signal. Thus the mathematical expressions for a non-periodic signal are,
x(t) ≠ x(t + T0) … (3)
and x(n) ≠ x(n + N) … (4)
Sometimes it is said that a non-periodic signal has a period T = ∞, as shown in figure 3(c). This is an exponential signal having period T = ∞.

3. Even and odd signals
Even signals
An even signal is also called a symmetrical signal. A discrete time real valued signal is said to be symmetric (even) if,
Condition of symmetry : x(−n) = x(n) … for a D.T. signal
Example : Figure 4(a) shows a D.T. even signal.

Fig. 3(a) : Continuous time periodic signal x(t) = A sin ωt, with period of repetition T
Fig. 3(b) : Discrete time periodic signal x(n) = A sin ωn, with period of repetition N
Fig. 3(c) : Aperiodic signal having period T = ∞
Fig. 4 : (a) D.T. even signal, (b) D.T. odd signal


A discrete time (D.T.) signal x(n) is said to be antisymmetric or odd if it satisfies the following condition.
Condition of antisymmetry : x(−n) = −x(n) … for a D.T. signal

4. Energy and power signals

Power signals : Three measures of power are used: 1) Instantaneous power 2) Normalized power 3) Average normalized power

Definition of Power Signal :

A signal x(t) is said to be a power signal if its normalized average power is non-zero and finite. Thus for power signals the normalized average power P is finite and non-zero. For discrete time signals, the average power P is given by,

P = lim_{N→∞} [1/(2N + 1)] Σ_{n=−N}^{N} |x(n)|²   … (5)

In equation (5), it is expected that N >>1. The power signal is as shown in figure 5.

Energy Signal : The total normalized energy for a "real" signal x(t) is given by,
E = ∫_{−∞}^{∞} x²(t) dt … (6)
But if the signal x(t) is complex then equation (6) is modified as,
E = ∫_{−∞}^{∞} x(t) x*(t) dt … (7)
E = ∫_{−∞}^{∞} |x(t)|² dt … (8)

Definition of Energy Signal
A signal x(t) is said to be an energy signal if its normalized energy is non-zero and finite. Hence for energy signals the total normalized energy (E) is non-zero and finite. The energy of a discrete time signal is also denoted by E and is given by,
E = Σ_{n=−∞}^{∞} |x(n)|² … (9)

5. Deterministic and Random Signals :

Deterministic Signal : A signal which can be described by a mathematical expression, look-up table or some well defined rule is called a deterministic signal.

Random signal : A signal which cannot be described by any mathematical expression is called a random signal. Due to this it is not possible to predict the amplitude of such signals at a given instant of time.

Fig. 5 : Power signal u(n), with P = 1/2


Example : A good example of a random signal is "noise" in a communication system. Such a random signal is shown in figure 6(b).

1. (b) Properties of DTFT
We know that x(n) and X(ω) are Fourier transform pairs, denoted as
x(n) ↔ X(ω)

1) Linearity
Statement : If x1(n) ↔ X1(ω) and x2(n) ↔ X2(ω)
then a1 x1(n) + a2 x2(n) ↔ a1 X1(ω) + a2 X2(ω).
The Fourier transform of a linear combination of two or more signals is equal to the same linear combination of the Fourier transforms of the individual signals.

2) Time Shifting
Statement : If x(n) ↔ X(ω)
then x(n − k) ↔ e^{−jωk} X(ω)
If a signal is shifted in the time domain by k samples then the magnitude spectrum is unchanged but the phase spectrum is changed by an amount (−ωk).

3) Time Reversal
Statement : If x(n) ↔ X(ω)
then x(−n) ↔ X(−ω)
If we fold the sequence in the time domain then the magnitude spectrum is unchanged but the polarity of the phase spectrum is changed.

4) Periodicity
The DTFT is periodic in ω with period 2π : X(ω) = X(ω + 2πk)

5) Convolution Theorem
Statement : If x1(n) ↔ X1(ω) and x2(n) ↔ X2(ω)
then x1(n) * x2(n) ↔ X1(ω) · X2(ω)
Convolution of two signals in the time domain is equivalent to multiplication in the frequency domain.

6) Correlation Theorem
Statement : If x1(n) ↔ X1(ω) and x2(n) ↔ X2(ω)
then r_{x1x2}(n) ↔ X1(ω) · X2(−ω)
Here the term X1(ω) · X2(−ω) is denoted by S_{x1x2}(ω) and is called the cross energy density spectrum. The energy spectral density of a signal is the Fourier transform of its autocorrelation sequence.
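The convolution theorem can be checked numerically. The sketch below (an illustration added here, not part of the original answer; the helper name dtft is ours) evaluates the DTFT of x1 * x2 on a frequency grid and compares it with X1(ω) · X2(ω):

import numpy as np

def dtft(x, w):
    # DTFT of a finite sequence x (support n = 0 .. len(x)-1) at frequencies w
    n = np.arange(len(x))
    return np.array([np.sum(x * np.exp(-1j * wk * n)) for wk in w])

x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([1.0, -1.0, 0.5])
w = np.linspace(-np.pi, np.pi, 7)

lhs = dtft(np.convolve(x1, x2), w)   # DTFT of the time-domain convolution
rhs = dtft(x1, w) * dtft(x2, w)      # product of the individual DTFTs
print(np.allclose(lhs, rhs))         # True: x1(n) * x2(n) <-> X1(w).X2(w)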

Fig. 6 : (a) Deterministic signal x(t) = A sin(2πft), with period T0 = 1/f; (b) random signal


7) Frequency Shifting
Statement : If x(n) ↔ X(ω)
then e^{jω0 n} x(n) ↔ X(ω − ω0)
Multiplying the sequence x(n) by e^{jω0 n} in the time domain corresponds to a frequency shift of ω0 in the frequency domain.

8) Multiplication of two sequences
Statement : If x1(n) ↔ X1(ω) and x2(n) ↔ X2(ω)
then x1(n) · x2(n) ↔ X3(ω) = (1/2π) ∫_{−π}^{π} X1(λ) X2(ω − λ) dλ
Multiplication of two discrete time sequences is equivalent to convolution of their Fourier transforms.

9) Parseval's theorem
Statement : If x1(n) ↔ X1(ω) and x2(n) ↔ X2(ω)
then Σ_{n=−∞}^{∞} x1(n) x2*(n) = (1/2π) ∫_{−π}^{π} X1(ω) X2*(ω) dω

10) Differentiation in frequency domain
Statement : If x(n) ↔ X(ω)
then n x(n) ↔ j dX(ω)/dω

Proof : According to the definition of the DTFT we have,
X(ω) = Σ_{n=−∞}^{∞} x(n) e^{−jωn} …(1)
Taking the differentiation on both sides,
dX(ω)/dω = Σ_{n=−∞}^{∞} x(n) (d/dω) e^{−jωn} = −j Σ_{n=−∞}^{∞} n x(n) e^{−jωn}
Multiplying both sides by j,
j dX(ω)/dω = −j² Σ_{n=−∞}^{∞} n x(n) e^{−jωn} …(2)
Here −j² = 1, and comparing the R.H.S. with equation (1) we get,
j dX(ω)/dω = FT{n x(n)}
∴ n x(n) ↔ j dX(ω)/dω
Hence proved.

Consider x(n) = {1, 0, 0, 1}.


Obtain its DFT.

Solution : N = 4, therefore one has to make use of the 4-point DFT,
X(k) = Σ_{n=0}^{3} x(n) W_N^{kn}
where W_N = e^{−j2π/N}; for N = 4, W_4 = e^{−jπ/2} = −j.

k = 0 : X(0) = Σ_{n=0}^{3} x(n) W_N^0 = x(0) + x(1) + x(2) + x(3) = 2
∴ X(0) = 2
k = 1 : X(1) = Σ_{n=0}^{3} x(n) W_N^n = x(0) W_N^0 + x(1) W_N^1 + x(2) W_N^2 + x(3) W_N^3 = (1)(1) + (1)(j) = 1 + j
k = 2 : X(2) = Σ_{n=0}^{3} x(n) W_N^{2n} = x(0) W_N^0 + x(1) W_N^2 + x(2) W_N^4 + x(3) W_N^6 = (1)(1) + (1)(−1) = 1 − 1 = 0
k = 3 : X(3) = Σ_{n=0}^{3} x(n) W_N^{3n} = x(0) W_N^0 + x(1) W_N^3 + x(2) W_N^6 + x(3) W_N^9 = (1)(1) + (1)(−j) = 1 − j
∴ X(k) = {2, 1 + j, 0, 1 − j}

In matrix form,
[X(0)]   [W^0  W^0  W^0  W^0] [x(0)]
[X(1)] = [W^0  W^1  W^2  W^3] [x(1)]
[X(2)]   [W^0  W^2  W^4  W^6] [x(2)]
[X(3)]   [W^0  W^3  W^6  W^9] [x(3)]

i.e.,
[X(0)]   [1   1   1   1] [x(0)]
[X(1)] = [1  −j  −1   j] [x(1)]
[X(2)]   [1  −1   1  −1] [x(2)]
[X(3)]   [1   j  −1  −j] [x(3)]
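The result is easily verified numerically (a sketch added here, not part of the original solution; numpy.fft.fft computes the same sum X(k) = Σ x(n) W_N^{kn}):

import numpy as np

x = np.array([1, 0, 0, 1])           # the given sequence x(n)
N = len(x)
W = np.exp(-2j * np.pi / N)          # W_N = e^(-j 2 pi / N)

# direct evaluation of X(k) = sum_n x(n) * W_N^(k n)
X = np.array([sum(x[n] * W**(k * n) for n in range(N)) for k in range(N)])

print(np.round(X, 10))               # [2.+0.j  1.+1.j  0.+0.j  1.-1.j]
print(np.allclose(X, np.fft.fft(x))) # True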

2. (a) Biometric Authentication

Facial recognition system
A facial recognition system is a computer application for automatically identifying or verifying a person from a digital image or a video frame from a video source. One way to do this is by comparing selected facial features from the image with a facial database.
It is typically used in security systems and can be compared to other biometrics such as fingerprint or eye iris recognition systems.


Techniques

3-D
A newly emerging trend, claimed to achieve previously unseen accuracies, is three-dimensional face recognition. This technique uses 3-D sensors to capture information about the shape of a face. This information is then used to identify distinctive features on the surface of a face, such as the contour of the eye sockets, nose, and chin.

One advantage of 3-D facial recognition is that it is not affected by changes in lighting like other techniques. It can also identify a face from a range of viewing angles, including a profile view.

Even a perfect 3D matching technique could be sensitive to expressions. For that goal a group at the Technion applied tools from metric geometry to treat expressions as isometries.

Skin texture analysis
Another emerging trend uses the visual details of the skin, as captured in standard digital or scanned images. This technique, called skin texture analysis, turns the unique lines, patterns, and spots apparent in a person's skin into a mathematical space.

Tests have shown that with the addition of skin texture analysis, performance in recognizing faces can increase 20 to 25 percent.

Fingerprint Recognition
A fingerprint is an impression of the friction ridges of all or any part of the finger. A friction ridge is a raised portion of the epidermis on the palmar (palm), digits (fingers and toes), or plantar (sole) skin, consisting of one or more connected ridge units of friction ridge skin.

These are sometimes known as "epidermal ridges" which are caused by the underlying interface between the dermal papillae of the dermis and the interpapillary (rete) pegs of the epidermis.

These epidermal ridges serve to amplify vibrations triggered when fingertips brush across an uneven surface, better transmitting the signals to sensory nerves involved in fine texture perception. The ridges assist in gripping rough surfaces, as well as smooth wet surfaces.

Fingerprints may be deposited in natural secretions from the eccrine glands present in friction ridge skin (secretions consisting primarily of water) or they may be made by ink or contaminants transferred from the peaks of friction skin ridges to a relatively smooth surface such as a fingerprint card.

The term fingerprint normally refers to impressions transferred from the pad on the last joint of fingers and thumbs, though fingerprint cards also typically record portions of lower joint areas of the fingers (which are also used to make identifications).

Fingerprint types
Exemplar prints
Exemplar prints, or known prints, are prints deliberately collected from a subject, such as at the point of enrollment in a system, or at the time of arrest.
For criminal arrests, a set of exemplar prints includes one image of each finger that has been rolled from nail to nail, in addition to plain (or slap) impressions of the four fingers of each hand and plain impressions of each thumb.

Exemplar prints can be collected using live scan devices, or by using ink on paper cards.

Automatic number plate recognition
Automatic number plate recognition (ANPR) is a mass surveillance method that uses optical character recognition on images to read the license plates on vehicles. It can use existing closed-circuit television or road-rule enforcement cameras, or ones specifically designed for the task.
ANPR is used by various police forces and as a method of electronic toll collection on pay-per-use roads, and for monitoring traffic activity, such as red light adherence at an intersection.


ANPR can be used to store the images captured by the cameras as well as the text from the license plate, with some configurable to store a photograph of the driver.

Systems commonly use infrared lighting to allow the camera to take the picture at any time of the day. A powerful flash is included in at least one version of the intersection-monitoring cameras, serving both to illuminate the picture and to make the offender aware of his or her mistake.

ANPR technology tends to be region-specific, owing to plate variation from place to place.

2. (b) Regional Descriptors
Regional descriptors are used for describing image regions. In usual practice, boundary & regional descriptors are used in combination.

1) Some simple Descriptors
The area of a region is defined as the number of pixels in the region. The perimeter of a region is the length of its boundary. Although area & perimeter are sometimes used as descriptors, they apply primarily to situations in which the size of the region of interest is invariant.
These descriptors are more commonly used in measuring the compactness of a region, defined as (perimeter)² / area. Compactness is a dimensionless quantity (insensitive to uniform scale changes) & is minimal for a disk-shaped region.
Other simple measures used as region descriptors include the mean & median of the gray levels, the minimum & maximum gray level values & the number of pixels with values above & below the mean.
Even a simple region descriptor such as normalized area can be quite useful in extracting information from images.

2) Topological Descriptors
Topological properties are useful for global descriptions of regions in the image plane. Simply defined, topology is the study of properties of a figure that are unaffected by any deformation, as long as there is no tearing or joining of the figure (sometimes these are called rubber-sheet distortions).
Fig. 1 shows a region with two holes; thus a topological descriptor can be defined by the number of holes in the region.
This property will not be affected by a stretching or rotation transformation. In general, the number of holes will change if the region is torn or folded.
Topological properties do not depend on the notion of distance or any properties implicitly based on the concept of a distance measure.
Another topological property useful for region description is the number of connected components.

Fig. 1 : A region with two holes
Fig. 2 : A region with 3 connected components


3. Texture
An important approach to region description is to quantify its texture content. Texture is a descriptor that provides measures of properties such as smoothness, coarseness & regularity.
The three principal approaches used in image processing to describe the texture of a region are statistical, structural & spectral.
Statistical approaches yield characterizations of textures as smooth, coarse, grainy & so on. Structural techniques deal with the arrangement of image primitives, such as the description of texture based on regularly spaced parallel lines.
Spectral techniques are based on properties of the Fourier spectrum & are used primarily to detect global periodicity in an image by identifying high energy, narrow peaks in the spectrum.

Statistical Approach
One of the simplest approaches for describing texture is to use statistical moments of the gray-level histogram of an image.
The nth moment of a random variable z denoting gray levels, about the mean, is
μn(z) = Σ_{i=0}^{L−1} (z_i − m)^n p(z_i) …(1)
where p(z_i) = histogram function, L = number of distinct gray levels, and m = mean value of z, i.e. of the gray levels:
m = Σ_{i=0}^{L−1} z_i p(z_i) …(2)
The second moment (the variance σ²(z) = μ₂(z)) is of importance in texture description. It is a measure of gray level contrast that can be used to establish descriptors of relative smoothness.

The 3rd moment,
μ₃(z) = Σ_{i=0}^{L−1} (z_i − m)³ p(z_i) …(3)
is a measure of the skewness of the histogram, while the fourth moment μ₄(z) is a measure of its relative flatness.
The fifth & higher moments are not so easily related to histogram shape, but they do provide further quantitative discrimination of texture content.

Some useful additional texture measures based on the histogram include a measure of "uniformity", given by
U = Σ_{i=0}^{L−1} p²(z_i) …(4)
and an average entropy measure, which the reader might recall from basic information theory, defined as
e = −Σ_{i=0}^{L−1} p(z_i) log₂ p(z_i) …(5)
Note that the p's have values in the range [0, 1] & their sum equals 1.
Measures of texture computed using only the histogram suffer from the limitation that they carry no information regarding the relative position of pixels with respect to each other. One way to bring this type of information into the texture analysis process is to consider not only the distribution of intensities, but also the positions of pixels with equal or nearly equal intensity values.
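These histogram-based measures translate directly into a few array operations; the sketch below is illustrative (the function name and the random test image are ours, not from the text):

import numpy as np

def texture_stats(img, L=256):
    hist, _ = np.histogram(img, bins=L, range=(0, L))
    p = hist / hist.sum()                      # p(z_i)
    z = np.arange(L)
    m = np.sum(z * p)                          # mean, eqn (2)
    mu2 = np.sum((z - m)**2 * p)               # variance: smoothness measure
    mu3 = np.sum((z - m)**3 * p)               # third moment: skewness, eqn (3)
    U = np.sum(p**2)                           # uniformity, eqn (4)
    e = -np.sum(p[p > 0] * np.log2(p[p > 0]))  # entropy, eqn (5)
    return m, mu2, mu3, U, e

img = np.random.randint(0, 256, (64, 64))      # stand-in gray-level image
print(texture_stats(img))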

4. Moments of Two dimensional Functions :
For a 2D continuous function f(x, y), the moment of order (p + q) is defined as
m_pq = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x^p y^q f(x, y) dx dy …(6)


for p, q = 0, 1, 2, … A uniqueness theorem (Papoulis [1991]) states that if f(x, y) is piecewise continuous and has nonzero values only in a finite part of the xy-plane, then moments of all orders exist and the moment sequence (m_pq) is uniquely determined by f(x, y). Conversely, (m_pq) uniquely determines f(x, y).

The central moments are defined as
μ_pq = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x − x̄)^p (y − ȳ)^q f(x, y) dx dy
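For a digital image the integrals become sums over the pixel coordinates; a minimal sketch (function names are illustrative, not from the text):

import numpy as np

def moment(f, p, q):
    # discrete form of eqn (6): m_pq = sum_x sum_y x^p y^q f(x, y)
    x, y = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return np.sum((x**p) * (y**q) * f)

def central_moment(f, p, q):
    # discrete central moment mu_pq about the centroid (x_bar, y_bar)
    m00 = moment(f, 0, 0)
    xb, yb = moment(f, 1, 0) / m00, moment(f, 0, 1) / m00
    x, y = np.mgrid[0:f.shape[0], 0:f.shape[1]]
    return np.sum(((x - xb)**p) * ((y - yb)**q) * f)

f = np.zeros((8, 8)); f[2:5, 3:6] = 1.0   # small test region
print(central_moment(f, 1, 0))            # 0.0: mu_10 vanishes by construction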

3. (a) Consider x1(n) = {1, 2, 3, −1}. Its 4-point DFT follows from the DFT matrix:
[X1(0)]   [1   1   1   1] [ 1]   [ 5     ]
[X1(1)] = [1  −j  −1   j] [ 2] = [−2 − 3j]
[X1(2)]   [1  −1   1  −1] [ 3]   [ 3     ]
[X1(3)]   [1   j  −1  −j] [−1]   [−2 + 3j]
∴ X1(k) = {5, −2 − 3j, 3, −2 + 3j}

(i) x2(n) = x1(n + 2)
A circular shift by 2 samples multiplies the DFT by W_4^{−2k} = (−1)^k :
X2(k) = (−1)^k X1(k) = {5, 2 + 3j, 3, 2 − 3j}

(ii) We have to find the DFT of x1(−n + 2), i.e. the DFT of x1{−(n − 2)}. With x1(n) ↔ X1(k), first rotate x1(n) by 2 units, which multiplies the DFT by W_4^{−2k} = (−1)^k, and then fold, which replaces k by −k (mod 4):
DFT{x1(−n + 2)} = (−1)^k X1((−k))_4 = {5, 2 − 3j, 3, 2 + 3j}

Fig. : Stem plots of x1(n), X1(k), x2(n) and X2(k) accompany the solution; the spectra are as tabulated above.

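Both results follow from the circular time-shift and time-reversal properties of the DFT, and can be confirmed numerically (a sketch added for verification, not part of the original solution):

import numpy as np

x1 = np.array([1, 2, 3, -1])
X1 = np.fft.fft(x1)
print(np.round(X1, 10))              # [ 5.+0.j -2.-3.j  3.+0.j -2.+3.j]

# (i) x2(n) = x1((n + 2))_4 : X2(k) = W_4^(-2k) X1(k) = (-1)^k X1(k)
k = np.arange(4)
x2 = np.roll(x1, -2)                 # circular shift: [3, -1, 1, 2]
print(np.allclose(np.fft.fft(x2), (-1.0)**k * X1))   # True

# (ii) x1(-(n - 2)) : rotate by 2 units, then fold (index reversal mod 4)
y = np.roll(x1, 2)                   # x1((n - 2))_4
z = y[(-np.arange(4)) % 4]           # y((-n))_4
print(np.round(np.fft.fft(z), 10))   # [ 5.+0.j  2.-3.j  3.+0.j  2.+3.j]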

3. (b) Comparison of Image Transforms

DFT / unitary DFT : Fast transform; most useful in digital signal processing, convolution, digital filtering, analysis of circulant and Toeplitz systems. Requires complex arithmetic. Has very good energy compaction for images.

Cosine : Fast transform; requires real operations; near optimal substitute for the KL transform of highly correlated images. Useful in designing transform coders and Wiener filters for images. Has excellent energy compaction for images.

Sine : About twice as fast as the fast cosine transform; symmetric; requires real operations; yields a fast KL transform algorithm which gives recursive block processing algorithms for coding, filtering, and so on; useful in estimating performance bounds of many image processing problems. Energy compaction for images is very good.

Hadamard : Faster than sinusoidal transforms, since no multiplications are required; useful in digital hardware implementations of image processing algorithms. Easy to simulate but difficult to analyze. Applications in image data compression, filtering, and design of codes. Has good energy compaction for images.

Haar : Very fast transform. Useful in feature extraction, image coding, and image analysis problems. Energy compaction is fair.

Slant : Fast transform. Has an "image-like basis"; useful in image coding. Has very good energy compaction for images.

Karhunen-Loeve : Is optimal in many ways; has no fast algorithm; useful in performance evaluation and for finding performance bounds. Useful for small size vectors, e.g., color multispectral or other feature vectors. Has the best energy compaction in the mean square sense over an ensemble.

Fast KL : Useful for designing fast, recursive block processing techniques, including adaptive techniques. Its performance is better than independent block-by-block processing techniques.

Sinusoidal transforms : Many members have fast implementations; useful in finding practical substitutes for the KL transform, analysis of Toeplitz systems, and mathematical modeling of signals. Energy compaction for the optimum fast transform is excellent.

SVD transform : Best energy-packing efficiency for any given image. Varies drastically from image to image; has no fast algorithm or a reasonable fast transform substitute; useful in the design of separable FIR filters, finding least squares and minimum norm solutions of linear equations, finding the rank of large matrices, and so on. Potential image processing applications are in image restoration, power spectrum estimation and data compression.

4. (a) (i) Linear Transformation
Image Negatives :
The negative of an image with gray levels in the range [0, L−1] is obtained by using the negative transformation shown in figure 1, which is given by the expression
s = L − 1 − r



Reversing the intensity levels of an image in this manner produces the equivalent of a photographic negative. This type of processing is particularly suited for enhancing white or gray detail embedded in dark regions of an image, especially when the black areas are dominant in size.

(ii) Log transformations
The general form of the log transformation is
s = c log(1 + r) … (1)
where c is a constant, and it is assumed that r ≥ 0.
The shape of the log curve, as shown in figure 1, shows that this transformation maps a narrow range of low gray-level values in the input image into a wider range of output levels. The opposite is true of higher values of input levels.
We use this type of transformation to expand the values of dark pixels in an image while compressing the higher-level values. The opposite is true of the inverse log transformation.
Any curve having the general shape of the log function shown in figure 1 would accomplish this spreading / compressing of gray levels in an image. However, the log function has the important characteristic that it compresses the dynamic range of images with large variations in pixel values.
A classic illustration of an application in which pixel values have a large dynamic range is the Fourier spectrum, where spectrum values range from 0 to 10⁶ or higher.

Image display systems, generally, will not be able to reproduce faithfully such a wide range of intensity values. The net effect is that a significant degree of detail will be lost in the display of a typical Fourier spectrum.
Figure 2 below shows a Fourier spectrum with values in the range 0 to 1.5 × 10⁶. When these values are scaled linearly for display on an 8-bit system, the brightest pixels will dominate the display at the expense of lower values of the spectrum, as shown in figure 2(a).

Fig. 2 : (a) Fourier spectrum (b) Result of applying the log transformation given in eqn. (1) with c = 1

Fig. 1 : The negative transformation s = L − 1 − r and the log transformation s = c log(1 + r), plotted as s vs. r over [0, L − 1]


We can observe that a relatively small area of the image in fig. 2(a) is not perceived as black. If, instead of displaying the values in this manner, we first apply the log transform equation (with c = 1 in this case) to the spectrum values, then the range of values of the result becomes 0 to 6.2, a more manageable number.
Figure 2(b) shows the resulting image after scaling this new range linearly and displaying the spectrum on the same 8-bit display. We can see that more details of the image are now visible.

(iii) Power-Law Transformations
Power-law transformations have the basic form
s = c r^γ
where c and γ are positive constants. Sometimes the equation is written as s = c (r + ε)^γ to account for an offset (i.e., a measurable output when the input is zero). Plots of s vs. r for various values of γ are shown in figure 3.
From the figure we can note that, unlike the log function, here a family of possible transformation curves is obtained simply by varying γ.
(a) The curves generated with values of γ > 1 have exactly the opposite effect as those generated with values of γ < 1.
(b) The equation s = c r^γ reduces to the identity transformation when c = γ = 1.
A variety of devices used for image capture, printing, and display respond according to a power law.
In the power-law transformation s = c r^γ the exponent γ is called "gamma", and hence the transformation is often known as "gamma correction".
With reference to the curve for γ = 2.5 in figure 4, we see that such display systems would tend to produce images that are darker than intended. This effect is illustrated in figure 4.
Figure 4(a) shows a simple gray-scale linear wedge input into a CRT monitor. As expected, the output of the monitor appears darker than the input, as shown in figure 4(b).
When gamma correction s = r^{1/2.5} = r^{0.4} is performed and the gamma-corrected image is given as input, it produces an output that is close in appearance to the original image, as shown in figure 4(d).

Fig. 3 : Plots of the equation s = c r^γ for various values of γ (c = 1 in all cases).


Gamma correction is important if displaying an image accurately on a computer screen is of concern. Images that are not corrected properly can look either bleached out or too dark.
In addition to gamma correction, power-law transformations are useful for general purpose contrast manipulation.
For example, when the original image is a digital mammogram showing a small lesion, even though the visual content is the same in both images, it is easier to analyze the breast tissue in the negative image than in the original image.
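All three point transformations reduce to one-line lookup tables; the sketch below is a minimal illustration assuming an 8-bit image (L = 256), not part of the original text:

import numpy as np

L = 256
r = np.arange(L, dtype=float)             # every possible input gray level

negative = (L - 1) - r                    # s = L - 1 - r
c = (L - 1) / np.log(L)                   # scale so the output spans [0, L-1]
log_t = c * np.log(1 + r)                 # s = c log(1 + r)
gamma = 0.4
power = (L - 1) * (r / (L - 1))**gamma    # s = c r^gamma (gamma correction)

# applying a transform to an image is a table lookup: s = lut[r]
img = np.random.randint(0, L, (4, 4))
print(negative.astype(np.uint8)[img])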

4. (b) Use of the Second Derivative for Enhancement : The Laplacian
Here, a simple isotropic derivative operator called the Laplacian is used. It is defined as
∇²f = ∂²f/∂x² + ∂²f/∂y² … (1)
where f(x, y) is a function (image) of two variables. Since derivatives of any order are linear operations, the Laplacian is a linear operator.
Now, the second derivative in the x-direction is given by
∂²f/∂x² = f(x + 1, y) + f(x − 1, y) − 2f(x, y) … (2)
Similarly, the second derivative in the y-direction is given by
∂²f/∂y² = f(x, y + 1) + f(x, y − 1) − 2f(x, y) … (3)
The digital implementation of the two-dimensional Laplacian in eqn. (1) is obtained by summing these two components:
∇²f = f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4f(x, y) … (4)

Fig. 4 : (a) Linear wedge gray-scale image. (b) Response of monitor to linear wedge. (c) Gamma-corrected wedge. (d) Output of monitor.


This equation can be implemented using the following mask:
 0   1   0
 1  −4   1
 0   1   0
The above mask gives an isotropic result for rotations in increments of 90°.
The diagonal directions can be incorporated by adding four more terms, one for each diagonal neighbour. Since each diagonal term also contains a −2f(x, y) term, the total subtracted from the difference terms now would be 8f(x, y). The mask used to implement this new definition is as follows:
 1   1   1
 1  −8   1
 1   1   1
This mask yields isotropic results for increments of 45°.
The following two masks are also frequently used. They are based on a definition of the Laplacian that is the negative of the one used above.
 0  −1   0        −1  −1  −1
−1   4  −1        −1   8  −1
 0  −1   0        −1  −1  −1

As the Laplacian is a derivative operator, its use highlights gray-level discontinuities in an image and deemphasizes regions with slowly varying gray levels. This will tend to produce images that have grayish edge lines and other discontinuities, all superimposed on a dark, featureless background.
Background features can be "recovered" while still preserving the sharpening effect of the Laplacian operation simply by adding the original and Laplacian images. If the Laplacian definition used has a negative center coefficient, then we subtract, rather than add, the Laplacian image to obtain a sharpened result.
Thus, the basic way in which we use the Laplacian for image enhancement is as follows:
g(x, y) = f(x, y) − ∇²f(x, y) …(5)(i)
g(x, y) = f(x, y) + ∇²f(x, y) …(5)(ii)
Eqn. (5)(i) is used if the center coefficient of the Laplacian mask is negative; eqn. (5)(ii) is used if the center coefficient of the Laplacian mask is positive.
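Equation (5) translates directly into a convolution followed by a subtraction. A minimal sketch using SciPy (illustrative, not part of the original answer; assumes a floating point gray-level image f):

import numpy as np
from scipy.ndimage import convolve

lap_mask = np.array([[0,  1, 0],
                     [1, -4, 1],
                     [0,  1, 0]], dtype=float)   # center coefficient negative

f = np.random.rand(64, 64)                       # stand-in gray-level image
lap = convolve(f, lap_mask, mode='nearest')      # eqn (4) at every pixel

g = f - lap               # eqn (5)(i): subtract, since the center is negative
g = np.clip(g, 0.0, 1.0)  # keep the sharpened result in the valid range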

5. (a) Detection of discontinuities
There are three basic types of discontinuities in a digital image, viz.
(i) Points (ii) Lines (iii) Edges.
The most common way to look for a discontinuity is to run a mask through the image. For the 3 × 3 mask shown in figure 1, this procedure involves computing the sum of products of the coefficients with the gray levels contained in the region encompassed by the mask.

W1 W2 W3
W4 W5 W6
W7 W8 W9
Fig. 1 : A general 3 × 3 mask

The response of the mask at a point in the image is given by
R = W1 Z1 + W2 Z2 + … + W9 Z9 = Σ_{i=1}^{9} Wi Zi …(1)
where Zi is the gray level of the pixel associated with mask coefficient Wi.

(1) Point Detection
Using the mask shown in fig. 1, a point in an image is detected at the location on which the mask is centered if
|R| ≥ T …(2)


where T is a nonnegative threshold and R is given by equation (1).
This formulation measures the weighted differences between the center point and its neighbors. The idea is that an isolated point (a point whose gray level is significantly different from its background and which is located in a homogeneous or nearly homogeneous area) will be quite different from its surroundings, and thus be easily detectable by this type of mask.
Note that the mask coefficients sum to zero, indicating that the mask response will be zero in areas of constant gray level.
A commonly used mask for point detection is as follows:
−1  −1  −1
−1   8  −1
−1  −1  −1

(2) Line detection
Consider the masks shown in fig. 2 below. These are called line masks.

−1 −1 −1     −1 −1  2     −1  2 −1      2 −1 −1
 2  2  2     −1  2 −1     −1  2 −1     −1  2 −1
−1 −1 −1      2 −1 −1     −1  2 −1     −1 −1  2
Horizontal    +45°         Vertical     −45°
Fig. 2 : Line masks

If the 1st mask were moved around an image, it would respond more strongly to lines (one pixel thick) oriented horizontally against a constant background; the maximum response would result when the line passed through the middle row of the mask.
This is easily verified by sketching a simple array of 1's with a line of a different gray level (say, 5's) running horizontally through the array.
A similar experiment would reveal that the 2nd mask in fig. 2 responds best to lines oriented at +45°, the 3rd mask to vertical lines and the 4th mask to lines in the −45° direction.

Let R1, R2, R3 and R4 denote the responses of the masks in fig. 2, from left to right, where the R's are given by equation (1). Suppose that the four masks are run individually through an image. If, at a certain point in the image, |Ri| > |Rj| for all j ≠ i, that point is said to be more likely associated with a line in the direction of mask i.
For example, if at a point in the image |R1| > |Rj| for j = 2, 3, 4, that particular point is said to be more likely associated with a horizontal line.
If we are interested in detecting all the lines in an image in the direction defined by a given mask, we simply run the mask through the image and threshold the absolute value of the result. The points that are left are the strongest responses, which, for lines one pixel thick, correspond to the direction defined by the mask.
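The mask response of equation (1) and the threshold test of equation (2) amount to a convolution plus a comparison. A sketch for point detection (illustrative; the test image and threshold choice are ours); the four line masks are applied in exactly the same way, keeping the largest |Ri|:

import numpy as np
from scipy.ndimage import convolve

point_mask = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]], dtype=float)

img = np.ones((9, 9)); img[4, 4] = 5.0    # isolated point on a flat background

R = convolve(img, point_mask, mode='nearest')   # eqn (1) at every pixel
T = 0.9 * np.abs(R).max()                 # one common choice of threshold
print(np.argwhere(np.abs(R) >= T))        # -> [[4 4]]: the point is detected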

(3) Edge Detection
For the detection of edges in an image, first and second order derivatives can be used.
Basic Formulation
An edge is a set of connected pixels that lie on the boundary between two regions. A reasonable definition of "edge" requires the ability to measure gray-level transitions. An ideal edge according to this model is a set of connected pixels, each of which is located at an orthogonal step transition in gray level.
Image processing operations such as sampling and image acquisition yield edges that are blurred. Edges are therefore more closely modeled as having a "ramp-like" profile, as shown in fig. 3(b) below.


The slope of the ramp is inversely proportional to the degree of blurring in the edge. The "thickness" of the edge is determined by the length of the ramp as it transitions from an initial to a final gray level. This length is determined by the slope, which, in turn, is determined by the degree of blurring. Thus, blurred edges tend to be thick and sharp edges tend to be thin.

Consider figures 3 and 4 below.

5. (b) Region Splitting & Merging
In the case of region growing, the regions are grown from a set of seed points. An alternative is to subdivide an image initially into a set of arbitrary, disjoint regions and then merge and/or split the regions in an attempt to satisfy the conditions of region-based segmentation.
A split and merge algorithm that iteratively works toward satisfying these constraints is developed below.
Let R represent the entire image region and select a predicate P. One approach for segmenting R is to subdivide it successively into smaller and smaller quadrant regions so that, for any region Ri, P(Ri) = TRUE.
Starting with the entire region, if P(R) = FALSE, divide the image into quadrants. If P is FALSE for any quadrant, subdivide that quadrant into subquadrants, and so on.
This particular splitting technique has a convenient representation in the form of a so-called quadtree (i.e., a tree with each node having exactly four descendants), as illustrated in fig. 1 below.

Fig. 3 : (a) Model of an ideal digital edge; (b) model of a ramp digital edge; each with the gray-level profile of a horizontal line through the image
Fig. 4 : (a) Two regions separated by a vertical edge. (b) Detail near the edge, showing a gray-level profile, and the first and second derivatives of the profile



If only splitting were used, the final partition would likely contain adjacent regions with identical properties. This drawback may be remedied by allowing merging as well as splitting. Satisfying the constraints of region-based segmentation requires merging only adjacent regions whose combined pixels satisfy the predicate P.
That is, two adjacent regions Rj and Rk are merged only if P(Rj ∪ Rk) = TRUE.
The algorithm can be summarized by the following procedure:
Step 1 : Split into four disjoint quadrants any region Ri for which P(Ri) = FALSE.
Step 2 : Merge any adjacent regions Rj and Rk for which P(Rj ∪ Rk) = TRUE.
Step 3 : Stop when no further merging or splitting is possible.

Several variations of the preceding basic theme are possible:
(a) Split the image initially into a set of blocks.
(b) Merging can initially be limited to groups of four blocks that are descendants in the quadtree representation and that satisfy the predicate P.
(c) When no further mergings of this type are possible, the procedure is terminated by one final merging of regions satisfying step 2.
The principal advantage of this approach is that it uses the same quadtree for splitting and merging until the final merging step. A sketch of the splitting step is given below.
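The promised sketch uses recursion in place of an explicit quadtree (the predicate, names and test image are illustrative, not the textbook's):

import numpy as np

def predicate(region, tol=10.0):
    # P(R): TRUE when the region is nearly uniform in gray level
    return region.max() - region.min() <= tol

def split(img, x, y, size, out):
    # subdivide until P(Ri) = TRUE; record each accepted quadrant
    region = img[y:y + size, x:x + size]
    if predicate(region) or size == 1:
        out.append((x, y, size))          # leaf of the quadtree
        return
    h = size // 2
    for dx, dy in ((0, 0), (h, 0), (0, h), (h, h)):
        split(img, x + dx, y + dy, h, out)

img = np.zeros((8, 8)); img[:4, :4] = 100.0   # two obvious regions
leaves = []
split(img, 0, 0, 8, leaves)
print(leaves)   # quadrants satisfying P; merging of like neighbours follows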

6. (a) (i) Coding Redundancy
We have,
p_r(r_k) = n_k / n,  k = 0, 1, 2, …, L − 1 …(1)
where,
r_k = discrete random variable in the interval [0, 1] representing the gray levels of an image,
p_r(r_k) = probability of occurrence of each r_k,
L = number of gray levels,
n_k = number of times that the kth gray level appears in the image,
n = total number of pixels in the image.
If the number of bits used to represent each value of r_k is ℓ(r_k), then the average number of bits required to represent each pixel is,
L_avg = Σ_{k=0}^{L−1} ℓ(r_k) p_r(r_k) …(2)
That is, the average length of the code words assigned to the various gray-level values is found by summing the product of the number of bits used to represent each gray level and the probability that the gray level occurs. Thus the total number of bits required to code an M × N image is M N L_avg.
Representing the gray levels of an image with a natural m-bit binary code reduces the right-hand side of eqn. (2) to m bits. That is, L_avg = m when m is substituted for ℓ(r_k). Then the constant m may be taken outside the summation, leaving only the sum of the p_r(r_k) for 0 ≤ k ≤ L − 1, which equals 1.
The process of assigning fewer bits to the more probable gray levels than to the less probable ones is referred to as variable-length coding, and is the basis of data compression.

Fig. 1 : (a) Partitioned image (R split into R1, R2, R3 and R4, with R4 further split into R41, R42, R43, R44); (b) corresponding quadtree


If the gray levels of an image are coded in a way that uses more code symbols than necessary to represent each gray level, the resulting image is said to contain 'coding redundancy'.
In general, coding redundancy is present when the codes assigned to a set of events (such as gray level values) have not been selected to take full advantage of the probabilities of the events.
It is almost always present when an image's gray levels are represented with a straight or natural binary code.
The basis for coding redundancy is that images are typically composed of objects that have a regular and somewhat predictable morphology (shape) and reflectance, and are generally sampled so that the objects being depicted are much larger than the picture elements.

(ii) Psychovisual Redundancy
The brightness of a region, as perceived by the eye, depends on factors other than simply the light reflected by the region. For example, intensity variations (Mach bands) can be perceived in an area of constant intensity. Such phenomena result from the fact that the eye does not respond with equal sensitivity to all visual information.
Certain information simply has less relative importance than other information in normal visual processing. This information is said to be "psychovisually redundant".
It can be eliminated without significantly impairing the quality of image perception. Unlike coding and interpixel redundancy, psychovisual redundancy is associated with real or quantifiable visual information.
Its elimination is possible only because the information itself is not essential for normal visual processing. Since the elimination of psychovisually redundant data results in a loss of quantitative information, it is commonly referred to as quantization. Quantization results in lossy data compression.
In the quantization procedure, false contouring can be greatly reduced at the expense of some additional, but less objectionable, graininess. The method used to produce this result is known as improved gray-scale (IGS) quantization.
It recognizes the eye's inherent sensitivity to edges and breaks them up by adding to each pixel a pseudo-random number (which is generated from the low-order bits, which are fairly random).
This amounts to adding a level of randomness, which depends upon the local characteristics of the image, to the artificial edges normally associated with false contouring.
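A small sketch of IGS quantization to 4 bits, following the usual formulation (illustrative, not from the text; pixel values are assumed 8-bit):

import numpy as np

def igs_quantize(row):
    # improved gray-scale quantization of one row of 8-bit pixels to 4 bits
    out, s = [], 0
    for pixel in row:
        # add the low-order bits of the previous sum, except near overflow
        low = (s & 0x0F) if (pixel & 0xF0) != 0xF0 else 0
        s = int(pixel) + low               # pseudo-random perturbation
        out.append(s >> 4)                 # keep the 4 high-order bits
    return np.array(out)

row = np.array([108, 139, 135, 244, 172], dtype=np.uint8)
print(igs_quantize(row))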

6. (b) Huffman coding
The most popular technique for removing coding redundancy is due to Huffman. When coding the symbols of an information source individually, Huffman coding yields the smallest possible number of code symbols per source symbol.
In terms of the noiseless coding theorem, the resulting code is optimal for a fixed value of n, subject to the constraint that the source symbols be coded one at a time.
The first step in Huffman's approach is to create a series of source reductions by ordering the probabilities of the symbols under consideration and combining the lowest probability symbols into a single symbol that replaces them in the next source reduction.


Figure 1 below illustrates this process for binary coding.
At the far left, a hypothetical set of source symbols and their probabilities are ordered from top to bottom in terms of decreasing probability values. To form the first source reduction, the bottom two probabilities, 0.06 and 0.04, are combined to form a "compound symbol" with probability 0.1.
This compound symbol and its associated probability are placed in the first source reduction column so that the probabilities of the reduced source are also ordered from the most to the least probable. This process is then repeated until a reduced source with two symbols (at the far right) is reached.
The 2nd step in Huffman's procedure is to code each reduced source, starting with the smallest source and working back to the original source. The minimal length binary code for a two-symbol source consists of the symbols 0 and 1. Figure 2 below shows the Huffman code assignment procedure.

Original source                 Source reduction
Symbol  Prob.  Code      1          2          3         4
a2      0.4    1         0.4  1     0.4  1     0.4  1    0.6  0
a6      0.3    00        0.3  00    0.3  00    0.3  00   0.4  1
a1      0.1    011       0.1  011   0.2  010   0.3  01
a4      0.1    0100      0.1  0100  0.1  011
a3      0.06   01010     0.1  0101
a5      0.04   01011
Fig. 2 : Huffman code assignment

As shown in the figure, these symbols are assigned to the two symbols on the right. As the reduced source symbol with probability 0.6 was generated by combining two symbols in the reduced source to its left, the 0 used to code it is now assigned to both of these symbols, and a 0 and 1 are arbitrarily appended to each to distinguish them from each other.
This operation is then repeated for each reduced source until the original source is reached. The final code appears at the far left in fig. 2. The average length of this code is,
L_avg = (0.4)(1) + (0.3)(2) + (0.1)(3) + (0.1)(4) + (0.06)(5) + (0.04)(5) = 2.2 bits / symbol
and the entropy of the source is 2.14 bits / symbol, so the resulting Huffman code efficiency is 0.973.
The code itself is an instantaneous, uniquely decodable block code. It is called a block code because each source symbol is mapped into a fixed sequence of code symbols; it is uniquely decodable because any string of code symbols can be decoded in only one way.
Thus any string of Huffman encoded symbols can be decoded by examining the individual symbols of the string in a left-to-right manner.
For the binary code of fig. 2, a left-to-right scan of the encoded string 010100111100 reveals that the first valid code word is 01010, which is the code for symbol a3.
The next valid code is 011, which corresponds to symbol a1. Continuing in this manner reveals the completely decoded message to be a3 a1 a2 a2 a6.
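The two steps (source reduction, then code assignment) can be reproduced with a priority queue. The sketch below is a compact equivalent added for illustration, not the tabular procedure verbatim:

import heapq

def huffman(probs):
    # probs: dict symbol -> probability; returns dict symbol -> code string
    heap = [[p, i, {s: ''}] for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, c1 = heapq.heappop(heap)    # the two least probable entries
        p2, _, c2 = heapq.heappop(heap)
        for s in c1: c1[s] = '0' + c1[s]   # prefix 0 / 1 to distinguish them
        for s in c2: c2[s] = '1' + c2[s]
        heapq.heappush(heap, [p1 + p2, count, {**c1, **c2}])
        count += 1
    return heap[0][2]

probs = {'a2': 0.4, 'a6': 0.3, 'a1': 0.1, 'a4': 0.1, 'a3': 0.06, 'a5': 0.04}
code = huffman(probs)
L_avg = sum(probs[s] * len(code[s]) for s in probs)
print(code)
print(L_avg)   # 2.2 bits/symbol, matching the table above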

Original source          Source reduction
Symbol  Probability    1      2      3      4
a2      0.4            0.4    0.4    0.4    0.6
a6      0.3            0.3    0.3    0.3    0.4
a1      0.1            0.1    0.2    0.3
a4      0.1            0.1    0.1
a3      0.06           0.1
a5      0.04
Fig. 1 : Huffman source reduction


Other near optimal variable length codes
When a large number of symbols are to be coded, the construction of the optimal binary Huffman code is a nontrivial task. For the general case of J source symbols, J − 2 source reductions must be performed and J − 2 code assignments made. Thus construction of the optimal Huffman code for an image with 256 gray levels requires 254 source reductions and 254 code assignments.

7. (a) MPEG (Motion Picture Experts Group)
Multimedia video compression standards for video on demand, digital HDTV broadcasting, and image / video database services use similar motion estimation and coding techniques.
The principal standards, MPEG-1, MPEG-2 and MPEG-4, were developed under the auspices of the Motion Picture Experts Group of the CCITT and ISO.
MPEG-1 is an "entertainment quality" coding standard for the storage and retrieval of video on digital media like compact disk read-only memories (CD-ROMs). It supports bit rates on the order of 1.5 Mbit/s.
MPEG-2 addresses applications involving video quality between NTSC / PAL and CCIR 601; bit rates from 2 to 10 Mbit/s, a range that is suitable for cable TV distribution and narrow-channel satellite broadcasting, are supported.
The goal of both MPEG-1 and MPEG-2 is to make the storage and transmission of digital audio and video (AV) material efficient.
MPEG-4, on the other hand, provides (1) improved video compression efficiency; (2) content-based interactivity, such as AV object-based access and efficient integration of natural and synthetic data; and (3) universal access, including increased robustness in error-prone environments, the ability to add or drop AV objects, and object resolution scalability.
Although these functionalities create a need to segment arbitrarily shaped video objects, segmentation is not a part of the standard.
A great deal of video content, like computer games, is produced and readily available in the form of video objects.
MPEG-4 targets bit rates between 5 and 64 kbit/s for mobile and public switched telephone network (PSTN) applications and up to 4 Mbit/s for TV and film applications. In addition, it supports both constant bit rate and variable bit rate coding.
Like the ITU teleconferencing standards, MPEG standards are built around a hybrid block-based DPCM / DCT coding scheme.
Fig. 1 shows a typical MPEG encoder. It exploits redundancies within and between adjacent video frames, motion uniformity between frames and the psychovisual properties of the human visual system.
The input to the encoder is an 8 × 8 array of pixels called an image block. The standards define a macroblock as a 2 × 2 array of image blocks (i.e., a 16 × 16 array of image elements) and a slice as a row of non-overlapping macroblocks.
For colour video, a macroblock is composed of four luminance blocks, denoted Y1 through Y4, and two chrominance blocks, Cb and Cr.
The color difference signal Cb is blue minus luminance and Cr is red minus luminance. Because the eye has far less spatial acuity for color than for luminance, these two components are often sampled at half the horizontal and vertical resolution of the luminance signal, resulting in a 4:1:1 sampling ratio between Y : Cb : Cr.
The grayed elements of the primary input-to-output path in fig. 1 parallel the transformation, quantization and variable length coding operations of a JPEG encoder.
The principal difference is the input, which may be a conventional block of image data or the difference between a conventional block and a prediction of it based on similar blocks in previous and / or subsequent video frames.


This leads to three basic types of encoded output frames:
1. Intraframe or independent frame (I-frame) : An I-frame is compressed independently of all previous and future video frames. Of the three possible encoded output frames, it most highly resembles a JPEG encoded image. Moreover, it is the reference point for the motion estimation needed to generate subsequent P- and B-frames. I-frames provide the highest degree of random access, ease of editing and greatest resistance to the propagation of transmission errors. As a result, all standards require their periodic insertion into the compressed codestream.
2. Predictive frame (P-frame) : A P-frame is the compressed difference between the current frame and a prediction of it based on the previous I- or P-frame. The difference is formed in the leftmost summer of fig. 1. The prediction is motion compensated and typically involves sliding the decoded block in the lower part of fig. 1 around its immediate neighbourhood in the current frame and computing a measure of correlation (such as the sum of the squares of the pixel-by-pixel differences). In fact, the process is often carried out in subpixel increments (such as sliding the subimage 1/4 pixel at a time), which necessitates interpolating pixel values prior to computing the correlation measure. The computed motion vector is variable length coded and transmitted as an integral part of the encoded data stream. Motion estimation is carried out on the macroblock level.
3. Bidirectional frame (B-frame) : A B-frame is the compressed difference between the current frame and a prediction of it based on the previous I- or P-frame and the next P-frame. Accordingly, the decoder must have access to both past and future reference frames. The encoded frames are therefore reordered before transmission; the decoder reconstructs and displays them in the proper sequence.

The encoder of fig. 1 is designed to generate a bit stream that matches the capacity of the intended video channel. To accomplish this, the quantization parameters are adjusted by the rate controller as a function of the occupancy of the output buffer. As the buffer becomes fuller, the quantization is made coarser, so that fewer bits stream into the buffer.

Fig. 1 : A typical MPEG encoder. The image block enters a summer that forms the difference block from the motion-compensated prediction block; the difference passes through the DCT, quantizer and variable length coding into the output buffer, producing the encoded block, while the rate controller adjusts the quantizer from the buffer occupancy. A feedback path through the inverse quantizer and inverse DCT reconstructs the decoded block for the motion estimator & compensator (with frame delay), whose encoded motion vectors are variable length coded and transmitted as well.


7. (b) Dilation
With A & B as sets in Z², the dilation of A by B, denoted A ⊕ B, is defined as
A ⊕ B = { z | (B̂)_z ∩ A ≠ ∅ } … (1)
This equation is based on obtaining the reflection of B about its origin and shifting this reflection by z. The dilation of A by B then is the set of all displacements, z, such that B̂ and A overlap by at least one element.
Based on this interpretation, eqn. (1) may be rewritten as
A ⊕ B = { z | [(B̂)_z ∩ A] ⊆ A } … (2)

Set B is referred to as the structuring element in dilation, as well as in other morphological operations.
The basic process is one of "flipping" B about its origin and then successively displacing it so that it slides over the set (image) A.
Figure 1(a) shows a simple set & figure 1(b) shows a structuring element & its reflection. In this case the structuring element & its reflection are equal because B is symmetric with respect to its origin.
The dashed line in figure 1(c) shows the original set for reference, & the solid line shows the limit beyond which any further displacement of the origin of B̂ by z would cause the intersection of B̂ and A to be empty. Therefore all points inside this boundary constitute the dilation of A by B.
Figure 1(d) shows a structuring element designed to achieve more dilation vertically than horizontally. Figure 1(e) shows the dilation achieved with this element.

Application : One of the simplest applications of dilation is for bridging gaps.

Erosion
For sets A & B in Z², the erosion of A by B, denoted A ⊖ B, is defined as
A ⊖ B = { z | (B)_z ⊆ A } … (3)
This equation indicates that the erosion of A by B is the set of all points z such that B, translated by z, is contained in A.

Fig. 1 : (a) Set A; (b) square structuring element B and its reflection B̂ (dot is the center); (c) dilation of A by B, shown shaded; (d) elongated structuring element; (e) dilation of A using this element.


The figure below shows the process of erosion.
Set A is shown as a dashed line for reference in figure 2(c). The boundary of the shaded region shows the limit beyond which further displacement of the origin of B would cause this set to cease being completely contained in A.
Thus, the locus of points within this boundary (i.e., the shaded region) constitutes the erosion of A by B. Figure 2(d) shows an elongated structuring element & figure 2(e) shows the erosion of A by this element.
Thus, in erosion the origin of the structuring element B moves within the boundary of set A without touching the boundary.
Application : One of the simplest uses of erosion is for eliminating irrelevant detail (in terms of size) from a binary image.
Dilation & erosion are duals of each other with respect to set complementation & reflection,

i.e. (A ⊖ B)ᶜ = Aᶜ ⊕ B̂
It can be proved as follows:
Proof : From the definition of erosion, we have
(A ⊖ B)ᶜ = { z | (B)_z ⊆ A }ᶜ
If set (B)_z is contained in set A, then (B)_z ∩ Aᶜ = ∅, and the above equation becomes
(A ⊖ B)ᶜ = { z | (B)_z ∩ Aᶜ = ∅ }ᶜ
But the complement of this set of z's is the set of z's such that (B)_z ∩ Aᶜ ≠ ∅. Thus
(A ⊖ B)ᶜ = { z | (B)_z ∩ Aᶜ ≠ ∅ } = Aᶜ ⊕ B̂
Hence proved.

Fig. 2 : (a) Set A; (b) square structuring element; (c) erosion of A by B, shown shaded; (d) elongated structuring element; (e) erosion of A using this element.
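Both operations reduce to simple set tests on binary arrays. A sketch using SciPy's morphology routines (an illustration, not part of the original answer), which also checks the duality relation:

import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion

A = np.zeros((7, 7), dtype=bool)
A[2:5, 2:5] = True                       # set A: a 3 x 3 square
B = np.ones((3, 3), dtype=bool)          # structuring element (origin at center)

print(binary_dilation(A, structure=B).sum())   # 25: A grown by one pixel
print(binary_erosion(A, structure=B).sum())    # 1 : only the center survives

# duality: (A erosion B)^c == A^c dilation B_hat (here B is symmetric)
lhs = ~binary_erosion(A, structure=B)
rhs = binary_dilation(~A, structure=B)
print(np.array_equal(lhs, rhs))                # True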
