
Error Detection, Correction and Encryption: Keeping Data Clean and Secure

3 February 2014

Error Detection & Correction 3 February 2014 1/33


Error Detection

If you tell somebody your phone number and they write down the number incorrectly, then they will not be able to call you. Such mistakes are easy to make. Are there ways to encode information that allow for errors to be detected? Even better, are there ways to encode information so that errors which are detected can be corrected?


The UPC


Commercial products, such as grocery store items, are identified with a universal product code, or UPC for short. This 12-digit numerical code uniquely identifies an item. It appears on the item both as a number and as a bar code.

The first time a UPC was used was in 1974.

The first string of five digits identifies the manufacturer, and the second string of five digits identifies the item. The last digit is the one we will focus on. It is called the check digit. Its purpose is to check for errors.


When a bar code is scanned, a calculation is done to see if it represents a valid UPC.

To see if a sequence is valid, the following computation is done: Multiply the digits in odd-numbered positions (first, third, and so on) by 3, and the digits in even-numbered positions by 1. Add up the resulting numbers. If the sum is evenly divisible by 10, then the sequence is a valid UPC.

For example, if we check 6 71860 01332 7, we will get

weights    3  1  3  1  3  1  3  1  3  1  3  1
digits     6  7  1  8  6  0  0  1  3  3  2  7   sum
products  18  7  3  8 18  0  0  1  9  3  6  7    80

So, the number is valid.
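
The validity test above can be sketched in a few lines of Python. The function names upc_weighted_sum and upc_is_valid are made up for this illustration; the rule itself (weights 3, 1, 3, 1, ... and divisibility by 10) is the one stated on the slide.

```python
def upc_weighted_sum(code: str) -> int:
    """Return the 3-1-3-1-... weighted digit sum of a 12-digit UPC."""
    digits = [int(ch) for ch in code if ch.isdigit()]  # ignore spaces
    if len(digits) != 12:
        raise ValueError("a UPC has exactly 12 digits")
    # Positions 1, 3, 5, ... (indices 0, 2, 4, ...) get weight 3; the rest get weight 1.
    return sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits))


def upc_is_valid(code: str) -> bool:
    """A UPC is valid when its weighted sum is a multiple of 10."""
    return upc_weighted_sum(code) % 10 == 0


print(upc_weighted_sum("6 71860 01332 7"))  # 80
print(upc_is_valid("6 71860 01332 7"))      # True
```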


If we check 6 18918 14147 3, we will get a sum of 101, which tells us the number is not valid. The problem is, we don’t know what the correct number should be. The only thing we can do is rescan the item. This isn’t a big problem; grocery clerks rescan all the time.

By creating UPCs with the check digit, if a single digit is read incorrectly, the scanner will detect that an error has been made. Roughly, the reason is that the difference between the correct digit and the read digit is a nonzero number between −9 and 9, and multiplying it by 1 or 3 results in a number not divisible by 10. Adding this to a multiple of 10 results in a number which is not a multiple of 10.
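
The claim can also be checked by brute force. The sketch below starts from the valid UPC used earlier, replaces each of its 12 digits in turn by every other possible digit, and counts how many of the 108 corrupted codes would still pass the test. The helper name weighted_sum is an assumption of this example.

```python
def weighted_sum(digits):
    """3-1-3-1-... weighted sum of a list of UPC digits."""
    return sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits))


valid = [int(ch) for ch in "671860013327"]
assert weighted_sum(valid) % 10 == 0  # the starting code is valid

undetected = 0
for pos in range(12):
    for wrong in range(10):
        if wrong == valid[pos]:
            continue  # not an error
        corrupted = valid.copy()
        corrupted[pos] = wrong
        if weighted_sum(corrupted) % 10 == 0:
            undetected += 1  # a single-digit error that slipped through

print(undetected)  # 0
```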


Clicker Question

The first 11 digits of a UPC are 1 82338 00001. What is the check digit? Remember that you multiply the digits alternately by 3 and 1 and add the results; the sum should be evenly divisible by 10.

Enter the digit on your clicker and hit send.


Answer

weights    3  1  3  1  3  1  3  1  3  1  3  1
digits     1  8  2  3  3  8  0  0  0  0  1  ?   sum
products   3  8  6  3  9  8  0  0  0  0  3  ?   40+?

The sum is 40+? and must be evenly divisible by 10. The only choice for ? is then 0. So, the correct UPC is 1 82338 00001 0.


Zip Codes

The bar code at the bottom of the mailing card identifies the zip code.


Zip Codes

The bar code corresponding to a zip code represents a 10-digit number. The first 9 digits form the zip code, and the last digit is a check digit.

A 10-digit code represents a valid zip code and check digit if, when the 10 digits are added, the result is a multiple of 10.

For example, the code on the bottom of the picture we saw earlier represents 13215-5523-3. The check digit is 3, which you can’t tell without knowing how to read the bar code. But we can check that this is valid by noting that

1 + 3 + 2 + 1 + 5 + 5 + 5 + 2 + 3 + 3 = 30

which is evenly divisible by 10.
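
As a small sketch, the zip code test is just an unweighted digit sum. The function name zip_is_valid is hypothetical; the input is the 9-digit zip code followed by its check digit, with hyphens ignored.

```python
def zip_is_valid(code: str) -> bool:
    """True when the 10 digits (zip+4 plus check digit) sum to a multiple of 10."""
    digits = [int(ch) for ch in code if ch.isdigit()]  # drop hyphens
    if len(digits) != 10:
        raise ValueError("expected 9 zip code digits plus 1 check digit")
    return sum(digits) % 10 == 0


print(zip_is_valid("13215-5523-3"))  # True
```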


How do you Find the Check Digit for the Zip Code?

The main NMSU zip code is 88003-8001. What would the check digit be? If we temporarily call it x, then the sum

8 + 8 + 0 + 0 + 3 + 8 + 0 + 0 + 1 + x

must be a multiple of 10. This sum is 28 + x. The only options for x are 0, 1, . . . , 9, and the only choice that gives us a multiple of 10 is x = 2, which makes the sum 30.


Clicker Question

Milagro Coffee has zip code 88001-5780. What is the check digit for this zip code?

Make sure you hit send after entering your number.

The answer is 3 since 8 + 8 + 0 + 0 + 1 + 5 + 7 + 8 + 0 = 37, and we must add 3 to make the result a multiple of 10.
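
Finding the check digit amounts to asking what must be added to the digit sum to reach the next multiple of 10. A minimal sketch, with the hypothetical function name zip_check_digit:

```python
def zip_check_digit(zip9: str) -> int:
    """Return the digit that brings the sum of the 9 zip code digits to a multiple of 10."""
    digits = [int(ch) for ch in zip9 if ch.isdigit()]  # drop the hyphen
    if len(digits) != 9:
        raise ValueError("expected a 9-digit zip code")
    # In Python, (-n) % 10 is always between 0 and 9.
    return (-sum(digits)) % 10


print(zip_check_digit("88003-8001"))  # 2
print(zip_check_digit("88001-5780"))  # 3
```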


The ISBN

ISBNs, or International Standard Book Numbers, identify books.

An example of an ISBN-10 is 0-387-94753-1. The first digit refers to the language of the book (0 = English). The second block refers to the publisher, the third block to the book itself, and the last digit is the check digit.


The check digit has the same purpose as for the UPC: to allow for single errors to be detected.

What mathematics is behind the UPC and other identification number schemes?


Clock, or Modular, Arithmetic

4 + 10 = 14

4 + 10 = 14 ≡ 2 (mod 12)

This is read "4 plus 10 is equivalent to 2 modulo 12." To do modular arithmetic, we perform an operation and throw away multiples of 12 to get a number between 0 and 11.

5 × 5 = 25 ≡ 1 (mod 12)


We can perform modular arithmetic (addition, multiplication, subtraction, sometimes division) with any modulus, not just 12. The UPC and Zip Code schemes use arithmetic modulo 10. The version of ISBN we mentioned uses a modulus of 11, but a newer version, ISBN-13, uses a modulus of 10.

Error correction commonly uses arithmetic modulo 2.

Some methods of encryption use arithmetic modulo a large number.
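
To illustrate the modulus 11 used by ISBN-10, here is a sketch of its check. The slides do not spell out the weighting rule; the one used below (weights 10, 9, ..., 1 from left to right, with a final X standing for 10) is the standard ISBN-10 rule and is brought in here as outside knowledge. The function name isbn10_is_valid is made up for this example.

```python
def isbn10_is_valid(isbn: str) -> bool:
    """Check an ISBN-10 using weights 10, 9, ..., 1 and arithmetic modulo 11."""
    chars = [ch for ch in isbn if ch not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch.isdigit():
            value = int(ch)
        elif ch in "Xx" and weight == 1:
            value = 10  # 'X' is only allowed as the check digit
        else:
            return False
        total += weight * value
    return total % 11 == 0


print(isbn10_is_valid("0-387-94753-1"))  # True
```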


Clicker Question

If we are doing modular arithmetic modulo 12, what is 10 + 7?

One way to think about this is to ask what time it will be 7 hours after 10:00.


Answer

Modulo 12, 10 + 7 = 5 because 10 + 7 = 17, and throwing away 12 results in 5.

In symbols, 10 + 7 ≡ 5 (mod 12)
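
In most programming languages, "throwing away multiples of 12" is what the remainder operator does. A short Python sketch of the clock examples seen so far (the helper name mod12 is an assumption of this example):

```python
def mod12(n: int) -> int:
    """Reduce n modulo 12, giving a number between 0 and 11."""
    return n % 12


print(mod12(4 + 10))  # 2
print(mod12(5 * 5))   # 1
print(mod12(10 + 7))  # 5
```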


How can we find the check digit?

When a manufacturer produces an item and assigns a UPC, the first chunk of the sequence is the code for the manufacturer. They then select the second chunk to identify the item. Finally, they have to determine the check digit. How do they do this?

Suppose 0 25192 59452 x is to be a UPC. What should be the check digit? If we repeat the calculation to test validity, we get

weights    3  1  3  1  3  1  3  1  3  1  3  1
digits     0  2  5  1  9  2  5  9  4  5  2  x   sum
products   0  2 15  1 27  2 15  9 12  5  6  x   94 + x


We must choose x so that 94 + x is a multiple of 10. This can be phrased as an equation modulo 10:

94 + x ≡ 0 (mod 10)

We can solve this nearly the same way we would solve regular equations.

First, 94 ≡ 4 (mod 10), so the equation reduces to 4 + x ≡ 0 (mod 10).

Subtracting 4 from both sides, and realizing that −4 ≡ 6 (mod 10), yields x ≡ 6 (mod 10).

Since x is a digit, x = 6.
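
The same steps can be sketched in Python: compute the weighted sum of the first 11 digits, then solve sum + x ≡ 0 (mod 10) for x. The function name upc_check_digit is hypothetical.

```python
def upc_check_digit(first11: str) -> int:
    """Solve (weighted sum of the first 11 digits) + x ≡ 0 (mod 10) for the digit x."""
    digits = [int(ch) for ch in first11 if ch.isdigit()]  # ignore spaces
    if len(digits) != 11:
        raise ValueError("expected the first 11 digits of a UPC")
    partial = sum(d * (3 if i % 2 == 0 else 1) for i, d in enumerate(digits))
    # x ≡ -partial (mod 10); Python's % returns a value between 0 and 9 here.
    return (-partial) % 10


print(upc_check_digit("0 25192 59452"))  # 6
print(upc_check_digit("1 82338 00001"))  # 0
```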


Error Correction

Sometimes error detection is not enough. Retransmitting information may be impossible or impractical.

A DVD player is one example. If the player must rescan part of the disk, then the movie playback may be delayed.


The Mariner 9 Mars Orbiter

In 1971 and 1972, Mariner 9 took black and white photos of Mars.


Each photo was a 600 by 600 grid of dots. Each dot on the grid was assigned 1 of 64 shades of gray.

Thus, in order to transmit a single photo, the spacecraft had to transmit shade information for each of the 600 × 600 = 360,000 points.

The spacecraft transmitted the information electronically. Electromagnetic interference, such as from sunspot activity, could corrupt the data. NASA decided to incorporate error correction.


The idea of error correction is to attach multiple check digits to a string of digits in a clever way. By doing this in appropriate ways, as long as no more than a certain fraction of the digits are in error, the coding allows not only the errors to be detected, but the correct information to be gleaned from the transmission.

There are fairly complicated engineering issues in deciding how many errors can be corrected. The more error correction one wants, the longer the sequence of digits must be, which adds time and cost to the encoding and decoding process.

For Mariner 9, sequences of 0s and 1s were used. There are enough sequences of 6 digits to allow for 64 shades to be used, if no correction is needed. However, NASA encoded shades with a sequence of 32 digits, which allowed for correcting up to 7 errors per sequence.
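
The cost of this protection is easy to quantify with the figures given on the slides. The sketch below only does that arithmetic; the variable names are chosen for this example.

```python
pixels = 600 * 600                            # dots per photo
shades = 64                                   # gray levels per dot
bits_per_pixel = (shades - 1).bit_length()    # 6 binary digits distinguish 64 values
codeword_bits = 32                            # length actually transmitted per dot

print(bits_per_pixel)            # 6
print(pixels * bits_per_pixel)   # 2160000 digits per photo without correction
print(pixels * codeword_bits)    # 11520000 digits per photo with correction
```

So each photo took a little more than five times as many transmitted digits as the raw data would have needed.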


Voyager 1

The Voyager spacecraft took color photographs of Jupiter and Saturn between 1979 and 1981. Photographs were made with 4096 colors. NASA chose to encode a color with a sequence of length 24. The code they used allowed for correcting up to 3 errors in each sequence.


The Hamming Code

We will look at the first example of an error-correcting code, which today is called the Hamming code. It was discovered independently by Marcel Golay in 1949 and Richard Hamming in 1950. Hamming was frustrated by errors in reading punch cards, which led him to develop such codes.


The Hamming code has 16 codewords, each made up of 7 digits. The code has the capacity to correct a single error in one of the 7 digits. It also has a simple decoding scheme. This is one important aspect of developing error-correcting codes: it must be relatively easy and quick to correct errors.

There are two mathematical ideas that go into the design and implementation of the Hamming code. The first is the use of arithmetic modulo 2. With this, we can represent any number with 0 or 1. The only real difference between ordinary arithmetic with 0 and 1 and arithmetic modulo 2 is that 1 + 1 = 0 in arithmetic modulo 2.

One way of thinking about modulo 2 arithmetic is to think of 1 as representing odd and 0 as even. If you add two odd numbers, the result is even (e.g., 3 + 5 = 8 and 1 + 9 = 10).
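
For readers who program, addition modulo 2 on single binary digits is the same operation as exclusive-or (written ^ in Python). A short sketch that prints the full addition table and confirms the match:

```python
def add_mod2(a: int, b: int) -> int:
    """Add two binary digits modulo 2."""
    return (a + b) % 2


for a in (0, 1):
    for b in (0, 1):
        assert add_mod2(a, b) == a ^ b  # same as exclusive-or
        print(a, "+", b, "=", add_mod2(a, b))
```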


The second idea is matrix arithmetic. A matrix is a rectangular array of numbers. One can do arithmetic (+, −, ×, sometimes ÷) with matrices of numbers, or matrices with modulo 2 entries. These objects satisfy many of the same properties as usual arithmetic.


The Hamming matrix is a matrix with 3 rows and 7 columns. It is:

      [ 0 0 0 1 1 1 1 ]
H  =  [ 0 1 1 0 0 1 1 ]
      [ 1 0 1 0 1 0 1 ]

The most important property of the Hamming matrix is that the columns of H represent all strings of 3 binary digits other than the string of three 0s.
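
This property can be checked directly. The sketch below reads each column of H from top to bottom as a 3-digit binary number; the variable names are chosen for this example.

```python
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

# Column j of H, read top to bottom.
columns = [tuple(row[j] for row in H) for j in range(7)]

# Interpret each column as a binary number (top entry is the 4s place).
values = [4 * a + 2 * b + c for a, b, c in columns]
print(values)  # [1, 2, 3, 4, 5, 6, 7]
```

With this ordering, the i-th column happens to be the number i written in binary, so every nonzero string of 3 binary digits appears exactly once.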


A string of 7 digits is a codeword in the Hamming code if it satisfies the equation Hx = 0.

If x is not a valid codeword, then Hx ≠ 0.

When this occurs, the product will be equal to exactly one of the columns of H.

If x was obtained by making a single mistake in a codeword, changing the i-th entry, then Hx is the i-th column of H.
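
Both statements can be verified exhaustively, since there are only 128 strings of 7 binary digits. The sketch below computes Hx modulo 2 for every string, keeps the ones with Hx = 0, and then flips each entry of each codeword to confirm that Hx becomes the matching column of H. The function name syndrome is a common term for Hx but is not used on the slides.

```python
from itertools import product

H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(x):
    """Compute Hx with arithmetic modulo 2."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)


# All strings of 7 binary digits with Hx = 0.
codewords = [x for x in product((0, 1), repeat=7) if syndrome(x) == (0, 0, 0)]
print(len(codewords))  # 16

# Changing the i-th entry of any codeword gives Hx equal to the i-th column of H.
columns = [tuple(row[j] for row in H) for j in range(7)]
for c in codewords:
    for i in range(7):
        x = list(c)
        x[i] ^= 1  # make a single mistake
        assert syndrome(x) == columns[i]
```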


For example, 0000000 is a valid codeword. If we make a mistake in the 5th entry, and have x = 0000100, then Hx is the fifth column of H. This tells us that by changing the fifth entry, we get a codeword.
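
A minimal decoding sketch follows. It relies on an observation about this particular H: its i-th column, read top to bottom, is the number i in binary, so Hx read as a binary number gives the position to fix. The function names syndrome and correct are made up for this example, and the sketch assumes at most one digit was received incorrectly.

```python
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(x):
    """Compute Hx with arithmetic modulo 2."""
    return tuple(sum(h * b for h, b in zip(row, x)) % 2 for row in H)


def correct(received: str) -> str:
    """Fix at most one wrong digit in a received 7-digit string."""
    x = [int(ch) for ch in received]
    a, b, c = syndrome(x)
    position = 4 * a + 2 * b + c  # 0 means Hx = 0, so nothing to fix
    if position:
        x[position - 1] ^= 1      # flip the digit at that position
    return "".join(str(bit) for bit in x)


print(correct("0000100"))  # 0000000
print(correct("1110001"))  # 1110000
```

The second call starts from the codeword 1110000 with a mistake in the 7th entry.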

More sophisticated codes have different decoding algorithms. Nevertheless, the key in building a useful code is to be able to correct errors and to do it efficiently.

To build a good coding scheme, one needs to analyze the type of errors likely to arise. DVDs commonly get large sequences of errors resulting from scratches.


Next Time

We will discuss encryption on Wednesday, which is about keeping data secret. Some methods of encryption, including the main one we’ll discuss, use modular arithmetic in a clever way.
