
Page 1:

Networks lab, RPI 1

Recent Advances in Error/Erasure Correcting and Coding

Vijay Subramanian

Page 2:

Overview of Presentation

• Background on Error Correcting Codes.

• Some Powerful Codes– Reed-Solomon Codes

– Low-Density Parity Check (LDPC) Codes (Rediscovered)

– Tornado Codes

• Digital Fountain Approach

• Fountain Codes– LT Codes

– Raptor Codes

• Comparisons of Different Codes

• Applications

Page 3:

Block Codes

• Accepts a k-bit information block and transforms it into an n-bit codeword.

– It is a fixed-length code.

• Can be implemented using a combinational logic circuit (the encoder needs no memory).

Page 4:

Some Important Types of Block Codes

• Linear Block Codes:

– The sum of any 2 codewords is itself a codeword.

• Systematic Codes:

– The data bits appear unchanged in the generated codeword.

• BCH Codes:

– Generalization of Hamming codes for multiple-error correction.

– A special case of the class of linear codes known as Goppa codes.

• Cyclic Codes:

– Important subclass of linear block codes for which encoding and decoding can be implemented easily.

– A cyclic shift of a codeword yields another codeword.

• Group Codes:

– Same as linear block codes. Also known as generalized parity-check codes.
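The linearity and systematic properties above can be checked concretely. The sketch below is a hypothetical illustration using the systematic Hamming(7,4) code with generator matrix G = [I | P]; the particular matrix and data bits are chosen for the example, not taken from the slides.

```python
# Hypothetical illustration of the linear and systematic properties,
# using the systematic Hamming(7,4) code with generator matrix G = [I | P].
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def encode(data):
    """Encode 4 data bits into a 7-bit codeword over GF(2)."""
    return [sum(d * g for d, g in zip(data, col)) % 2 for col in zip(*G)]

c1 = encode([1, 0, 1, 1])
c2 = encode([0, 1, 1, 0])
# Linear: the GF(2) sum of two codewords is itself a codeword.
c3 = [(a + b) % 2 for a, b in zip(c1, c2)]
assert c3 == encode([1, 1, 0, 1])
# Systematic: the first k = 4 positions reproduce the data bits unchanged.
assert c1[:4] == [1, 0, 1, 1]
```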

Page 5:

Convolutional Codes

• Generates an n-bit block from a k-bit block.

– Differ from block codes in that the encoder has memory (memory order m, also called the constraint length).

• Must be implemented using a sequential logic circuit.

• Decoders have high complexity (decoding operations) per output bit.

• More suited to low-speed digitized voice traffic (lower integrity) than to high-speed data needing high integrity.

• If the decoder loses synchronization or makes a mistake, errors will propagate.

• Performance depends on the constraint length.

Page 6:

Concatenated Codes

• A concatenated code uses two levels of coding, an inner code and an outer code, to obtain the desired performance.

• Inner code is designed to correct most of the channel errors.

• The outer code is a higher rate (lower redundancy) code which reduces the probability of error to a specified level.

• Concatenated codes are effective against a mixture of random errors and bursts.

Page 7:

Concatenated Codes

• Having 2 steps reduces the complexity required to obtain a desired error rate, compared to a single code.

• Outer code is typically RS code.

• Inner decoder uses all the available soft decision data to provide the best performance under Gaussian conditions.

• For a one-level concatenated code, if the minimum distances of the inner and outer codes are d1 and d2, then the minimum distance of the concatenation is at least d1d2.

• Choice of the inner code is dictated by the application:

– For high data rates, the inner code should be a block code.

– For predominantly slower Gaussian channels, the inner code should be a convolutional code.

Page 8:

Turbo Codes

• Important class of codes, built from convolutional component codes, proposed in 1993.

• Achieve large coding gains by combining two or more relatively simple building blocks (component codes).

• A serial concatenation of codes is often used for power-limited systems.

• A popular approach is an RS outer code (applied first, removed last) with an inner code (applied last, removed first).

• Of all practical error-correction methods known to date, turbo codes, together with LDPC codes, come closest to the Shannon limit, the theoretical limit on the rate of information transfer over a noisy channel.

• Turbo codes make it possible to increase the available bandwidth without increasing transmit power, or to decrease the power used to transmit at a given data rate.

• The main drawbacks are relatively high decoding complexity and relatively high latency, which make them unsuitable for some applications.

Page 9:

Block Codes versus Convolutional Codes

• For convolutional codes, decoder complexity increases as the redundancy decreases, whereas for block codes, complexity decreases as the redundancy decreases. For high code rates, this favors block codes.

• In practice, the speed of convolutional decoders is limited compared to block decoders. Fast and powerful RS block decoders have been built to operate at rates above 120 Mb/s.

• Convolutional codes are weak against burst errors, while RS codes handle them well.

Page 10:

RS Codes versus Convolutional Codes

• Which is better?

– For digitized voice, a concatenated code is better.

– For high-speed/high-integrity data, RS/block codes are better.

Page 11:

Erasure Correcting Approaches

• Gallager Codes [1962], rediscovered as LDPC codes.

• RS Codes [1960, Reed and Solomon]:

– Complexity in theory and in the encoding/decoding algorithms (finite-field operations).

• Tornado Codes [1997, Luby et al.]:

– Naïve and simple linear-equation encoding/decoding algorithm.

– Complexity and theory lie in the design and analysis of the irregular graph structure.

• LT Codes [1998, Luby]:

– Simpler, scalable irregular graph structure.

– Simpler analysis and design.

– Independent generation of each encoding symbol.

– Unlimited number of encoding symbols, i.e. rateless codes.

– First practical realization of a digital fountain.

• Raptor Codes [2001, Shokrollahi]:

– An even better practical realization of a digital fountain.

Page 12:

Reed-Solomon Codes

• Reed-Solomon codes are special and widely implemented because they are "almost perfect": the number of extra (redundant) symbols added by the encoder is the minimum possible for any level of error correction, so no bits are wasted.

• RS codes are easier to decode than most other non-binary codes.

• RS codes allow the correction of erasures. They have a combined error-and-erasure correction capacity of 2t + s <= n - k, where t is the number of errors and s the number of erasures.

• RS codes can correct twice as many erasures as errors (where the decoder has no information about the error).

• RS codes exist on b-bit symbols for every value of b.
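The capacity bound 2t + s <= n - k can be checked with simple arithmetic. The sketch below uses the widely deployed RS(255, 223) parameters as a hypothetical worked example; these particular numbers are not from the slides.

```python
# Worked example of the combined capacity 2t + s <= n - k,
# using the widely deployed RS(255, 223) parameters (32 redundancy symbols).
n, k = 255, 223

def correctable(t, s):
    """True if t symbol errors plus s symbol erasures are within capacity."""
    return 2 * t + s <= n - k

assert correctable(t=16, s=0)    # errors only: up to (n - k) // 2 = 16
assert correctable(t=0, s=32)    # erasures only: up to n - k = 32
assert correctable(t=10, s=12)   # mixed: 2 * 10 + 12 = 32
assert not correctable(t=16, s=1)
```

This also makes the "twice as many erasures as errors" property concrete: with the same 32 redundancy symbols, the decoder fixes 32 erasures but only 16 errors.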

Page 13:

RS Codes

• Nonbinary cyclic codes with m-bit symbols, where m > 2.

• RS(n, k) codes on m-bit symbols exist for all n and k for which 0 < k < n < 2^m + 2.

• RS codes achieve the largest possible code minimum distance for any linear code with the same encoder input and output block lengths.

• RS codes are particularly useful for burst-error correction (for channels that have no memory.)

• Any linear code is capable of correcting n-k symbol erasures if the n-k erased symbols all happen to lie on the parity symbols.

• However, RS codes have the remarkable property that they can correct any set of n-k symbol erasures in the block.

• RS codes can be designed to have any redundancy.

Page 14:

Properties of RS codes

• Special case of BCH codes.

• Excellent erasure correcting property.

• They are popular because they can combat combinations of both random and burst errors.

• They also have a long block length, assuring a sharp performance curve.

• RS codes exhibit a very sharp improvement of block error-rate with an improvement of channel quality making their use ideal for data.

• Since they cannot exploit soft decision data well, they are used in concatenated codes.

• Real-number arithmetic is avoided, since they operate directly on bits via finite-field arithmetic.

Page 15:

RS code Performance (Error Correction; No Erasures)

Page 16:

Tornado Codes

• Class of erasure codes that support error correction and have fast encoding and decoding algorithms.

• Software implementations of Tornado codes are about 100 times faster on small lengths, and about 10,000 times faster on larger lengths, than software Reed-Solomon erasure codes, while having only slightly worse overhead.

• Tornado codes are fixed-rate, near-optimal erasure-correcting codes.

• They use sparse bipartite graphs to trade encoding and decoding speed for reception overhead.

• Since the introduction of Tornado codes, many similar codes have emerged, most notably Online codes, LT codes and Raptor codes. These are also based on sparse graphs.

• Tornado codes are block erasure codes with encoding and decoding times linear in n.

• They are not rateless.

• Running times of the encoding and decoding algorithms are proportional to the block length. This makes them slow at small rates.

Page 17:

Structure of Tornado Codes

• Each column acts as redundancy for the column on its left.

• Redundant packets are generated by taking the XOR of the packets to the left.

• Tornado codes have the erasure property that, to recover k source packets, we need slightly more than k encoding packets.
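The XOR-redundancy idea can be shown in a few lines. This is a minimal sketch of the principle; the packet contents (single bytes) are illustrative and not taken from the slides.

```python
# Minimal sketch of XOR redundancy: a parity packet is the XOR of its
# neighbouring source packets, so one missing neighbour can be recovered
# by XOR-ing the parity with the surviving ones.
src = [0x10, 0x22, 0x34]            # three source packets (one byte each)
parity = src[0] ^ src[1] ^ src[2]   # redundant packet

# Suppose src[1] is erased in transit; recover it from what survived.
recovered = parity ^ src[0] ^ src[2]
assert recovered == 0x22
```

The same peeling step, applied packet by packet over a sparse bipartite graph, is what makes Tornado decoding run in linear time.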

Page 18:

Tornado Codes versus RS Codes

• Advantage of Tornado codes over standard codes is that they trade off a slight increase in reception overhead for decreased encoding and decoding times.

• The algorithm runs in real time, and the file is reconstructed as soon as enough packets arrive.

• Source and Sink need to agree on the graph a priori.

Page 19:

Digital Fountain

• Key property of a digital fountain: the source data can be reconstructed from any subset of the encoding packets equal in total length to the source data.

– The client can reconstruct the data using any k of the encoded packets.

– Ideally, the encoding and decoding processes require very little processing overhead.

Page 20:

Digital Fountain

• A DF is conceptually simpler, more efficient and applicable to a broader class of networks than previous approaches.

• The key property of a DF is that the source data can be reconstructed intact from any subset of the encoding packets equal in length to the source data.

• RS codes had unacceptably high running times which led to the development of Tornado codes.

• DF codes are record-breaking sparse graph codes for channels with erasures.

• As per Shannon's theory, retransmissions are needless in principle.

• RS codes work only for small k and n, and the failure probability needs to be determined beforehand.

Page 21:

Properties of DF Codes…

• The encoding required is only as long as the data.

• Rateless: the number of encoding packets that can be generated is potentially limitless, so an unlimited flow of encoding symbols can be produced.

• Data is always recoverable from the required amount of encoding.

• Low complexity for encoding and decoding.

• Can encode large amount of data.

• Ideal under conditions such as:

– High / unknown / variable loss

– Large RTT

– Intermittent connectivity

– One-way channels

– Use as a background data transport

Page 22:

Properties of DF Codes

• Every packet is useful.

• Can serve a large number of clients without need to maintain state.

• One user with poor connection cannot affect others.

• Advantages are pronounced when we apply this to broadcast/multicast.

Page 23:

DF Encoder

• This encoding defines a graph connecting source packets and encoding packets.

• If the mean degree d is smaller than K then the graph is sparse.

• The decoder needs to know the degree of each received packet and the source packets it is connected to.
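The encoding step can be sketched in a few lines: each encoding packet draws a degree d, picks d distinct source packets, and XORs them. The uniform degree choice below is a placeholder for illustration; a real LT code would draw d from a soliton distribution (covered later).

```python
import random

# Sketch of fountain encoding: pick a degree, pick that many distinct
# source packets uniformly, and XOR them together. The uniform degree
# choice is a placeholder, not the soliton distribution a real LT code uses.
def encode_packet(source, rng):
    d = rng.randint(1, len(source))                 # placeholder degree
    neighbours = rng.sample(range(len(source)), d)  # the packet's edges
    value = 0
    for i in neighbours:
        value ^= source[i]
    # The decoder must learn the neighbour list (e.g. via a shared seed).
    return neighbours, value

source = [0x0A, 0x0B, 0x0C, 0x0D]
neighbours, value = encode_packet(source, random.Random(0))
check = 0
for i in neighbours:
    check ^= source[i]
assert check == value and 1 <= len(neighbours) <= len(source)
```

Because the graph is generated pseudo-randomly, sending the seed with each packet is enough for the decoder to reconstruct the degree and the neighbour list.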

Page 24:

LT-code Decoding Process

• Initially, all the input symbols are uncovered.

• Step 1: all encoding symbols with exactly 1 neighbor are released to cover their unique neighbor.

• The set of covered input symbols not yet processed is called the ripple.

• In each step, 1 input symbol from the ripple is processed.

• The process ends when the ripple is empty.

– The process fails if at least one input symbol remains uncovered at the end.

– The process succeeds if all the input symbols are covered at the end.

• The goal of the degree-distribution design is to release encoding symbols slowly, keeping the ripple small while ensuring that it does not disappear.
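The release/process steps above can be sketched as a runnable peeling decoder. Each encoding symbol is a (neighbour indices, XOR value) pair; the example packets are hand-built for illustration and are not taken from the slides.

```python
# Runnable sketch of the peeling ("ripple") decoding process.
def lt_decode(k, packets):
    """Peel degree-1 encoding symbols until no progress; return inputs."""
    work = [[set(nb), val] for nb, val in packets]
    decoded = [None] * k
    progress = True
    while progress:
        progress = False
        # Release: every degree-1 symbol covers its unique neighbour.
        for nb, val in work:
            if len(nb) == 1 and decoded[next(iter(nb))] is None:
                decoded[next(iter(nb))] = val
                progress = True
        # Process: remove covered inputs from all remaining symbols.
        for entry in work:
            for i in [j for j in entry[0] if decoded[j] is not None]:
                entry[0].discard(i)
                entry[1] ^= decoded[i]
    return decoded

# Three input symbols (values 1, 2, 3) and three encoding symbols.
packets = [({0}, 1), ({0, 1}, 1 ^ 2), ({1, 2}, 2 ^ 3)]
assert lt_decode(3, packets) == [1, 2, 3]
```

Decoding fails exactly when no degree-1 symbol remains while some input is still uncovered, in which case `lt_decode` returns `None` entries for those inputs.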

Page 25:

DF Decoder

Page 26:

LT Codes

• Rateless erasure codes.

• LT codes are universal in the sense that they:

– Are near optimal for every erasure channel.

– Are very efficient as the data length grows.

– Can generate close to the minimum number of encoded symbols to send to the receivers.

– Need a number of received packets that is also close to the minimum to regenerate the original data.

Page 27:

LT Codes

• Objectives:

– As few encoding symbols as possible, on average, are required to ensure success of the LT process.

– The average degree of the encoding symbols is as low as possible. (The average degree times K is the number of symbol operations needed to generate the encoding symbols that ensure complete recovery of the data.)

• A soliton wave is one in which dispersion balances refraction perfectly.

• Soliton distribution: the basic property required of a good degree distribution is that input symbols are added to the ripple at the same rate as they are processed.

Page 28:

Design of Degree Distribution

• A few encoding packets must have high degree (close to K).

• Many packets must have low degree, so that the total number of operations is kept small.

• Ideally, the received graph should have exactly one check node of degree 1 at each iteration.

• The ideal soliton distribution ρ, given by ρ(1) = 1/K and ρ(d) = 1/(d(d-1)) for d = 2, ..., K, achieves this in expectation, but works poorly in practice because of variance (fluctuations), which means:

– During decoding, there may be no nodes of degree 1.

– Some source nodes will receive no connections.

– The expected degree under this distribution is ln K.
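The ideal soliton distribution's two stated properties, that it is a probability distribution and that its expected degree is about ln K, can be verified numerically. This sketch reconstructs ρ from its standard definition (ρ(1) = 1/K, ρ(d) = 1/(d(d-1)) for d >= 2); the choice K = 100 is illustrative.

```python
# The ideal soliton distribution rho, from its standard definition.
def ideal_soliton(K):
    rho = [0.0] * (K + 1)          # index 0 unused
    rho[1] = 1.0 / K
    for d in range(2, K + 1):
        rho[d] = 1.0 / (d * (d - 1))
    return rho

K = 100
rho = ideal_soliton(K)
# It is a probability distribution: the tail telescopes to 1 - 1/K.
assert abs(sum(rho) - 1.0) < 1e-12
# The expected degree equals the harmonic number H_K, roughly ln K.
mean_degree = sum(d * rho[d] for d in range(1, K + 1))
assert abs(mean_degree - sum(1.0 / d for d in range(1, K + 1))) < 1e-9
```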

Page 29:

Robust Soliton Distribution

• Adds 2 extra parameters, c and δ.

• δ is a bound on the probability that decoding fails after a certain number of packets has been received.

• c is a constant that can be viewed as a free parameter.

• τ is a positive function added to the ideal soliton distribution ρ.

• µ, the robust soliton distribution, is the normalized sum µ(d) = (ρ(d) + τ(d))/Z, where Z = Σ_d (ρ(d) + τ(d)).

• The expected number of encoding packets needed for successful decoding is K·Z, and the expected degree is O(ln(K/δ)).
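The construction can be written out concretely. This sketch reconstructs the robust soliton distribution from its standard definition (as given, e.g., in MacKay's textbook, cited in the references): S = c·ln(K/δ)·sqrt(K), τ(d) = S/(Kd) for d below K/S, τ(K/S) = S·ln(S/δ)/K, and τ(d) = 0 otherwise; the parameter values K = 10000, c = 0.2, δ = 0.05 are illustrative.

```python
import math

# Robust soliton distribution mu(d) = (rho(d) + tau(d)) / Z,
# reconstructed from the standard definition.
def robust_soliton(K, c, delta):
    S = c * math.log(K / delta) * math.sqrt(K)
    pivot = round(K / S)                    # the spike at d ~ K/S
    rho = [0.0] * (K + 1)
    rho[1] = 1.0 / K
    for d in range(2, K + 1):
        rho[d] = 1.0 / (d * (d - 1))
    tau = [0.0] * (K + 1)
    for d in range(1, pivot):
        tau[d] = S / (K * d)
    if 1 <= pivot <= K:
        tau[pivot] = S * math.log(S / delta) / K
    Z = sum(rho) + sum(tau)                 # normalizer
    return [(r + t) / Z for r, t in zip(rho, tau)], Z

mu, Z = robust_soliton(K=10000, c=0.2, delta=0.05)
assert abs(sum(mu) - 1.0) < 1e-12
# K * Z is the expected number of packets needed to succeed with
# probability at least 1 - delta; Z is only slightly above 1.
assert 1.0 < Z < 2.0
```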

Page 30:

Tornado Codes versus LT codes

• Both RS and Tornado codes are systematic codes whereas LT codes are not systematic.

• Encoder and decoder memory usage in Tornado codes is much more than with LT codes.

• LT codes are rateless, i.e. further encoding packets can be generated on the fly. In contrast, given k and n, Tornado codes produce only n encoding packets.

• Tornado codes are generated from graphs with constant maximum degree; LT codes use graphs of logarithmic density.

Page 31:

Low-Density Parity Check (LDPC) Codes

• Shannon showed the existence of capacity-achieving codes, but achieving capacity is only part of the story. For practical communication, we need fast encoding and decoding algorithms.

• Linear codes obtained from sparse bipartite graphs; linear codes associated with sparse bipartite graphs are called LDPC codes.

– Any linear code has a representation as a code associated with a bipartite graph.

• LDPC codes are good for Error Correction

• Graph gives rise to a linear code of block length n and dimension k.

• LDPC codes are equipped with very fast (probabilistic) encoding and decoding algorithms.

Page 32:

LDPC Structure

• The graph gives rise to a linear code of block length n and dimension n-r.

• The codewords are those vectors such that for all check nodes the sum of the neighboring positions among the message nodes is 0.

• Code == Graph == Matrix

• If the matrix/graph is sparse, the code is called LDPC.

• Sparsity is the key property that allows for the algorithmic efficiency.
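The "code == graph == matrix" view can be made concrete: a vector c is a codeword iff every check node's parity is satisfied, i.e. Hc = 0 over GF(2). The tiny parity-check matrix below is illustrative, not an LDPC matrix from the slides.

```python
# Each row of H is a check node; its 1s mark the neighbouring message nodes.
H = [
    [1, 1, 0, 1, 0, 0],   # check node 1: c0 + c1 + c3 = 0 (mod 2)
    [0, 1, 1, 0, 1, 0],   # check node 2: c1 + c2 + c4 = 0 (mod 2)
    [1, 0, 1, 0, 0, 1],   # check node 3: c0 + c2 + c5 = 0 (mod 2)
]

def is_codeword(c):
    """True iff every check node's neighbouring positions sum to 0."""
    return all(sum(h * x for h, x in zip(row, c)) % 2 == 0 for row in H)

# Block length n = 6 with r = 3 checks, so the code has dimension n - r = 3.
assert is_codeword([0, 0, 0, 0, 0, 0])
assert is_codeword([1, 1, 0, 0, 1, 1])
assert not is_codeword([1, 0, 0, 0, 0, 0])
```

When H is sparse (each row and column has few 1s), each check involves only a handful of positions, which is what makes message-passing decoding fast.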

Page 33:

More on LDPC Codes

• The general class of decoding algorithms for LDPC codes is called message-passing algorithms.

• One important subclass is called the belief propagation algorithm.

• Performance of the regular graphs deteriorates as the degree of the message nodes increases.

• One instance of LDPC codes using a specific degree distribution is a capacity achieving sequence called Tornado codes.

• Another capacity achieving sequence has also been developed.

Page 34:

Raptor Codes…

• Extension of LT codes.

• First known class of fountain codes with linear time encoding and decoding.

• Raptor codes encode a given message consisting of a number of symbols, k, into a potentially limitless sequence of encoding symbols such that knowledge of any k or more encoding symbols allows the message to be recovered with some non-zero probability.

Page 35:

Raptor Codes

• The basic idea behind Raptor codes is a pre-coding of the input symbols prior to the application of an appropriate LT-code.

• The probability that the message can be recovered increases with the number of symbols received beyond k, becoming very close to one once the number of received symbols is only slightly larger than k.

• A symbol can be any size, from a single bit to hundreds or thousands of bytes.

• Raptor codes may be systematic or non-systematic.

• Used in:

– The 3rd Generation Partnership Project (3GPP) for mobile cellular wireless broadcast and multicast.

– The DVB-H standards for IP datacast to handheld devices.

Page 36:

Comparisons among the Different Codes

• Data length:

– RS: severely limited in practice.

– Tornado: moderate to large.

– LT: moderate to large.

– Raptor: small to large.

• Encoding length:

– RS, Tornado: limited to a small multiple of the data length.

– LT, Raptor: unlimited; on-the-fly / dynamic generation.

• Flexibility to receive from multiple sources:

– RS, Tornado: hard to coordinate.

– LT, Raptor: no coordination needed.

• Memory requirements:

– RS, Tornado: proportional to encoding length.

– LT, Raptor: proportional to data length.

Page 37:

Comparisons…

• Computational work:

– RS: O(k) per encoding symbol, O(k) to decode.

– Tornado: O(1) per encoding symbol, O(1) to decode.

– LT: O(ln k) per encoding symbol, O(k ln k) to decode.

– Raptor: O(1) per encoding symbol, O(k) to decode.

• Reception overhead:

– RS: none.

– Tornado, LT, Raptor: small constant fraction.

• Failure probability:

– RS: zero.

– Tornado: small.

– LT: tiny.

– Raptor: really tiny.

Page 38:

Application: Scalable Protocol for Distributing Software

• The following properties are desired for scalable and reliable multicast:

– Reliable:

• The file is guaranteed to be delivered completely to all receivers.

– Efficient:

• Overhead incurred must be minimal.

• The user should not be able to distinguish between a multicast service and a point-to-point service.

– On demand:

• Clients should be able to initiate the service at their discretion.

– Tolerant:

• The service must be tolerant of a heterogeneous set of receivers.

• No feedback channel!

Page 39:

More Application Domains

• DF codes have already been applied to reliable multicast.

• Robust Distributed Storage

• Delivery of streaming content

• Delivery of content to mobile clients in wireless networks

• Peer-to-peer applications

• Delivery of content along multiple paths for improved resilience.

• Transport layer design for unicast is under exploration.

Page 40:

References

• J. W. Byers, M. Luby, M. Mitzenmacher and A. Rege, "A Digital Fountain Approach to Reliable Distribution of Bulk Data," SIGCOMM 1998.

• M. Luby, "LT Codes," The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002.

• D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms.

• B. Sklar, Digital Communications: Fundamentals and Applications.

• E. Berlekamp, R. Peile and S. Pope, "The Application of Error Control to Communications," 1987.

• M. Luby's USENIX presentation.

Page 41:

Message Passing