LPC Speech Coder on the TI C6x DSP

Mark Anderson, Jeff Burke

EE213A / EE298-2

Prof. Ingrid Verbauwhede

Summary

- Implementation platform: Texas Instruments TMS320C6000
  - Low-quantity cost: US $35 ('C6211)
- Architecture clock frequency: 150 MHz ('C6211)
- Throughput: 75-80 channels @ 8000 samples/sec
- Total energy per sample: 1.8 µJ/sample
- 'Area'
  - 1.2% of cycle budget per channel per frame
  - 8.5% of unified memory per channel
  - 25% of unified memory for algorithm
- Flexibility of implementation: high; programmable processor with C compiler, GUI debugger & simulator
- SegSNR_A: ?
- SegSNR_Q: 26 dB (voiced segments)

Architecture overview

- 256-bit VLIW
- Two "clustered" data paths
- Four functional units in each data path
  - 16x16 multiply
  - Two ALUs
  - Data addressing unit
- 32-bit instruction for each functional unit (256-bit "instruction" for 8 functional units)

Data path diagram

Architecture overview

- Split register file
  - Only two cross-paths exist
  - Each cluster is limited to one source read from the opposite register file per cycle
- Data types
  - 8, 16, 32-bit with 40-bit accumulate
  - 40-bit = register pair

Memory architecture

- 'C6211 (US $35) has a cache!
- 4 kB L1 instruction cache (L1P)
- 4 kB L1 data cache (L1D)
- 64 kB L2 unified memory and/or cache
- Extra DMA channels

Memory architecture (diagram)

Design Tools

- Command-line
  - Compiler, debugger, simulator
- Code Composer Studio
  - Same tools
  - Windows NT GUI
  - 30-day "evaluation" license
  - Draconian copy protection, pulls the rug out from under you

Design Flow

- Consolidate Matlab reference into a single function
- Matlab rewritten C-style
- Verified C-style Matlab
- C prototype created
- Imported into Code Composer, optimized & simulated

Fixed-point quantization

- Input samples
  - 16-bit, normalized to [-1,1)
  - <1.15> format used
- Coefficient quantization
  - Hamming window, pre-emphasis, FIR
  - <1.15> format used
  - No noticeable change in characteristics
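The <1.15> coefficient quantization can be illustrated with a short sketch. It assumes the textbook Hamming window formula and a 160-sample frame (0.02 s at 8000 samples/sec); the helper names `to_q15`, `from_q15`, and `hamming` are hypothetical and are not taken from the authors' code.

```python
import math

Q15_SCALE = 1 << 15  # <1.15>: 1 sign bit, 15 fractional bits


def to_q15(x):
    """Round a real value to a <1.15> integer, saturating at the 16-bit limits."""
    q = round(x * Q15_SCALE)
    return max(-Q15_SCALE, min(Q15_SCALE - 1, q))


def from_q15(q):
    """Convert a <1.15> integer back to a real value."""
    return q / Q15_SCALE


def hamming(n):
    """Textbook Hamming window of length n (assumed form, not confirmed by the slides)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


FRAME_LEN = 160  # 0.02 s frame at 8000 samples/sec
window = hamming(FRAME_LEN)
window_q15 = [to_q15(w) for w in window]

# Worst-case rounding error of the quantized window coefficients.
max_err = max(abs(from_q15(q) - w) for q, w in zip(window_q15, window))
print(f"largest window quantization error: {max_err:.2e}")
```

With rounding to nearest, the error per coefficient is bounded by half an LSB (2^-16), which is in line with the slide's remark that quantization causes no noticeable change.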

Fixed-point quantization

- Most values 16 bit
  - Take advantage of 16x16 fast multipliers
  - Remain close to other class implementations
- Add metric for overpowered LPC engine
  - Use # of channels as performance metric

Fixed-point quantization

- Energy stored in <5.27>
  - Prevent overflow, provide precision for low energy segments
- Temporary values stored in <10.30>
  - Take advantage of extended precision
- Modified autocorrelation used <16.0>
  - All whole numbers
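One way the <1.15>, <10.30>, and <5.27> formats could fit together in an energy calculation is sketched below. This is an assumption-laden illustration: the slides do not say how the energy is scaled or whether saturation is used, and the function name `energy_q5_27` is hypothetical. Python integers stand in for the 40-bit register pair.

```python
ACC40_MAX = (1 << 39) - 1   # largest signed 40-bit value, <10.30> accumulator
Q5_27_MAX = (1 << 31) - 1   # largest signed 32-bit value, <5.27> result


def energy_q5_27(samples_q15):
    """Sum of squares of <1.15> samples, returned as a <5.27> integer.

    Each 16x16 product of <1.15> values carries 30 fractional bits, so the
    running sum is kept in a 40-bit <10.30> accumulator (assumed usage).
    """
    acc = 0
    for s in samples_q15:
        acc += s * s                 # product has 30 fractional bits
        acc = min(acc, ACC40_MAX)    # saturate at 40 bits (assumption)
    result = acc >> 3                # drop 3 fractional bits: .30 -> .27
    return min(result, Q5_27_MAX)    # saturate to 32 bits (assumption)


# Four samples of 0.5 have energy 4 * 0.25 = 1.0, i.e. 1 << 27 in <5.27>.
print(energy_q5_27([16384] * 4) == 1 << 27)
```

The 27 fractional bits keep resolution for quiet frames, while the five integer bits (including sign) leave headroom before saturation sets in.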

Fixed-Point SNR

- Matlab simulation of magnitude truncation
  - Tools again.
- SegSNR_A = ?
- SegSNR_Q = 26 dB
  - Voiced segments only
  - Sent_female test data
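For readers unfamiliar with the metric, a segmental SNR can be sketched as the mean of per-frame SNR values in dB. The definition below is the common textbook one and is an assumption; the slides do not spell out how SegSNR_A and SegSNR_Q were defined for the class, and the function name `seg_snr_db` is hypothetical.

```python
import math


def seg_snr_db(reference, processed, frame_len=160):
    """Mean per-frame SNR in dB; frames with zero signal or zero error are skipped."""
    snrs = []
    for start in range(0, len(reference) - frame_len + 1, frame_len):
        ref = reference[start:start + frame_len]
        out = processed[start:start + frame_len]
        signal = sum(r * r for r in ref)
        noise = sum((r - o) ** 2 for r, o in zip(ref, out))
        if signal > 0 and noise > 0:
            snrs.append(10 * math.log10(signal / noise))
    return sum(snrs) / len(snrs) if snrs else float("nan")


# Toy check: a constant 10% amplitude error gives 20 dB in every frame.
reference = [1.0] * 320
processed = [0.9] * 320
print(round(seg_snr_db(reference, processed), 3))
```

Restricting the measurement to voiced segments, as the slide does, would amount to passing only those frames to such a function.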

Performance results

- Initial version: 80,000 CPU cycles/frame
- Optimization
  - Take advantage of VLIW, pipelining: observe assembly, modify C loops
  - Use TI's DSP Library: assembly advantage without assembly
- Optimized version: 30,182 cycles/frame
  - Had to stop early, still at least 5K cycles wasted

Performance

- Then, the tool license expired.
- The tool would not install on other machines.
- TI responded, but wasn't too helpful.
- Moral #1: Avoid the evaluation version.
- Moral #2: Give tools away to sell hardware.

Cycle count details

Routine                               %      Cycles/frame
Windowing, pre-emphasis               4.3    1285
Energy calc                           0.8    254
Autocorrelation in Levinson-Durbin    8.0    2421
Autocorrelation in pitch detection    51     15334
Algorithm total                       95     28561
Total w/ housekeeping                        30182
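Since the two autocorrelation rows account for most of the cycles, it may help to see what they compute. The sketch below is a floating-point textbook version of autocorrelation followed by the Levinson-Durbin recursion; it is not the authors' fixed-point code and does not reproduce their "modified autocorrelation". Function names and the sign convention A(z) = 1 + a1 z^-1 + ... are assumptions.

```python
def autocorrelation(x, max_lag):
    """r[k] = sum over i of x[i] * x[i + k], for k = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] == 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for m in range(1, order + 1):
        if err <= 0:
            break  # silent or degenerate frame: stop early
        acc = r[m] + sum(a[i] * r[m - i] for i in range(1, m))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for i in range(1, m):
            new_a[i] = a[i] + k * a[m - i]
        new_a[m] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error
    return a, err


# A first-order process with r[k] = 0.5**k is predicted exactly by a1 = -0.5.
a, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
print(a, err)
```

The inner sum in `autocorrelation` is a multiply-accumulate loop over the whole frame for every lag, which is why it is a natural candidate for the 16x16 multipliers and for hand-tuned library routines.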

Additional optimizations

- Use more DSPLIB routines
  - Autocorrelation
- Assembly-level optimization
- Code size reduction?
- Reduce number of buffers to reduce L1D usage per frame

Energy per sample

- 'C6211 consumes 1.24 W
  - 75% high activity / 25% low activity
- 1.24 W / 80 channels = 15.5 mW/channel
- 15.5 mJ/sec/channel * 1/8000 = 1.8 µJ/sample
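The energy estimate is a two-step division, sketched here with a hypothetical helper `energy_per_sample_uj`. Plugging in 80 channels gives roughly 1.9 µJ per sample; the quoted 1.8 µJ corresponds to about 86 channels, so the slide's final figure appears to rest on the higher channel count.

```python
def energy_per_sample_uj(power_w, channels, sample_rate_hz):
    """Chip power shared across channels, divided by samples per second, in microjoules."""
    per_channel_w = power_w / channels
    return per_channel_w / sample_rate_hz * 1e6


for channels in (80, 86):
    uj = energy_per_sample_uj(1.24, channels, 8000)
    print(f"{channels} channels: {uj:.2f} uJ/sample")
```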

Number of channels

150 x 10^6 cycles/sec x 0.02 sec/frame = 3.0 x 10^6 cycles/frame

3.0 x 10^6 cycles/frame / 30,182 cycles = 99 channels
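The same channel estimate as a small sketch, using integer arithmetic so the result is exact. The function name `max_channels` is hypothetical.

```python
def max_channels(clock_hz, frame_ms, cycles_per_channel_frame):
    """Whole channels that fit in one frame period, ignoring memory stalls."""
    cycles_per_frame = clock_hz * frame_ms // 1000
    return cycles_per_frame // cycles_per_channel_frame


print(max_channels(150_000_000, 20, 30_182))  # 99
```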

Memory

- 'C6211 cache complicates estimates
- Performance is 85-99% of optimal for typical applications
- 30,182 cycles becomes 35,508 cycles/frame for 85% efficiency
- => now support only 86 channels
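The cache derating can be sketched as dividing the ideal cycle count by the assumed efficiency. The helper name `derated_cycles` is hypothetical. Note that a straight division of the 3.0 x 10^6 cycle budget by 35,508 yields 84 whole channels, slightly below the 86 quoted; the slides do not show how 86 was obtained.

```python
def derated_cycles(ideal_cycles, efficiency):
    """Cycles per frame once cache efficiency is taken into account."""
    return round(ideal_cycles / efficiency)


CYCLES_PER_FRAME = 3_000_000  # 150 MHz x 0.02 s

for efficiency in (0.85, 0.99):
    cycles = derated_cycles(30_182, efficiency)
    print(efficiency, cycles, CYCLES_PER_FRAME // cycles)
```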

Memory

- Try to account for off-chip memory transfers
  - ~220,000 cycles for 150 ns fetches for 80 channels
- => support 75-80 channels
- Unable to verify/simulate because of unexpected tool expiration
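A rough sketch of how the off-chip overhead feeds into the channel count. The first function simply subtracts the quoted ~220,000 cycles from the frame budget. The second is a guess at where that figure might come from: one 150 ns fetch per 32-bit word of a 480-byte channel buffer. That reading is an assumption, not something the slides state, and all names are hypothetical.

```python
CLOCK_MHZ = 150
FETCH_NS = 150


def channels_after_overhead(budget, overhead, per_channel):
    """Whole channels left once a fixed off-chip transfer cost is paid each frame."""
    return (budget - overhead) // per_channel


def fetch_cycles(channels, bytes_per_channel=480, word_bytes=4):
    """Cycles spent fetching every channel's buffer, one word per fetch (assumed model)."""
    cycles_per_fetch = FETCH_NS * CLOCK_MHZ / 1000   # 22.5 cycles at 150 MHz
    words = bytes_per_channel // word_bytes
    return channels * words * cycles_per_fetch


print(channels_after_overhead(3_000_000, 220_000, 35_508))  # 78
print(fetch_cycles(80))                                     # 216000.0
```

Both numbers land inside the ranges quoted on the slide (75-80 channels, ~220,000 cycles), which is the most this kind of back-of-the-envelope model can confirm without simulation.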

Memory

- L2 usage
  - ~16 kB code size thanks to VLIW
    - 512 32-byte instruction clusters
    - More suited for 'C6201 & larger processors
  - Remaining used by data for channels
    - 480 bytes each (8.5% of remaining memory)
- L1 usage
  - L1P: Can't tell because of cache
  - L1D: 2.2 kB (~56%)
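The L2 budget can be tallied in a few lines. This sketch assumes 1 kB = 1024 bytes and that all of L2 is configured as memory rather than cache; the function names are hypothetical.

```python
KB = 1024
L2_BYTES = 64 * KB


def code_bytes(packets=512, bytes_per_packet=32):
    """Code footprint: 512 instruction clusters of 32 bytes (256 bits) each."""
    return packets * bytes_per_packet


def channel_buffers_that_fit(bytes_per_channel=480):
    """How many per-channel data buffers fit in the L2 space left after code."""
    return (L2_BYTES - code_bytes()) // bytes_per_channel


print(code_bytes() / L2_BYTES)       # 0.25
print(channel_buffers_that_fit())    # 102
```

Under these assumptions the remaining 48 kB holds more 480-byte buffers than the 75-80 channels the cycle budget allows, so L2 capacity does not appear to be the limiting factor.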

Tool comments

- Powerful, easy to use IDE… when it worked.
- Licensing problems for eval version
- Debugging support a bit odd
  - puts/printf

C6x Conclusions

- Easily support 75-80 channels of coding
- 26 dB fixed-point SNR, 16-bit types
- VLIW = large code size
- Cache on a low-end DSP!
- Good tools, but draconian copy protection