Introduction to Blackfin BF532 DSP
ANALOG DEVICES BLACKFIN PROCESSOR
BF 532
Agenda
Introduction to DSP
Introduction to Blackfin family
Getting Started with VisualDSP++ 5.0
Programming and Optimizing C on Blackfin
Examples
What is a DSP
What is a DSP? In brief, DSPs are processors or
microcomputers whose hardware, software, and instruction
sets are optimized for high-speed numeric processing
applications, which is essential for processing digital data
representing analog signals in real time.
What a DSP does is straightforward. When acting as a
digital filter, for example, the DSP receives digital values
based on samples of a signal, calculates the results of a
filter function operating on these values, and provides
digital values that represent the filter output. The DSP’s
high-speed arithmetic and logical hardware is programmed
to rapidly execute algorithms modeling the filter
transformation.
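As a minimal sketch of the digital filter case described above, the following C function computes one output sample of an FIR filter as a sum of products over the most recent input samples. The function name and the layout of the `history` array are assumptions made for illustration; the slides do not prescribe a particular filter structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: compute one FIR output sample.
 * coeffs[k] are the filter taps; history[k] is the k-th most recent
 * input sample (history[0] is the newest). */
int32_t fir_sample(const int16_t *coeffs, const int16_t *history, size_t taps)
{
    int32_t acc = 0;
    for (size_t k = 0; k < taps; ++k)
        acc += (int32_t)coeffs[k] * history[k];  /* one multiply-accumulate per tap */
    return acc;
}
```

Each tap costs one multiply and one add, which is why DSPs provide dedicated multiply-accumulate hardware.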
Key Difference
The combination of design elements—arithmetic operators,
memory handling, instruction set, parallelism, data
addressing—that provide this ability forms the key
difference between DSPs and other kinds of processors.
The real-time signal comes to the DSP as a train of
individual samples from an analog-to-digital converter
(ADC). To do filtering in real-time, the DSP must complete
all the calculations and operations required for processing
each sample (usually updating a process involving many
previous samples) before the next sample arrives. To
perform high-order filtering of real-world signals having
significant frequency content calls for very fast processors.
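The real-time constraint can be expressed as a cycle budget: the number of core clock cycles available between two consecutive samples. The sketch below is plain arithmetic; the function name and any clock or sample rates passed to it are illustrative assumptions.

```c
#include <stdint.h>

/* Core cycles available between two consecutive input samples
 * (integer division, so any fractional cycle is discarded). */
uint32_t cycles_per_sample(uint32_t core_clock_hz, uint32_t sample_rate_hz)
{
    return core_clock_hz / sample_rate_hz;
}
```

For example, a 400 MHz core processing a 48 kHz audio stream has about 8333 cycles per sample for all filtering and housekeeping.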
General features of DSP
Efficient ALU and MAC Units (Multiple)
Harvard or Super Harvard Architecture
Extended Precision in Computational Units
Hardware Looping
Efficient and fast peripherals
Circular Buffering
High Speeds of operation
Continued….
Fast Multipliers
Multiple Execution Units
Efficient Memory Access
-Harvard and Super Harvard architectures
Data Format
-Fixed point and Floating point
Zero Overhead Looping
Streaming I/O
Specialized Instruction Sets
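Circular buffering, listed among the features above, can be sketched in portable C as a delay line whose index wraps around at the end of the array. On a DSP the wrap is typically done by the addressing hardware at no cost; here it is done with a modulo. The type name, function names, and buffer length are assumptions for illustration.

```c
#include <stdint.h>

#define DELAY_LEN 4u   /* illustrative length, chosen for the example */

typedef struct {
    int16_t  data[DELAY_LEN];
    uint32_t head;               /* index of the next slot to overwrite */
} delay_line;

/* Store a new sample over the oldest one and wrap the index. */
void delay_push(delay_line *d, int16_t sample)
{
    d->data[d->head] = sample;
    d->head = (d->head + 1u) % DELAY_LEN;
}

/* Read the sample written `age` pushes ago (0 = newest); age < DELAY_LEN. */
int16_t delay_read(const delay_line *d, uint32_t age)
{
    return d->data[(d->head + DELAY_LEN - 1u - age) % DELAY_LEN];
}
```

No samples are ever moved in memory; only the index advances, which is what makes the structure attractive for streaming data.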
Introduction
Blackfin processors embody a new type of 16/32-bit
embedded processor designed specifically to
meet the computational demands and power
constraints of today’s embedded audio, video,
automotive, industrial/instrumentation, and
communications applications.
Blackfin processors combine a 32-bit RISC
instruction set, dual 16-bit multiply accumulate
(MAC) digital signal processing functionality, and
8-bit video processing performance.
Roadmap
Continued……………..
Characteristics of an Embedded Processor
BF531/2/3
BF532 Block Diagram
Features
Up to 600 MHz high-performance processor
Two 16-bit MACs, two 40-bit ALUs, four 8-bit video ALUs, and a
40-bit shifter
0.85 V to 1.30 V core VDD with on-chip voltage
regulation
1.8 V, 2.5 V, and 3.3 V compliant I/O
Up to 148K bytes of on-chip memory, which can be used
as cache or SRAM and has both data and code banks
External memory controller with glueless support for
SDRAM, SRAM, flash, and ROM
Multiple boot options, from SPI and parallel flash
Peripherals and Units
Dynamic Power management Unit
Direct Memory Access
SPI interface
Parallel Port Interface
Serial Port Controllers
UART
Programmable Flags
Timers and RTC
EBIU (External Bus Interface Unit)
Core
The Blackfin processor core contains two 16-bit
multipliers, two 40-bit accumulators, two 40-bit
ALUs, four video ALUs, and a 40-bit shifter. The
computation units process 8-bit, 16-bit, or 32-bit
data from the register file.
The compute register file contains eight 32-bit
registers. When performing compute operations on
16-bit operand data, the register file operates as 16
independent 16-bit registers.
The ALUs perform a traditional set of arithmetic
and logical operations on 16-bit or 32-bit data.
Each MAC can perform a 16-bit by 16-bit multiply in each
cycle, accumulating the results into the 40-bit
accumulators. Signed and unsigned formats, rounding, and
saturation are supported.
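The multiply-accumulate behaviour described above can be emulated in portable C to show what the extra accumulator width and saturation provide. This is only a sketch of integer-mode accumulation: the 40-bit accumulator is modelled in a 64-bit integer, and the names are hypothetical rather than part of any Blackfin toolchain.

```c
#include <stdint.h>

#define ACC40_MAX ((int64_t)0x7FFFFFFFFFLL)   /* 2^39 - 1 */
#define ACC40_MIN (-ACC40_MAX - 1)            /* -2^39 */

/* Emulate one signed 16x16 multiply-accumulate into a 40-bit
 * accumulator, saturating instead of wrapping on overflow. */
int64_t mac40(int64_t acc, int16_t x, int16_t y)
{
    int64_t sum = acc + (int64_t)x * (int64_t)y;
    if (sum > ACC40_MAX) return ACC40_MAX;
    if (sum < ACC40_MIN) return ACC40_MIN;
    return sum;
}
```

The 8 bits above a 32-bit result act as guard bits, so several hundred worst-case products can be summed before saturation is reached.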
The 40-bit shifter can perform shifts and rotates and is used
to support normalization, field extract, and field deposit
instructions.
The program sequencer controls the flow of instruction
execution, including instruction alignment and decoding.
For program flow control, the sequencer supports PC
relative and
Hardware is provided to support zero-overhead
looping. The architecture is fully
interlocked, meaning that the programmer
need not manage the pipeline when executing
instructions with data dependencies.
Operating Modes
The architecture provides three modes of
operation: user mode, supervisor mode, and emulation mode.
User mode has restricted access to certain system
resources, thus providing a protected software
environment, while supervisor mode has
unrestricted access to the system and core
resources.
Emulation mode is used for testing purposes only.
Core
Memory
Booting
Booting is the process by which the processor itself
loads its internal memories from external memory.
The processor has two boot pins, BMODE0 and BMODE1, so it supports four boot modes,
which are shown in the side window.
Dynamic Power Management Unit
The processor has three power domains: VDDEXT (peripherals), VDDINT (core), and VDDRTC (RTC).
It has two clock domains: the peripherals run from SCLK and the core
runs from CCLK.
The processor has an internal PLL, with
which multiple operating frequencies can be
obtained just by changing register values.
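The relationship between the PLL settings and the two clock domains can be sketched as simple arithmetic. The sketch assumes the usual Blackfin scheme described in the processor hardware reference, in which the VCO frequency is the input clock times a multiplier and CCLK and SCLK are each the VCO divided by their own divider. The struct, function, and parameter names are hypothetical, and the valid ranges of the multiplier and dividers are not checked here.

```c
#include <stdint.h>

typedef struct {
    uint32_t vco_hz;    /* PLL output before the dividers */
    uint32_t cclk_hz;   /* core clock */
    uint32_t sclk_hz;   /* system (peripheral) clock */
} pll_clocks;

/* Derive the clock frequencies from the input clock, the PLL
 * multiplier, and the core and system dividers. */
pll_clocks pll_compute(uint32_t clkin_hz, uint32_t multiplier,
                       uint32_t core_div, uint32_t sys_div)
{
    pll_clocks c;
    c.vco_hz  = clkin_hz * multiplier;
    c.cclk_hz = c.vco_hz / core_div;
    c.sclk_hz = c.vco_hz / sys_div;
    return c;
}
```

With an illustrative 27 MHz input, a multiplier of 14, a core divider of 1, and a system divider of 5, this gives a 378 MHz core clock and a 75.6 MHz system clock.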
The dynamic power management feature of the ADSP-BF531/
ADSP-BF532/ADSP-BF533 processor allows
both the processor’s input voltage (VDDINT) and
clock frequency (fCCLK) to be dynamically controlled.
Different applications require different clock speeds.
When the clock speed is lowered, VDDINT can be
reduced as well, thereby reducing overall power dissipation.
Because of this feature, Blackfin processors are used
in low-power applications.
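To see why lowering both clock frequency and core voltage pays off, the following sketch uses the common first-order CMOS approximation that dynamic power scales with frequency times voltage squared. This model is an assumption brought in for illustration and is not taken from the slides; the function name and the example operating points are hypothetical.

```c
/* Dynamic power at a reduced operating point relative to the nominal
 * one, using the approximation P ~ f * V^2. A result of 0.32 means
 * roughly 32% of the nominal dynamic power. */
double relative_dynamic_power(double f_reduced_hz, double f_nominal_hz,
                              double v_reduced, double v_nominal)
{
    double f_ratio = f_reduced_hz / f_nominal_hz;
    double v_ratio = v_reduced / v_nominal;
    return f_ratio * v_ratio * v_ratio;
}
```

Halving the clock alone halves the dynamic power in this model, whereas halving the clock and dropping the voltage from 1.25 V to 1.0 V brings it down to about a third.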
Different Modes Available
Different applications require different operating modes.
These power modes offer different levels of power savings.
Hibernate mode gives the highest power savings, whereas Full-On mode gives the highest performance with the least power savings.
VisualDSP++ 5.0
Start UP
Creating the session
Selecting the Processor
Selecting Connection Type
Selecting the Platform
Completing the session
Selecting the session
Creating a Project
Project Information
Processor selection
Application Settings
Startup code/ldf
Completed
Project Layout
Adding Source Files to the Project
Contd….
Building and Running the Project
Build the project by performing one of these actions:
• Click the Build Project button, or
• From the Project menu, choose Build Project.
Alternatively, click the Rebuild All button to build the project.
The C source file opens in an editor window, and execution halts at main().
At the end of the build, the following message is shown:
“Build completed successfully.”
Press F5 to run the project.
Changing the Project Options
Options
1. Processor: BF532
2. Type: Loader File
3. Revision: Automatic
Loader file Settings
Choose the boot mode as flash/PROM, the boot format as ASCII,
and the output width as 16-bit.
Choose a folder for the output file. After changing the options, run Rebuild All again.
C Language
Advantages:
C is much cheaper to develop (encourages experimentation).
C is much cheaper to maintain.
C is comparatively portable.
Disadvantages:
ANSI C is not designed for DSP.
DSP processor designs usually expect assembly in key areas.
DSP applications continue to evolve (faster than ANSI Standard C).
Missing operations are provided by software
emulation (floating point!).
C is more machine-dependent than you might
think. For example: is a “short” 16 or 32 bits?
C can be a poor match for DSP: accumulators?
SIMD? Fractions?
C does not really have a mathematical focus; it is a systems
programming language.
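The machine dependence of type sizes mentioned above can be checked directly. The sketch below reports the width of `short` on the current compiler and shows the fixed-width `int16_t` from `<stdint.h>` as one way to avoid the ambiguity. The function names are hypothetical, and whether a given DSP toolchain provides `<stdint.h>` is an assumption to verify.

```c
#include <limits.h>
#include <stdint.h>

/* Width in bits of `short` on this compiler; the C standard only
 * guarantees that it is at least 16. */
unsigned short_width_bits(void)
{
    return (unsigned)(sizeof(short) * CHAR_BIT);
}

/* Fixed-width alternative: int16_t is exactly 16 bits wherever it exists. */
int16_t halve_sample(int16_t sample)
{
    return (int16_t)(sample / 2);
}
```

Code that depends on a 16-bit sample type is easier to port when it says so explicitly instead of relying on `short`.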
Increasing C Performance
Performance tuning is the process of
specializing the program for the
particular hardware.
Work at the higher level first:
Improve the algorithm.
Make sure that the algorithm suits the architecture.
Look at the machine capabilities:
It may have specialized instructions.
Linear Profiling tools
Using compiler optimization (automatic compiler
optimization)
Optimizing the algorithm for the hardware
Using the Pipeline Viewer
Using the supplied compiler libraries, which contain routines
already optimized for the hardware:
fractional builtins
fract types fract16 and fract32
ETSI (European Telecommunications Standards Institute)
fract functions
Fractional arithmetic can be on the order of 100 times faster than floating-point
arithmetic, which is emulated in software on this processor.
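To illustrate what fractional arithmetic involves, here is a portable sketch of a saturating multiply in the 1.15 (Q15) format, where a 16-bit integer represents a value in the range [-1, 1). It is a plain C stand-in, not the compiler's fractional builtins or ETSI functions; the type and function names are hypothetical, and the rounding behaviour (truncation toward zero) may differ from the hardware.

```c
#include <stdint.h>

typedef int16_t q15_t;   /* hypothetical stand-in for a 1.15 fractional type */

/* Saturating Q15 multiply. The 32-bit product is in Q30; dividing by
 * 2^15 brings it back to Q15. The only overflow case, -1.0 * -1.0,
 * is clamped to the largest positive value. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    int32_t result  = product / 32768;   /* truncates toward zero */
    if (result > INT16_MAX)
        result = INT16_MAX;
    return (q15_t)result;
}
```

The whole operation is one integer multiply plus a scale, which maps naturally onto a 16-bit MAC unit, whereas an emulated floating-point multiply takes many instructions.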
Arrays and Pointers
Arrays are easier to analyse.
    void va_ind(int a[], int b[], int out[], int n) {
        int i;
        for (i = 0; i < n; ++i)
            out[i] = a[i] + b[i];
    }
Pointers are closer to the hardware.
    void va_ptr(int a[], int b[], int out[], int n) {
        int i;
        for (i = 0; i < n; ++i)
            *out++ = *a++ + *b++;
    }
Which produces the fastest code?
Mostly there is no difference. Start with arrays; if performance is not
sufficient, try pointers.
Avoid Loop Carried dependencies
Bad: Scalar dependency.
    for (i = 0; i < n; ++i)
        x = a[i] - x;
Value used from previous iteration, so
iterations cannot be overlapped.
Bad: Array dependency.
    for (i = 0; i < n; ++i)
        a[i] = b[i] * a[c[i]];
Value may be from previous iteration, so
iterations cannot be overlapped.
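One way to loosen a dependency between iterations, sketched here for a dot product, is to split a single running sum into two independent partial sums so that neighbouring multiply-accumulates no longer wait on each other. This is an illustrative technique and not something the slides prescribe; an optimizing compiler may already do it, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Dot product with two independent accumulators. Even-indexed and
 * odd-indexed products form separate dependency chains. */
int32_t dot_two_acc(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc0 = 0, acc1 = 0;
    size_t i;
    for (i = 0; i + 1 < n; i += 2) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
    }
    if (i < n)                       /* leftover element when n is odd */
        acc0 += (int32_t)a[i] * b[i];
    return acc0 + acc1;
}
```

For integer data the result is identical to the single-accumulator loop, and the two chains correspond naturally to a core with two MAC units.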
Using Hardware Loops
Word align your data
32-bit loads help keep compute units busy.
32-bit references must be at 4-byte boundaries.
Top-level arrays are allocated on 4-byte boundaries.
Only pass the address of the first element of arrays.
Write loops that process input arrays an
element at a time.
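The alignment requirement can be made explicit in the source. The sketch below uses the C11 `alignas` specifier to request 4-byte alignment for an array of 16-bit samples and a small helper to check a pointer. Whether a particular DSP compiler accepts C11 `alignas` or needs its own alignment pragma is an assumption to check against the toolchain documentation; the names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

#define N_SAMPLES 8   /* illustrative size */

/* Request 4-byte alignment so that pairs of 16-bit samples can be
 * fetched with one 32-bit load. */
alignas(4) int16_t samples[N_SAMPLES];

/* Returns nonzero when the pointer sits on a 4-byte boundary. */
int is_word_aligned(const void *p)
{
    return ((uintptr_t)p % 4u) == 0;
}
```

Passing `samples` (the first element) keeps the alignment, while passing `&samples[1]` would hand the callee an address that is only 2-byte aligned.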
Use of the keywords “volatile” and “const”
Volatile: volatile is essential for hardware- or interrupt-related
data. The data is not changed by the program itself; it is
changed by the hardware and then used by the program.
Const: const prevents accidental writes to memory whose
contents should not be changed.
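A short sketch of both qualifiers in use follows. The status register is simulated by an ordinary global variable so that the example runs anywhere; on real hardware it would be a memory-mapped address taken from the processor headers. The variable, bit, and function names are all hypothetical.

```c
#include <stdint.h>

/* Stand-in for a memory-mapped status register. volatile tells the
 * compiler that the value can change outside the program's control. */
volatile uint16_t fake_status_reg = 0;

#define READY_BIT 0x0001u   /* hypothetical bit position */

/* volatile: every call re-reads the register instead of reusing a
 * value the compiler might otherwise keep in a core register. */
int device_ready(void)
{
    return (fake_status_reg & READY_BIT) != 0;
}

/* const: the function promises not to modify the table, and the
 * compiler rejects any accidental write through the pointer. */
int32_t sum_table(const int16_t *table, int n)
{
    int32_t sum = 0;
    for (int i = 0; i < n; ++i)
        sum += table[i];
    return sum;
}
```

Without `volatile`, a polling loop on `device_ready()` could legally be optimized into a single read followed by an endless loop.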
Use of Circular addressing
Use of the Keyword “asm”
Replace Conditionals with Min, Max and Abs
Avoid jump statements
Avoid Division Statements: Use Shifts for Powers of 2
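Two of the tips above can be sketched briefly: expressing a range check through min and max instead of an if/else chain, and replacing a division by a power of two with a shift. The helper names are hypothetical. It is an assumption that the compiler maps these idioms to single min/max instructions, and the shift shown is for unsigned values only, since right-shifting negative numbers rounds differently from division.

```c
#include <stdint.h>

static inline int32_t min_i32(int32_t a, int32_t b) { return a < b ? a : b; }
static inline int32_t max_i32(int32_t a, int32_t b) { return a > b ? a : b; }

/* Clamp x to [lo, hi] as max(lo, min(x, hi)) rather than nested ifs. */
int32_t clamp_i32(int32_t x, int32_t lo, int32_t hi)
{
    return max_i32(lo, min_i32(x, hi));
}

/* Unsigned divide by 4 written as a right shift by 2. */
uint32_t div4(uint32_t x)
{
    return x >> 2;
}
```

Written this way, the loop body contains no jumps, which helps the compiler keep hardware loops and pipelining intact.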
Removing Conditionals
Duplicate small loops
rather than have a
conditional in a small loop.
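The idea of duplicating a small loop can be sketched as follows: the first version tests a flag on every iteration, while the second tests it once and runs one of two specialized loops. The function names and the add/subtract operation are assumptions chosen for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Before: the flag is tested inside the loop on every iteration. */
void combine_branchy(const int16_t *a, const int16_t *b, int16_t *out,
                     size_t n, int subtract)
{
    for (size_t i = 0; i < n; ++i) {
        if (subtract)
            out[i] = (int16_t)(a[i] - b[i]);
        else
            out[i] = (int16_t)(a[i] + b[i]);
    }
}

/* After: the test is hoisted out and the small loop is duplicated. */
void combine_split(const int16_t *a, const int16_t *b, int16_t *out,
                   size_t n, int subtract)
{
    if (subtract) {
        for (size_t i = 0; i < n; ++i)
            out[i] = (int16_t)(a[i] - b[i]);
    } else {
        for (size_t i = 0; i < n; ++i)
            out[i] = (int16_t)(a[i] + b[i]);
    }
}
```

Both versions produce the same output; the second costs a little code size in exchange for branch-free loop bodies.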
Any Queries?