













An overview of the TMS320C6713 DSP system, including its architecture, features, and specifications. It covers computational architectures, memory organization, and functional units, and includes a block diagram and a data path diagram of the C6713 DSP.
Typology: Study notes
DSP of Analog Signals
Characteristics of DSP Applications
Computationally demanding (multiply, multiply-accumulate, …)
Stringent real-time requirements
“Streaming” data
High data bandwidth
Predictable program flow
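As an illustration of the multiply-accumulate workload listed above, here is a minimal C sketch of one output sample of an FIR filter. The function name `fir_sample` and the buffer layout (newest sample first) are assumptions made for this example; they do not come from the slides.

```c
#include <stddef.h>

/* One output sample of an ntaps-tap FIR filter: y = sum over k of h[k] * x[k].
 * Assumed layout: x[0] is the newest input sample, x[ntaps-1] the oldest. */
float fir_sample(const float *h, const float *x, size_t ntaps)
{
    float acc = 0.0f;
    for (size_t k = 0; k < ntaps; k++) {
        acc += h[k] * x[k];   /* one multiply-accumulate per tap */
    }
    return acc;
}
```

The loop has a fixed trip count and no data-dependent branches, which is the "predictable program flow" that lets a DSP keep its multipliers busy on streaming data.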
Modern DSPs
Texas Instruments: TMS320C6x DSP family
Freescale: MSC81xx multi-core DSP family
Analog Devices: SHARC, Blackfin
C67x DSP Roadmap
C6713 DSK Overview
225 MHz TMS320C6713 floating point DSP
AIC23 stereo codec
8 to 96 kHz sample rates
Memory
16 MB dynamic RAM
512 KB nonvolatile FLASH memory
General purpose I/O
4 LEDs
4 DIP switches
USB interface to host PC
C6713 DSK Physical Layout
C6713 Block Diagram
C6713 Data Path
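[Figure: C6713 data path. Two sides, each with a register file (registers A0 - A15 on side A, register file B on side B) and .L, .S, .M and .D functional units, whose ports are labeled S1, S2, D, SL and DL. The two sides are linked by cross paths 1X and 2X. Legend: cross paths; 40-bit write paths (8 MSBs); 40-bit read paths/store paths.]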
Function Unit & Operations
Advanced VLIW (VelociTI®)
Example 1
A B C D E F G H
A B C D E F G H
Example 2
A B
C
D
E
F G H
Example 3
Addressing Modes
Indirect addressing
*R Register R contains the address of the memory location
*R++( d ) Post-increment: R is used as the address, then incremented by d
*++R( d ) Pre-increment: R is incremented by d, then used as the address
*+R( d ) Pre-offset: the address is R + d; R is not modified
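The four indirect forms above map closely onto C pointer operations. The sketch below is only an analogy: the helper names are hypothetical, and the displacement d counts array elements here (C pointer arithmetic scales by element size), which is an assumption because the slides do not state the units of d.

```c
/* Hypothetical helpers, one per addressing mode, acting on an int pointer r. */

/* *R : load from the address held in r */
int load_indirect(const int *r)
{
    return *r;
}

/* *R++(d) : load from r, then advance r by d */
int load_post_inc(const int **r, int d)
{
    int v = **r;
    *r += d;
    return v;
}

/* *++R(d) : advance r by d, then load from the new address */
int load_pre_inc(const int **r, int d)
{
    *r += d;
    return **r;
}

/* *+R(d) : load from r + d, leaving r unchanged */
int load_pre_offset(const int *r, int d)
{
    return *(r + d);
}
```

Note that only the two increment forms take a pointer to the pointer, because only they modify the address register.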
Circular addressing
Address is bounded to a range
Controlled by AMR register
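A minimal C sketch of what "bounded to a range" means, assuming a power-of-two block size and a block aligned to its own size. Both are assumptions of this example, and the function name `circ_add` is hypothetical. Only the low address bits change, so the address wraps inside the block.

```c
#include <stdint.h>

/* Advance addr by step bytes inside a circular block of block_size bytes.
 * Assumptions: block_size is a power of two and the block starts at an
 * address that is a multiple of block_size. */
uint32_t circ_add(uint32_t addr, int32_t step, uint32_t block_size)
{
    uint32_t mask = block_size - 1u;                 /* low bits that may change */
    uint32_t base = addr & ~mask;                    /* block start, kept fixed  */
    uint32_t offset = (addr + (uint32_t)step) & mask; /* wrapped offset          */
    return base | offset;
}
```

For example, with a 16-byte block starting at 0x1000, stepping 8 bytes forward from 0x100C wraps to 0x1004 instead of leaving the block.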
AMR Register
Cross-Path Constraints
At most two instructions per cycle can use the cross paths, one on each cross path (1X and 2X)
Load/Store Constraints
The address register must be on the same side as the .D unit
When two loads (or two stores) execute in parallel, each must use a different register file
Branch Instructions
Branch using a displacement
Unit can be S1 or S2
Branch using a register
Only on S2
Integer Instructions
Arithmetic instructions:
Move instructions:
Comparison instructions:
CLR – Clear a Bit Field
Syntax
EXT – Extract and Sign-Extend a Bit Field
Syntax
or
EXT (.unit) src2, src1, dst
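The effect of the two bit-field operations can be sketched in C. This models only the result, not the instruction encoding: the function names and the lsb/msb parameters are hypothetical, chosen for this example, and are not the operand names used by the instructions.

```c
#include <stdint.h>

/* Clear bits lsb..msb (inclusive) of src, as CLR does for a bit field.
 * Requires 0 <= lsb <= msb <= 31. */
uint32_t clr_field(uint32_t src, unsigned lsb, unsigned msb)
{
    uint32_t width = msb - lsb + 1u;
    uint32_t mask = (width == 32u) ? 0xFFFFFFFFu
                                   : (((1u << width) - 1u) << lsb);
    return src & ~mask;
}

/* Extract bits lsb..msb (inclusive) of src and sign-extend the field to
 * 32 bits, as EXT does. Requires 0 <= lsb <= msb <= 31. */
int32_t ext_field(uint32_t src, unsigned lsb, unsigned msb)
{
    uint32_t width = msb - lsb + 1u;
    uint32_t field = (width == 32u) ? src
                                    : ((src >> lsb) & ((1u << width) - 1u));
    uint32_t sign = 1u << (width - 1u);      /* top bit of the field */
    return (int32_t)((field ^ sign) - sign); /* propagate it upward  */
}
```

For instance, extracting bits 4..7 of 0xF0 gives the 4-bit pattern 1111, which sign-extends to -1, while the same field of 0x70 gives 7.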
Example: Dot Product
C
Assembly
Double-Word Loading
MVK .S1 100, A
ZERO .L1 A
|| ZERO .L2 B
LOOP: LDDW .D1 *A4++, A3:A
|| LDDW .D2 *B4++, B3:B
SUB .S1 A1, 1, A
NOP 2
[A1] B .S2 LOOP
MPYSP .M1x A2, B2, A
|| MPYSP .M2x A3, B3, B
NOP 3
ADDSP .L1 A6, A7, A
|| ADDSP .L2 B6, B7, B
; branch occurs here
NOP 3
ADDSP .L1 A7, B7, A
NOP 3
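The double-word version above loads two floats from each array per iteration, keeps one running sum on each side of the data path, and adds the two sums at the end. A C sketch of the same structure follows; the function name `dotp2` and the requirement that n be even are assumptions of this example.

```c
#include <stddef.h>

/* Dot product of a and b using two partial sums, mirroring the two
 * accumulators (one per data-path side) in the assembly.
 * Assumption: n is even. */
float dotp2(const float *a, const float *b, size_t n)
{
    float sum_even = 0.0f;   /* accumulates products at even indices */
    float sum_odd  = 0.0f;   /* accumulates products at odd indices  */
    for (size_t i = 0; i + 1 < n; i += 2) {
        sum_even += a[i]     * b[i];
        sum_odd  += a[i + 1] * b[i + 1];
    }
    return sum_even + sum_odd;   /* final add combines both sides */
}
```

Splitting the sum this way removes the dependency between consecutive additions, which is what allows the two multiplies and two adds to be issued in parallel.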
Optimization with Software Pipeline
Assembly Code for Loop Kernel
The loop kernel can be done in one cycle!
Butterfly Diagram
[Figure: two 8-point FFT butterfly diagrams with twiddle factors w on the branches. In the first, the inputs x(0) to x(7) are in order and the outputs y(k) are out of order. In the second, the inputs are out of order, x(0), x(4), x(2), x(6), x(1), x(5), x(3), x(7), and the outputs y(0) to y(7) are in order.]
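The out-of-order sequence 0, 4, 2, 6, 1, 5, 3, 7 in the butterfly diagram is the bit-reversed form of the indices 0 to 7 written with three bits. A minimal C sketch of that index mapping, with a hypothetical function name:

```c
/* Reverse the low `bits` bits of index i.
 * For an N-point FFT with N = 2^bits, this maps an in-order index to its
 * position in the out-of-order (bit-reversed) sequence. */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0;
    for (unsigned b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i into r */
        i >>= 1;
    }
    return r;
}
```

With bits = 3, index 1 (001) maps to 4 (100) and index 3 (011) maps to 6 (110), which matches the second and fourth entries of the sequence.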
Implementation in C
Stage k
k
k
k
sub-group
2
sub-groups in total
Implementation in Linear Assembly Code
        .global _DSPF_sp_cfftr2_dit
_DSPF_sp_cfftr2_dit .cproc A_xptr, B_wptr, A_n

        MV      A_n, A_n2                       ; init n
        SHR     A_n, 1, A_n                     ; outer loop cntr
        MV      A_n, A_cnt                      ; inner loop cntr
oloop:
        SHR     A_n2, 1, A_n2                   ; n2>>
        LDDW    *B_wptr, B_s:B_c                ; load s:c
        ADD     B_wptr, 8, B_w                  ; init w ptr
        MV      A_n2, A_i                       ; init ia
        MV      A_cnt, A_icntr                  ; init loop cntr
        SHL     A_n2, 3, A_8n2                  ; n2<<
        ADDAD   A_xptr, A_n2, A_x               ; init load ptr
        ADDAD   A_xptr, A_n2, A_xs              ; init store ptr
        MV      A_xptr, B_x                     ; init load ptr
        MV      B_x, B_xs                       ; init store ptr
loop:
 [!A_i] ADD     A_x, A_8n2, A_x                 ; if(!i) A_x+=8n
 [!A_i] ADD     B_x, A_8n2, B_x                 ; if(!i) B_x+=8n
 [!A_i] LDDW    *B_w++, B_s:B_c                 ; if(!i) load s:c
 [!A_i] ADD     A_xs, A_8n2, A_xs               ; reset store ptr
 [!A_i] ADD     B_xs, A_8n2, B_xs               ; reset store ptr

        LDDW    *A_x++, A_x2mp1:A_x2m           ; load x[2m+1]:x[2m]

 [!A_i] MV      A_n2, A_i                       ; reset ia
 [A_i]  SUB     A_i, 1, A_i                     ; decr ia

        MPYSP   A_x2m, B_c, A_p1                ; p1=cx[2m]
        MPYSP   A_x2m, B_s, B_p4                ; p4=sx[2m]
        MPYSP   A_x2mp1, B_s, A_p2              ; p2=sx[2m+1]
        MPYSP   A_x2mp1, B_c, A_p3              ; p3=cx[2m+1]

        ADDSP   A_p1, A_p2, A_rtemp             ; rtemp=p1+p
        SUBSP   A_p3, B_p4, B_itemp             ; itemp=p3-p

        LDDW    *B_x++, B_x2iap1:B_x2ia         ; load x[2ia+1]:x[2ia]

        SUBSP   B_x2ia, A_rtemp, A_x2ms         ; x[2m]=x[2ia]-rtemp
        ADDSP   B_x2ia, A_rtemp, A_x2ias        ; x[2ia]=x[2ia]+rtemp

        SUBSP   B_x2iap1, B_itemp, B_x2mp1s     ; x[2m+1]=x[2ia+1]-itemp
        ADDSP   B_x2iap1, B_itemp, B_x2iap1s    ; x[2ia+1]=x[2ia+1]+itemp

        STW     A_x2ms, A_xs++                  ; perform all stores
        STW     A_x2ias, B_xs++
        STW     B_x2mp1s, A_xs++
        STW     B_x2iap1s, B_xs++

 [A_icntr] SUB  A_icntr, 1, A_icntr             ; decr loop cntr
 [A_icntr] B    loop                            ; branch inner

        SHR     A_n, 1, A_n                     ; half outer loop cntr
 [A_n]  B       oloop                           ; branch outer loop
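The arithmetic in the inner loop is one radix-2 butterfly on interleaved complex data. The C sketch below follows the operands and comments of the MPYSP, ADDSP and SUBSP instructions (p1 to p4, rtemp, itemp). The function name `butterfly` and its signature are assumptions for illustration only; the loop control and pointer updates of the linear assembly are not modeled.

```c
/* One radix-2 butterfly, mirroring the inner-loop arithmetic.
 * x holds interleaved complex values: x[2k] is the real part and
 * x[2k+1] the imaginary part of element k.
 * ia and m index the two elements being combined; c and s are the
 * twiddle-factor pair loaded as s:c. */
void butterfly(float *x, int ia, int m, float c, float s)
{
    float rtemp = c * x[2 * m]     + s * x[2 * m + 1];  /* p1 + p2 */
    float itemp = c * x[2 * m + 1] - s * x[2 * m];      /* p3 - p4 */

    x[2 * m]      = x[2 * ia]     - rtemp;
    x[2 * m + 1]  = x[2 * ia + 1] - itemp;
    x[2 * ia]     = x[2 * ia]     + rtemp;
    x[2 * ia + 1] = x[2 * ia + 1] + itemp;
}
```

The lower element is written before the upper one so that the original x[2ia] and x[2ia+1] are still available for both results.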