2022-09-21 17:24:28
ADSP-21467/ADSP-21469 SHARC Processor (Part I)
Abstract
High-performance 32-bit/40-bit floating-point processor optimized for high-performance audio processing; single-instruction, multiple-data (SIMD) computational architecture; 5 Mbits of on-chip RAM and 4 Mbits of on-chip ROM; operating frequency of up to 450 MHz; suitable for automotive applications; code compatible with all other members of the SHARC family. The processors are available with unique audio-centric peripherals such as the digital applications interface, DTCP (digital transmission content protection protocol), serial ports, precision clock generators, an S/PDIF transceiver, asynchronous sample rate converters, an input data port, and more.
General description
The ADSP-21467/ADSP-21469 processors are members of the SIMD SHARC family of DSPs. They are source code compatible with the ADSP-2126x, ADSP-2136x, ADSP-2137x, and ADSP-2116x DSPs, as well as with the first-generation ADSP-2106x SHARC processors in SISD (single-instruction, single-data) mode. These 32-bit/40-bit floating-point processors are optimized for high-performance audio applications, with large on-chip SRAM, multiple internal buses that eliminate I/O bottlenecks, and an innovative digital applications interface/digital peripheral interface (DAI/DPI).
Table 1 shows performance benchmarks for the processors, and Table 2 shows the features of the individual products.
1. The factory-programmed ROM includes: Dolby AC-3 5.1 decode, Dolby Pro Logic IIx, Dolby intelligent mixing (EMIX), Dolby Volume, Dolby Headphone v2, DTS Neo:6, DTS 5.1 decode (96/24), math tables/twiddle factors for 256-point and 512-point FFTs, and ASRC.
2. For information on the availability of ADSP-21467/ADSP-21469 processors that support DTCP, contact your local Analog Devices sales office.
Figure 1 shows the two clock domains that make up the processor. The core clock domain contains the following features:
Two processing elements (PEx, PEy), each of which comprises an ALU, multiplier, shifter, and data register file
Two data address generators (DAG1, DAG2)
A program sequencer with instruction cache
PM and DM buses capable of supporting 2 × 64-bit data transfers between memory and the core every core processor cycle
On-chip SRAM (5 Mbits)
A JTAG test access port for emulation and boundary scan. The JTAG port provides software debug through user breakpoints, which allows flexible exception handling.
Figure 1 also shows the peripheral clock domain (also known as the I/O processor), which contains the following features:
IOD0 (peripheral DMA) and IOD1 (external port DMA) buses
Peripheral and external port buses for core connection
An external port with an AMI and a DDR2 controller
4 units for PWM control
1 MTM unit for internal-to-internal memory transfers
A digital applications interface that includes four precision clock generators (PCG), an input data port (IDP) for serial and parallel interconnects, an S/PDIF receiver/transmitter, four asynchronous sample rate converters, eight serial ports, and a flexible signal routing unit (DAI SRU)
A digital peripheral interface that includes two timers, one 2-wire interface, one UART, two serial peripheral interfaces (SPI), 2 precision clock generators (PCG), and a flexible signal routing unit (DPI SRU)
As shown in Figure 1, the processor uses two computational units to deliver a significant performance increase over the previous SHARC processors on a range of DSP algorithms. With its SIMD computational hardware, the processor can perform 2.7 GFLOPS running at 450 MHz and 2.4 GFLOPS running at 400 MHz.
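The GFLOPS figures above can be reproduced with simple arithmetic. The sketch below assumes, as the flexible instruction set description states, that each processing element completes a multiply, an add, and a subtract (3 floating-point operations) per core clock cycle, with both PEX and PEY active in SIMD mode. It is a back-of-the-envelope model, not a benchmark.

```c
/* Assumption: in SIMD mode each of the two processing elements (PEX, PEY)
 * completes a multiply, an add, and a subtract (3 flops) every core cycle. */
#define NUM_PES           2
#define FLOPS_PER_PE_CYC  3

/* Peak throughput in GFLOPS for a core clock given in MHz. */
double peak_gflops(double cclk_mhz)
{
    return NUM_PES * FLOPS_PER_PE_CYC * cclk_mhz / 1000.0;
}
```

With these assumptions, 2 × 3 × 450 MHz gives 2.7 GFLOPS and 2 × 3 × 400 MHz gives 2.4 GFLOPS, matching the quoted peak numbers.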
Family core architecture
These processors are code compatible with the ADSP-2137x, ADSP-2136x, ADSP-2126x, ADSP-21160, and ADSP-21161 processors, and with the first-generation ADSP-2106x SHARC processors. The ADSP-21467/ADSP-21469 processors share architectural features with the ADSP-2126x, ADSP-2136x, ADSP-2137x, and ADSP-2116x SIMD SHARC processors, as shown in Figure 2 and detailed in the following sections.
SIMD computational engine
The processor contains two computational processing elements that operate as a single-instruction, multiple-data (SIMD) engine. The processing elements are referred to as PEX and PEY, and each contains an ALU, multiplier, shifter, and register file. PEX is always active, and PEY can be enabled by setting the PEYEN mode bit in the MODE1 register. When this mode is enabled, the same instruction is executed in both processing elements, but each processing element operates on different data. This architecture is efficient at executing math-intensive DSP algorithms.
Entering SIMD mode also affects the way data is transferred between memory and the processing elements. In SIMD mode, twice the data bandwidth is required to sustain computational operation in the processing elements. Because of this requirement, entering SIMD mode also doubles the bandwidth between memory and the processing elements. When the DAGs are used to transfer data in SIMD mode, two data values are transferred with each access of memory or the register file.
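The SIMD behavior described above can be pictured with a small host-side model. This is a hypothetical C sketch, not SHARC assembly: `simd_core`, `simd_load`, and `simd_mul` are invented names, `peyen` stands in for the PEYEN bit in MODE1, and the rule that PEY receives the word at address + 1 on a SIMD load is an assumption made for illustration.

```c
/* Hypothetical host-side model of SIMD mode; not SHARC code.
 * PEX holds R0..R15, PEY holds S0..S15. */
typedef struct { float r[16]; } pe_regs;

typedef struct {
    pe_regs pex;    /* always active */
    pe_regs pey;    /* active only when peyen is set */
    int     peyen;  /* models the PEYEN bit in MODE1 */
} simd_core;

/* Model of a DAG load "Rd = DM(addr)": in SIMD mode one access moves two
 * values. Assumption: the PEY value comes from addr + 1. */
void simd_load(simd_core *c, int rd, const float *mem, int addr)
{
    c->pex.r[rd] = mem[addr];
    if (c->peyen)
        c->pey.r[rd] = mem[addr + 1];
}

/* Model of one instruction "Rd = Ra * Rb": the same operation is issued to
 * both elements, each working on its own register file. */
void simd_mul(simd_core *c, int rd, int ra, int rb)
{
    c->pex.r[rd] = c->pex.r[ra] * c->pex.r[rb];
    if (c->peyen)
        c->pey.r[rd] = c->pey.r[ra] * c->pey.r[rb];
}
```

With `peyen` set, one load fills both R0 and S0, and a single multiply produces two independent products, which is why the memory bandwidth has to double.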
Independent, parallel computation units
Each processing element contains a set of computational units: an arithmetic/logic unit (ALU), a multiplier, and a shifter. These units perform all operations in a single cycle. The three units within each processing element are arranged in parallel, maximizing computational throughput. Single multifunction instructions execute parallel ALU and multiplier operations. In SIMD mode, the parallel ALU and multiplier operations occur in both processing elements. These computation units support IEEE 32-bit single-precision floating-point, 40-bit extended-precision floating-point, and 32-bit fixed-point data formats.
Timer
A core timer can generate periodic software interrupts. The core timer can be configured to use FLAG3 as a timer expired signal.
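A periodic timer of this kind can be modeled as a down-counter with a reload value. The sketch below is hypothetical: the `core_timer` struct and its field names are invented for illustration and do not describe the processor's register interface. In this model an interrupt fires every `tperiod + 1` ticks.

```c
/* Hypothetical model of a periodic core timer; field names are illustrative. */
typedef struct {
    unsigned tcount;    /* counts down once per tick */
    unsigned tperiod;   /* reload value */
    unsigned irq_count; /* periodic interrupts raised so far */
    int      flag3;     /* models the optional FLAG3 "timer expired" output */
} core_timer;

void timer_tick(core_timer *t)
{
    t->flag3 = 0;
    if (t->tcount == 0) {
        t->irq_count++;          /* raise the periodic interrupt */
        t->flag3 = 1;            /* pulse the expired signal */
        t->tcount = t->tperiod;  /* reload for the next period */
    } else {
        t->tcount--;
    }
}
```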
Data register file
Each processing element contains a general-purpose data register file. The register files transfer data between the computation units and the data buses, and store intermediate results. These 10-port, 32-register (16 primary, 16 secondary) register files, combined with the processor's enhanced Harvard architecture, allow unconstrained data flow between the computation units and internal memory. The registers in PEX are referred to as R0–R15 and those in PEY as S0–S15.
Context switch
Many of the processor's registers have secondary registers that can be activated during interrupt servicing for a fast context switch. The data registers in the register file, the DAG registers, and the multiplier result registers all have secondary registers. The primary registers are active at reset, while the secondary registers are activated by control bits in a mode control register.
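The benefit of secondary registers is that an interrupt handler can switch register sets instead of saving and restoring registers through memory. The following hypothetical C model illustrates the idea. The `use_secondary` field stands in for a mode-control bit, and the enter/exit helpers are invented names.

```c
/* Hypothetical model of primary/secondary register sets for fast context
 * switching. The mode bit is illustrative, not an actual register field. */
typedef struct {
    int bank[2][16];   /* [0] = primary, [1] = secondary */
    int use_secondary; /* models a mode-control bit; 0 after reset */
} reg_file;

static int *active(reg_file *rf) { return rf->bank[rf->use_secondary]; }

void reg_write(reg_file *rf, int n, int v) { active(rf)[n] = v; }
int  reg_read(reg_file *rf, int n)         { return active(rf)[n]; }

/* On interrupt entry the handler flips to the secondary set instead of
 * saving registers to memory; on exit it flips back. */
void isr_enter(reg_file *rf) { rf->use_secondary = 1; }
void isr_exit(reg_file *rf)  { rf->use_secondary = 0; }
```

Values written by the interrupted code remain untouched in the primary set while the handler uses the secondary set.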
Universal registers
These registers can be used for general-purpose tasks. The USTAT (4) registers allow easy bit manipulations (set, clear, toggle, test, XOR) on all core system registers (control/status).
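The five bit manipulations listed above map onto ordinary C mask operations. The sketch below applies them to a 32-bit register image. It is illustrative only: the function names are invented, and the XOR semantics (true when the register exactly equals a given pattern) are an assumption made for this sketch.

```c
#include <stdint.h>
#include <stdbool.h>

/* C equivalents of set, clear, toggle, test, and XOR on a 32-bit system
 * register image. Illustrative only; not SHARC instruction syntax. */
uint32_t bit_set(uint32_t r, uint32_t mask) { return r | mask; }
uint32_t bit_clr(uint32_t r, uint32_t mask) { return r & ~mask; }
uint32_t bit_tgl(uint32_t r, uint32_t mask) { return r ^ mask; }

/* Test: true when every bit in mask is set in r. */
bool bit_tst(uint32_t r, uint32_t mask) { return (r & mask) == mask; }

/* XOR compare (assumed semantics): true when r equals pattern exactly. */
bool bit_xor(uint32_t r, uint32_t pattern) { return (r ^ pattern) == 0; }
```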
The data bus exchange register (PX) permits data to be passed between the 64-bit PM data bus and the 64-bit DM data bus, or between the 40-bit register file and the PM/DM data buses. These registers contain hardware to handle the data width difference.
Single-cycle fetch of instruction and four operands
The processor features an enhanced Harvard architecture in which the data memory (DM) bus transfers data and the program memory (PM) bus transfers both instructions and data (see Figure 2). With its separate program and data memory buses and on-chip instruction cache, the processor can simultaneously fetch four operands (two over each data bus) and one instruction (from the cache), all in a single cycle.
Instruction cache
The processor includes an on-chip instruction cache that enables three-bus operation for fetching an instruction and four data values. The cache is selective: only instructions whose fetches conflict with PM bus data accesses are cached. The cache allows full-speed execution of core looped operations such as digital filter multiply-accumulates and FFT butterfly processing.
Data address generators with zero-overhead hardware circular buffer support
The processor's two data address generators (DAGs) are used for indirect addressing and for implementing circular data buffers in hardware. Circular buffers allow efficient programming of delay lines and other data structures required in digital signal processing, and are commonly used in digital filters and Fourier transforms. The two DAGs contain sufficient registers to allow the creation of up to 32 circular buffers (16 primary register sets, 16 secondary). The DAGs automatically handle address pointer wraparound, reducing overhead, increasing performance, and simplifying implementation. Circular buffers can start and end at any memory location.
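The wraparound that the DAGs perform in hardware can be written out in C to show what it saves. This is a minimal sketch, assuming a base/length/index/modify register model (the b, l, i, m field names follow common SHARC terminology but are used here only for illustration). It also assumes the modify step is smaller in magnitude than the buffer length.

```c
#include <stdint.h>

/* Hypothetical model of one DAG circular buffer pointer. */
typedef struct {
    uint32_t b;  /* base address of the buffer */
    uint32_t l;  /* buffer length in words (nonzero = circular) */
    uint32_t i;  /* current index (address pointer) */
    int32_t  m;  /* signed modify step, assumed |m| < l */
} dag_ptr;

/* Post-modify access: return the current address, then advance by m,
 * wrapping inside [b, b + l). In hardware this costs no extra cycles. */
uint32_t dag_post_modify(dag_ptr *d)
{
    uint32_t addr = d->i;
    int64_t next = (int64_t)d->i + d->m;
    if (d->l != 0) {
        if (next >= (int64_t)d->b + d->l)
            next -= d->l;          /* wrapped past the end */
        else if (next < (int64_t)d->b)
            next += d->l;          /* wrapped before the start */
    }
    d->i = (uint32_t)next;
    return addr;
}
```

For a 4-word buffer at address 100 with the pointer at 102 and a step of 1, successive accesses visit 102, 103, 100, 101, 102, and so on.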
Flexible instruction set
The 48-bit instruction word accommodates a variety of parallel operations for concise programming. For example, the processor can perform a multiply, an add, and a subtract in both processing elements while branching and fetching up to four 32-bit values from memory, all in a single instruction.
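The multiply-plus-add-and-subtract combination mentioned above is the same pattern used in FFT butterflies, where a sum and a difference are formed from the same pair of operands. The following C sketch shows the work one such multifunction instruction performs per processing element. It is a functional illustration only; the struct and function names are invented.

```c
/* Work done by one multifunction instruction in one processing element:
 * a multiply in parallel with an ALU add and subtract. Illustrative only. */
typedef struct { float prod, sum, diff; } multifunc_result;

multifunc_result multifunc(float x, float y, float a, float b)
{
    multifunc_result r;
    r.prod = x * y;  /* multiplier */
    r.sum  = a + b;  /* ALU add, issued in the same instruction */
    r.diff = a - b;  /* ALU subtract, issued in the same instruction */
    return r;
}
```

In SIMD mode, PEY performs the same three operations on its own operands, which gives the six floating-point operations per cycle behind the peak GFLOPS figures.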
Variable instruction set architecture (VISA)
In addition to supporting the standard 48-bit instructions of previous SHARC processors, these processors support new 16-bit and 32-bit instructions. This feature, called the variable instruction set architecture (VISA), drops redundant/unused bits within the 48-bit instruction to create more efficient and compact code. The program sequencer supports fetching these 16-bit and 32-bit instructions from both internal and external DDR2 memory. Source modules must be built with the VISA option so that the code generation tools can create these more efficient opcodes.
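The code-density gain from VISA can be estimated by counting bits. The sketch below rests on a simplifying assumption: 16-bit, 32-bit, and 48-bit instructions are packed back to back and fetched as 48-bit words, so in the best case one fetch carries three 16-bit instructions. The names are hypothetical and the model ignores alignment rules the real tools may apply.

```c
/* Hypothetical estimate of VISA code density. Assumes instructions are packed
 * back to back and fetched in 48-bit words. */
typedef struct { unsigned n16, n32, n48; } insn_mix;

unsigned long visa_bits(insn_mix m)
{
    return 16ul * m.n16 + 32ul * m.n32 + 48ul * m.n48;
}

/* Number of 48-bit fetches needed for the packed stream (rounded up). */
unsigned long visa_fetches(insn_mix m)
{
    return (visa_bits(m) + 47) / 48;
}

/* Same program encoded entirely as traditional 48-bit ISA instructions. */
unsigned long isa_fetches(insn_mix m)
{
    return (unsigned long)m.n16 + m.n32 + m.n48;
}
```

Under these assumptions, a mix of 10 short, 5 medium, and 5 full-width instructions needs 12 fetches instead of 20.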
On-chip memory
The processor contains 5 Mbits of internal RAM. Each block can be configured for different combinations of code and data storage (see Table 4). Each memory block supports single-cycle, independent accesses by the core processor and the I/O processor. The memory architecture, in combination with its separate on-chip buses, allows two data transfers from the core and the I/O processor within a single cycle.
The processor's SRAM can be configured as a maximum of 160K words of 32-bit data, 320K words of 16-bit data, 106.7K words of 48-bit instructions (or 40-bit data), or combinations of different word sizes up to 5 Mbits. All of the memory can be accessed as 16-bit, 32-bit, 48-bit, or 64-bit words. A 16-bit floating-point storage format is supported, which effectively doubles the amount of data that can be stored on-chip. Conversion between the 32-bit and 16-bit floating-point formats is performed in a single instruction. While each memory block can store combinations of code and data, accesses are most efficient when one block stores data using the DM bus for transfers and the other block stores instructions and data using the PM bus for transfers.
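The word counts quoted above follow directly from the 5-Mbit capacity. This worked check assumes 5 Mbits = 5 × 1024 × 1024 bits and K = 1024 words.

```c
/* Worked check of the SRAM word counts, assuming 5 Mbits = 5 * 1024 * 1024
 * bits and K = 1024 words. */
#define SRAM_BITS (5UL * 1024UL * 1024UL)

/* Number of whole words of a given width that fit in the SRAM. */
unsigned long words_of_width(unsigned bits_per_word)
{
    return SRAM_BITS / bits_per_word;
}

/* Capacity in K words (fractional for widths such as 48 bits). */
double kwords_of_width(unsigned bits_per_word)
{
    return (double)SRAM_BITS / bits_per_word / 1024.0;
}
```

For example, 5,242,880 / 32 = 163,840 words (160K), 5,242,880 / 16 = 327,680 words (320K), and 5,242,880 / 48 ≈ 109,227 words, or about 106.7K.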
Using the DM bus and the PM bus, with one bus dedicated to each memory block, ensures single-cycle execution with two data transfers. In this case, the instruction must be available in the cache.
The memory map in Table 3 shows the internal memory address space of the processor. The 48-bit space section describes what this address range looks like to an instruction that retrieves 48-bit memory. The 32-bit section describes what this address range looks like to an instruction that retrieves 32-bit memory.
On-chip memory bandwidth
The internal memory architecture allows programs to make four accesses at the same time to any of the four blocks, assuming there are no block conflicts. The total bandwidth is realized using the DMD and PMD buses (2 × 64 bits, CCLK speed) and the IOD0/1 buses (2 × 32 bits, PCLK speed).
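The aggregate bandwidth can be estimated from the bus widths and clocks listed above. This sketch assumes PCLK runs at half of CCLK. Verify the clock ratio for a given part before relying on the numbers.

```c
/* Aggregate on-chip memory bandwidth in bytes/s.
 * Assumption: PCLK = CCLK / 2. */
double core_bus_bw(double cclk_hz)
{
    return 2 * 8.0 * cclk_hz;           /* DMD + PMD: 2 x 64 bits at CCLK */
}

double io_bus_bw(double cclk_hz)
{
    return 2 * 4.0 * (cclk_hz / 2.0);   /* IOD0 + IOD1: 2 x 32 bits at PCLK */
}

double total_bw(double cclk_hz)
{
    return core_bus_bw(cclk_hz) + io_bus_bw(cclk_hz);
}
```

At 450 MHz this gives 7.2 GB/s on the core buses plus 1.8 GB/s on the I/O buses, or 9.0 GB/s in total, under the stated clock assumption.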
Non-secured ROM
For non-secured ROM, use the BOOTCFG pins to select the boot mode, as shown in Table 8. In this mode, emulation is always enabled. The IVT is placed in internal RAM, except when BOOTCFGx = 011.
ROM-based security
The ROM security feature provides hardware support for protecting user software code by preventing unauthorized reading of the internal code when enabled. When this feature is used, the processor does not boot-load any external code and executes exclusively from internal ROM. Additionally, the processor is not freely accessible via the JTAG port. Instead, a unique 64-bit key, which must be scanned in through the JTAG or test access port, is assigned to each customer.
Digital transmission content protection
The DTCP specification defines a cryptographic protocol for protecting audio entertainment content from illegal copying, interception, and tampering as it traverses high-performance digital buses, such as the IEEE 1394 standard. Only legitimate entertainment content delivered to a source device via another approved copy protection system (such as the DVD Content Scrambling System) is protected by this copy protection system.
Family peripheral architecture
The processor contains a rich set of peripherals that support a wide variety of applications, including high-quality audio, medical imaging, communications, military, test equipment, 3D graphics, speech recognition, motor control, imaging, and other applications.
External port
The external port interface supports access to external memory through core and DMA accesses. The external memory address space is divided into four banks. Any bank can be programmed as either asynchronous or synchronous memory. The external port consists of the following modules:
An asynchronous memory interface (AMI), which communicates with SRAM, flash, and other devices that meet the standard asynchronous SRAM access protocol. The AMI supports 2M words of external memory in bank 0 and 4M words of external memory in each of banks 1, 2, and 3.
A DDR2 DRAM controller. External memory devices up to 2 Gbits in size can be supported.
Arbitration logic to coordinate core and DMA transfers between internal and external memory over the external port.
External memory
The external port on the processor provides a high-performance, glueless interface to a wide variety of industry-standard memory devices. The external port can interface to synchronous and/or asynchronous memory devices through its separate internal memory controllers. The 16-bit DDR2 DRAM controller connects to industry-standard synchronous DRAM devices, while the second, 8-bit asynchronous memory controller interfaces to a variety of memory devices. Four memory selects allow up to four independent devices to coexist, supporting any desired combination of synchronous and asynchronous device types. The non-DDR2 DRAM external memory address space is shown in Table 4.
SIMD access to external memory
The DDR2 controller supports SIMD access on the 64-bit EPD (external port data bus), which allows access to the complementary registers on the PEy unit in the normal word (NW) space. This improves performance, because the complementary registers do not have to be loaded explicitly as they do in SISD mode.
VISA and ISA access to external memory
The DDR2 controller also supports VISA code operation. Because VISA instructions are compressed, memory usage is reduced. The number of bus cycles is also reduced, because in the best case one 48-bit fetch contains three valid instructions. Code execution from traditional ISA operation is also supported. Note that, for both VISA and ISA, code execution is only supported from bank 0. Table 5 shows the address ranges for instruction fetch in each mode.
Shared external memory
The processor supports connecting to common shared external DDR2 memory with other ADSP-2146x processors to create shared external bus processor systems. This support includes:
Distributed, on-chip arbitration for the shared external bus
Fixed and rotating priority bus arbitration
Bus time-out logic
Bus lock
Multiple processors can share the external bus with no additional arbitration logic. Arbitration logic is included on-chip to allow the connection of up to two processors. Table 10 on page 14 provides descriptions of the pins used in multiprocessor systems.
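Fixed and rotating priority arbitration between two bus masters can be illustrated with a small state machine. This is a hypothetical model, not the on-chip arbitration logic. The `arbiter` struct and its fields are invented for illustration, and bus time-out is not modeled.

```c
/* Hypothetical model of two-processor bus arbitration (IDs 0 and 1). */
typedef enum { PRIO_FIXED, PRIO_ROTATING } prio_mode;

typedef struct {
    prio_mode mode;
    int last_owner;  /* most recent bus owner, -1 = none yet */
    int locked;      /* bus lock: current owner keeps the bus */
} arbiter;

/* req[i] != 0 when processor i requests the bus. Returns the grantee or -1. */
int arbitrate(arbiter *a, const int req[2])
{
    int winner = -1;
    if (a->locked && a->last_owner >= 0)
        return a->last_owner;                       /* locked: no handover */
    if (req[0] && req[1]) {
        if (a->mode == PRIO_FIXED)
            winner = 0;                             /* lower ID always wins */
        else
            winner = (a->last_owner == 0) ? 1 : 0;  /* take turns */
    } else if (req[0]) {
        winner = 0;
    } else if (req[1]) {
        winner = 1;
    }
    if (winner >= 0)
        a->last_owner = winner;
    return winner;
}
```

With both processors requesting continuously, rotating priority alternates the grant, while fixed priority keeps granting processor 0.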
DDR2 support
The processor supports a 16-bit DDR2 interface operating at a maximum frequency of half the core clock. Execution from external memory is supported. External memory devices up to 2 Gbits in size can be supported.
DDR2 DRAM controller
The DDR2 DRAM controller provides a 16-bit interface to up to four separate banks of industry-standard DDR2 DRAM devices. It is fully compliant with the DDR2 DRAM standard. Each bank can have its own memory select line (DDR2_CS3–DDR2_CS0) and can be configured to contain between 32 MB and 256 MB of memory. The DDR2 DRAM external memory address space is shown in Table 6.
A set of programmable timing parameters is available to configure the DDR2 DRAM banks to support the attached memory devices.
Note that the external memory bank addresses shown are for normal-word (32-bit) accesses. If both 48-bit instructions and 32-bit data are placed in the same external memory bank, care must be taken when mapping them to avoid overlap.
Asynchronous memory controller
The asynchronous memory controller provides a configurable interface for up to four separate banks of memory or I/O devices. Each bank can be independently programmed with different timing parameters, enabling connection to a wide variety of memory devices, including SRAM, flash, and EPROM, as well as I/O devices that interface with standard memory control lines. Bank 0 occupies a 2M-word window and banks 1, 2, and 3 each occupy a 4M-word window in the processor's address space. If these windows are not fully populated, the memory controller logic does not make them contiguous.
External port throughput
Based on a 400 MHz clock, the external port throughput is 66 MB/s for the AMI and 800 MB/s for DDR2.
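The DDR2 figure can be derived from the interface description. The following sketch assumes the 16-bit DDR2 interface runs at half the core clock, as stated in the DDR2 support section, and transfers data on both clock edges. It computes a peak rate only.

```c
/* Peak DDR2 throughput estimate in bytes/s. Assumptions: 16-bit data bus,
 * DDR2 clock = CCLK / 2, data transferred on both clock edges. */
double ddr2_peak_bytes_per_s(double cclk_hz)
{
    double ddr_clk = cclk_hz / 2.0;   /* DDR2 clock is half the core clock */
    double bytes_per_edge = 2.0;      /* 16-bit data bus */
    return ddr_clk * 2.0 * bytes_per_edge;
}
```

At a 400 MHz core clock: 200 MHz × 2 edges × 2 bytes = 800 MB/s, matching the DDR2 throughput quoted above.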
Link ports
Two 8-bit wide link ports can connect to the link ports of other DSPs or peripheral devices. The link ports are bidirectional ports with eight data lines, an acknowledge line, and a clock line. The link ports can operate at a maximum frequency of 166 MHz.
Media local bus
The automotive models have an MLB interface that allows the processor to function as a media local bus device. It includes support for both the 3-pin and 5-pin media local bus protocols. It supports speeds up to 1024 FS (49.25 Mbits/s, FS = 48.1 kHz) and up to 31 logical channels, with up to 124 bytes of data per media local bus frame.
MLB interface supports MOST25 and MOST50 data rates. Synchronous transmission mode is not supported.
Pulse width modulation
The PWM module is a flexible, programmable pulse width modulation waveform generator that can be programmed to generate the switching patterns required by various applications related to motor and engine control or audio power control. The PWM generator can generate either center-aligned or edge-aligned PWM waveforms. In addition, it can generate complementary signals on two outputs in paired mode, or independent signals in non-paired mode (applicable to a single group of four PWM waveforms). When generating center-aligned PWM waveforms, the PWM generator can operate in two modes: single update mode or double update mode.
There are four groups of PWM modules, with four PWM outputs in each group, so the module generates a total of 16 PWM outputs. Each PWM group produces two pairs of PWM signals on its four PWM outputs.
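The difference between edge-aligned and center-aligned waveforms, and the meaning of a complementary paired output, can be shown by computing where the pulse edges fall within one period. This is a hypothetical timing model with invented names, not the PWM module's register interface. It assumes `duty <= period`.

```c
/* Hypothetical PWM edge computation for one period of `period` ticks with a
 * high time of `duty` ticks (0 <= duty <= period). */
typedef struct { unsigned rise, fall; } pwm_edges;

/* Edge-aligned: the pulse starts at the beginning of each period. */
pwm_edges pwm_edge_aligned(unsigned period, unsigned duty)
{
    pwm_edges e = { 0, duty };
    (void)period;
    return e;
}

/* Center-aligned: the pulse is centered within the period. */
pwm_edges pwm_center_aligned(unsigned period, unsigned duty)
{
    pwm_edges e;
    e.rise = (period - duty) / 2;
    e.fall = e.rise + duty;
    return e;
}

/* Output level at tick t; the paired (complementary) output is the inverse. */
int pwm_level(pwm_edges e, unsigned t)            { return t >= e.rise && t < e.fall; }
int pwm_level_complement(pwm_edges e, unsigned t) { return !pwm_level(e, t); }
```

For a 100-tick period with 40 ticks high, the edge-aligned pulse spans ticks 0 to 40, while the center-aligned pulse spans ticks 30 to 70.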
Digital application interface (DAI)
The digital applications interface (DAI) provides the ability to connect various peripherals to any of the DAI pins (DAI_P20–1).
Programs make these connections using the signal routing unit (SRU), shown in Figure 1.
The SRU is a matrix routing unit (or group of multiplexers) that enables the peripherals provided by the DAI to be interconnected under software control. This makes the DAI-associated peripherals easy to use in a much wider variety of applications, with a larger set of algorithms than is possible with nonconfigurable signal paths.
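Conceptually, the SRU gives each destination its own multiplexer that selects one source. The following hypothetical C model captures that idea. The source and destination names are invented for illustration and do not correspond to actual SRU register fields or signal names.

```c
/* Hypothetical model of an SRU-style routing matrix: each destination
 * selects exactly one source through a multiplexer. Names are invented. */
enum src { SRC_LOW, SRC_PIN1, SRC_PCG_A_CLK, SRC_SPORT0_DA_OUT, NUM_SRC };
enum dst { DST_SPORT0_CLK_IN, DST_SPDIF_TX_DAT, DST_PIN2_OUT, NUM_DST };

typedef struct { enum src sel[NUM_DST]; } sru;

/* Software makes a connection by programming the destination's mux. */
void sru_connect(sru *s, enum src from, enum dst to) { s->sel[to] = from; }

/* Signal level seen at a destination, given the current source levels. */
int sru_value(const sru *s, const int src_level[NUM_SRC], enum dst d)
{
    return src_level[s->sel[d]];
}
```

Because the routing is just a table of selections, reconfiguring a signal path is a software write rather than a board change.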
The DAI includes the peripherals described in the following sections.
Serial ports
The processor features eight synchronous serial ports (SPORTs) that provide an inexpensive interface to a wide variety of digital and mixed-signal peripheral devices, such as Analog Devices' AD183x family of audio codecs, ADCs, and DACs. Each serial port is made up of two data lines, a clock, and a frame sync. The data lines can be programmed to either transmit or receive, and each data line has a dedicated DMA channel.
When all eight SPORTs are enabled, the serial ports can support up to 16 transmit or 16 receive DMA channels of audio data, or four full-duplex TDM streams of 128 channels per frame.
The serial ports operate at a maximum data rate of fPCLK/4. Serial port data can be automatically transferred to and from on-chip memory/external memory via dedicated DMA channels. Each serial port can work in conjunction with another serial port to provide TDM support: one SPORT provides two transmit signals while the other SPORT provides two receive signals. The frame sync and clock are shared.
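The fPCLK/4 limit can be turned into a concrete bit rate and a bound on TDM frame rate. This sketch assumes PCLK = CCLK / 2 and uses a hypothetical slot width, so treat the results as estimates under those assumptions.

```c
/* Maximum SPORT bit rate is fPCLK / 4 (from the text).
 * Assumption for this example: PCLK = CCLK / 2. */
double sport_max_bit_rate(double cclk_hz)
{
    double pclk = cclk_hz / 2.0;
    return pclk / 4.0;
}

/* Highest frame rate for a TDM stream with the given slot count and width. */
double tdm_max_frame_rate(double bit_rate, unsigned slots, unsigned bits_per_slot)
{
    return bit_rate / ((double)slots * bits_per_slot);
}
```

At a 450 MHz core clock this gives 56.25 Mbits/s per SPORT. Under the same assumptions, a 128-slot TDM frame with 32-bit slots is then limited to about 13.7 kHz.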
The serial ports operate in five modes:
Standard DSP serial mode
Multichannel (TDM) mode
I2S mode
Packed I2S mode
Left-justified mode
S/PDIF receiver/transmitter
The S/PDIF receiver/transmitter has no separate DMA channels. It receives audio data in serial format and converts it into a biphase-encoded signal. The serial data input to the receiver/transmitter can be formatted as left-justified, I2S, or right-justified, with word widths of 16, 18, 20, or 24 bits.
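The biphase encoding used on S/PDIF links is biphase-mark coding. Every bit cell begins with a level transition, and a 1 bit adds a second transition in the middle of the cell, so the signal carries its own clock and has no DC component. The sketch below encodes data bits into half-cell levels. It deliberately omits the S/PDIF preambles, which intentionally violate this rule so they can be used for synchronization.

```c
#include <stddef.h>
#include <stdint.h>

/* Biphase-mark encoding: a transition at the start of every bit cell, plus a
 * mid-cell transition for a 1. Writes two half-cell levels per input bit.
 * S/PDIF preambles are not generated here. */
void biphase_mark_encode(const uint8_t *bits, size_t n, int start_level,
                         uint8_t *half_cells /* 2 * n entries */)
{
    int level = start_level;
    for (size_t k = 0; k < n; k++) {
        level = !level;                   /* transition at cell start */
        half_cells[2 * k] = (uint8_t)level;
        if (bits[k])
            level = !level;               /* extra mid-cell transition for 1 */
        half_cells[2 * k + 1] = (uint8_t)level;
    }
}
```

Encoding the bits 1, 0, 1, 1 from a low starting level gives the half-cell sequence 1 0, 1 1, 0 1, 0 1.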
The serial data, clock, and frame sync inputs to the S/PDIF receiver/transmitter are routed through the signal routing unit (SRU). They can come from a variety of sources, such as the SPORTs, external pins, and the precision clock generators (PCGs), and are controlled by the SRU control registers.
Asynchronous sampling rate converter
The asynchronous sample rate converter (ASRC) contains four ASRC blocks. It is the same core used in the AD1896 192 kHz stereo asynchronous sample rate converter and provides up to 128 dB SNR. The ASRC blocks perform synchronous or asynchronous sample rate conversion across independent stereo channels without using internal processor resources. The four SRC blocks can also be configured to operate together to convert multichannel audio data without phase mismatches. Finally, the ASRC can be used to clean up audio data from jittery clock sources such as the S/PDIF receiver.
Input data port
The IDP provides up to eight serial input channels, each with its own clock, frame sync, and data inputs. The eight channels are automatically multiplexed into an eight-deep FIFO. Data is always formatted as a 64-bit frame and divided into two 32-bit words. The serial protocol is designed to receive audio channels in I2S, left-justified sample pair, or right-justified mode. One frame sync cycle indicates one 64-bit left/right pair, but data is sent to the FIFO as 32-bit words (that is, one-half of a frame at a time). The processor supports 24-bit and 32-bit I2S, 24-bit and 32-bit left-justified, and 24-bit, 20-bit, 18-bit, and 16-bit right-justified formats.
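Two data-handling steps from the description above are easy to sketch in C: splitting a 64-bit left/right frame into two 32-bit FIFO words, and recovering a signed sample from a right-justified word. The helper names are hypothetical. The assumptions that the left channel occupies the upper half of the frame and is pushed first are for illustration only.

```c
#include <stdint.h>

/* Split a 64-bit left/right frame into two 32-bit FIFO words.
 * Assumption: left occupies the upper half and is pushed first. */
void idp_split_frame(uint64_t frame, uint32_t fifo_words[2])
{
    fifo_words[0] = (uint32_t)(frame >> 32);          /* left word */
    fifo_words[1] = (uint32_t)(frame & 0xFFFFFFFFu);  /* right word */
}

/* Sign-extend a right-justified sample of `bits` width (16, 18, 20, or 24)
 * held in the low bits of a 32-bit word. */
int32_t rj_to_signed(uint32_t word, unsigned bits)
{
    uint32_t mask = (bits >= 32) ? 0xFFFFFFFFu : ((1u << bits) - 1u);
    uint32_t sign = 1u << (bits - 1);
    uint32_t v = word & mask;
    return (int32_t)((v ^ sign) - sign);  /* flip sign bit, then subtract it */
}
```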
Precision clock generators
The precision clock generators (PCG) consist of four units, A, B, C, and D, each of which generates a pair of signals (clock and frame sync) derived from a clock input signal. The units are identical in functionality and operate independently of each other. The two signals generated by each unit are normally used as a serial bit clock/frame sync pair.
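A typical PCG task is deriving a bit clock and frame sync from an audio master clock. This hypothetical sketch computes integer divisors under the assumption that each output is the input clock divided by an integer. The function and struct names are invented. The divisor register format on the actual part is not modeled.

```c
/* Hypothetical divider calculation for one clock/frame-sync pair.
 * Returns 1 and fills `out` when both rates are exact integer divisions
 * of the input clock, 0 otherwise. */
typedef struct { unsigned clk_div, fs_div; } pcg_dividers;

int pcg_compute(unsigned long in_hz, unsigned long fs_hz,
                unsigned bits_per_frame, pcg_dividers *out)
{
    unsigned long bclk_hz = fs_hz * bits_per_frame;   /* serial bit clock */
    if (bclk_hz == 0 || in_hz % bclk_hz != 0)
        return 0;                                     /* not an exact division */
    out->clk_div = (unsigned)(in_hz / bclk_hz);
    out->fs_div  = (unsigned)(in_hz / fs_hz);
    return 1;
}
```

From a 24.576 MHz input, a 48 kHz frame sync with 64 bit clocks per frame needs divisors of 8 (bit clock) and 512 (frame sync). A 25 MHz input does not divide evenly, so the function reports failure.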
Digital peripheral interface (DPI)
The digital peripheral interface provides connections to two serial peripheral interface (SPI) ports, one universal asynchronous receiver-transmitter (UART), 12 flags, a 2-wire interface (TWI), and two general-purpose timers. The DPI includes the peripherals described in the following sections.
Serial peripheral interface
The processor contains two serial peripheral interface (SPI) ports. The SPI is an industry-standard synchronous serial link that enables the SPI-compatible ports to communicate with other SPI-compatible devices. The SPI consists of two data pins, one device select pin, and one clock pin. It is a full-duplex synchronous serial interface that supports both master and slave modes. The SPI port can operate in a multimaster environment by interfacing with up to four other SPI-compatible devices, acting as either a master or a slave device. The SPI-compatible peripheral implementation also features programmable baud rates, clock phase, and polarity. The SPI-compatible port uses open-drain drivers to support a multimaster configuration and to avoid data contention.
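Full-duplex operation means that every clock shifts one bit out of the master and one bit out of the slave at the same time. The sketch below models a generic SPI mode 0 (CPOL = 0, CPHA = 0) byte exchange as two 8-bit shift registers connected in a ring. It describes general SPI behavior, not this processor's register interface.

```c
#include <stdint.h>

/* Generic SPI mode 0 byte exchange: on each clock the master drives its MSB
 * on MOSI and the slave drives its MSB on MISO; both sample on the rising
 * edge and shift the received bit in at the LSB. */
void spi_exchange(uint8_t *master, uint8_t *slave)
{
    for (int bit = 0; bit < 8; bit++) {
        int mosi = (*master >> 7) & 1;   /* master -> slave */
        int miso = (*slave  >> 7) & 1;   /* slave -> master */
        *master = (uint8_t)((*master << 1) | miso);
        *slave  = (uint8_t)((*slave  << 1) | mosi);
    }
}
```

After eight clocks the two registers have swapped contents, so a transmit and a receive happen in the same transfer.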
UART port
The processor provides a full-duplex universal asynchronous receiver/transmitter (UART) port that is fully compatible with PC-standard UARTs. The UART port provides a simplified UART interface to other peripherals or hosts, supporting full-duplex, DMA-supported, asynchronous transfers of serial data. The UART also has multiprocessor communication capability using 9-bit address detection, which allows it to be used in multidrop networks through the RS-485 data interface standard. The UART port also supports 5 to 8 data bits, 1 or 2 stop bits, and none, even, or odd parity. The UART port supports two modes of operation:
PIO (programmed I/O): the processor sends or receives data by writing or reading I/O-mapped UART registers. The data is double-buffered on both transmit and receive.
DMA (direct memory access): the DMA controller transfers both transmit and receive data. This reduces the number and frequency of interrupts required to transfer data to and from memory.
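The framing options listed above (5 to 8 data bits, optional parity, 1 or 2 stop bits) determine the bit sequence placed on the line for each character. The following sketch builds that sequence using standard asynchronous UART framing. The function name and the parity encoding (0 = none, 1 = odd, 2 = even) are choices made for this example.

```c
#include <stdint.h>

/* Build the line-level bits for one UART character: start bit (0), data bits
 * LSB first, optional parity, then stop bits (1).
 * parity: 0 = none, 1 = odd, 2 = even. Returns the number of bits written. */
int uart_frame(uint8_t data, int data_bits, int parity, int stop_bits,
               uint8_t *out /* at least 12 entries */)
{
    int n = 0, ones = 0;
    out[n++] = 0;                                /* start bit */
    for (int i = 0; i < data_bits; i++) {
        int b = (data >> i) & 1;
        ones += b;
        out[n++] = (uint8_t)b;                   /* LSB first */
    }
    if (parity == 1)
        out[n++] = (uint8_t)((ones % 2) == 0);   /* odd: total 1s odd */
    else if (parity == 2)
        out[n++] = (uint8_t)(ones % 2);          /* even: total 1s even */
    for (int i = 0; i < stop_bits; i++)
        out[n++] = 1;                            /* stop bit(s) */
    return n;
}
```

For example, 'A' (0x41) with 8 data bits, even parity, and 1 stop bit occupies 11 bit times on the line.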
Timers
The processor has a total of three timers: a core timer that can generate periodic software interrupts, and two general-purpose timers that can generate periodic interrupts and can each be set to operate in one of three modes:
Pulse waveform generation mode
Pulse width count/capture mode
External event watchdog mode
The core timer can be configured to use FLAG3 as a timer expired signal. Each general-purpose timer has one bidirectional pin and four registers that implement its mode of operation. A single control and status register enables or disables both general-purpose timers independently.
2-wire interface port (TWI)
The TWI is a bidirectional, 2-wire serial bus used to move 8-bit data while maintaining compliance with the I2C bus protocol. The TWI master incorporates the following features:
7-bit addressing
Simultaneous master and slave operation on multiple-device systems, with support for multimaster data arbitration
Digital filtering and timed event processing
100 kbps and 400 kbps data rates
Low interrupt rate
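Two I2C details behind the feature list above can be shown in a few lines of C: how a 7-bit address and the read/write bit form the first byte of a transfer, and the rule a master uses to detect that it has lost multimaster arbitration. This reflects the standard I2C protocol, not the TWI register interface.

```c
#include <stdint.h>

/* First byte of a 7-bit-addressed I2C/TWI transfer: address in bits 7..1,
 * R/W in bit 0 (1 = read, 0 = write). */
uint8_t twi_address_byte(uint8_t addr7, int read)
{
    return (uint8_t)(((addr7 & 0x7Fu) << 1) | (read ? 1u : 0u));
}

/* Multimaster arbitration: a master that releases SDA (sends 1) but reads
 * back 0 has lost arbitration to another master driving 0. */
int twi_lost_arbitration(int bit_sent, int bit_on_bus)
{
    return bit_sent == 1 && bit_on_bus == 0;
}
```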
I/O processor features
The automotive version of the I/O processor provides 67 DMA channels and the standard version provides 36 DMA channels, along with an extensive set of peripherals described in the following sections.
DMA controller
The DMA controller allows data transfers without processor intervention. It operates independently and invisibly to the processor core, so DMA operations can take place while the core is simultaneously executing its program instructions. DMA transfers can occur between the processor's internal memory and its serial ports, the SPI-compatible (serial peripheral interface) ports, the IDP (input data port), the parallel data acquisition port (PDAP), or the UART.
Up to 67 DMA channels are available, as shown in Table 7. Programs can be downloaded to the processor using DMA transfers. Other DMA features include interrupt generation upon completion of DMA transfers and DMA chaining for automatic linked DMA transfers.
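DMA chaining can be pictured as a linked list of transfer descriptors that the controller walks on its own, raising an interrupt where requested. The sketch below is a hypothetical host-side model. The `tcb` struct and its fields are invented and do not reflect the processor's actual transfer control block layout. Because each descriptor names its own source, the same structure also illustrates gathering data from noncontiguous blocks.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical chained-DMA descriptor; field names are invented. */
typedef struct tcb {
    const int  *src;
    int        *dst;
    size_t      count;        /* words to move */
    int         irq_on_done;  /* raise an interrupt when this block ends */
    struct tcb *next;         /* NULL ends the chain */
} tcb;

/* Walk the chain without core involvement; returns interrupts raised. */
int dma_run_chain(const tcb *t)
{
    int irqs = 0;
    for (; t != NULL; t = t->next) {
        memcpy(t->dst, t->src, t->count * sizeof *t->src);
        if (t->irq_on_done)
            irqs++;
    }
    return irqs;
}
```

A typical arrangement requests an interrupt only on the last descriptor, so the core is notified once per chain rather than once per block.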
Delay line DMA
Delay line DMA allows processor reads and writes to external delay line buffers (and hence to external memory) with limited core interaction.
Scatter/gather DMA
Scatter/gather DMA allows DMA reads/writes to/from noncontiguous memory blocks.
IIR accelerator
The IIR (infinite impulse response) accelerator consists of a 1440-word coefficient memory for storage of biquad coefficients, a data memory for storing intermediate data, and one MAC unit. A controller manages the accelerator. The IIR accelerator runs at the peripheral clock frequency.
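The biquad coefficients mentioned above describe second-order IIR sections that are usually run in cascade. For reference, the following is a generic C implementation of a biquad cascade in transposed direct form II. It shows the kind of computation an IIR accelerator offloads. The coefficient layout and the normalization a0 = 1 are assumptions of this sketch, not the accelerator's memory format.

```c
#include <stddef.h>

/* One biquad section, transposed direct form II, with a0 normalized to 1:
 * y[n] = b0 x[n] + b1 x[n-1] + b2 x[n-2] - a1 y[n-1] - a2 y[n-2]. */
typedef struct {
    float b0, b1, b2, a1, a2;  /* coefficients */
    float s1, s2;              /* internal state */
} biquad;

float biquad_step(biquad *q, float x)
{
    float y = q->b0 * x + q->s1;
    q->s1 = q->b1 * x - q->a1 * y + q->s2;
    q->s2 = q->b2 * x - q->a2 * y;
    return y;
}

/* Run one input sample through n sections in series. */
float cascade_step(biquad *stages, size_t n, float x)
{
    for (size_t k = 0; k < n; k++)
        x = biquad_step(&stages[k], x);
    return x;
}
```

A section with b0 = 1 and a1 = -0.5 (all other coefficients zero) implements y[n] = x[n] + 0.5 y[n-1], so its impulse response is 1, 0.5, 0.25, and so on.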
Fast Fourier transform accelerator
The FFT accelerator implements radix-2 FFTs with complex or real input and complex output, with no core intervention. The FFT accelerator runs at the peripheral clock frequency.
Finite impulse response (FIR) accelerator
The FIR (finite impulse response) accelerator consists of a 1024-word coefficient memory, a 1024-word-deep delay line for the data, and four MAC units. A controller manages the accelerator. The FIR accelerator runs at the peripheral clock frequency.
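An FIR filter combines a coefficient memory and a delay line in the way described above. In this generic C sketch, the inner product is split across four partial sums as a loose analogy to four MAC units working in parallel. This is an illustration, not the accelerator's datapath. The 1024-entry sizes mirror the memories described above. The delay line is circular and starts zeroed.

```c
#include <stddef.h>

#define FIR_MAX_TAPS 1024

/* Generic FIR filter with a circular delay line. */
typedef struct {
    float  coef[FIR_MAX_TAPS];
    float  delay[FIR_MAX_TAPS];
    size_t taps;   /* 1 .. FIR_MAX_TAPS */
    size_t head;   /* index of the newest sample */
} fir;

float fir_step(fir *f, float x)
{
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};  /* four partial sums */
    f->head = (f->head + 1) % f->taps;
    f->delay[f->head] = x;
    for (size_t k = 0; k < f->taps; k++) {
        /* delay[(head - k) mod taps] holds x[n - k] */
        size_t idx = (f->head + f->taps - k) % f->taps;
        acc[k % 4] += f->coef[k] * f->delay[idx];
    }
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Feeding an impulse through a filter with coefficients 1, 2, 3 returns 1, 2, 3, then 0, which confirms the tap ordering.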
System design
The following sections introduce system design options and power supply issues.
Program booting
At system power-up, the internal memory of the processor can be booted from an 8-bit EPROM via the external port, a link port, an SPI master, or an SPI slave. The boot mode is determined by the boot configuration (BOOTCFG2–0) pins, as shown in Table 8.
The running reset feature allows programs to reset the core and peripherals without resetting the PLL and DDR2 DRAM controller or performing a boot. The reset pin also acts as the input for initiating a running reset. For details, see the ADSP-214xx SHARC Processor Hardware Reference.
Power supply
The processor has separate power supply connections for the internal (VDD_INT), external (VDD_EXT), and analog (VDD_A) supplies. The internal and analog supplies must meet the VDD_INT specifications. The external power supply must comply wi