Reconfigurable architecture (RA) has not been very popular in recent years. Most papers that come up in a Google search are more than 10 years old. But with the emergence and boom of machine learning and AI, parallel processing and SIMD architectures have gained a lot of attention, and reconfigurable architecture has found a new life as well.
Here is a short review of some interesting papers about RA.
The first one is “Mapping Parallel FFT Algorithm onto SmartCell Coarse-Grained Reconfigurable Architecture” from the 2009 20th IEEE International Conference on Application-specific Systems, Architectures and Processors.
The paper presents SmartCell, a coarse-grained reconfigurable architecture, in which PE stands for processing element and CB for crossbar. Within a cell, the PEs are connected through the CB. Across cells, PEs that are adjacent to each other have a direct connection. Most inter-cell connections are made through the network on chip.
Each PE is like a tiny processor, with an arithmetic unit and a logic unit as its processing engine. An instruction controller controls how the PE operates.
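To make the PE description a bit more concrete, here is a minimal Python sketch of such a tiny processor. The opcode names, the two-operand interface and the wrap-around program counter are my own assumptions for illustration; they are not taken from the SmartCell paper.

```python
# Hypothetical opcode tables: one for the arithmetic unit, one for the logic unit.
ARITH_UNIT = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
LOGIC_UNIT = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "xor": lambda a, b: a ^ b,
}


class ProcessingElement:
    """Toy PE: the instruction controller walks a small program and
    dispatches each opcode to the arithmetic or the logic unit."""

    def __init__(self, program):
        self.program = list(program)  # instructions held by the controller
        self.pc = 0                   # index of the next instruction

    def step(self, a, b):
        opcode = self.program[self.pc]
        # Wrap around so the same program repeats on streaming data.
        self.pc = (self.pc + 1) % len(self.program)
        if opcode in ARITH_UNIT:
            return ARITH_UNIT[opcode](a, b)
        return LOGIC_UNIT[opcode](a, b)  # KeyError on an unknown opcode


pe = ProcessingElement(["add", "xor"])
first = pe.step(3, 5)   # executed by the arithmetic unit
second = pe.step(6, 3)  # executed by the logic unit
```

Reconfiguring the PE in this toy model just means giving it a different program list.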
This paper shows how FFT can be achieved through this SmartCell architecture.
Another paper is “Configuration Memory Based Dynamic Coarse Grained Reconfigurable Multicore Architecture for 8 Point FFT” from the 2015 7th International Conference on Emerging Trends in Engineering & Technology.
This paper presents a 4×4 coarse-grained RA (CGRA). PE is the processing element. The configuration memory holds the instructions that control the PEs. The shared memory holds the raw data fed to the PEs and stores the processed data coming back from them.
Coincidentally, this paper also uses an 8-point FFT to show how the CGRA works, starting from the 8-point FFT butterfly diagram.
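For readers who want to see what the butterfly diagram computes, below is a small Python sketch of a radix-2 decimation-in-time FFT written stage by stage. This is the textbook algorithm, not code from the paper; the function names `bit_reverse`, `fft_butterflies` and `naive_dft` are my own.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i (input ordering of a DIT FFT)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_butterflies(x):
    """Radix-2 DIT FFT; len(x) must be a power of two (8 in the paper)."""
    n = len(x)
    bits = n.bit_length() - 1
    assert 1 << bits == n, "length must be a power of two"
    data = [complex(x[bit_reverse(i, bits)]) for i in range(n)]
    span = 1  # distance between the two inputs of one butterfly
    while span < n:  # one pass per stage: 3 stages for n = 8
        step = 2 * span
        for start in range(0, n, step):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / step)  # twiddle factor
                a = data[start + k]
                b = data[start + k + span] * w
                data[start + k] = a + b         # upper butterfly output
                data[start + k + span] = a - b  # lower butterfly output
        span = step
    return data


def naive_dft(x):
    """Direct O(n^2) DFT, used only as a reference to check the FFT."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


samples = [1, 2, 3, 4, 5, 6, 7, 8]
spectrum = fft_butterflies(samples)
```

For 8 points there are 3 stages with 4 butterflies each, so 12 butterfly operations in total. Those 12 operations are what has to be placed on the PEs.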
To map the butterfly diagram onto the CGRA, a data flow graph (DFG) first needs to be generated.
Finally, the DFG can be mapped onto the CGRA. The mapping table shows which PE performs which action in each time slot. This table is then saved into the configuration memory, which controls the PEs’ actions once triggered.
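The idea of a time slot by PE mapping table can be sketched in a few lines of Python. Note that this is a generic greedy list scheduler that I am using for illustration, not the mapping procedure from the paper, and the operation names such as `BF0_0_1` are made up.

```python
def build_fft_dfg(n=8):
    """Return {op: [predecessor ops]} for the radix-2 butterflies of an
    n-point FFT. Op names like 'BF1_0_2' (stage 1, data lines 0 and 2)
    are hypothetical labels."""
    stages = n.bit_length() - 1
    producer = {}  # data line -> op that last wrote it
    dfg = {}
    span = 1
    for stage in range(stages):
        for start in range(0, n, 2 * span):
            for k in range(span):
                i, j = start + k, start + k + span
                name = f"BF{stage}_{i}_{j}"
                dfg[name] = sorted({producer[p] for p in (i, j) if p in producer})
                producer[i] = producer[j] = name
        span *= 2
    return dfg


def list_schedule(dfg, num_pes):
    """Greedy list scheduling: table[t][p] is the op run by PE p in slot t."""
    slot_of = {}
    table = []
    remaining = list(dfg)
    while remaining:
        # An op is ready once all its predecessors ran in an earlier slot.
        ready = [op for op in remaining if all(p in slot_of for p in dfg[op])]
        chosen = ready[:num_pes]
        for op in chosen:
            slot_of[op] = len(table)
        table.append(chosen)
        remaining = [op for op in remaining if op not in slot_of]
    return table


dfg = build_fft_dfg(8)
mapping_table = list_schedule(dfg, num_pes=16)  # 4x4 array as in the paper
for slot, ops in enumerate(mapping_table):
    print(slot, ops)
```

In this toy model each row of `mapping_table` plays the role of one entry in the configuration memory: it says what every PE does in that time slot. With fewer PEs the same DFG simply takes more slots.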
There is also a good set of slides about RA online, Introduction to CGRA.
It shows an array of RCs (reconfigurable cells).
Each RC is just like a tiny processor.
The whole system block diagram is more interesting. “Context Memory” plays the same role as the configuration (or instruction) memory described above: it holds the instructions for the RCs. Instructions can either be written by the “Tiny_RISC” core processor or be DMA-ed from main memory into the context memory (check the dotted control lines). The data to be processed initially resides in main memory. DMA moves it into the Frame Buffer for the RCs to process. The processed data is written back to the Frame Buffer and then DMA-ed back to main memory.
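The data path described above (main memory, DMA, Frame Buffer, RCs, and back) can be mimicked with a toy Python model. Everything here is an assumption made for illustration: the function names, the three made-up operations, and the simplification that every RC applies the same context word to its own Frame Buffer element.

```python
# Hypothetical context words; real RC instructions are far richer.
CONTEXT_OPS = {
    "add1": lambda v: v + 1,
    "double": lambda v: v * 2,
    "negate": lambda v: -v,
}


def dma_copy(src, src_off, dst, dst_off, count):
    """Model a DMA transfer as a block copy between two memories."""
    dst[dst_off:dst_off + count] = src[src_off:src_off + count]


def run_rc_array(context_memory, frame_buffer):
    """Apply each context word in order to every Frame Buffer element,
    as if one RC handled each element in lockstep."""
    for word in context_memory:
        op = CONTEXT_OPS[word]
        for i in range(len(frame_buffer)):
            frame_buffer[i] = op(frame_buffer[i])


def process(main_memory, offset, count, program):
    context_memory = list(program)  # loaded by the core or by DMA
    frame_buffer = [0] * count
    dma_copy(main_memory, offset, frame_buffer, 0, count)  # data in
    run_rc_array(context_memory, frame_buffer)              # RCs compute
    dma_copy(frame_buffer, 0, main_memory, offset, count)  # results out


main_memory = [1, 2, 3, 4, 5, 6]
process(main_memory, offset=2, count=3, program=["add1", "double"])
```

The point of the sketch is only the ordering: contexts are loaded first, data is moved in by DMA, the RC array runs, and the results are moved back out.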