FBLAS: Streaming Linear Algebra on FPGA


Spatial computing architectures pose an attractive alternative to mitigate control and data movement overheads typical of load-store architectures. In practice, these devices are rarely considered in the HPC community due to the steep learning curve, low productivity, and the lack of available libraries for fundamental operations. High-level synthesis (HLS) tools are facilitating hardware programming, but optimizing for these architectures requires factoring in new transformations and resources/performance trade-offs. We present FBLAS, an open-source HLS implementation of BLAS for FPGAs, that enables reusability, portability and easy integration with existing software and hardware codes. FBLAS' implementation allows scaling hardware modules to exploit on-chip resources, and module interfaces are designed to natively support streaming on-chip communications, allowing them to be composed to reduce off-chip communication. With FBLAS, we set a precedent for FPGA library design, and contribute to the toolbox of customizable hardware components necessary for HPC codes to start productively targeting reconfigurable platforms.

Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC20)