Keywords:
Computer Science
Abstract:
Many scientific applications rely on the evaluation of elementary functions (e.g. log(x), 1/x, √x, e^x). Software math libraries are a popular way to realize such functions, and frequently use series-expansion and/or lookup-table-based (LUT-based) methods. However, software approaches necessarily suffer from the traditional overheads of fetching and decoding instructions, limited cache sizes, and so on. To address this, we present hardware accelerators for such functions that deliver high computational throughput and high accuracy in a small circuit area. We implement the reciprocal (1/x) and square root (√x) functions as pipelined hardware accelerators on FPGA. The proposed accelerators use iterative and LUT-based algorithms. The LUT-based algorithm uses a LUT of approximately 1 KB together with degree-2 polynomial interpolation. All algorithms are specified in C and synthesized into RTL with the LegUp HLS tool. In an experimental study, we compare our LUT-based accelerators with IP cores from the FPGA vendor Intel/Altera. Results show that our LUT-based accelerators offer considerably better area usage, while the Intel/Altera IP cores operate at a modestly higher throughput. Both our accelerators and the Intel/Altera IP cores achieve 1 ULP error. The LUT-based algorithm is generic in the sense that it could be used to implement an entire library of single-precision elementary functions as high-performance hardware accelerators. We also implement a 32-bit integer RISC-V multi-cycle processor on FPGA, which supports 39 user instructions. The processor is specified in C and synthesized into RTL with the LegUp HLS tool. Custom test programs are developed to verify that each instruction adheres to the RISC-V specification. We demonstrate that, through changes to the C specification and HLS constraints, RISC-V processors with different performance/area trade-offs can be explored rapidly. One implementation of the multi-cycle processor uses 795 ALM
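
The LUT-based scheme mentioned above (a small table combined with degree-2 polynomial interpolation) can be illustrated with a minimal C sketch for 1/x on the interval [1, 2). This is not the code from the work itself: the names `SEGMENTS`, `coeff`, `recip_lut_init` and `recip_lut`, the choice of 64 segments, and the use of floating-point rather than hardware-oriented arithmetic are all assumptions made for illustration, and the sketch makes no claim to reach the 1 ULP accuracy reported.

```c
#include <assert.h>

/* Assumed segment count: 64 segments * 3 float coefficients = 768 bytes,
   i.e. a table on the order of 1 KB. */
#define SEGMENTS 64

/* Per-segment coefficients c0, c1, c2 of a local quadratic in t = x - m,
   where m is the segment midpoint. */
static float coeff[SEGMENTS][3];

/* Build the table: for each segment, fit the quadratic that passes through
   1/x at the left edge, the midpoint, and the right edge. */
void recip_lut_init(void)
{
    const double h = 0.5 / SEGMENTS;              /* half-width of a segment */
    for (int i = 0; i < SEGMENTS; i++) {
        double m  = 1.0 + (i + 0.5) / SEGMENTS;   /* segment midpoint */
        double fa = 1.0 / (m - h);                /* value at left edge */
        double fm = 1.0 / m;                      /* value at midpoint */
        double fb = 1.0 / (m + h);                /* value at right edge */
        coeff[i][0] = (float)fm;
        coeff[i][1] = (float)((fb - fa) / (2.0 * h));
        coeff[i][2] = (float)((fa + fb - 2.0 * fm) / (2.0 * h * h));
    }
}

/* Approximate 1/x for x in [1, 2): one table lookup plus a degree-2
   polynomial evaluated in Horner form. */
float recip_lut(float x)
{
    assert(x >= 1.0f && x < 2.0f);                /* sketch covers [1, 2) only */
    int i = (int)((x - 1.0f) * SEGMENTS);         /* segment index, 0..63 */
    float m = 1.0f + ((float)i + 0.5f) / SEGMENTS;
    float t = x - m;                              /* offset from the midpoint */
    const float *c = coeff[i];
    return c[0] + t * (c[1] + t * c[2]);
}
```

`recip_lut_init()` must be called once before `recip_lut()`. Inputs outside [1, 2) would first need range reduction (separating the exponent from the significand), which this sketch omits.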