FPGAs for HPC and data science
This testbed provides HPC code developers and data scientists with access to the latest data centre Field Programmable Gate Arrays (FPGAs), so that they can experiment with using this hardware to accelerate their codes. Whilst FPGAs have been around for many years and have enjoyed significant popularity in some fields, they have yet to gain traction in HPC. Arguably this is not for lack of trying; rather, the hardware was traditionally somewhat limited and the development tooling highly specialised and esoteric.
However, the past several years have seen very significant advances by the vendors: not only is the hardware far more capable, but significant investment has also been made in the programming ecosystem. This means that now, more than ever before, it is possible to view writing codes for FPGAs as software development rather than hardware design. We believe that FPGAs have an important potential role in accelerating HPC codes, and that to fully realise this potential the hardware should be made readily available to application developers.
It is our intention that this will be a first step towards building a future community and ecosystem around the role of FPGAs in HPC, data science, AI, and machine learning workloads in the UK. The project will also run a series of training events and workshops, and develop training material, to ensure the system is accessible and usable.
The testbed is physically based in EPCC’s Advanced Compute Facility and is made publicly available. It will form a unique resource within UK academic computing, as a single system that provides: access to next-generation Versal Adaptive Compute Acceleration Platform (ACAP) technology from Xilinx, which includes their revolutionary AI engines; a hierarchical memory provision, with high bandwidth memory (HBM2) and non-volatile memory (NVRAM) on some of the hosted hardware, allowing software developers and algorithm designers to investigate this emerging field in computing hardware; multiple networking options, including a high performance node-level network and direct FPGA-to-FPGA networking, to enable system designers and application developers to assess the relative merits of both approaches; and multiple families of FPGA, allowing users to evaluate a range of technologies.
An ecosystem for FPGA code development
We strongly believe that making this technology accessible requires more than the FPGA hardware itself: a full programming ecosystem must also be readily available and convenient for users. The testbed therefore provides the required toolchains and licences pre-installed, as well as hardware for building and testing FPGA codes before they are deployed to the FPGAs.
The system is hosted within an existing, established and modern HPC system that provides sufficient resources for developers to quickly and efficiently develop application kernels, synthesise their FPGA bitstreams and test their codes in emulation. Furthermore, research software engineering (RSE) effort is provided to develop an enabling software stack that should significantly reduce the barrier to entry for using FPGAs in scientific and data science applications.