

# Future of FPGA Compilation and Acceleration

Presented by Dr. Yuxin Wang, Senior R&D Engineer

Oct 16, 2018



## Simplifying acceleration on Xilinx FPGAs since 2014

Founded in 2014 by Dr. Jason Cong, Director of the Center for Domain-Specific Computing (CDSC) at UCLA





## **Falcon Acceleration Platform**



#### **Seamless Acceleration Solutions**

Vertical Specific Solutions, C/C++ FPGA Acceleration & Run-Time Environment

- ✓ Accelerated Genomics pipeline
- ✓ Fast C/C++ to FPGA accelerator development
- ✓ Accelerator management for heterogeneous compute clusters
- ✓ Heterogeneous Platform Support across CPUs and FPGAs today and GPUs in the future

#### Available on public & private clouds or on-premise










## **Acceleration Challenges**

#### **Software application developers**

- New-generation applications need acceleration hardware
- App Developers need a familiar programming paradigm for these accelerators



GENOMICS



MACHINE LEARNING





#### Hardware developers

- Teams that do have hardware expertise often lack the headcount needed to match the scale and agility of new application development
- They need greater productivity to meet business needs





## **Designing Accelerators is Difficult**

- C/C++ code first needs to be converted to OpenCL or HDL (Verilog/VHDL)
- Accelerators need to be optimized for performance



Matrix multiplication: 4 lines of C grow to more than 50 lines!
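For reference, the software starting point this slide alludes to might look like the sketch below: a naive matrix multiplication in plain C. The function name `matmul`, the row-major flat-array layout, and the `float` element type are assumptions made for illustration and do not come from the slides. The loop nest itself is only a handful of lines; an OpenCL port typically surrounds it with host-side buffer setup and kernel launch code, which is where the extra lines come from.

```c
#include <stddef.h>

/* Naive matrix multiplication for n x n matrices stored row-major:
 * c = a * b. All names here are hypothetical, chosen for illustration. */
void matmul(size_t n, const float *a, const float *b, float *c) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}
```

Calling `matmul(2, a, b, c)` with `a = {1, 2, 3, 4}` and `b = {5, 6, 7, 8}` fills `c` with `{19, 22, 43, 50}`.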





## **Merlin Compiler Overview**



#### C/C++ FPGA Acceleration

- Pure C/C++ flow
- No FPGA expertise required
- OpenMP programming model with auto-generated 'ACCEL' pragmas
- Automatically performs source-to-source optimization for accelerated performance
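The pragma-based model described above can be pictured with a small sketch. The kernel below is hypothetical, and the pragma spellings (`ACCEL kernel`, `ACCEL parallel factor=16`) are assumptions based on the 'ACCEL' name on this slide; they should be checked against the Merlin Compiler documentation before use. The point is only that the source stays ordinary C: a standard C compiler ignores pragmas it does not recognize (possibly with a warning), so the same file still builds and runs on a CPU.

```c
#define VEC_LEN 1024  /* assumed vector length, for illustration only */

/* Hypothetical kernel: element-wise vector addition.
 * The ACCEL pragmas below are assumed spellings, not verified syntax.
 * The first marks the function as the code to offload; the second asks
 * for the loop to be parallelized by a factor of 16. */
#pragma ACCEL kernel
void vec_add(const int *a, const int *b, int *c) {
#pragma ACCEL parallel factor=16
    for (int i = 0; i < VEC_LEN; i++)
        c[i] = a[i] + b[i];
}
```

In the flow the slide describes, such pragmas would be generated automatically or inserted by hand, rather than requiring a rewrite into OpenCL or HDL.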





### Merlin Compiler C++ FPGA Accelerator Flow – 1 2 3







## Software Application Developers can leverage FPGA acceleration with "as-is" code using Merlin

- 5-15X Performance Acceleration over CPUs
- Minimal code changes for acceleration benefits
- Single Acceleration platform across multiple vendor FPGAs & diverse apps such as Computer Vision, ML, Genomics

**Out-of-the-box Performance** 

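| Example Designs         | CPU<br>(ms) | Merlin<br>FPGA<br>(ms) | Merlin<br>Speedup |
|-------------------------|-------------|------------------------|-------------------|
| Black Scholes Asian     | 477370      | 34430                  | 13.9X             |
| Black Scholes European  | 116310      | 8590                   | 13.5X             |
| Heston European         | 341650      | 34430                  | 10X               |
| Heston European Barrier | 38630       | 17220                  | 2.2X              |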

## Hardware Developers can increase their productivity with Merlin

- 6-10X gain in productivity with equivalent performance
- Machine-learning based auto-generated or manually inserted pragmas
- Quality of results comparable to manually optimized OpenCL implementation

#### **Optimized Performance**

| Example Designs        | CPU<br>(ms) | Merlin<br>FPGA<br>(ms) | Manual<br>OpenCL<br>(ms) | Merlin<br>Speedup | Merlin<br>Productivity<br>Saving* |
|------------------------|-------------|------------------------|--------------------------|-------------------|-----------------------------------|
| Black Scholes Asian    | 477370      | 920                    | 830                      | 519X              | 7.8X                              |
| Black Scholes European | 116310      | 230                    | 230                      | 506X              | 6.8X                              |
| Heston European        | 341650      | 1470                   | 1530                     | 232X              | 7.1X                              |
| Heston European Barrier | 38630      | 690                    | 750                      | 56X               | 6.4X                              |

\* Productivity is measured as time saved compared with manually rewriting the C++ into hand-optimized OpenCL kernel and host CPU code.

Platform: AWS F1, Xilinx VU9P
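The Merlin Speedup column in the table above is consistent with dividing the CPU time by the Merlin FPGA time and rounding to the nearest integer. The helper below is a minimal sketch of that arithmetic; the function name `speedup` is an assumption for illustration, and the slides do not state how the figures were actually computed.

```c
/* Speedup of an accelerated run over the CPU baseline.
 * Both arguments are wall-clock times in milliseconds. */
double speedup(double cpu_ms, double fpga_ms) {
    return cpu_ms / fpga_ms;
}
```

For example, `speedup(477370.0, 920.0)` evaluates to about 518.9, which rounds to the 519X reported for Black Scholes Asian.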










## **Merlin Compiler Automated Design Space Exploration**










https://www.falconcomputing.com/





# Acceleration simplified across multiple applications regardless of hardware experience



- Merlin Compiler
- Expert Services
- Kestrel Runtime
- Genomics



## Adaptable. Intelligent.





