The Xilinx® Vitis™ AI Library is a set of high-level libraries and APIs built for efficient AI inference with a Deep Learning Processor Unit (DPU). It is built on the Vitis AI Runtime with unified APIs and fully supports XRT 2019.2.
The Vitis AI Library provides an easy-to-use and unified interface by encapsulating many efficient and high-quality neural networks. This interface simplifies the use of deep-learning neural networks and allows you to focus on developing your applications rather than the underlying hardware.
AI object detection algorithms are widely used in advanced driver-assistance systems (ADAS), medical, and smart city domains. Single-shot object detectors are the most popular class of algorithms, and two widely used examples are SSD and YOLO. Single Shot Detector (SSD) is a neural network model that is easy to train and retains good accuracy even at smaller input image sizes. You Only Look Once (YOLO) is a state-of-the-art, real-time object detection system. Within the YOLO series, YOLOv3 is extremely fast and is better at detecting small objects.
The Vitis AI Library supports both SSD and YOLO model deployment and provides optimized pre-processing and post-processing libraries to improve end-to-end (E2E) performance. If you have your own SSD or YOLO model with standard pre- and post-processing, you can deploy it using the Vitis AI Library.
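Once a model is deployed, the application side stays small. The following is a minimal sketch of what an SSD application might look like with the Vitis AI Library; the header path (xilinx/ai/ssd.hpp) and the result fields (bboxes with label, score, and relative coordinates) are assumptions based on the library's sample code, so verify them against the headers shipped with your release.
#include <iostream>
#include <opencv2/opencv.hpp>
#include <xilinx/ai/ssd.hpp>  // assumed header location (see /usr/include/xilinx/ai)

int main(int argc, char* argv[]) {
  if (argc < 3) {
    std::cerr << "usage: " << argv[0] << " <model_name> <image>" << std::endl;
    return 1;
  }
  // Create the model by name; the library looks up <model_name>.elf, the
  // .prototxt, and meta.json under /usr/share/vitis_ai_library/models/<model_name>/.
  auto ssd = xilinx::ai::SSD::create(argv[1]);
  auto image = cv::imread(argv[2]);
  if (image.empty()) {
    std::cerr << "cannot load " << argv[2] << std::endl;
    return 1;
  }
  // run() performs pre-processing, DPU inference, and post-processing.
  auto result = ssd->run(image);
  for (const auto& box : result.bboxes) {
    // Coordinates are assumed to be relative to the input image size.
    std::cout << "label=" << box.label << " score=" << box.score
              << " x=" << box.x << " y=" << box.y
              << " w=" << box.width << " h=" << box.height << std::endl;
  }
  return 0;
}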
This article provides a detailed walkthrough of quickly deploying an SSD or YOLOv3 model on a Xilinx UltraScale+™ MPSoC device using the Vitis AI Library. The latest Vitis AI Library is available as part of the Xilinx Vitis AI package: https://github.com/Xilinx/Vitis-AI.
For information about the Vitis™ DPU integration flow, refer to the following documentation:
You can customize your own Vitis AI Library Debian installation package based on vitis_ai_library_2019.2-r1.0.deb, which can be downloaded from https://github.com/Xilinx/Vitis-AI/tree/master/Vitis-AI-Library.
To customize the installation package:
1. Extract vitis_ai_library_2019.2-r1.0.deb to the vitis_ai_library folder on the host:
$mkdir vitis_ai_library
$dpkg -X vitis_ai_library_2019.2-r1.0.deb vitis_ai_library
2. Extract the Debian package information file on the host:
$mkdir vitis_ai_library/DEBIAN
$dpkg -e vitis_ai_library_2019.2-r1.0.deb vitis_ai_library/DEBIAN
Note: The Debian package information can be modified by editing the control and preinst files.
3. Customize the Debian package by removing the following unnecessary folders and files in the vitis_ai_library folder:
Note: You can view the sample list of files in the “Libraries and Samples” chapter in the Vitis AI Library User Guide (UG1354).
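For illustration only, trimming the package might look like the following. The paths below are hypothetical placeholders; the actual list of removable sample, test, and model files is the one documented in UG1354.
$rm -rf vitis_ai_library/usr/share/vitis_ai_library/samples
$rm -rf vitis_ai_library/usr/share/vitis_ai_library/test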
4. Re-package:
$dpkg -b vitis_ai_library vitis_ai_library.deb
Note: The size of the new vitis_ai_library.deb file decreases to 11 MB from the original 123 MB.
5. Install the package on the target board. To do this, copy the vitis_ai_library.deb file to the board via scp and install it:
$dpkg -i vitis_ai_library.deb
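The dpkg -i command runs on the board. To get the file there, assuming the board is reachable at 192.168.0.10 (a placeholder address) and you log in as root, the copy step from the host might look like this:
$scp vitis_ai_library.deb root@192.168.0.10:~/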
To set up the host:
You can convert your own SSD float model to an .elf file using the Vitis AI tools docker, and then generate the executable program using the Vitis AI runtime docker to run it on the board. In this example, your SSD model is based on the Caffe deep learning framework and is called ssd_user.
1. Copy the following to your SSD float model project folder:
The docker_run.sh file in the Vitis-AI folder
The script files in the Vitis-AI/mpsoc/vitis-ai-tool-example/ folder
2. Load the Vitis AI tools docker and enter the conda environment as follows:
$ ./docker_run.sh xilinx/vitis-ai:tools-1.0.0-gpu
/workspace$ conda activate vitis-ai-caffe
3. Modify and run the quantization and compilation script to convert the float model to ssd_user.elf as follows:
/workspace$ bash 1_caffe_quantize.sh
/workspace$ bash 2_caffe_compile.sh
Note: Add the --options "{'mode': 'normal'}" parameter in the 2_caffe_compile.sh script to generate the .elf file in normal mode.
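As a reference, 2_caffe_compile.sh typically wraps the vai_c_caffe compiler. The following is a hedged sketch of such a script; the quantizer output paths and the DPU arch file (a ZCU102 DPUv2 target is assumed here) are placeholders that must match your own environment and board:
vai_c_caffe --prototxt quantize_results/deploy.prototxt \
            --caffemodel quantize_results/deploy.caffemodel \
            --arch /opt/vitis_ai/compiler/arch/dpuv2/ZCU102/ZCU102.json \
            --net_name ssd_user \
            --output_dir compile_results \
            --options "{'mode': 'normal'}"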
1. Create an ssd_user.prototxt file and add the content based on your SSD model parameters.
The following shows ssd_mobilenet_v2 as a reference.
model {
  name : "ssd_mobilenet_v2_480x360"
  kernel {
    name: "ssd_mobilenet_v2_480x360"
    mean: 104.0
    mean: 117.0
    mean: 123.0
    scale: 1.0
    scale: 1.0
    scale: 1.0
  }
  model_type : SSD
  ssd_param : {
    num_classes : 11
    nms_threshold : 0.4
    conf_threshold : 0.0
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    conf_threshold : 0.3
    keep_top_k : 200
    top_k : 400
    prior_box_param {
      layer_width : 60,
      layer_height: 45,
      variances: 0.1
      variances: 0.1
      variances: 0.2
      variances: 0.2
      min_sizes: 15.0
      min_sizes: 30.0
      max_sizes: 33.0
      max_sizes: 60.0
      aspect_ratios: 2.0
      offset: 0.5
      step_width: 8.0
      step_height: 8.0
      flip: true
      clip: false
    }
    prior_box_param {
      layer_width : 30,
      layer_height: 23,
      variances: 0.1
      variances: 0.1
      variances: 0.2
      variances: 0.2
      min_sizes: 66.0
      max_sizes: 127.0
      aspect_ratios: 2.0
      aspect_ratios: 3.0
      offset: 0.5
      step_width: 16.0
      step_height: 16.0
      flip: true
      clip: false
    }
    prior_box_param {
      layer_width : 15,
      layer_height: 12,
      variances: 0.1
      variances: 0.1
      variances: 0.2
      variances: 0.2
      min_sizes: 127.0
      max_sizes: 188.0
      aspect_ratios: 2.0
      aspect_ratios: 3.0
      offset: 0.5
      step_width: 32.0
      step_height: 32.0
      flip: true
      clip: false
    }
    prior_box_param {
      layer_width : 8,
      layer_height: 6,
      variances: 0.1
      variances: 0.1
      variances: 0.2
      variances: 0.2
      min_sizes: 188.0
      max_sizes: 249.0
      aspect_ratios: 2.0
      aspect_ratios: 3.0
      offset: 0.5
      step_width: 64.0
      step_height: 64.0
      flip: true
      clip: false
    }
    prior_box_param {
      layer_width: 4,
      layer_height: 3,
      variances: 0.1
      variances: 0.1
      variances: 0.2
      variances: 0.2
      min_sizes: 249.0
      max_sizes: 310.0
      aspect_ratios: 2.0
      offset: 0.5
      step_width: 100.0
      step_height: 100.0
      flip: true
      clip: false
    }
  }
}
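In each prior_box_param block, layer_width and layer_height are the spatial dimensions of the corresponding detection feature map, which for this 480x360 input roughly follow the input size divided by the step (for example, 480/8 = 60 and 360/8 = 45 for the first block, and 480/16 = 30 and 360/16 ≈ 23 for the second). If your model uses a different input resolution or feature map strides, recompute these values accordingly. Also note that the example lists one conf_threshold entry per class (11 entries for num_classes : 11).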
2. Create a meta.json file and add the content as follows.
{
"target": "DPUv2",
"lib": "libvart_dpu.so",
"filename": "ssd_user.elf",
"kernel": [ "ssd_user" ],
"config_file": "ssd_user.prototxt"
}
3. Save the ssd_user.elf, ssd_user.prototxt, and meta.json files to a model folder called ssd_user.
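After this step, the model folder should look like the following, with the folder and file names matching the ssd_user name referenced in meta.json:
ssd_user/
    ssd_user.elf
    ssd_user.prototxt
    meta.json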
1. Load the Vitis AI runtime docker in the Vitis-AI folder:
./docker_run.sh xilinx/vitis-ai:runtime-1.0.0-cpu
2. Run the build script in the Vitis-AI-Library/samples/ssd/ folder:
/workspace$ sh -x build.sh
The executable program test_jpeg_ssd is generated.
1. Create a vitis_ai_library/models/ folder under /usr/share/ on the target board.
2. Use the scp command to copy the ssd_user model folder to the vitis_ai_library/models/ folder you just created, and copy the test_jpeg_ssd executable and a test image to the board.
3. Run the executable program as follows.
$./test_jpeg_ssd ssd_user sample_image.jpg
The result is saved as sample_image_result.jpg.
You can convert your own YOLOv3 float model to an .elf file using the Vitis AI tools docker and then generate the executable program with the Vitis AI runtime docker to run it on the board. In this example, your YOLOv3 model is based on the Caffe framework and is named yolov3_user.
1. Copy the following to your YOLOv3 float model project folder:
The docker_run.sh file in the Vitis-AI folder
The script files in the Vitis-AI/mpsoc/vitis-ai-tool-example/ folder
2. Load the Vitis AI tools docker and enter the conda environment as follows:
$ ./docker_run.sh xilinx/vitis-ai:tools-1.0.0-gpu
/workspace$ conda activate vitis-ai-caffe
3. Modify and run the quantization and compilation script to convert the float model to yolov3_user.elf as follows:
/workspace$ bash 1_caffe_quantize.sh
/workspace$ bash 2_caffe_compile.sh
Note: Add the --options "{'mode': 'normal'}" parameter in the 2_caffe_compile.sh script to generate the .elf file in normal mode.
1. Create a yolov3_user.prototxt file.
2. Add the content based on your YOLOv3 model parameters. The following shows yolov3_coco as a reference.
model {
  name: "yolov3_coco_608"
  kernel {
    name: "yolov3_coco_608"
    mean: 0.0
    mean: 0.0
    mean: 0.0
    scale: 0.00390625
    scale: 0.00390625
    scale: 0.00390625
  }
  model_type : YOLOv3
  yolo_v3_param {
    num_classes: 80
    anchorCnt: 3
    conf_threshold: 0.05
    nms_threshold: 0.45
    biases: 10
    biases: 13
    biases: 16
    biases: 30
    biases: 33
    biases: 23
    biases: 30
    biases: 61
    biases: 62
    biases: 45
    biases: 59
    biases: 119
    biases: 116
    biases: 90
    biases: 156
    biases: 198
    biases: 373
    biases: 326
    test_mAP: false
  }
}
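The biases values are the YOLOv3 anchor-box sizes taken from the anchors line of the darknet .cfg file, listed as width/height pairs (10,13 16,30 ... 373,326 for COCO), with anchorCnt anchors per output layer. When deploying your own model, replace these with your model's anchors and set num_classes accordingly.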
3. Create a meta.json file and add the content as follows:
{
"target": "DPUv2",
"lib": "libvart_dpu.so",
"filename": "yolov3_user.elf",
"kernel": [ "yolov3_user" ],
"config_file": "yolov3_user.prototxt"
}
4. Put the yolov3_user.elf, yolov3_user.prototxt, and meta.json files in a yolov3_user model folder.
1. Load the Vitis AI runtime docker in the Vitis-AI folder:
./docker_run.sh xilinx/vitis-ai:runtime-1.0.0-cpu
2. Run the build script in the Vitis-AI-Library/samples/yolov3/ folder:
/workspace$ sh -x build.sh
The executable program test_jpeg_yolov3 is generated.
To test the YOLOv3 executable program on the target board:
1. Create a vitis_ai_library/models/ folder under /usr/share/ on the target board.
2. Use the scp command to copy the yolov3_user model folder to the vitis_ai_library/models/ folder you just created, and copy the test_jpeg_yolov3 executable and a test image to the board.
3. Run the executable program as follows:
$./test_jpeg_yolov3 yolov3_user sample_image.jpg
The result is saved as sample_image_result.jpg.
You can also measure the pre-/post-processing and DPU time by modifying the demo.hpp file in the /opt/vitis_ai/petalinux_sdk/sysroots/aarch64-xilinx-linux/usr/include/xilinx/ai folder of the Vitis AI runtime docker environment as follows (after modifying the header, rebuild the sample with build.sh and copy the new executable to the board):
// Define the environment parameter that controls DPU time accounting (default off).
DEF_ENV_PARAM(DEEPHI_DPU_CONSUMING_TIME, "0");

int main_for_jpeg_demo(int argc, char *argv[],
                       const FactoryMethod &factory_method,
                       const ProcessResult &process_result,
                       int start_pos = 1) {
  if (argc <= 1) {
    usage_jpeg(argv[0]);
    exit(1);
  }
  // Enable DPU time accounting for this run.
  ENV_PARAM(DEEPHI_DPU_CONSUMING_TIME) = 1;
  auto model = factory_method();
  for (int i = start_pos; i < argc; ++i) {
    auto image_file_name = std::string{argv[i]};
    auto image = cv::imread(image_file_name);
    if (image.empty()) {
      LOG(FATAL) << "cannot load " << image_file_name << std::endl;
      abort();
    }
    // Reset the per-thread DPU timer, then time the complete run() call.
    xilinx::ai::TimeMeasure::getThreadLocalForDpu().reset();
    auto start = std::chrono::steady_clock::now();
    auto result = model->run(image);
    auto end = std::chrono::steady_clock::now();
    auto end2endtime =
        int(std::chrono::duration_cast<std::chrono::microseconds>(end - start).count());
    auto dputime = xilinx::ai::TimeMeasure::getThreadLocalForDpu().get();
    // Times are printed in microseconds (this assumes TimeMeasure also accumulates microseconds).
    std::cout << "E2E time: " << end2endtime << " us" << std::endl;
    std::cout << "DPU time: " << dputime << " us" << std::endl;
    image = process_result(image, result, true);
    auto out_file =
        image_file_name.substr(0, image_file_name.size() - 4) + "_result.jpg";
    cv::imwrite(out_file, image);
    LOG_IF(INFO, ENV_PARAM(DEBUG_DEMO)) << "result image write to " << out_file;
  }
  LOG_IF(INFO, ENV_PARAM(DEBUG_DEMO)) << "BYEBYE";
  return 0;
}
The following documentation provides additional context:
Jian Zhou is based in Beijing, China, and serves as a Machine Learning Specialist Field Applications Engineer (FAE) on the Greater China sales team. Before joining AMD, Jian Zhou worked at DeePhi as a marketing specialist FAE for AI solution support, and he has more than 10 years of experience in embedded system development. He holds an MS degree and is experienced in AI model deployment with C++ and Python.