Dataflow Bypassing Tasks

Description

This message reports that the dataflow region violates the canonical form, which may lead to performance degradation.

Explanation

The DATAFLOW optimization expects a specific structure of function calls and data transfers, known as the canonical form. It consists of a sequence of function calls that exchange data only in the forward direction, each from one function to the one that immediately follows it. A dataflow region in which a data transfer skips over one or more functions does not comply with the canonical form, and this can reduce performance.

This loss of performance can also be seen in the Dataflow Viewer after running co-simulation (cosim). See Dataflow Viewer for more information.

One such violation is a bypassed task: data produced by one task skips the task that immediately follows and is consumed by a later task.
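
For contrast, here is a minimal sketch of a region that does follow the canonical form: each intermediate array is written by one task and read only by the task that immediately follows it. The names canonical_top, scale_stage, offset_stage, store_stage, and check_canonical_top, and the size N = 8, are hypothetical and chosen only for illustration. The HLS pragma is ignored by an ordinary C++ compiler, so the sketch can be run as plain software.

```cpp
#include <cassert>

const int N = 8; // hypothetical size, chosen for illustration

// Task 1: the only reader of data_in and the only writer of temp1.
static void scale_stage(const int data_in[N], int scale, int temp1[N]) {
    for (int i = 0; i < N; i++) {
        temp1[i] = data_in[i] * scale;
    }
}

// Task 2: consumes temp1 from task 1 and produces temp2 for task 3.
static void offset_stage(const int temp1[N], int temp2[N]) {
    for (int i = 0; i < N; i++) {
        temp2[i] = temp1[i] + 123;
    }
}

// Task 3: consumes temp2 from task 2 and is the only writer of data_out.
static void store_stage(const int temp2[N], int data_out[N]) {
    for (int i = 0; i < N; i++) {
        data_out[i] = temp2[i];
    }
}

// Every transfer goes from one task to the next one; nothing is bypassed.
void canonical_top(const int data_in[N], int scale, int data_out[N]) {
#pragma HLS DATAFLOW
    int temp1[N], temp2[N];
    scale_stage(data_in, scale, temp1);
    offset_stage(temp1, temp2);
    store_stage(temp2, data_out);
}

// Software-only check (not meant for synthesis): compares the region
// against the same computation written directly.
void check_canonical_top() {
    int in[N], out[N];
    for (int i = 0; i < N; i++) {
        in[i] = i;
    }
    canonical_top(in, 2, out);
    for (int i = 0; i < N; i++) {
        assert(out[i] == i * 2 + 123);
    }
}
```

Calling check_canonical_top() from a C testbench exercises the chain in software; it says nothing about the achieved hardware throughput.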

Example

In the following example, Loop1 generates the values for temp1 and temp2. However, the next task, Loop2, only uses the value of temp1. The value of temp2 is not consumed until after Loop2. Therefore, temp2 bypasses the next task in the sequence, which can limit the performance of the DATAFLOW optimization.

void foo(int data_in[N], int scale, int data_out[N])
{
#pragma HLS DATAFLOW
    int temp1[N], temp2[N], temp3[N];

    // Loop1 produces both temp1 and temp2.
    Loop1: for(int i = 0; i < N; i++)
    {
        temp1[i] = data_in[i] * scale;
        temp2[i] = data_in[i] >> scale;
    }
    // Loop2 consumes only temp1, so temp2 bypasses this task.
    Loop2: for(int j = 0; j < N; j++)
    {
        temp3[j] = temp1[j] + 123;
    }
    // Loop3 consumes temp2 (from Loop1) and temp3 (from Loop2).
    Loop3: for(int k = 0; k < N; k++)
    {
        data_out[k] = temp2[k] + temp3[k];
    }
}

The bypass rule of the dataflow canonical form also applies to inputs and outputs. Inputs should be read by the first function and outputs should be written by the last function of the dataflow region.

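Input Bypass

In the following example, the input b is first read by double_pass, the second function in the region, so it bypasses the first function, pass.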

void add_kernel(int tmp1[128], int tmp2[128], int tmp3[128])
{
 
  for(int i=0;i<128;i++)
    {
      tmp3[i] = tmp1[i] + tmp2[i];
    }
 
}
 
void double_pass(int b[128], int tmp2[128], int tmp1[128], int tmp4[128])
{
   
  for(int i=0;i<128;i++)
    {
      tmp2[i] = b[i];
      tmp4[i] = tmp1[i];
    }
   
}
 
void pass(int a[128], int tmp1[128])
{
  for(int i=0;i<128;i++)
    {
      tmp1[i] = a[i];
    }
}
 
void dut(int a[128], int b[128], int c[128])
{
#pragma HLS DATAFLOW
 
  int tmp1[128], tmp2[128], tmp3[128];
   
  pass(a,tmp1);
  double_pass(b, tmp2, tmp1, tmp3);
  add_kernel(tmp3, tmp2, c);
   
}
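
The output side of the rule can be sketched in the same way. In the hypothetical region below, the first task produces a second result that no later task modifies. Writing that result to the region output directly from the first task would bypass the tasks that follow, so this sketch forwards it through the middle task and lets the last task write both outputs. All names (dut_two_outputs, stage1, stage2, stage3, early1, early2, LEN) are assumptions made for illustration, and the extra copies cost additional buffer storage.

```cpp
#include <cassert>

const int LEN = 128; // hypothetical array length

// First task: the only reader of the region input.
void stage1(const int in[LEN], int t1[LEN], int early1[LEN]) {
    for (int i = 0; i < LEN; i++) {
        t1[i] = in[i] + 1;
        early1[i] = in[i] * 2; // result that is already final here
    }
}

// Middle task: processes t1 and forwards early1 unchanged.
void stage2(const int t1[LEN], const int early1[LEN],
            int t2[LEN], int early2[LEN]) {
    for (int i = 0; i < LEN; i++) {
        t2[i] = t1[i] * 3;
        early2[i] = early1[i];
    }
}

// Last task: the only writer of both region outputs.
void stage3(const int t2[LEN], const int early2[LEN],
            int out[LEN], int out_early[LEN]) {
    for (int i = 0; i < LEN; i++) {
        out[i] = t2[i];
        out_early[i] = early2[i];
    }
}

void dut_two_outputs(const int in[LEN], int out[LEN], int out_early[LEN]) {
#pragma HLS DATAFLOW
    int t1[LEN], t2[LEN], early1[LEN], early2[LEN];
    stage1(in, t1, early1);
    stage2(t1, early1, t2, early2);
    stage3(t2, early2, out, out_early);
}

// Software-only check (not meant for synthesis).
void check_dut_two_outputs() {
    int in[LEN], out[LEN], out_early[LEN];
    for (int i = 0; i < LEN; i++) {
        in[i] = i;
    }
    dut_two_outputs(in, out, out_early);
    for (int i = 0; i < LEN; i++) {
        assert(out[i] == (i + 1) * 3);
        assert(out_early[i] == i * 2);
    }
}
```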

Solution

Consider the following solutions based on the design.

Copy Buffer

Copy temp2 into an additional buffer, temp4, inside Loop2, and read temp4 instead of temp2 in Loop3, as shown below. Every transfer then goes from one task to the task that immediately follows it:

void foo(int data_in[N], int scale, int data_out[N])
{
#pragma HLS DATAFLOW
    int temp1[N], temp2[N], temp3[N], temp4[N];

    Loop1: for(int i = 0; i < N; i++)
    {
        temp1[i] = data_in[i] * scale;
        temp2[i] = data_in[i] >> scale;
    }
    Loop2: for(int j = 0; j < N; j++)
    {
        temp3[j] = temp1[j] + 123;
        temp4[j] = temp2[j];    // forward temp2 through Loop2
    }
    Loop3: for(int k = 0; k < N; k++)
    {
        data_out[k] = temp4[k] + temp3[k];
    }
}
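
The same copy technique might also be applied to the Input Bypass example, where b is read by the second function. The sketch below is one possible variant, not a recommended implementation: the first function reads both inputs and forwards b through an extra buffer. The names pass_both, tmp_b, dut_copy, and check_dut_copy are hypothetical; double_pass and add_kernel keep the bodies from the original example, with const added to their read-only arguments.

```cpp
#include <cassert>

// First task: now the only reader of the region inputs a and b.
// It forwards b to the next task through the hypothetical buffer tmp_b.
void pass_both(const int a[128], const int b[128],
               int tmp1[128], int tmp_b[128]) {
    for (int i = 0; i < 128; i++) {
        tmp1[i] = a[i];
        tmp_b[i] = b[i];
    }
}

// Second task: same body as the original, but its first argument is now
// the forwarded copy of b rather than the region input itself.
void double_pass(const int b[128], int tmp2[128],
                 const int tmp1[128], int tmp4[128]) {
    for (int i = 0; i < 128; i++) {
        tmp2[i] = b[i];
        tmp4[i] = tmp1[i];
    }
}

// Last task: the only writer of the region output.
void add_kernel(const int tmp1[128], const int tmp2[128], int tmp3[128]) {
    for (int i = 0; i < 128; i++) {
        tmp3[i] = tmp1[i] + tmp2[i];
    }
}

void dut_copy(const int a[128], const int b[128], int c[128]) {
#pragma HLS DATAFLOW
    int tmp1[128], tmp_b[128], tmp2[128], tmp3[128];

    pass_both(a, b, tmp1, tmp_b);
    double_pass(tmp_b, tmp2, tmp1, tmp3);
    add_kernel(tmp3, tmp2, c);
}

// Software-only check (not meant for synthesis): c must equal a + b.
void check_dut_copy() {
    int a[128], b[128], c[128];
    for (int i = 0; i < 128; i++) {
        a[i] = i;
        b[i] = 2 * i;
    }
    dut_copy(a, b, c);
    for (int i = 0; i < 128; i++) {
        assert(c[i] == 3 * i);
    }
}
```

The trade-off is one more 128-element buffer and one more copy per element, so this variant fits best when b really does change between executions and cannot be marked stable.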

Stable Array

The stable pragma marks input or output variables of a dataflow region. Apply it only to variables whose values do not change while the dataflow region is executing; they may change only when the region is not running.

In the Input Bypass example, the input b stays constant across all iterations, so it can be marked as stable:

void add_kernel(int tmp1[128], int tmp2[128], int tmp3[128])
{
 
  for(int i=0;i<128;i++)
    {
      tmp3[i] = tmp1[i] + tmp2[i];
    }
 
}
 
void double_pass(int b[128], int tmp2[128], int tmp1[128], int tmp4[128])
{
   
  for(int i=0;i<128;i++)
    {
      tmp2[i] = b[i];
      tmp4[i] = tmp1[i];
    }
   
}
 
void pass(int a[128], int tmp1[128])
{
  for(int i=0;i<128;i++)
    {
      tmp1[i] = a[i];
    }
}
 
void dut(int a[128], int b[128], int c[128])
{
#pragma HLS DATAFLOW
#pragma HLS stable variable=b
  int tmp1[128], tmp2[128], tmp3[128];
   
  pass(a,tmp1);
  double_pass(b, tmp2, tmp1, tmp3);
  add_kernel(tmp3, tmp2, c);
   
}