What is it?
- A C-like, almost hardware description language
- A compiler that produces VHDL for specific devices/operating frequencies
I am looking for:
- anyone who wants to help me develop (Python, VHDL, C)
- suggestions on how to make PipelineC more useful/new features
- project ideas (heyo open source folks)
In the meantime, I am also here to share my most interesting example so far: using PipelineC with an AWS F1 instance.
I have made an AMI that you can use to play around with. However, it cannot be made public; I can only share it with specific AWS accounts, so please message me if interested.
I want to share with you why I think PipelineC is particularly powerful:
First, it can mostly replace VHDL/Verilog for describing low level, clock by clock, hardware control logic. Consider the following generic VHDL:
    -- Combinatorial logic with a storage register
    signal the_reg : some_type_t;
    signal the_wire : some_type_t;

    process(input, the_reg) is -- inputs sync to clk
      variable input_variable : some_type_t;
      variable the_reg_variable : some_type_t;
    begin
      input_variable := input;
      the_reg_variable := the_reg;
      -- ... Do work with 'input_variable', 'the_reg_variable'
      -- and other variables, functions, etc and it kinda looks like C ...
      the_wire <= the_reg_variable;
    end process;
    the_reg <= the_wire when rising_edge(clk);
    output <= the_wire;
The equivalent PipelineC is
    some_type_t some_func_name(some_type_t input)
    {
      // ... Do work with 'input', 'the_reg'
      // ... and other variables, functions, etc...
    }
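To make the "one function call is one clock cycle" idea concrete, here is a minimal sketch in plain C that gcc will compile, assuming a simple running-sum register. The names `accum_reg` and `accumulate` are made up for illustration and are not taken from the PipelineC sources; the static variable stands in for 'the_reg'.

```c
#include <stdint.h>

// Hypothetical example: this static variable plays the role of 'the_reg'.
static uint32_t accum_reg = 0;

// One call models one clock cycle: combinatorial work on the input and the
// current register value, then the register is updated for the next call.
uint32_t accumulate(uint32_t input)
{
    uint32_t the_wire = accum_reg + input; // combinatorial logic
    accum_reg = the_wire;                  // value stored at the clock edge
    return the_wire;                       // output driven from the wire
}
```

Calling `accumulate(1)`, then `accumulate(2)`, then `accumulate(4)` returns 1, 3 and 7, the same sequence the VHDL above would put on `output` over three clocks if the work step were an add.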
Using that functionality I was able to write very RTL-esque serialize+deserialize logic for the AXI4 interface that the AWS F1 shell logic provides to 'customer logic' for DMA. The AXI4 data is deserialized into a stream of 4096 byte input data chunks that can be processed by a 'work' function.
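As a rough illustration of what clock-by-clock deserializing looks like when written as C, here is a sketch that is not the actual PipelineC source. It assumes one 64 byte (512 bit) data beat arrives per call and ignores AXI4 bursts, addresses and write strobes entirely; `chunk_t`, `deserialize_beat` and both size constants are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

#define BEAT_BYTES 64     // assumed width of one data beat (512 bits)
#define CHUNK_BYTES 4096  // chunk size handed to the 'work' function

typedef struct {
    uint8_t data[CHUNK_BYTES];
    int valid; // 1 only on the call that completes a chunk
} chunk_t;

// Hypothetical deserializer: call once per clock with one beat of data.
chunk_t deserialize_beat(const uint8_t beat[BEAT_BYTES], int beat_valid)
{
    static uint8_t buffer[CHUNK_BYTES]; // storage registers
    static uint32_t beats_seen = 0;     // position counter register
    chunk_t out;
    out.valid = 0;
    memset(out.data, 0, CHUNK_BYTES);
    if (beat_valid) {
        // Store this beat at its position within the chunk
        memcpy(&buffer[beats_seen * BEAT_BYTES], beat, BEAT_BYTES);
        beats_seen = beats_seen + 1;
        if (beats_seen == CHUNK_BYTES / BEAT_BYTES) {
            // Last beat of the chunk: present the whole chunk as valid
            memcpy(out.data, buffer, CHUNK_BYTES);
            out.valid = 1;
            beats_seen = 0;
        }
    }
    return out;
}
```

With these sizes, `valid` goes high on every 64th valid beat, which is when a 'work' function would be handed its 4096 bytes.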
I find that most HLS tools have trouble giving the user this sort of low level control, probably under the assumption that it's too low level and not meant for software folks to be concerned with. Most hardware description languages are built for exactly
Second, PipelineC can replace the most basic feature of other HLS tools: auto-pipelining functions.
This AWS example sums 1024 floating point values via an N clock cycle pipelined binary tree of 1023 floating point adders (soft logic, not hard cores yet).
Below is the PipelineC code:
    float work(float inputs[1024])
    {
      // All the nodes of the tree in arrays so can be written using loops
      // ~log2(N) levels, max of N values in parallel
      float nodes[11][1024]; // Unused elements optimize away
      uint32_t i, level, n_adds;
      // Level 0 of the tree is the inputs
      for(i=0; i<1024; i=i+1)
      {
        nodes[0][i] = inputs[i];
      }
      // Do the computation starting at level 1
      n_adds = 1024/2;
      for(level=1; level<11; level=level+1)
      {
        // Parallel sums at this level
        for(i=0; i<n_adds; i=i+1)
        {
          nodes[level][i] = nodes[level-1][i*2] + nodes[level-1][(i*2)+1];
        }
        // Each level decreases adders in next level by half
        n_adds = n_adds / 2;
      }
      // Return the last node in tree
      return nodes[10][0];
    }
(To be clear, I am NOT claiming that this is the best way to sum floats in hardware - it's just a basic example big enough to use most of the FPGA).
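One practical note when checking a tree like this in software: floating point addition is not associative, so a pairwise tree sum and a plain left-to-right sum can differ in the last bits. The sketch below is a host-side illustration only; `tree_sum`, `sequential_sum` and `sums_agree` are made-up helper names, not part of PipelineC. It shows a tree-order sum next to a double precision reference and a tolerance check.

```c
#define N 1024

// Pairwise (tree-order) sum: same addition order as a binary adder tree.
float tree_sum(const float inputs[N])
{
    float nodes[N];
    int i, n;
    for (i = 0; i < N; i++) {
        nodes[i] = inputs[i];
    }
    // Each pass halves the number of live nodes: 512, 256, ..., 1
    for (n = N / 2; n >= 1; n = n / 2) {
        for (i = 0; i < n; i++) {
            nodes[i] = nodes[2 * i] + nodes[2 * i + 1];
        }
    }
    return nodes[0];
}

// Left-to-right sum in double precision, used only as a reference.
double sequential_sum(const float inputs[N])
{
    double total = 0.0;
    int i;
    for (i = 0; i < N; i++) {
        total += (double)inputs[i];
    }
    return total;
}

// Returns 1 when the two results agree within a relative tolerance.
int sums_agree(float hw, double ref, double rel_tol)
{
    double diff = (double)hw - ref;
    double mag = ref < 0.0 ? -ref : ref;
    if (diff < 0.0) diff = -diff;
    if (mag < 1.0) mag = 1.0;
    return diff <= rel_tol * mag;
}
```

Comparing against a model that adds in the same tree order avoids needing a tolerance at all, as long as the hardware adders round the same way the CPU does.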
The PipelineC tool inserts pipeline registers as needed to meet timing on the particular device technology + operating frequency. I find that most HLS tools are pretty good at this (and will do a lot more than inferring pipelines too) but often require some ugly pragmas that - in a way - can make the code undesirably device specific. Hardware description languages can certainly describe the above hardware. But the code will almost certainly describe a pipeline designed specifically for one device technology/operating frequency - making the code hard for others to reuse even if you are kind enough to share it.
The very capable Virtex UltraScale+ AWS hardware allows the PipelineC tool to fit the work() function into a pipeline depth/latency of 15 clock cycles (it might be possible to squeeze it into as few as 10 clocks). Running at 125MHz (an 8 ns cycle time), it is thus capable of summing 1024 floating point values in 120 nanoseconds.
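The arithmetic behind those numbers is small enough to write down. The helpers below are illustrative only (the function names are mine) and assume a fully pipelined function that accepts one new set of inputs per clock, ignoring however long the DMA takes to deliver them.

```c
// Clock period in nanoseconds for a clock frequency given in MHz.
double clock_period_ns(double freq_mhz)
{
    return 1000.0 / freq_mhz;
}

// Time from inputs entering the pipeline to the result leaving it.
double pipeline_latency_ns(int depth_cycles, double freq_mhz)
{
    return depth_cycles * clock_period_ns(freq_mhz);
}

// Results per second, assuming one new result leaves the pipeline per clock.
double results_per_second(double freq_mhz)
{
    return freq_mhz * 1e6;
}
```

At 125MHz the period is 8 ns, so 15 stages give 15 x 8 = 120 ns of latency, and a 10 stage version would give 80 ns.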
// Put output message into outgoing DMA read data when requested
o.pcis = serializer(msg_out, i.pcis.arvalid);
On the software side, utilizing the FPGA hardware with user space file I/O calls looks like:
    // Do work() using the FPGA hardware
    work_outputs_t work_fpga(work_inputs_t inputs)
    {
      // Convert input into bytes
      write_msg = inputs_to_bytes(inputs);
      // Write those DMA bytes to the FPGA
      dma_write(write_msg);
      // Read DMA bytes back from the FPGA
      read_msg = dma_read();
      // Convert bytes to outputs and return
      work_outputs = bytes_to_outputs(read_msg);
      return work_outputs;
    }
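For anyone wondering what "user space file I/O calls" might look like underneath, here is a guess at the shape of the DMA helpers, written against plain POSIX `open`/`lseek`/`read`/`write`. Everything in it is an assumption: the `dma_msg_t` layout, `dma_open`/`dma_close`, the use of offset 0, and the bodies of `dma_write` and `dma_read`. The device path depends on the driver, so it is left as a parameter.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define DMA_MSG_SIZE 4096 // bytes per message, matching the 4096 byte chunks

// Hypothetical message type: just a fixed-size byte buffer.
typedef struct {
    uint8_t data[DMA_MSG_SIZE];
} dma_msg_t;

static int dma_fd = -1; // file descriptor for the DMA device file

// Open the device file (path is driver specific); 0 on success, -1 on error.
int dma_open(const char *path)
{
    dma_fd = open(path, O_RDWR);
    return dma_fd < 0 ? -1 : 0;
}

void dma_close(void)
{
    if (dma_fd >= 0) close(dma_fd);
    dma_fd = -1;
}

// Write one message at offset 0; returns 0 on success, -1 on error.
int dma_write(dma_msg_t msg)
{
    if (lseek(dma_fd, 0, SEEK_SET) != 0) return -1;
    return write(dma_fd, msg.data, DMA_MSG_SIZE) == (ssize_t)DMA_MSG_SIZE ? 0 : -1;
}

// Read one message from offset 0; bytes that could not be read stay zero.
dma_msg_t dma_read(void)
{
    dma_msg_t msg = {{0}};
    if (lseek(dma_fd, 0, SEEK_SET) == 0) {
        ssize_t n = read(dma_fd, msg.data, DMA_MSG_SIZE);
        (void)n; // a short or failed read leaves the remaining bytes zeroed
    }
    return msg;
}
```

Pointing `dma_open` at an ordinary file instead of a device file gives a loopback that is handy for testing the byte conversion code without an FPGA attached.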
So there you have it: low level RTL-like control, working right beside highly pipelined logic. All in a familiar C look that could just be compiled with gcc for 'simulation'. For example, this example uses the same work() function code both as the hardware description and as the 'golden C model' compiled with gcc to compare against.
In the sense that C abstracts away the hardware specifics of each CPU architecture + memory model, but only at a very minimal level, I want PipelineC to be the same for digital logic. The same PipelineC code should produce computationally equivalent hardware on any FPGA/ASIC device technology through smarts in the compiler. But C/PipelineC obviously doesn't do everything; there isn't a whole lot of higher level abstraction done for you. It's just the bedrock to build shareable libraries.
Some big features PipelineC lacks at the moment:
- Flow control/combinatorial feed-backward signals through N clock pipelined logic
- PipelineC can describe FIFOs, BRAMs (hard BRAM IP is the only IP supported right now) to work with data flows, but the equivalent of a bare combinatorial <= assignment operator feedback is missing
- Multiple clock domains / clock crossings (have some neat ideas about this).
- This would likely be my next big...many month... task?
- The C parser I'm using doesn't let you return constant sized arrays, but PipelineC as a language really should. If I modified the parser (oh gosh, help me?) and said 'use g++' to compile this 'C code that returns arrays', I think it could work out?
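On the array-return point: standard C itself does not allow a function to return an array, and the usual workaround that still compiles with plain gcc is to wrap the fixed-size array in a struct. This is only a sketch of that workaround, not necessarily how PipelineC should end up handling it; `float_array4_t` and `scale_by_two` are hypothetical names.

```c
#define N_OUT 4

// Hypothetical wrapper type: a struct can be returned by value, an array cannot.
typedef struct {
    float data[N_OUT];
} float_array4_t;

// Returns a whole fixed-size array by value, via the wrapper struct.
float_array4_t scale_by_two(float_array4_t in)
{
    float_array4_t out;
    int i;
    for (i = 0; i < N_OUT; i = i + 1) {
        out.data[i] = in.data[i] * 2.0f;
    }
    return out;
}
```

The cost is the extra `.data` at every use, which is the kind of noise that native array returns would remove.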
Got any ideas on what you'd want to do with PipelineC? Let me know, maybe we can make something cool together. Want support for an open source synthesis tool? I can give Yosys a try.