
Implementing an AXI FIFO in Block RAM Using Ready/Valid Handshake

When I first encountered the AXI interface while designing a module, I was surprised by its streamlined flow‑control approach. Unlike traditional protocols that rely on busy/valid, full/valid, or empty/valid pairs, AXI uses only two signals: ready and valid. What began as frustration quickly became fascination.

AXI embeds flow control directly into its handshake, eliminating the need for extra status lines. While the basic rules are straightforward, careful attention is required when mapping the protocol to FPGA hardware. In this tutorial, I walk through a VHDL implementation of an AXI FIFO that runs on block RAM.

AXI solves the delayed‑by‑one‑cycle problem

Preventing over‑read and overwrite is a common challenge when two clock‑synchronised modules exchange data streams. The crux of the issue is that each module can only observe the other’s output one clock cycle after it occurs.

[Figure: timing diagram of a sequential module writing to a FIFO with write enable/full signalling]

The diagram above illustrates a sequential module writing to a FIFO that uses the classic write enable/full signalling scheme. The writer asserts wr_en to push data into the FIFO, while the FIFO raises full when no more space is available. The problem is that the writer can only observe full on the clock edge after the FIFO asserts it, which is one cycle too late to hold back a word that is already on its way. The usual workaround is a look‑ahead signal such as almost_full, giving the upstream logic a head start.
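To make the look‑ahead idea concrete, here is a minimal sketch of how such status flags might be derived in a classic wr_en/full FIFO. It is not part of the AXI FIFO built in this article; the entity name fifo_flags, the depth generic, and the count port are assumptions made purely for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical status-flag logic for a classic wr_en/full FIFO.
-- The entity name, generic, and ports are assumptions for illustration.
entity fifo_flags is
  generic (
    depth : positive := 16
  );
  port (
    count       : in  natural range 0 to depth;
    full        : out std_logic;
    almost_full : out std_logic
  );
end fifo_flags;

architecture rtl of fifo_flags is
begin

  -- No free slots left
  full <= '1' when count = depth else '0';

  -- At most one free slot left: a writer that reacts one cycle late
  -- can still push one more word without overflowing
  almost_full <= '1' when count >= depth - 1 else '0';

end architecture;
```

A writer that stops as soon as it sees almost_full gives up a little capacity in exchange for the extra cycle of reaction time. That trade-off is what the ready/valid handshake removes.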

The ready/valid handshake

AXI stream interfaces solve this timing dilemma with a two‑signal handshake: ready and valid. The receiver drives ready; a high value indicates that it is prepared to accept a new datum. The sender drives valid; a high value means the data on the bus is stable and can be sampled. A transfer occurs only when both signals are high on the same rising edge.

[Figure: waveform of a single data item transferred with the ready/valid handshake]

The waveform demonstrates a single data item transfer. Data is sampled on the rising clock edge, just as in conventional synchronous designs.
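The receiving side of the handshake can be sketched in a few lines of VHDL. The module below is a hypothetical sink, not part of the FIFO itself; its name, the 8-bit data width, and the captured port are assumptions chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical receiver that captures one word per completed handshake
entity handshake_sink is
  port (
    clk      : in  std_logic;
    rst      : in  std_logic;
    ready    : out std_logic;
    valid    : in  std_logic;
    data     : in  std_logic_vector(7 downto 0);
    captured : out std_logic_vector(7 downto 0)
  );
end handshake_sink;

architecture rtl of handshake_sink is
  -- Internal copy, because an out port cannot be read before VHDL-2008
  signal ready_i : std_logic;
begin

  ready <= ready_i;

  PROC_SINK : process(clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        ready_i  <= '0';
        captured <= (others => '0');
      else
        -- This simple sink can always accept data once out of reset
        ready_i <= '1';

        -- A transfer happens only when both signals are high on this edge
        if ready_i = '1' and valid = '1' then
          captured <= data;
        end if;
      end if;
    end if;
  end process;

end architecture;
```

The sink never inspects valid to decide whether it is ready, and it ignores the data bus on every edge where the handshake is incomplete.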

Implementation

There are several ways to build an AXI FIFO in VHDL. While a shift register would work, a ring buffer backed by block RAM is the most efficient and scalable choice. Below, I present a modular design that separates the logic into dedicated processes for clarity and synthesiser friendliness.

The entity

The entity defines two generic parameters – ram_width and ram_depth – that set the word width and the FIFO depth. The capacity is effectively ram_depth - 1, because one slot is reserved to differentiate a full FIFO from an empty one.

entity axi_fifo is
  generic (
    ram_width : natural;
    ram_depth : natural
  );
  port (
    clk : in std_logic;
    rst : in std_logic;

    -- AXI input interface
    in_ready : out std_logic;
    in_valid : in std_logic;
    in_data : in std_logic_vector(ram_width - 1 downto 0);

    -- AXI output interface
    out_ready : in std_logic;
    out_valid : out std_logic;
    out_data : out std_logic_vector(ram_width - 1 downto 0)
  );
end axi_fifo; 

The interface follows AXI conventions: the input side uses in_ready and in_valid, while the output side mirrors the signals with reversed directions. This symmetry allows multiple FIFOs to be chained seamlessly.
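As a rough illustration of that chaining, the wrapper below connects two instances back to back. It assumes the axi_fifo entity has been compiled into the work library; the wrapper name axi_fifo_chain, the 16-bit width, and the depth of 256 are placeholder choices, not values taken from the article.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper that connects two axi_fifo instances in series
entity axi_fifo_chain is
  port (
    clk : in std_logic;
    rst : in std_logic;

    in_ready : out std_logic;
    in_valid : in std_logic;
    in_data : in std_logic_vector(15 downto 0);

    out_ready : in std_logic;
    out_valid : out std_logic;
    out_data : out std_logic_vector(15 downto 0)
  );
end axi_fifo_chain;

architecture str of axi_fifo_chain is
  -- Handshake and data signals between the two FIFOs
  signal mid_ready : std_logic;
  signal mid_valid : std_logic;
  signal mid_data : std_logic_vector(15 downto 0);
begin

  FIFO_A : entity work.axi_fifo
    generic map (ram_width => 16, ram_depth => 256)
    port map (
      clk => clk, rst => rst,
      in_ready => in_ready, in_valid => in_valid, in_data => in_data,
      out_ready => mid_ready, out_valid => mid_valid, out_data => mid_data
    );

  -- The output interface of FIFO_A plugs straight into the input of FIFO_B
  FIFO_B : entity work.axi_fifo
    generic map (ram_width => 16, ram_depth => 256)
    port map (
      clk => clk, rst => rst,
      in_ready => mid_ready, in_valid => mid_valid, in_data => mid_data,
      out_ready => out_ready, out_valid => out_valid, out_data => out_data
    );

end architecture;
```

No glue logic is needed between the stages, because each ready/valid pair already carries the flow control in both directions.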

Signal declarations

First, we define a RAM type that matches the generic width and depth, then we declare a signal of that type:

-- The FIFO is full when the RAM contains ram_depth - 1 elements
type ram_type is array (0 to ram_depth - 1)
  of std_logic_vector(in_data'range);
signal ram : ram_type;

Next, we declare an index subtype that can represent any address in the RAM, and four core signals: head, tail, count, and a delayed copy count_p1:

subtype index_type is natural range ram_type'range;
signal head : index_type;
signal tail : index_type;
signal count : index_type;
signal count_p1 : index_type;

We also keep internal copies of the ready and valid outputs, because ports of mode out cannot be read inside the architecture before VHDL-2008:

signal in_ready_i : std_logic;
signal out_valid_i : std_logic;

A helper signal records whether a simultaneous read and write occurred in the previous cycle, which is needed to set out_valid correctly when the FIFO holds only one word:

signal read_while_write_p1 : std_logic;

Subprograms

The next_index function determines the next address based on whether a read or write handshake just completed. It wraps around when the end of the array is reached.

function next_index(
  index : index_type;
  ready : std_logic;
  valid : std_logic) return index_type is
begin
  if ready = '1' and valid = '1' then
    if index = index_type'high then
      return index_type'low;
    else
      return index + 1;
    end if;
  end if;

  return index;
end function;

The index_proc procedure encapsulates the synchronous logic that updates either the head or tail pointer. This avoids duplicate code and keeps the architecture tidy.

procedure index_proc(
  signal clk : in std_logic;
  signal rst : in std_logic;
  signal index : inout index_type;
  signal ready : in std_logic;
  signal valid : in std_logic) is
begin
  if rising_edge(clk) then
    if rst = '1' then
      index <= index_type'low;
    else
      index <= next_index(index, ready, valid);
    end if;
  end if;
end procedure;

Copy internal signals to the output

in_ready <= in_ready_i;
out_valid <= out_valid_i;

Update the head and tail

We call the helper procedure twice as concurrent procedure calls, once for the write pointer (head) and once for the read pointer (tail):

PROC_HEAD : index_proc(clk, rst, head, in_ready_i, in_valid);
PROC_TAIL : index_proc(clk, rst, tail, out_ready, out_valid_i);

Infer block RAM

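Block RAM in most FPGAs is synchronous. To let the synthesiser infer a RAM primitive, we keep the read and write operations inside a clocked process with no reset on the memory. This design also omits an explicit write enable: it writes and reads on every clock cycle and lets the head and tail pointers decide which words count: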

PROC_RAM : process(clk)
begin
  if rising_edge(clk) then
    ram(head) <= in_data;
    out_data <= ram(next_index(tail, out_ready, out_valid_i));
  end if;
end process;

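The write happens unconditionally on every clock cycle, but the word only becomes part of the FIFO when the input handshake completes and head advances; otherwise the same slot is simply overwritten on the next cycle. The read address is the value that tail will hold after the current edge, so out_data is refreshed every cycle and presents the word at the tail whenever out_valid is high.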

Count the number of elements in the FIFO

The count signal reflects the current occupancy. When head is ahead of tail, the count is simply the difference; if the head has wrapped, we add the total depth:

PROC_COUNT : process(head, tail)
begin
  if head < tail then
    count <= head - tail + ram_depth;
  else
    count <= head - tail;
  end if;
end process;
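The wrap-around arithmetic is easy to check by hand. The simulation-only sketch below wraps the same expression in a function and asserts a few cases; the entity name count_example, the helper fifo_count, and the depth of 8 are assumptions made for the illustration.

```vhdl
entity count_example is
end count_example;

architecture sim of count_example is

  -- Assumed depth for this illustration only
  constant ram_depth : natural := 8;

  -- Same arithmetic as PROC_COUNT, wrapped in a hypothetical helper
  function fifo_count(head, tail : natural) return natural is
  begin
    if head < tail then
      -- The head has wrapped around: 1 - 6 + 8 = 3, for example
      return head - tail + ram_depth;
    else
      return head - tail;
    end if;
  end function;

begin

  PROC_CHECK : process
  begin
    -- Head ahead of tail: plain difference
    assert fifo_count(5, 2) = 3
      report "unexpected count without wrap" severity failure;

    -- Head has wrapped past the end of the RAM
    assert fifo_count(1, 6) = 3
      report "unexpected count with wrap" severity failure;

    -- Equal pointers mean the FIFO is empty
    assert fifo_count(4, 4) = 0
      report "unexpected count when empty" severity failure;

    report "Count examples passed";
    wait;
  end process;

end architecture;
```

The intermediate result of head - tail may be negative, which is fine because the subtraction is evaluated on the integer base type and only the final value has to fit in the natural range.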

The delayed copy count_p1 is needed to detect when the FIFO was empty in the previous cycle, which influences out_valid:

PROC_COUNT_P1 : process(clk)
begin
  if rising_edge(clk) then
    if rst = '1' then
      count_p1 <= 0;
    else
      count_p1 <= count;
    end if;
  end if;
end process;

Update the ready output

The writer should be told when the FIFO can accept more data. The ready signal is high whenever the count is less than ram_depth - 1:

PROC_IN_READY : process(count)
begin
  if count < ram_depth - 1 then
    in_ready_i <= '1';
  else
    in_ready_i <= '0';
  end if;
end process;

Detect simultaneous read and write

When a read and a write complete in the same cycle, the occupancy does not change. If the FIFO held only one word at that point, count stays at 1, yet the word that was just written needs another clock cycle to pass through the synchronous RAM before it appears on out_data. To catch this case, we record the simultaneous handshake in read_while_write_p1:

PROC_READ_WHILE_WRITE_P1: process(clk)
begin
  if rising_edge(clk) then
    if rst = '1' then
      read_while_write_p1 <= '0';
    else
      read_while_write_p1 <= '0';
      if in_ready_i = '1' and in_valid = '1' and
        out_ready = '1' and out_valid_i = '1' then
        read_while_write_p1 <= '1';
      end if;
    end if;
  end if;
end process;

Update the valid output

The out_valid signal tells downstream logic when out_data holds a usable word. The combinational process below sets it high by default but clears it in two corner cases:

PROC_OUT_VALID : process(count, count_p1, read_while_write_p1)
begin
  out_valid_i <= '1';

  -- If the FIFO is empty or was empty in the previous cycle
  if count = 0 or count_p1 = 0 then
    out_valid_i <= '0';
  end if;

  -- If a simultaneous read/write occurred when the FIFO had only one item
  if count = 1 and read_while_write_p1 = '1' then
    out_valid_i <= '0';
  end if;

end process;

The first condition keeps out_valid low while the first word written into an empty FIFO makes its way through the RAM to out_data. The second covers the corner case where a read and a write coincide while the FIFO holds a single word: count stays at 1, but the newly written word has not yet reached out_data.
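A small self-checking testbench is a reasonable way to exercise these corner cases. The sketch below assumes the complete axi_fifo design has been compiled into the work library; the testbench name, the clock period, the generic values, and the number of test words are all placeholder choices. It writes a counting sequence and checks that the same sequence comes out in order.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity axi_fifo_tb is
end axi_fifo_tb;

architecture sim of axi_fifo_tb is

  -- Assumed values for this illustration only
  constant clk_period : time := 10 ns;
  constant ram_width : natural := 8;
  constant ram_depth : natural := 16;
  constant word_count : natural := 10;

  signal clk : std_logic := '0';
  signal rst : std_logic := '1';
  signal in_ready : std_logic;
  signal in_valid : std_logic := '0';
  signal in_data : std_logic_vector(ram_width - 1 downto 0) := (others => '0');
  signal out_ready : std_logic := '0';
  signal out_valid : std_logic;
  signal out_data : std_logic_vector(ram_width - 1 downto 0);
  signal done : boolean := false;

begin

  DUT : entity work.axi_fifo
    generic map (ram_width => ram_width, ram_depth => ram_depth)
    port map (
      clk => clk, rst => rst,
      in_ready => in_ready, in_valid => in_valid, in_data => in_data,
      out_ready => out_ready, out_valid => out_valid, out_data => out_data
    );

  -- Free-running clock that stops once the reader has seen every word
  PROC_CLK : process
  begin
    while not done loop
      clk <= '0';
      wait for clk_period / 2;
      clk <= '1';
      wait for clk_period / 2;
    end loop;
    wait;
  end process;

  -- Release reset, then push a counting sequence into the FIFO
  PROC_WRITE : process
  begin
    wait for 4 * clk_period;
    wait until rising_edge(clk);
    rst <= '0';

    for i in 0 to word_count - 1 loop
      in_valid <= '1';
      in_data <= std_logic_vector(to_unsigned(i, ram_width));
      -- Hold the word until an edge where in_ready is also high
      loop
        wait until rising_edge(clk);
        exit when in_ready = '1';
      end loop;
    end loop;

    in_valid <= '0';
    wait;
  end process;

  -- Always ready; check each word on the edge where out_valid is high
  PROC_READ : process
  begin
    out_ready <= '1';

    for i in 0 to word_count - 1 loop
      loop
        wait until rising_edge(clk);
        exit when out_valid = '1';
      end loop;
      assert out_data = std_logic_vector(to_unsigned(i, ram_width))
        report "Word " & integer'image(i) & " came out wrong"
        severity failure;
    end loop;

    report "All words received in order";
    done <= true;
    wait;
  end process;

end architecture;
```

With the reader permanently ready, the FIFO repeatedly drains to a single word while new words arrive, so this pattern passes through the simultaneous read and write case. Throttling out_ready or in_valid at random would widen the coverage.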

Synthesising in Vivado

To synthesise this design in Xilinx Vivado, supply concrete values for the generics. In the Settings dialog (General → Generics/Parameters), choose a width and depth that match a RAMB36E1 primitive for the target Zynq device. After implementation, the resource utilisation is modest: one block RAM plus a handful of LUTs and FFs.

AXI is more than ready/valid

AXI (Advanced eXtensible Interface) is a component of ARM’s AMBA bus architecture and offers a rich feature set beyond the simple ready/valid handshake.

