Github Repo Link:


HARU (Hardware Accelerated Read Until) is a heterogeneous compute solution for Oxford Nanopore Technologies’ adaptive sampling (also known as selective sequencing and Read Until). Read Until allows genomic reads to be analyzed in real time and abandoned partway through sequencing if they do not belong to a genomic region of ‘interest’. HARU takes advantage of heterogeneous edge compute platforms and provides hardware acceleration using reconfigurable hardware on a multiprocessor system on a chip (MPSoC).

Citing this work:

    author = {Shih, Po Jui and Saadat, Hassaan and Parameswaran, Sri and Gamaarachchi, Hasindu},
    title = "{Efficient real-time selective genome sequencing on resource-constrained devices}",
    journal = {GigaScience},
    volume = {12},
    pages = {giad046},
    year = {2023},
    month = {07},
    issn = {2047-217X},
    doi = {10.1093/gigascience/giad046},
    url = {},
    eprint = {},

Our current proof-of-concept implementation of HARU utilises a custom subsequence DTW (sDTW) accelerator primarily targeting Xilinx’s Kria AI Starter Kit, which uses a Zynq UltraScale+ MPSoC. This repository contains the source code for this accelerator, including the Verilog HDL core accelerator and the user space device driver. The use of our hardware accelerator is demonstrated through an example application called sigfish-haru, for which the source code is available here. The instructions for setting up sigfish-haru are given below.

What you will need:


To quickly test out HARU, you can download the pre-built binary package built for Kria from the latest release.

  1. Download the prebuilt PetaLinux image for the Kria AI Starter Kit available from the releases (named petalinux-sdimage.wic.gz).
  2. Using your preferred imaging tool (e.g. Balena Etcher), flash the image onto the micro SD card.
  3. Once the Micro SD card is prepared, insert it into the Micro SD card slot on the Kria board.
  4. Connect the USB serial port on the Kria board (micro USB slot) to your host machine’s USB port. Two serial devices (COM ports) with consecutive numbers should appear (e.g., COM5 and COM6 on Windows) where the lower numbered COM port is associated with the USART.
  5. Using your preferred serial terminal software (e.g., TeraTerm) on your host machine, open the COM port with the lower number (e.g., COM5) with a baud rate of 115200.
  6. Power on the Kria board and go through the setup process on your first power-on. The default username is root, which does not require a password. You can optionally connect the Kria board to the Internet using Ethernet and SSH into it if you wish. IMPORTANT: to avoid security issues, make sure you at least set a password using the passwd command if you are connecting to a network.
  7. Transfer the prebuilt package of HARU available under releases (named haru-<version>-binaries.tgz) to the Kria board, either through the scp command or a USB drive. If you connected the Kria board to the Internet using Ethernet, you can simply use wget to download it directly from the GitHub link.
  8. On the Kria board, untar the package and run the installation script to install the accelerator.
     tar -xzf haru-<version>-binaries.tgz
     cd haru-<version>-binaries
  9. Now, load the accelerator on to the FPGA on the Kria board.
     # unload the existing accelerator
     xmutil unloadapp
     # load our HARU sDTW accelerator
     xmutil loadapp haru-dtw-firmware
     # list the accelerators to verify that the loading was successful
     xmutil listapps
  10. Run the included sigfish-haru software binary that uses the hardware accelerator. The binary package includes example data.
     ./sigfish-haru dtw -g test_data/nCoV-2019.reference.fasta -s test_data/reads_0_0.blow5 > output.paf
  11. If you wish, you can run the sigfish-cpu binary that does not use the hardware accelerator and see how slow it is.
     ./sigfish-cpu dtw -g test_data/nCoV-2019.reference.fasta -s test_data/reads_0_0.blow5 > output.paf
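
Both commands above write to output.paf, so the second run overwrites the first. To compare the two runs, one option is to redirect each to its own file and summarise them. The sketch below is a minimal helper for that, assuming the output follows the standard tab-separated PAF layout in which column 1 is the read name. The function name paf_summary and the file names haru.paf and cpu.paf are hypothetical and not part of sigfish.

```sh
# paf_summary FILE
# Hypothetical helper (not part of sigfish): prints the number of PAF records
# and the number of distinct read names found in FILE.
# Assumes tab-separated PAF where column 1 is the query (read) name.
paf_summary() {
    awk -F '\t' '
        NF > 0 {
            records++
            if (!($1 in seen)) { seen[$1] = 1; reads++ }
        }
        END { printf "records=%d reads=%d\n", records, reads }
    ' "$1"
}

# Example usage on the Kria board (file names are placeholders):
#   ./sigfish-haru dtw -g test_data/nCoV-2019.reference.fasta -s test_data/reads_0_0.blow5 > haru.paf
#   ./sigfish-cpu  dtw -g test_data/nCoV-2019.reference.fasta -s test_data/reads_0_0.blow5 > cpu.paf
#   paf_summary haru.paf
#   paf_summary cpu.paf
```

If the two summaries differ, the individual records can be inspected with diff after sorting both files by read name.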



  • Building the core accelerator is not intuitive and requires proprietary software from AMD Xilinx.
  • The build steps tested and described below use the 2021.1 version of the Xilinx tools (Vivado and PetaLinux image). Developers with versions lower than 2020.2 will need to update their tools to a version that supports the Kria KV260.

To build HARU for Xilinx’s Kria AI Starter Kit, you will need to build two components:

  • Core Accelerator (HDL, build with Vivado)
  • Sigfish-haru + driver (C, build with cross-compilation toolchain)


  • Vitis 2021.1 - we install Vitis so that the Xilinx Software Command-Line Tool (XSCT) is included in the installation.
  • device-tree-xlnx - make sure to check out the version that matches your other tools.
      git clone
      cd device-tree-xlnx
      git checkout xlnx_rel_v2021.1
  • dtc - run it from a Linux terminal such as Bash (WSL will also work). You may install dtc using your package manager (e.g., sudo apt install device-tree-compiler on Ubuntu), but make sure it is version 1.5 or higher. If you want to build and install from source:
      git clone
      cd dtc
      export PATH=$PATH:/<path-to-dtc>/ # optionally, add this to your .bashrc 
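
The version requirement on dtc can be checked before starting the build. The following is a minimal sketch under two assumptions: that `sort -V` (GNU coreutils) is available on the host, and that `dtc --version` prints a line of the form "Version: DTC 1.6.0". The helper names version_ge and check_dtc are hypothetical.

```sh
# version_ge A B
# Succeeds when dotted version A is greater than or equal to B.
# Relies on `sort -V` from GNU coreutils (an assumption about the host).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_dtc
# Hypothetical helper: succeeds when dtc is on PATH and reports version >= 1.5.
# Assumes `dtc --version` prints something like "Version: DTC 1.6.0".
check_dtc() {
    command -v dtc >/dev/null 2>&1 || { echo "dtc not found in PATH" >&2; return 1; }
    ver=$(dtc --version | sed -n 's/.*DTC \([0-9][0-9.]*\).*/\1/p')
    [ -n "$ver" ] && version_ge "$ver" 1.5
}

# Example usage:
#   check_dtc && echo "dtc is recent enough"
```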

    Core Accelerator

    1. Clone the repository if you have not done so.
        git clone
    2. Start Vivado, click on create project and follow the prompt to setup project. When selecting parts, navigate to Boards and search “kria” in the search bar and select Kria KV260 Vision AI Starter Kit.
    3. Once the project is created, click on Settings -> General, select Verilog as the target language. Navigate to Bitstream and tick on -bin_file for headerless bitstream to be generated later.
    4. Click on Create Block Design and provide a name for your design.
    5. Under Sources, click on Add Sources -> select Add or create design sources -> navigate to <path-to>/HARU/hdl/src/ and select the Verilog files (not including the simulation subdirectory).
    6. Add the following IP with the corresponding configurations:
    • Zynq Ultrascale+ MPSoC; Run Block Automation for board preset, double click to configure and navigate to PS-PL Configuration -> PS-PL Interfaces -> Slave Interface -> AXI HP -> enable AXI HPC0 FPD.
    • AXI DMA; Double click to configure and make sure to DESELECT the option Enable Scatter Gather Engine.
      1. Right click on the block design diagram and select Add Module. Select dtw_accel and click OK.
      2. Click on Run Connection Automation and click OK. This should connect the AXI Lite slave interfaces of the AXI DMA and dtw_accel modules to the Zynq Ultrascale+ MPSoC’s master AXI interface. Repeat this step to connect the Zynq’s other AXI master to the AXI interconnect.
      3. Connect the AXI Stream connections between AXI DMA and dtw_accel.
    • Connect SINK_AXIS of dtw_accel to S_AXIS_S2MM of AXI Direct Memory Access.
    • Connect SRC_AXIS of dtw_accel to M_AXIS_MM2S of AXI Direct Memory Access.
    • Click on Run Connection Automation and tick All Automation to configure the clocks of SRC_AXIS and SINK_AXIS.
      1. Under Sources, right click on design_1, click on Create HDL Wrapper, and select Let Vivado manage wrapper and auto-update. This will create a Verilog wrapper for the design block configured above. It may take some time to complete and update in the Sources window.
      2. Right click on the newly generated design_1_wrapper under Sources and click Set as Top .
      3. Run synthesis, implementation, and generate bitstream. The design_1_wrapper.bin generated under <path-to-project>/<project-name>/<project-name>.runs/impl_1/ is the headerless bitstream for the accelerator. Rename it to haru-dtw-firmware.bit.bin.
      4. Click on File -> Export Hardware -> Select Pre-synthesis -> leave name as default (design_1_wrapper.xsa) and Finish. Note that if you intend to use the PetaLinux tool to generate an image with the accelerator, you need to select include bitstream; however, this is outside the scope of this README.
      5. Start the Xilinx Software Command-Line Tool (XSCT) from the Start menu. Navigate to the location of your Vivado project and run the following commands:
          cd <path-to-vivado-project>
          hsi open_hw_design design_1_wrapper.xsa
          hsi set_repo_path <path-to>/device-tree-xlnx
          hsi create_sw_design device-tree -os device_tree -proc psu_cortexa53_0
          hsi set_property CONFIG.dt_overlay true [hsi::get_os]
          hsi generate_target -dir haru_dtconfig
          hsi close_hw_design design_1_wrapper
      6. Using the device tree compiler tool dtc (either in WSL or another terminal), compile the device tree overlay .dtsi file into a .dtbo binary. This generates the device tree overlay needed to load your accelerator into the PetaLinux OS at runtime.
          cd <path-to-vivado-project>/haru_dtconfig
          dtc -@ -O dtb -o haru-dtw-firmware.dtbo pl.dtsi
      7. Transfer the bitstream (haru-dtw-firmware.bit.bin) and device tree overlay blob (haru-dtw-firmware.dtbo) to your Kria device.
      8. On your Kria, create the haru-dtw-firmware directory under /lib/firmware/xilinx/ and copy the bitstream and device tree overlay blob into it.
          mkdir /lib/firmware/xilinx/haru-dtw-firmware
          cp haru-dtw-firmware.bit.bin haru-dtw-firmware.dtbo /lib/firmware/xilinx/haru-dtw-firmware/
      9. In the haru-dtw-firmware directory, create a shell.json file with the following content:
          {
              "shell_type": "XRT_FLAT",
              "num_slots": "1"
          }
      10. Check if haru-dtw-firmware is in the list of accelerators and load it.
          xmutil listapps                    # List the available accelerators and status
          xmutil unloadapp                   # Unload currently loaded accelerators
          xmutil loadapp haru-dtw-firmware   # Load haru-dtw-firmware
          xmutil listapps                    # List the accelerators and check status for haru
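
The on-device steps for installing the firmware (create the directory, copy the bitstream and overlay, write shell.json) can be collected into one function. This is a minimal sketch, not a script shipped with HARU. The function name install_haru_firmware and its optional third argument (an alternative firmware root, useful for a dry run) are hypothetical. The shell.json keys are the ones listed above, wrapped in braces so the file is a valid JSON object.

```sh
# install_haru_firmware BIT DTBO [FW_ROOT]
# Hypothetical helper: copies the bitstream and device tree overlay into
# FW_ROOT/haru-dtw-firmware and writes shell.json next to them.
# FW_ROOT defaults to /lib/firmware/xilinx, the location used in this README.
install_haru_firmware() {
    bit=$1
    dtbo=$2
    fw_root=${3:-/lib/firmware/xilinx}
    dest="$fw_root/haru-dtw-firmware"

    # Refuse to continue if either input file is missing.
    if [ ! -f "$bit" ] || [ ! -f "$dtbo" ]; then
        echo "missing bitstream or device tree overlay" >&2
        return 1
    fi

    mkdir -p "$dest" || return 1
    cp "$bit" "$dest/haru-dtw-firmware.bit.bin" || return 1
    cp "$dtbo" "$dest/haru-dtw-firmware.dtbo" || return 1

    # shell.json with the two keys given in this README.
    cat > "$dest/shell.json" <<'EOF'
{
    "shell_type": "XRT_FLAT",
    "num_slots": "1"
}
EOF
}

# Example usage on the Kria board, followed by the xmutil commands above:
#   install_haru_firmware haru-dtw-firmware.bit.bin haru-dtw-firmware.dtbo
```

Passing a scratch directory as the third argument lets you inspect the resulting layout without touching /lib/firmware.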


We recommend cross-compiling sigfish-haru on the host machine. For cross-compilation, you will need to set up the cross-compilation toolchain for the Kria board, which is included in the release as

$ <path-to>/
PetaLinux SDK installer version 2021.1_SOM
Enter target directory for SDK (default: /opt/petalinux/2021.1_SOM): <desired-installation-dir>
You are about to install the SDK to "<desired-installation-dir>". Proceed [Y/n]? Y

When you want to cross-compile in a new terminal session, source the following file to set up the environment variables.

. <sdk-installation-dir>/environment-setup-cortexa72-cortexa53-xilinx-linux
echo $CC # to double check the configuration 
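
The manual `echo $CC` check can also be turned into a guard that stops a build early when the environment script has not been sourced. The sketch below assumes that the SDK environment script sets CC to a command beginning with an aarch64-* target triplet (for example aarch64-xilinx-linux-gcc). That prefix is an assumption about the SDK, and check_cross_cc is a hypothetical helper name.

```sh
# check_cross_cc
# Hypothetical helper: succeeds only when $CC looks like an aarch64
# cross-compiler. Assumes the SDK sets CC to a command that starts with an
# aarch64-* triplet; adjust the pattern if your SDK uses a different prefix.
check_cross_cc() {
    case "${CC:-}" in
        aarch64-*)
            return 0
            ;;
        *)
            echo "CC is not an aarch64 cross-compiler: '${CC:-unset}'" >&2
            return 1
            ;;
    esac
}

# Example usage before building:
#   . <sdk-installation-dir>/environment-setup-cortexa72-cortexa53-xilinx-linux
#   check_cross_cc && make fpga=1 PROCESSOR=aarch64
```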

Steps to build sigfish:

  1. Clone the sigfish repo.
     git clone --recursive
     cd sigfish-haru
  2. Source environment script if cross-compiling.
  3. Build with make.
     # Building sigfish WITHOUT hardware acceleration
     make PROCESSOR=aarch64
     # Building sigfish WITH hardware acceleration
     make fpga=1 PROCESSOR=aarch64
  4. Run sigfish with accelerator loaded (see above for steps).

For developers

Developers can use our sDTW accelerator core in their own applications. To do so, refer to the driver API as explained here or browse through the code for sigfish-haru.