
ARM/FPGA AXI DMA xfers using Parallella-adi kernel

PostPosted: Tue Sep 16, 2014 9:26 pm
by yanidubin
Has anyone managed to get DMA transfers (either 1D or 2D) to/from the FPGA working well? In particular AXI streaming DMA - but I'll consider other working alternatives.

I have been playing around with AXI peripherals synthesised in the FPGA over the past few weeks. I have had success with both AXI4LITE and AXI4 via the memmap interface, and having the FPGA do ALU / pixel type operations, but no success yet with high performance memory transfers.

Ideally, I'd like to test DMA performance using a pre-existing kernel driver rather than diving in and writing my own straight away (I'm brushing up on my VHDL, specifically, at present).

I have had very little success with streaming DMA. I have a VHDL module synthesised to use this interface, but trying to interface with it under Linux from the ARM side is proving to be a problem. All the Xilinx provided examples seem to be based on a baremetal application. The most helpful pre-existing Linux driver I found was for the Zedboard, and relied on the Xilinx kernel. In particular, . While I have built this against the Parallella kernel, it triggers a kernel panic - I believe the particular driver it is trying to bind with (Xilinx AXI DMA) may be missing/different.

I haven't yet built the Xilinx kernel for the Parallella - but if you have, and have any hints, those also would be appreciated.

Or, if you know how to get DMA going (FPGA, not Epiphany) on the Parallella-ADI kernel, I'd like to hear from you. Getting it running on Xilinx is just a temporary measure, as long term, I want to be able to generate something (bitstream, devicetree if necessary) which anyone can use on their Parallella - so ideally something which works on the Parallella kernel (and default config would be a bonus here). But if I can get something working on Xilinx as a starting point, I can potentially port across the driver at a later date.

Re: Help: ARM/FPGA DMA xfers using Parallella-adi kernel

PostPosted: Tue Sep 16, 2014 9:48 pm
by theover
Well, uhm, there is the ARM_to_Epiphany interface, which uses DMA up to a few hundred MHz.

Re: Help: ARM/FPGA DMA xfers using Parallella-adi kernel

PostPosted: Wed Sep 17, 2014 7:58 am
by yanidubin
I am aware of that interface, but not sure that this is of any help.

A few weeks back, I looked at the memory interfacing for the matrix multiplication example (e-matmul16?), and if memory serves (pun intended), the Epiphany SDK (in this case at least) works by reserving a portion of memory (I seem to recall seeing it in the devicetree), and writing the data for the Epiphany to this region after a memmap call, so it is where the Epiphany expects to find it. The Epiphany then uses DMA to read/write data to/from this location. Finally, the ARM reads this from DDR. I don't believe I found any DMA transfers initiated from the ARM side at all.

I may be wrong, but at the time I concluded from my quick glance that there was no DMA controller in use here which was common to the FPGA. But I didn't do a thorough investigation of the various other examples.

What I am specifically after is DMA for the AXI interface between ARM and FPGA - I'll edit my post to make this a little clearer.

Re: ARM/FPGA AXI DMA xfers using Parallella-adi kernel

PostPosted: Sun Feb 15, 2015 9:43 pm
by peteasa
Seems like this should be an easy thing to do. But Xilinx don't half make it difficult!

My present effort adds an AXI4-Stream interface (documented in ug761_axi_reference_guide) plus the axi_dma LogiCORE (documented in pg021_axi_dma). To get the DMA hooked in, you configure three additional High Performance AXI Slave ports in the System Assembly View so that you can connect the Scatter Gather, MM2S and S2MM stream interfaces. Then Xilinx kicks in: the buses are not connected for you, once the buses are connected the pins are still not connected, and then late in netlist creation you get warnings about the reset on the stream interface being trashed. I fixed this by removing Auto on the DMA Common Primary clock and forcing it to be Asynchronous, and I also added the Data Realignment Engine to the MM2S and S2MM channels. So now I should be able to get a bitstream.

The kernel driver you point to seems not too hard to tackle next. The test should be quite simple: DMA data from one place to another and send a few bytes from the processor over the streaming interface to memory. Let me know if you have made progress with this project! I have not simulated the FPGA in any way, so there is a good chance it won't work for me!

Re: ARM/FPGA AXI DMA xfers using Parallella-adi kernel

PostPosted: Sat Feb 21, 2015 5:11 am
by yanidubin
No, I have not looked into this again; too many other projects on the go. I didn't encounter any of those issues generating the bitstream, and getting the driver built against the Parallella-adi kernel was also straightforward. But then I had an access violation when loading the driver module. I did not debug the issue any further, but I expect the cause was support missing from the Parallella kernel which is present in the Xilinx kernel.

Good luck, and let us know if you have any success with this :)

Re: ARM/FPGA AXI DMA xfers using Parallella-adi kernel

PostPosted: Sun Mar 15, 2015 9:24 pm
by peteasa
I did a bit more on this and have found that the problem is a mismatch between the ADI driver and the Xilinx driver. You can see the differences by looking at drivers/dma/xilinx/xilinx_axidma.c. A more up-to-date driver than the one the ADI kernel has can be found at ... linx_dma.c. However, all is not lost: once you check the various compatibility strings in xilinx_dma.c, you can work out the correct device tree entry. I am using
Code: Select all
axi_dma_0: axi-dma@40400000 {
    compatible = "xlnx,axi-dma";
    #interrupt-parent = <&ps7_scugic_0>;
    #interrupts = <0 58 4>, <0 57 4>;
    reg = <0x40400000 0x10000>;
    xlnx,sg-include-stscntrl-strm = <0x0>;
    dma-channel@40400000 {
        compatible = "xlnx,axi-dma-mm2s-channel";
        interrupt-parent = <&ps7_scugic_0>;
        interrupts = <0 58 4>;
        xlnx,datawidth = <0x20>;
        xlnx,device-id = <0x0>;
        xlnx,include-dre = <0x0>;
        xlnx,sg-length-width = <0xe>;
    };
    dma-channel@40400030 {
        compatible = "xlnx,axi-dma-s2mm-channel";
        interrupt-parent = <&ps7_scugic_0>;
        interrupts = <0 57 4>;
        xlnx,datawidth = <0x20>;
        xlnx,device-id = <0x0>;
        xlnx,include-dre = <0x0>;
        xlnx,sg-length-width = <0xe>;
    };
};

Then the board boots and the ADI dma finds the fpga AXI dma channels. A simple test shows it works
Code: Select all
# ls /sys/devices/fpga-axi.2
# cat /proc/interrupts
 89:          0          0       GIC  89  xilinx-dma-controller
 90:          3          0       GIC  90  xilinx-dma-controller

Now the next bit requires a change to code from because that was built against the more recent Xilinx drivers. In particular the xilinx_dma.c driver has
Code: Select all
#define XILINX_DMA_IP_MASK             0x00700000 /* DMA IP MASK */

chan->common.private = (chan->direction & 0xFF) | (chan->feature & XILINX_DMA_IP_MASK);

whilst the xdma driver expects a bit more in the .private structure:
Code: Select all
      if (*((int *)chan->private) == *(int *)param)
         return true;

Code: Select all
match_tx = (DMA_MEM_TO_DEV & 0xFF) | XILINX_DMA_IP_DMA | (num_devices << XILINX_DMA_DEVICE_ID_SHIFT);

XILINX_DMA_DEVICE_ID_SHIFT does not exist in the older driver and the chan->private part does not point to the address of an int. So the fix is:
Code: Select all
static bool xdma_filter(struct dma_chan *chan, void *param)
{
   /* chan->private holds the match value itself, not a pointer to it */
   if ((int)(unsigned long)chan->private == *(int *)param)
      return true;

   return false;
}

and xdma_probe(void) has
Code: Select all
      match_tx = (DMA_MEM_TO_DEV & 0xFF) | XILINX_DMA_IP_DMA;
      tx_chan = dma_request_channel(mask, xdma_filter, (void *)&match_tx);
      match_rx = (DMA_DEV_TO_MEM & 0xFF) | XILINX_DMA_IP_DMA;
      rx_chan = dma_request_channel(mask, xdma_filter, (void *)&match_rx);
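
To illustrate why the filter has to change, here is a minimal user-space sketch of the same matching logic. It is not the driver code: struct mock_chan, mock_driver_init, make_match and mock_filter are hypothetical stand-ins, and the numeric values of DMA_MEM_TO_DEV, DMA_DEV_TO_MEM and XILINX_DMA_IP_DMA are assumptions that should be checked against the headers in your kernel tree. The point it shows is that the older driver stores the match value in the private pointer itself, so the filter must compare by value rather than dereference it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values; verify against your kernel's dmaengine and Xilinx headers. */
enum { DMA_MEM_TO_DEV = 1, DMA_DEV_TO_MEM = 2 };
#define XILINX_DMA_IP_DMA  0x00100000u /* assumed: "AXI DMA" IP type bit */
#define XILINX_DMA_IP_MASK 0x00700000u /* as quoted from xilinx_dma.c */

/* Hypothetical stand-in for struct dma_chan: only the field used here. */
struct mock_chan {
    void *private;
};

/* Older driver style: direction and IP type are packed into the pointer value. */
static void mock_driver_init(struct mock_chan *chan, unsigned direction,
                             unsigned feature)
{
    uintptr_t v = (direction & 0xFF) | (feature & XILINX_DMA_IP_MASK);
    chan->private = (void *)v;
}

/* Client side: build the value to look for, without any device-id field. */
static uint32_t make_match(unsigned direction)
{
    return (direction & 0xFF) | XILINX_DMA_IP_DMA;
}

/* Compare the pointer's numeric value against the requested match value.
 * Dereferencing chan->private here would read from a bogus address. */
static bool mock_filter(const struct mock_chan *chan, const void *param)
{
    return (uint32_t)(uintptr_t)chan->private == *(const uint32_t *)param;
}
```

With these assumed constants, a channel initialised with DMA_MEM_TO_DEV matches make_match(DMA_MEM_TO_DEV) and is rejected for make_match(DMA_DEV_TO_MEM), which is how the tx and rx requests end up on different channels.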

xdma.ko now builds OK, as do libxdma and the demo code, and you can see things working on the target if you raise the kernel printk level
Code: Select all
# dmesg -n 8
# cat /proc/sys/kernel/printk
8       4       1       7
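
For anyone unsure what those four numbers mean: they are, in order, the current console log level, the default message level, the minimum console level and the default console level. A message reaches the console only if its level is numerically lower than the console log level, so 8 lets everything through, including debug (level 7) messages. A small sketch of parsing that line follows; struct printk_levels and both function names are hypothetical, made up for this illustration.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical holder for the four fields of /proc/sys/kernel/printk. */
struct printk_levels {
    int console;          /* current console log level */
    int default_message;  /* level given to messages without one */
    int minimum_console;  /* lowest value console may be set to */
    int default_console;  /* boot-time default console level */
};

/* Parse a line such as "8       4       1       7". Returns 0 on success. */
static int parse_printk_levels(const char *text, struct printk_levels *out)
{
    int n = sscanf(text, "%d %d %d %d", &out->console, &out->default_message,
                   &out->minimum_console, &out->default_console);
    return n == 4 ? 0 : -1;
}

/* A message is shown on the console when its level is below console level. */
static bool reaches_console(int msg_level, const struct printk_levels *lv)
{
    return msg_level < lv->console;
}
```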

Loading the xdma driver produces dmesg output including
Code: Select all
# insmod xdma.ko
[  213.348773] <xdma> probe: number of devices found: 1

When you run the various tests you get output like
Code: Select all
[ 4691.756010] <xdma> file: open()
[ 4691.759235] <xdma> file: mmap()
[ 4691.762295] <xdma> file: memory size reserved: 33554432, mmap size requested: 33554432
[ 4691.771124] <xdma> ioctl: XDMA_GET_NUM_DEVICES
[ 4691.775511] <xdma> ioctl: XDMA_GET_DEV_INFO
[ 4691.779750] <xdma> ioctl: XDMA_DEVICE_CONTROL
[ 4691.784027] <xdma> ioctl: XDMA_DEVICE_CONTROL
[ 4691.789722] <xdma> ioctl: XDMA_GET_NUM_DEVICES
[ 4691.798440] <xdma> ioctl: XDMA_PREP_BUF
[ 4691.802237] <xdma> ioctl: XDMA_PREP_BUF
[ 4691.806035] <xdma> ioctl: XDMA_START_TRANSFER
[ 4691.810458] <xdma> ioctl: XDMA_START_TRANSFER
[ 4691.817057] <xdma> file: close()
[ 4712.675516] <xdma> file: open()
[ 4712.678760] <xdma> file: mmap()
[ 4712.681839] <xdma> file: memory size reserved: 33554432, mmap size requested: 4096
[ 4712.690760] <xdma> ioctl: XDMA_GET_NUM_DEVICES
[ 4712.699497] <xdma> ioctl: XDMA_GET_DEV_INFO
[ 4712.705653] <xdma> ioctl: XDMA_DEVICE_CONTROL
[ 4712.715698] <xdma> ioctl: XDMA_DEVICE_CONTROL
[ 4712.722869] <xdma> ioctl: XDMA_PREP_BUF
[ 4712.728195] <xdma> ioctl: XDMA_PREP_BUF
[ 4712.733541] <xdma> ioctl: XDMA_START_TRANSFER
[ 4712.739513] <xdma> ioctl: XDMA_START_TRANSFER
[ 4712.745983] <xdma> file: close()
[ 4720.135399] <xdma> file: open()
[ 4720.138606] <xdma> ioctl: XDMA_TEST_TRANSFER
[ 4720.144117] <xdma> test: rx buffer before transmit:
[ 4720.148983] Y        Y       Y       Y       Y       Y       Y       Y       Y       Y
[ 4720.152485] <xdma> test: xdma_start_transfer rx
[ 4720.156983] <xdma> test: xdma_start_transfer tx
[ 4720.161443] <xdma> test: time to prepare DMA channels [us]: 9308
[ 4720.168097] <xdma> transfer: returned completion callback status of: 'in progress'
[ 4720.175784] <xdma> test: DMA transfer time [us]: 8295
[ 4720.180834] <xdma> test: DMA bytes sent: 1048576
[ 4720.185368] <xdma> test: DMA speed in Mbytes/s: 126
[ 4720.190382] <xdma> test: rx buffer after transmit:
[ 4720.195101] \xffffffcf       \xffffffd1      \xffffffd1      \xffffffd1      \xffffffcf      \xffffffd1      \xffffffd1      \xffffffd1  \xffffffcf      \xffffffd1
[ 4720.198934] <xdma> file: close()
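
As a sanity check on the reported speed: 1048576 bytes in 8295 us is about 126.4 bytes per microsecond, and bytes per microsecond is the same as millions of bytes per second, so the 126 in the log is consistent with an integer division of bytes by microseconds. I have not confirmed that the xdma test computes it this way; the helper names below are hypothetical and only reproduce the arithmetic.

```c
#include <stdint.h>

/* Decimal megabytes per second: bytes/us equals 10^6 bytes/s. */
static uint64_t dma_speed_mbytes_per_s(uint64_t bytes, uint64_t time_us)
{
    if (time_us == 0)
        return 0; /* avoid dividing by zero for an unmeasurably short run */
    return bytes / time_us;
}

/* Binary mebibytes per second, for comparison with the figure above. */
static uint64_t dma_speed_mibytes_per_s(uint64_t bytes, uint64_t time_us)
{
    if (time_us == 0)
        return 0;
    /* Scale to bytes per second first, then convert to MiB. */
    return (bytes * 1000000u / time_us) / (1024u * 1024u);
}
```

With the numbers from the log the first helper gives 126 and the second gives 120, so the figure depends on which definition of "Mbytes" is meant.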

OK, so I can see that this is not all working, but at least stuff is happening. My next job is to work out what the demo, app and test xdma code actually do and check that I am getting the right stuff happening.

Re: ARM/FPGA AXI DMA xfers using Parallella-adi kernel

PostPosted: Sat Aug 29, 2015 8:36 am
by peteasa
I have published my environment now. Because I have updated to elink-redesign I have not yet re-run the AXI DMA xfers example but at least you can see all the code in its glorious detail in the examples/kernel/xdma_lib from the examples submodule in