Parallella Community

by **tnchan** » Fri Apr 11, 2014 7:32 am

How capable is the Parallella board in processing FFT on 8 bit and 16bit data respectively? Do we have any benchmark application on FFT suitable to run on Parallella with 16 cores? What would be the optimum transform size (2 to the power of something) to run on 16 core Parallella? Any info or advice will be most appreciated. Cheers

by **timpart** » Fri Apr 11, 2014 11:44 am

I'm not aware of anyone doing work on 8 or 16 bit transforms. Since the processor is 32 bit I speculate there would be little difference between them.

I was looking through FFT libraries a while back and the caught my eye as it has a BSD license and it can already generate code for ARM as well as Intel. Perhaps a relatively small step to do Epiphany as well. The algorithms used split the task down recursively I'm hoping it could be split between cores relatively easily.

There are of course many other fine FFT libraries around. I'm not aware of any library being ported to the Epiphany yet. (Corrections welcome!)

Tim

by **ysapir** » Fri Apr 11, 2014 1:13 pm

In the past we published a whitepaper showing some basic image filtering with 2-D FFT/IFFT. Here's the article:

http://www.adapteva.com/white-papers/us ... hancement/

by **tnchan** » Fri Apr 18, 2014 5:45 am

Thank you Tim and Yaniv for your fast response and helpfulness. I was away on a business trip and did not attend to this matter till now. I will read the website and the white paper from you and hopefully will get the answers I want. I will update you gentlemen in a few days. Cheers, TN

Hi Yaniv, your paper is very informative and it is a pleasure to read even though I do not have any computer science training or background. Basically I understand the flow of the information and logic and am impressed with the speed of 1.5ms for filtering a 128x128 pixel picture with 2D FFT on 16 cores. Regarding the formula for working out the max FFT point size (the picture or image size) for a given hardware configuration, I presume the formula applies to 2D FFT. Is this true? Will the max FFT point for 1D be the square of 2D? That is, 128 x 128 for 2D = 2^7 x 2^7 = 2^14 for 1D. Thank you for your help? TN

by **tnchan** » Sun Apr 27, 2014 9:01 am

The date of the above edited entry should be April 26

by **theover** » Sun Apr 27, 2014 3:31 pm

There has been a little discussion about this subject here:

Essentially you have a number of options, depending on what type of FFT you want to fit on the limited chips estate. I was looking for a replacement of the PC-power that's available in the Open Source FFTw3 library which compiles on modern Intels, and is also available for Cuda (on NVidia's GPUs).

The Arm cores can do computations on FFTs, possibly accelerated by one or two (I don't know if both cores have it) NEON parallel processing, you could use FPGA-fabric, or Xilinx' FPGA IP blocks which implement various efficient FFT blocks, also available with the Free Webpack design tools, and of course, the 16 or 64 Parallella cores should be usable to lightly-parallelize FFT computations of various dimensionality, speed and size.

T.V.

by **Gravis** » Sun Apr 27, 2014 5:05 pm

by **ysapir** » Sun Apr 27, 2014 7:55 pm

@tnchan - the article shows analysis of performing the 2D FFT under certain assumptions. One of them is that the whole image should be stored locally on the chip, assuming there is no DRAM access. This mode of operation limits the max size of the image to 128x128 (grayscale, complex-float pixels). Another key point in the chosen method was to perform the 1D FFTs on a single core. The formula was developed for this use case, and may need some adaptations for other schemes.

It was quite some time ago, but IIRC, the current Epiphany chip can do up to 1024-point FFT. Thus, if you allow DRAM access, you could do an 1024x1024 image in (1024/16=) 64 batches. Furthermore, one can perform 1D FFT using a 2D FFT engine. This means that, adding a simple intermediate stage, you could use the given method for performing 128x128=16K point FFT, in similar times.

by **tnchan** » Thu May 08, 2014 7:31 am

Thanks a lot to Gravis and Theover for your comments. I did not know FFT was discussed in a separate thread of the forum (as I am quite new to the forum)- my apologies for creating a parallel thread. :oops:

Hi Yaniv, thanks again for the explanation. My intended application of Epiphany 16 cores is for radio astronomy signal channelisation by FFT (from time to spectrum). The data set for FFT computation in my application is 2^18 (262144) single dimension of 8bit + 8bit data. I applied the formula you gave in your white paper on 2D image filtering, and found that the max point size for my application to be 2^16 = 65536 without using external memory and I will need to wait for the release of Epiphany 4096 cores (assuming 8KB per memory bank in each core) to do the job without using external memory. Can you confirm that my calculations are correct? In case I use external memory and accept the memory latency, how do I estimate the time taken? Where can I find a sample FFT for real life testing with Epiphany 16?

by **tnchan** » Mon Jun 02, 2014 6:52 am

Hi Yaniv, whilst waiting for your enlightenment on my questions in the last post above, will you tell me how you ran your FFT2D image cleaning code with Epiphany 16 cores- with APLc, OpenCL, or whatever means? Do we have newer FFT algorithm or codes for E16 by now? Cheers, TN

Parallella Community

Fast Fourier Transform

Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Re: Fast Fourier Transform

Who is online