Parallella FPGA Tutorials available

Using Zynq Programmable Logic and Xilinx tools to create custom board configurations

Re: Parallella FPGA Tutorials available

Postby yanidubin » Tue Dec 02, 2014 9:44 am

Melkhior wrote:Following that, moving to the full AXI4 interface which is apparently faster.


Tutorial 2 might be what you are after then, as this uses the full AXI4 interface to allow access to shared block memory.

I haven't done any performance testing to compare the two.

Well, not entirely true - I did some measurements in the (unpublished) tutorial 4 on what AXI4-Lite can do. That tutorial has been sitting 90% done for over a month now - must really finish it off and post it. It is really just Tutorial 1 applied, and more an introductory lecture on ALUs than much actual extra VHDL. But I haven't worked out where the actual bottleneck is (userspace mmap I/O access mechanism, or AXI bus), and haven't done a measurement with full AXI4.

I was meaning to get the AXI-Stream interface going (this is a DMA based high performance mechanism for images/video) first, and then comparing them all. But I am focussing on my camera controller project at present, so no promises :)
yanidubin
 
Posts: 95
Joined: Mon Dec 17, 2012 3:23 am
Location: Christchurch, New Zealand

Re: Parallella FPGA Tutorials available

Postby Melkhior » Tue Dec 02, 2014 11:07 am

yanidubin wrote:Tutorial 2 might be what you are after then, as this uses the full AXI4 interface to allow access to shared block memory.


I've tried tutorial 2 as well, and I have the memory mapping working (as in, if I write data, I can read it back). Again - thank you :-)

However, I have no idea how to plug my GCM component into it. I only got things to work with AXI4-lite (see here for results) thanks to your "simple ALU" example in your GIT repository. A similar example for the full AXI4 would be *great*. Hint, hint ;-)

I also failed at pushing data from the Epiphany to the FPGA (see here).

FPGA is _hard_ for a software guy :-/

yanidubin wrote:I was meaning to get the AXI-Stream interface going (this is a DMA based high performance mechanism for images/video) first, and then comparing them all. But I am focussing on my camera controller project at present, so no promises :)


I tried to get inspiration from this article, but while I managed to get a bitstream IIRC, I couldn't figure out the missing bits for the driver side of things anyway (seems what I found was for a ppc or microblaze host, or for self-contained code running at boot time, but no for-dummy-using-linux stuff).
Melkhior
 
Posts: 39
Joined: Sat Nov 08, 2014 12:19 pm

Re: Parallella FPGA Tutorials available

Postby Melkhior » Tue Dec 02, 2014 1:08 pm

yanidubin wrote:I'm sure this will have been covered many times over already (and better than I can), so what I should do is look around for a good really basic (and relevant) intro and recommend this for those coming in at such a level.


I _think_ your audience might be "newbi-er" than most (I know I am for FPGA), since it's likely many people buy a Parallella for the Epiphany rather than the FPGA, and so are less knowledgeable about FPGA and the associated tools than people who buy e.g. a ZedBoard. Which also makes your tutorials that much more useful.

yanidubin wrote:This sounds like a misconfiguration. (...) (depending which distro you use


Bog-standard Debian stable, the only "exotic" package is clang-3.4 for Wheezy from llvm.org. I like my systems reliable :-)

It's not the first time I've seen issues for not-so-common packages with foreign languages (I'm French). Changing the decimal separator, the sort order or the default encoding can wreak havoc on some software. Especially suspicious are those which are not that well packaged to begin with. (doesn't _anyone_ know of $ORIGIN, RPATH, and that putting a C++ library in LD_LIBRARY_PATH is _just_ _wrong_ ?)
Melkhior
 
Posts: 39
Joined: Sat Nov 08, 2014 12:19 pm

Re: Parallella FPGA Tutorials available

Postby yanidubin » Wed Dec 03, 2014 11:02 am

Melkhior wrote: thanks to your "simple ALU" example in your GIT repository. A similar example for the full AXI4 would be *great*. Hint, hint ;-)


Ah yes - that would be a sneak peek at the yet to be published tutorial 4 I was referring to.

I have written a simple image manipulation example based on Tutorial 2. This operates on an 8x8 matrix of 8-bit (grayscale) values. You load them in, then read them back, and get some sort of smoothed/interpolated result. However it really(!) is a bit too awful to share just yet. I got it working, but never revisited it to start cleaning it up.

Other than being generally hacky (took me a while to get it working), I wrote it to do everything in a single cycle (just to see how much resource this took - from memory around 40% of the DSP slices on a 7020, so might not even fit on a 7010). My plan is to make it a little more realistic, and learn (and share) about trading off speed versus gate count. For the current implementation, given bus / access limitations, having the actual calculation take place in an instant, between 64 reads, and 64 writes of a fairly slow AXI interface makes no sense.
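
For anyone wanting to play with the idea in software first, here is a minimal sketch of smoothing an 8x8 matrix of 8-bit values. The actual filter used in the FPGA design is not described in this thread, so the 3x3 box average, the border handling and the name `smooth` are all assumptions made purely for illustration.

```python
# Hypothetical software model: the thread does not say which filter the
# FPGA design actually implements. A 3x3 box average is assumed here.
SIZE = 8  # 8x8 matrix of 8-bit grayscale values, as described in the post


def smooth(image):
    """Return a 3x3 box-averaged copy of an 8x8 grayscale image.

    Edge pixels average only over the neighbours that exist
    (an assumed border policy).
    """
    result = [[0] * SIZE for _ in range(SIZE)]
    for row in range(SIZE):
        for col in range(SIZE):
            total = 0
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    r, c = row + dr, col + dc
                    if 0 <= r < SIZE and 0 <= c < SIZE:
                        total += image[r][c]
                        count += 1
            # Integer division keeps the result within 0..255.
            result[row][col] = total // count
    return result


if __name__ == "__main__":
    # A single bright pixel in an otherwise dark image.
    image = [[0] * SIZE for _ in range(SIZE)]
    image[3][3] = 255
    for line in smooth(image):
        print(" ".join(f"{value:3d}" for value in line))
```

In this model all 64 outputs depend only on the 64 inputs, which is the kind of calculation that could in principle be done between the load and the read-back.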

Okay, I'll see about cleaning this up and sharing it since it sounds like this will be of immediate use to you. I'll get the code up once it is clean enough, and the tutorial will eventually follow (but I won't make you wait). Give me a week or so - and feel free to remind me if I vanish :)

Melkhior wrote:I also failed at pushing data from the Epiphany to the FPGA (see here).


Sorry, can't help you there. Have yet to do anything with the Epiphany. I am interested in tinkering with this eventually, but the FPGA is what sang to me (a little louder).

Melkhior wrote:FPGA is _hard_ for a software guy :-/

I know what you mean - a few years back, I interfaced to an HD44780 LCD controller. In C, it took me an hour or two to write a driver from scratch. A year later, I thought "aha - let's see how to do the same in VHDL!". A week of tinkering and relearning state machines, and still my code did not work.

I am a software guy myself and am just learning and sharing as I muddle through. While I haven't actually done anything cool/complex with an FPGA (besides a few basic projects at university), I've worked out what the important things are to know to get started with the Parallella. It is cool to help others get a leg up, and I look forward to seeing some cool stuff come out of this (even if I don't get around to it myself) :).

Melkhior wrote: I tried to get inspiration from this article, but while I manage to get a bitstream IIRC, I couldn't figure out the missing bits for the driver side of things anyway (seems what I found was for a ppc or microblaze host, or for self-contained code running at boot time, but no for-dummy-using-linux stuff).


Ah yes, this is a baremetal approach I came across too. I did find a driver (see my thread here), but was unable to get it working with either the older or the more recent Parallella kernel (Andreas said this should now be included - see here). With either, I get a memory violation when I load the driver. I never did get around to trying to build the Xilinx kernel (too much else to focus on at present). But my hope was that starting from a known working kernel/driver, I could look to getting this going on the Parallella kernel.
yanidubin
 
Posts: 95
Joined: Mon Dec 17, 2012 3:23 am
Location: Christchurch, New Zealand

Re: Parallella FPGA Tutorials available

Postby yanidubin » Wed Dec 10, 2014 9:58 am

Hi Melkhior,

YaniDubin wrote:Okay, I'll see about cleaning this up and sharing it since it sounds like this will be of immediate use to you. I'll get the code up once it is clean enough, and the tutorial will eventually follow (but I won't make you wait). Give me a week or so - and feel free to remind me if I vanish :)


Sorry about the delay with this - I got it to almost where I was happy, then encountered a problem where I can no longer produce an FPGA build. The PAR step does not converge, even after 2+ hours.

I tried cleaning out my project, thinking it was some state it had gotten itself into. But no, it is definitely related to the code. Reverting to an old changeset results in buildable code. While I have it down to a few changes which can cause the issue, I am at a loss as to why this would be. And these are the 3 changes I need in order to make my code work (I think).

To give an example, reversing the order of the 4 bytes I pack into the 32-bit value to return over AXI increases the run time from ~12min to ~25min. Add a few other slightly more significant changes and it does not complete at all.
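
To make the "order of the 4 bytes" concrete, here is a small software model of packing four 8-bit values into one 32-bit word in either order. It only illustrates the two layouts; it is not the VHDL in question, the function names are made up for this sketch, and it says nothing about why place-and-route time would change.

```python
def pack_bytes(b0, b1, b2, b3):
    """Pack four 8-bit values into one 32-bit word, b0 in the least significant byte."""
    for value in (b0, b1, b2, b3):
        if not 0 <= value <= 0xFF:
            raise ValueError("each value must fit in 8 bits")
    return b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)


def pack_bytes_reversed(b0, b1, b2, b3):
    """Same four bytes in the opposite order: b0 lands in the most significant byte."""
    return pack_bytes(b3, b2, b1, b0)


def unpack_word(word):
    """Split a 32-bit word back into four bytes, least significant byte first."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))


if __name__ == "__main__":
    pixels = (0x11, 0x22, 0x33, 0x44)
    print(hex(pack_bytes(*pixels)))
    print(hex(pack_bytes_reversed(*pixels)))
```

Whichever order the hardware uses, the software reading the word back has to unpack with the matching convention.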

I cannot account for the severe impact of these minor changes. I'm a bit out of my depth here, so it may be a while before I have something I can publish.
yanidubin
 
Posts: 95
Joined: Mon Dec 17, 2012 3:23 am
Location: Christchurch, New Zealand

Re: Parallella FPGA Tutorials available

Postby Melkhior » Wed Dec 10, 2014 10:32 am

yanidubin wrote:Sorry about the delay with this


No need to apologize for your volunteer work :-)

yanidubin wrote:Reverting to an old changeset results in buildable code.


The bit I don't understand is how to access the data. Any example of that, even quite trivial, would already be great.

Thanks for your efforts !
Melkhior
 
Posts: 39
Joined: Sat Nov 08, 2014 12:19 pm

Re: Parallella FPGA Tutorials available

Postby Melkhior » Sat Dec 13, 2014 12:40 pm

yanidubin wrote:Okay, I'll see about cleaning this up and sharing it since it sounds like this will be of immediate use to you.


Actually, I just realized I could easily check whether AXI4 would be suitable - I don't need it to work.

Short answer : it isn't suitable :-(

Long answer : I'm currently limited for GCM by the write speed from the ARM to the AXI4-lite core. I can do about 26.5 MB/s raw (I only get 2/3 of that usable because I need 2 writes to my control register to start the computation for every 4 data writes; why 2? dunno). Since thanks to your tutorial 2 I have some read/write memory through memory-mapped AXI4 (non lite), I can easily measure my write throughput - which will be an upper bound for GCM. And I only get 29.5 MB/s, which isn't that much better (each ARM core can already do 23 MB/s of actual GCM).
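
The arithmetic behind those figures, as a quick sketch. The measured numbers are the ones quoted above; the function and constant names are only for illustration, and the 2/3 factor assumes every write (data or control) costs the same.

```python
def usable_throughput(raw_mb_s, data_writes, control_writes):
    """Share of the raw write bandwidth that actually carries payload data."""
    return raw_mb_s * data_writes / (data_writes + control_writes)


AXI4_LITE_RAW = 26.5  # MB/s, raw writes to the AXI4-lite core
AXI4_FULL_RAW = 29.5  # MB/s, raw writes to the memory-mapped AXI4 block
ARM_CORE_GCM = 23.0   # MB/s of actual GCM on one ARM core

# 4 data writes need 2 control-register writes, so 4 of every 6 writes are payload.
lite_usable = usable_throughput(AXI4_LITE_RAW, data_writes=4, control_writes=2)

if __name__ == "__main__":
    print(f"AXI4-lite usable:      {lite_usable:.1f} MB/s")
    print(f"AXI4 upper bound:      {AXI4_FULL_RAW:.1f} MB/s")
    print(f"Bound vs one ARM core: {AXI4_FULL_RAW / ARM_CORE_GCM:.2f}x")
```

So even with zero control overhead, the full AXI4 path tops out at under 1.3 times what a single ARM core already manages in software.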

I'm still interested in how to do it, but it's not going to be useful in real life anyway, since I would need a lot more than that to get the Epiphany+FPGA combo to beat the Cortex-A9.
Melkhior
 
Posts: 39
Joined: Sat Nov 08, 2014 12:19 pm

Re: Parallella FPGA Tutorials available

Postby yanidubin » Sun Dec 14, 2014 3:38 am

Okay, sorry to hear that. Good to know about the bandwidth though, not something I have a handle on.

I thought I'd point out that the way I demonstrated accessing mmapped IO was a quick hack, and totally not the way to do it for any level of performance (i.e. driving a script, which then calls mmap for each read or write).

I gather you have written an application which just calls mmap once? On the off chance you hadn't, there could be a significant gain there.
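
For reference, a minimal sketch of the "map once, then access directly" pattern. The thread does not give a device path or base address, so the demo maps a temporary file as a stand-in; on real hardware one would presumably pass `/dev/mem` and the physical base address as the offset, which is an assumption here. `MappedRegion` and `demo` are names invented for this sketch, and a C program would follow the same shape with far less per-access overhead.

```python
import mmap
import os
import struct
import tempfile

WORD = struct.Struct("<I")  # one little-endian 32-bit word


class MappedRegion:
    """Map a region once, then read and write 32-bit words until close()."""

    def __init__(self, path, length, offset=0):
        # O_SYNC is commonly used for device memory; it is harmless for a plain file.
        self._fd = os.open(path, os.O_RDWR | getattr(os, "O_SYNC", 0))
        # offset must be a multiple of mmap.ALLOCATIONGRANULARITY.
        self._map = mmap.mmap(self._fd, length, offset=offset)

    def write_word(self, index, value):
        WORD.pack_into(self._map, index * WORD.size, value & 0xFFFFFFFF)

    def read_word(self, index):
        return WORD.unpack_from(self._map, index * WORD.size)[0]

    def close(self):
        self._map.close()
        os.close(self._fd)


def demo(words=64):
    """Write then read back `words` values through a single mapping.

    A temporary file stands in for the device region (assumption for the demo).
    """
    with tempfile.NamedTemporaryFile() as backing:
        backing.truncate(words * WORD.size)
        backing.flush()
        region = MappedRegion(backing.name, words * WORD.size)
        try:
            for i in range(words):
                region.write_word(i, i * 3)
            return [region.read_word(i) for i in range(words)]
        finally:
            region.close()


if __name__ == "__main__":
    print(demo(8))
```

The point is only that the mapping is set up once and torn down once, with every read and write in between going straight through it rather than through a fresh mmap call per access.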
yanidubin
 
Posts: 95
Joined: Mon Dec 17, 2012 3:23 am
Location: Christchurch, New Zealand

Re: Parallella FPGA Tutorials available

Postby Melkhior » Sun Dec 14, 2014 8:02 am

yanidubin wrote:I gather you have written an application which just calls mmap once?


Yep, just the one mmap() & munmap() and direct accesses in-between.

Interestingly, you can't use NEON wide loads and stores: they cause bus errors. memcpy() works with gcc but not clang (memcpy uses NEON L&S). I tried since NEON L&S are more efficient for the epiphany shared memory.
Melkhior
 
Posts: 39
Joined: Sat Nov 08, 2014 12:19 pm

Re: Parallella FPGA Tutorials available

Postby steddyman » Fri Jan 02, 2015 11:23 pm

I've been trying to load and edit the source for the 7020 with HDMI without any success. I have used GIT to pull down the HDMI sources into Externals without any issues.

I can load the PPR project in PlanAhead, but when I double-click the System XPS it always reports that my license does not cover XPS. I am using the ISE WebPACK license, which is supposed to cover the 7010, 7020 and 7030 devices.

I am running Windows 8.1 and was getting this error in PlanAhead, but fixed it after a little googling on renaming DLLs to allow ISE to run under 64-bit Windows. But nothing mentions XPS.

Is this a problem with the source or a problem with ISE?
Last edited by steddyman on Sat Jan 03, 2015 7:52 pm, edited 1 time in total.
steddyman
 
Posts: 19
Joined: Thu Dec 25, 2014 8:44 pm
