The fabs are back and as you can see we have made a late change to the look of the Parallella board. It’s now blue! Interestingly enough, I don’t think the question of color ever came up once during the last year. Hopefully most of you won’t mind the change. Designer’s choice :-)
The changes made to this RevC board (or Gen 1.1) include:
- Fixed HDMI wiring issue
- Made the USB port host-mode only as default build option
- Removed flaky mechanical jumper for 5V mounting hole power feed
- Added more ground vias for improved thermal performance
- Changed Tantalum caps to ceramic caps to avoid material procurement issues
The new board sources can be found HERE.
200 fabs have been sent off to the local contract manufacturer this morning for final assembly. Expecting the first boards to come off the line in 5-7 business days if there are no issues. Fingers crossed…
In this post I’m going to give a quick introduction to GNU Radio, and how the GNU Radio blocks are scheduled by the Thread Per Block (TPB) Scheduler. I will then discuss the basics of GNU Radio blocks, the potential impact that porting the blocks to the Epiphany would have, and the shortcomings of a project that hoped to accomplish a similar port.
GNU Radio is a Software Defined Radio (SDR) framework. SDR allows a software developer to write Digital Signal Processing (DSP) code to run on a general purpose PC. A Universal Software Radio Peripheral (USRP) is connected to the PC via USB or Ethernet and handles the tuning, filtering and sampling (TX and RX). The samples are then processed by the PC. This gives the developer the power to implement virtually any radio system with the same peripheral. It also allows for rapid prototyping and dynamic reconfigurability (see cognitive radio).
GNU Radio applications are written as Flowgraphs. A Flowgraph is a directed, acyclic graph of DSP blocks that stream data from an input (the source) to an output (the sink). The source block has no inputs, as it serves as the data source for the Flowgraph, and the sink block has no outputs, as it serves as the output for the data in your Flowgraph. In a simple radio application, the source block typically represents your input device, usually a TV tuner or USRP, and the sink block represents your audio output or a file.
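To make the structure concrete, here is a toy sketch in plain Python — not the GNU Radio API, just an illustration of the idea that data streams one way from a source block, through a processing block, to a sink block:

```python
# Illustrative only: a toy "flowgraph" in plain Python (not the GNU Radio
# API) showing the source -> processing -> sink streaming structure.

def source():
    """Source block: produces data, has no inputs."""
    for sample in range(8):
        yield float(sample)

def scale(stream, k):
    """Processing block: one input stream, one output stream."""
    for sample in stream:
        yield k * sample

def sink(stream):
    """Sink block: consumes data, has no outputs."""
    return list(stream)

# Wire the blocks into a one-directional, acyclic chain and run it.
result = sink(scale(source(), 2.0))
print(result)  # [0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
```

In real GNU Radio the wiring is done with connect() calls and the data moves through shared buffers rather than Python generators, but the topology is the same.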
Every basic block in a GNU Radio Flowgraph is represented by a thread which executes on that block’s behalf. This thread is scheduled by the operating system’s scheduler, but the GNU Radio scheduler determines when the thread calls its ‘work’ function on the input data. The thread reads from an input buffer, does its calculations and writes the resulting values to an output buffer.
Data between adjacent blocks is shared via pointers to a common memory location, and the first block in the Flowgraph writes to the shared memory buffer before the second reads from it. In the case of a sync block (a block that produces the same number of output items as it consumes input items), the number of items in the shared buffer is indicated by the noutput_items variable, which is passed to the work function. There are several good tutorials on writing blocks and applications in C++ and Python.
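As an illustration of that work-function contract — a simplified model, not GNU Radio’s actual C++ or Python block API — a multiply-by-constant sync block boils down to this:

```python
# Toy model (not the GNU Radio API) of a sync block's work function:
# same number of input and output items, with noutput_items telling the
# block how many items the shared buffers currently hold.

def work(noutput_items, input_buf, output_buf):
    """Multiply-by-constant sync block: read noutput_items samples from
    the shared input buffer, write the results to the shared output
    buffer, and report how many items were produced."""
    for i in range(noutput_items):
        output_buf[i] = 3.0 * input_buf[i]
    return noutput_items

in_buf = [0.0, 1.0, 2.0, 3.0]   # shared buffer filled by the upstream block
out_buf = [0.0] * 4             # shared buffer read by the downstream block
produced = work(4, in_buf, out_buf)
print(produced, out_buf)  # 4 [0.0, 3.0, 6.0, 9.0]
```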
I have shown in a blog post that it is possible to run GNU Radio applications on the ARM processor on the Parallella, but with very low performance. This is expected given the relatively low computational power this processor has when compared to modern x86 processors. One way to get around this problem is to off-load some of the high-throughput calculations onto the Epiphany accelerator. There are two ways to accomplish this: 1. rewrite or alter the scheduler to offload some threads to the Epiphany; 2. rewrite some of the GNU Radio blocks to transfer data to the Epiphany and execute the computation on the accelerator, before transferring the results back to RAM. As far as I am aware, option 1 would be very complicated. Option 2 would require a lot of porting, but could be sustainable with a large enough programmer effort.
When it comes to porting GNU Radio blocks to Parallella, there are several immediate challenges. The Epiphany chip does not have direct access to the Parallella’s main memory, and would therefore need to copy the shared data. This introduces high latency overhead, and would require rewriting blocks to account for it.
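A back-of-the-envelope sketch shows why that copy overhead pushes toward batching work before offloading it. The timings here are made-up placeholders, not Epiphany measurements:

```python
# Sketch of why per-call copy overhead matters when offloading a block's
# work function to an accelerator. All timing numbers are invented
# placeholders, not Epiphany measurements.

def offload_cost(n_items, per_transfer_latency, per_item_compute):
    """Total cost of one offloaded work() call: copy in, compute, copy out."""
    return 2 * per_transfer_latency + n_items * per_item_compute

# Amortization: the same 4096 items offloaded in one large batch versus
# 64 small batches of 64 items each.
one_batch = offload_cost(4096, per_transfer_latency=100.0, per_item_compute=1.0)
many_batches = 64 * offload_cost(64, per_transfer_latency=100.0, per_item_compute=1.0)
print(one_batch, many_batches)  # 4296.0 vs 16896.0
```

With these (arbitrary) numbers, splitting the stream into small work() calls pays the two-way transfer latency 64 times over — which is exactly the kind of behavior a ported block would need to be restructured to avoid.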
Of course, a general-purpose CPU is not optimized for DSP, and at least one other group has tried to exploit specialized hardware instead. A few students out of UC Berkeley, headed by William Plishker, tried to extend GNU Radio onto GPUs. You can read their presentation at the following link:
Although they were able to see a speedup, at least in throughput, their code has ultimately vanished from the Internet, and as far as I’m aware it is not being used. There are several reasons for this, but two significant ones are the increase in latency and the high cost of porting. We need to learn from this project and avoid the same mistakes.
For those looking to port GNU Radio to Parallella, I would recommend knowledge of the following technologies:
Tom Tracy II
It’s been a little while since I posted an update, and what follows is an outline of just some of the more recent activity in our community.
Rev0 experiences, object detection and loaders
In August notzed received his Parallella Rev0 board and since then he’s been blogging about his thoughts and experiences concerning an object detection application, scheduling, and linking and loading, amongst other topics.
Eagle library and GPIO breakout
Also back in August Sylvain, a.k.a. tnt, published an Eagle CAD library for Parallella, with footprints for Rev0 and Rev1 boards, and for a single Samtec connector as used by daughter cards. Last week Sylvain posted a picture of the first GPIO breakout board that he’s made — which can be seen above — and for discussion concerning this see the thread on sourcing Samtec connectors.
Java, Graal and Sumatra
Michal Warecki is exploring how the OpenJDK projects Graal and Sumatra could be used to implement JVM support for Parallella. For further details see Michal’s blog post from Sunday and the related forum discussion thread.
OpenGL ES 2.0 via a networked Raspberry Pi
It’s possible to implement GPU functionality in the Parallella FPGA and/or Epiphany. However, another route to accelerated graphics was presented this week by shrodruk, who has published the code for a lightweight streaming library which allows the GPU of a network-connected Raspberry Pi to be used with OpenGL ES 2.0 applications. Always great to have more options!
Simple message passing library
Another project announced this week is an experimental synchronous message passing API for Epiphany, dubbed “x-lib”, created by Mark Honman. A demo application complete with documentation can be found on GitHub, and to contribute to the discussion see the forum post.
One of the primary goals of the Parallella project is to advance the use of parallel computing across the industry and across the globe. We are pleased to report that since opening up the store for Parallella pre-orders at the end of July, we now have 120 universities and research institutes as customers. They are now anxiously awaiting the delivery of their first boards, and we are doing everything we can to deliver as soon as possible.
In addition, as of today we have 17 universities signed up to receive free Parallella hardware through the Parallella University Program (PUP).
- ETH Zurich (SWITZERLAND)
- Heriot-Watt University (UK)
- KTH Royal Institute of Technology (SWEDEN)
- Lund University (SWEDEN)
- National Technical University of Athens (GREECE)
- Technical University of Catalonia (SPAIN)
- University of Bath (UK)
- University of Bologna (ITALY)
- Universidad de Buenos Aires (ARGENTINA)
- University of Edinburgh (UK)
- University of Illinois (USA)
- University of Tijuana (MEXICO)
- Université de Mons (BELGIUM)
- University of Coimbra (PORTUGAL)
- University of South Carolina (USA)
- University of Tennessee (USA)
- University of Zagreb (CROATIA)
We believe the Parallella open-hardware approach is a great match for the collaborative and open environment found in academic research, and we want to do as much as we possibly can to democratize access to massively parallel hardware as soon as possible.
Parallel programming can be very difficult, but it can also be very, very simple. Executing a thousand independent programs that don’t depend on each other in any way is not that difficult if the right infrastructure is available. An operating system performs multiple tasks concurrently all day long without us noticing. Sure, sometimes we’ll run into performance problems or all-out thrashing, but for the most part it works great.
For larger clusters, there have always been job schedulers to help distribute and balance batch workloads across servers. I remember using the LSF (“Load Sharing Facility”) package to abuse our compute clusters with random design verification tests and power simulations on the TigerSHARC DSPs over a decade ago. In a sense this was a trivial type of parallel programming: a user would send out a set of almost identical simulation jobs to the job scheduler, with the only difference being the seed used to simulate the chip.
The simplicity of this “job scheduling” parallel usage model was always in the back of my mind as I started the Epiphany and later the Parallella project. Wouldn’t it be nice if there were a utility that allowed programmers to easily take advantage of massive parallelism to scale out performance for low-hanging-fruit problems that are “embarrassingly parallel”?
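In code, that LSF-style usage model is just a parallel map over seeds. Here is a minimal Python sketch — the “simulation” is a stand-in, not a real chip verification job:

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Sketch of the "job scheduler" usage model: the same simulation run many
# times, with the seed as the only difference between jobs. The simulation
# body is an illustrative stand-in, not a real design verification test.

def simulate(seed):
    """One independent job: seed a PRNG and return a pass/fail result."""
    rng = random.Random(seed)
    return seed, rng.random() < 0.99   # stand-in for a verification run

seeds = range(16)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(simulate, seeds))   # jobs share nothing but code

failures = [seed for seed, passed in results if not passed]
print(len(results), failures)
```

Because the jobs share no state at all, the pool size can grow from 4 workers to thousands of cores without touching the job code — which is exactly what makes this class of problems such low-hanging fruit.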
During the summer I worked with our summer interns to put together a simple job scheduling demo. Well…to be honest, it was more like me sketching some half-baked ideas on the board and then swiftly abandoning them for a much-needed one-week vacation. When I came back, the students had a basic version working, and it only took one more week of work (under the guidance of Yaniv) to produce the demo you see in the video! Certainly, the outcome is mostly a testament to the hard work and talent of our interns, but in some small way it also demonstrates the power and ease of use of the Epiphany and Parallella platforms.
The key Parallella platform features that enabled this project include:
- Well-documented Epiphany features allowing for monitoring and scheduling of programs on each core.
- The ability to run completely independent programs on each Epiphany core.
- Low latency communication between the ARM processor and the Epiphany coprocessor.
- A stable Linux distribution that runs on a very capable dual core ARM A9 processor.
The video demo in this post shows multiple independent applications running on the dual core ARM processor, with small independent kernel tasks being launched to different Epiphany cores based on availability. The Epiphany resource manager (“ERM”) runs in the background and continuously monitors the Epiphany network traffic and core workload, displaying the Epiphany status through a simple Java-based app.
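The dispatch-by-availability idea can be sketched in a few lines of Python. This is an illustrative toy model — the core count, task bodies and queue are stand-ins, not the actual demo code or the Epiphany API:

```python
import queue
import threading

# Toy model of dispatching independent kernel tasks to whichever "core"
# is currently free, in the spirit of the demo's resource manager.
# Core IDs and task behavior are illustrative, not the Epiphany API.

free_cores = queue.Queue()
for core_id in range(16):          # a 16-core Epiphany, all initially idle
    free_cores.put(core_id)

results = []
lock = threading.Lock()

def launch(task_id):
    core = free_cores.get()        # block until some core becomes available
    try:
        value = task_id * task_id  # stand-in for the kernel's computation
        with lock:
            results.append((task_id, core, value))
    finally:
        free_cores.put(core)       # mark the core free again

threads = [threading.Thread(target=launch, args=(t,)) for t in range(64)]
for th in threads:
    th.start()
for th in threads:
    th.join()

print(len(results))  # 64
```

Sixty-four tasks flow through sixteen “cores” with no task ever needing to know which core it will land on — the queue of free cores is the whole scheduler.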
The full project source code for the demo can be found on GitHub:
Perhaps the most amazing part of the story is that our interns didn’t even know C before they started at Adapteva six weeks earlier!
(from left to right: Xin Mao, Wenlin Song, and Kevin Cheng)
I am confident that stories like this will play out time after time across the globe over the coming year once thousands of Parallella boards reach developers’ hands. Put an open platform in the hands of clever hard working developers, and magic happens!