The Parallella board now runs Ubuntu!

Great news, the Parallella is now a real computer!! The gigabit ethernet port is working and the full Ubuntu desktop version is up and running! We had some scary moments this week, but in the end everything worked out. Sometimes it’s really worth considering the old advice “if it ain’t broke, don’t fix it”.  Amazing how the most innocent of design changes can cause such major headaches at times…

The picture below shows the Ubuntu desktop as seen within the TightVNC client running on a PC. At this point, the Parallella can be used as a “headless” low power server. The next critical step will be to ramp up HDMI and USB functionality so that the Parallella can be used as a standalone computer.

The next two weeks will be filled with additional board bring-up work and extensive Parallella validation and testing. We need to make sure any and all major issues are wrung out before we send out the first 100 prototypes to Kickstarter backers who signed up for Parallella early access. At this point we are aiming to ship these early boards by the second week of June.

We will be posting some SIGNIFICANT Parallella related news over the next few weeks so don’t go far! The best way to stay up to date on Parallella news is to follow us on Twitter (@adapteva and @parallellaboard).

Posted in Blog

Google Summer of Code Opportunities

Two open source projects which will be participating in Google Summer of Code (GSoC) 2013 are inviting applications from students interested in Parallella related projects.

In both cases the Parallella project will supply hardware to students along with co-mentoring from Yaniv Sapir (Adapteva), and if necessary provide network access to a prototype board.

GNU Radio

The GNU Radio project’s ideas list is hosted on their wiki, along with project details from their 2012 participation.

The mentor for a GNU Radio/Parallella project will be Tommy Tracy II of University of Virginia’s High Performance Low Power Lab. In a recent forum post Tommy provided instructions for building GNU Radio on Parallella, and he is currently working on porting GNU Radio blocks across to Epiphany.

Access to a Parallella prototype that is hosted in the ORBIT wireless testbed at Winlab, Rutgers University, should also be possible.

Openwall

The Openwall project are interested in projects which add support for coprocessors such as Epiphany, for use with various hash/cipher types. They also have an ideas page with further details.

Time is of the essence

The deadline for applications is Friday 3rd May and time is of the essence as students must start working with projects before then, to discuss proposals in detail and in some cases perform a qualification task.

This is a fantastic opportunity to take part in a respected global program, and to work on software that matters and with a cutting edge hardware platform, while getting support from domain experts and also receiving a stipend!

Hello World! My name is Parallella.

The Parallella Board Has Arrived!

DAY 1:

The first 10 Parallella boards arrived on Thursday 4/11 and we fired up the first boards around 1:30pm. Some very nervous moments followed…but no smoke was coming out of the board! We would like to say that everything worked perfectly out of the box…but that wasn’t quite the case. Fortunately we decided to do the initial bring-up at the local factory (with some great soldering “artists” on hand) so we had most show-stopper issues and workarounds done within a few hours. Later that evening, we did uncover one more issue that we had to resolve ourselves in order to keep moving forward. (Lesson learned: I need to improve my soldering skills and make sure we have the right equipment on hand because those pads are very small!) One very ugly soldering job later and we were up and running.

DAY 2:

Debugging continued. Roman configured the Zynq chip to use the UART port on the PEC_POWER connector as a terminal output and wrote a bare metal application that executed out of the Zynq’s on-chip memory.  After a few hours, garbled messages were finally coming out of the UART port.  At that point it didn’t take long to figure out what had changed compared to the previous Parallella prototype … Finally we could see these beautiful words appear on the terminal screen.

DAY 3:

More good news! Roman was able to communicate directly with the Epiphany chip and set the flag output on the Epiphany chip. It may sound trivial, but it’s actually a pretty big deal since almost everything on the Epiphany chip is memory mapped.

Moving forward:

We still have a LOT of work in front of us to test all the features on the Parallella board, but the first tests shown here are very encouraging!  The next step is to work through the Linux boot process so that we can start testing the peripherals. After Linux boots properly, we will start performance testing the board using the software infrastructure built up for the Parallella prototype boards over the last few months.

The next few weeks are going to be a lot of fun!

Andreas

Posted in Uncategorized

Parallella: An Open Hardware Platform

While you are waiting for news about the Parallella board power-up, you might want to check out these latest developments.

A Parallella Platform Reference Design: A how-to guide for building a headless Parallella written by Roman. The source code for the HDL logic used to interface the Zynq to the Epiphany has been released with a GPL license.

A Guide to Building Linux for the Parallella Platform: A how-to guide for building and booting Linux (Ubuntu 12.04) on the Parallella platform from source written by Roman.

Epiphany Drivers “2.0”: Substantial improvements to our driver API to make the Epiphany architecture easier to use. Yaniv has published the new sources on GitHub and has updated the SDK reference manual (found here).

The Parallella ftp site is now open to everyone (ftp://ftp.parallella.org). Prebuilt SD images and SDKs can be found to simplify the ramp up process.

The Parallella board design files (schematics, Gerbers, etc.) will be released under a Creative Commons license once the design is deemed production worthy and we start shipping to thousands of Parallella Kickstarter backers.

Parallella design files shipped to manufacturer!

It took much longer than expected, but the first revision of the Parallella board has been sent off to the manufacturer and we will have the first 10 Parallella boards built up and ready for testing by April 15th!

At this point, I feel like we owe everyone an explanation why we are running late. The main reason is that we underestimated the challenges involved in reaching the $99 price point when we launched the Parallella project back in September. Producing a $99 single board computer may not be that difficult if you are shipping millions of units (you can buy a $50 Android smartphone today), but with only 6,300 Parallella boards shipping, it was a whole other story. Much gratitude goes out to the  component manufacturers who really “got it” (Xilinx, Analog Devices, Intersil, Micron, Microchip, Samtec all deserve special thanks). Without their help we would be losing $100 per board!

We spent close to two months optimizing the design for cost (i.e. reducing component counts and scouring the data-books for inexpensive high quality alternatives) in order to get within striking distance of the $99 price target. Unfortunately there wasn’t much that could be done about the complexity of the board layout due to the abundance of high speed differential routing, 0.8mm pitch BGAs, and tight component placement. The Parallella is now a 12 layer board with aggressive line and drill hole dimensions.

The schedule slippage is VERY painful for all of us but in terms of product features the Parallella project is coming together even better than we could have hoped.  The thought of having actual Parallella boards in hand in a  few weeks is very exciting!

Epiphany SDK Insights and Future

History

The Epiphany SDK started life as a prototype binutils & GCC port by Alan Lehotsky, which would run code on a Verilator model of the Epiphany chip.

Embecosm became involved in March 2009, initially providing an implementation of the GNU Debugger. Then over a period of 6 weeks starting that September we upgraded GCC to a commercially robust implementation, eliminating all regression test failures from the C and C++ compilers. This was still before the first silicon had been spun, and with testing against a Verilator model.

Verilator creates a cycle accurate C++ model from the Verilog of the design and in doing so faithfully replicates the exact behavior of the chip. Sometimes we found errors that just could not be the compiler and these turned out to be, as can be expected, faults in the early silicon design.

Effectively the 100,000 tests of the GNU regression test suite also became part of the pre-silicon verification of Epiphany. We believe this was the first time this technique had been used in a commercial silicon chip design, and Andreas Olofsson believes that this identified 50-60 hardware design flaws and was a key factor in the 1st generation Epiphany chip working first time.

Features

Epiphany GCC was developed with two clear goals: code must be compact so that it will fit within the relatively small memory of each core; the code generated must be high performance.

Embecosm has continued to work closely with Adapteva, placing particular emphasis on developing the vectorizing features of GCC. Epiphany has a 5-stage pipeline and both a floating point and integer execution unit on each core, and the goal is to keep both of those pipelines as full as possible.

The SDK is more than just the tool chain for which Embecosm is responsible. It also includes e-hal (the Epiphany driver that provides a consistent API across platforms), e-lib (a library to access all the features of the Epiphany hardware), a GNU debugger server for easy remote debugging of cores, BSPs, documentation and examples, and in the future an Eclipse CASE environment. [1]

Performance

Even though its primary objective is floating point computation, with a CoreMark performance of over 12,000 for a single core, Epiphany is fast as an integer processor.

One good way to look at performance is to see how well the compiler can schedule instructions to keep the pipeline full. Modern GPUs consider reaching 20% pipeline occupancy a good result. Epiphany GCC has achieved up to 87% pipeline occupancy with carefully crafted C code, and occupancy of at least 50% is expected with DSP style code! [1]

Another aspect of performance is energy efficiency, and the truly unique property of Epiphany is that it achieves its high performance while consuming just a couple of watts of power. Working jointly with Bristol University and supported by Adapteva, Embecosm have been looking at how compiler optimizations can be used to reduce the energy consumption of programs.

Roadmap

Our core activity is to continue developing the compiler, tuning its performance and taking advantage of the latest optimizations in GCC. However, we also want to add new features and our top priority is support for position independent code (PIC), making it easier to move code between cores.

Longer term we want to provide a robust Eclipse environment. You can already use the tool chain with standard Eclipse CDT, but we want to provide a more complete integration that includes support for the multi-core plugin.

Finally, we are continuing our work on energy minimization. The UK government is funding an open source research project to develop a compiler framework for energy optimization using machine learning techniques. Once again we’ll be working with Bristol University, and Epiphany will be one of the architectures used to develop and evaluate the technology. This means that over the next 18 months Epiphany will be at the heart of some of the most advanced compiler technology in the world.

How you can contribute

We have only a small team working on the SDK and we are looking to the community to participate. Here are some ideas for how you can contribute:

1. Report bugs on the GitHub issue tracker. Good bug reports include a test case to generate the problem. Really good bug reports include a patch.

2. Send us benchmark examples, particularly where you think the compiler ought to do better. Again the GitHub issue tracker is a good place for such examples, but you can also discuss them on the forums to get community suggestions on optimization.

3. Tell us what new features you would like in the compiler. The forums are the best place for this, but once there is a consensus on what a new feature should look like you can record it as an enhancement request in the GitHub issue tracker.

For the ambitious, we welcome suggested code fixes and improvements. The easiest way to do this is to send us a pull request with your patch.

Epiphany already has a world class open source SDK and with help from the community we can build on this and make sure it gets even stronger.

[1] Text updated at 19:00 GMT, March 18, 2013 to provide correct SDK and pipeline occupancy performance details.

Introduction to Parallella and OpenCL

The following post provides an introduction to the use of OpenCL™ for programming Parallella.

What is OpenCL?

OpenCL is an industry standard API for parallel programming co-processors on heterogeneous platforms.  Designed primarily for computing with general-purpose graphics processing units (GPUs), the API may be used to access the compute capability of other types of devices including multicore CPUs and other accelerators.  OpenCL provides a good API for exposing the compute capability of the Epiphany co-processor on the Parallella platform.

OpenCL consists of a kernel programming API used to program the co-processor device and a run-time host API used to coordinate the execution of these kernels and perform other operations such as memory synchronization.  The OpenCL programming model is based on the parallel execution of a kernel over many threads to exploit SIMD or SIMT architectures.
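To make the execution model a little more concrete, here is a minimal sketch in plain Python. It is not OpenCL and it is not the COPRTHR API; it only emulates the idea of a host launching one kernel body over a range of work-item indices. The names vector_add_kernel and launch are hypothetical, and a real run-time would execute the work-items in parallel rather than in a loop.

```python
def vector_add_kernel(gid, a, b, out):
    """Kernel body: each work-item handles exactly one element, chosen by its global id."""
    out[gid] = a[gid] + b[gid]


def launch(kernel, global_size, *args):
    """Stand-in for the host run-time: run the kernel once per work-item index.

    A real run-time would dispatch these work-items in parallel; this loop only
    models the indexing, not the concurrency.
    """
    for gid in range(global_size):
        kernel(gid, *args)


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

# Host side: one kernel, one work-item per output element.
launch(vector_add_kernel, len(a), a, b, out)
print(out)
```

The point to notice is that the kernel never loops over the data itself: the index space is supplied by the launch, which is what lets the same kernel spread across however many cores the device offers.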

OpenCL for the Epiphany Processor

Creating an API for architectures not even considered during the creation of a standard is challenging.  This can be seen in the case of Epiphany, which possesses an architecture very different from a GPU, and which supports functionality not yet supported by a GPU. OpenCL as an API for Epiphany is good, but not perfect.

Our focus with the implementation of OpenCL for Epiphany is to leverage the API to support effective parallel programming that takes advantage of the underlying architecture.  For programmers familiar with OpenCL in a GPU context, it will be important to understand the key differences and the algorithm design and extensions that must be employed to take full advantage of the Epiphany co-processor.  A best-practices guide will be provided to help new and experienced OpenCL programmers get started with Parallella.

OpenCL Support for Parallella: Where We Are Today

The OpenCL support for Parallella is provided by the COPRTHR SDK developed by Brown Deer Technology, with version 1.5 providing full support for the Parallella platform, including OpenCL for both the ARM processor and the Epiphany co-processor.  A release candidate is available now, and we are close to locking down the release. As part of an ongoing effort, we will continue to improve the implementation by expanding coverage and optimizing performance.

A few FAQs About OpenCL for Parallella

What version of the OpenCL standard is supported?
The OpenCL implementation for Parallella presently targets the 1.1 standard specification. Programmers should be aware that not all of the standard is implemented at this time and extensions have been added to expose features of the Epiphany architecture not addressed in the standard. The primary objective is to use OpenCL to provide a useful parallel programming model for Parallella programmers.

Why do we use GCC when everyone else uses LLVM?
The choice of compiler is not strongly bound to the COPRTHR SDK – in fact LLVM was once supported by default.  For Parallella we use GCC because GCC 4.7 provides the compiler back-end support for the architecture.

Why isn’t the memory local to an Epiphany core considered OpenCL __local memory?
OpenCL address space qualifiers co-mingle the concepts of locality and visibility.  For the Epiphany architecture, the per-core memory is most accurately viewed as __private in locality and __local in visibility, with non-uniform access from any core using a common address space.

Where To Go For More Information

The Khronos website provides an excellent starting point for more information about the OpenCL API. The COPRTHR SDK can be downloaded as a pre-built package for Parallella or built from source available on github.  A Parallella Quick Start Guide is also available from the download page.

OpenCL™ is a trademark of Apple Inc. used by permission by the Khronos Group which develops and maintains the OpenCL standard.

Building on The Opportunity at Hand

Parallella is much more than simply a hardware project, and software that is able to harness the power of massively parallel systems is going to be crucial to achieving our goal of making parallel computing accessible to everyone. This includes everything from languages and frameworks that make such systems easier to program, through libraries and toolkits that provide the building blocks for powerful applications, to demonstrators and end-user applications that underline the opportunity.

Ecosystem Foundations

The Epiphany SDK and COPRTHR OpenCL implementation together provide the foundations of our software ecosystem: the compiler and debugger infrastructure, C library, and framework for creating heterogeneous applications that execute across ARM and Epiphany cores.

Only a limited number of prototype boards exist at the time of writing, but we have extensive hardware documentation, source code for the SDK that includes libraries, loader and Linux device driver, and a functional simulator for a single Epiphany core.

We are now in a position to start thinking in more detail about what the missing pieces of our software ecosystem might look like, and work has already been started in a number of key areas.

Work Under Way

Right now members of the community are either developing, or exploring how we might develop, support for:

  • Erlang
  • Python
  • Go
  • Forth
  • An Epiphany operating system
  • GNU Radio

There could easily be efforts that we’ve missed: it’s impossible to know what everyone in the community is working on, and some prefer not to share details until certain progress has been made. But suffice it to say it’s a really great start!

A Call to Action

If you’re working on something and in a position to do so I’d encourage you to share details via the forums, as this will help to further build momentum in the community and could lead to opportunities for collaboration. Similarly, if you have an idea for something that you’d like to work on but perhaps need some guidance, the forums are a great place to find that support.

The Kickstarter surveys helped us identify technologies which we need to make a priority, and those which as far as we know are not currently being worked on include:

  • Computer vision, e.g. OpenCV
  • Hadoop
  • BOINC (as used by SETI@home)
  • OpenMP
  • MPI

If you have experience with any of the above, we need your help!

Epiphany SDK Driver and Library Sources Published on GitHub

Today we completed the publication of the last remaining Epiphany SDK source tree repositories to the Adapteva GitHub account, including all the drivers and library files. The two repositories can be found at: http://github.com/adapteva

epiphany-sdk

This repo is the master archive that integrates the various Epiphany-SDK components into a release/installation tree. It contains the master eSDK build script (build-sdk.sh) and some non-binary and non-runtime critical components like the documentation and examples.

epiphany-libs

This repo contains all the Epiphany SDK source code (minus the Epiphany GNU compiler and binutils, which can be found in the Parallella GitHub repository):

1. The e-hal library (the Epiphany Hardware Abstraction Layer, or simply “driver”), which is required to transfer data from the host to the Epiphany and back;

2. The e-loader library used by a host application to load a program to the Epiphany accelerator;

3. The e-xml library, which is the parser of an .xml text file (“Hardware Description File” – HDF) containing information about the hardware platform;

4. The e-server, which is our implementation of a remote-serial-protocol (RSP) gdb debug server, used to bridge between the e-gdb frontend and the hardware. The gdb <-> server communications works over TCP/IP and enables remote debugging of an Epiphany program;

5. The e-lib library, which is the Epiphany run-time library;

6. The e-utils directory, which contains wrapper scripts for some GNU binutils;

7. The bsps directories, which contain the files specific to each system configuration – HDF, LDFs and a driver.
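For readers who have not met the remote serial protocol before, here is a minimal Python sketch of the packet framing that the standard GDB RSP uses on the wire: a payload is sent as $payload#cc, where cc is the modulo-256 sum of the payload bytes written as two hex digits. This is an illustration of the generic protocol only, not code taken from e-server; the helper names rsp_frame and rsp_unframe are hypothetical, and escaping and run-length encoding are left out.

```python
def rsp_frame(payload: str) -> bytes:
    """Wrap a payload as $<payload>#<two-hex-digit checksum>."""
    data = payload.encode("ascii")
    checksum = sum(data) % 256
    return b"$" + data + b"#" + format(checksum, "02x").encode("ascii")


def rsp_unframe(packet: bytes) -> str:
    """Validate the checksum of a framed packet and return its payload."""
    if not packet.startswith(b"$") or len(packet) < 4 or packet[-3:-2] != b"#":
        raise ValueError("not a framed RSP packet")
    data = packet[1:-3]
    if int(packet[-2:], 16) != sum(data) % 256:
        raise ValueError("checksum mismatch")
    return data.decode("ascii")


# "g" is the generic RSP request to read the general registers.
print(rsp_frame("g"))
# Round trip of a memory-read request ("m<addr>,<length>").
print(rsp_unframe(rsp_frame("m0,4")))
```

Because the framing is plain ASCII over TCP/IP, a debug server of this kind can sit on the board while the debugger front end runs on another machine.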

The epiphany-libs repo contains a master build script, called build-libs.sh that builds the above packages. Each buildable package is given as an Eclipse project that one can import to an Eclipse workspace for convenience.

All source repos are released under the GNU General Public License v3 (GPL) or Lesser GPL (LGPL). See the license files in each project for specific licensing details.

Note that the repos reflect the current, in-the-works status of these packages, and will be modified in the future. Also note that in order to obtain the complete eSDK, one needs to download the Epiphany GNU sources, currently posted at the Parallella GitHub page. In the near future, we will integrate these packages within one streamlined build package.

Enjoy!

Parallella Stackable Daughter Boards – Tabs

Introduction

We have a real opportunity to build a thriving expansion board ecosystem for Parallella, with the potential to learn from those that went before it and to advance the state of the art. For this reason I have spent some time thinking about how an expansion system can be built — not just for our applications — that helps resolve the issues highlighted and provides a platform on which a healthy daughter card ecosystem can flourish.

For this post I will be focusing on I/O expansion daughter cards (I will refer to these as Tabs) using just the FPGA PEC and Power PEC, as the Epiphany PECs are part of a very different expansion fabric.

What is required?

So let’s take a look at our fundamental requirements. Here are the basic features we require:

  1. Stackable Tabs: using the third dimension helps us maximise space usage given the relatively small dimensions available for each Tab.
  2. Cooperative Tabs: one should be able to mix and match several Tabs in any order to provide the application with the resources required to do the job.
  3. Standardised Tab interfaces: the Tabs themselves must draw on and use standard communication mediums, signals and blocks in order to draw from a common pool of available resources.
  4. Standard software abstraction: an internal HDL representation and software abstraction in order to interface to each quantum of the input/output fabric.

How do we do it?

To begin with, let’s make all Tabs stackable (1). That means the bottom of each Tab has 2 PEC HTH (PEC FPGA I/O and PEC Power) connectors and 2 PEC HTS connectors aligned on the top and bottom of the Tab PCB. This would allow boards to physically stack on top of each other and signals to be passed through from Tab to Tab in appropriate ways (more on this later).

In terms of signal I/O between the tabs and the real world, external connectors should appear on the east of the board so as not to interfere with Epiphany PEC expansion boards located to the west.

Tab Stack

Next let’s concern ourselves with Tab coexistence and cooperation (2). This means one should be able to mix different Tabs and even mix several of the same Tabs (for greater capacity) in a Parallella application.

An example

Let’s consider a Parallella Robot example to explore the issues we could encounter in a typical application using such Tabs.

In this example we may require a camera interface for vision, so let’s place a Camera Link Tab first at the bottom of the stack. Next we need to add a Motion Control Tab for propulsion and manipulation, and let’s assume open loop stepper drivers for this purpose. However, we have a problem: due to dimension limitations the Motion Control Tabs only support 4 stepper drivers per Tab and we need at least 8 for arms, manipulators and wheels. Thus we need to add 2 Motion Control Tabs. Finally, we need an assortment of inputs for ultrasonic distancing, feeler gauges, ADC measurements etc., so we add a breakout Tab.

We could decide, for example, to allocate certain groups of FPGA I/O pins for specific classes of board. This would enable us to mix a Camera Tab, Breakout Tab and even a Motion Tab safely without conflict. However, if there are multiple identical Tabs (and there should be, let’s not reinvent the wheel each time) then we will have clashes as the Tabs conflict on group pin assignment. Not only that, but we would also leave lots of unused pins and I/O bandwidth because we aren’t using the other classes of Tabs to which fixed I/O groups may be reserved.
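A small Python sketch shows both problems with fixed groups. Everything here is an assumption made for illustration: the three pin ranges, the class names and the helper check_fixed_allocation are hypothetical, and the only figure taken from the board is the count of 48 FPGA I/O pins.

```python
# Hypothetical fixed allocation: each Tab class owns a fixed block of the 48 FPGA I/O pins.
FIXED_GROUPS = {
    "camera": range(0, 16),
    "motion": range(16, 32),
    "breakout": range(32, 48),
}


def check_fixed_allocation(stack):
    """Return (clashing_pins, unused_pins) for a stack under the fixed-group scheme."""
    claimed = set()
    clashes = set()
    for tab_class in stack:
        pins = set(FIXED_GROUPS[tab_class])
        clashes |= claimed & pins   # pins some earlier Tab already drives
        claimed |= pins
    unused = set(range(48)) - claimed
    return clashes, unused


# The robot stack: two identical Motion Tabs fight over the same group.
clashes, unused = check_fixed_allocation(["camera", "motion", "motion", "breakout"])
print(len(clashes), "clashing pins,", len(unused), "unused pins")

# A motion-only stack: still clashing, and the camera and breakout groups sit idle.
clashes, unused = check_fixed_allocation(["motion", "motion"])
print(len(clashes), "clashing pins,", len(unused), "unused pins")
```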

Introducing Ziports

What would be a better solution here is to start with a pool of standard I/O resources and pick them off as we require them, Tab by Tab. So how can we realise such a design and standard use of limited resources given the available connectors on the Parallella board? I’d suggest we use the 48 FPGA I/O lines available as 24 LVDS pairs (actually slightly fewer, as we will likely require a few clocks/strobes to be reserved). These pairs form the quanta and signal path of the standard interface which I shall refer to as Ziports.

A given Ziport has two selectable and compatible serialiser/deserialiser endpoints: one implemented in HDL on the Zynq, and the other using either an LVDS Ser/Des chip or FPGA on the Tab. Each Ziport once claimed and configured has a fixed direction (input or output) and bit width. The width is fixed when the Ziport is instantiated (it does not change at runtime) and is set by the Tab and its driver modules and/or definition. This then provides us with standard quanta in which to divide Tab I/O. A Tab therefore will declare how many Ziports it uses and these are fixed.

Tab stacking

Next we’ll take a look at how we could divide and connect Ziports up in a way that enables the Tabs to be stacked without conflict. I would suggest that we use a technique of shifting the free Ziports as we move up the Tab stack. Here is how that could work physically:

A given Tab uses X number of Ziports from the lowest number pins (think of these as numbered slots containing an LVDS pair) on the PEC connector, then it shifts the remaining free Ziports right onto the lowest Ziport slots (on the next Tab connector), ready for the Tabs above to consume. Each Tab consumes the Ziports it requires from the first (lowest) slots on the PEC connector and in turn shifts the remaining Ziports onto the lowest slots for its next daughter Tab. Think of the interconnects between Tabs as parallel sets of Ziport “escalators” between Tabs, carrying the unused Ziports to the first Ziport slots/pins.

Tab Stack Ziport Shift
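The shifting rule can be modelled in a few lines of Python. This is only a sketch of the bookkeeping, assuming all 24 pairs are available (no clocks/strobes reserved); the function name allocate_ziports and the per-Tab Ziport counts in the robot stack are hypothetical.

```python
TOTAL_ZIPORTS = 24  # 48 FPGA I/O lines as 24 LVDS pairs, before reserving clocks/strobes


def allocate_ziports(stack, total=TOTAL_ZIPORTS):
    """Walk up the stack; each Tab takes the lowest slots it sees, the rest shift down.

    `stack` is a list of (tab_name, ziports_needed) pairs, bottom of the stack first.
    Returns (allocation, spare), where allocation is a list of
    (tab_name, board-level Ziport numbers) and spare is what is left at the top.
    """
    free = list(range(total))  # board-level Ziports still unclaimed, in slot order
    allocation = []
    for name, needed in stack:
        if needed > len(free):
            raise ValueError(f"{name} needs {needed} Ziports, only {len(free)} left")
        # The Tab always wires to its own local slots 0..needed-1 ...
        allocation.append((name, free[:needed]))
        # ... and passes the remainder up, re-seated on the lowest slots.
        free = free[needed:]
    return allocation, free


# Hypothetical Ziport budgets for the robot example.
robot_stack = [("camera", 6), ("motion", 4), ("motion", 4), ("breakout", 5)]
allocation, spare = allocate_ziports(robot_stack)
for name, ports in allocation:
    print(name, ports)
print("spare Ziports:", spare)
```

Note that both Motion Tabs are the same design: each one uses its own lowest four slots, yet they end up on different board-level Ziports purely because of where they sit in the stack.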

I won’t go into detail about how we could implement the Ziports themselves in this post but will provide an overview of their basic operation and features.

The physical signal across the PEC connectors is LVDS with Ser/Des at each end configured to the required bit width. The output serialiser will have a minimum of two 32 bit registers at the processing end (inside the Zynq FPGA fabric): one for the loadable value of the bit pattern and another which can serialise that bit pattern over LVDS. The deserialiser inputs within the fabric will again have at least two registers, for the received value and a pattern register which can fire events on matching bit patterns during reception. It might later be possible to create simple serial state machine logic as well as the serialiser within the output Ziports, to create all manner of dynamic output bit patterns (timers, waveform generators, tables etc.) without requiring extra core cycles to operate them once they are set up. However, I think we would need to keep things fairly simple to begin with, using basic 32 bit register serialisers in order to maintain flexibility, developing more sophisticated state machines as we progress.
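As a rough behavioural model of that register arrangement, here is a Python sketch of a 32 bit serialiser feeding a deserialiser with a pattern register. It is a software toy, not HDL, and several details are assumptions of mine: the bit order (most significant bit first), matching only on word boundaries, and the names serialise and Deserialiser.

```python
WIDTH = 32


def serialise(value, width=WIDTH):
    """Load register -> bit stream, most significant bit first (bit order is an assumption)."""
    return [(value >> i) & 1 for i in reversed(range(width))]


class Deserialiser:
    """Shift register plus a pattern register that records an event on a match."""

    def __init__(self, pattern, width=WIDTH):
        self.width = width
        self.mask = (1 << width) - 1
        self.pattern = pattern & self.mask  # pattern register
        self.value = 0                      # received-value register
        self.bits_seen = 0
        self.events = []                    # bit counts at which the pattern matched

    def clock_in(self, bit):
        """Shift one received bit in; compare against the pattern on word boundaries."""
        self.value = ((self.value << 1) | (bit & 1)) & self.mask
        self.bits_seen += 1
        if self.bits_seen % self.width == 0 and self.value == self.pattern:
            self.events.append(self.bits_seen)


rx = Deserialiser(pattern=0xCAFEF00D)
for word in (0x12345678, 0xCAFEF00D):
    for bit in serialise(word):
        rx.clock_in(bit)
print(hex(rx.value), rx.events)
```

In this model the eCore would only need to act when an entry appears in events, which is the sense in which a pattern register can save core cycles.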

We will also need a way to specify Tab configuration, i.e. which Tabs use which Ziports in order to configure and communicate with them at the eCore process end. For this I would suggest we use some sort of file describing the configuration, as this would likely be static.
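To give a feel for what such a file might contain, here is a Python sketch that reads a stack description and numbers the Ziports from the bottom Tab upwards. The JSON layout, the field names (tabs, name, ziports, dir, width), the example widths and the function load_tab_config are all hypothetical; no format has been agreed.

```python
import json

# Hypothetical stack description, bottom Tab first.
CONFIG_TEXT = """
{
  "tabs": [
    {"name": "camera",   "ziports": [{"dir": "in",  "width": 32}, {"dir": "in",  "width": 32}]},
    {"name": "motion-1", "ziports": [{"dir": "out", "width": 8}]},
    {"name": "breakout", "ziports": [{"dir": "in",  "width": 16}, {"dir": "out", "width": 16}]}
  ]
}
"""


def load_tab_config(text, total_ziports=24):
    """Parse the stack description and return rows of (ziport, tab, direction, width)."""
    table = []
    for tab in json.loads(text)["tabs"]:
        for port in tab["ziports"]:
            if port["dir"] not in ("in", "out"):
                raise ValueError(f"{tab['name']}: bad direction {port['dir']!r}")
            if not 1 <= port["width"] <= 32:
                raise ValueError(f"{tab['name']}: bad width {port['width']}")
            # Ziports are numbered in stack order, matching the shifting rule.
            table.append((len(table), tab["name"], port["dir"], port["width"]))
    if len(table) > total_ziports:
        raise ValueError(f"stack needs {len(table)} Ziports, only {total_ziports} exist")
    return table


for row in load_tab_config(CONFIG_TEXT):
    print(row)
```

Since the direction and width of each Ziport are fixed once configured, a static file like this would be enough for the software side to know how to talk to every Tab in the stack.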

Basic Ziport operation

Output case

eCore -> Statemachine -> Serialiser -> LVDS -> Deserialiser -> Output

Input case

Input -> Serialiser -> LVDS -> Deserialiser -> Patternmatcher -> eCore

In both cases “Patternmatcher” and “Statemachine” are options. We may also consider DMA source points from Ziport inputs as an additional option for dumping larger IO sequences or frames.

The basic Tab design rule is: consume your required Ziports from the right most slots, shift all remaining unused Ziports into the right most slots (think align right for Ziports).

Custom I/O and HDL Tabs

Obviously there is a Ziport configuration default in order to maintain compatibility between Tabs, but there is nothing preventing custom HDL definition and use of FPGA pins in other ways as long as it follows the slot shifting rules and doesn’t prevent the remaining unused ports from operating in the standard Ziport manner (i.e. leaving enough resources). That way it allows other Tabs to be mixed with them, cooperating in the Tab ecosystem.

Secondary ports

When Ziports are used by Tabs and the remaining are shifted to lower port slots, spaces are effectively left in the connectors. I can envisage cases where these port spaces may be recycled as a kind of secondary Ziport that is used for inter-tab communication.

Such communications may be used where there is no requirement for eCore processing. This could be pipeline, filtering or hard DSP-like applications, although we would need to explore how this sort of feature could work and be configured.

Conclusion

Using a solution based around Ziports with Tab based shifting, virtual port slots and inter-tab connectors would enable us to create a wealth of cooperative Tabs that could be mixed and matched seamlessly by simply budgeting for the required Ziports. This would work not only for different classes of Tabs but also for multiple Tabs of the same design and class.

Such an approach would support an ecosystem of expansion boards without having to reinvent the wheel each time, providing a catalogue of functionality for diverse applications. It would enable rapid application development based around Parallella boards, using skills brought to bear via compatible hardware modules created by the community.

Get involved!

I’m keen to hear your thoughts and have created a forum thread for discussing this further, so that we can work towards a specification and eventual implementation.
