We have a real opportunity to build a thriving expansion board ecosystem for Parallella, with the potential to learn from the systems that came before it and to advance the state of the art. For this reason I have spent some time thinking about how an expansion system could be built, not just for our own applications, that helps resolve the issues highlighted and provides a platform on which a healthy daughter card ecosystem can flourish.
For this post I will be focusing on I/O expansion daughter cards (I will refer to these as Tabs) using just the FPGA PEC and Power PEC, as the Epiphany PECs are part of a very different expansion fabric.
What is required?
So let’s take a look at our fundamental requirements. Here are the basic features we need:
- Stackable Tabs: stacking into the third dimension makes the most of the relatively small footprint available to each Tab.
- Cooperative Tabs: one should be able to mix and match several Tabs in any order to provide the application with the resources required to do the job.
- Standardised Tab interfaces: Tabs must use standard communication media, signals and blocks so that they can all draw from a common pool of available resources.
- Standard software abstraction: an internal HDL representation and software abstraction for interfacing to each quantum of the input/output fabric.
How do we do it?
To begin with, let’s make all Tabs stackable, our first requirement. That means each Tab carries 2 PEC HTH connectors (PEC FPGA I/O and PEC Power) on the bottom and 2 matching PEC HTS connectors on the top, aligned with each other across the Tab PCB. This allows boards to stack physically on top of each other and lets signals pass from Tab to Tab in appropriate ways (more on this later).
For signal I/O between the Tabs and the real world, external connectors should sit on the east side of the board so as not to interfere with Epiphany PEC expansion boards located to the west.
Next, let’s concern ourselves with Tab coexistence and cooperation, our second requirement. One should be able to mix different Tabs, and even several identical Tabs (for greater capacity), in a Parallella application.
Let’s consider a Parallella Robot example to explore the issues we could encounter in a typical application using such Tabs.
In this example we may require a camera interface for vision, so let’s place a Camera Link Tab at the bottom of the stack. Next we need a Motion Control Tab for propulsion and manipulation; let’s assume open-loop stepper drivers for this purpose. However, we have a problem: due to dimension limitations each Motion Control Tab only supports 4 stepper drivers, and we need at least 8 for arms, manipulators and wheels. Thus we need to add 2 Motion Control Tabs. Finally, we need an assortment of inputs for ultrasonic ranging, feeler gauges, ADC measurements etc., so we add a Breakout Tab.
We could decide, for example, to allocate certain groups of FPGA I/O pins to specific classes of board. This would let us mix a Camera Tab, a Breakout Tab and even a Motion Tab safely without conflict. However, if there are multiple identical Tabs (and there should be; let’s not reinvent the wheel each time), they will clash, because each copy claims the same pin group. Not only that, but we would also leave many pins and much I/O bandwidth unused, because the groups reserved for classes of Tab not present in the stack would sit idle.
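To make the clash concrete, here is a small Python sketch of fixed, class-based pin groups applied to the robot stack. The group sizes, class names and the `allocate_fixed` helper are hypothetical, chosen only to illustrate the two failure modes: identical Tabs fighting over one group, and groups left idle when their class is absent.

```python
# Hypothetical fixed allocation: each class of Tab owns one group of LVDS pairs.
CLASS_GROUPS = {
    "camera": range(0, 8),
    "motion": range(8, 16),
    "breakout": range(16, 24),
}


def allocate_fixed(stack):
    """Give each Tab its class's fixed pair group; report clashes and idle pairs."""
    owner = {}    # pair index -> name of the Tab that claimed it
    clashes = []  # (Tab that lost, Tab already holding the group)
    for name, tab_class in stack:
        group = CLASS_GROUPS[tab_class]
        taken = [p for p in group if p in owner]
        if taken:
            clashes.append((name, owner[taken[0]]))
            continue
        for pair in group:
            owner[pair] = name
    idle = [p for g in CLASS_GROUPS.values() for p in g if p not in owner]
    return clashes, idle


# The robot stack: two identical Motion Control Tabs compete for one group.
robot = [("camera_link", "camera"), ("motion_a", "motion"),
         ("motion_b", "motion"), ("breakout", "breakout")]
print(allocate_fixed(robot))
# ([('motion_b', 'motion_a')], [])

# Without a Motion Tab, the whole motion group sits idle.
print(allocate_fixed([("camera_link", "camera"), ("breakout", "breakout")]))
# ([], [8, 9, 10, 11, 12, 13, 14, 15])
```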
A better solution is to start with a pool of standard I/O resources and pick them off as we need them, Tab by Tab. So how can we realise such a design, and a standard use of limited resources, given the available connectors on the Parallella board? I’d suggest we use the 48 available FPGA I/O lines as 24 LVDS pairs (in practice slightly fewer, as a few pairs will likely need to be reserved for clocks/strobes). These pairs form the quanta and signal paths of the standard interface, which I shall refer to as Ziports.
A given Ziport has two compatible serialiser/deserialiser endpoints: one implemented in HDL on the Zynq, and the other on the Tab, using either an LVDS Ser/Des chip or an FPGA. Once claimed and configured, each Ziport has a fixed direction (input or output) and bit width. The width is set when the Ziport is instantiated, is determined by the Tab and its driver modules and/or definition, and does not change at runtime. This gives us standard quanta into which Tab I/O can be divided: a Tab declares how many Ziports it uses, and that number is fixed.
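The budget arithmetic is simple enough to sketch. In this hypothetical Python example, the number of reserved clock/strobe pairs (2) and the per-Tab Ziport counts for the robot stack are assumptions for illustration, not figures from the design.

```python
FPGA_IO_LINES = 48   # from the text: 48 FPGA I/O lines on the PEC
LINES_PER_PAIR = 2   # one LVDS pair uses 2 lines
RESERVED_PAIRS = 2   # assumption: pairs held back for clocks/strobes


def ziport_budget(io_lines=FPGA_IO_LINES, reserved=RESERVED_PAIRS):
    """Ziports left for Tabs after reserving clock/strobe pairs."""
    return io_lines // LINES_PER_PAIR - reserved


def fits(demands, budget):
    """Compare a stack's total Ziport demand with the budget; return (ok, spare)."""
    used = sum(demands.values())
    return used <= budget, budget - used


# Hypothetical Ziport counts per Tab for the robot example.
robot = {"camera_link": 6, "motion_a": 4, "motion_b": 4, "breakout": 6}
budget = ziport_budget()
print(budget)               # 22
print(fits(robot, budget))  # (True, 2)
```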
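One way a driver or configuration tool might capture these rules is sketched below: an immutable (frozen) descriptor per Ziport, and a Tab that declares a fixed tuple of them. The class names, the 1 to 32 bit width range (chosen to fit 32-bit registers) and the one-Ziport-per-stepper layout are assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INPUT = "in"    # Tab -> Zynq fabric
    OUTPUT = "out"  # Zynq fabric -> Tab


@dataclass(frozen=True)
class Ziport:
    """A claimed Ziport: direction and width are fixed once instantiated."""
    direction: Direction
    width: int  # bits per transfer; assumed 1..32 to fit 32-bit registers

    def __post_init__(self):
        if not 1 <= self.width <= 32:
            raise ValueError(f"Ziport width must be 1..32, got {self.width}")


@dataclass(frozen=True)
class TabDescriptor:
    """What a Tab declares up front: a fixed, ordered set of Ziports."""
    name: str
    ziports: tuple[Ziport, ...]

    @property
    def count(self) -> int:
        return len(self.ziports)


# Hypothetical Motion Control Tab: one 8-bit output Ziport per stepper driver.
motion = TabDescriptor("motion_control",
                       tuple(Ziport(Direction.OUTPUT, 8) for _ in range(4)))
print(motion.count)  # 4
```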
Next we’ll look at how Ziports could be divided up and connected so that Tabs can be stacked without conflict. I would suggest shifting the free Ziports as we move up the Tab stack. Here is how that could work physically:
A given Tab uses X Ziports from the lowest-numbered slots on the PEC connector (think of these as numbered slots, each containing an LVDS pair), then shifts the remaining free Ziports right onto the lowest slots of its upper connector, ready for the Tabs above to consume. Every Tab in the stack repeats this: it consumes the Ziports it requires from the first (lowest) slots and shifts the rest onto the lowest slots for the next Tab. Think of the interconnects between Tabs as parallel sets of Ziport “escalators”, carrying the unused Ziports up to the first slots/pins of the next connector.
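The escalator scheme can be modelled in a few lines of Python. This sketch assumes slot 0 is the lowest (rightmost) slot, that pass-through order is preserved, and that the robot stack uses the hypothetical Ziport counts shown; `shift_up` and `resolve_stack` are illustrative names. The key result is that the two identical Motion Control Tabs each use their own local slots 0 to 3, yet land on different base Ziports.

```python
def shift_up(slots, consumed):
    """One Tab: take the lowest `consumed` slots, shift the rest down for the Tab above.

    slots[i] is the base-connector Ziport carried in local slot i (None if empty).
    Returns (claimed, upper_slots).
    """
    if consumed > sum(s is not None for s in slots):
        raise ValueError("not enough free Ziports for this Tab")
    claimed = slots[:consumed]
    upper = slots[consumed:] + [None] * consumed  # vacated top slots stay empty
    return claimed, upper


def resolve_stack(total, demands):
    """Map each Tab (bottom first) to the base Ziport numbers it ends up using."""
    slots = list(range(total))  # base connector: slot n carries Ziport n
    result = []
    for name, count in demands:
        claimed, slots = shift_up(slots, count)
        result.append((name, claimed))
    return result, slots


robot = [("camera_link", 6), ("motion_a", 4), ("motion_b", 4), ("breakout", 6)]
stack, top = resolve_stack(22, robot)
for name, claimed in stack:
    print(name, claimed)
# camera_link [0, 1, 2, 3, 4, 5]
# motion_a [6, 7, 8, 9]
# motion_b [10, 11, 12, 13]
# breakout [14, 15, 16, 17, 18, 19]
print(top[:2])  # [20, 21] still free for further Tabs
```

Because every Tab sees the same local view (its Ziports always start at slot 0), a single Tab design can appear any number of times in a stack, provided the overall budget holds. The `None` entries at the top of each connector are the vacated slots.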
I won’t go into detail in this post about how the Ziports themselves could be implemented, but here is an overview of their basic operation and features.
The physical signal across the PEC connectors is LVDS, with a Ser/Des at each end configured to the required bit width. The output serialiser will have at least two 32-bit registers at the processing end (inside the Zynq FPGA fabric): one holding the loadable bit pattern and another that serialises that pattern over LVDS. The deserialiser inputs within the fabric will likewise have at least two registers: one for the received value and a pattern register that can fire events when a matching bit pattern is received. It might later be possible to add simple serial state machine logic alongside the serialiser in the output Ziports, to generate all manner of dynamic output bit patterns (timers, waveform generators, tables etc.) without requiring extra core cycles once they are set up. However, I think we should keep things fairly simple to begin with, using basic 32-bit register serialisers to maintain flexibility, and develop more sophisticated state machines as we progress.
We will also need a way to specify the Tab configuration, i.e. which Tabs use which Ziports, so that the eCore processes can configure and communicate with them. For this I would suggest some sort of configuration file, as the configuration is likely to be static.
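A behavioural Python model of the two-register arrangement may help pin down the intent. It is a sketch, not HDL: the LSB-first bit order, the whole-word pattern comparison (a real deserialiser might instead match on a sliding window as bits arrive) and the `on_match` callback standing in for an eCore event are all assumptions.

```python
class OutputSerialiser:
    """Output Ziport sketch: a load register (eCore-written) and a shift register."""

    def __init__(self, width):
        if not 1 <= width <= 32:
            raise ValueError("width must be 1..32")
        self.width = width  # fixed at instantiation
        self.load = 0       # value last written by the eCore
        self.shift = 0      # copy being clocked out over LVDS

    def write(self, value):
        self.load = value & ((1 << self.width) - 1)

    def transfer(self):
        """Copy load -> shift, then clock the bits out LSB first (assumed order)."""
        self.shift = self.load
        bits = []
        for _ in range(self.width):
            bits.append(self.shift & 1)
            self.shift >>= 1
        return bits


class InputDeserialiser:
    """Input Ziport sketch: a received-value register and a pattern register."""

    def __init__(self, width, pattern, on_match):
        if not 1 <= width <= 32:
            raise ValueError("width must be 1..32")
        self.width = width
        self.received = 0
        self.pattern = pattern & ((1 << width) - 1)
        self.on_match = on_match  # event hook, e.g. signal an eCore

    def receive(self, bits):
        """Assemble one word (LSB first) and fire the event if it matches."""
        word = 0
        for i, bit in enumerate(bits[:self.width]):
            word |= (bit & 1) << i
        self.received = word
        if word == self.pattern:
            self.on_match(word)
        return word


events = []
tx = OutputSerialiser(width=8)
rx = InputDeserialiser(width=8, pattern=0xA5, on_match=events.append)
tx.write(0xA5)
rx.receive(tx.transfer())  # loop the "LVDS" link straight back for the demo
print(hex(rx.received), events)  # 0xa5 [165]
```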
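As an illustration of what such a file might contain, here is a hypothetical JSON layout listing Tabs bottom to top with their declared Ziports, plus a Python loader that assigns base Ziport numbers in stack order, mirroring the lowest-slots-first shifting. The format, field names and the `load_config` helper are invented for this sketch.

```python
import json

# Hypothetical static configuration: Tabs bottom to top, each with the
# [direction, width] of every Ziport it declares.
CONFIG = """
{
  "reserved_pairs": 2,
  "tabs": [
    {"name": "camera_link", "ziports": [["in", 32], ["in", 32], ["out", 8]]},
    {"name": "motion_a", "ziports": [["out", 8], ["out", 8], ["out", 8], ["out", 8]]},
    {"name": "motion_b", "ziports": [["out", 8], ["out", 8], ["out", 8], ["out", 8]]}
  ]
}
"""


def load_config(text, io_lines=48):
    """Parse the config and assign base Ziport numbers bottom-up, lowest first."""
    cfg = json.loads(text)
    budget = io_lines // 2 - cfg["reserved_pairs"]
    next_free = 0
    table = {}
    for tab in cfg["tabs"]:
        ports = []
        for direction, width in tab["ziports"]:
            if direction not in ("in", "out") or not 1 <= width <= 32:
                raise ValueError(f"bad Ziport in {tab['name']}: {direction}, {width}")
            ports.append((next_free, direction, width))
            next_free += 1
        table[tab["name"]] = ports
    if next_free > budget:
        raise ValueError(f"stack needs {next_free} Ziports, only {budget} available")
    return table


table = load_config(CONFIG)
print(table["motion_b"][0])  # (7, 'out', 8)
```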
Basic Ziport operation
Output Ziport: eCore -> Statemachine -> Serialiser -> LVDS -> Deserialiser -> Output
Input Ziport: Input -> Serialiser -> LVDS -> Deserialiser -> Patternmatcher -> eCore
The “Statemachine” (output) and “Patternmatcher” (input) stages are optional. We could also offer Ziport inputs as DMA sources, as a further option for dumping larger I/O sequences or frames.
The basic Tab design rule is: consume your required Ziports from the rightmost (lowest-numbered) slots, then shift all remaining unused Ziports into the rightmost slots of the next connector (think “align right” for Ziports).
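The optional stages can be sketched as composable Python generators. Here the Ser/LVDS/Des hop is treated as a lossless pass-through, `table_player` is a hypothetical Statemachine that replays a table written once by the eCore, and the Patternmatcher is modelled as deciding which words the eCore is notified about; all of these are assumptions for illustration.

```python
from typing import Callable, Iterable, Iterator, Optional


def output_path(words: Iterable[int],
                statemachine: Optional[Callable[[list], Iterator[int]]] = None) -> Iterator[int]:
    """eCore -> [Statemachine] -> Serialiser -> LVDS -> Deserialiser -> Output."""
    stream = statemachine(list(words)) if statemachine else iter(words)
    yield from stream  # the Ser/LVDS/Des hop is modelled as a pass-through


def input_path(words: Iterable[int],
               patternmatcher: Optional[Callable[[int], bool]] = None) -> Iterator[int]:
    """Input -> Serialiser -> LVDS -> Deserialiser -> [Patternmatcher] -> eCore.

    Without a Patternmatcher the eCore sees every word; with one it is only
    notified of matching words.
    """
    for word in words:
        if patternmatcher is None or patternmatcher(word):
            yield word


def table_player(repeats: int):
    """Hypothetical Statemachine: replay a table written once by the eCore."""
    def run(table: list) -> Iterator[int]:
        for _ in range(repeats):
            yield from table
    return run


print(list(output_path([0x1, 0x3, 0x7], table_player(2))))  # [1, 3, 7, 1, 3, 7]
print(list(input_path([0x10, 0xA5, 0x22, 0xA5], lambda w: w == 0xA5)))  # [165, 165]
```

The point of the Statemachine stage is visible in the first call: the eCore supplies three words once, and the output keeps running without further core involvement.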
Custom I/O and HDL Tabs
Obviously there is a default Ziport configuration in order to maintain compatibility between Tabs, but nothing prevents custom HDL definitions and other uses of the FPGA pins, as long as they follow the slot-shifting rules and don’t prevent the remaining unused ports from operating in the standard Ziport manner (i.e. they leave enough resources). That way other Tabs can still be mixed with custom ones, cooperating in the Tab ecosystem.
When Tabs consume Ziports and shift the remainder down to the lower slots, the vacated highest-numbered slots on each Tab’s upper connector are effectively left empty. I can envisage cases where these empty slots could be recycled as a kind of secondary Ziport used for inter-Tab communication.
Such communication could be used where no eCore processing is required, for example in pipelining, filtering or hard DSP-like applications, although we would need to explore how this sort of feature could work and be configured.
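A simple design-rule check could help Tab designers, custom or not, confirm that they honour the shifting rule. The sketch below is hypothetical: it represents a Tab by the lower-connector slots it uses and a pass-through list for its upper connector, assumes pass-through order must be preserved, and treats “enough resources” as a minimum count of free Ziports, which is only one aspect of what is meant here.

```python
def check_custom_tab(n_slots, consumed, passthrough, min_free=0):
    """Check a Tab's slot routing against the shifting rule; return a list of problems.

    consumed:    lower-connector slots the Tab uses (Ziports or custom HDL I/O)
    passthrough: passthrough[i] is the lower slot routed to upper slot i
    min_free:    Ziports the Tabs above are expected to need (assumed budget)
    """
    problems = []
    k = len(consumed)
    if sorted(consumed) != list(range(k)):
        problems.append("must use the lowest-numbered slots")
    if passthrough != list(range(k, n_slots)):
        problems.append("must pass the rest, in order, to the lowest upper slots")
    if n_slots - k < min_free:
        problems.append(f"leaves {n_slots - k} Ziports but {min_free} are needed above")
    return problems


# A custom Tab driving its 4 slots as raw GPIO, routed correctly.
print(check_custom_tab(22, [0, 1, 2, 3], list(range(4, 22)), min_free=14))  # []

# The same Tab with two pass-through slots swapped on its upper connector.
swapped = [5, 4] + list(range(6, 22))
print(check_custom_tab(22, [0, 1, 2, 3], swapped))
# ['must pass the rest, in order, to the lowest upper slots']
```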
Using a solution based around Ziports with Tab-based shifting, virtual port slots and inter-Tab connectors would enable us to create a wealth of cooperative Tabs that could be mixed and matched seamlessly, simply by budgeting for the required Ziports. This would work not only for different classes of Tabs but also for multiple Tabs of the same design and class.
Such an approach would support an ecosystem of expansion boards without having to reinvent the wheel each time, providing a catalogue of functionality for diverse applications and enabling rapid application development on Parallella boards, drawing on the skills the community brings to bear through the compatible hardware modules it creates.
I’m keen to hear your thoughts, and have created a forum thread to discuss this further so that we can work towards a specification and, eventually, an implementation.