by optimaler » Mon Aug 19, 2013 7:54 pm
I'd also like to hear more about the eLink cables, since I presume that the mini-cluster and full-cluster backers are going to be getting a few of these (we are getting those, right?). I'm mostly concerned with designing MPI-type software around the architecture eLink provides, to get the best performance EVAR. I want my cluster to scream as loud as it can.
Here are some questions I have right now:
1) Let's say I have eight Parallella boards and eight eLink cables to go with them. A ring topology for message passing is the obvious option here. Is there a penalty for communicating with boards more than one hop away, or does eLink have some kind of fast pass-through mechanism that skips any processing? This kind of topology has caused me grief trying to get Xeon Phis to play nicely with some of my more serious code.
2) Is a tree topology possible with the eLink cables, or is there some kind of hardware limitation that prevents a board from having more than two connections? (A better question: do these connect to the GPIO pins? That would obviously limit the number of hardware connections.)
3) How long are the eLink cables? To achieve any of the above-mentioned topologies, a cable needs to be able to span at least two boards stacked on the standoff legs.
4) Related to prodsn's question, what is actually communicating via eLink cables? Parallella only, or will the ARM procs be able to communicate directly over eLink as well? (I actually don't really care that much, although it would be a nice perk).
5) Semi-related question: when the Adapteva crew put together the 42-board cluster, did you do any benchmarks on Ethernet saturation? I imagine 2-4 boards wouldn't be able to saturate a 1 Gb switch, but it might be a concern for more than that (say, eight boards -wink-).*
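To put numbers on question 1, here is a rough sketch of the hop counts in an eight-board ring. It assumes the ring is bidirectional, so a message can take the shorter way around; whether eLink actually works that way is part of what I'm asking. The function name `ring_hops` is just something I made up for illustration.

```python
def ring_hops(src, dst, n_boards):
    """Fewest hops between two boards, assuming a bidirectional ring."""
    d = (dst - src) % n_boards
    return min(d, n_boards - d)

N_BOARDS = 8  # the eight-board case from question 1

# Hop count from board 0 to every board in the ring.
hops_from_0 = [ring_hops(0, b, N_BOARDS) for b in range(N_BOARDS)]
worst = max(hops_from_0)
avg = sum(hops_from_0[1:]) / (N_BOARDS - 1)  # average over the other 7 boards

print(hops_from_0)  # [0, 1, 2, 3, 4, 3, 2, 1]
print(worst)        # 4
print(round(avg, 2))  # 2.29
```

So the farthest board is four hops away and the average is a bit over two, which is why the per-hop penalty matters so much for this topology.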
I eagerly await the response of our fearless designers.
*A follow-on to this thought: thinking of the different MPI message-passing styles (point-to-point vs. broadcast vs. all-to-all), did you test how the Parallella cluster performed in each case? I would expect all-to-all to saturate the Ethernet, but not necessarily a broadcast.
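The reasoning behind that expectation can be sketched by counting messages per operation. This is a naive model and not a benchmark: it assumes every message is a separate unicast through the switch, and it ignores message size and the tree-based or pipelined algorithms real MPI implementations use. `message_counts` is a made-up helper name.

```python
def message_counts(n_ranks):
    """Messages crossing the switch for one operation, in a naive unicast model."""
    return {
        "point_to_point": 1,                      # one sender, one receiver
        "broadcast": n_ranks - 1,                 # root reaches every other rank once
        "all_to_all": n_ranks * (n_ranks - 1),    # every rank sends to every other rank
    }

counts = message_counts(8)  # eight boards, one rank per board (an assumption)
print(counts)
# {'point_to_point': 1, 'broadcast': 7, 'all_to_all': 56}
```

Broadcast grows linearly with the number of boards while all-to-all grows quadratically, so at eight boards the gap is already 7 versus 56 messages per operation.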