by over9000 » Mon Oct 14, 2013 8:29 pm
I've lots of ideas for things I'd like to try on the Parallella, but this is the one that I'm currently most interested in.
I want some libraries to be able to do arbitrary precision arithmetic on integers, with a focus on RSA-type algorithms (finding large probable primes and doing power operations mod some large numbers).
All of the bignum libraries I've found are either too complicated (gmp) or too large to fit into the Epiphany's memory space without having some sort of dynamic code loading/overlay system so I'll write or port some simpler/smaller routines that should fit in 2Kbytes of program space or less without having to resort to overlays.
I'm also thinking of ways to exploit parallelism when doing calculations with bignums. Initially, I'll start by writing a simple grammar for a calculator and then figure out which calculations can be done in parallel. I'll output a form of 3-address code (or some other bytecode) that encapsulates the execution plan for the overall calculation. Instructions will include explicit synchronisation operations so that nodes wait for dependent operations to complete before operating on them.
The 3-address code will assume an abstract machine with multiple computation nodes, each of which is modelled as a bunch of registers (carved out of the local memory space) + ALU + synchronisation primitives. So in effect, each node is basically like a simple CPU for working with bignums, although it will also be able to address registers in other nodes as the target of an operation (eg, for a=b * c, both b and c must be local registers, while a, the result, can be remote).
It should be fairly simple to map the abstract machine and bytecode interpreter onto the Epiphany. I'll make things even simpler by doing all of the bytecode interpreting on the host side and implement simpler command queues on the Epiphany side, so it's more like the host is the (simulated) CPU and the Epiphany cores are ALUs.
I intend for this to be portable. Once I've decided on the details of the bytecode and synchronisation primitives, I want to be able to run it on a variety of platforms including multi-threaded, SMP systems (a quad-core x86_64 box), the PS3's Cell processor (which, like Parallella, has a host "PPU" and several "SPU" coprocessors), the Parallella platform itself and an OpenMPI-based virtual machine (with rank 0 host being the "CPU"). I might try to tweak the code generator so that it can take into account the different costs associated with each operation (eg, add, mul, div and copy) on each of the target platforms and try to generate an execution plan (ie, the bytecode) that is best for that platform.
If I manage to get all that working, I'll look at extending the grammar to include conditional statements and loops. I want the language to evolve to the point where it's able to express some key algorithms (finding probable primes, primality testing, gcd and extended gcd) within the language itself. I might also look into doing speculative execution, eg, following both sides of a conditional statement or executing several iterations of a loop in parallel (see "foreach" thread).
Besides the use I have for an RSA-type calculator (actually, I want it for one-way RSA accumulators), I'm mainly doing this as a learning exercise. Also, I have a bunch of machines of varying power (mainly Pis and other ARM-based machines, but also a PS3 and an old, but usable x86_64 machine as my main PC), so I'd like to come up with an application that can get them doing some useful work.