Not related to multicore, but how do you divide one number by another on a RISC processor? How many ticks (clock cycles) does it take?
Looking at the documentation, I noticed there is no instruction to divide one register by another, and there is no divide-by-zero exception either. So what would a division algorithm look like? What is the fastest implementation, counted from the moment the operands are ready in registers until both the quotient and the remainder are available?

There are three obvious approaches: do it the way you learned at school (long division), using integers only; use the shift operator, since shifting right by one bit divides by two (this only works directly when the divisor is a power of two); or use floating-point numbers and multiply the dividend by a fraction, i.e. the reciprocal of the divisor. I guess the best would be to combine all three, depending on what the core is currently busy with.
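As a rough sketch of the first two approaches, here is schoolbook binary long division (restoring division) in C, using only shifts, compares and subtracts, with a shift-and-mask shortcut for power-of-two divisors. It assumes unsigned 32-bit operands; the function name `udiv32` and the "return -1 on zero divisor" convention are hypothetical choices for illustration, since without a divide-by-zero exception the caller has to check for it somehow. The loop always does 32 iterations, one quotient bit each, so its running time does not depend on the operand values.

```c
#include <stdint.h>

/* Hypothetical helper: unsigned 32-bit division without a divide
 * instruction. Writes quotient and remainder, returns 0 on success
 * or -1 if the divisor is zero (no hardware exception to rely on). */
static int udiv32(uint32_t n, uint32_t d, uint32_t *quot, uint32_t *rem)
{
    if (d == 0) {
        return -1;
    }

    /* Shortcut: dividing by 2^k is a right shift by k,
     * and the remainder is just the low k bits. */
    if ((d & (d - 1)) == 0) {
        int k = 0;
        while ((d >> k) != 1) {
            k++;
        }
        *quot = n >> k;
        *rem = n & (d - 1);
        return 0;
    }

    uint32_t q = 0;
    /* 64-bit partial remainder so (r << 1) cannot overflow
     * when the divisor is close to 2^32. */
    uint64_t r = 0;

    /* Long division, most significant bit first: bring down the
     * next dividend bit, subtract the divisor if it fits, and
     * record a 1 in that quotient position. */
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }

    *quot = q;
    *rem = (uint32_t)r;
    return 0;
}
```

For example, `udiv32(1000, 7, &q, &r)` gives q = 142 and r = 6, while `udiv32(1000, 8, &q, &r)` takes the shift path and gives q = 125 and r = 0.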
I'm choosing this example because maybe someone will have an idea of how to use another core for this simple task, or maybe someone knows of another method. With x86 assembler I always wondered why division seems to be done in constant time, and whether it would be possible to speed up the machine code with a different approach...
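One other method worth knowing, and the integer counterpart of the "multiply by a fraction" idea, applies when the divisor is known in advance: multiply by a precomputed fixed-point reciprocal and keep the high bits. This is the trick compilers commonly use for unsigned division by a constant. The sketch below assumes unsigned 32-bit operands and a divisor of exactly 10; the constant 0xCCCCCCCD is ceil(2^35 / 10) and is valid only for that case. It also assumes the core has a 32x32 to 64-bit multiply, which I have not checked against the documentation mentioned above. The names `udiv10` and `umod10` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: n / 10 for any 32-bit unsigned n, computed as
 * (n * ceil(2^35 / 10)) >> 35, so no divide instruction is needed. */
static uint32_t udiv10(uint32_t n)
{
    return (uint32_t)(((uint64_t)n * 0xCCCCCCCDu) >> 35);
}

/* Remainder recovered from the quotient: one multiply, one subtract. */
static uint32_t umod10(uint32_t n)
{
    return n - udiv10(n) * 10u;
}
```

This costs one multiply and one shift instead of a 32-step loop, but it only helps when the divisor is a compile-time constant (or reused often enough to justify computing its reciprocal once); a general divisor still needs something like long division.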