A few months back we started an effort called the “Parallel Architecture Library” (PAL). We already have a few external contributors but we are really looking for many more. PAL is an embarrassingly parallel problem so the more contributors we have the faster we can move!

I know none of you are motivated by things like free computers, but as a token of our appreciation for making parallel computing easier, Adapteva is sending out a free Parallella computer for every optimized function contributed to PAL.

**How does it work?**

- You submit a pull request for one optimized math function to the PAL git repository.
- If the pull request is accepted, we will contact you to send you a free Parallella board as thanks for contributing to the cause.

**What functions?**

- Initially we are looking for contributions to the basic math, DSP, and image processing functions found in the PAL overview. There are currently 61 such functions.

**Legal Stuff:**

- Your contributed code is still owned by you under the Apache license. There is no transfer of copyright involved here.
- This is not work for hire. It’s simply us saying thanks for contributing to making parallel programming easier for everyone and for being awesome!
- Export restrictions mean we can’t ship to certain countries.
- You are responsible for import duties if there are any.
- Don’t try to game the system. Adapteva can cancel this program at any time. :-)

**Guidelines:**

- Code should be written in vanilla C and compilable with gcc (no assembly!)
- These are leaf functions (no calling/linking to other libraries)
- Functions should be correct and sufficiently accurate.
- Minimize code size AND maximize performance 🙂
- At this time, functions do not have to be multi-threaded.
- Most of the PAL functions already have naive implementations; what is needed here is optimization.
- An example of an optimized function.
- UPDATE: No division operations [most DSPs and small micros don’t have hardware division circuits]
- The goal is 40% of peak performance as compared to theoretical peak or best in class commercial binary library (pick any architecture…)
- If you can’t meet the 40% performance target, write a note in your commit message stating why this is an unreasonable target.
- If you think there are functions missing from the library, please suggest them (or submit a PR for the function).
- For more guidelines, see the library documentation.
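
As one illustration of the no-division guideline, a division can sometimes be replaced by a reciprocal built from multiplies and subtractions only. The sketch below is not PAL code: the name `recip_sketch` and the restriction to inputs in [0.5, 1] are assumptions made for illustration. It uses a linear initial guess followed by three Newton-Raphson steps.

```c
/* Hypothetical helper, not part of PAL: approximates 1/a for a in [0.5, 1]
 * using only multiplies and subtractions (no division operator). */
float recip_sketch(float a)
{
    /* Linear initial guess 48/17 - (32/17)*a, with the constants precomputed. */
    float x = 2.8235294f - 1.8823529f * a;

    /* Each Newton-Raphson step x = x*(2 - a*x) roughly squares the
     * relative error of the previous estimate. */
    x = x * (2.0f - a * x);
    x = x * (2.0f - a * x);
    x = x * (2.0f - a * x);
    return x;
}
```

Inputs outside [0.5, 1] would need to be scaled into that range first, which this sketch leaves out.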

Let’s go!

Andreas

[…] Write a C function and get a free computer […]

Hmm, cloned the repo, optimized a few functions, tried to push .. failed. Used my github login (ldraconus) … did I miss a step, do something wrong …

… not a typical git user, so any help would be appreciated …

… these are trivial optimizations to prep for vectorizing the functions ….

Not a GitHub expert, but here is how I did it. I wanted to make a modification to p_max.c, so I navigated to this page: https://github.com/parallella/pal/blob/master/src/math/p_max.c

I clicked the pencil button (Edit the file in your fork of the project), and it opened the file. I edited there and then was able to submit a pull request through the website.

What about the .

For example, in p_ftoi.c, we need to convert floats to integers. But floats are quite possibly larger than the maximum integer (or smaller than the minimum integer). Should we ignore any errors and go for pure performance, or include asserts to catch the standard edge cases?

Aolofsson replied to this question in https://github.com/parallella/pal/pull/73

He said: “No error checking to keep codesize minimal.”
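
To make the trade-off in this exchange concrete, here is a minimal sketch contrasting an unchecked conversion with a clamped one. The function names and the three-argument signatures are hypothetical and are not PAL's actual `p_ftoi` interface. In C, converting a float whose truncated value does not fit in an `int` is undefined behavior, so the unchecked variant relies on the caller to keep inputs in range.

```c
#include <limits.h>

/* Hypothetical names; not PAL's actual p_ftoi signature. */

/* Unchecked: smallest code, but out-of-range or NaN inputs are
 * undefined behavior in C, so the caller must guarantee the range. */
void ftoi_unchecked(const float *a, int *c, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = (int)a[i];
}

/* Clamped: saturates instead, at the cost of extra compares per element. */
void ftoi_clamped(const float *a, int *c, int n)
{
    for (int i = 0; i < n; i++) {
        float v = a[i];
        if (v >= (float)INT_MAX)       /* at or above the largest int */
            c[i] = INT_MAX;
        else if (v <= (float)INT_MIN)  /* at or below the smallest int */
            c[i] = INT_MIN;
        else if (v != v)               /* NaN compares unequal to itself */
            c[i] = 0;
        else
            c[i] = (int)v;             /* in range: truncate toward zero */
    }
}
```

The clamped variant shows what the "no error checking" answer trades away: three extra comparisons per element in exchange for defined results on every input.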

I have a question:

For example, tangent is currently implemented with the tanf() function.

I suppose it should be replaced by a numerical algorithm, such as a series expansion or something similar.

How accurate should that algorithm be? Shouldn’t this be specified?

Or am I seeing it wrong?
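
As a rough illustration of why the accuracy question matters, here is a sketch of a series-based tangent. It is not PAL code, and the name `tan_series_sketch` is hypothetical. It evaluates the first five terms of the Taylor series of tan(x) with Horner's rule, and the coefficients are precomputed so no division runs at execution time. The truncation error is on the order of 1e-5 or better for |x| ≤ 0.5 but grows to nearly 1e-3 around π/4, so a stated accuracy target would determine how many terms (or what range reduction) is needed.

```c
/* Hypothetical sketch, not PAL code: tan(x) for small |x| via the first
 * five terms of its Taylor series, evaluated with Horner's rule.
 * The coefficients are precomputed constants, so no division is
 * executed at run time. */
float tan_series_sketch(float x)
{
    const float c3 = 0.33333333f;   /* 1/3     */
    const float c5 = 0.13333333f;   /* 2/15    */
    const float c7 = 0.053968254f;  /* 17/315  */
    const float c9 = 0.021869489f;  /* 62/2835 */
    float x2 = x * x;

    /* x + c3*x^3 + c5*x^5 + c7*x^7 + c9*x^9 */
    return x * (1.0f + x2 * (c3 + x2 * (c5 + x2 * (c7 + x2 * c9))));
}
```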

It seems that the median function (in math) is implemented incorrectly, as the function doesn’t even use the sorted array! (I fixed it just to show the correct implementation.) Anyway, I replaced it with the QuickSelect method, which is much faster than the current method (unnecessarily sorting all elements just to find the one in the middle).
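
For readers unfamiliar with the technique, a minimal quickselect sketch follows. It is not the commenter's patch and does not use PAL's function signature; the name `quickselect_sketch` and its arguments are hypothetical. It finds the k-th smallest element (0-based) by repeatedly partitioning only the side of the array that contains position k, and it assumes n ≥ 1 and 0 ≤ k < n.

```c
/* Hypothetical sketch, not PAL's API: returns the k-th smallest element
 * (0-based) of a[0..n-1] using quickselect with Lomuto partitioning.
 * Reorders a[] in place. Assumes n >= 1 and 0 <= k < n. */
float quickselect_sketch(float *a, int n, int k)
{
    int lo = 0, hi = n - 1;

    while (lo < hi) {
        /* Use the middle element as pivot and park it at the end. */
        int mid = lo + ((hi - lo) >> 1);
        float pivot = a[mid];
        a[mid] = a[hi];
        a[hi] = pivot;

        /* Move everything smaller than the pivot to the front. */
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) {
                float t = a[i];
                a[i] = a[store];
                a[store] = t;
                store++;
            }
        }

        /* Put the pivot into its final sorted position. */
        a[hi] = a[store];
        a[store] = pivot;

        if (k == store)
            return a[k];
        if (k < store)
            hi = store - 1;   /* target lies in the left part  */
        else
            lo = store + 1;   /* target lies in the right part */
    }
    return a[lo];
}
```

For an odd-length array the median would be `quickselect_sketch(a, n, n >> 1)`. Expected running time is linear in n, although the worst case is quadratic.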

For the 3×3 median filter (image), I replaced it with the hardwired median search described here:

http://users.utcluj.ro/~baruch/resources/Image/xl23_16.pdf

The code is in the public domain and is taken from here:

http://ndevilla.free.fr/median/median/

https://github.com/ebadi/pal
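
A hardwired median search for a 3×3 window can be sketched as a fixed sequence of compare-exchange steps, with no loops and no data-dependent control flow beyond the swaps. The version below is a well-known 19-step network for nine values, written here for illustration; the names `CE_SKETCH` and `median9_sketch` are hypothetical, and it is not claimed to be identical to the code in the linked sources or in the fork.

```c
/* Compare-exchange: afterwards (a) <= (b). */
#define CE_SKETCH(a, b) \
    do { if ((a) > (b)) { float t_ = (a); (a) = (b); (b) = t_; } } while (0)

/* Hypothetical sketch, not PAL code: median of nine values through a
 * fixed sequence of 19 compare-exchanges. Reorders p[] in place. */
float median9_sketch(float p[9])
{
    /* Sort each group of three: (p0,p1,p2), (p3,p4,p5), (p6,p7,p8). */
    CE_SKETCH(p[1], p[2]); CE_SKETCH(p[4], p[5]); CE_SKETCH(p[7], p[8]);
    CE_SKETCH(p[0], p[1]); CE_SKETCH(p[3], p[4]); CE_SKETCH(p[6], p[7]);
    CE_SKETCH(p[1], p[2]); CE_SKETCH(p[4], p[5]); CE_SKETCH(p[7], p[8]);

    /* p[6] becomes the largest of the three group minima. */
    CE_SKETCH(p[0], p[3]); CE_SKETCH(p[3], p[6]);
    /* p[2] becomes the smallest of the three group maxima. */
    CE_SKETCH(p[5], p[8]); CE_SKETCH(p[2], p[5]);
    /* p[4] becomes the median of the three group medians. */
    CE_SKETCH(p[4], p[7]); CE_SKETCH(p[1], p[4]); CE_SKETCH(p[4], p[7]);

    /* The overall median is the median of p[2], p[4], p[6]. */
    CE_SKETCH(p[4], p[2]); CE_SKETCH(p[6], p[4]); CE_SKETCH(p[4], p[2]);
    return p[4];
}
```

Because the sequence is fixed, the compiler can keep all nine values in registers, which is the main appeal over a general sort for such a small window.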

How come none of the math functions currently seem to make use of the last two parameters, “int p, p_team_t team”? Presumably they should in order to become parallel. Is there any example of that being done?

I was asking myself the same question. I haven’t seen a function yet that uses them.

“At this time, functions do not have to be multi-threaded.”

I think just optimizing is the point right now..? Without going parallel? Idk

Andreas, please include math function examples that use the Parallella p_team_t team parameter, and show them to all developers here and everywhere.

[…] Here’s the brief.. […]

Are we allowed to use SIMD (SSE/NEON) instructions or not? If so, should we wrap everything in macros, or is the target platform known?

Will this work?

    // Flash the LED on pin 13 twenty times, then pause for three seconds.
    int ledPin = 13;
    int delayPeriod = 250;  // milliseconds the LED stays on or off

    void setup()
    {
      pinMode(ledPin, OUTPUT);
    }

    void loop()
    {
      for (int i = 0; i < 20; i++)
      {
        flash();
      }
      delay(3000);
    }

    // Turn the LED on, wait, turn it off, wait.
    void flash()
    {
      digitalWrite(ledPin, HIGH);
      delay(delayPeriod);
      digitalWrite(ledPin, LOW);
      delay(delayPeriod);
    }

[…] announcing the “One board for one C function” offer for the Parallel Architecture Library (PAL), the number of contributors to this project has […]

[…] out later). If you want them shipped now, demonstrate how you have achieved at least 60% 40% [UPDATE: forgot that we lowered target] of peak theoretical processor performance on each […]
