Google Summer of Code Projects with Openwall

As announced earlier, GNU Radio and Openwall offered some project ideas related to Parallella under Google Summer of Code 2013. Well, at Openwall we ended up accepting two great students to work on tasks related to Parallella (we’ve accepted a total of four students this year).

As part of her qualification task, Katja Malvoni has ported our implementation of bcrypt password hashing to Epiphany, using one core for now. The code passes validation against a test vector. During the summer, Katja is to work on getting this bcrypt implementation to run on all Epiphany cores and with greater efficiency, and on integrating it with the John the Ripper password cracker. Further, she may also work on making use of the Xilinx Zynq FPGA fabric to achieve (much) greater combined bcrypt performance. (Did you know that it’s trivial to reprogram the FPGA on a ZedBoard / Parallella board from a running Linux system, in only 85 ms? This should enable us to use specialized FPGA bitstreams for different JtR formats.)

The other student accepted for Parallella-related tasks is Rafael Waldo Delgado Doblas, also known as lordrafa. Rafael’s tasks include adding scrypt support to John the Ripper using the host CPU, experimenting with the scrypt time-memory tradeoff, and then implementing scrypt cracking on Epiphany making use of the tradeoff (to fit in each Epiphany core’s 32 KB of local memory regardless of scrypt’s actual settings). Considering that scrypt on Epiphany makes little sense except at very low memory settings (where having to use the tradeoff doesn’t result in too much of a slowdown), the likely next task for Rafael is to implement Litecoin mining on Epiphany. Luckily (for this task), Litecoin uses scrypt at a memory setting of only 128 KB, which translates into only a ~2x expected slowdown when we reduce the memory needs to below 32 KB.
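To make the ~2x figure concrete, here is a rough back-of-the-envelope sketch in Python. It assumes Litecoin's scrypt parameters (N = 1024, r = 1, so V is 1024 elements of 128 bytes each) and the simplest form of the tradeoff, where only every k-th element of V is stored and the others are recomputed on demand. The function names are made up for illustration, and the slowdown formula is an average-case estimate that assumes lookups into V are uniformly distributed.

```python
import math

N = 1024              # scrypt N as used by Litecoin (assumed: N=1024, r=1, p=1)
BLOCK_BYTES = 128     # size of one V element for r=1
LOCAL_MEM = 32 * 1024 # Epiphany per-core local memory, in bytes

def v_bytes(stride):
    """Memory needed for V when only every stride-th element is kept."""
    return math.ceil(N / stride) * BLOCK_BYTES

def expected_slowdown(stride):
    """Average-case cost relative to full-memory scrypt.

    Full scrypt does N block mixes to fill V plus N to read it back.
    With a stride of k, each lookup recomputes (k - 1) / 2 elements on
    average, so the second phase costs N * (1 + (k - 1) / 2) mixes.
    """
    return (2 + (stride - 1) / 2) / 2

for k in range(1, 9):
    fits = v_bytes(k) < LOCAL_MEM  # "below 32 KB", so strictly less
    print(f"stride {k}: V = {v_bytes(k):6d} bytes, "
          f"slowdown ~{expected_slowdown(k):.2f}x, fits: {fits}")
```

Under these assumptions a stride of 4 would fill the whole 32 KB with V alone, so 5 is the smallest stride that leaves any room for everything else that presumably has to live in local memory (code, stack), and it comes with an expected slowdown of 2x.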

A further task potentially available to either student, if time permits – or to a non-GSoC contributor (please feel free to contact us if interested!) – is implementing cracking of traditional DES-based crypt(3), aka descrypt, on both chips (Epiphany and Zynq). So far, we have descrypt cracking on Zynq’s two ARM cores, using NEON SIMD extensions and both cores at once (with OpenMP), but of course that still uses only a small fraction of the board’s processing power.

We’re looking forward to working with our GSoC students and with anyone else interested.

Alexander (better known as Solar Designer, @solardiz)
Openwall Project (@Openwall)

4 Comments

  • Edwin Mizrahi says:

    Has lordrafa been able to implement Litecoin mining on a Parallella board, or even on a four-board Parallella cluster?

  • Yes, Rafael did implement Litecoin mining on 16-core Epiphany (we’re testing on a Parallella prototype, which is a ZedBoard + FMC with the Epiphany chip, but the same code should run on the real thing just as well). We did not expect this to run fast: it would take much bigger future chips and/or many more chips for this to make practical sense, so we only wanted to achieve a decent performance/Watt figure. Still, the performance achieved so far is even worse than what we were hoping for. Namely, our optimistic goal was 5 kh/s on 16 cores at 600 MHz, whereas Rafael has achieved only ~1300 h/s so far. That’s without assembly code yet. Rafael is proceeding with asm now, which may bring the speed up to 2 kh/s or so. I think slightly higher speeds (closer to our optimistic goal) are possible, but I don’t expect they will be achieved in Rafael’s GSoC project.

    Speaking of the time-memory tradeoff, Rafael is currently using a factor of 5 (and is experimenting with storage of some additional V elements, which would bring the effective TMTO factor to slightly less than 5). This corresponds to a TMTO slowdown of exactly 2x (or slightly less, with those additional tweaks).

    As to testing on a small cluster, we’re rather uninterested in trying that: in absolute terms, the speed will remain too low anyway, and we’re confident that if things work with one board, technically they will just work with a small cluster as well (simply run an instance of our modified cgminer on each board, and they’ll talk to the pool each on its own, which is OK to do for just four boards). We might, however, test on one of the very few 64-core boards (which use 64-core Epiphany chips).

    If you’d like to experiment with Rafael’s modified cgminer anyway (it is still being developed, so it may be buggier and/or slower than our best revision so far), here it is:

    https://github.com/LordRafa/cgminer

    Build/run it on the Parallella system with:

    ./autogen.sh
    make -j2
    ./build.sh
    ./run.sh
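For readers wondering why a TMTO factor of 5 costs 2x rather than 5x, here is a small Python sketch that counts mixing operations in a scrypt-like ROMix loop. It is not Rafael's code and not real scrypt: the `mix` function is a SHA-256 stand-in for scrypt's BlockMix, and the function names are invented for illustration. Only the structure of the loop (fill V, then read it back in a data-dependent order) follows scrypt.

```python
import hashlib

def mix(block):
    # Stand-in for scrypt's BlockMix (real scrypt uses Salsa20/8, not SHA-256).
    return hashlib.sha256(block).digest()

def integerify(block, n):
    # Derive a pseudo-random index into V from the current block.
    return int.from_bytes(block[:8], "little") % n

def romix(x, n, stride=1):
    """ROMix-like loop keeping only every stride-th V element.

    Returns the final block and the number of mix() calls made.
    """
    calls = 0
    stored = {}
    # Phase 1: compute V[0..n-1], keeping only V[i] with i % stride == 0.
    for i in range(n):
        if i % stride == 0:
            stored[i] = x
        x = mix(x)
        calls += 1
    # Phase 2: n data-dependent lookups; missing elements are recomputed
    # from the nearest stored element below them.
    for _ in range(n):
        j = integerify(x, n)
        base = j - j % stride
        v = stored[base]
        for _ in range(j - base):
            v = mix(v)
            calls += 1
        x = mix(bytes(a ^ b for a, b in zip(x, v)))
        calls += 1
    return x, calls

seed = hashlib.sha256(b"example input").digest()
full, full_calls = romix(seed, 1024, stride=1)
tmto, tmto_calls = romix(seed, 1024, stride=5)
print("same result:", full == tmto)
print("mix calls: full", full_calls, "stride 5", tmto_calls,
      "ratio", round(tmto_calls / full_calls, 2))
```

The first phase costs the same either way, and each lookup in the second phase recomputes about two extra elements on average with a stride of 5, so the total work roughly doubles while the stored V shrinks to a fifth.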

    The results of Katja’s project so far look much better: she implemented bcrypt cracking, along with John the Ripper integration, on Parallella. Using 16-core Epiphany at 600 MHz, Katja’s code achieves ~1205 c/s at bcrypt “$2a$05” hashes. (This is with carefully tuned Epiphany assembly code for bcrypt’s two most costly loops, but without use of the host ARM cores for computation yet. Without asm, the best speed was just below 1000 c/s. With two ARM cores used for computation as well, an extra ~165 c/s could be achieved on top of the ~1205 c/s figure.) This speed is similar to that of a Core 2 Duo at ~2.2 GHz (with both cores in use), but is achieved at much lower power usage.

    Naturally, we expect ~4800 c/s on 64-core Epiphany at 600 MHz, which would be on par with recent quad-core x86-64 CPUs (e.g., Core i7-2600 achieves exactly this speed with John the Ripper, cumulative for 8 threads, at ~3.5 GHz). Somehow Katja’s recent attempt to test on 64-core failed (even locking up the board), but I guess she’ll figure out what went wrong and correct that (although it may be tricky to do with only remote access to the board).

    Besides the Epiphany work, Katja is now proceeding with bcrypt on Zynq FPGA (she already got an initial implementation working correctly, albeit slowly) and with John the Ripper integration of that. Katja’s modified John the Ripper tree is here:

    https://github.com/kmalvoni/JohnTheRipper

    Build it on Parallella with “make -j2 linux-parallella” in “src”, and run it via the “parallella_john.sh” script in “run”.
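The ~4800 c/s expectation for the 64-core chip follows from assuming that bcrypt throughput scales linearly with core count at the same clock rate. A quick Python check of that arithmetic, using only the figures quoted above (the variable names are just for illustration):

```python
EPIPHANY16_CPS = 1205  # measured on 16 cores at 600 MHz, with asm
ARM_EXTRA_CPS = 165    # estimated extra from the two host ARM cores

per_core = EPIPHANY16_CPS / 16  # about 75 c/s per Epiphany core
projected_64 = per_core * 64    # linear-scaling assumption
with_arm = EPIPHANY16_CPS + ARM_EXTRA_CPS

print(f"per core: {per_core:.1f} c/s")
print(f"projected 64-core: {projected_64:.0f} c/s")
print(f"16-core plus ARM cores: {with_arm} c/s")
```

Linear scaling is plausible here only to the extent that each core runs bcrypt independently out of its own local memory; the projection has not been confirmed on 64-core hardware.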

    By the way, this year GSoC has been shifted to conclude at the end of September, so there are still two weeks to go.

  • I forgot to mention: Katja’s changes are only on branch “bleeding-jumbo” in that repository – they are not seen on the default branch. You may clone the right branch with:

    git clone git://github.com/kmalvoni/JohnTheRipper -b bleeding-jumbo

  • Oops, that’s still not right. While most cutting-edge JtR development is on branch bleeding-jumbo (in people’s private repositories, as well as in our project’s shared one), Katja’s work specifically is on branch master in her repository, because we wanted to avoid jumbo’s extra dependencies and slower build time when we only support one JtR format on Epiphany for now (by the way, this format is called bcrypt-parallella). So a correct command is:

    git clone git://github.com/kmalvoni/JohnTheRipper -b master
