Epiphany-V: A 1024-core 64-bit RISC processor

I am happy to report that we have successfully taped out a 1024-core Epiphany-V RISC processor chip at 16nm. The chip has 4.5 billion transistors, 36% more than Apple’s latest 4-core A10 processor at roughly the same die size. Compared to leading HPC processors, the chip demonstrates an 80x advantage in processor density and a 3.6x advantage in memory density.

Epiphany-V Summary:

1024 64-bit RISC processors
64-bit memory architecture
64-bit and 32-bit IEEE floating point support
64 MB of distributed on-chip SRAM
1024 programmable I/O signals
Three 136-bit wide 2D mesh NOCs
2052 separate power domains
Support for up to One Billion shared memory processors
Support for up to One Petabyte of shared memory
Binary compatibility with Epiphany III/IV chips
Custom ISA extensions for deep learning, communication, and cryptography
TSMC 16FF process
4.56 Billion transistors, 117mm^2 silicon area
DARPA funded
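
As a rough sanity check on the summary figures above, the per-core numbers can be derived with a few lines of Python. This is only an illustrative sketch: it assumes the 64 MB of SRAM is split evenly across the 1024 cores and that "MB" means 2^20 bytes. The variable names are made up for this example and do not come from any Epiphany tooling.

```python
# Figures taken from the summary list; the even split across cores is an assumption.
CORES = 1024
SRAM_TOTAL_BYTES = 64 * 2**20   # 64 MB of distributed on-chip SRAM (binary MB assumed)
TRANSISTORS = 4.56e9            # 4.56 billion transistors
DIE_AREA_MM2 = 117              # 117 mm^2 silicon area

# Local memory available to each core if the SRAM is divided evenly.
sram_per_core_kib = SRAM_TOTAL_BYTES / CORES / 2**10

# Average transistor budget per core; this also counts SRAM, NoC, and I/O logic.
transistors_per_core_m = TRANSISTORS / CORES / 1e6

# Overall transistor density of the die.
density_m_per_mm2 = TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"SRAM per core: {sram_per_core_kib:.0f} KiB")
print(f"Transistors per core: {transistors_per_core_m:.2f} M")
print(f"Density: {density_m_per_mm2:.1f} M transistors/mm^2")
```

Under these assumptions each core gets 64 KiB of local SRAM, which is the per-core figure discussed in the comments below.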

Chips will come back from TSMC in 4-5 months. We will not disclose final power and frequency numbers until silicon returns, but based on simulations we can confirm that they should be in line with the 64-core Epiphany-IV chip adjusted for process shrink, core count, and feature changes. For more information, see the report below:

Epiphany-V Technical Report

Cheers,

Andreas

This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

94 Comments

Nick says:

October 5, 2016 at 8:35 am

Ripper!
Deano says:

October 5, 2016 at 10:52 am

Impressive achievement from such a small team!

I’d love to hear more about the deep learning ISA extension, potentially a killer application for such a chip.

Good luck with the phase from tape-out to working chips in your hands.
Petar says:

October 5, 2016 at 1:14 pm

just one word: WOW
Stéphane Zuckerman says:

October 5, 2016 at 2:18 pm

Congratulations! I hope I’ll be able to get my hands on this very very VERY soon. 🙂
Badhri says:

October 5, 2016 at 2:18 pm

Funtastic achievement
Jose M Monsalve says:

October 5, 2016 at 3:59 pm

Oh man! So exciting!!!!
valache says:

October 5, 2016 at 4:36 pm

I would like to have one in the first lot…
Richard G Wiater says:

October 5, 2016 at 8:32 pm

How much will the Epiphany-V: A 1024-core 64-bit RISC processor cost?
Kevin Leary says:

October 6, 2016 at 4:41 pm

Congrats. Impressive.
Mx says:

October 6, 2016 at 5:11 pm

Will it support arduino IDE?
dspx90 says:

October 6, 2016 at 5:36 pm

Is there any pricing out yet? I would like to have one for graphics development, so I hope it’s not top-notch pricing.
Arumaraj says:

October 6, 2016 at 6:11 pm

So happy to hear this +
Witek says:

October 7, 2016 at 1:31 am

Finally!

Congratulations. Awesome job.

Doubling the local SRAM from 32 to 64KB per core is a really good thing. But I also hope bigger programs/kernels can be loaded into a few neighbours and used as a “mini cluster” of semi-local memory. It would be nice to have a separate network for that (other than the one used for off-chip communication).
Witek says:

October 7, 2016 at 1:45 am

Assuming 1GHz for the IO clock, 192 bytes/cycle translates to 192GB/s total external bandwidth, or 179MB/s per core. In big systems I would expect half of the IO to be used for memory and half for interconnecting other CPUs, giving 90MB/s per core on average. That isn’t very high for memory-intensive applications (like machine learning, for example).

I know 1000 IO pins is a lot, but might still not be enough.
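
The bandwidth estimate in this comment can be reproduced with a short sketch. The 1 GHz I/O clock and the 192 bytes per cycle are the commenter's assumptions, not published figures, and the constant names are invented for this example. Note that the quoted 179MB/s matches the result only when "MB" is read as 2^20 bytes; in decimal units the per-core figure is 187.5 MB/s.

```python
# Commenter's assumptions, not confirmed chip specifications.
IO_CLOCK_HZ = 1_000_000_000   # assumed 1 GHz I/O clock
BYTES_PER_CYCLE = 192         # assumed aggregate I/O width per cycle
CORES = 1024

# Total off-chip bandwidth in bytes per second.
total_bytes_per_s = IO_CLOCK_HZ * BYTES_PER_CYCLE

# Share of that bandwidth per core if divided evenly.
per_core_bytes_per_s = total_bytes_per_s / CORES
per_core_mb = per_core_bytes_per_s / 1e6      # decimal megabytes
per_core_mib = per_core_bytes_per_s / 2**20   # binary megabytes

# Half of the I/O for memory, half for links to other chips.
memory_share_mib = per_core_mib / 2

print(f"Total: {total_bytes_per_s / 1e9:.0f} GB/s")
print(f"Per core: {per_core_mb:.1f} MB/s ({per_core_mib:.1f} MiB/s)")
print(f"Per core, memory half only: {memory_share_mib:.1f} MiB/s")
```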
saeed says:

October 8, 2016 at 3:57 pm

it is good!!!!!
saeed says:

October 8, 2016 at 3:58 pm

it is good!!
Robert Fontaine says:

October 8, 2016 at 8:05 pm

Congratulations! Will you continue to provide an FPGA alongside your lovely little chips?

You may have been a bit too forward-thinking with the Parallella, but it seems that the HPC community and even big data are catching on to the fact that some of their algorithms can be fundamentally more efficient on a slow FPGA than on a fast CPU/GPU.

Well done!
Alexandru C says:

October 12, 2016 at 7:18 am

How many gflops ?
Nancy says:

October 12, 2016 at 5:47 pm

Congratulations! Way to go!
Gregory Fowler says:

October 12, 2016 at 8:59 pm

Congratulations!!! This is an outstanding achievement!! Well done!!!
Kiran Ramaprasad says:

October 13, 2016 at 8:42 am

Wow!
roberto says:

October 13, 2016 at 9:54 am

After so much time, I wouldn’t have bet a dime on ever seeing this 1024-core chip, and I had marked Parallella as “good effort but failed”.
I am *SO* happy to see I was wrong.

chapeau!
Jonah Probell says:

October 13, 2016 at 4:13 pm

Impressive. Congratulations.
Maximiliam Luppe says:

October 14, 2016 at 6:44 pm

I’m very happy to be part of this achievement as a Kickstarter backer. Congratulations!
Brian Miller says:

October 16, 2016 at 10:51 pm

I am backer #2262. When I received my cluster, I was expecting to be able to buy the 64-core variant later on. And so I have waited for over two years. Now that you’ve announced the 1024-core Epiphany V, I look on with enthusiasm. However, since I can’t buy the 64-core variant, I’m sure I won’t be able to purchase this.

How do I feel? I am on the outside of the candy store looking in. Back in 2007, Intel showed off an 80-core CPU. And I thought to myself, it would be so great to work with that. After all, I had worked with the Celerity, the first minicomputer to use a 32-bit microprocessor (two processor boards, each with two FPUs and an integer coprocessor). And so here comes the Parallella, and I thought to myself, “Yeah! This will be fun!” And … No interconnects. No 64-core, except for the initial Kickstarter. Supercomputer.io came and went.

Yes, I have Pi’s, I have the Nvidia Jetson TK1, and I have other small strange things. I’ve supported Seti@Home and BOINC. Even though over 10,000 boards have shipped (how many now?), it’s still as if there’s a chicken-and-egg problem.
daniel says:

October 17, 2016 at 6:34 pm

Awesome, Congratulations!

Any plans to deliver a Parallella around this new chip?
Thomas says:

October 21, 2016 at 5:21 pm

Promises are nice. I hope this will be a stable product this time… I burned my hands designing with the 64-core version when it went EOL before it got into mass production…
Santosh says:

October 23, 2016 at 10:47 pm

Fantastic……can’t wait to get hold of this
Jack says:

October 26, 2016 at 7:14 am

Checking the blog for months waiting for the big news! =)

Awesome, but don’t forget to add a decent memory controller on your next board.
All competing boards either have a massive amount of fixed memory or SODIMM slots alongside their CPUs.
It would be awesome to have this amount of computational resources and a space to park a ton of data.
edward says:

October 30, 2016 at 2:57 am

This is a huge step forward for the Parallella project, as the smaller chip didn’t really offer much of a difference relative to Intel’s offerings, which are up to around 20 cores. 64 limited-power cores vs. 20 full-power cores was not a compelling proposition. What this chip finally offers is a resounding price/performance advantage over Intel’s rather expensive “big iron”. For example, in the press release for Intel’s E7-8890v4 chip (which costs over $12k/chip), they talk about 8-socket boards that have 196 cores and “start at $200k”. If you imagine 10 of the Parallella 1024-core chips, suddenly you have something that is very competitive. A 10,000-core system at under $20k would be a boon to research everywhere.

Really, at this stage of the evolution of massively parallel machines, the key thing is to get these systems into the hands of graduate schools around the world, so that new programming languages can be worked on. I know Tucker Taft’s project for a parallel language, called Parasail, hasn’t gotten much traction because hardly anyone has 1000 cores to program, so I am hoping for the best for this new chip. The Nvidia massive-core systems are not easy to program, having been retrofitted from 3D graphics chips, and they present many weird quirks to the programmer. The Parallella architecture is much cleaner and tremendously simpler.

It remains to be seen how many algorithms can live within 64MB, but I suspect that the doubling of the RAM in this 5th-gen chip will do the trick for most people. Once you start manipulating images, the memory gets eaten up fast. This whole process is going to take years to play out, but it is a great step forward.
bob says:

October 31, 2016 at 7:58 am

I am a bit worried that the development money came from DARPA. They are well known not to be a non-profit organization; everything they do is because they have some “grey” plans for it. On the other hand, the community was not able to provide all the money needed, so Parallella has to accept the “necessary evil”. All in all, this 1024-core chip seems to be “the next big thing” in the “number crunching for everyone” field, and we could end up counting time as BEFORE it and AFTER it, as we started to do in the “embedded systems” field in 2012 with the Raspberry Pi. Small size, low energy, and 16nm are good; the small amount of RAM is a question mark for many kinds of calculation, but it is a good beginning. How efficiency scales is also unknown; it depends on many factors (type of algorithm, data to be moved inside the mesh network, etc.). Only field testing will answer this question. What we need to know now is the price of the chip: if it comes in at $1000, it is not a democratization of massive calculation; if it comes in at $100, it is. Of course Parallella must get a monetary reward for all the effort they put into the project. I hope Parallella will find the right middle ground between monetary reward and attention to the community when they set the final price. I also hope that part of the money from selling the 1024-core chip will give them enough revenue to develop, on their own, a Parallella-VI with 16384 cores and 1MB of RAM dedicated to every single core 🙂

We can do some speculation: if the 64-core Parallella-IV at 28nm runs at 500MHz, the 1024-core chip at 16nm should reach 800-900MHz.
It will surely need at least a heatsink and, in the worst case, a fan as well.
Besides the single-chip board you will make, are any luxury versions planned, e.g. a mainboard with 4 sockets for S.O.M.s with a Parallella-V, to get a system scalable from 1K to 4K (or more) cores?

Hope we will get some previews before spring 2017.
Andy says:

November 2, 2016 at 6:03 am

If it has “extensions” for Deep Learning (not a hardware/chip design person, myself), how will programmers make use of it? I have searched for ways to make use of epiphany with tensorflow or theano, but gotten nowhere. At least for me, developing machine learning models with C++ on bare metal is not an option.
Mike Ross says:

November 7, 2016 at 10:58 pm

I share bob’s concerns over DARPA funding and also daniel’s question about a new board based on this chip. If this announcement is for real, then a board based on this chip would be simply astonishing. The potential for advanced algorithm development would be phenomenal, especially if it could be made low-cost and affordable. However, I really doubt that DARPA or NSA would like to see this technology in the public domain. I suspect that any production version made available to the general public would probably be scaled down in some way. Anyway, I don’t get too excited about these announcements anymore. I still remember when they said 64-core boards would be available, but it seems only a few got those. The rest of us had to make do with 18. Sorry to be a killjoy, folks, but all this just sounds too good to be true.
bob says:

November 8, 2016 at 3:54 pm

Mike Ross, you are right. Too often I have heard “we will do” instead of “we already did”. Too often people promise (I’m not referring to Parallella, just talking in general) “miracles” that fail in reality. We must be realistic: let’s wait for the chip to arrive, for the board to be ready, and for benchmarks to be done. Only then can we say “it is a success”.
Neel Gupta says:

December 8, 2016 at 5:02 am

Finally !
So, when can we buy it ?
Neel Gupta says:

December 8, 2016 at 5:14 am

wait… “DARPA funded” ?
What would be the repercussions of that ?
Will we actually be able to buy it ?
Will it have backdoors, like all Microsoft products ?
kamikaze says:

December 9, 2016 at 2:18 am

no real board in stores – no trust
An open source 1024-core Epiphany Simulator | Parallella says:

December 21, 2016 at 7:44 pm

[…] McKee). This work continued in 2016 as we needed a way to validate our design decisions for the 1024-core Epiphany-V. Debugging with the simulator is an order of magnitude easier than with hardware, so you should […]
jeff says:

December 22, 2016 at 3:13 am

wow, great job
James Preisig says:

December 22, 2016 at 3:38 am

Andreas,

I work at the other end of the spectrum from what this chip seems to be designed for (TFLOPS of processing power). My systems are embedded, and I need them to come in at about 2 watts for the core processing functions. Within that budget, I need about 150 to 160 GFLOPS (single-precision floating point) of real processing capability. This may be at the low end of what your new chip can do. If so, I would hope that cores can be disabled to save power when they are not needed. You are calling this an SoC. Will there be any control processor, or even a soft-core processor on an FPGA, that can “run” the system, or will this chip need a companion chip that handles its interface to the outside world?

Looking forward to seeing the new chip and its performance and power consumption. What is the size of the new chip?
James says:

December 22, 2016 at 4:26 am

I really like the improvements. I was expecting Adapteva to go for the full 4096 cores and shrink to 14nm, but that wouldn’t leave much room for debugging, optimisation and other improvements due to the substantial increase in cost.

I’d love to see how well this scales up in performance compared to the previous generation and would be happy to test it 🙂
Roger says:

December 22, 2016 at 5:25 am

congrats man. you guys have come a long way and this sounds really impressive.
Ali Azarian says:

December 22, 2016 at 10:34 am

Congratulations! That’s great news!
Carlos Perez says:

December 22, 2016 at 11:40 am

DARPA funded the internet. They fund a lot of stuff.

As I understand this is 64k per processor. So that’s 64k for both code and data?

It is an interesting architecture and will have to warp our minds to figure out how to use this in embedded applications.

It is not going to work well for training deep learning because of the memory bandwidth bottleneck. However, it should work okay for inference, assuming that we can exploit the estimated 1 teraflops capability.
Eugen Leitl says:

December 22, 2016 at 1:01 pm

The 64 kByte of embedded SRAM in each node is not much of a bottleneck. Accessing remote node memory is penalized by latency, commensurate with distance. So this assumes access locality, which is quite often a given for many problems.
Amit says:

December 24, 2016 at 5:44 pm

This might work very well for training deep learning too. We just need to push the envelope enough 🙂 Waiting for the actual product, which we can buy and experiment with.
Victor says:

December 26, 2016 at 5:04 pm

Congrats! Anything you can release on the new deep learning capabilities will be widely appreciated.
Asterion says:

December 26, 2016 at 11:04 pm

I/O signals not pins
Asterion says:

December 26, 2016 at 11:06 pm

Yes, vapourware until it turns up on the market.
dast says:

January 18, 2017 at 3:51 am

What would be the price!? Could we buy it now!?
dast says:

January 18, 2017 at 4:00 am

where could we buy it now !?
Kie says:

February 6, 2017 at 3:49 am

I want one too!!!
Traroth says:

March 9, 2017 at 10:28 pm

Any news yet?
SeyedRamin says:

March 17, 2017 at 11:20 am

Where could we buy it now !? How much does it cost?
Tom says:

March 21, 2017 at 11:59 am

Are we there yet? I feel like a 5 year old waiting for this!
David says:

April 16, 2017 at 6:31 pm

Any chance for getting a status update? Especially on availability and estimated cost?

It has been 6+ months since the announcement of the tape-out. At the very least, when should we check back in for an announcement? Perhaps there is a mailing list we could add our emails to, to make sure we get the announcement when it comes out?
name says:

May 3, 2017 at 1:12 pm

So? Almost 7 months have passed since the announcement!
No update? No info? Nothing at all?
Shall we downgrade this post from “good news” to “vaporware”?
If there is a delay, that can be acceptable, OK, but at least let us know!
Jeneva says:

May 16, 2017 at 10:01 am

Man, that really touches the heart. Sniff. I am usually not so easily impressed. I hereby offer consolation and wish Licisbngsdeutleher in exile all the best! May he soon be able to hold his new girl in his arms. FUNKY
Cash says:

May 16, 2017 at 12:19 pm

Heck of a job there, it absolutely helps me out.
R.L. Flores says:

May 23, 2017 at 9:25 pm

We are finalizing the resurrection of our S&L banking, financial and investment A.I. engine and application. Our system was developed and implemented using the Texas Instruments’ “Explorer” Lisp engine and accelerated/processed by AMD’s 2900 slice processor family. We are building a new engine that will run our inference engine, pattern recognition engine, etc. We’d like to begin design-in of an Epiphany V cluster of 128/256 and need an availability date. Please advise.
My Blog says:

June 15, 2017 at 8:22 pm

Quality content is the secret to invite the users to pay a visit the web site, that’s what
this website is providing.
Patrick Law says:

June 19, 2017 at 10:07 pm

Cannot wait for this to come out so I can do a review for my reader! This is going to redefine the supercomputer market again.
Baracat says:

June 29, 2017 at 9:25 pm

Any fresh news? =) Regards,
SeyedRamin Rasoulinezhad says:

July 16, 2017 at 5:04 pm

Is there any news? 🙂
neil says:

August 16, 2017 at 6:17 pm

Getting close to a year now, and I’m getting scared.
I was around for the initial kickstarter, however was a senseless teen at the time with no brains to save up for the original board.
I’ve since then (rather recently) come up with a great idea for personal use with a 64 core or higher chip.
Now here’s where my fright enters the room. Usually, I can find things I once wanted to purchase within a few hours, however after days of looking I’ve only since been able to find the 16 core chips for sale, and at an inflated price of ~$150.
I did find your page explaining the price jump, and I understand. But after reading this, I was excited to see a 1000+ core chip, yet a year later there is no news. My hopes of getting ahold of anything above 16 cores are drying out.
This was the first piece of hardware whose release I was ever excited about, back in 2012. It’s starting to look like I’ll once again be moving on.
Ladislav Jech says:

October 7, 2017 at 5:10 pm

But what the performance really is, is another story here… I wasn’t able to find any realistic comparison.
Ladislav Jech says:

October 9, 2017 at 2:47 pm

This is now a lonely project. I hope you replace the leader and get into production of 1024-core clusters.
xman says:

December 30, 2017 at 10:26 pm

Hello

I want to purchase a few new devices. Where can I get them?
J.G. says:

January 30, 2018 at 6:12 pm

It’s dead, jim!
Send Help says:

March 3, 2018 at 1:59 am

Damnit! I want this! No updates?!?!?
PiotrLenarczyk says:

May 8, 2018 at 3:51 pm

If you do not provide C-newbie-level examples and a wider set of boards (kids spend thousands of dollars of their parents’ money on GPUs, so with luck they would spend $500 on an innovative board as well), you will stay at the educational/Kickstarter stage. Do not go the RPi way, since they actually make their money on “shields”, not on the RPi itself. Your processors should depend on the motherboard as little as possible (this is one of many reasons why AMD is not evolving its CPUs) and be prototype-friendly. The lack of next-generation boards is discouraging. Processor comparisons should also be more readable, since no processor can be the greatest at everything. People are interested in:
– number of cores,
– language support (users prefer Java; vendors prefer C or C-based C++); I would choose C-based C++11 (templates and some other useful stuff),
– size of memory (less important) and its throughput (very important),
– A COOL-LOOKING COOLING SYSTEM ON THE BOARDS: at least six heat pipes and the biggest fan one can imagine. RGB LEDs are standard (this is not a joke; hardware must be “sexy”),
– a SATA III interface, or some trivial battery-backed DDR4-SSD hybrid (with a super-capacitor to handle power-off) for proper OS deployment. Lubuntu runs smoothly on a 4GB HDD. Transferring data and instructions in an in-RAM DMA manner should also bring some basic improvement,
– no one cares whether there is a single Epiphany V CPU or 32 Epiphany V CPUs on a single die; the numbers above make an impression even though they are pointless (the Intel AMD64 uArch has a peak memory throughput of a few GBps, but they advertise 54GBps and above… ;D). Do not go that way; publish real numbers,
– Intel already sells fabricated CPUs as uncut silicon dies; one of the cheapest Celerons could be good enough to properly deploy work to an N-dimensional mesh of your coprocessors. Many such models have already been proven with GPGPU coprocessors,
– PCIe coprocessors (and their libs) are easier to maintain than separate boards.
Post Scriptum: I am only an amateur; do not rely on my opinion, as I may unintentionally mislead.
Post Post Scriptum: With all respect to the founder of Adapteva, any single-person business is not optimal.
NGGRPhTN says:

June 19, 2018 at 1:02 am

So, when are we going to get them for our cell phones?
Steven Douglas Gould says:

July 11, 2018 at 1:37 am

I would like one also when do these go on sale?
Carlos Eduardo says:

October 23, 2018 at 4:06 am

Hi, I like your article, this kind of article helps me a lot.
Arturo says:

April 7, 2019 at 6:14 pm

That is really interesting, You’re a very professional blogger.

I have joined your feed and stay up for searching for more
of your magnificent post. Also, I’ve shared your site in my social networks
Samantha says:

April 23, 2019 at 6:08 pm

It’s truly very difficult in this active life to listen news on TV, so I just use
web for that reason, and get the newest news.
Sam says:

June 5, 2019 at 3:52 pm

where to buy this ?
Lex says:

August 27, 2019 at 3:52 am

If the administration of this site are reading this, it would be wise to renew or to acquire a new certificate for your website, so people aren’t turned away from your site.
Allison Miller says:

March 26, 2020 at 12:34 pm

It is an open source RISC-based ISA along with open source implementations of example processor cores. Then you could have had a processor that was completely open and did not include any proprietary code. The chip is about the same size as the Apple A10, so in terms of silicon area it’s in the consumer domain, but the price will only come down to consumer levels if shipments get into millions of units. Big companies take a leap of faith and build a product hoping that the market will get there. Small companies get one shot at that. With University volumes and shuttles, we are talking 1 costs. So the $300 GPU PCIe-type boards become $10K-$30K with NRE and small-scale production folded in.
Marie Maurer says:

April 22, 2020 at 6:49 am

Any update on this great chip?
Fatima says:

May 17, 2020 at 2:58 pm

What’s Happening i’m new to this, I stumbled upon this I have discovered It positively useful and it has aided me out loads.
I’m hoping to contribute & help different customers like its aided me.

Good job.
Dennis says:

May 28, 2020 at 6:07 pm

What’s up everyone, it’s my first pay a quick visit at this
web site, and piece of writing is truly fruitful designed for
me, keep up posting such articles.
Minnie says:

June 3, 2020 at 5:12 am

I have read so much content regarding the blogger lovers; however, this post is really
a good article, keep it up.
Nureddin says:

July 29, 2020 at 7:34 pm

If this is true, I would like to buy 1000 units.
Michael DeByl says:

March 21, 2021 at 12:29 pm

Virtually useless without DMA shared memory access.