The Adapteva team never ceases to amaze me compared with what most are cranking out. This is pretty awesome. More awesome is this part of the PDF:
“Due to the scale of the challenges faced by the Epiphany-V related to process migration, architecture co-development, RTL rewrite, and EDA flow rampup, the project was in a constant state of flux, causing stall cycles on a daily basis. The project was kicked off September 9th, 2015 with a design team consisting of Andreas Olofsson, Ola Jeppsson, and two part time contractors. From January 2016 through tapeout in the summer of 2016, design stall cycles forced Andreas Olofsson to complete the project alone to stay within the fixed-cost DARPA budget. The tapeout of a 1024-core 16nm processor in less than one year with a skeleton team demonstrate it’s possible to design advanced ASICs at 1/100th the cost of the status quo.”
Andreas Olofsson did it alone. A bad motherfucker in ASIC design. Feel free to contract your next cutting-edge development entirely to him. Haha.
The problem isn’t taping out the ASIC: that is in fact doable with a skeleton team. The number of RTL devs even on major CPU/GPU projects is often quite small. One person doing a 1000-core CPU is still impressive, of course (though keep in mind if you do the math, he’s doing 15 hours a day, with no weekends, for 9 months…)
The thing that takes all your resources is your verification team, which often is 10 times larger than your actual dev team. Your tapeout costs 100 million dollars each time you do it, and you better not ship a broken chip. Nowadays, you often even have a verification verification team, responsible for verifying the verification.
I would be very, very nervous about a company that talks about finishing a chip with very few people; even if they have the most brilliant Verilog programmers in the world, without DV to accompany it, the chip they tape out will be littered with bugs.
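For a rough sense of what the "15 hours a day, with no weekends, for 9 months" figure adds up to, here is a back-of-the-envelope sketch. Only the 15 hours and the seven-day week come from the comment above; treating 9 months as about 39 weeks and using a 40-hour week as the baseline are my own assumptions, so read the result as an order-of-magnitude estimate and nothing more.

```python
# Back-of-the-envelope workload estimate.
# From the comment: 15 hours a day, no weekends.
# Assumed here (not from the thread): 9 months is about 39 weeks,
# and a "standard" working week is 40 hours.
HOURS_PER_DAY = 15
DAYS_PER_WEEK = 7
WEEKS = 39                 # assumption: roughly 9 months
STANDARD_WEEK_HOURS = 40   # assumption: baseline for comparison


def total_hours(hours_per_day, days_per_week, weeks):
    """Total hours worked over the given number of weeks."""
    return hours_per_day * days_per_week * weeks


solo = total_hours(HOURS_PER_DAY, DAYS_PER_WEEK, WEEKS)
standard = STANDARD_WEEK_HOURS * WEEKS
ratio = solo / standard

print(f"solo hours: {solo}")
print(f"standard hours over the same period: {standard}")
print(f"ratio: {ratio}")
```

Under those assumptions the schedule works out to the hours of between two and three people on a standard week, which is still a skeleton team by any measure.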
Generally true. I’ve seen quite a few projects like Sandia’s SSP using a combo of conservative design with asynchronous components to get first-pass silicon with few to no defects. So, I know it can be done. Only question is if their flow has similar properties or they just skip key work as you suggested. Default rule is to be suspicious.
“The tapeout of a 1024-core 16nm processor in less than one year with a skeleton team”
The power of automated tools!
“The Epiphany-V was designed using a completely automated flow to translate Verilog RTL source code to a tapeout ready GDS, demonstrating the feasibility of a 16nm ‘silicon compiler’. The amount of open source code in the chip implementation flow should be close to 100% but we were forbidden by our EDA vendor to release the code. All non-proprietary RTL code was developed and released continuously throughout the project as part of the ‘OH!’ open source hardware library. The Epiphany-V likely represents the first example of a commercial project using a transparent development model pre-tapeout.”
Thought you might like this reply he finally got around to on how he managed the verification:
“Modern SOCs might have 100 complex blocks. We had 3 simple RTL blocks (9 hard macros). Top level communication approach was ‘correct by construction’. Nothing is for free.”
He also mentioned in another comment not to expect all the cores to work, given the difficulties of a small project on 16nm. That node is still experimental versus other ones. Some of the cores might glitch out.
I’ll add that they got from start to two product iterations spending only around $3 million. Here are some things they wrote about that:
I know they use 3rd party I.P. where possible, shuttle runs (multi-project wafers), regular structures that connect in regular ways, and customers in HPC that pay enough to justify development costs. The recent work had a DARPA contract of sorts, which might have covered some nice verification tooling and extra I.P. Alas, he told me on Hacker News he isn’t going to write up how that process works since he’s focused on product development. He’ll open-source the code if the EDA vendor lets him, so we can see it ourselves, but he won’t take extra time to do synthetic examples as a way to avoid pissing them off.
Sucks but I get it.
Wow, I thought this project was vaporware! Here’s the PDF with more information on the ISA.
There’s also an experimental port of Erlang to these chips that I’d love to try out.