The site ran fine for me without JavaScript, once I deleted the semi-transparent div telling me that the site needs JavaScript. (Apparently JavaScript is only needed to hide that div automatically.)
Seriously, I sometimes wonder if there would be a market for a lightweight text-display app: the server would serve some kind of Markdown text, perhaps with a few hints like headlines, and the client would render it based on the local screen settings and the structural hints in the Markdown. That would be great for short articles.
That sounds pretty useful. Arguably, that's roughly what text-based browsers such as lynx and links already do, just in another way. They're still what I use when I want to read an article without distractions, ads, and funny layouts and images.
It's taken quite a while, but I'm looking forward to getting the board. I hope there are decently mature Go bindings, because a goroutine coprocessor would be incredible.
If not, I've been wanting to take a stab at OpenCL.
Both MPI and OpenMP are fairly mature and easy to use with C or C++ (I haven't used them with FORTRAN, but I believe you can, with a few slight alterations).
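For what it's worth, here's the sort of thing I mean — a minimal OpenMP sketch in C (just an illustrative example, nothing to do with the Parallella toolchain specifically):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        double sum = 0.0;

        /* Split the loop iterations across all available threads and
           combine the per-thread partial sums at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;

        printf("harmonic sum: %f (max threads: %d)\n",
               sum, omp_get_max_threads());
        return 0;
    }

Compile with something like gcc -fopenmp and the pragma does the thread management for you, which is a big part of why OpenMP feels so easy to pick up.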
OpenCL / CUDA is more mature and ready... and a far better-understood architecture. I bet that any GPU will crush this thing in pure compute power.
Where Parallella's advantage comes in is its grid architecture. Whereas a typical GPU today is "very, very wide" (i.e., an AMD 7750 does 512 operations at the same time), the Epiphany is a "grid", and each node of the grid can be doing different things.
I'm going to greatly simplify things down for you with this example (so experts out there... don't shoot me :-p). A GPU can execute 1 program, but that one program can do 512 operations at once.
On the other hand, the Epiphany-IV truly can run 64 different programs at once... all taking different paths and doing their own things.
Epiphany-III and Epiphany-IV perform "if statements" more or less how you'd expect a computer to. Which is the important bit... if statements don't really slow down your program.
In contrast, on the typical wide "wavefront GPU architecture"... the "if statement" basically halves the speed of your GPU. The GPU has to execute one "half" of the branch, and then later it comes back to execute the other "half". (It's only "really" executing one program, but doing 512 of them at once. See what I mean?)
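To make that concrete, here's an illustrative OpenCL-style kernel (the name and the operations are made up, it's just a sketch):

    /* Work-items in the same wavefront run this in lock-step.  When the
       data-dependent condition below differs between neighbouring items,
       the hardware executes BOTH branches one after the other, masking off
       the lanes that didn't take that path -- so a 50/50 branch roughly
       halves throughput.  An Epiphany core, being an ordinary sequential
       CPU, simply jumps over the branch it doesn't take. */
    __kernel void threshold_filter(__global const float *data,
                                   __global float *result,
                                   const float threshold)
    {
        size_t i = get_global_id(0);
        if (data[i] > threshold)
            result[i] = data[i] * 2.0f;   /* "path A" */
        else
            result[i] = 0.0f;             /* "path B" */
    }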
GPUs are a "wide" architecture... but Epiphany is the first "grid-based" supercomputer for ~$99. GPUs are very mature technologies... Epiphany is still in initial research (as far as the consumer market is concerned)
EDIT: In reality, a modern GPU supports maybe 4 to 8 different wavefronts (8 for the more expensive GPUs, maybe only 4 for the cheaper ~$99 GPUs). But each of those wavefronts can do hundreds of computations at once.
>I'm going to greatly simplify things down for you with this example (so experts out there... don't shoot me :-p). A GPU can execute 1 program, but that one program can do 512 operations at once.
CUDA & OpenCL programs can essentially do the same thing, since there's no penalty for thread divergence between thread blocks. So, as long as you have enough lock-step behavior in your smaller work items, the larger groupings can do entirely different operations. Or, maybe said differently, you can roughly think of a GPU as a collection of SIMD units (in NVIDIA-jargon, these are SMs), all of which are allowed to run whatever vector code they want.
In reality, the top-end AMD 7970 has ~32 compute units, while the ~$130 AMD 7770 has ~10 compute units. Each compute unit has 4 SIMD units, each SIMD unit executes 16 single-precision floating-point operations per clock, from a pool of 16 "hyperthread-like" work-items that are queued up. (Total of 64 FLOPs per clock per compute unit.)
But... that is a bit difficult to say, now isn't it :-p
In comparison, Parallella's $99 unit has 16 discrete compute units, and their (unmade) higher-end offering will have 64 compute units. Both of which will operate within 2W of power.
It's a difference of scale. GPUs focus more on SIMD instructions, because graphics are innately matrix-based operations that translate very easily into SIMD. But if you are going to be doing unique operations per core (i.e., lots and lots of branching on a single work-item), then Parallella's approach would give better performance.
So... yes... I over-simplified earlier, and got a few numbers wrong. But the concept is the same. Epiphany has no SIMD units at all, and each core instead focuses entirely on single-threaded performance (at best, 2 floating-point operations per clock with the superscalar architecture). This makes it slower in the strictest sense... but "more agile", and able to handle branches and conditionals better than the GPUs.
Also, remember that Epiphany works with only 2W of power... while a GPU like the AMD 7970 works with something like 300W of power. So of course, the 7970 is going to do more (especially with faster GDDR5 RAM and a PCIe x16 connection feeding it). So it isn't really a fair comparison there either.
Nonetheless, Epiphany does see a few situations where it might perform better than a classic CPU and better than a classic GPU. It's just a different software architecture, focusing on a different problem niche.
> Nonetheless, Epiphany does see a few situations where it might perform better than a classic CPU and better than a classic GPU. It's just a different software architecture, focusing on a different problem niche.
By my understanding, any situation where you need the computing power combined with a requirement for very low energy input and waste-heat output.
I can't think of any specific examples (perhaps an embedded system that needs to perform a pile of cryptographic operations?), but its ability to do a significant amount of processing within a 2W power footprint puts it in a different class to current desktop CPUs and GPUs. Even if this $99 unit only does 1/8 of what a $99 GPU can do (caveat: I pulled that "1/8" figure from my arse), it'll be doing it on 1/60 of the power (based on GPU reviews where people have tried to compare power draw at GPU idle to power draw with the GPU at 100% and everything else as idle as possible, which indicate modern GPUs pull between 120 and 150W; http://www.guru3d.com/articles_pages/radeon_hd_6850_6870_rev... is the first such analysis Google found). That potential computation-per-watt (or computation-per-energy-$, if cost matters more than the energy supply and dissipation problem) of units like this could be very useful. Of course, where power input and heat dissipation are not massive concerns, current GPUs still win on computation-per-device and computation-per-hardware-$.
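To make that back-of-the-envelope explicit (using the admittedly made-up 1/8 figure above and the ~120W load draw from those reviews): 1/8 of the throughput at 2W versus 120W works out to (1/8) / (2/120) = 7.5x the computation-per-watt in the Epiphany's favour.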
I think the interesting comparison will be between something like Tegra 5 / Exynos 5 and this. Most scientific computing these days is CUDA, and getting something like cuBLAS onto this chip will be a struggle. The Exynos/Android combo has already lowered its ability to interest me by not even accepting OpenCL as a first-class citizen.
EDIT: Hopefully NVIDIA will give me CUDAndroid, so I can have some fun.
Just guessing, but maybe all the cases where you need parallel computing AND low power: computer vision, robotics (drones, self-driving cars, etc.), automation, and so on?
Using proper terms, GPUs are SIMT (Single Instruction, Multiple Threads), while the latest Epiphany is MIMD (Multiple Instruction, Multiple Data streams).
Of course, both can probably execute regular SIMD instructions sequentially.
AMD GPUs can, likewise Xeon Phi; NVIDIA GPUs are scalar; not sure about Epiphany.
Epiphany doesn't have any SIMD as far as I can tell. It's honestly 16 (or 64) different CPUs working in a grid.
Here's the instruction set for each individual CPU. Each one is small, and weak... but the combination of CPUs makes the architecture very different (and interesting) when compared against the typical SIMT / SIMD GPUs we have today.
http://www.adapteva.com/wp-content/uploads/2012/10/epiphany_...
It's really just 64 (tiny) cores hooked up together. Of course, it can do SIMD if you want... (you just hook up all the cores to run the same program). But in doing so, it will lose out in comparison to dedicated SIMD engines, like a GPU.
I would hazard a guess that the Epiphany cores are more heavyweight than the GA cores. The GA cores are extremely lightweight, to the point where it is expected that even the simplest tasks will involve several of them cooperating. On the other hand, the GA chips are optimized for low power and have significantly more tightly integrated I/O than the Epiphany.
The GA cores are meant to be linked together. According to the docs and videos, you use the special IDE to define blocks or grids of GA cores and have them working in tandem.
The GA's biggest weakness is probably the lack of software and libraries. On the other hand, you could essentially create software-configurable DSPs, or program the entire array to act as a single, traditional CPU core.
Granted, you would have to use this strange dialect of Forth called colorForth. Getting a kit together that you can hack on requires more than $99. On the other hand, that array runs clockless and can mimic analog signals.
I think Parallella is easily the most accessible supercomputer, and well suited to hobbyists. I expect a lot of interesting things to come out of it, the way Arduino opened up a lot of device hacking for people. I had passed on the Kickstarter, and it wasn't until I got interested in deep belief networks that I kicked myself for not jumping in.
The Parallella project sounds interesting, but the parityportal is shit.
The site shows a big banner, "Please enable javascript to view this site.", that overlays the content, but disabling CSS also helps - another example of JS failure.
I'm just posting to say that I peeled off that ridiculous JavaScript-required page blocker with the "Element Hiding Helper" add-on for Adblock Plus. I dunno what I was missing without JavaScript, but the site seemed to work pretty well without it.
> The site shows a big banner, "Please enable javascript to view this site.", that overlays the content, but disabling CSS also helps - another example of JS failure.
Simple workaround (in most browsers): right-click the offending div -> Inspect Element -> press the Delete key.
To correct my earlier mistake... the board consists of an ARM processor (like a phone's) and an array of smaller processors. Linux and Python will run on the ARM, but not on the smaller processors.
Python's multiprocessing module works by running a separate copy of Python on each core and then managing data transfer. So it won't help you here, because you would need Python to be running on each of the smaller processors in the array.
Instead you need to target the array processors in a dedicated language. People are mentioning OpenCL (which is C-like, but has a very strong emphasis on all processors doing the same task); the Wikipedia page describes a GCC-based compiler.
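For anyone who hasn't seen OpenCL before, the "same task on all processors" style looks roughly like this (a minimal, hypothetical kernel, not Parallella-specific):

    /* One copy of this runs per work-item; each work-item handles one
       element of the arrays.  The parallelism comes from the host
       launching many work-items, all executing this same function. */
    __kernel void scale(__global const float *in,
                        __global float *out,
                        const float factor)
    {
        size_t i = get_global_id(0);
        out[i] = in[i] * factor;
    }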
At a stretch, perhaps you could use the GCC-based C compiler to compile Python and get multiprocessing working that way. But I imagine it would be a lot of work and an inefficient way to use the system (the small cores are not very powerful, so you need to keep overhead down, so Python is a bad idea).
If someone can get Erlang working across the array, then that might be your best bet. Erlang is a little bit like Python plus multiprocessing (not terribly similar, but close enough for many things to make sense).
There are a number of Linux distributions for ARM; the Raspberry Pi project alone has several. Python, Ruby, Perl, Node.js, and a number of other tools are fully supported.
The multiprocessing side, though, is probably not a standard CPU, meaning you can't just throw Python code at it. You'll have to use CUDA- or OpenCL-style techniques.
If you want a small ARM system, the Pi is a good place to start but the Beagle Board (http://beagleboard.org/) is a much better deal.
The article is a bit confusing, since, as far as I can tell, the "first model to be shipped" only has 16 (+2 ARM) cores. The 64 (+2) core board is not shipping/reservable yet.
Congrats to the people at Parallella though! I've been excitedly checking their site/twitter about every other day.
I'll believe it when I see it. These guys have been pushing the date back and back; I've all but written off my "contribution".
The fact that they're preselling the 16-core boards before we backers have even received ours, and that those come with storage (we contributors have to supply our own SD cards) at the same price point, leaves an unpleasant taste.
Convolving images on the fly would be one; the Zynq architecture allows for some pretty high-bandwidth throughput. Xilinx keeps pushing it as a solution for 'smart' cameras (things that know what they are looking at by doing analysis on the background).
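As a rough illustration (plain C, not Parallella- or Zynq-specific, and the function name is made up), the per-pixel work in an image convolution is the kind of thing that parcels out nicely across a grid of small cores, one tile per core:

    /* 3x3 convolution of a single interior pixel (caller keeps x and y at
       least one pixel away from the image border).  Each core can own a
       tile of the image and run this loop independently of the others. */
    float convolve3x3(const float *img, int width, int x, int y,
                      const float kernel[3][3])
    {
        float acc = 0.0f;
        for (int ky = -1; ky <= 1; ky++)
            for (int kx = -1; kx <= 1; kx++)
                acc += img[(y + ky) * width + (x + kx)] * kernel[ky + 1][kx + 1];
        return acc;
    }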
Technically you could just run a standard LAMP stack since it's running an ARM-compatible version of Ubuntu. It just won't take advantage of the multiple processors.
It will apparently ship with Erlang or be Erlang-capable, so it's entirely possible that you could put the high-performance Cowboy web framework on the device.
According to this[1], Erlang has been run on machines with a similar number of cores. This[2] looks like the most interesting work done with Erlang on Parallella so far. I haven't looked at Parallella for a long time, but IIRC the hardware architecture suits the Erlang process model very well.