The site ran fine for me without JavaScript, once I deleted the semi-transparent div telling me that the site needs JavaScript. (Apparently JavaScript is only needed to hide that div automatically.)
Seriously, I sometimes wonder if there would be a market for a lightweight text-display app: the server would serve some kind of Markdown text, perhaps with a few hints like headlines, and the client would render it based on the local screen settings and the structural hints in the Markdown. That would be great for short articles.
That sounds pretty useful. Arguably, that's roughly what text-based browsers such as lynx and links already do, just in another way. They're still what I use when I want to read an article without distractions, ads, and funny layouts and images.
It's taken quite a while, but I'm looking forward to getting the board. I hope there are decently mature Go bindings, because a goroutine coprocessor would be incredible.
If not, I've been wanting to take a stab at OpenCL.
Both MPI and OpenMP are fairly mature and easy to use with C or C++ (I haven't used them with FORTRAN, but I believe you can, with a few slight alterations).
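For what it's worth, here's the sort of thing I mean — a minimal OpenMP sketch in C (just an illustrative example, nothing to do with the Parallella toolchain specifically):

    #include <stdio.h>
    #include <omp.h>

    int main(void)
    {
        double sum = 0.0;

        /* Split the loop iterations across all available threads and
           combine the per-thread partial sums at the end. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 1; i <= 1000000; i++)
            sum += 1.0 / i;

        printf("harmonic sum: %f (max threads: %d)\n",
               sum, omp_get_max_threads());
        return 0;
    }

Compile with something like gcc -fopenmp and the pragma does the thread management for you, which is a big part of why OpenMP feels so easy to pick up.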
OpenCL / CUDA is more mature and ready... and a far better-understood architecture. I bet that any GPU will crush this thing in pure compute power.
Where Parallella's advantage comes in is its grid architecture. Whereas a typical GPU today is "very, very wide" (i.e., an AMD 7750 does 512 operations at the same time), the Epiphany is a "grid", and each node of the grid can be doing different things.
I'm going to greatly simplify things down for you with this example (so experts out there... don't shoot me :-p). A GPU can execute 1 program, but that one program can do 512 operations at once.
On the other hand, the Epiphany-IV truly can run 64 different programs at once... all taking different paths and doing their own things.
Epiphany-III and Epiphany-IV perform "if statements" more or less how you'd expect a computer to. Which is the important bit... if statements don't really slow down your program.
In contrast, on the typical wide "wavefront GPU architecture"... the "if statement" basically halves the speed of your GPU. The GPU has to execute one "half" of the branch, and then later it comes back to execute the other "half". (It's only "really" executing one program, but doing 512 of them at once. See what I mean?)
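To make that concrete, here's an illustrative OpenCL-style kernel (the name and the operations are made up, it's just a sketch):

    /* Work-items in the same wavefront run this in lock-step.  When the
       data-dependent condition below differs between neighbouring items,
       the hardware executes BOTH branches one after the other, masking off
       the lanes that didn't take that path -- so a 50/50 branch roughly
       halves throughput.  An Epiphany core, being an ordinary sequential
       CPU, simply jumps over the branch it doesn't take. */
    __kernel void threshold_filter(__global const float *data,
                                   __global float *result,
                                   const float threshold)
    {
        size_t i = get_global_id(0);
        if (data[i] > threshold)
            result[i] = data[i] * 2.0f;   /* "path A" */
        else
            result[i] = 0.0f;             /* "path B" */
    }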
GPUs are a "wide" architecture... but Epiphany is the first "grid-based" supercomputer for ~$99. GPUs are very mature technologies... Epiphany is still in initial research (as far as the consumer market is concerned)
EDIT: In reality, a modern GPU supports maybe 4 to 8 different wavefronts (8 for the more expensive GPUs, maybe only 4 for the cheaper ~$99 GPUs). But each of those wavefronts can do hundreds of computations at once.
>I'm going to greatly simplify things down for you with this example (so experts out there... don't shoot me :-p). A GPU can execute 1 program, but that one program can do 512 operations at once.
CUDA & OpenCL programs can essentially do the same thing, since there's no penalty for thread divergence between thread blocks. So, as long as you have enough lock-step behavior in your smaller work items, the larger groupings can do entirely different operations. Or, maybe said differently, you can roughly think of a GPU as a collection of SIMD units (in NVIDIA-jargon, these are SMs), all of which are allowed to run whatever vector code they want.
In reality, the top-end AMD 7970 has ~32 compute units, while the ~$130 AMD 7770 has ~10 compute units. Each compute unit has 4 SIMD units, each SIMD unit executes 16 single-precision floating-point operations per clock, from a pool of 16 "hyperthread-like" work-items that are queued up. (Total of 64 FLOPs per clock per compute unit.)
But... that is a bit difficult to say, now isn't it :-p
In comparison, Parallella's $99 unit has 16 discrete compute units, and their (unmade) higher-end offering will have 64 compute units. Both of which will operate within 2W of power.
It's a difference of scale. GPUs focus more on SIMD instructions, because graphics are innately matrix-based operations that translate very easily into SIMD. But if you are going to be doing unique operations per core (i.e., lots and lots of branching on a single work-item), then Parallella's approach would give better performance.
So... yes... I over-simplified earlier, and got a few numbers wrong. But the concept is the same. Epiphany has no SIMD units at all, and each core instead focuses entirely on single-threaded performance (at best, 2 floating-point operations per clock with the superscalar architecture). This makes it slower in the strictest sense... but "more agile", and able to handle branches and conditionals better than the GPUs.
Also, remember that Epiphany works with only 2W of power... while a GPU like the AMD 7970 works with something like 300W of power. So of course, the 7970 is going to do more (especially with faster GDDR5 RAM and a PCIe x16 connection feeding it). So it isn't really a fair comparison there either.
Nonetheless, Epiphany does see a few situations where it might perform better than a classic CPU and better than a classic GPU. It's just a different software architecture, focusing on a different problem niche.
> Nonetheless, Epiphany does see a few situations where it might perform better than a classic CPU and better than a classic GPU. It's just a different software architecture, focusing on a different problem niche.
By my understanding, any situation where you need the computing power combined with a requirement for very low energy input and waste-heat output.
I can't think of any specific examples (perhaps an embedded system that needs to perform a pile of cryptographic operations?), but its ability to do a significant amount of processing within a 2W power footprint puts it in a different class to current desktop CPUs and GPUs. Even if this $99 unit only does 1/8 of what a $99 GPU can do (caveat: I pulled that "1/8" figure from my arse), it'll be doing it on 1/60 of the power (based on GPU reviews where people have tried to compare power draw at GPU idle to power draw with the GPU at 100% and everything else as idle as possible, which indicate modern GPUs pull between 120 and 150W; http://www.guru3d.com/articles_pages/radeon_hd_6850_6870_rev... is the first such analysis Google found). That potential computation-per-watt (or computation-per-energy-$, if cost matters more than the energy supply and dissipation problem) of units like this could be very useful. Of course, where power input and heat dissipation are not massive concerns, current GPUs still win on computation-per-device and computation-per-hardware-$.
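To make that back-of-the-envelope explicit (using the admittedly made-up 1/8 figure above and the ~120W load draw from those reviews): 1/8 of the throughput at 2W versus 120W works out to (1/8) / (2/120) = 7.5x the computation-per-watt in the Epiphany's favour.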
I think the interesting comparison will be between something like Tegra 5 / Exynos 5 and this. Most scientific computing these days is CUDA, and getting something like cuBLAS onto this chip will be a struggle. The Exynos/Android combo has already lowered its ability to interest me by not even accepting OpenCL as a first-class citizen.
EDIT: Hopefully NVIDIA will give me CUDAndroid, so I can have some fun.
Just guessing, but maybe all the cases where you need parallel computing AND low power: computer vision, robotics (drones, self-driving cars, etc.), automation, and so on?
Using proper terms, GPUs are SIMT (Single Instruction, Multiple Threads), while the latest Epiphany is MIMD (Multiple Instruction, Multiple Data streams).
Of course, both can probably execute regular SIMD instructions sequentially.
AMD GPUs can, likewise Xeon Phi; NVIDIA GPUs are scalar; not sure about Epiphany.
Epiphany doesn't have any SIMD as far as I can tell. It's honestly 16 (or 64) different CPUs working in a grid.
Here's the instruction set for each individual CPU. Each one is small, and weak... but the combination of CPUs makes the architecture very different (and interesting) when compared against the typical SIMT / SIMD GPUs we have today.
http://www.adapteva.com/wp-content/uploads/2012/10/epiphany_...
It's really just 64 (tiny) cores hooked up together. Of course, it can do SIMD if you want... (you just hook up all the cores to run the same program). But in doing so, it will lose out in comparison to dedicated SIMD engines, like a GPU.
I would hazard a guess that the Epiphany cores are more heavyweight than the GA cores. The GA cores are extremely lightweight, to the point where it is expected that even the simplest tasks will involve several of them cooperating. On the other hand, the GA chips are optimized for low power and have significantly more tightly integrated I/O than the Epiphany.
The GA cores are meant to be linked together. According to the docs and videos, you use the special IDE to define blocks or grids of GA cores and have them working in tandem.
The GA's biggest weakness is probably the lack of software and libraries. On the other hand, you could essentially create software-configurable DSPs, or program the entire array to act as a single, traditional CPU core.
Granted, you would have to use this strange dialect of Forth called colorForth. Getting a kit together that you can hack on requires more than $99. On the other hand, that array runs clockless and can mimic analog signals.
I think Parallella is easily the most accessible supercomputer, and well suited to hobbyists. I expect a lot of interesting things to come out of it, the way Arduino opened up a lot of device hacking for people. I had passed on the Kickstarter, and it wasn't until I got interested in deep belief networks that I kicked myself for not jumping in.
The Parallella project sounds interesting, but the parityportal is shit.
The site shows a big banner, "Please enable javascript to view this site.", that overlays the content, but disabling CSS also helps - another example of JS failure.
I'm just posting to say that I peeled off that ridiculous JavaScript-required page blocker with the "Element Hiding Helper" add-on for Adblock Plus. I dunno what I was missing without JavaScript, but the site seemed to work pretty well without it.
> The site shows a big banner, "Please enable javascript to view this site.", that overlays the content, but disabling CSS also helps - another example of JS failure.
Simple workaround (in most browsers): right-click the offending div -> Inspect Element -> press the Delete key.
To correct my earlier mistake... the board consists of an ARM processor (like a phone's) and an array of smaller processors. Linux and Python will run on the ARM, but not on the smaller processors.
Python's multiprocessing module works by running a separate copy of Python on each core and then managing data transfer. So it won't help you here, because you would need Python to be running on each of the smaller processors in the array.
Instead you need to target the array processors in a dedicated language. People are mentioning OpenCL (which is C-like, but has a very strong emphasis on all processors doing the same task); the Wikipedia page describes a GCC-based compiler.
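For anyone who hasn't seen OpenCL before, the "same task on all processors" style looks roughly like this (a minimal, hypothetical kernel, not Parallella-specific):

    /* One copy of this runs per work-item; each work-item handles one
       element of the arrays.  The parallelism comes from the host
       launching many work-items, all executing this same function. */
    __kernel void scale(__global const float *in,
                        __global float *out,
                        const float factor)
    {
        size_t i = get_global_id(0);
        out[i] = in[i] * factor;
    }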
At a stretch, perhaps you could use the GCC-based C compiler to compile Python and get multiprocessing working that way. But I imagine it would be a lot of work and an inefficient way to use the system (the small cores are not very powerful, so you need to keep overhead down, so Python is a bad idea).
If someone can get Erlang working across the array, then that might be your best bet. Erlang is a little bit like Python plus multiprocessing (not terribly similar, but close enough for many things to make sense).
There are a number of Linux distributions for ARM; the Raspberry Pi project alone has several. Python, Ruby, Perl, Node.js, and a number of other tools are fully supported.
The multiprocessing side, though, is probably not a standard CPU, meaning you can't just throw Python code at it. You'll have to use CUDA- or OpenCL-style techniques.
If you want a small ARM system, the Pi is a good place to start but the Beagle Board (http://beagleboard.org/) is a much better deal.
The article is a bit confusing, since, as far as I can tell, the "first model to be shipped" only has 16 (+2 ARM) cores. The 64 (+2) core board is not shipping/reservable yet.
Congrats to the people at Parallella though! I've been excitedly checking their site/twitter about every other day.
I'll believe it when I see it. These guys have been pushing the date back and back; I've all but written off my "contribution".
The fact that they're preselling the 16-core boards before we backers have even received ours, and that those come with storage (we contributors have to supply our own SD cards) at the same price point, leaves an unpleasant taste.
Convolving images on the fly would be one; the Zynq architecture allows for some pretty high-bandwidth throughput. Xilinx keeps pushing it as a solution for 'smart' cameras (things that know what they are looking at by doing analysis on the background).
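As a rough illustration (plain C, not Parallella- or Zynq-specific, and the function name is made up), the per-pixel work in an image convolution is the kind of thing that parcels out nicely across a grid of small cores, one tile per core:

    /* 3x3 convolution of a single interior pixel (caller keeps x and y at
       least one pixel away from the image border).  Each core can own a
       tile of the image and run this loop independently of the others. */
    float convolve3x3(const float *img, int width, int x, int y,
                      const float kernel[3][3])
    {
        float acc = 0.0f;
        for (int ky = -1; ky <= 1; ky++)
            for (int kx = -1; kx <= 1; kx++)
                acc += img[(y + ky) * width + (x + kx)] * kernel[ky + 1][kx + 1];
        return acc;
    }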
Technically you could just run a standard LAMP stack since it's running an ARM-compatible version of Ubuntu. It just won't take advantage of the multiple processors.
It will apparently ship with Erlang or be Erlang-capable, so it's entirely possible that you could put the high-performance Cowboy web framework on the device.
According to this[1], Erlang has been run on machines with a similar number of cores. This[2] looks like the most interesting work done with Erlang on Parallella so far. I haven't looked at Parallella for a long time, but IIRC the hardware architecture suits the Erlang process model very well.