
"144 computers"=="144 core parallel cpu" ? I'm guessing this is not x86 and targeted at academic researchers looking to build custom, massively parallel, computational clusters on the cheap [ computational neuroscience? ]. If anyone could volunteer additional context or applications for this please do so, I'm not as familiar with hardware as I'd like to be.


It has 64 words of RAM and 64 words of ROM, and uses an 18-bit word for ALU operations and instructions.

I think it packs more than one MISC instruction per word; the count is four per 18-bit word (three 5-bit slots plus one 3-bit slot).

I cannot wrap my head around how to program that... not a beast, more like a field of tiny windmills. One of the designers of preceding chips once wrote about using it as a systolic engine, but the area of systolic algorithms is quite narrow, AFAIK.

I cannot find any C/Fortran compiler or compiler for any other high-level language.

My overall impression is that this looks like all bad ideas from Cell BE were ported to Forth language.

MISC: http://en.wikipedia.org/wiki/Minimal_instruction_set_compute... John Sokol on early GreenArrays-like designs: http://hardware.slashdot.org/comments.pl?sid=274687&cid=...


My initial thought was: high density, asynchronous cores -> brain modeling. You treat each asynchronous core as a neuron.

Reading up on Charles Moore this may indeed be the intended case: http://www.pcai.com/web/ai_info/pcai_forth.html

> Charles Moore created Forth in the 1960s and 1970s to give computers real-time control over astronomical equipment. A number of Forth's features (such as its interactive style) make it a useful language for AI programming, and devoted adherents have developed Forth-based expert systems and neural networks.

Still, 100 billion neurons in the brain / 144 cores × $20 per chip ≈ $13.9 billion. Also, I would guess most modern researchers in this area don't know Forth and are doing high-level programming, virtualizing neurons rather than taking a low-level hardware approach.
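The arithmetic behind that estimate, as a quick sanity check (all figures are the ones quoted in the thread):

```python
# Back-of-the-envelope cost of one core per neuron,
# using the figures from the comment above.
NEURONS = 100e9        # rough neuron count in a human brain
CORES_PER_CHIP = 144   # GA144
PRICE_PER_CHIP = 20    # USD, as quoted in the thread

chips_needed = NEURONS / CORES_PER_CHIP
total_cost = chips_needed * PRICE_PER_CHIP

print(f"{chips_needed:,.0f} chips, ~${total_cost / 1e9:.1f} billion")
```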


If you call them up and ask for a quote on 694,444,444 chips, I would kind of expect them to offer you a discount. But remember to ask, just in case.


Forth was used in AI in a pretty non-linear way.

I have a book in which the authors develop Lisp on top of Forth and then proceed to develop Prolog on the newly created Lisp. They then demonstrate how to use that Prolog to build a rule-based expert system.

There was a saying that Forth amplifies a programmer's ability both to develop programs and to make mistakes. If you need an AI tool but do not need your mistakes amplified, stay away from Forth. I think that applies to other areas of domain-specific development as well.

While I adore Forth, I cannot recommend it to anyone. Especially not for simulating a brain: what if you introduce an error, Forth amplifies it, and we get a hidden psychopath? ;)


What book is it? You can't just mention a book that has Forth, Lisp, AND Prolog and not give the title. :)

PAIP and a couple others have Lisp and Prolog, LOL has Lisp and Forth, HOPL 2 has all three (but separately), but it doesn't sound like any of those.


I think he was probably talking about "Designing and Programming Personal Expert Systems" by Carl Townsend, which dates back to 1986.

From http://www.faqs.org/faqs/computer-lang/forth-faq/part5/

  Contains LISP and Prolog emulations in Forth,
  including a unification algorithm. It also has some
  minimum distance classifier code.
  The application is fault diagnosis in locomotives.


Salute!


Yep, pretty much all simulation is currently being done with Phil Goodman's Neocortical Simulator (Matlab/C) and NEURON (C, with a recent Python API) on Blue Gene supercomputers. http://en.wikipedia.org/wiki/Blue_Brain_Project . So the mystery of who will use this chip continues.


Moore has been building chips like this for years. Someone must be buying them.


Perhaps the military? Small embedded neural nets could be highly useful for visual-recognition algorithms on missiles and drones [the majority of military planes now being built are unmanned]. However, I believe the military is now trying to move all drone code to a common operating system/language to increase code portability between platforms.


Don't forget that these chips are a couple of million times faster than a neuron firing at 200 Hz. If we ignore that the brain has more interconnect, you'd only need about $7,000 worth of chips.
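That $7,000 figure can be reproduced from the comment's own assumptions: time-multiplex many slow "neurons" onto each fast core, with the speedup factor being the comment's "couple of million" (the exact value of 2,000,000 below is my assumption):

```python
# Time-multiplexing estimate: divide the one-core-per-neuron chip
# count by the per-core speedup over a biological neuron.
SPEEDUP = 2_000_000                   # "a couple of million times faster"
CHIPS_ONE_PER_NEURON = 100e9 / 144    # chips needed at one core per neuron
PRICE_PER_CHIP = 20                   # USD, as quoted

chips = CHIPS_ONE_PER_NEURON / SPEEDUP
cost = chips * PRICE_PER_CHIP

print(f"~{chips:.0f} chips, ~${cost:,.0f}")
```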


It would also occupy a plane with an area of 12,500 m^2; according to Wolfram|Alpha, that's about 1.7 times the area of a FIFA-sanctioned international-match soccer field.
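The 12,500 m^2 figure can be reconstructed from GreenArrays' "eight computers per square millimeter" claim quoted elsewhere in this thread; note this counts bare die area only, not packaged chips or boards:

```python
# Die area for one core per neuron, assuming 8 cores/mm^2 (180 nm figure).
CORES_PER_MM2 = 8
DIE_AREA_MM2 = 144 / CORES_PER_MM2   # ~18 mm^2 per 144-core die
CHIPS = 100e9 / 144                  # one core per neuron

area_m2 = CHIPS * DIE_AREA_MM2 * 1e-6   # mm^2 -> m^2
print(f"~{area_m2:,.0f} m^2")
```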


Yeah; however, if someone were to buy $1 billion of them, they might be able to shrink it down to a 30-40 nm process, significantly reducing the footprint. To put it in perspective, think of the footprint of a decent cluster, with all the 1U blades spread out over a certain area. Definitely feasible, though I recognize your computation is die size, not total computer size.


Three five-bit instructions per word, followed by a single three-bit slot that can only hold a restricted subset of instructions (3×5 + 3 = 18 bits). Since each core has 64 words of RAM and 64 of ROM, you could theoretically pack 512 instructions into each core. In practice this figure will be a fair bit lower: jumps, for example, store their target address in the remaining slots of a word, so a jump can consume as many as four "theoretical instructions".
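For illustration, that slot layout can be sketched as a simple bit-field decode. This shows only the field widths; the real F18A's opcode encoding differs, so this is not the actual instruction decoder:

```python
# Hypothetical decode of an 18-bit word into four instruction slots:
# three 5-bit slots plus a final 3-bit slot (restricted opcode set).
def decode_word(word: int):
    assert 0 <= word < 2**18
    slot0 = (word >> 13) & 0x1F   # bits 17..13
    slot1 = (word >> 8) & 0x1F    # bits 12..8
    slot2 = (word >> 3) & 0x1F    # bits 7..3
    slot3 = word & 0x7            # bits 2..0, only 8 opcodes fit here
    return slot0, slot1, slot2, slot3

# Pack four opcode values into one word and decode them back.
word = (0b10101 << 13) | (0b00011 << 8) | (0b11100 << 3) | 0b101
print(decode_word(word))  # -> (21, 3, 28, 5)
```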

However, communicating in parallel between CPUs is very easy. I/O lines between CPUs have essentially a hardware semaphore that will cause reading CPUs to block until they get a write and writing CPUs to block until they get a corresponding read. By bit-indexing ports you also get pretty easy fanout.
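The blocking port behavior described above amounts to a CSP-style rendezvous: the writer blocks until a reader takes the word, and vice versa. A minimal Python sketch of that synchronization (the `Rendezvous` class is hypothetical, not GreenArrays code):

```python
import threading

class Rendezvous:
    """One-word channel: write blocks until read, read blocks until write."""
    def __init__(self):
        self._lock = threading.Lock()
        self._full = threading.Condition(self._lock)
        self._empty = threading.Condition(self._lock)
        self._value = None
        self._has_value = False

    def write(self, value):
        with self._lock:
            while self._has_value:      # wait until any previous word is taken
                self._empty.wait()
            self._value, self._has_value = value, True
            self._full.notify()
            while self._has_value:      # block until a reader consumes it
                self._empty.wait()

    def read(self):
        with self._lock:
            while not self._has_value:  # block until a writer arrives
                self._full.wait()
            value, self._has_value = self._value, False
            self._empty.notify_all()
            return value

port = Rendezvous()
results = []
t = threading.Thread(target=lambda: results.append(port.read()))
t.start()
port.write(42)   # returns only after the reader has taken the word
t.join()
print(results)   # [42]
```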

The docs also mention that CPUs can directly "push" instructions to one another without needing a bootstrap on the receiving end, which allows CPUs to act as extended memory for one another, eases debugging, and opens up tantalizing possibilities for self-modifying code.

You aren't going to get very far trying to execute a conventional language on this architecture, but color me interested.


> My overall impression is that this looks like all bad ideas from Cell BE were ported to Forth language.

What's so bad about the Cell? I know developers who express nothing short of love for the Cell processor.


Cell BE shipped without compiler support for parallelization. You could not feed your C/C++/Fortran program to a compiler and obtain a more or less parallel version of it. This complicates things: you had to parallelize your program manually.

The tool support for the Cell BE SPU was close to existent (no, I didn't mix the words up). It was of such low quality that you pretty much had to use Emacs with an assembler-highlighting mode to do any serious work on the SPU. The speed difference between gcc output and hand-written assembler circa 2007 was about 1.5-2x.

Both the PPU and SPU are in-order, so you have to avoid a minefield of random memory accesses. You had to write allocators and such by hand while respecting the SPU's constraints.

In-order architectures do not facilitate abstraction. You cannot simply recompile code written for an out-of-order x86 on the in-order Cell BE and obtain reasonable performance (say, about 80% of maximum). You have to optimize aggressively.

You cannot load much into the 256 KB of combined data and program memory of an SPU. Divide those 256 KB in two and you have about 128 KB of program memory (which you should divide again: one half for the running program, one for the program being loaded) and 128 KB, or 8K quadwords (16 bytes per quadword), of data memory. The data memory you should divide again: part of your data is constant, or you work on one part while another is being loaded. That leaves 4K quadwords. At two quadword operations per cycle, that is 2K cycles to process the whole data block. The latency of the Cell memory subsystem is very high, so those 2K cycles are comparable to the time needed to load that much data into the SPU. You have to be very, very careful to keep the SPU loaded and working.

Many new chips suffer from a lack of compiler support, especially for automatic parallelization. Cell BE surely did; so does GreenArrays. Cell BE suffered from a lack of memory on the SPU, its main parallel engine block; GreenArrays does as well. Cell BE used a simple-to-implement but hard-to-program in-order architecture in all its processing parts; GreenArrays uses a stack architecture, which is extremely hard to program.

So, in my eyes, GreenArray is Cell BE ported to Forth. ;)


My understanding is that this work is in conjunction with Alan Kay's Viewpoints Research (www.viewpointsresearch.org) and the language work they are doing.


Are you sure? How do you know this? Any connection between Alan Kay and Charles Moore -- or even between their teams -- would be worth hearing about.


This very powerful and versatile chip consists of an 18x8 array of architecturally identical, independent, complete F18A computers, or nodes, each of which operates asynchronously. Each computer is capable of performing a basic ALU instruction in approx. 1.5 nanoseconds for an energy cost on the order of 7 picojoules. Nothing else available today comes close to that winning combination. Twenty-two of the computers on the edges of the array have one or more I/O pins and one of several classes of circuitry associated with them, as illustrated below.

http://www.greenarrays.com/home/documents/greg/PB001-100503-... [pdf]
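Two more numbers fall out of the quoted figures (1.5 ns per ALU instruction, 7 pJ per instruction); the derivation below is mine, not GreenArrays':

```python
# Per-core throughput and power implied by the product brief's figures.
NS_PER_OP = 1.5e-9   # seconds per ALU instruction
PJ_PER_OP = 7e-12    # joules per instruction

ops_per_sec = 1 / NS_PER_OP                # ~667 M instructions/s per core
watts_per_core = ops_per_sec * PJ_PER_OP   # ~4.7 mW per busy core
chip_watts = watts_per_core * 144          # all 144 cores running flat out

print(f"{ops_per_sec / 1e6:.0f} MIPS/core, "
      f"{watts_per_core * 1e3:.1f} mW/core, "
      f"{chip_watts:.2f} W/chip")
```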

About the mentioned F18A computer:

The F18A is a stable, mature design for a computer and its I/O whose robustness has been proven in many chip configurations. It has been proven in 180nm geometry, and a prototype in 130 nm has also performed well. The computer is small; eight fit in roughly a square millimeter. Depending on chip configurations, this yields between 100,000 and 200,000 computers per 8 inch wafer, contributing to the low cost of our chips.

http://greenarraychips.com/home/documents/greg/PB003-100822-... [pdf]

The named PDF files also contain some information about possible applications. But it seems quite tough to find much outside information about the chip.


> I'm guessing this is not x86

Nothing really interesting is.


I am wondering whether this could be used for real-time ray tracing or even real-time photon mapping. I always had the impression that for highly parallel algorithms, an enormous number of very primitive processors has the best MIPS-per-transistor ratio. I know, though, that the bottleneck is usually bus communication / memory access.



