
Might you by chance be familiar with Rodney Brooks' work on subsumption architectures [1]? If not, I would summarize the underlying idea (my words not his) as "don't try to jump too many layers of abstraction in one go" [2].

So I wonder to what extent you would consider this a predictable outcome of the classifier in question not being part of a subsumptive architecture, which at a guess would look like:

  - glance/texture responses, fed into
  - boundary-recognition layers, fed into
  - object persistence/tracking layers, fed into
  - abstract scene reasoning

It seems to me, as a non-vision researcher (I mainly worked in planning and control), that the most obvious counterargument to the image being a spotted cat is based on boundary/object/scene reasoning, and that it's "reasonable" for the texture/glance layer to say "looks a lot like a cat texture".
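To make the layering concrete, here is a minimal Python sketch of the four stages listed above, where the texture layer proposes a label and each later layer may object to it. Everything in it is an assumption for illustration: the feature names (`texture_label`, `has_object_boundary`, and so on), the `Hypothesis` type, and the penalty factors are hypothetical, and real layers would be learned models rather than boolean checks. It is also a simple feed-forward chain, not a faithful rendering of Brooks' suppression and inhibition wiring.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Dict, List, Optional


@dataclass(frozen=True)
class Hypothesis:
    label: str
    confidence: float
    vetoed_by: Optional[str] = None  # last layer that objected, if any


# Hypothetical precomputed features for one image (all keys are assumptions).
Features = Dict[str, Any]
Layer = Callable[[Features, Hypothesis], Hypothesis]


def _veto(hyp: Hypothesis, layer_name: str, penalty: float) -> Hypothesis:
    # Scale the confidence down and record which layer objected.
    return replace(hyp, confidence=hyp.confidence * penalty, vetoed_by=layer_name)


def texture_layer(f: Features, hyp: Hypothesis) -> Hypothesis:
    # Lowest layer: proposes a label from texture alone.
    return Hypothesis(label=str(f["texture_label"]), confidence=float(f["texture_score"]))


def boundary_layer(f: Features, hyp: Hypothesis) -> Hypothesis:
    # Objects if no plausible object outline was found.
    return hyp if f["has_object_boundary"] else _veto(hyp, "boundary", 0.1)


def tracking_layer(f: Features, hyp: Hypothesis) -> Hypothesis:
    # Objects if the candidate object does not persist across frames.
    return hyp if f["persists_across_frames"] else _veto(hyp, "tracking", 0.5)


def scene_layer(f: Features, hyp: Hypothesis) -> Hypothesis:
    # Objects if the label does not fit the rest of the scene.
    return hyp if f["fits_scene"] else _veto(hyp, "scene", 0.5)


LAYERS: List[Layer] = [texture_layer, boundary_layer, tracking_layer, scene_layer]


def classify(f: Features, layers: List[Layer] = LAYERS) -> Hypothesis:
    # Run the layers in order, each one seeing the hypothesis so far.
    hyp = Hypothesis(label="unknown", confidence=0.0)
    for layer in layers:
        hyp = layer(f, hyp)
    return hyp
```

With a spotted pattern that has a strong cat texture but no object boundary, the texture layer still says "cat", which is the "reasonable" part, while the boundary layer cuts the confidence and is recorded as the objector.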

[1] https://en.wikipedia.org/?title=Subsumption_architecture

[2] I realize this may seem, superficially, anathema to deep network research, which advocates letting the network find its own intermediate levels of abstraction. But it's actually compatible in my view because Brooks advocates (again, paraphrasing quite a bit) that the separate layers should have different objective functions, and that in fact the need for different objective functions (in a prioritized order) is the cause of emergent layering in nature. "First, don't die. Second, find shelter. Third, find food etc." So one can imagine deep networks each finding their own locally useful abstractions for each objective function in the "Maslow" chain, while still having some macro architecture that tracks human-imposed design principles.
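One way to read the prioritized chain of objectives in [2] is as a strict lexicographic ordering: a lower-priority objective only matters when all higher-priority ones are tied. The Python sketch below shows that reading only. The action names and scores are made up for illustration, and `choose_action` is a hypothetical helper, not anything from Brooks' work, which arbitrates between behaviors through suppression rather than a scoring function.

```python
from typing import Callable, Dict, Sequence, Tuple

Action = str
Objective = Callable[[Action], float]


def choose_action(candidates: Sequence[Action], objectives: Sequence[Objective]) -> Action:
    """Pick the action that is best under objectives taken in priority order."""

    def priority_key(action: Action) -> Tuple[float, ...]:
        # Tuples compare element by element, so earlier objectives dominate.
        return tuple(objective(action) for objective in objectives)

    return max(candidates, key=priority_key)


# Made-up scores in [0, 1] for three hypothetical actions.
SURVIVAL: Dict[Action, float] = {"cross_river": 0.2, "stay_in_cave": 1.0, "forage_nearby": 1.0}
SHELTER: Dict[Action, float] = {"cross_river": 0.0, "stay_in_cave": 1.0, "forage_nearby": 0.5}
FOOD: Dict[Action, float] = {"cross_river": 1.0, "stay_in_cave": 0.0, "forage_nearby": 0.8}

ACTIONS = list(SURVIVAL)

# "First, don't die. Second, find shelter. Third, find food."
PRIORITIES = [SURVIVAL.__getitem__, SHELTER.__getitem__, FOOD.__getitem__]
```

Here `choose_action(ACTIONS, PRIORITIES)` prefers staying in the cave: two actions tie on survival, and shelter breaks the tie before food is ever consulted. Reversing the priority list would pick the river crossing instead, which is the sense in which the ordering of objectives, and not just their content, shapes the behavior.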



The limit is that training cannot force abstraction. You can only reach abstraction if you have enough neuron space and the data set is big enough to avoid overfitting to textures.

The problem is that human vision doesn't work just by being fed a bitmap. We have structure to decode spatial relationships, shapes, and maybe even shadow/light relations. There's no way we're going to see a classifier working on raw color arrays match our vision capabilities.


Seems simple enough to feed a NN with that abstracted data.

However, the advantage to the texture approach is it's abstracted from a lot of other information. You don't want a classifier to say sofa, when it's a picture of a person on a sofa.


But then you're biasing it toward your perception:

http://www.bespokesofalondon.co.uk/assets/Uploads/bespoke-so...

Anyway, it does work perfectly if that's what you need, but most proponents are trying to use deep NNs to classify 'as good as humans do'.



