Apr 9, 2009

NOTE 1

1. One interesting thing I learned from Michael Paradiso's talk was about some experiments (--Need link--) showing that the regions of the brain involved in early stages of processing continue to be active even in the later stages. This suggests that a pipeline approach to modeling the visual system may not be useful.

2. The data-driven mentality: massive amounts of data coupled with simple algorithms hold more promise than complex algorithms trained on small amounts of data.

3. A problem with categorization is its inability to deal with novel categories -- something humans must cope with from a very young age. We (humans) can often take arbitrary input and, using analogies, still get a grip on the world around us (even when it is full of novel categories). One hypothesis is that at the level of visual perception things are not recognized as members of discrete object classes but placed in a continuous recognition space. Thus, instead of asking "What is this?" we focus on similarity measurements and ask "What is this like?". Such a comparison-based view would help us cope with novel concepts.
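A minimal sketch of this comparison-based view: rather than forcing a novel input into one of a fixed set of classes, rank stored exemplars by distance in a continuous feature space. The features, exemplars, and the `what_is_this_like` helper below are all illustrative inventions, not from any specific system.

```python
import numpy as np

# Hypothetical exemplar "memory": feature vectors for known concepts.
# The 3-dimensional features here are made up for illustration.
memory = {
    "horse": np.array([0.9, 0.1, 0.8]),
    "zebra": np.array([0.85, 0.15, 0.75]),
    "car":   np.array([0.1, 0.9, 0.2]),
}

def what_is_this_like(query, memory, k=2):
    """Rank stored exemplars by distance instead of making a hard
    class decision -- 'What is this like?' rather than 'What is this?'."""
    dists = {name: np.linalg.norm(query - feat) for name, feat in memory.items()}
    return sorted(dists, key=dists.get)[:k]

# A novel animal (say, an okapi) still gets a useful answer by analogy.
novel = np.array([0.88, 0.12, 0.7])
print(what_is_this_like(novel, memory))  # → ['zebra', 'horse']
```

The point is that the answer degrades gracefully: a category never seen before still lands near its closest analogies instead of being forced into a wrong discrete label.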

4. C_INFERENCE: an inference package for undirected graphical models by Talya Meltzer.
I tried it; on both Windows and Linux it has inexplicable problems that cause MATLAB to crash, and it also uses too much memory.
It implements Loopy Belief Propagation, Generalized Belief Propagation, the Mean-Field approximation, four Monte Carlo sampling methods (Metropolis, Gibbs, Wolff, Swendsen-Wang), and simulated annealing.
I don't know how it compares with MRF, but it can be used for CRFs.
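To make the list of samplers concrete, here is a sketch of the simplest one, Gibbs sampling, on a toy 2D Ising-style MRF: each spin is resampled from its conditional distribution given its four neighbors. All parameters (grid size, coupling `beta`, sweep count) are illustrative choices, not taken from the C_INFERENCE package.

```python
import numpy as np

def gibbs_ising(h, w, beta=0.8, sweeps=50, rng=None):
    """Gibbs sampling on an h-by-w Ising MRF with coupling beta.
    P(s_ij = +1 | neighbors) = 1 / (1 + exp(-2 * beta * sum_of_neighbors))."""
    if rng is None:
        rng = np.random.default_rng(0)
    s = rng.choice([-1, 1], size=(h, w))  # random initial spin configuration
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                # Sum of neighboring spins (free boundary conditions).
                nb = sum(s[x, y] for x, y in
                         [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
                         if 0 <= x < h and 0 <= y < w)
                p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
                s[i, j] = 1 if rng.random() < p_up else -1
    return s

sample = gibbs_ising(8, 8)
```

With a positive coupling, repeated sweeps drive neighboring spins toward agreement, which is the qualitative behavior these samplers are used to study.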

5. "Instead of a 3D parse being the initial step, the pattern recognition of the contours and the access to semantic relations appear to be the primary stages" as well as "further evidence that an object's semantic relations to other objects are processed simultaneously with its own identification."

6. Focus on simplicity. Consider a particular recognition task, namely car recognition. Without any training data we are back in the 1960s/1970s era, where we have to hard-code rules about what it means to be a car in order for an algorithm to work on a novel image. With a small amount of labeled training data, we can learn the parameters of a general parts-based car detector -- we can even learn the appearance of the parts themselves. But what can we do with millions of images of cars? Do we even need much more than a large-scale nearest-neighbor lookup?
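The "large-scale nearest-neighbor lookup" really is this simple: copy the label of the closest stored descriptor. The descriptors below are random stand-ins (for, say, GIST features), and the database size, labels, and `label_by_lookup` helper are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 16                        # stand-in for millions of images
database = rng.normal(size=(n, d))       # hypothetical image descriptors
labels = rng.choice(["car", "not_car"], size=n)

def label_by_lookup(query, database, labels):
    """Classify by copying the label of the single nearest neighbor --
    a trivial algorithm whose accuracy improves simply by adding data."""
    i = np.argmin(np.linalg.norm(database - query, axis=1))
    return labels[i]

# A query very close to a stored image recovers that image's label.
q = database[42] + 0.001 * rng.normal(size=d)
print(label_by_lookup(q, database, labels))  # prints the label stored for row 42
```

Brute-force search like this stops scaling at some point, but the design choice it illustrates is the one in the note: put the complexity in the data, not in the algorithm.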

As Rodney Brooks once said, "The world is its own best representation," and perhaps we should follow Google's mentality and simply equip our ideas with more, more, more training data.

7. Aude: What is the interplay between scene-level context, local category-specific features, as well as category-independent saliency that makes us explore images in a certain way when looking for objects?

David: Is naming all the objects depicted in an image the best way to understand an image? Don't we really want some type of understanding that will allow us to reason about never before seen objects?

Ce: Can we understand the space of all images by cleverly interpolating between what we are currently perceiving and what we have seen in the past?
