Computer vision vision (not a typo)

In my original post, I asked how a robot, when presented with its own image, covered in spaghetti, would be able to recognize itself. In other words: how can a robot robustly and reliably recognize itself, based on images? Simple enough, right?

Well, how would a robot ‘see’ to begin with? What’s the ‘vision’ of computer vision? Gabuglio wondered last time, in this comment, whether the PR2 could be like Rosey from the Jetsons. Unfortunately: no. Or at least, not at this point. Using only computer vision, however, it could do some other jobs.

Right now, our robot could stand in a factory, matching label colors against the desired color for a paint job. A factory is a highly controlled environment, so you might get away with just using thresholding. For red paint, for instance, if your image is made up of levels of red, green and blue: check whether there’s a uniform patch in the image that’s more than 90% red, but less than 10% blue or green. He could do something more advanced as a factory worker, and be a bottle level inspector. For that he would probably use an edge detector, like the ones in Photoshop or any other image editing program. These are some of the simpler operations. Generally speaking, they’re very easy to understand and use. Like the circuit laws, or the ideal gas law…
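
To make that concrete, here is a minimal sketch in Python of both factory jobs, assuming OpenCV and numpy are available. The function names are my own inventions, and the thresholds are just the 90%/10% numbers from the paragraph above:

    # Toy versions of the two factory jobs described above.
    import cv2
    import numpy as np

    def looks_red(patch):
        """Check whether a patch is 'red': lots of red, little green or blue."""
        # OpenCV stores channels in BGR order.
        b, g, r = [patch[:, :, c].astype(float) / 255.0 for c in range(3)]
        return r.mean() > 0.9 and g.mean() < 0.1 and b.mean() < 0.1

    def bottle_fill_line(image):
        """Find the strongest horizontal edge: a crude liquid-level detector."""
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)        # classic edge detector
        return int(edges.sum(axis=1).argmax())  # row with the most edge pixels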

If our robot went to school a little bit longer, it might be working for the TSA, where it would be in high demand right now. As you may know, they use so-called full-body scanners over there. That used to mean someone 30 feet away was literally, or at least figuratively, looking under your clothes, ‘checking you’. Some people were offended… Our robot could do a more acceptable job, and these days, robots actually do.

Backscatter X-ray image released by the TSA in 2007, Wikimedia.org

Generic view produced by millimeter wave scanners, Chicago Tribune

Obviously there’s a lot more involved here. A lot of it, though, has to do with image segmentation: partitioning the image into more meaningful, analyzable regions. Once that’s taken care of, a computer gets rid of the areas that are definitely not of concern. What remains is marked, to be inspected by a human. It could do more meaningful things too, like finding tumors in fMRIs (computer vision right now has a lot of applications in medical imaging). A small sketch of the idea follows the figures below.

Multiple steps in a more advanced segmentation algorithm, Chen et al.

Segmentation used to automatically mark tumors on fMRIs, C. Yu.
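
To give a flavour of what ‘partitioning into regions’ can look like in code, here is a bare-bones sketch (my own toy example, not the algorithm from the figures above): threshold the image, label the connected regions, and throw away anything too small to matter. Whatever is left would be flagged for a human:

    # Segment-then-filter, in miniature (the area threshold is an arbitrary pick).
    import cv2

    def suspicious_regions(gray, min_area=200):
        """Return the labels of regions in a grayscale image worth a closer look."""
        # Otsu's method picks a global threshold automatically.
        _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        # Label the connected regions in the binary mask.
        n, labels, stats, _ = cv2.connectedComponentsWithStats(mask)
        # Keep only regions large enough to be of concern (label 0 is background).
        return [i for i in range(1, n) if stats[i, cv2.CC_STAT_AREA] >= min_area]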

How would the PR2’s computer know what to mark, and what not to? To make the problem clearer, I’ll give our robot yet another job: checking my fingerprints at border control (this is the part I hate, but anyhow…). Suppose I were a criminal: how would you compare my fingerprints against the millions of fingerprints of known criminals? This is where you need feature extraction: some way to extract a small amount of information from the image that still represents its content, and can be compared with similar information. Sometimes that works with something like the aforementioned methods, sometimes it takes something more advanced, like in the fingerprints below. And sometimes the features still have a clear meaning to us, while sometimes they simply don’t… The measurements in this photo of a fingerprint, for instance, make total sense:

Feature extraction for fingerprints: core point detection by intersection of ridge normals, Rajanna et al.

At least when compared to the features found in these faces:

The bottom row consists of Gabor features, which were searched for around the areas marked by dots (at their center) in the images in the top row, Susskind et al.
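
For the curious: Gabor features boil down to filtering the image with oriented, wave-like kernels and recording how strongly each one responds. Here is a minimal sketch, assuming OpenCV; the parameter values are arbitrary picks of mine, not the ones from the paper:

    # Describe a grayscale patch by its responses to a tiny Gabor filter bank.
    import cv2
    import numpy as np

    def gabor_features(patch):
        """Turn a patch into a vector of mean responses to 8 oriented filters."""
        features = []
        for k in range(8):
            theta = k * np.pi / 8  # orientation of this filter
            kernel = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                        lambd=10.0, gamma=0.5, psi=0)
            response = cv2.filter2D(patch.astype(np.float32), -1, kernel)
            features.append(np.abs(response).mean())
        return np.array(features)

    # Two patches can then be compared by the distance between their vectors:
    # np.linalg.norm(gabor_features(a) - gabor_features(b))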

These are the kinds of tools a computer can use to transform images into something it can make sense of. But it still doesn’t explain how he’d be able to recognize himself. Or spaghetti, for that matter… I’ll explain how it can, in my next post.

6 thoughts on “Computer vision vision (not a typo)”

  1. deckersbram says:

    Very interesting post!

  2. tijlcrauwels says:

    Interesting, last week I saw something on TV about the ‘Human Brain’ project, where they try to recreate the brain. The guy talking about this also mentioned something very interesting: he said there might be a point where artificial intelligence is so smart that it discovers all the future possibilities and innovations at once.

    • jefhimself says:

      Thanks for that reference. Looks like it might be an interesting project to look into. I especially like that the collaborators come from respectable, sometimes very respectable, universities, and that there’s transatlantic collaboration as well.

      That point you describe sounds like the so-called “technological singularity”. As far as I’m aware, the few notable ‘singularity’ advocates also have direct commercial interests in promoting it. Like televangelists… Most notably Kurzweil. For scientific purposes, I think the pending arrival of that point is hogwash. I do feel conflicted about whether or not such ideas should be spread. Short term, it can spread enthusiasm, and thus (I guess) might bring in funding for e.g. specific fields within AI. Maybe stuff like that is part of regular marketing. Long term, however, it harms your credibility, of which Kurzweil is an example: he might be credible in popular culture, but I really don’t think he’s credible in the scientific communities he oracles about.

      Anyhow (pardon my rant…). My next post will be about a branch of AI called machine learning. Not as sexy as the singularity, but I’ll try to keep it interesting anyway. 😉

  3. gabuglio says:

    What I am wondering: isn’t the research that you are doing a little bit outdated? I mean, wouldn’t it be more meaningful to search for new developments instead of going deeper into this one?
    Despite my question, it is an interesting post and well documented!

    • jefhimself says:

      Thanks for the compliments, first of all. I don’t understand your question though; can you be more specific? What part of what I’m researching for my thesis do you mean, and what do you mean by “this one”?
