Category Archives: Computer vision Jef

Intermezzo: Robot Operating System

In my original post, I referenced the simplicity of the PR2, the robot I’m working with. Since robotics is one of the main technologies used in my thesis, I decided to expand on it by explaining how the PR2 works in a bit more detail.

I sketched a robot I made myself, in its simplest representation, as follows:

Conceptual sketch of a single computer ROS robot for computer vision purposes, own work.

Central in this image is a computer with an operating system. Thinking of ‘a computer’ and ‘an operating system’ is a good starting point if you would like to imagine what kind of robot you could make. According to the Encyclopaedia Britannica, a robot is “any automatically operated machine that replaces human effort, though it may not resemble human beings in appearance or perform functions in a human-like manner”. What would be the purpose of your machine, where would it operate, and what computers would be around by default (or are most likely to be around) in that setting? These days, it’s quite likely you’d come up with smartphones, tablets etc., and with stricter embedded systems (like the electronics in a car).

Below the computer, I grouped the input and output devices (I/O). Some devices can connect as is, or may be part of the computer; others might require some electronics to interface with. With just that, a spare half hour and some duct tape, you could make this $500 telepresence robot:

In robotics research, this kind of setup has some problems. Consider for instance an autonomously driving car (recurring example, but it’s a good one to think about…). You could kinda make it from scratch like the robot above, by taking the car, some computer, and just writing software to make the two work together. Considering the complexity of cars, however, and the complex tasks we would want the autonomous car to perform, this software would end up being very complex and would require many specialists in a large number of engineering disciplines. Plus, at the end of it, such custom software might be very hard to transfer even to another version of the same basic model of that car.

To me, this is where the Robot Operating System (ROS) comes in. To be clear, ROS is actually a meta-operating system. That means it is conceptually similar to an OS, but in practice it is not an OS at all (personally, I wish one of the creators’ wives or boyfriends or something would’ve told them that naming it an OS is simply a stupid idea). Most people just call it a robotics framework… There are many other robotics frameworks available, but ROS specifically is designed for large-scale, complex systems, in a way that also makes them easy to expand and to transfer.

One of the nicer characteristics is that it follows a peer-to-peer philosophy. The single computer in the first figure can just as well be 10 computers, all running the ROS framework and communicating over a network. The ROS framework ties it all together, and these ten computers support a single virtual entity, which is what you would call ‘the robot’. This way, the ‘computer’ in a robot with ROS is usually not a single computer, but rather looks like this:

A typical ROS network configuration, Quigley et al.

As this image suggests, in the end it’s all about the network setup. 🙂 As we all know, adding a computer to a network is as simple as clicking the name of a Wi-Fi network, or plugging in an Ethernet cable. This remains true in this setup. In practice, you can just as easily add another computer to such a network, and that computer can be anything from a single machine that extends the robot virtually, to an entirely new robot consisting of 100 computers. This is also how I would probably solve a problem that was asked about in the comments for that telepresence robot: what if you’re on the phone with your girlfriend, thus following her with the robot, and she goes up the stairs? Well, just put another one of those cheap robots (meaning the hardware) up there. Put Skype on one specific computer, and just switch between the webcams of the physical robots by transferring the video using network streaming. The virtual single entity that, in the end, is the robot, is not impeded by the physical limitations and can switch between the hardware at any time. Nor is there really a limit on what kind of hardware you’d be using.
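To make the webcam-switching idea a bit more concrete, here is a minimal sketch in plain Python. It is a toy, not the ROS API: `ToyBus`, `CameraSwitch` and the topic names are all made up for illustration, and everything runs inside one process instead of over a network. What it does borrow from ROS is the general idea that parts of the robot exchange messages on named topics, so the receiver doesn’t need to know which physical machine a message came from.

```python
from collections import defaultdict


class ToyBus:
    """In-process stand-in for a network of computers exchanging named messages."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        for callback in self._subscribers[topic]:
            callback(message)


class CameraSwitch:
    """Forwards frames from whichever camera topic is currently selected."""

    def __init__(self, bus, camera_topics, output_topic):
        self._bus = bus
        self._output_topic = output_topic
        self.active = camera_topics[0]
        for topic in camera_topics:
            # Bind the topic name now so each callback knows its own source.
            bus.subscribe(topic, lambda frame, source=topic: self._forward(source, frame))

    def _forward(self, source, frame):
        if source == self.active:
            self._bus.publish(self._output_topic, frame)


bus = ToyBus()
received = []  # stands in for the computer running the video call
bus.subscribe("video_call", received.append)
switch = CameraSwitch(bus, ["camera/downstairs", "camera/upstairs"], "video_call")

bus.publish("camera/downstairs", "frame A")  # forwarded
bus.publish("camera/upstairs", "frame B")    # ignored, not the active camera
switch.active = "camera/upstairs"            # she walked up the stairs
bus.publish("camera/upstairs", "frame C")    # forwarded
```

The video call only ever listens to one topic; which physical robot feeds it is decided somewhere else, and can change at any time.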

Robotnik Automation Modular Arm

AscTec Quadrotor

These two images, and dozens more examples of robots you can use with ROS, are on the ROS website.

I’ll leave you with one final image. Without knowing what these discs are doing, you can’t tell whether they are controlled as completely individual robots, all at the same time as one robot, or as individual robots working together as an actual team (it looks like that last option is what they’re actually doing). At this point, I hope that makes sense, and that you see the possibilities as well. 🙂

A team of iRobot Create robots at the Human-Automation Systems Lab, Georgia Institute of Technology

Computer vision vision (not a typo)

In my original post, I asked how a robot would be able to recognize itself, when presented with its image, covered in spaghetti. In other words: how can a robot robustly and reliably recognize itself, based on images? Simple enough, right?

Well, how would a robot ‘see’ to begin with? What’s the ‘vision’ of computer vision? Gabuglio wondered last time, in this comment, whether the PR2 could be like Rosey from the Jetsons. Unfortunately: no. Or at least, not at this point. Using only computer vision however, it could do some other jobs.

Right now, our robot could stand in a factory, matching label colors against the desired color for a paint job. A factory is a highly controlled environment, so you might get away with just using thresholding. For red paint for instance, if your image is made up of levels of red, green and blue: check to see if there’s a uniform patch in the image that’s more than 90% red, but less than 10% blue or green. He could do something more advanced as a factory worker, and be a bottle level inspector. He would probably use an edge detector for this, like you could in Photoshop or any other image editing program. These are some of the simpler operations. Generally speaking, they’re very easy to understand and use. Like the circuit laws, or the ideal gas law…
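The red-paint check can be sketched in a few lines of plain Python. This is only an illustration under some assumptions of mine: the image is a list of rows of (red, green, blue) tuples with values from 0 to 255, ‘uniform patch’ is read as a small square window in which every pixel passes the test, and the function names and window size are made up.

```python
def is_red(pixel, hi=0.9, lo=0.1):
    """True when a pixel is more than 90% red and less than 10% green and blue."""
    r, g, b = pixel
    return r > hi * 255 and g < lo * 255 and b < lo * 255


def has_uniform_red_patch(image, size=3):
    """Return True if some size x size window consists only of 'red' pixels."""
    rows, cols = len(image), len(image[0])
    for top in range(rows - size + 1):
        for left in range(cols - size + 1):
            if all(is_red(image[y][x])
                   for y in range(top, top + size)
                   for x in range(left, left + size)):
                return True
    return False


# Tiny synthetic 4x4 "label": grey background with a 3x3 red patch.
GREY, RED = (128, 128, 128), (250, 10, 5)
label = [[GREY] * 4 for _ in range(4)]
for y in range(3):
    for x in range(3):
        label[y][x] = RED

print(has_uniform_red_patch(label))
```

In a real factory you would tune the two percentages to the lighting and the camera, which is exactly why this only works in a controlled environment.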

If our robot went to school a little bit longer, it might be working for the TSA, where it would be in high demand right now. As you may know, they use so-called full-body scanners over there. That used to mean that someone 30 feet away was supposed to be literally, but might have been figuratively, looking under your clothes, ‘checking you’. Some people were offended… Our robot could do a more acceptable job, and these days, machines do.

Backscatter X-ray released by TSA in 2007

Generic view produced by millimeter wave scanners, Chicago Tribune

Obviously there’s a lot more involved here. A lot of it, though, has to do with image segmentation: partitioning the image into more meaningful, analyzable regions. Once that’s taken care of, a computer gets rid of the areas that are definitely not of concern. What remains is marked, to be inspected by a human. It could do more meaningful things too, like finding tumors in fMRIs (computer vision right now has a lot of applications in medical imaging).

Multiple steps in a more advanced segmentation algorithm, Chen et al.

Segmentation used to automatically mark tumors on fMRIs, C. Yu.
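The ‘segment, discard, mark’ pipeline described above can be sketched on a toy example. To be clear about the assumptions: the input is taken to be an already thresholded grid of 0s and 1s, regions are simply groups of touching 1-pixels, and ‘definitely not of concern’ is reduced to ‘smaller than `min_area` pixels’. Real scanners and medical software use far more elaborate criteria, like the algorithms in the figures; the function names here are my own.

```python
from collections import deque


def label_regions(mask):
    """Group touching 1-pixels (4-connectivity) into regions; return a list of pixel lists."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for y in range(rows):
        for x in range(cols):
            if mask[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue, region = deque([(y, x)]), []
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                region.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(region)
    return regions


def regions_to_inspect(mask, min_area=3):
    """Drop regions too small to be of concern; what remains goes to a human."""
    return [region for region in label_regions(mask) if len(region) >= min_area]


scan = [
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0],
]
flagged = regions_to_inspect(scan)
print(len(label_regions(scan)), "regions found,", len(flagged), "flagged")
```

Here the two single-pixel specks are thrown away, and only the 2x2 blob is left for a human to look at.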

How would the PR2’s computer know what to mark, and what not? To make the problem clearer, I’ll give our robot yet another job: to check my fingerprints at the border control (this is the part I hate, but anyhow…). Suppose I were a criminal: how would you compare my fingerprints against the millions of fingerprints of known criminals? This is where you need feature extraction; you need some way to extract a small amount of information from the image that still represents its content, and that can be compared with similar information. Possibly through something like the aforementioned methods, sometimes through something more advanced, like in the fingerprints below. Sometimes the features have a clear meaning to us, and sometimes they simply don’t… The measurements in this photo of a fingerprint, for instance, make total sense:

Feature extraction for fingerprints: core point detection by intersection of ridge normals, Rajanna et al.

At least when compared to the features found in these faces:

Illustration of Gabor features selected for facial expressions

The bottom row consists of Gabor features, which were searched for around the areas marked by dots (at their center) in the images in the top row, Susskind et al.
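The basic recipe behind feature extraction (boil each image down to a few numbers, then compare those numbers instead of the images) can be shown with a deliberately crude stand-in. The sketch below uses a four-bin brightness histogram as the ‘feature’. That is an assumption made purely for illustration: it is not how fingerprints or faces are actually matched, where you would use features like the ridge measurements or Gabor features shown above. The image format (rows of grey values from 0 to 255) and all names are hypothetical as well.

```python
def brightness_histogram(image, bins=4):
    """Reduce a greyscale image (rows of 0-255 values) to a normalised histogram."""
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            counts[min(value * bins // 256, bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


def distance(a, b):
    """Sum of absolute differences between two feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def best_match(query, database):
    """Return the name in the database whose stored features are closest to the query's."""
    features = brightness_histogram(query)
    return min(database, key=lambda name: distance(features, database[name]))


dark = [[10, 20], [30, 40]]
bright = [[200, 220], [240, 250]]
# Only the small feature vectors are stored, not the images themselves.
database = {
    "dark_print": brightness_histogram(dark),
    "bright_print": brightness_histogram(bright),
}
query = [[15, 25], [35, 70]]  # a slightly different image of the dark print
print(best_match(query, database))
```

The point is the shape of the solution: comparing four numbers per entry scales to millions of records in a way that comparing whole images pixel by pixel does not.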

These are the kinds of tools a computer can use to transform images into something it can make sense of. But that still doesn’t explain how he’d be able to recognize himself. Or spaghetti for that matter… I’ll explain how he can in my next post.


Gabuglio asked an interesting question last year, one that people have been dragging me into arguments over. Usually because some people now target me for facilitating the ‘rise of the machines’ (just for a thesis?). I think I gave an interesting answer to it as well, and I’m curious to see what other people might think of our opposing views. So, if anyone would like: (re-)read these two comments, and get into it as well. 😉

To make things more interesting, I’ll add a third vision. One that the military is interested in:

How to make computers capable of letting spaghetti cause an identity crisis?

The little fellow below is Sven, the budgerigar (grasparkiet). He’s holding a toy, because he’s playing fetch with his owner. A lot of dog owners would be pretty proud of such a feat; most pet dogs would not do it as well.

A budgerigar waiting to play fetch

I’m a dinosaur (really, I am). Want to play with me?

Birds like this are sold starting at around $15. If you want the smartest bird though, $15 is a rip-off. You could catch yourself a raven (ravens have been observed using strategy while hunting in the wild) or a crow (crows have been observed in lab tests precisely crafting their own tools to reach food). The smartest bird is arguably the European magpie (ekster), which you could (though obviously you shouldn’t…) capture as well. Then you could harness the power of its 5.8 gram brain, and teach it to speak. Most importantly though: it’s the only non-mammal that scientists agree recognizes itself in a mirror, and recognizes when something’s wrong with its appearance. In other words, magpies use mirrors like you might in the morning.

Now meet the PR2. You can get your own starting at $400,000.


I’m two desktop computers (take my word for it, or start digging). Want to play?

Much like Sven, the PR2 is pretty cute and engaging. He has 2 computers for ‘feet’. Good computers… A very high-end CPU and 24 GB RAM each. But they’re plain computers. Pretty close to the desktops at GroupT, or the one I have under my desk (but closer to the one under my desk ;-)). They run Ubuntu, though you could install Windows or OS X on them, and they communicate with each other over a network cable. They have a bunch of cameras and sensors plugged into them, as well as drivers for some attached motors. All of that combined is the robot you’re looking at. All you have to do to make it work is run a set of applications on the computers that form the interface to the hardware.

Suppose you have a desktop at GroupT (or your work, institute, whatever…), and we give it some distinctive visual features. We give it a webcam, and write some software for it so that, when presented with its reflection, the desktop can detect itself. And it has to be able to do that with the mirror reflecting from any random location. A student drops a plate of spaghetti on the computer, covering it, and the computer is taken to the basement to be washed off. Would our program still enable the computer to recognize itself, covered in spaghetti in the basement? A magpie would.

The PR2 doesn’t have to recognize itself covered in spaghetti. Worse… My task is to get such a robot to autonomously bake pancakes. More specifically, to provide and analyze the visual feedback. I’m required to only use visual feedback, so I don’t just get to poke at stuff, use microphones,…

He has to pour batter until he sees it’s enough, on a surface he has determined he’s made greasy enough. The surface and tools aren’t known in advance. And he has to see whether or not the pancake is ready, or perhaps burning. “Should I turn up the heat, because the pancake isn’t doing much?” He has to see and recognize whatever I think he needs to see to complete his task.

My blog will deal with a single, two-fold question: how do you get a desktop computer to bake a pancake, and what are the implications and other uses of the tools you use to do that?