How to make computers capable of letting spaghetti cause an identity crisis?

The little fellow below is Sven, the budgerigar (grasparkiet). He’s holding a toy, because he’s playing fetch with his owner. A lot of dog owners would be pretty proud of such a feat; most pet dogs would not do it as well.

A budgerigar waiting to play fetch

I’m a dinosaur (really, I am). Want to play with me?

Birds like this are sold starting at around $15. If you want the smartest bird though, $15 is a rip-off. You could catch yourself a raven, a bird that has been observed using strategy while hunting in the wild, or a crow, which in lab tests has been observed precisely making its own tools to reach food. The smartest bird is arguably the European magpie (ekster), which you could (though obviously, you shouldn’t…) capture as well. Then you could harness the power of its 5.8 gram brain, and teach it to speak. Most importantly though: it’s the only non-mammal that scientists agree recognizes itself in a mirror, and recognizes when something’s wrong with its appearance. In other words, magpies use mirrors like you might in the morning.

Now meet the PR2. You can get your own starting at $400,000.


I’m two desktop computers (take my word for it, or start digging). Want to play?

Much like Sven, the PR2 is pretty cute and engaging. He has 2 computers for ‘feet’. Good computers… A very high-end CPU and 24 GB RAM each. But they’re plain computers. Pretty close to the desktops at GroupT, or the one I have under my desk (but closer to the one under my desk ;-)). They run Ubuntu, though you could install Windows or OSX on them, and they communicate with each other over a network cable. They have a bunch of cameras and sensors plugged into them, as well as drivers for some attached motors. All of that combined is the robot you’re looking at. All you have to do to make it work is run a set of applications on the computers that form the interface to the hardware.

Suppose you have a desktop at GroupT (or your work, institute, whatever…), and we give it some distinctive visual features. We give it a webcam, and write some software for it so that when presented with its reflection the desktop can detect itself. And it has to be able to do that with the mirror placed at any random location. A student drops a plate of spaghetti on the computer, covering it, and it’s taken to the basement to be washed off. Would our program still enable the computer to recognize itself, covered in spaghetti in the basement? A magpie would.

The PR2 doesn’t have to recognize itself covered in spaghetti. Worse… My task is to get such a robot to autonomously bake pancakes. More specifically, to provide and analyze the visual feedback. I’m required to only use visual feedback, so I don’t just get to poke at stuff, use microphones,…

He has to pour batter until he sees it’s enough, on a surface he determined he’s made greasy enough. The surface and tools aren’t known in advance. And he has to see whether or not the pancake is ready, or perhaps burning. “Should I turn up the heat, because the pancake isn’t doing much?” He has to see and recognize whatever I think he needs to see to complete his task.

My blog will deal with a single, two-fold, question: how do you get a desktop computer to bake a pancake, and what are the implications and other uses of the tools you use to do that?


8 thoughts on “How to make computers capable of letting spaghetti cause an identity crisis?”

  1. tijlcrauwels says:

    I was wondering whether you do your thesis with Group-T, or with another company? Will you be using an existing robot and implementing the programming, or do you have to create something from scratch?

    Another question I had was what exactly defines ‘visual feedback’. Will you be able to use thermal vision to measure temperature? And what about any other tool that doesn’t ‘touch’ the object?

  2. jefhimself says:

    My ‘company’ is a group of the Electrical Engineering department at the KULeuven. I’m executing it on a PR2, like the one above, in cooperation with the Mechanical Engineering department. The Mechanical department has such a robot because it participates in a larger European-scale project.

    My thesis is specifically in the field of computer vision, which mostly means I’m working with images. You could then for instance use thermal imaging (meaning: apply techniques from computer vision to thermal images), but that would be beside the point, since they want a solution with the basic hardware. It’s more about the research here, in computer vision, than about getting it to work. For my input I’m restricted to a 3D laser scanner, two stereoscopic cameras, two low-res cams, a ‘high-res’ cam (5 Mpixel), and a Kinect. Most of them will not be used.

    • I guess for taking thermal images you use an infrared camera? And am I correct to say that a stereoscopic camera is a camera consisting of two or more lenses to capture a 3-D image, which is useful to estimate distances? Is it useful for you then? Because you work with images, but your domain is not about touching things?

      • jefhimself says:

        To my knowledge a thermal camera is an infrared camera, but an infrared camera definitely isn’t by definition a thermal camera.

        You can make an infrared camera yourself. If you look at the lens of your phone, webcam, or any other digital camera (at least a ‘low-budget’ one, I don’t know if this would still be true for a $3000 digital camera…), you’ll see a reddish, sometimes greenish reflection. That reflection is caused by an infrared cut-off filter. If you take e.g. a webcam (those are easy) apart until you see that filter, take it out, and put the webcam back together, it will show infrared light with short wavelengths. If you buy some high power infrared LEDs as well, that means you can (easily) make your own night vision camera (true story, I have 😉 ). 🙂 The filter you’d have to remove looks like this (courtesy talbotron22, in a guide on

        The reason you can do this is that the CCDs (the sensors that actually ‘make the photo’, like photographic film does in an analogue camera) are inherently responsive to these infrared wavelengths (@Gabuglio: which is actually a nice result of semiconductor physics). Thermal cameras, however, require responsiveness to much longer wavelengths. If you had one, you could apply computer vision (of which I’ll introduce an application in my next post) to that image like to any other. I suspect some form of computer vision processing is actually used in the thermal images we’re used to seeing (I’m not sure you’d still call it computer vision), to add the coloring.

        The stereo cam is capable of producing some sort of 3D ‘image’, which I would use to produce a so-called point cloud. I used the laser scanner at the start of my thesis, because it produces a much easier, more complete point cloud. Point clouds are useful because I can use them to distort my camera images, to make them look like they would if the camera were in a different position. I use this to get a bird’s eye view of the pancake. Clockwise starting at the top right, you can see the camera image made by the robot, the robot’s coordinate frames (the camera is on its head) together with a point cloud representation of the laser scanner’s realtime feedback (you can clearly see the hotplate on the table), and the generated bird’s eye view of the pancake, in this video (probably best to watch it in full screen mode):

        Apologies for the late reply, hope you got a good answer out of it though. 😉
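To make the bird's eye view idea above a bit more concrete, here is a minimal sketch in plain Python. It is not the code running on the PR2, and it simplifies the approach: instead of warping a camera image, it drops every point of a point cloud straight down onto a grid, as if a camera were looking down from above the hotplate. The function name `birds_eye_view`, its parameters, and the convention that z points up and coordinates are in metres are all assumptions made for this illustration.

```python
import math


def birds_eye_view(points, values, cell=0.01, rows=64, cols=64, origin=(0.0, 0.0)):
    """Render a top-down image from a point cloud (illustrative sketch).

    points: iterable of (x, y, z) tuples in metres, z pointing up (assumed).
    values: one grey value per point, e.g. taken from the camera image.
    cell:   side length of one output pixel in metres.
    origin: (x, y) of the corner of the area covered by the image.
    """
    image = [[0.0] * cols for _ in range(rows)]
    # Height of the highest point seen so far in each cell.
    top = [[-math.inf] * cols for _ in range(rows)]
    for (x, y, z), value in zip(points, values):
        col = math.floor((x - origin[0]) / cell)
        row = math.floor((y - origin[1]) / cell)
        # Skip points outside the image; otherwise keep only the highest
        # point per cell, since that is what a camera above would see.
        if 0 <= row < rows and 0 <= col < cols and z > top[row][col]:
            top[row][col] = z
            image[row][col] = value
    return image


# Hypothetical data: two points stacked in one cell, one elsewhere, one far away.
points = [(0.005, 0.005, 0.0), (0.005, 0.005, 0.5), (0.025, 0.015, 0.1), (5.0, 5.0, 0.0)]
values = [10.0, 200.0, 50.0, 99.0]
image = birds_eye_view(points, values, cell=0.01, rows=3, cols=3)
```

Cells that no point falls into stay at zero, so a sparse cloud (like the one from a stereo camera) would leave holes that a real pipeline would have to fill in somehow.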

  3. gabuglio says:

    What is the real purpose of this robot?
    If I let my imagination run, in like 20 years every family will want a robot like that to do its housekeeping. But I see one big constraint, and that is the cost of the robot. I can’t imagine this robot is practical at that price. Is it possible to bring the price down? And how would this be done?

    • jefhimself says:

      The PR2 is a research robot. It is used to research human-robot interaction, motion and task planning (not trivial for something without the intrinsic awareness we humans have, nor the many sensors we have throughout our body, including receptors in our skin), perception, machine learning, …

      That’s one way the price would definitely drop: for it not to be a research robot, so that the production volume can go up. I think the main cost driver of the PR2 is its high R&D cost, though it’s possibly also expensive due to its mechanical and electrical complexity (I would guess that’s not that much of a factor).

      Your view of the future might be part of what will happen with robotics. In any case, just like ubiquitous computing, robotics might become ubiquitous as well. We already have lawnmower and vacuum cleaner robots, primitive toy robots, and such. They can become simpler than that. Take for instance those cars (different brands have this) that park themselves:

      Just like the PR2, that car is a computer, with an operating system, some sensors for input, and some outputs it can control. You can see the car above using all of them at the same time.

      I think the real trick to making robots ubiquitous is in the machine learning though. In the case of that car, some engineers figured out a procedure to park a car. I could just focus on a procedure to bake pancakes. But then, what if you want to bake cookies instead? Does someone else do a thesis on that? Ideally, and this is what the research focuses on, I’d be able to bake a pancake while a robot watches me, and it would figure out how to do it itself. For that to happen though, the perception needs to be able to provide information for that, the motion planning needs to be able to execute, the robot needs to be able to ‘learn’, much like we did growing up, and so forth.

      Imagine robotics getting to that point, and e.g. electric cars being more commonplace. To give those electric cars (stuffed with electronics) the ability to park themselves, if they had all the software that’s being developed on the PR2, you’d only have to demonstrate to your car how to do it. The car would form some sort of comprehension of parking (build a schema like we do), which it could use and adapt in different situations. It could then even share its knowledge through a global sharing point, where it could be picked up by a wheelchair at the other side of the world that’s trying to ‘figure out’ how to park next to a bed. That, I think, is another, possibly more influential, part of the future of robotics.

  4. […] to see what other people might think of our opposing views. So, if anyone would like: (re-)read these two comments, and get into it as […]

  5. […] with? What’s the ‘vision’, of computer vision? Gabuglio wondered last time, in this comment whether the PR2 could be like Rosey from the Jetsons. Unfortunately: no. Or at least, not at this […]
