18 February 2009

Good-enough synthetic aperture photography

Hello -

Today I'd like to share progress on my efforts to learn about "synthetic aperture photography," a branch of computational photography.

Jenn and I drove up Mass. Ave. in Lexington (i.e., Paul Revere Lexington), put a video camera out the window, and filmed the buildings whizzing by.

Here is a typical collection of frames of, say, a school (or apartment?) blocked by trees:

After picking, say, 16 successive movie frames, you can stack them up, shear them, and slice them to simulate having a camera that's 20 feet long or so. Why? This manipulates the data so as to create the final image of the building with the trees removed:

This won't be news to those of you in the field of computer graphics since, say, 1997, but there's a lot of activity analyzing optical systems - such as large-lens or multi-lens systems - for benefits in graphics. Marc Levoy, a professor at Stanford, held a class on a collection of topics he called "computational photography." Today, the field, building on work in computer vision and 3-D display, includes:

Snapping a photo, and not worrying about focusing it until later. E.g., when you get home.

Visualizing 3-D scenes from angles between those at which you took pictures, such as the post-inauguration Microsoft / CNN project.

Taking many snapshots of a scene - say of a building obscured by bushes - and then "erasing" the bushes to show what's behind.

I've been attending talks, flipping through papers, and watching colleagues try their hand at this burgeoning field.

How It's Done

(1) Take video at constant velocity along a linear track, (2) collect the frames into an array - a spatio-perspective volume - and recenter the synthetic film plane on the region of interest, and (3) average over all frames & display the result.

I like Mathematica 7 for its functionality and elegance (though not its memory management). We took a few minutes of video & chose some few-second clips using our MacBook's video editing software. We exported each clip as a QuickTime movie and then, from QuickTime Pro, exported an AVI. (For some reason my Mathematica won't play nice with MOV.)


Import a good 16- or 32-frame segment into an array of images. I chose frames that showed a building in the background with plenty of occluders, minimal vertical jumpiness (remember, we were driving), and relatively constant speed. To save memory, I cropped the images to a horizontal region. Mind you, the linear camera motion doesn't need to be at constant velocity, but constant velocity will make your life easier if you'd like to automate the process.

Verify the constant velocity by viewing a 2-D slice of the 3-D "spatio-perspective volume," an informal approximation of an "epipolar-plane image," as Bolles and Baker called it in the 1980s.

What's that? For the purposes of the blog post, it is a 2-D image with time occupying the vertical axis and space occupying the horizontal.
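For readers who'd rather see this in code, here is a minimal sketch in Python with NumPy (not the Mathematica used for this post). It assumes the frames are already loaded into an array shaped (frames, height, width); the function name `epipolar_slice` and the tiny synthetic volume are made up for illustration.

```python
import numpy as np

def epipolar_slice(volume, row):
    """Return the time-vs-space image for one scanline.

    volume has shape (frames, height, width); the result has shape
    (frames, width), so time runs down the vertical axis.
    """
    return volume[:, row, :]

# Synthetic stand-in for the video: one bright point on row 1
# that drifts 2 pixels to the right in every frame.
n_frames, height, width = 5, 3, 12
volume = np.zeros((n_frames, height, width))
for t in range(n_frames):
    volume[t, 1, 1 + 2 * t] = 1.0

epi = epipolar_slice(volume, row=1)
track = np.argmax(epi, axis=1)  # column of the point in each frame
print(epi.shape)   # (5, 12)
print(track)       # [1 3 5 7 9]
```

The point traces a straight, slanted line through the slice; a faster-moving (nearer) point would trace a more horizontal one.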

Hang with me for a second. Even if you're not a computer graphics nut, I think this offers an interesting way to view the world, in the spirit of an earlier post regarding how engineers sometimes find it easier to manipulate information if it's first converted into a different format.

(Links to: Jan Neumann)

Imagine taking a sequence of frames from that movie, above, printing them out, and then stacking them up like a stacked deck of cards. The back card will be the car's starting point, and the facing card will be the car's ending point. Got it? Now imagine whipping out your sharpest knife, slicing through the deck, and peering down on it.

It would look like this:

Since nearby objects whiz past our field of view quickly, they cover lots of ground in very little time. The nearby trees and signposts therefore are the most horizontally-slanted objects in the image above. On the other hand, the school is in the distance, so it appears to creep along as we drive past it. It's nearly vertical in this representation.

Fortunately most of the tracks through the image are linear, meaning that this process can be completed with a minimum of pain.

What if we wanted to "freeze" the motion of the building, so we could synthetically "focus" on it? We'd need to recenter the film plane by shearing this stack of images. By trial and error, it turns out that the building moves 4 pixels to the right for each successive frame.
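The 4-pixel figure above came from trial and error. One hypothetical way to automate that search, sketched here in Python with NumPy, is to try a range of integer shifts and keep the one that makes the frames agree best. The function names, the variance score, and the synthetic "building" are all assumptions for illustration, and `np.roll` wraps pixels around the edge, which real footage would not tolerate without cropping.

```python
import numpy as np

def alignment_error(volume, shift_per_frame):
    """Mean per-pixel variance across frames after undoing a constant shift.

    A low value means the frames line up well for that shift.
    """
    aligned = np.stack([
        np.roll(frame, -shift_per_frame * t, axis=1)  # wraps at the edges
        for t, frame in enumerate(volume)
    ])
    return float(aligned.var(axis=0).mean())

def best_shift(volume, candidates):
    """Return the candidate shift (pixels per frame) with the lowest error."""
    return min(candidates, key=lambda s: alignment_error(volume, s))

# Synthetic test: a random texture that moves 4 px right per frame.
rng = np.random.default_rng(0)
building = rng.random((8, 64))
volume = np.stack([np.roll(building, 4 * t, axis=1) for t in range(6)])

found = best_shift(volume, range(0, 9))
print(found)  # 4
```

On real video you would score only the rows and columns that contain the building, since the trees in front of it move at a different rate.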

We recenter the data by incrementally padding it. If we do it correctly, the building's spatio-temporal tracks will be vertical:
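As a rough illustration of "incrementally padding," here is a sketch in Python with NumPy. It assumes a non-negative integer shift per frame and a volume shaped (frames, height, width); `shear_by_padding` is a made-up name, not anything from Mathematica.

```python
import numpy as np

def shear_by_padding(volume, shift_per_frame):
    """Pad each frame so a feature moving right by shift_per_frame
    pixels per frame ends up in the same column of every frame.

    Assumes shift_per_frame >= 0. The output is wider than the input
    by shift_per_frame * (frames - 1) columns, zero-filled.
    """
    n_frames, height, width = volume.shape
    total = shift_per_frame * (n_frames - 1)
    sheared = np.zeros((n_frames, height, width + total), dtype=volume.dtype)
    for t in range(n_frames):
        offset = total - shift_per_frame * t  # later frames get less left padding
        sheared[t, :, offset:offset + width] = volume[t]
    return sheared

# A single point that moves 4 px right per frame: columns 1, 5, 9, 13.
volume = np.zeros((4, 1, 16))
for t in range(4):
    volume[t, 0, 1 + 4 * t] = 1.0

sheared = shear_by_padding(volume, 4)
columns = np.argmax(sheared[:, 0, :], axis=1)
print(sheared.shape)  # (4, 1, 28)
print(columns)        # [13 13 13 13]
```

After the shear the point sits in column 13 in every frame, which is the "vertical track" described above.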

Recentered film plane.


Now we just need to eliminate those pesky trees. Real-life lenses are good at doing that, because if the lens is big enough, it can "look around" the trees. In aggregate, little pieces of the lens really do get to see the whole face of the building. And so does our video camera.

We can simulate this giant-lens action by simply averaging over all of those frames. (And thanks to my co-worker, Joshua Napoli, for putting it so simply. Here is his similar project of last year - viewing houses through trees - but his blog post [had been] AWOL.)
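To see why plain averaging works, here is a toy sketch in Python with NumPy. The numbers are invented: a uniformly grey "building" that is already aligned across 16 frames, and a one-pixel-wide black "tree" that sweeps 3 pixels per frame in front of it.

```python
import numpy as np

def synthetic_aperture(aligned):
    """Average an already-recentered stack shaped (frames, height, width)."""
    return aligned.mean(axis=0)

n_frames, height, width = 16, 4, 48
aligned = np.full((n_frames, height, width), 0.5)  # grey building, aligned
for t in range(n_frames):
    aligned[t, :, 3 * t] = 0.0  # black tree trunk, different column each frame

averaged = synthetic_aperture(aligned)
print(aligned[0].min())  # 0.0      (the tree fully hides the building)
print(averaged.min())    # 0.46875  (the tree is diluted to 1/16 strength)
```

Any given piece of the building is hidden in only one frame out of 16, so the average recovers 15/16 of its brightness there. More frames, i.e. a wider synthetic lens, dilute the occluder further.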

What does it look like?

Success. Trees are blurred out. Compare to the photos at the top of this post.

How can we take this a step further? We can simulate a variable-focus lens by computing what every possible set of shearing parameters will do. That is, we can tilt our deck of cards by varying degrees so that the space-time paths of various objects become vertical, and hence able to be imaged by our gigantic synthetic lens.

Here's how this looks. Let's recenter every 7th pixel so we can "focus" on a tree in the center, and "stop down" the aperture by averaging over fewer frames than we did above.
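A sketch of both ideas, in Python with NumPy: sweeping the shear to build a focal stack, and "stopping down" by averaging only the central frames. It reads "every 7th pixel" as a shear of 7 pixels per frame, which is my assumption, and the names `refocus` and `focal_stack`, the `aperture` parameter, and the two-layer test scene are all hypothetical. `np.roll` wraps at the image edge, so this is only suitable for a toy scene.

```python
import numpy as np

def refocus(volume, shift_per_frame, aperture=None):
    """Shift-and-average the central `aperture` frames of the volume.

    Frames are shifted relative to the middle frame so that anything
    moving at shift_per_frame pixels per frame lines up before averaging.
    """
    n = len(volume)
    aperture = n if aperture is None else aperture
    start, mid = (n - aperture) // 2, n // 2
    acc = np.zeros(volume.shape[1:], dtype=float)
    for t in range(start, start + aperture):
        acc += np.roll(volume[t], -shift_per_frame * (t - mid), axis=1)
    return acc / aperture

def focal_stack(volume, shifts, aperture=None):
    """One refocused image per candidate shear."""
    return {s: refocus(volume, s, aperture) for s in shifts}

# Toy scene: a dim textured building moving 4 px/frame behind a
# bright one-pixel tree moving 7 px/frame.
rng = np.random.default_rng(1)
building = rng.random((4, 64)) * 0.5
frames = []
for t in range(8):
    frame = np.roll(building, 4 * t, axis=1)
    frame[:, 2 + 7 * t] = 1.0
    frames.append(frame)
volume = np.stack(frames)

stack = focal_stack(volume, shifts=[4, 7])
stopped_down = refocus(volume, 7, aperture=4)
print(stack[7][:, 30])  # tree in focus: all 1.0
print(stack[4].max())   # tree smeared out: well below 1.0
```

With a shear of 7 the tree lands in column 30 of every shifted frame and stays sharp, while a shear of 4 focuses on the building and smears the tree. Averaging over 4 frames instead of 8 keeps the tree sharp but blurs the building less, much like a smaller aperture gives more depth of field.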


Still awake? If you're interested in this stuff, try out:


ps A big thank-you in advance to anyone who can explain why Blogger: (1) doesn't insert images at the cursor, but rather on top; and (2) adds an extra linefeed to every paragraph whenever I insert a photo.


Dan said...

My friend, you never cease to amaze me. Knowing nothing really about optics, I find this amazing and fascinating - thanks for the explanation. I *think* I understand it.

It also reinforces that I need to find better hobbies. ;-)

Dan said...

How are there no other comments on this? LOL

G-Fav said...

Hah! I don't know. Actually, a couple of folks have sent me very thoughtful e-mail in the background about it.

The nice folks at Wolfram are visiting the blog a lot lately (hello, Illinois!) & I'll post something on that in a sec.


Mike Warot said...

You can do fun things with a still camera as well... here are my experiments to date:

Noah Spurrier said...

This appears to be related to computed tomography. The slices through your image stack even look like the Radon transform.