Today I'd like to share progress on my efforts to learn about "synthetic aperture photography," a branch of computational photography.
Jenn and I drove up Mass. Ave. in Lexington (i.e., Paul Revere Lexington), put a video camera out the window, and filmed the buildings whizzing by.
Here is a typical collection of frames of, say, a school (or apartment?) blocked by trees:
After picking, say, 16 successive movie frames, you can stack them up, shear them, and slice them to simulate having a camera that's 20 feet long or so. Why? Because manipulating the data this way produces a final image of the building with the trees removed:
This won't be news to those of you in the field of computer graphics since, say, 1997, but there's a lot of activity analyzing optical systems - such as large-lens or multi-lens systems - for benefits in graphics. Marc Levoy, a professor at Stanford, held a class on a collection of topics he called "computational photography." Today, the field, building on work in computer vision and 3-D display, includes:
Snapping a photo, and not worrying about focusing it until later. E.g., when you get home.
Visualizing 3-D scenes from angles between those from which you took pictures, such as the post-inauguration Microsoft / CNN project.
Taking many snapshots of a scene - say of a building obscured by bushes - and then "erasing" the bushes to show what's behind.
I've been attending talks, flipping through papers, and watching colleagues try their hand at this burgeoning field.
How It's Done
(1) Take video at constant velocity along a linear track, (2) collect the frames into an array - a spatio-perspective volume - and recenter the synthetic film plane on the region of interest, and (3) average over all frames & display the result.
I like Mathematica 7 for its functionality and elegance (though not its memory management). We took a few minutes of video & chose some few-second clips using our MacBook's video editing software. We exported them as a QuickTime movie and then, from QuickTime Pro, as an AVI. (For some reason my Mathematica won't play nice with MOV.)
Import a good 16 or 32-frame segment into an array of images. I chose frames that showed a building in the background with plenty of occluders, minimal vertical jumpiness (remember, we were driving), and relatively constant speed. To save memory, I cropped the images to a horizontal region. Mind you, the linear camera motion doesn't need to be at constant velocity, but constant velocity will make your life easier if you'd like to automate the process.
Verify the constant velocity by viewing a 2-D slice of the 3-D "spatio-perspective volume," an informal approximation of an "epipolar-plane image," as Bolles and Baker called it in the 1980s.
What's that? For the purposes of the blog post, it is a 2-D image with time occupying the vertical axis and space occupying the horizontal.
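To make the slice concrete, here is a minimal sketch in plain Python (not the Mathematica notebook behind this post). It assumes each frame is a list of rows of pixel values. The name epipolar_slice and the tiny synthetic clip are made up for illustration.

```python
def epipolar_slice(frames, row):
    """Return a 2-D time-vs-space slice: the same scanline from every frame.

    frames: list of images; each image is a list of rows of pixel values.
    result[t][x] is the pixel at column x of scanline `row` in frame t.
    """
    return [list(frame[row]) for frame in frames]


# Synthetic clip (illustrative only): 4 frames, 3 rows, 8 columns.
# A bright "object" (value 9) on scanline 1 moves 2 pixels right per frame.
frames = []
for t in range(4):
    image = [[0] * 8 for _ in range(3)]
    image[1][2 * t] = 9
    frames.append(image)

epi = epipolar_slice(frames, row=1)
for line in epi:
    print(line)  # time runs down the page, space runs across
```

In the printed slice the object traces a straight diagonal track. That straight track is what constant velocity looks like in this representation.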
Hang with me for a second. Even if you're not a computer graphics nut, I think this offers an interesting way to view the world, in the spirit of an earlier post regarding how engineers sometimes find it easier to manipulate information if it's first converted into a different format.
(Links to: Jan Neumann)
Imagine taking a sequence of frames from that movie, above, printing them out, and then stacking them up like a deck of cards. The back card will be the car's starting point, and the facing card will be the car's ending point. Got it? Now imagine whipping out your sharpest knife, slicing through the deck, and peering down on it.
It would look like this:
Since nearby objects whiz past our field of view quickly, they cover lots of ground in very little time. The nearby trees and signposts therefore are the most horizontally-slanted objects in the image above. On the other hand, the school is in the distance, so it appears to creep along as we drive past it. It's nearly vertical in this representation.
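The slant of each track follows from ordinary parallax. Under a simple pinhole-camera model with the camera sliding sideways, a point's image shift per frame is the focal length (in pixels) times the camera's step between frames, divided by the point's depth. Here is a small sketch of that relation. The function name and all the numbers are hypothetical, not measurements from our drive.

```python
def pixels_per_frame(focal_length_px, camera_step_m, depth_m):
    """Image shift between successive frames for a point at depth_m.

    Assumes a pinhole camera translating parallel to the image plane.
    """
    return focal_length_px * camera_step_m / depth_m


# Hypothetical values for illustration only.
focal_length_px = 800.0   # assumed focal length expressed in pixels
camera_step_m = 0.25      # assumed distance the car travels between frames

for label, depth_m in [("tree", 10.0), ("building", 50.0)]:
    shift = pixels_per_frame(focal_length_px, camera_step_m, depth_m)
    print(label, shift)
```

With these made-up numbers the nearby tree shifts five times as far per frame as the distant building, so its track in the slice is far more slanted.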
Fortunately most of the tracks through the image are linear, meaning that this process can be completed with a minimum of pain.
What if we wanted to "freeze" the motion of the building, so we could synthetically "focus" on it? We'd need to recenter the film plane by shearing this stack of images. By trial and error, it turns out that the building moves 4 pixels to the right for each successive frame.
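I found the shift by hand, but the trial and error could plausibly be automated. The sketch below is one hypothetical approach, not what I actually did. It tries each candidate shift on a single scanline per frame and keeps the one that makes the frames agree best, measured by the lowest average variance over time. All names and the synthetic test data are made up.

```python
import random


def misalignment(scanlines, shift_per_frame):
    """Average variance over time after undoing shift_per_frame px per frame."""
    n = len(scanlines)
    usable = len(scanlines[0]) - shift_per_frame * (n - 1)
    total = 0.0
    for x in range(usable):
        samples = [scanlines[t][x + shift_per_frame * t] for t in range(n)]
        mean = sum(samples) / n
        total += sum((s - mean) ** 2 for s in samples) / n
    return total / usable


def estimate_shift(scanlines, candidates):
    """Pick the candidate shift whose alignment leaves the least variance."""
    return min(candidates, key=lambda s: misalignment(scanlines, s))


# Synthetic data: a random "building" texture moving right 4 px per frame,
# plus a one-pixel dark "tree" moving right 7 px per frame.
rng = random.Random(0)
num_frames, width, true_shift = 8, 64, 4
scene = [rng.randrange(256) for _ in range(width + true_shift * (num_frames - 1))]
scanlines = []
for t in range(num_frames):
    line = [scene[x + true_shift * (num_frames - 1 - t)] for x in range(width)]
    line[3 + 7 * t] = 0  # the occluding tree
    scanlines.append(line)

best = estimate_shift(scanlines, range(0, 9))
print(best)
```

On this synthetic data the search recovers the 4-pixel shift built into the texture, because the building covers most of the scanline. A scene dominated by occluders would likely need a more robust score.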
We recenter the data by incrementally padding it. If we do it correctly, the building's spatio-temporal tracks will be vertical:
Recentered film plane.
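The padding step can be sketched in a few lines of plain Python (again, a stand-in for the Mathematica notebook, with made-up names). It assumes the object of interest moves right by a fixed number of pixels per frame, so earlier frames get more padding on the left and later frames more on the right.

```python
def shear_by_padding(frames, shift_per_frame, fill=0):
    """Pad every row so an object moving right by shift_per_frame px per
    frame ends up in the same column of every frame."""
    n = len(frames)
    sheared = []
    for t, image in enumerate(frames):
        left = shift_per_frame * (n - 1 - t)   # earlier frames: more left padding
        right = shift_per_frame * t            # later frames: more right padding
        sheared.append([[fill] * left + list(row) + [fill] * right for row in image])
    return sheared


# Synthetic clip: 3 frames, 2 rows, 20 columns. A marker (value 9) on row 0
# moves 4 pixels to the right in each successive frame.
frames = []
for t in range(3):
    image = [[0] * 20 for _ in range(2)]
    image[0][1 + 4 * t] = 9
    frames.append(image)

sheared = shear_by_padding(frames, shift_per_frame=4)
print([image[0].index(9) for image in sheared])  # prints [9, 9, 9]
```

Every padded frame comes out the same width (here 28 columns), so the frames can be compared or combined pixel for pixel.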
Now we just need to eliminate those pesky trees. Real-life lenses are good at doing that, because if the lens is big enough, it can "look around" the trees. In aggregate, little pieces of the lens really do get to see the whole face of the building. And so does our video camera.
We can simulate this giant-lens action by simply averaging over all of those frames. (And thanks to my co-worker, Joshua Napoli, for putting it so simply. Here is his similar project of last year - viewing houses through trees - but his blog post [had been] AWOL.)
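Here is a minimal sketch of the averaging, assuming the frames have already been recentered so the building sits at the same pixels in every one. The function name and the toy data are made up for illustration.

```python
def average_frames(aligned_frames):
    """Per-pixel mean over frames that are already recentered."""
    n = len(aligned_frames)
    height = len(aligned_frames[0])
    width = len(aligned_frames[0][0])
    return [
        [sum(frame[y][x] for frame in aligned_frames) / n for x in range(width)]
        for y in range(height)
    ]


# Toy data: the "building" is a constant 100 in all 4 aligned frames, and a
# dark "tree" pixel (value 0) lands on a different column in each frame.
aligned = []
for t in range(4):
    image = [[100] * 8]
    image[0][1 + 2 * t] = 0
    aligned.append(image)

result = average_frames(aligned)
print(result[0])  # prints [100.0, 75.0, 100.0, 75.0, 100.0, 75.0, 100.0, 75.0]
```

Note that the tree is diluted rather than erased: in this toy case each tree pixel still darkens its column by a quarter. Averaging more frames spreads each occluder thinner.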
What does it look like?
Success. Trees are blurred out. Compare to the photos at the top of this post.
How can we take this a step further? We can simulate a variable-focus lens by computing what every possible set of shearing parameters will do. That is, we can tilt our deck of cards by varying degrees so that the space-time paths of various objects become vertical, and hence able to be imaged by our gigantic synthetic lens.
Here's how this looks. Let's recenter by 7 pixels per frame so we can "focus" on a tree in the center, and "stop down" the aperture by averaging over fewer frames than we did above.
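Both knobs, the shear and the number of frames, fit in one small function. This is a sketch in plain Python on one scanline per frame, with made-up names and synthetic data. It crops instead of padding, and it takes the frames from the middle of the clip when using fewer of them.

```python
def refocus(scanlines, shift_per_frame, frames_used):
    """Average the central frames_used scanlines after undoing a shift of
    shift_per_frame px per frame. Output columns are in the coordinates of
    the first frame actually used."""
    start = (len(scanlines) - frames_used) // 2
    subset = scanlines[start:start + frames_used]
    out_width = len(subset[0]) - shift_per_frame * (frames_used - 1)
    return [
        sum(subset[k][x + shift_per_frame * k] for k in range(frames_used)) / frames_used
        for x in range(out_width)
    ]


# Synthetic data: a textured "building" (values 50 to 90) moving right 4 px
# per frame, with a one-pixel dark "tree" (value 0) moving right 7 px per frame.
num_frames, width = 6, 60
scene = [50 + (i % 5) * 10 for i in range(width + 4 * (num_frames - 1))]
frames = []
for t in range(num_frames):
    line = [scene[x + 4 * (num_frames - 1 - t)] for x in range(width)]
    line[2 + 7 * t] = 0
    frames.append(line)

on_tree = refocus(frames, shift_per_frame=7, frames_used=6)      # full aperture
stopped_down = refocus(frames, shift_per_frame=7, frames_used=2)  # small aperture
on_building = refocus(frames, shift_per_frame=4, frames_used=6)
print(on_tree[2], stopped_down[16], on_building[0])
```

With the shear set to 7 the tree stays perfectly dark at a single column while the building texture smears. With the shear set to 4 the building texture is reproduced exactly wherever the tree never passed. Using only two frames keeps the tree sharp but averages the building over fewer views, which is the synthetic version of stopping down a lens.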
Still awake? If you're interested in this stuff, try out:
- Email me if you want me to post the Mathematica 7 notebook on my personal website.
- Browse this collection of papers at Stanford, here. MIT is also active.
- RC Bolles, HH Baker, "Epipolar-plane image analysis: a technique for analyzing motion sequences," Proc. IEEE Third Workshop on Computer Vision: Representation and Control (Bellaire, MI), Oct. 13-16, 1985.
- A Adams, M Levoy, "General Linear Cameras with Finite Aperture," Eurographics Symposium on Rendering (2007).
- R Ng, "Fourier Slice Photography" SIGGRAPH 2005.
- W Chun, OS Cossairt, "Data processing for three-dimensional displays," allowed US Patent.
- MW Halle, "Holographic stereograms as discrete imaging systems," in Proc. SPIE 2176, Practical Holography VIII (1994).
- M Levoy, P Hanrahan, "Light Field Rendering," SIGGRAPH 96, 31-42.
P.S. A big thank-you in advance to anyone who can explain why Blogger (1) inserts images at the top of the post rather than at the cursor, and (2) adds an extra linefeed to every paragraph whenever I insert a photo.