It all started 16 months ago with an avocado. (This is California, after all.)
This particular avocado was set on a boardroom table in Pinterest’s San Francisco headquarters. Surrounded by half a dozen colleagues, Albert Pereta approached the fruit and carefully aimed his phone. The creative director from Pinterest was testing the company’s latest invention, a feature called Lens, which–if it performed correctly–would not only identify the fruit but also search through billions of photos that had been uploaded to the service over the past seven years to find similar images.
Pereta snapped the photo. The app took a few moments to sync with the cloud, and then pulled up the results. Pereta’s screen filled with a seemingly endless scroll of ripe, skin-on avocados photographed from every conceivable angle.
“A lot of people were in awe, saying ‘Look how good this thing is!’” recounts Pereta. Pinterest had managed to identify an object through visual cues alone, an incredibly difficult engineering problem. But Pereta was not satisfied: “And you know, I was looking at it–a thousand avocado pins–and I thought, ‘Who cares?’”
The visual search technology had worked perfectly, but its results were meaningless. No one takes a photo of an avocado in hopes of getting a near-identical photo of an avocado, let alone an endless stream of them. “We started asking the room, ‘What would you want if you took a picture of an avocado?’” Pereta recalls. Someone chimed in saying he’d want guacamole recipes. “So you wouldn’t even see an avocado, you’d see some squishy guacamole,” Pereta says. Or maybe Pinterest could serve up information on how to grow avocados, or hacks you could do with avocados. “Fuck yes, that would be amazing,” Pereta recalls saying.
Today, the avocado story has ascended to parable inside the plywood walls of Pinterest HQ–a reminder that personalization is more valuable than perfection. This is especially true as the company dives into the emerging field of visual search with tools like Lens, which launched in beta last February. That version didn’t just call up thousands of images of avocados. Incorporating Pereta’s insights, the app was able to offer ideas for things to do with them, like, yes, making guacamole. It was an early look at how Pinterest is, in a very real sense, pinning its hopes on visual AI to overhaul everything from how people shop to how they eat.
For the past two decades, we’ve looked for things online by typing in a search bar. Thanks to advancements in machine-learning technology, computer vision is on the verge of letting us search simply by taking photos. Google, Facebook, Microsoft, and Amazon are pouring resources into the technology. It’s no wonder: Google will take in an estimated $28.6 billion in 2017 through advertising on traditional text search. The potential for voice search, using services like Alexa, Siri, and Google Assistant, is only just being realized. And visual search? That could be monumental, as many technologists envision the smartphone of the future living not in our pockets, but on our eyes.
“I really believe that the camera will be the next keyboard,” says Pinterest CEO Ben Silbermann. “It will be a fundamental tool you use to query the world around you, discover things around you, or visualize how something might fit into your life.”
In this competitive landscape, Silbermann’s 1,200-person startup–best known for letting people pin ideas for rustic wedding decor and DIY children’s party favors to digital boards–might seem an unlikely challenger. But Pinterest has a lot hiding under that pile of avocados. At a time when partisan shouting has taken over many apps and websites, Pinterest’s 200 million monthly users turn to the service to literally picture a better life, be that in the form of a cozier living room, an adventurous trip, or a healthy snack. They’re not looking for food porn, a la Instagram, but everyday meals they can actually cook: Ninety-eight percent of Pinterest users report trying new things they find on the service, according to a Nielsen study. And advertisers are embracing the site. Pinterest’s annual revenue is projected to quintuple to $500 million from 2016 to 2017, while its users grow worldwide by a healthy 40%. (Pinterest declined to comment on revenue growth and projections.)
Pinterest’s popularity rests on its ability to create a unique “taste graph” for each user, uncannily connecting the dots between her pins to infer what else she might be interested in. Now, it’s working to incorporate computer vision into its deep understanding of a user’s preferences. “Everything churns on how useful Pinterest is for discovering ideas for your actual life,” Silbermann says. “If people are actually using Pinterest to decide all of the things they’re putting into their house, or food they’re going to cook, or their next vacation, there’s tremendous value there.” That means Pinterest not only has to master finding the one thing you’re looking for, like Google does; it also has to predict the things you never knew you wanted. If the company succeeds, it could use our cameras to unlock a world of endless, personal discovery.
“The lasting impact of visual search won’t be any specific product or feature,” says Pinterest cofounder Evan Sharp, “rather what it enables people to do: turn anything they see into something they can use to discover more on the internet.”
I’m standing in a dank garden apartment in San Francisco’s SoMa neighborhood. The studio’s windows are mere slits in the walls, boarded up for privacy at night, yet the front door is glass, making modesty more or less impossible. The $250-a-night price tag that Airbnb charges for this apartment seems absurd, but the rental is immaculately staged with all the midcentury mirth of a stereotypical Pinterest board.
I pull out the Pinterest app. Even in the basement lighting, Lens works uncannily well, matching the objects I photograph with sharp specificity. Lens sees not just a chair, but a club chair. Not just a pillow, but a kilim pillow. Not just “art” but a Rothko painting. I’m actually learning something. Many results have actionable links that I can pin or even buy.
Later, I try putting similar photos through Google Lens, a Pinterest Lens competitor that launched earlier this year, in beta, on the Pixel phones. Google’s version doesn’t understand that it sees a chair, or even furniture, and offers me an apology. It mistakes the pillow for a quilt. The only thing it does match correctly is that Rothko print, though it’s worth noting that identifying 2D art is widely considered to be among the simplest challenges for visual-search tools.
Google Lens just isn’t very good. At least not yet. But you can see how visual AI ties into the company’s larger business imperative, as well as that of other tech giants. Google is in the business of indexing, so it makes sense that the company would want to help users visually identify the world around them. Facebook has its social graph, focused on connecting users to friends, and a vested interest in using AI to identify faces. Amazon has e-commerce. For it, visual search could be a bridge between the digital and physical worlds–for instance, letting you photograph a pair of shoes to look for similar, perhaps cheaper, ones on Amazon. Each company might approach visual AI differently, but the implications are the same: There’s money at stake in this burgeoning field, even if it’s too soon to articulate just how much. “Think about the possibility of taking a picture to search for something you can’t even describe–it’s pretty powerful stuff,” says Forrester’s search analyst Collin Colburn. “It might be the most immature [search], but it probably has the most potential.”
Google has hundreds of employees working on visual AI alone. Facebook has 20,000 employees and 300 AI researchers, plus it operates 1.2 million visual AI experiments on the social network at any given moment. Amazon has over 500,000 employees, with 5,000 working on Alexa; its new Echo Show features not just a microphone but also a camera for interacting with Alexa, which gives Amazon a view of a whole room inside your home. Pinterest? It has just 12 employees devoted to visual search.
But Pinterest is mightier than it seems. To start with, it has a massive data set on which to train its visual AI. The more images you have, the smarter the algorithm will get, and the better it will be at serving recommendations users actually want. The largest public data set used by many researchers, ImageNet, consists of 14 million crowdsourced photos of everyday objects. Pinterest has billions–uploaded by eager pinners, skimmed from blogs, and posted by corporations themselves–most of which are immaculately staged and lit because they’re official product photography. Computers see perfect images with more ease. Just as significant: These photos have been hand-tagged and labeled by Pinterest’s own loyal users for years.
“You want to have samples of everything that can happen and everything that can be seen. The larger the data set, the larger the probability that you’re not going to be surprised,” says Manuela Veloso, head of machine learning at Carnegie Mellon University. “What’s interesting about [Pinterest’s] billions is they’re going to cover extremes.”
It also helps that Pinterest, by design, offers somewhat fuzzy results for any search. Queries about denim jackets will elicit results with denim jackets. But if one image in the feed has black denim rather than blue, or perhaps a blue denim purse, it doesn’t look like a mistake. That’s the lesson Pinterest learned from avocados. Exact matches are the specialty of Google search, which has been optimized to respond to specific questions–like, “How do you grill fish?”–with the perfect link. Pinterest users tend to pose vaguer queries: They might search for “seafood dinner ideas” several times a week. For them, a non-exact match is not an error. It’s inspiration.
In other words, Pinterest’s AI can fail a visual search but still luck into a right answer. Imagine Siri doing the same thing. “In some other companies, we talked about precise recall a lot. But at the end of the day, it’s how useful the user feels [a feature is],” says Li Fan, the head of engineering for Pinterest. “They may not require 100% precision. It’s okay. As long as we fulfill the expectations, they feel it’s a consistent experience, they feel it’s useful.”
Pinterest has also engendered its users’ trust at a time when competing platforms are under fire for invading privacy. People look at Pinterest as something other than either a search engine or social network. “The relationship we try to have with our users is, when you share things about yourself, you share them because you want better recommendations, and we give them to you,” says Silbermann. “That expectation is pretty clear. You’re using Pinterest to find style, so if we ask you, ‘What are your favorite colors?’ There’s nothing [invasive] about that.” Those kinds of interactions allow Pinterest to deliver results that are both surprising and accurate. And it does it in a design language that feels more curated than calculated, more human than robotic.
Pinterest was founded in 2010 with a big bet on design: a platform that lets you collect and sort the topics you are interested in, not as ugly blue text links, but as gorgeous photos that sit on virtual index cards. Cofounder Sharp was the creative visionary–a designer who trained at Columbia’s Graduate Architecture School–and the perfect complement to CEO Silbermann, a management consultant turned entrepreneur. The platform was a hit: In 2012, comScore analytics declared it the fastest-growing web service in history.
But Pinterest’s once-novel approach to photo cards has since been adopted by titans such as Google, which uses them in everything from search results to Android’s OS. And its meteoric growth has slowed. Today, the pinning service has been overshadowed by Instagram, with its 800 million monthly active users, and Snapchat, with its radical augmented-reality technologies.
Rather than chase aggressive growth, though, Pinterest has doubled down on its core offering: predicting what users want to see. “We ended up investing a lot on [machine learning],” says Sharp, who now serves as chief product officer, as he paces around a conference table in a Pinterest-branded white T-shirt. Sharp doesn’t have his own office, technically. No one at Pinterest does. Instead, he’s taken over a small room with a King Arthur-style round table at the center. On the back wall is a large, analog pin board adorned with a cross-stitched Pinterest logo, made by his mom. “Most of what [users] see is algorithmically determined,” he says. “It’s a recommendation, a search result, or a Related Pin.”
This latter feature, a relatively simple list of suggested pins generated from what you’ve seen last, launched in 2013. It soon accounted for 10% of all impressions on Pinterest, but plateaued until the company tasked a few AI engineers with making it better in 2014. They trained algorithms to suggest related topics and recommend similar items based merely on visual cues, prioritizing pins that would get the most clicks. Tap on a coat hook made from fallen tree branches, and Related Pins offers a Venn diagram of sensible suggestions that mix home decorating and woodsiness, including a tree trunk coat rack, a sapling room divider, and a reclaimed barn wood key tray. Today, Related Pins represents 40% of all impressions on Pinterest.
The company followed up this early investment in visual AI by recruiting computer-vision guru Fan from Google in 2016. A childhood painter with a passion for visual arts, Fan was pushed into engineering by her parents at age 12. She went on to work at Google for eight years, before leading the 1,000 engineers working on search at Baidu. Then she returned to Google, opting for a more focused role as head of Google’s image search. Finally, Pinterest came knocking. “Li’s values–as a leader and person–were very aligned with Pinterest values,” says Silbermann. “One thing that really resonated with me was she saw technology as a way to enrich people’s lives, not technology for the sake of technology.”
Under Fan, Pinterest’s visual searches–conducted on Lens, Pins, and the company’s browser extension–have increased nearly 70% year over year, with more than 300 million searches a month. Pinterest’s merchant partners, meanwhile, have seen the click-through volume on their “Shop the Look” pins–a feature that lets users click on, then purchase, items in images–double.
All of this converges into a rich business opportunity. “Originally [Pinterest] was curated by category, by people,” says Rick Heitzmann, a venture capitalist at FirstMark Capital, who cut Pinterest’s cofounders their first check in 2009 and has been involved in every round of investment since. “But as technology has evolved, it’s about AI, image recognition, and finding the things you love and care about.” As Pinterest’s mission coalesces around visual search, Heitzmann is excited by the opportunities. “You can see the potential market being big.”
Pinterest already gives advertisers that want to promote their pins a compelling platform. Users are typically there on a mission: According to internal surveys, 93% of them use Pinterest to plan purchases, and a study by Millward Brown found that 87% have bought something they discovered through it. But crucially, pinners haven’t yet made up their minds about the exact product they want. According to Pinterest, 97% of search queries don’t have a brand specified in them. Meanwhile, visual search tools make this brand discovery process even more enticing, especially when the results are tied to a user’s personal “taste graph.”
Target recently became the exclusive retail partner for Pinterest’s Lens technology. Soon, inside the retailer’s main app, you will be able to use an integrated version of Lens to, say, photograph a lamp and see a feed of visually similar ones sold by Target. “Our guests are craving ease and convenience in every shopping experience. Visual search is a huge unlock for that because it eliminates a lot of the friction in product search,” says Rick Gomez, CMO of Target. “By simply taking a picture of an item of interest, Pinterest Lens will return products that are specifically tailored to what our guest is looking for.” (Walmart, American Eagle, and Tommy Hilfiger also have visual-search features on their apps, created by the startup Slyce.) Pinterest has also partnered with Samsung to power visual search in the company’s Galaxy smartphones, and with Shopstyle, to link Lens results to shoppable pins of more than 5 million fashion accessories. Brands are starting to understand that visual search is a crucial part of their future.
The problem, of course, is that brands have limited marketing budgets. And Pinterest isn’t the only one that wants to sell them on visual AI.
In November, Pinterest debuted its latest iteration of Lens: Lens Your Look, which helps users find new ways to wear clothes they already own. Photograph something in your closet, like a pair of chunky black heels, then use text queries to search for clothing items that might go with it (like “black dresses”). Lens Your Look will serve up images of people wearing black dresses and chunky heels–maybe even the exact same brand and style as your own.
As users tap on particular images, Pinterest learns which results were essentially right, or the most right, in the larger pile, and can prioritize them next time. It’s the perfect distillation of the company’s approach to visual search: mining its outsize database, tapping into users’ tastes, and embracing imperfection.
That doesn’t mean the user flow of Lens Your Look will necessarily stick around. Sharp is the first to admit that the exact container for visual search may not even have been invented yet: Will we really use our camera phones to point-and-search, or do we need some augmented-reality headset for the concept to take off? “We’re super early,” Sharp says. “It’s like we’re at where text search was in the mid ’90s. There’s this technology, it’s interesting, but no one’s really quite dug down deep enough to know what the product is or what problems it’s going to solve.”
The question remains whether Pinterest, of all companies, can be the one to crack the code. Will we ever “pin it” rather than “Google it”? The prospect sounds daunting, especially since Pinterest is going up against some of the most valuable, powerful companies in the world.
Talking about this in his makeshift “office,” Sharp seems to swallow the urge to rant. He wants to share his playbook as much as he doesn’t. Eventually, he can’t restrain himself, so he walks to the whiteboard and sketches a 2×2 grid. On the x-axis, he writes “Sharing and Searching.” On the y-axis, he writes “Text and Vision.”
Text sharing? That quadrant belongs to Facebook and Twitter. Vision sharing? Facebook, Instagram, and Snapchat. Searching text? That’s Google and Bing. But searching through vision? Sharp’s marker hangs in the air for a moment. Then he writes one company in the empty box. “Pinterest.” Drawing an x-y axis and dramatically putting your company alone in a quadrant is a classic founder move. But Sharp’s right about one thing: Vision is a rare, uncolonized space.
“You come back and yell at me in 5 to 10 years. This is the most valuable quadrant up here by a fuckin’ mile,” he continues, throwing the marker back onto the tray to punctuate his point. “That’s the premise of Pinterest.”