Computer vision is now part of everyday life. Facebook recognizes faces in the photos you post to the popular social network. The Google Photos app can find images buried in your collection, identifying everything from dogs to birthday parties to gravestones. Twitter can identify pornographic images without the help of human curators.
All this "seeing" stems from a remarkably effective brand of artificial intelligence called deep learning. But as much as this technology has been vaunted in recent years, a new experiment from Microsoft Research suggests it's just getting started. Deep learning can go even deeper.
This revolution in machine vision was a long time coming. A key turning point came in 2012, when artificial intelligence researchers from the University of Toronto won a competition called ImageNet. ImageNet pits machines against one another in an image recognition contest: which computer can best identify a cat, a car, or a cloud? That year, the Toronto team, which included researcher Alex Krizhevsky and professor Geoff Hinton, won the contest using deep neural networks, a technology that learns to identify images by examining enormous numbers of them, rather than relying on industrious humans to hand-code the recognition rules.
Toronto's victory provided a roadmap for the future of deep learning. In the years since, the biggest names on the web, including Facebook, Google, Twitter, and Microsoft, have used similar techniques to build computer vision systems that can match or even surpass humans. "We can't claim that our system 'sees' like a person does," says Peter Lee, head of research at Microsoft. "But what we can say is that for very specific, narrowly defined tasks, we can learn to be as good as humans."
Broadly speaking, neural networks use hardware and software to mimic the web of neurons in the human brain. The idea dates back to the 1980s, but in 2012 Krizhevsky and Hinton advanced the technology by running their neural network atop graphics processing units, or GPUs. These specialized chips were originally designed to render images for games and other highly graphical software, but they have proven equally adept at the math that drives neural networks. Google, Facebook, Twitter, Microsoft, and many others now use GPU-powered AI for image recognition and many other tasks, from web search to security. Krizhevsky and Hinton have since joined the staff at Google.
The latest ImageNet winner points to another possible step forward in the evolution of machine vision, and of artificial intelligence more broadly. Last month, a team of Microsoft researchers took the ImageNet crown using a new approach they call deep residual networks. The name doesn't quite describe it: they designed a neural network far more complex than typical designs, one spanning 152 layers of mathematical operations rather than the typical six or seven. The work suggests that in the coming years, companies like Microsoft will be able to use vast clusters of GPUs and other specialized chips to significantly improve not only image recognition but other AI services, including systems that recognize speech and even understand natural human language.
In other words, deep learning is nowhere close to reaching its potential. "We're digging into a huge design space," Lee says, "trying to figure out where to go next."
Deep neural networks are arranged in layers. Each layer is a different set of mathematical operations, that is, an algorithm, and the output of one layer becomes the input of the next. Loosely speaking, if a neural network is designed for image recognition, one layer will look for a particular set of features in an image (edges, corners, shapes, or textures) and the next will look for another set. The accumulation of these layers is what makes the networks deep. "Generally speaking, if you make these networks deeper, it becomes easier for them to learn," says Alex Berg, a researcher at the University of North Carolina who helps oversee the ImageNet competition.
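The layer-by-layer flow described above can be sketched in a few lines of Python. This is a toy illustration, not any production framework: the names (`relu`, `layer`, `forward`) are invented for this sketch, each "layer" is just a small weight matrix followed by a nonlinearity, and the weights are random rather than learned.

```python
import random

def relu(values):
    # a common nonlinearity: negative outputs are clipped to zero
    return [max(0.0, v) for v in values]

def layer(inputs, weights):
    # one layer: each output unit is a weighted sum of all inputs,
    # passed through the nonlinearity
    return relu([sum(w * x for w, x in zip(row, inputs)) for row in weights])

def forward(inputs, all_weights):
    # stacked layers: the output of each layer becomes the input of the next
    for weights in all_weights:
        inputs = layer(inputs, weights)
    return inputs

random.seed(0)
# three stacked layers of four units each, with random weights
network = [[[random.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
           for _ in range(3)]
features = forward([1.0, 0.5, -0.5, 2.0], network)
print(features)
```

In a trained network, the weights would be fitted to data, and each successive layer's outputs would correspond to progressively more abstract features of the input.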
Today, a typical neural network includes six or seven layers; some stretch to 20 or even 30. But the Microsoft team, led by researcher Jian Sun, just expanded that to 152. In essence, this network is better at image recognition because it can examine more features. "There are a lot more subtleties that can be learned," Lee says.
In the past, according to Lee and researchers outside Microsoft, neural networks this deep were not feasible. Part of the problem was that as the mathematical signal moved from layer to layer, it became diluted and tended to fade away. As Lee explains, Microsoft solved the problem by building a neural network that skips certain layers when they aren't needed but uses them when they are. "When you do this kind of skipping, you can preserve the strength of the signal much further," Lee says, "and this turns out to have a tremendous, beneficial impact on accuracy."
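Microsoft's actual architecture is far more elaborate, but the core idea of a skip connection can be sketched in miniature. In this hypothetical comparison (all names and weights invented), a deep stack of plain layers washes the signal out, while a residual block that adds its own input back to its output lets the signal survive:

```python
def relu(values):
    return [max(0.0, v) for v in values]

def layer(inputs, weights):
    return relu([sum(w * x for w, x in zip(row, inputs)) for row in weights])

def residual_block(inputs, weights):
    # the skip connection: the block's input is added to its output,
    # so the signal can pass through even if the layer contributes little
    transformed = layer(inputs, weights)
    return [t + x for t, x in zip(transformed, inputs)]

def plain_stack(inputs, all_weights):
    for weights in all_weights:
        inputs = layer(inputs, weights)
    return inputs

def residual_stack(inputs, all_weights):
    for weights in all_weights:
        inputs = residual_block(inputs, weights)
    return inputs

# with tiny weights, 20 plain layers drive the signal toward zero,
# while 20 residual blocks preserve it via the skip connections
tiny = [[0.01, 0.0], [0.0, 0.01]]
faded = plain_stack([1.0, 2.0], [tiny] * 20)
preserved = residual_stack([1.0, 2.0], [tiny] * 20)
print(faded)      # values shrink by a factor of 100 per layer
print(preserved)  # values stay close to the original input
```

The fading signal in the plain stack is a cartoon of the dilution problem Lee describes; the residual version keeps the signal's strength across many more layers.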
This is a significant departure from previous systems, Berg says, and he believes other companies and researchers will follow suit.
The other problem is that building networks this enormous is enormously difficult. Settling on a particular set of algorithms, determining how each layer should operate and how it should talk to the next, is an almost epic task. But Microsoft has a trick here, too. It has designed a computing system that can help build these networks.
As Jian Sun explains, researchers can identify a promising arrangement for a massive neural network, and then the system can cycle through a range of similar possibilities until it settles on the best one. "In most cases, after a number of attempts, researchers learn something, reflect, and make a new decision for the next attempt," he says. "You can view this as 'human-assisted search.'"
According to Adam Gibson, chief researcher at the deep learning startup Skymind, this kind of thing is becoming more common. It's called "hyperparameter optimization." "People can just spin up a bunch [of machines], run 10 models at once, find out which one works best, and use that," Gibson says. "They can input some baseline parameters, and the machines figure out what the best solution is." As Gibson notes, Twitter last year acquired Whetlab, a company that offers similar ways of "optimizing" neural networks.
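At its simplest, the "run 10 models at once and keep the best" workflow Gibson describes is a random search over hyperparameters. Here is a minimal sketch, with a made-up scoring function standing in for actual training (in a real system, each trial would train and evaluate a full network on data):

```python
import random

def train_and_score(learning_rate, depth):
    # stand-in for real training: an invented score that happens to
    # peak around learning_rate = 0.1 and depth = 50
    return -(learning_rate - 0.1) ** 2 - 0.001 * (depth - 50) ** 2

random.seed(1)
# sample 10 candidate configurations at random
trials = [{"learning_rate": random.uniform(0.001, 1.0),
           "depth": random.randint(5, 152)}
          for _ in range(10)]

# "train" every candidate and keep whichever scores best
best = max(trials, key=lambda t: train_and_score(**t))
print(best)
```

Real hyperparameter optimizers, like the Bayesian methods Whetlab worked on, sample new candidates more cleverly than this, using earlier results to guide where to look next.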
This approach isn't exactly "brute forcing" the problem, as Peter Lee and Jian Sun describe it. "With very large computing resources, one could imagine a giant 'natural selection' setup where evolutionary forces help direct a brute-force search through a huge space of possibilities," Lee says. "The world doesn't have enough computing resources for that kind of thing... For now, we will continue to rely on really smart researchers like Jian."
But Lee says the field of opportunity for deep learning is vast, thanks to new techniques and data centers packed with GPU machines. A big part of the task is simply finding the time and computing power needed to explore those possibilities. "This work has greatly expanded the design space. The amount of ground to cover, in terms of scientific research, grows exponentially," Lee says. And that extends well beyond image recognition, to speech recognition, natural language understanding, and other tasks.
That's one reason, Lee explains, Microsoft is working not only to improve the power of its GPU clusters but also to explore the use of other specialized processors, including FPGAs, chips that can be programmed for particular tasks such as deep learning. "There has also been an explosion in demand from our researchers for more experimental hardware platforms," he says. And the work is sending ripples across the wider world of tech and artificial intelligence. Last summer, in its largest acquisition ever, Intel agreed to buy Altera, which specializes in FPGAs.
Indeed, Gibson says, deep learning is increasingly becoming a "hardware problem." Yes, we still need top researchers to guide the creation of neural networks, but finding new breakthroughs is increasingly a matter of running new algorithms across ever more powerful collections of hardware. As Gibson points out, although these deep neural nets work extremely well, we don't quite know why they work. The trick lies in finding the complex combination of algorithms that works best. More and better hardware can shorten that path.
The end result is that the companies that can build the most powerful networks of hardware are the companies that will come out ahead. That would be Google, Facebook, and Microsoft. Those that are good at deep learning today will only get better.