Deep learning, a chip off the old block

How do you help the novice understand the technology that is deep learning? To do this, I will need to discuss linear algebra, statistics, probability theory and multivariate calculus. Only joking! Nothing would turn novice readers off faster than trying to hack our way through those complex disciplines.

For myself, the more I read about deep learning, the more I realized that the discipline of using a deep learning model bears a similarity to sculpting. Let me expand: this quote by Elbert Hubbard clearly describes the methods of deep learning: “The sculptor produces the beautiful statue by chipping away such parts of the marble block as are not needed — it is a process of elimination.”

Indeed, when Michelangelo was asked about sculpting, he said “I saw the angel in the marble and carved until I set him free.” Michelangelo’s minimalist explanation encapsulates in its simplest form what the deep learning process involves. The engineer is the sculptor. The marble block represents the huge block of dense data to be processed. The act of processing the data is the chipping away of unwanted information by neural networks. The act of fine-tuning the deep learning neural engine represents the technique of the sculptor carefully finessing the shape of the emerging form into a recognizable figure. The angel lies within the marble block; it is simply a matter of releasing it.

In both roles, sculptor and engineer, there is a vision of what the “fine-tuning” activity should produce. I am confident that if you, as a novice, accept this simple analogy, you will go some distance toward grasping the fundamentals of the deep learning process.

As a concept, deep learning is less than two decades old. The origin of the expression is attributed to Igor Aizenberg, professor and chair of the Department of Computer Science at Manhattan College, New York. Aizenberg studies, among other things, complex-valued neural networks. He came up with the concept of an artificial neural network system based on the neural network of the human brain.

The “deep” element of the concept refers to a multi-layered processing network of neuron filters. The equivalent process in the human brain is information flowing through neurons connected by synapses. In the machine equivalent, artificial neurons fine-tune and refine the data as it passes through the engine. A deep learning system also learns from experience and can adjust its processing accordingly. In sculpting, it is the equivalent of the experienced sculptor chipping and refining the marble to release Michelangelo’s hidden angel.
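If you are curious what that multi-layered filtering looks like in practice, here is a minimal sketch in Python using the NumPy library. It is not any particular production system, and the layer sizes are picked arbitrarily for illustration; it simply shows data passing through stacked layers of artificial neurons, each layer transforming the output of the one before it.

```python
import numpy as np

def relu(x):
    # A simple nonlinearity: keep positive signals, zero out the rest
    return np.maximum(0, x)

rng = np.random.default_rng(0)

# Three stacked layers: 8 inputs -> 16 -> 8 -> 2 outputs (sizes chosen arbitrarily)
layer_shapes = [(8, 16), (16, 8), (8, 2)]
weights = [rng.normal(size=shape) for shape in layer_shapes]
biases = [np.zeros(shape[1]) for shape in layer_shapes]

def forward(x):
    # Pass the input through each layer in turn; every layer filters and
    # recombines the signal coming from the layer before it
    for w, b in zip(weights, biases):
        x = relu(x @ w + b)
    return x

sample = rng.normal(size=8)   # one made-up input record
print(forward(sample))        # the refined representation after three layers
```

In a real system the weights are not random; they are adjusted during training, which is the “learning from experience” described above.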

Jeff Dean, a senior fellow at Google’s System and Information Group — the group behind many of Google’s highly sophisticated machine learning technologies — said: “When you hear the term deep learning just think of a large deep neural net. Deep refers to the number of layers typically and so this is kind of the popular term that’s been adopted in the press.”

For many novices there is confusion around the terms machine learning, AI and deep learning. There need not be, as the division is quite simple. Artificial intelligence is the catch-all term that covers both machine learning and deep learning. Machine learning is an over-arching term for the training of computers, using algorithms, to parse data, learn from it and make informed decisions based on the accrued learning. Examples of machine learning in action are Netflix showing you what you might want to watch next, or Amazon suggesting books you might want to buy. These suggestions are the outcome of using machine learning technology to monitor your buying patterns and build a profile of your preferences.
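To make that concrete, here is a toy sketch of “learning from accrued data” using the open source scikit-learn library. This is emphatically not how Netflix or Amazon actually build their recommendations; the viewing history and labels are invented purely to show an algorithm parsing past data and making an informed suggestion from it.

```python
from sklearn.tree import DecisionTreeClassifier

# Made-up viewing history: [hours of sci-fi watched, hours of comedy watched]
past_viewing = [[10, 1], [8, 2], [1, 9], [0, 12], [7, 3], [2, 8]]
# What each of those viewers chose to watch next (the label we learn from)
next_watched = ["sci-fi", "sci-fi", "comedy", "comedy", "sci-fi", "comedy"]

# The algorithm parses the data and learns a decision rule from it
model = DecisionTreeClassifier().fit(past_viewing, next_watched)

# A new viewer with 6 hours of sci-fi and 2 of comedy: what should we suggest?
print(model.predict([[6, 2]]))   # likely ['sci-fi']
```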

Deep learning is a subset of machine learning. It uses a highly sophisticated, multilayered pattern of “neurons” to process huge chunks of data, refining the information contained within that data. It takes an abstract jungle of information, such as is contained within raw data, and refines it into clearly understood concepts. The data used can be clean or not clean. Cleaning data is the process of refining the raw information to remove anything that is clearly irrelevant, and clean data can be processed more quickly than data that has not been cleaned. Think of it as the human brain blocking out extraneous information as it works, processing what is relevant and discarding what is not. Something the human brain does every minute of every day.
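Here is a small illustration of what cleaning data can involve, sketched with the open source pandas library. The dataset and the cleaning steps are invented for the example; the point is simply that obviously irrelevant or broken records are discarded before the heavy processing begins.

```python
import pandas as pd

# A small made-up dataset with the kinds of noise raw data usually carries
raw = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 4],
    "age":     [34, None, None, 29, -1],       # missing and impossible values
    "country": ["IE", "FR", "FR", "us", "US"]  # inconsistent casing
})

clean = (
    raw.drop_duplicates(subset="user_id")      # discard repeated records
       .dropna(subset=["age"])                 # drop rows missing key fields
       .query("age > 0")                       # remove impossible values
       .assign(country=lambda d: d["country"].str.upper())  # normalize text
)

print(clean)
```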

But why has deep learning suddenly taken off so spectacularly? It is because artificial neural networks (ANNs) can be trained to a high level of accuracy when they are fed huge amounts of data, as is the case with neural machine translation. ANNs can synthesize complex nonlinear processes with a high degree of accuracy (a small sketch of this appears after the list below). Deep learning is also becoming predominant because of the following boosters:

• The emergence of big data

• The increase in computational power

• The emergence of the cloud

• The affordable availability of graphics processing units and tensor processing units

• The development of deep learning models using open source code
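As promised above, here is a minimal sketch of an artificial neural network learning a nonlinear relationship, built with the open source TensorFlow/Keras library. The function being learned, the network sizes and the training settings are all arbitrary choices for illustration, not a recipe.

```python
import numpy as np
import tensorflow as tf

# A nonlinear relationship for the network to learn: y = sin(x)
x = np.linspace(-3, 3, 400).reshape(-1, 1)
y = np.sin(x)

# A small deep network built from open source building blocks
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(1,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Training chips away at the error, step by step, until the curve emerges
model.fit(x, y, epochs=200, verbose=0)

print(model.predict(np.array([[1.5]])))  # should be close to sin(1.5), about 1.0
```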

Today it is estimated that big data generates 2.5 quintillion bytes of information per day. Now, if you are like me, you may never have heard of the measure quintillion. Well, it is one billion billion, a one followed by eighteen zeros. Not that that helps give it finer focus!

According to IBM, “90% of the data in the world today has been created in the last two years. This data comes from everywhere: sensors used to gather shopper information, posts to social media sites, digital pictures and videos, purchase transaction and cell phone GPS signals to name a few. This data is big data.”

It is safe to say that the amount of data available will only increase over the coming years. Institutions like the EU, the UN, the World Bank, the World Health Organization and social media companies make huge volumes of data available daily, and in multilingual form. The importance of this massive data resource is underlined by Andrew Ng, chief scientist at Baidu, China’s major search engine, who said: “The analogy to deep learning is that the rocket engine is the deep learning models and the fuel is the huge amounts of data we can feed to these algorithms.”

The advent of cloud computing has allowed even small companies to have virtually unlimited storage space and access to enormous computational power. Processors as powerful as the tensor processing unit (TPU) are available via cloud computing. Some examples of cloud computing sources would be Amazon Web Services, IBM SmartCloud or Google Cloud.

TPUs were developed by Google specifically to deal with the demands of ANNs. Before that, graphics processing units had already reduced the machine learning process from weeks to hours. Without this level of computing power, it is unlikely deep learning would be a viable technology.

Finally, Intel is selling a device called a Neural Compute Stick, which it claims will allow companies to bypass the cloud and do their processing at a local, non-cloud level. This will be a boost to those companies that balk at the security implications of processing data in a remote location. It will also increase the speed of processing, as all the crunching will be done locally. Intel says its intent is to make deep learning work “everywhere and on every device.” If it succeeds, deep learning will expand to a huge degree. Interesting times lie ahead for AI.