The hottest field in Computer Science this decade is none other than deep learning. Although deep learning dates back to the 20th century, its popularity only boomed in the past decade. Nowadays, there is hardly any piece of technology that does not rely on deep learning, simply due to its groundbreaking capabilities, including surpassing human performance in several narrow tasks.
What’s interesting to analyze about these deep learning algorithms is that many of their methods of learning are similar to the ways humans learn. We shall also see that the way they are structured is heavily inspired by the way we perceive the things around us and comprehend knowledge in various domains.
Let’s take a step back and rethink why deep learning is called learning in the first place. What makes it different from the types of programs we’ve written ever since the creation of computers? Here is where the term “algorithm” in the context of deep learning separates itself from ordinary algorithms.
Deep Learning vs. Classical Programs
Generally, classical programs are divided into three rough sections: input, process, and output. This has long been the way programmers think about a computer’s logic, which translates directly into the lines of code they write.
Specifically, we feed the program some input, for example a list of names, a collection of phone numbers, an array of numbers, files, etc. Then, it is up to the program to process the input according to the programmer’s intent, be it sorting the list, modifying its contents, deleting several elements, and so on.
Finally, after the process has finished, the program returns the final output, which the programmer can continue working with or even feed into a different subprogram. Given this generalization, the definition of an algorithm has long clung to the input-process-output flow.
On the other hand, deep learning algorithms work a lot like “black boxes,” and they further modify the flow of data in the program. The way data flows in a deep learning algorithm depends on the type of learning algorithm being used.
To keep things simple, I will only describe the flow of supervised learning. In supervised learning, you provide the algorithm, or black box, with a set of inputs and their desired outputs. Then, instead of specifying how the program should process the inputs so that they return the desired outputs, you let it tune its own knobs and dials, slowly grasping the details that lead to the required output.
To be more concrete, say you want to train a deep learning algorithm to distinguish images of cats from images of dogs. The input, in this case, is an image of either a cat or a dog; let’s say the first input is an image of a cat. Then, you tell the algorithm that this specific image has the label “cat.” After doing so, the algorithm is trained such that it picks up patterns that resemble a cat, and not a dog. You repeat the process until the algorithm makes as few errors as possible.
That said, as the programmer, you don’t hard-code features that resemble those of a cat or a dog; you merely specify guidelines so that the algorithm knows what it’s supposed to do. Given that difference, the word “algorithm” in the context of deep learning differs from ordinary algorithms in that the former is far more independent of the programmer, while the latter has to be spoon-fed specific steps to solve the desired problem.
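The supervised-learning loop above can be sketched in a few lines. The example below is a deliberately tiny toy, not a real deep learning library: a single “knob” (a decision threshold) is nudged whenever the algorithm mislabels an example, with the labels 0 and 1 standing in for “dog” and “cat.”

```python
# Toy sketch of supervised learning: the programmer supplies only labeled
# examples, and the algorithm tunes its own "knob" to reduce its errors.

def train_threshold_classifier(examples, epochs=20, lr=0.1):
    """Learn a 1-D decision threshold from (feature, label) pairs.

    Label 1 stands in for "cat", label 0 for "dog" -- purely illustrative.
    """
    threshold = 0.0  # the single parameter the algorithm tunes on its own
    for _ in range(epochs):
        for x, label in examples:
            prediction = 1 if x > threshold else 0
            error = label - prediction   # -1, 0, or +1
            threshold -= lr * error      # nudge the knob toward fewer errors
    return threshold

# Labeled data: inputs and their desired outputs, nothing more.
data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
t = train_threshold_classifier(data)
print([1 if x > t else 0 for x, _ in data])  # matches the labels [0, 0, 1, 1]
```

Notice that the code never states *why* a 0.9 is a “cat”; the rule separating the two classes emerges purely from the labeled examples.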
History of Deep Learning
It’s no surprise that many of today’s advanced deep learning algorithms work very much like how a human learns new concepts. If we trace back the history of deep learning, you should notice that the brain is a huge inspiration behind the backbone algorithms.
In a deep learning lecture, Yann LeCun, one of the three godfathers of artificial intelligence, presented the history of inspiration for deep learning, centering on the idea of mimicking how the brain works.
In particular, the history dates back to the field of cybernetics. “It started in the 1940s with McCulloch and Pitts who came up with the idea that neurons are threshold units with on and off states. You could build a Boolean circuit by connecting neurons with each other and conduct logical inference with neurons.”
Continuing its development, Donald Hebb “had the idea that neurons in the brain learn by modifying the strength of the connections between neurons. This is called [Hebbian] learning, where if two neurons are fired together, then the connection linked between them increases; if they don’t fire together, then the connection decreases.”
According to LeCun, the brain is basically a logical inference machine because neurons are binary. Neurons compute a weighted sum of their inputs and compare that sum to a threshold: a neuron turns on if the sum is above the threshold and off if it’s below, which is a simplified view of how neural networks work.
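That threshold-unit view can be written out directly. Below is a minimal McCulloch-Pitts-style neuron; the weights and threshold are chosen by hand to implement a logical AND, illustrating the “logical inference with neurons” idea from the quote above.

```python
# A McCulloch-Pitts-style threshold unit: weighted sum of binary inputs,
# compared against a threshold -- on above it, off below it.

def threshold_neuron(inputs, weights, threshold):
    """Return 1 if the weighted sum of inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Hand-picked weights implement an AND gate: the unit fires only
# when both inputs are on (1 + 1 = 2 > 1.5).
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", threshold_neuron((a, b), weights=(1, 1), threshold=1.5))
```

Wiring such units together yields Boolean circuits, exactly the observation that started the field in the 1940s; modern networks replace the hand-picked weights with learned ones.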
Furthermore, LeCun gave the following remark, which emphasizes the distinction between taking inspiration from the brain and completely emulating it – a point often misunderstood about deep learning.
Deep learning is inspired by what people have observed about the brain, but the inspiration is just an inspiration. It’s not, the attempt is not to copy the brain because there’s a lot of details about the brain that are irrelevant, and that we don’t know if they are relevant actually to human intelligence. So the inspiration is at kind of the conceptual level.
LeCun added an analogy for this fragile distinction between inspiration and emulation of the brain: the brain is to deep learning what birds are to airplanes. Airplanes are highly inspired by birds, but their details are extremely different. Their underlying principles of flight, like generating lift by propelling themselves through the air, are indeed parallel, but airplanes don’t have feathers, nor do they flap their wings.
Deep Learning and Human Learning: Similarities
Now that we’ve established the history and inspiration behind deep learning, let’s dive into the similarities between deep learning algorithms and how a human learns. Do note that, like the brain, the way humans learn is a mere inspiration and is not completely identical to how the algorithms learn.
Likewise, a considerable part of our understanding of deep learning concepts is simply an abstraction of how we interpret the way machines learn. As discussed here, experts are still struggling to interpret how these machines actually work, and thus all the plausible explanations are merely rough approximations of the actual learning algorithm from the machine’s perspective.
Convolutional Neural Network
A Convolutional Neural Network is a type of neural network architecture popularly used for Computer Vision tasks, including image classification, object detection, image recognition, image segmentation, and many others.
A Convolutional Neural Network combines three architectural ideas: local receptive fields, shared weights, and spatial sub-sampling. The general idea is to first recognize elementary visual features of an image, like corners and edges. From there, the network gradually builds these features up into more advanced parts such as shapes, eyes, and faces, and finally into the category we would like to, say, classify.
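The “local receptive field + shared weights” idea boils down to sliding one small kernel across the whole image, so every location is examined with the same few weights. The sketch below is a bare-bones convolution in pure Python with a hand-made vertical-edge kernel; the image and kernel values are illustrative, not from any real model.

```python
# Local receptive fields with shared weights: one 3x3 kernel is reused at
# every position of the image, responding strongly to vertical edges.

def convolve2d(image, kernel):
    """Valid 2-D convolution (strictly, cross-correlation) over nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(kernel[di][dj] * image[i + di][j + dj]
                    for di in range(kh) for dj in range(kw))
            row.append(s)
        out.append(row)
    return out

# A 5x5 "image": bright left columns, dark right columns -> one vertical edge.
image = [[1, 1, 0, 0, 0]] * 5
edge_kernel = [[1, 0, -1],
               [1, 0, -1],
               [1, 0, -1]]
response = convolve2d(image, edge_kernel)
print(response[0])  # strongest where the kernel straddles the brightness change
```

A real CNN stacks many such learned kernels, interleaved with sub-sampling, so that edge responses like this one get composed into shapes and, eventually, whole object categories.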
What you should notice is that the way you, as a human, classify or recognize things is based on an object’s or a being’s features. You know that it’s a cat because you’ve seen the features that make up a cat: a pair of eyes, ears, sometimes fluffy fur, four legs, etc. In another case, you know whether an old lady is your grandmother based on facial features that are distinctive and unique to your grandmother.
How a Convolutional Neural Network works is very much like that. Though you can recognize an object or another person in the blink of an eye, if you try to break down that “recognition,” your basis of judgment relies greatly on knowing which parts of a person’s face make up the eyes, and on the color, shape, and size of those eyes.
You need to “checklist” whether the known features of a person are present in the stranger you’re currently meeting. If it all checks out, then it’s very likely that you’re indeed meeting the person you know.
Interestingly, it’s also very common to say that a stranger looks like a person you know. This phenomenon likely happens because the stranger shares similar, but not identical, facial features with someone you know, which drives you to say, “hey, that person looks like x!”
Generative Adversarial Network
Advancing the concept of computer vision, there is another algorithm closely related to Convolutional Neural Networks. Now that we know how a machine recognizes objects or images, we can work toward a more advanced task: generating images – and this is where the Generative Adversarial Network comes into play.
As its name suggests, a Generative Adversarial Network is trained to generate media ranging from images to video to audio. Moreover, the generated output depends on what type of task you’d like to solve.
For instance, Generative Adversarial Networks can be trained to generate realistic images out of “thin air” – or noise, to be exact. The algorithm is trained to draw a custom image such that it can be categorized under the desired label.
To be more concrete, say the Generative Adversarial Network is trained to generate images of a cat. The network accomplishes this with the help of two sub-networks: a critic (discriminator) and an artist (generator).
The critic is usually a Convolutional Neural Network that can properly classify cat and non-cat images. On the flip side, the artist is usually a generative network – which may rely on convolutional techniques as well – that produces generated, fake cat images.
Note that the word “fake” is used because no cat identical to the generated one exists in the dataset; the term “real” refers to the actual images in the dataset.
The way a Generative Adversarial Network works is that the critic and the artist constantly compete with each other. On one hand, the artist has to draw ever more realistic outputs so it can fool the critic; on the other, the critic is continuously trained to distinguish real images from fake ones. If all goes well, the artist eventually becomes good enough that even its custom drawings are recognized as “real images” by an expert critic.
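The alternating critic/artist training can be sketched with a heavily simplified toy, in which single numbers stand in for images and the update rules are illustrative nudges rather than real gradient steps. The point is only the structure of the loop: the critic refines its idea of “real,” and the artist chases whatever the critic currently accepts.

```python
# Toy adversarial loop: numbers stand in for images, and simple nudges stand
# in for gradient updates. The critic learns what "real" looks like; the
# artist learns to produce something the critic would accept.
import random

random.seed(0)
REAL_MEAN = 5.0          # the "dataset": numbers near 5 play the role of real cat images

critic_estimate = 0.0    # the critic's current idea of what real data looks like
artist_output = -3.0     # the artist starts off producing something unrealistic

for step in range(200):
    real_sample = REAL_MEAN + random.uniform(-0.5, 0.5)
    # Critic step: pull its estimate toward freshly seen real data.
    critic_estimate += 0.05 * (real_sample - critic_estimate)
    # Artist step: pull its output toward whatever the critic calls real.
    artist_output += 0.05 * (critic_estimate - artist_output)

# After many rounds of competition, the artist's "drawing" sits close
# to the real data distribution.
print(round(critic_estimate, 2), round(artist_output, 2))
```

In a real GAN both players are neural networks trained by backpropagation on a classification loss, but the same chase dynamic drives the generator toward outputs the discriminator cannot tell apart from real data.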
If you think about it, children normally don’t draw realistic images at the start, perhaps beginning with a basic stickman. Gradually, they draw ever more realistic pictures of a person, until a drawing can hardly be told apart from an actual photograph. Furthermore, if there isn’t a competition between the artist and the critic – that is, if the critic never familiarizes itself with real photographs – a simple, unrealistic drawing can easily fool it.
Recurrent Neural Network
Out of all the deep learning algorithms presented here, I think Recurrent Neural Networks deserve the greatest attention. As their name suggests, Recurrent Neural Networks pass information recurrently over time, exhibiting temporal dynamic behavior.
Because of how they are structured, Recurrent Neural Networks are a great choice for sequential, or connected, tasks like speech recognition, text generation, translation, and many others. Like the other networks presented earlier, there are tons of flavors of Recurrent Neural Networks, each with a specific structure and use, but all sharing the same general idea of recurrence.
What’s interesting about Recurrent Neural Networks is that they model the way we often process sequential data, like text. When saying a sentence, for instance, the words we select are interrelated such that they convey the intended message. If one word is taken out of the sentence, it may be understood differently or even produce an overall meaning different from the one intended.
Recurrent Neural Networks ensure that each “piece of information” is carried over time, since the elements depend on one another, be they words, audio, or video frames. If we didn’t take into account all the words we said previously, our sentences might as well be gibberish that no one understands.
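The recurrence itself is a one-line update applied at every time step: the new hidden state depends on the current input *and* the previous hidden state, so information from earlier inputs keeps flowing forward. Below is a single-unit sketch with fixed, hand-picked scalar weights (a trained RNN would learn them).

```python
# Minimal single-unit RNN: the same update rule runs at every time step,
# and the hidden state carries information from all earlier inputs.
import math

def rnn_forward(inputs, w_in=0.5, w_rec=0.8):
    """Run a one-unit RNN over a sequence; return the hidden state at each step."""
    h = 0.0
    states = []
    for x in inputs:
        # New state depends on the current input AND the previous state.
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

# Changing an early element changes every later hidden state: the network
# "remembers" the earlier parts of the sequence.
a = rnn_forward([1.0, 0.0, 0.0, 0.0])
b = rnn_forward([0.0, 0.0, 0.0, 0.0])
print(a[-1] != b[-1])  # prints True: the first input still influences the final state
```

This sketch also hints at why long sequences are hard: the first input’s influence shrinks at every step, which is exactly the problem that gated variants like LSTMs were designed to ease.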
Furthermore, there is a technique called transfer learning that is applicable across deep learning algorithms, and Recurrent Neural Networks in particular benefit from it. Transfer learning, in general, is the idea of starting from a pre-trained deep learning algorithm, where your black box is already familiar with general concepts before specializing in a narrow task.
In the case of language modelling, your pre-trained model has prior knowledge of general, widely used words before diving into a narrow task like translation or sentiment analysis. More abstractly, the deep learning algorithm has already been trained on the language you want it to “speak” before it tackles a more concrete task. That way, it won’t be utterly surprised or confused by word choices when it comes to the narrower task.
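The reuse at the heart of transfer learning can be shown with a deliberately fake example: here, “pretraining” is replaced by a hand-made table of word scores (standing in for learned word representations), and the narrow task simply reuses that table instead of learning the language from scratch. All words and scores below are made up for illustration.

```python
# Toy transfer learning: general-purpose word knowledge (faked here with
# hand-made scores) is reused by a narrow downstream task.

# Step 1: "pretraining" on general text would yield word representations.
# We fake the result: each word gets a score on a -1..1 positivity axis.
pretrained_features = {
    "good": 0.9, "great": 1.0, "bad": -0.9, "awful": -1.0,
    "movie": 0.0, "the": 0.0, "was": 0.0,
}

def sentiment(sentence):
    """Narrow task: classify sentiment by reusing the pretrained features."""
    words = sentence.lower().split()
    score = sum(pretrained_features.get(w, 0.0) for w in words)
    return "positive" if score > 0 else "negative"

print(sentiment("The movie was great"))  # prints "positive"
print(sentiment("The movie was awful"))  # prints "negative"
```

The downstream task never had to learn what “great” means; that knowledge came for free from the pretraining step, which is exactly the advantage transfer learning provides.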
In a lecture by fast.ai on Natural Language Processing with Deep Learning, they showed how transfer learning provides better and more accurate results than deep learning algorithms trained from scratch. In particular, by first training the black box to generate a sequence of words, the task of sentiment analysis becomes easier for the algorithm to adapt to.
If you pause and think about it, it’s rather like how we as humans learn a language: before we can deduce the intent of a person’s speech, we had better familiarize ourselves with the meaning of its vocabulary and be able to construct sentences of our own. This can certainly be generalized to other narrow tasks like translation.
Likewise, if you want to translate an English sentence into French, you had better be familiar with both English and French words first, before telling someone the translation. In this manner, I can safely say that Natural Language Processing with the aid of deep learning resembles various methods we humans also use when it comes to understanding languages.
In addition, there’s a special technique in deep learning called Attention, proven to improve results as well. Though we won’t get into its details, the general idea is that some words carry greater significance within a sentence, so you need to pay more attention to them – hence the name. Indeed, it is reminiscent of how, at times, a particular word or group of words is more important than the rest of the sentence, and the other words have to build upon those keywords.
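The core of attention is just a weighting scheme: each word gets a relevance score, the scores are turned into weights with a softmax, and high-scoring words dominate whatever is computed next. The words and raw scores below are made up; in a real model the scores are produced by the network itself.

```python
# Sketch of the attention weighting: raw relevance scores -> softmax ->
# weights summing to 1, so the highest-scoring word dominates.
import math

def attention_weights(scores):
    """Softmax: turn raw relevance scores into weights that sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

words = ["the", "movie", "was", "brilliant"]
scores = [0.1, 1.0, 0.1, 3.0]   # made-up scores: "brilliant" is the keyword
weights = attention_weights(scores)

# The weights sum to 1, and "brilliant" receives the lion's share.
for w, a in zip(words, weights):
    print(f"{w}: {a:.2f}")
```

For sentiment, most of the signal in this sentence really does live in “brilliant,” so letting the network concentrate its weight there mirrors how we latch onto keywords when listening.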
Certainly, other types of deep learning architectures also take inspiration from how a human would tackle the same problem in real life. Nonetheless, the idea is clear: many deep learning algorithms are extensions or rough imitations of the way humans think.
Although we might someday be outperformed by our very own creations, realize that many of their backbone implementations are heavily inspired by ourselves and the uniqueness in the way we think and process information. In the meantime, enjoy being human.
Featured Image by James Bareham / The Verge.