Simple RNN

Simple RNN: the first foothold for understanding LSTM

*In this article “Densely Connected Layers” is written as “DCL,” and “Convolutional Neural Network” as “CNN.”

In the last article, I mentioned “When it comes to the structure of RNN, many study materials try to avoid showing that RNNs are also connections of neurons, as well as DCL or CNN.” Even if you manage to understand DCL and CNN, you can be suddenly left behind once you try to understand RNN because it looks like a different field. In the second section of this article, I am going to provide a some helps for more abstract understandings of DCL/CNN , which you need when you read most other study materials.

My explanation on this simple RNN is based on a chapter in a textbook published by Massachusetts Institute of Technology, which is also recommended in some deep learning courses of Stanford University.

First of all, you should keep it in mind that simple RNN are not useful in many cases, mainly because of vanishing/exploding gradient problem, which I am going to explain in the next article. LSTM is one major type of RNN used for tackling those problems. But without clear understanding forward/back propagation of RNN, I think many people would get stuck when they try to understand how LSTM works, especially during its back propagation stage. If you have tried climbing the mountain of understanding LSTM, but found yourself having to retreat back to the foot, I suggest that you read through this article on simple RNNs. It should help you to gain a solid foothold, and you would be ready for trying to climb the mountain again.

*This article is the second article of “A gentle introduction to the tiresome part of understanding RNN.”

1, A brief review on back propagation of DCL.

Simple RNNs are straightforward applications of DCL, but if you do not even have any ideas on DCL forward/back propagation, you will not be able to understand this article. If you more or less understand how back propagation of DCL works, you can skip this first section.

Deep learning is a part of machine learning. And most importantly, whether it is classical machine learning or deep learning, adjusting parameters is what machine learning is all about. Parameters mean elements of functions except for variants. For example when you get a very simple function f(x)=a + bx + cx^2 + dx^3, then x is a variant, and a, b, c, d are parameters. In case of classical machine learning algorithms, the number of those parameters are very limited because they were originally designed manually. Such functions for classical machine learning is useful for features found by humans, after trial and errors(feature engineering is a field of finding such effective features, manually). You adjust those parameters based on how different the outputs(estimated outcome of classification/regression) are from supervising vectors(the data prepared to show ideal answers).

In the last article I said neural networks are just mappings, whose inputs are vectors, matrices, or sequence data. In case of DCLs, inputs are vectors. Then what’s the number of parameters ? The answer depends on the the number of neurons and layers. In the example of DCL at the right side, the number of the connections of the neurons is the number of parameters(Would you like to try to count them? At least I would say “No.”). Unlike classical machine learning you no longer need to do feature engineering, but instead you need to design networks effective for each task and adjust a lot of parameters.

*I think the hype of AI comes from the fact that neural networks find features automatically. But the reality is difficulty of feature engineering was just replaced by difficulty of designing proper neural networks.

It is easy to imagine that you need an efficient way to adjust those parameters, and the method is called back propagation (or just backprop). As long as it is about DCL backprop, you can find a lot of well-made study materials on that, so I am not going to cover that topic precisely in this article series. Simply putting, during back propagation, in order to adjust parameters of a layer you need errors in the next layer. And in order calculate the errors of the next layer, you need errors in the next next layer.

*You should not think too much about what the “errors” exactly mean. Such “errors” are defined in this context, and you will see why you need them if you actually write down all the mathematical equations behind backprops of DCL.

The red arrows in the figure shows how errors of all the neurons in a layer propagate backward to a neuron in last layer. The figure shows only some sets of such errors propagating backward, but in practice you have to think about all the combinations of such red arrows in the whole back propagation(this link would give you some ideas on how DCLs work).

These points are minimum prerequisites for continuing reading this  RNN this article. But if you are planning to understand RNN forward/back propagation at  an abstract/mathematical level that you can read academic papers,  I highly recommend you to actually write down all the equations of DCL backprop. And if possible you should try to implement backprop of three-layer DCL.

2, Forward propagation of simple RNN

In fact the simple RNN which we are going to look at in this article has only three layers. From now on imagine that inputs of RNN come from the bottom and outputs go up. But RNNs have to keep information of earlier times steps during upcoming several time steps because as I mentioned in the last article RNNs are used for sequence data, the order of whose elements is important. In order to do that, information of the neurons in the middle layer of RNN propagate forward to the middle layer itself. Therefore in one time step of forward propagation of RNN, the input at the time step propagates forward as normal DCL, and the RNN gives out an output at the time step. And information of one neuron in the middle layer propagate forward to the other neurons like yellow arrows in the figure. And the information in the next neuron propagate forward to the other neurons, and this process is repeated. This is called recurrent connections of RNN.

*To be exact we are just looking at a type of recurrent connections. For example Elman RNNs have simpler recurrent connections. And recurrent connections of LSTM are more complicated.

Whether it is a simple one or not, basically RNN repeats this process of getting an input at every time step, giving out an output, and making recurrent connections to the RNN itself. But you need to keep the values of activated neurons at every time step, so virtually you need to consider the same RNNs duplicated for several time steps like the figure below. This is the idea of unfolding RNN. Depending on contexts, the whole unfolded DCLs with recurrent connections is also called an RNN.

In many situations, RNNs are simplified as below. If you have read through this article until this point, I bet you gained some better understanding of RNNs, so you should little by little get used to this more abstract, blackboxed  way of showing RNN.

You have seen that you can unfold an RNN, per time step. From now on I am going to show the simple RNN in a simpler way,  based on the MIT textbook which I recomment. The figure below shows how RNN propagate forward during two time steps (t-1), (t).

The input \boldsymbol{x}^{(t-1)}at time step(t-1) propagate forward as a normal DCL, and gives out the output \hat{\boldsymbol{y}} ^{(t)} (The notation on the \boldsymbol{y} ^{(t)} is called “hat,” and it means that the value is an estimated value. Whatever machine learning tasks you work on, the outputs of the functions are just estimations of ideal outcomes. You need to adjust parameters for better estimations. You should always be careful whether it is an actual value or an estimated value in the context of machine learning or statistics). But the most important parts are the middle layers.

*To be exact I should have drawn the middle layers as connections of two layers of neurons like the figure at the right side. But I made my figure closer to the chart in the MIT textbook, and also most other study materials show the combinations of the two neurons before/after activation as one neuron.

\boldsymbol{a}^{(t)} is just linear summations of \boldsymbol{x}^{(t)} (If you do not know what “linear summations” mean, please scroll this page a bit), and \boldsymbol{h}^{(t)} is a combination of activated values of \boldsymbol{a}^{(t)} and linear summations of \boldsymbol{h}^{(t-1)} from the last time step, with recurrent connections. The values of \boldsymbol{h}^{(t)} propagate forward in two ways. One is normal DCL forward propagation to \hat{\boldsymbol{y}} ^{(t)} and \boldsymbol{o}^{(t)}, and the other is recurrent connections to \boldsymbol{h}^{(t+1)} .

These are equations for each step of forward propagation.

  • \boldsymbol{a}^{(t)} = \boldsymbol{b} + \boldsymbol{W} \cdot \boldsymbol{h}^{(t-1)} + \boldsymbol{U} \cdot \boldsymbol{x}^{(t)}
  • \boldsymbol{h}^{(t)}= g(\boldsymbol{a}^{(t)})
  • \boldsymbol{o}^{(t)} = \boldsymbol{c} + \boldsymbol{V} \cdot \boldsymbol{h}^{(t)}
  • \hat{\boldsymbol{y}} ^{(t)} = f(\boldsymbol{o}^{(t)})

*Please forgive me for adding some mathematical equations on this article even though I pledged not to in the first article. You can skip the them, but for some people it is on the contrary more confusing if there are no equations. In case you are allergic to mathematics, I prescribed some treatments below.

*Linear summation is a type of weighted summation of some elements. Concretely, when you have a vector \boldsymbol{x}=(x_0, x_1, x_2), and weights \boldsymbol{w}=(w_0,w_1, w_2), then \boldsymbol{w}^T \cdot \boldsymbol{x} = w_0 \cdot x_0 + w_1 \cdot x_1 +w_2 \cdot x_2 is a linear summation of \boldsymbol{x}, and its weights are \boldsymbol{w}.

*When you see a product of a matrix and a vector, for example a product of \boldsymbol{W} and \boldsymbol{v}, you should clearly make an image of connections between two layers of a neural network. You can also say each element of \boldsymbol{u}} is a linear summations all the elements of \boldsymbol{v}} , and \boldsymbol{W} gives the weights for the summations.

A very important point is that you share the same parameters, in this case \boldsymbol{\theta \in \{\boldsymbol{U}, \boldsymbol{W}, \boldsymbol{b}, \boldsymbol{V}, \boldsymbol{c} \}}, at every time step. 

And you are likely to see this RNN in this blackboxed form.

3, The steps of back propagation of simple RNN

In the last article, I said “I have to say backprop of RNN, especially LSTM (a useful and mainstream type or RNN), is a monster of chain rules.” I did my best to make my PowerPoint on LSTM backprop straightforward. But looking at it again, the LSTM backprop part still looks like an electronic circuit, and it requires some patience from you to understand it. If you want to understand LSTM at a more mathematical level, understanding the flow of simple RNN backprop is indispensable, so I would like you to be patient while understanding this step (and you have to be even more patient while understanding LSTM backprop).

This might be a matter of my literacy, but explanations on RNN backprop are very frustrating for me in the points below.

  • Most explanations just show how to calculate gradients at each time step.
  • Most study materials are visually very poor.
  • Most explanations just emphasize that “errors are back propagating through time,” using tons of arrows, but they lack concrete instructions on how actually you renew parameters with those errors.

If you can relate to the feelings I mentioned above, the instructions from now on could somewhat help you. And I am going to share some study materials on simple RNNs in an external link so that you can gain a clear and mathematical understanding on how simple RNNs work.

Backprop of RNN , as long as you are thinking about simple RNNs, is not so different from that of DCLs. But you have to be careful about the meaning of errors in the context of RNN backprop. Back propagation through time (BPTT) is one of the major methods for RNN backprop, and I am sure most textbooks explain BPTT. But most study materials just emphasize that you need errors from all the time steps, and I think that is very misleading and confusing.

You need all the gradients to adjust parameters, but you do not necessarily need all the errors to calculate those gradients. Gradients in the context of machine learning mean partial derivatives of error functions (in this case J) with respect to certain parameters, and mathematically a gradient of J with respect to \boldsymbol{\theta \in \{\boldsymbol{U}, \boldsymbol{W}, \boldsymbol{b}^{(t)}, \boldsymbol{V}, \boldsymbol{c} \}}is denoted as ( \frac{\partial J}{\partial \boldsymbol{\theta}}  ). And another confusing point in many textbooks, including the MIT one, is that they give an impression that parameters depend on time steps. For example some study materials use notations like \frac{\partial J}{\partial \boldsymbol{\theta}^{(t)}}, and I think this gives an impression that this is a gradient with respect to the parameters at time step (t). In my opinion this gradient rather should be written as ( \frac{\partial J}{\partial \boldsymbol{\theta}} )^{(t)} . But many study materials denote gradients of those errors in the former way, so from now on let me use the notations which you can see in the figures in this article.

In order to calculate the gradient \frac{\partial J}{\partial \boldsymbol{x}^{(t)}} you need errors from time steps s (s \geq t) \quad (as you can see in the figure, in order to calculate a gradient in a colored frame, you need all the errors in the same color).

*Another confusing point is that the \frac{\partial J}{\partial \boldsymbol{\ast ^{(t)}}}, \boldsymbol{\ast} \in \{\boldsymbol{a}^{(t)}, \boldsymbol{h}^{(t)}, \boldsymbol{o}^{(t)}, \dots \} are correct notations, because \boldsymbol{\ast} are values of neurons after forward propagation. They depend on time steps, and these are very values which I have been calling “errors.” That is why parameters do not depend on time steps, whereas errors depend on time steps.

As I mentioned before, you share the same parameters at every time step. Again, please do not assume that parameters are different from time step to time step. It is gradients/errors (you need errors to calculate gradients) which depend on time step. And after calculating errors at every time step, you can finally adjust parameters one time, and that’s why this is called “back propagation through time.” (It is easy to imagine that this method can be very inefficient. If the input is the whole text on a Wikipedia link, you need to input all the sentences in the Wikipedia text to renew parameters one time. To solve this problem there is a backprop method named “truncated BPTT,” with which you renew parameters based on a part of a text. )

And after calculating those gradients \frac{\partial J}{\partial \boldsymbol{\theta}^{(t)}} you can take a summation of them: \frac{\partial J}{\partial \boldsymbol{\theta}}=\sum_{t=0}^{t=\tau}{\frac{\partial J}{\partial \boldsymbol{\theta}^{(t)}}}. With this gradient \frac{\partial J}{\partial \boldsymbol{\theta}} , you can finally renew the value of \boldsymbol{\theta} one time.

At the beginning of this article I mentioned that simple RNNs are no longer for practical uses, and that comes from exploding/vanishing problem of RNN. This problem was one of the reasons for the AI winter which lasted for some 20 years. In the next article I am going to write about LSTM, a fancier type of RNN, in the context of a history of neural network history.

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, including grammatical errors, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

Simple RNN

Prerequisites for understanding RNN at a more mathematical level

Writing the A gentle introduction to the tiresome part of understanding RNN Article Series on recurrent neural network (RNN) is nothing like a creative or ingenious idea. It is quite an ordinary topic. But still I am going to write my own new article on this ordinary topic because I have been frustrated by lack of sufficient explanations on RNN for slow learners like me.

I think many of readers of articles on this website at least know that RNN is a type of neural network used for AI tasks, such as time series prediction, machine translation, and voice recognition. But if you do not understand how RNNs work, especially during its back propagation, this blog series is for you.

After reading this articles series, I think you will be able to understand RNN in more mathematical and abstract ways. But in case some of the readers are allergic or intolerant to mathematics, I tried to use as little mathematics as possible.

Ideal prerequisite knowledge:

  • Some understanding on densely connected layers (or fully connected layers, multilayer perception) and how their forward/back propagation work.
  •  Some understanding on structure of Convolutional Neural Network.

*In this article “Densely Connected Layers” is written as “DCL,” and “Convolutional Neural Network” as “CNN.”

1, Difficulty of Understanding RNN

I bet a part of difficulty of understanding RNN comes from the variety of its structures. If you search “recurrent neural network” on Google Image or something, you will see what I mean. But that cannot be helped because RNN enables a variety of tasks.

Another major difficulty of understanding RNN is understanding its back propagation algorithm. I think some of you found it hard to understand chain rules in calculating back propagation of densely connected layers, where you have to make the most of linear algebra. And I have to say backprop of RNN, especially LSTM, is a monster of chain rules. I am planing to upload not only a blog post on RNN backprop, but also a presentation slides with animations to make it more understandable, in some external links.

In order to avoid such confusions, I am going to introduce a very simplified type of RNN, which I call a “simple RNN.” The RNN displayed as the head image of this article is a simple RNN.

2, How Neurons are Connected

    \begin{equation*}   1 = 3 - 2 \end{equation*}

How to connect neurons and how to activate them is what neural networks are all about. Structures of those neurons are easy to grasp as long as that is about DCL or CNN. But when it comes to the structure of RNN, many study materials try to avoid showing that RNNs are also connections of neurons, as well as DCL or CNN(*If you are not sure how neurons are connected in CNN, this link should be helpful. Draw a random digit in the square at the corner.). In fact the structure of RNN is also the same, and as long as it is a simple RNN, and it is not hard to visualize its structure.

Even though RNN is also connections of neurons, usually most RNN charts are simplified, using blackboxes. In case of simple RNN, most study material would display it as the chart below.

But that also cannot be helped because fancier RNN have more complicated connections of neurons, and there are no longer advantages of displaying RNN as connections of neurons, and you would need to understand RNN in more abstract way, I mean, as you see in most of textbooks.

I am going to explain details of simple RNN in the next article of this series.

3, Neural Networks as Mappings

If you still think that neural networks are something like magical spider webs or models of brain tissues, forget that. They are just ordinary mappings.

If you have been allergic to mathematics in your life, you might have never heard of the word “mapping.” If so, at least please keep it in mind that the equation y=f(x), which most people would have seen in compulsory education, is a part of mapping. If you get a value x, you get a value y corresponding to the x.

But in case of deep learning, x is a vector or a tensor, and it is denoted with \boldsymbol{x} . If you have never studied linear algebra , imagine that a vector is a column of Excel data (only one column), a matrix is a sheet of Excel data (with some rows and columns), and a tensor is some sheets of Excel data (each sheet does not necessarily contain only one column.)

CNNs are mainly used for image processing, so their inputs are usually image data. Image data are in many cases (3, hight, width) tensors because usually an image has red, blue, green channels, and the image in each channel can be expressed as a hight*width matrix (the “height” and the “width” are number of pixels, so they are discrete numbers).

The convolutional part of CNN (which I call “feature extraction part”) maps the tensors to a vector, and the last part is usually DCL, which works as classifier/regressor. At the end of the feature extraction part, you get a vector. I call it a “semantic vector” because the vector has information of “meaning” of the input image. In this link you can see maps of pictures plotted depending on the semantic vector. You can see that even if the pictures are not necessarily close pixelwise, they are close in terms of the “meanings” of the images.

In the example of a dog/cat classifier introduced by François Chollet, the developer of Keras, the CNN maps (3, 150, 150) tensors to 2-dimensional vectors, (1, 0) or (0, 1) for (dog, cat).

Wrapping up the points above, at least you should keep two points in mind: first, DCL is a classifier or a regressor, and CNN is a feature extractor used for image processing. And another important thing is, feature extraction parts of CNNs map images to vectors which are more related to the “meaning” of the image.

Importantly, I would like you to understand RNN this way. An RNN is also just a mapping.

*I recommend you to at least take a look at the beautiful pictures in this link. These pictures give you some insight into how CNN perceive images.

4, Problems of DCL and CNN, and needs for RNN

Taking an example of RNN task should be helpful for this topic. Probably machine translation is the most famous application of RNN, and it is also a good example of showing why DCL and CNN are not proper for some tasks. Its algorithms is out of the scope of this article series, but it would give you a good insight of some features of RNN. I prepared three sentences in German, English, and Japanese, which have the same meaning. Assume that each sentence is divided into some parts as shown below and that each vector corresponds to each part. In machine translation we want to convert a set of the vectors into another set of vectors.

Then let’s see why DCL and CNN are not proper for such task.

  • The input size is fixed: In case of the dog/cat classifier I have mentioned, even though the sizes of the input images varies, they were first molded into (3, 150, 150) tensors. But in machine translation, usually the length of the input is supposed to be flexible.
  • The order of inputs does not mater: In case of the dog/cat classifier the last section, even if the input is “cat,” “cat,” “dog” or “dog,” “cat,” “cat” there’s no difference. And in case of DCL, the network is symmetric, so even if you shuffle inputs, as long as you shuffle all of the input data in the same way, the DCL give out the same outcome . And if you have learned at least one foreign language, it is easy to imagine that the orders of vectors in sequence data matter in machine translation.

*It is said English language has phrase structure grammar, on the other hand Japanese language has dependency grammar. In English, the orders of words are important, but in Japanese as long as the particles and conjugations are correct, the orders of words are very flexible. In my impression, German grammar is between them. As long as you put the verb at the second position and the cases of the words are correct, the orders are also relatively flexible.

5, Sequence Data

We can say DCL and CNN are not useful when you want to process sequence data. Sequence data are a type of data which are lists of vectors. And importantly, the orders of the vectors matter. The number of vectors in sequence data is usually called time steps. A simple example of sequence data is meteorological data measured at a spot every ten minutes, for instance temperature, air pressure, wind velocity, humidity. In this case the data is recorded as 4-dimensional vector every ten minutes.

But this “time step” does not necessarily mean “time.” In case of natural language processing (including machine translation), which you I mentioned in the last section, the numberings of each vector denoting each part of sentences are “time steps.”

And RNNs are mappings from a sequence data to another sequence data.

*At least I found a paper on the RNN’s capability of universal approximation on many-to-one RNN task. But I have not found any papers on universal approximation of many-to-many RNN tasks. Please let me know if you find any clue on whether such approximation is possible. I am desperate to know that. 

6, Types of RNN Tasks

RNN tasks can be classified into some types depending on the lengths of input/output sequences (the “length” means the times steps of input/output sequence data).

If you want to predict the temperature in 24 hours, based on several time series data points in the last 96 hours, the task is many-to-one. If you sample data every ten minutes, the input size is 96*6=574 (the input data is a list of 574 vectors), and the output size is 1 (which is a value of temperature). Another example of many-to-one task is sentiment classification. If you want to judge whether a post on SNS is positive or negative, the input size is very flexible (the length of the post varies.) But the output size is one, which is (1, 0) or (0, 1), which denotes (positive, negative).

*The charts in this section are simplified model of RNN used for each task. Please keep it in mind that they are not 100% correct, but I tried to make them as exact as possible compared to those in other study materials.

Music/text generation can be one-to-many tasks. If you give the first sound/word you can generate a phrase.

Next, let’s look at many-to-many tasks. Machine translation and voice recognition are likely to be major examples of many-to-many tasks, but here name entity recognition seems to be a proper choice. Name entity recognition is task of finding proper noun in a sentence . For example if you got two sentences “He said, ‘Teddy bears on sale!’ ” and ‘He said, “Teddy Roosevelt was a great president!” ‘ judging whether the “Teddy” is a proper noun or a normal noun is name entity recognition.

Machine translation and voice recognition, which are more popular, are also many-to-many tasks, but they use more sophisticated models. In case of machine translation, the inputs are sentences in the original language, and the outputs are sentences in another language. When it comes to voice recognition, the input is data of air pressure at several time steps, and the output is the recognized word or sentence. Again, these are out of the scope of this article but I would like to introduce the models briefly.

Machine translation uses a type of RNN named sequence-to-sequence model (which is often called seq2seq model). This model is also very important for other natural language processes tasks in general, such as text summarization. A seq2seq model is divided into the encoder part and the decoder part. The encoder gives out a hidden state vector and it used as the input of the decoder part. And decoder part generates texts, using the output of the last time step as the input of next time step.

Voice recognition is also a famous application of RNN, but it also needs a special type of RNN.

*To be honest, I don’t know what is the state-of-the-art voice recognition algorithm. The example in this article is a combination of RNN and a collapsing function made using Connectionist Temporal Classification (CTC). In this model, the output of RNN is much longer than the recorded words or sentences, so a collapsing function reduces the output into next output with normal length.

You might have noticed that RNNs in the charts above are connected in both directions. Depending on the RNN tasks you need such bidirectional RNNs.  I think it is also easy to imagine that such networks are necessary. Again, machine translation is a good example.

And interestingly, image captioning, which enables a computer to describe a picture, is one-to-many-task. As the output is a sentence, it is easy to imagine that the output is “many.” If it is a one-to-many task, the input is supposed to be a vector.

Where does the input come from? I told you that I was obsessed with the beauty of the last vector of the feature extraction part of CNN. Surprisingly the the “beautiful” vector, which I call a “semantic vector” is the input of image captioning task (after some transformations, depending on the network models).

I think this articles includes major things you need to know as prerequisites when you want to understand RNN at more mathematical level. In the next article, I would like to explain the structure of a simple RNN, and how it forward propagate.

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

As Businesses Struggle With ML, Automation Offers a Solution

In recent years, machine learning technology and the business solutions it enables has developed into a big business in and of itself. According to the industry analysts at IDC, spending on ML and AI technology is set to grow to almost $98 billion per year by 2023. In practical terms, that figure represents a business environment where ML technology has become a key priority for companies of every kind.

That doesn’t mean that the path to adopting ML technology is easy for businesses. Far from it. In fact, survey data seems to indicate that businesses are still struggling to get their machine learning efforts up and running. According to one such survey, it currently takes the average business as many as 90 days to deploy a single machine learning model. For 20% of businesses, that number is even higher.

From the data, it seems clear that something is missing in the methodologies that most companies rely on to make meaningful use of machine learning in their business workflows. A closer look at the situation reveals that the vast majority of data workers (analysts, data scientists, etc.) spend an inordinate amount of time on infrastructure work – and not on creating and refining machine learning models.

Streamlining the ML Adoption Process

To fix that problem, businesses need to turn to another growing area of technology: automation. By leveraging the latest in automation technology, it’s now possible to build an automated machine learning pipeline (AutoML pipeline) that cuts down on the repetitive tasks that slow down ML deployments and lets data workers get back to the work they were hired to do. With the right customized solution in place, a business’s ML team can:

  • Reduce the time spent on data collection, cleaning, and ingestion
  • Minimize human errors in the development of ML models
  • Decentralize the ML development process to create an ML-as-a-service model with increased accessibility for all business stakeholders

In short, an AutoML pipeline turns the high-effort functions of the ML development process into quick, self-adjusting steps handled exclusively by machines. In some use cases, an AutoML pipeline can even allow non-technical stakeholders to self-create ML solutions tailored to specific business use cases with no expert help required. In that way, it can cut ML costs, shorten deployment time, and allow data scientists to focus on tackling more complex modelling work to develop custom ML solutions that are still outside the scope of available automation techniques.

The Parts of an AutoML Pipeline

Although the frameworks and tools used to create an AutoML pipeline can vary, they all contain elements that conform to the following areas:

  • Data Preprocessing – Taking available business data from a variety of sources, cleaning it, standardizing it, and conducting missing value imputation
  • Feature Engineering – Identifying features in the raw data set to create hypotheses for the model to base predictions on
  • Model Selection – Choosing the right ML approach or hyperparameters to produce the desired predictions
  • Tuning Hyperparameters – Determining which hyperparameters help the model achieve optimal performance

As anyone familiar with ML development can tell you, the steps in the above process tend to represent the majority of the labour and time-intensive work that goes into creating a model that’s ready for real-world business use. It is also in those steps where the lion’s share of business ML budgets get consumed, and where most of the typical delays occur.

The Limitations and Considerations for Using AutoML

Given the scope of the work that can now become part of an AutoML pipeline, it’s tempting to imagine it as a panacea – something that will allow a business to reduce its reliance on data scientists going forward. Right now, though, the technology can’t do that. At this stage, AutoML technology is still best used as a tool to augment the productivity of business data teams, not to supplant them altogether.

To that end, there are some considerations that businesses using AutoML will need to keep in mind to make sure they get reliable, repeatable, and value-generating results, including:

  • Transparency – Businesses must establish proper vetting procedures to make sure they understand the models created by their AutoML pipeline, so they can explain why it’s making the choices or predictions it’s making. In some industries, such as in medicine or finance, this could even fall under relevant regulatory requirements.
  • Extensibility – Making sure the AutoML framework may be expanded and modified to suit changing business needs or to tackle new challenges as they arise.
  • Monitoring and Maintenance – Since today’s AutoML technology isn’t a set-it-and-forget-it proposition, it’s important to establish processes for the monitoring and maintenance of the deployment so it can continue to produce useful and reliable ML models.

The Bottom Line

As it stands today, the convergence of automation and machine learning holds the promise of delivering ML models at scale for businesses, which would greatly speed up the adoption of the technology and lower barriers to entry for those who have yet to embrace it. On the whole, that’s great news both for the businesses that will benefit from increased access to ML technology, as well as for the legions of data professionals tasked with making it all work.

It’s important to note, of course, that complete end-to-end ML automation with no human intervention is still a long way off. While businesses should absolutely explore building an automated machine learning pipeline to speed up development time in their data operations, they shouldn’t lose sight of the fact that they still need plenty of high-skilled data scientists and analysts on their teams. It’s those specialists that can make appropriate and productive use of the technology. Without them, an AutoML pipeline would accomplish little more than telling the business what it wants to hear.

The good news is that the AutoML tools that exist right now are sufficient to alleviate many of the real-world problems businesses face in their road to ML adoption. As they become more commonplace, there’s little doubt that the lead time to deploy machine learning models is going to shrink correspondingly – and that businesses will enjoy higher ROI and enhanced outcomes as a result.

Optimize AI Talent: Perception from Across the Globe

Despite the AI hype, the AI skill gap is turning into some pariah while businesses are accelerating to become demigods.

Reports from the “Global Talent Competitiveness Index (GTCI) 2020” cover multiple parameters both national and organizational to generate insight for further action. This report compiles 70 variables including 132 national economies across the globe – based on all groups of income and at every developmental level.

The sole purpose of the GTCI report is to narrow down the skill gap by delivering the right data inputs. The figures mentioned in the report could be of value to private and public organizations.

GTCI report covered multiple themes that need to be addressed: –

As the race to embrace AI spurs, it is evident to address the challenges faced due to AI and how best these problems can be solved.

The pace at which AI is developing is transforming the way we work, forcing a technology shift, change in the corporate structure, changing the innovation system for AI professionals in every possible way.

There’s more that is needed to be done as AI and automation continue to affect the way we work.

  • Reskilling in workplaces to eliminate dearth of talent

As the role in AI keeps evolving, organizations need a larger workforce, especially to play technology roles such as AI engineers and AI specialists. Looking closely at the statistics you may not fail to notice that the number of AI job roles is on the rise, but there’s scarce talent.

Employers must take on reskilling as a critical measure. Else how will the technology market keep up with changing trends? Reskilling in the form of training or AI certifications should be emphasized. Having an in-house AI talent is an added advantage to the company.

  • Skill gap between growing countries (low performing and high performing) are widening

Based on the GTCI report, it is seen there is a skill gap happening not only across industries but between nations. The report also highlights which country lacks basic digital skills, and this highly gets contributed toward a digital divide between nations.

  • High-level of cooperation needed to embrace AI benefits

As much as the world shows concern toward embracing AI, not much has been done to achieve these transformations. And AI has huge potential to transform society and make it a better place to live. However, to embrace these benefits, corporations must engage in AI regulation.

From a talent acquisition perspective, this simply means employers will need more training and reskilling opportunities.

  • AI to allow nations to skip generations

On a technological front, AI makes it possible to skip generations in developed nations. Although, not common due to structural obstruction.

  • Cities are now competing to become talent magnets and AI hubs

As AI continues to hit the market, organizations are aggressively coming up with newer policies to attract and retain AI professionals.

No doubt, cities are striving to attract the right kind of talent as competition keeps increasing. As such many cities are competing in becoming core AI engines in transforming energy grids, transportation, and many other multiple segments. Cities are now becoming the main test beds for AI-based tools i.e. self-driven vehicles, tele-surveillance, and facial recognition.

  • Sustainable AI comes when the society is equally up for it

With certain communities not adopting and accepting the advent of AI, it is difficult to say whether these communities will not try to distort AI narratives. As a result, it is crucial for multiple stakeholders to embrace AI and developed the AI workforce in parallel.

Not to forget, regulators and policy-makers have an equal role to play to ensure there’s a smooth transition in jobs. As AI-induced transformation skyrockets, educators and leaders need to move quickly as the new generations’ complete focus is entirely based on doing their bit to the society.

Two decades passed ever since McKinsey declared the war for talent – particularly for high-performing employees. As organizations are extensively looking to hire the right talent, it is imperative to retain and attract talent at large.

Despite the unprecedented growth in AI technologies, it is near to being unanimous regarding having hold of organizations to master in AI, forget about retaining talent. They’re not even getting better at it.

Even top tech companies such as Google and Amazon, the demand for top talent outstrips the supply. Although you may find thousands of candidates applying for the same job role, the competition just gets tougher since such employers are tough nuts and pleasing them is not an easy task.

If these tech giants are finding it difficult to hire the right talent, you could imagine the plight of other companies.

Given the optimistic view regarding the technology future, it is much more challenging to convince that the war for talent truly resembles the war on talent.

The good news is organizations that look forward to adopting new technology and reskill their employees will most likely thrive in the competitive edge.

AI For Advertisers: How Data Analytics Can Change The Maths Of Advertising?

All Images Credit: Freepik

The task of understanding a customer’s journey and designing your marketing strategy accordingly can be difficult in this data-driven world. Today, the customer expresses their needs in myriad forms of requests.

Consumers express their needs and want attitudes, and values in various forms through search, comments, blogs, Tweets, “likes,” videos, and conversations and access such data across many channels like web, mobile, and face to face. Volume, variety, velocity and veracity of the data accumulated through these customer interactions are huge.

BigData and data analytics can be leveraged to understand several phases of the customer journey. There are risks involved in using Artificial Intelligence for the marketing data analysis of data breach and even manipulation. But, AI do have brighter prospects when it comes to marketing and advertiser applications.

As the CEO of a technology firm Chop Dawg and marketer, Joshua Davidson puts it, “AI-powered apps are going to be the future for us, and there are several industries that are ripe for this.” The mobile-first strategy of many enterprises has powered the use of AI for digital marketing and developing technologies and innovations to power industries with intelligent systems.

How AI and Machine learning are affecting customer journeys?

Any consumer journey begins with the recognition of a problem and then stages like initial consideration, active evaluation, purchase, and postpurchase come through up till the consumer journey is over. The need for identifying the purchasing and need patterns of the consumers and finding the buyer personas to strategize the marketing for them.

Need and Want Recognition:

Identifying a need is quite difficult as it is the most initial level of a consumer’s journey and it is more on the category level than at a brand level. Marketers and advertisers are relying on techniques like market research, web analytics, and data mining to build consumer profiles and buyer’s persona for understanding the needs and influencing the purchase of products. AI can help identify these wants and needs in real-time as the consumers usually express their needs and wants online and help build profiles more quickly.

AI technologies offered by several firms help in consumer profiling. Firms like Microsoft offers Azure that crunches billions of data points in seconds to determine the needs of consumers. It then personalizes web content on specific platforms in real-time to align with those status-updates. Consumer digital footprints are evolving through social media status updates, purchasing behavior, online comments and posts. Ai tends to update these profiles continuously through machine learning techniques.

Initial Consideration:

A key objective of advertising is to insert a brand into the consideration set of the consumers when they are looking for deliberate offerings. Advertising includes increasing the visibility of brands and emphasize on the key reasons for consideration. Advertisers currently use search optimization, paid search advertisements, organic search, or advertisement retargeting for finding the consideration and increase the probability of consumer consideration.

AI can leverage machine learning and data analytics to help with search, identify and rank functions of consumer consideration that can match the real-time considerations at any specific time. Take an example of Google Adwords, it analyzes the consumer data and helps advertisers make clearer distinctions between qualified and unqualified leads for better targeting.

Google uses AI to analyze the search-query data by considering, not only the keywords but also context words and phrases, consumer activity data and other BigData. Then, Google identifies valuable subsets of consumers and more accurate targeting.

Active Evaluation: 

When consumers narrow it down to a few choices of brands, advertisers need to insert trust and value among the consumers for brands. A common technique is to identify the higher purchase consumers and persuade them through persuasive content and advertisement. AI can support these tasks using some techniques:

Predictive Lead Scoring: Predictive lead scoring by leveraging machine learning techniques of predictive analytics to allow marketers to make accurate predictions related to the intent of purchase for consumers. A machine learning algorithm runs through a database of existing consumer data, then recognize trends and patterns and after processing the external data on consumer activities and interests, creates robust consumer profiles for advertisers.

Natural Language Generation: By leveraging the image, speech recognition and natural language generation, machine learning enables marketers to curate content while learning from the consumer behavior in real-time scenarios and adjusts the content according to the profiles on the fly.

Emotion AI: Marketers use emotion AI to understand consumer sentiment and feel about the brand in general. By tapping into the reviews, blogs or videos they understand the mood of customers. Marketers also use emotion AI to pretest advertisements before its release. The famous example of Kelloggs, which used emotion AI to help devise an advertising campaign for their cereal, eliminating the advertisement executions whenever the consumer engagement dropped.

Purchase: 

As the consumers decide which brands to choose and what it’s worth, advertising aims to move them out of the decision process and push for the purchase by reinforcing the value of the brand compared with its competition.

Advertisers can insert such value by emphasizing convenience and information about where to buy the product, how to buy the product and reassuring the value through warranties and guarantees. Many marketers also emphasize on rapid return policies and purchase incentives.

AI can completely change the purchase process through dynamic pricing, which encompasses real-time price adjustments on the basis of information such as demand and other consumer-behavior variables, seasonality, and competitor activities.

Post-Purchase: 

Aftersales services can be improved through intelligent systems using AI technologies and machine learning techniques. Marketers and advertisers can hire dedicated developers to design intelligent virtual agents or chatbots that can reinforce the value and performance of a brand among consumers.

Marketers can leverage an intelligent technique known as Propensity modeling to identify the most valuable customers on the basis of lifetime value, likelihood of reengagement, propensity to churn, and other key performance measures of interest. Then advertisers can personalize their communication with these customers on the basis of these data.

Conclusion:

AI has shifted the focus of advertisers and marketers towards the customer-first strategies and enhanced the heuristics of customer engagement. Machine learning and IoT(Internet of Things) has already changed the way customer interact with the brands and this transition has come at a time when advertisers and marketers are looking for new ways to tap into the customer mindset and buyer’s persona.

All Images Credit: Freepik

The importance of being Data Scientist

Header-Image by Clint Adair on Unsplash.

The incredible results of Machine Learning and Artificial Intelligence, Deep Learning in particular, could give the impression that Data Scientist are like magician. Just think of it. Recognising faces of people, translating from one language to another, diagnosing diseases from images, computing which product should be shown for us next to buy and so on from numbers only. Numbers which existed for centuries. What a perfect illusion. But it is only an illusion, as Data Scientist existed as well for centuries. However, there is a difference between the one from today compared to the one from the past: evolution.

The main activity of Data Scientist is to work with information also called data. Records of data are as old as mankind, but only within the 16 century did it include also numeric forms — as numbers started to gain more and more ground developing their own symbols. Numerical data, from a given phenomenon — being an experiment or the counts of sheep sold by week over the year –, was from early on saved in tabular form. Such a way to record data is interlinked with the supposition that information can be extracted from it, that knowledge — in form of functions — is hidden and awaits to be discovered. Collecting data and determining the function best fitting them let scientist to new insight into the law of nature right away: Galileo’s velocity law, Kepler’s planetary law, Newton theory of gravity etc.

Such incredible results where not possible without the data. In the past, one was able to collect data only as a scientist, an academic. In many instances, one needed to perform the experiment by himself. Gathering data was tiresome and very time consuming. No sensor which automatically measures the temperature or humidity, no computer on which all the data are written with the corresponding time stamp and are immediately available to be analysed. No, everything was performed manually: from the collection of the data to the tiresome computation.

More then that. Just think of Michael Faraday and Hermann Hertz and there experiments. Such endeavour where what we will call today an one-man-show. Both of them developed parts of the needed physics and tools, detailed the needed experiment settings, conducting the experiment and collect the data and, finally, computing the results. The same is true for many other experiments of their time. In biology Charles Darwin makes its case regarding evolution from the data collected in his expeditions on board of the Beagle over a period of 5 years, or Gregor Mendel which carry out a study of pea regarding the inherence of traits. In physics Blaise Pascal used the barometer to determine the atmospheric pressure or in chemistry Antoine Lavoisier discovers from many reaction in closed container that the total mass does not change over time. In that age, one person was enough to perform everything and was the reason why the last part, of a data scientist, could not be thought of without the rest. It was inseparable from the rest of the phenomenon.

With the advance of technology, theory and experimental tools was a specialisation gradually inescapable. As the experiments grow more and more complex, the background and condition in which the experiments were performed grow more and more complex. Newton managed to make first observation on light with a simple prism, but observing the line and bands from the light of the sun more than a century and half later by Joseph von Fraunhofer was a different matter. The small improvements over the centuries culminated in experiments like CERN or the Human Genome Project which would be impossible to be carried out by one person alone. Not only was it necessary to assign a different person with special skills for a separate task or subtask, but entire teams. CERN employs today around 17 500 people. Only in such a line of specialisation can one concentrate only on one task alone. Thus, some will have just the knowledge about the theory, some just of the tools of the experiment, other just how to collect the data and, again, some other just how to analyse best the recorded data.

If there is a specialisation regarding every part of the experiment, what makes Data Scientist so special? It is impossible to validate a theory, deciding which market strategy is best without the work of the Data Scientist. It is the reason why one starts today recording data in the first place. Not only the size of the experiment has grown in the past centuries, but also the size of the data. Gauss manage to determine the orbit of Ceres with less than 20 measurements, whereas the new picture about the black hole took 5 petabytes of recorded data. To put this in perspective, 1.5 petabytes corresponds to 33 billion photos or 66.5 years of HD-TV videos. If one includes also the time to eat and sleep, than 5 petabytes would be enough for a life time.

For Faraday and Hertz, and all the other scientist of their time, the goal was to find some relationship in the scarce data they painstakingly recorded. Due to time limitations, no special skills could be developed regarding only the part of analysing data. Not only are Data Scientist better equipped as the scientist of the past in analysing data, but they managed to develop new methods like Deep Learning, which have no mathematical foundation yet in spate of their success. Data Scientist developed over the centuries to the seldom branch of science which bring together what the scientific specialisation was forced to split.

What was impossible to conceive in the 19 century, became more and more a reality at the end of the 20 century and developed to a stand alone discipline at the beginning of the 21 century. Such a development is not only natural, but also the ground for the development of A.I. in general. The mathematical tools needed for such an endeavour where already developed by the half of the 20 century in the period when computing power was scars. Although the mathematical methods were present for everyone, to understand them and learn how to apply them developed quite differently within every individual field in which Machine Learning/A.I. was applied. The way the same method would be applied by a physicist, a chemist, a biologist or an economist would differ so radical, that different words emerged which lead to different langues for similar algorithms. Even today, when Data Science has became a independent branch, two different Data Scientists from different application background could find it difficult to understand each other only from a language point of view. The moment they look at the methods and code the differences will slowly melt away.

Finding a universal language for Data Science is one of the next important steps in the development of A.I. Then it would be possible for a Data Scientist to successfully finish a project in industry, turn to a new one in physics, then biology and returning to industry without much need to learn special new languages in order to be able to perform each tasks. It would be possible to concentrate on that what a Data Scientist does best: find the best algorithm. In other words, a Data Scientist could resolve problems independent of the background the problem was stated.

This is the most important aspect that distinguish the Data Scientist. A mathematician is limited to solve problems in mathematics alone, a physicist is able to solve problems only in physics, a biologist problems only in biology. With a unique language regarding the methods and strategies to solve Machine Learning/A.I. problems, a Data Scientist can solve a problem independent of the field. Specialisation put different branches of science at drift from each other, but it is the evolution of the role of the Data Scientist to synthesize from all of them and find the quintessence in a language which transpire beyond all the field of science. The emerging language of Data Science is a new building block, a new mathematical language of nature.

Although such a perspective does not yet exists, the principal component of Machine Learning/A.I. already have such proprieties partially in form of data. Because predicting for example the numbers of eggs sold by a company or the numbers of patients which developed immune bacteria to a specific antibiotic in all hospital in a country can be performed by the same prediction method. The data do not carry any information about the entities which are being predicted. It does not matter anymore if the data are from Faraday’s experiment, CERN of Human Genome. The same data set and its corresponding prediction could stand literary for anything. Thus, the result of the prediction — what we would call for a human being intuition and/or estimation — would be independent of the domain, the area of knowledge it originated.

It also lies at the very heart of A.I., the dream of researcher to create self acting entities, that is machines with consciousness. This implies that the algorithms must be able to determine which task, model is relevant at a given moment. It would be to cumbersome to have a model for every task and and every field and then try to connect them all in one. The independence of scientific language, like of data, is thus a mandatory step. It also means that developing A.I. is not only connected to develop a new consciousness, but, and most important, to the development of our one.

Visual Question Answering with Keras – Part 2: Making Computers Intelligent to answer from images

Making Computers Intelligent to answer from images

This is my second blog on Visual Question Answering, in the last blog, I have introduced to VQA, available datasets and some of the real-life applications of VQA. If you have not gone through then I would highly recommend you to go through it. Click here for more details about it.

In this blog post, I will walk through the implementation of VQA in Keras.

You can download the dataset from here: https://visualqa.org/index.html. All my experiments were performed with VQA v2 and I have used a very tiny subset of entire dataset i.e all samples for training and testing from the validation set.

Table of contents:

  1. Preprocessing Data
  2. Process overview for VQA
  3. Data Preprocessing – Images
  4. Data Preprocessing through the spaCy library- Questions
  5. Model Architecture
  6. Defining model parameters
  7. Evaluating the model
  8. Final Thought
  9. References

NOTE: The purpose of this blog is not to get the state-of-art performance on VQA. But the idea is to get familiar with the concept. All my experiments were performed with the validation set only.

Full code on my Github here.


1. Preprocessing Data:

If you have downloaded the dataset then the question and answers (called as annotations) are in JSON format. I have provided the code to extract the questions, annotations and other useful information in my Github repository. All extracted information is stored in .txt file format. After executing code the preprocessing directory will have the following structure.

All text files will be used for training.

 

2. Process overview for VQA:

As we have discussed in previous post visual question answering is broken down into 2 broad-spectrum i.e. vision and text.  I will represent the Neural Network approach to this problem using the Convolutional Neural Network (for image data) and Recurrent Neural Network(for text data). 

If you are not familiar with RNN (more precisely LSTM) then I would highly recommend you to go through Colah’s blog and Andrej Karpathy blog. The concepts discussed in this blogs are extensively used in my post.

The main idea is to get features for images from CNN and features for the text from RNN and finally combine them to generate the answer by passing them through some fully connected layers. The below figure shows the same idea.

 

I have used VGG-16 to extract the features from the image and LSTM layers to extract the features from questions and combining them to get the answer.

3. Data Preprocessing – Images:

Images are nothing but one of the input to our model. But as you already may know that before feeding images to the model we need to convert into the fixed-size vector.

So we need to convert every image into a fixed-size vector then it can be fed to the neural network. For this, we will use the VGG-16 pretrained model. VGG-16 model architecture is trained on millions on the Imagenet dataset to classify the image into one of 1000 classes. Here our task is not to classify the image but to get the bottleneck features from the second last layer.

Hence after removing the softmax layer, we get a 4096-dimensional vector representation (bottleneck features) for each image.

Image Source: https://www.cs.toronto.edu/~frossard/post/vgg16/

 

For the VQA dataset, the images are from the COCO dataset and each image has unique id associated with it. All these images are passed through the VGG-16 architecture and their vector representation is stored in the “.mat” file along with id. So in actual, we need not have to implement VGG-16 architecture instead we just do look up into file with the id of the image at hand and we will get a 4096-dimensional vector representation for the image.

4. Data Preprocessing through the spaCy library- Questions:

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. As we have converted images into a fixed 4096-dimensional vector we also need to convert questions into a fixed-size vector representation. For installing spaCy click here

You might know that for training word embeddings in Keras we have a layer called an Embedding layer which takes a word and embeds it into a higher dimensional vector representation. But by using the spaCy library we do not have to train the get the vector representation in higher dimensions.

 

This model is actually trained on billions of tokens of the large corpus. So we just need to call the vector method of spaCy class and will get vector representation for word.

After fitting, the vector method on tokens of each question will get the 300-dimensional fixed representation for each word.

5. Model Architecture:

In our problem the input consists of two parts i.e an image vector, and a question, we cannot use the Sequential API of the Keras library. For this reason, we use the Functional API which allows us to create multiple models and finally merge models.

The below picture shows the high-level architecture idea of submodules of neural network.

After concatenating the 2 different models the summary will look like the following.

The below plot helps us to visualize neural network architecture and to understand the two types of input:

 

6. Defining model parameters:

The hyperparameters that we are going to use for our model is defined as follows:

If you know what this parameter means then you can play around it and can get better results.

Time Taken: I used the GPU on https://colab.research.google.com and hence it took me approximately 2 hours to train the model for 5 epochs. However, if you train it on a PC without GPU, it could take more time depending on the configuration of your machine.

7. Evaluating the model:

Since I have used the very small dataset for performing these experiments I am not able to get very good accuracy. The below code will calculate the accuracy of the model.

 

Since I have trained a model multiple times with different parameters you will not get the same accuracy as me. If you want you can directly download mode.h5 file from my google drive.

 

8. Final Thoughts:

One of the interesting thing about VQA is that it a completely new field. So there is absolutely no end to what you can do to solve this problem. Below are some tips while replicating the code.

  1. Start with a very small subset of data: When you start implementing I suggest you start with a very small amount of data. Because once you are ready with the whole setup then you can scale it any time.
  2. Understand the code: Understanding code line by line is very much helpful to match your theoretical knowledge. So for that, I suggest you can take very few samples(maybe 20 or less) and run a small chunk (2 to 3 lines) of code to get the functionality of each part.
  3. Be patient: One of the mistakes that I did while starting with this project was to do everything at one go. If you get some error while replicating code spend 4 to 5 days harder on that. Even after that if you won’t able to solve, I would suggest you resume after a break of 1 or 2 days. 

VQA is the intersection of NLP and CV and hopefully, this project will give you a better understanding (more precisely practically) with most of the deep learning concepts.

If you want to improve the performance of the model below are few tips you can try:

  1. Use larger datasets
  2. Try Building more complex models like Attention, etc
  3. Try using other pre-trained word embeddings like Glove 
  4. Try using a different architecture 
  5. Do more hyperparameter tuning

The list is endless and it goes on.

In the blog, I have not provided the complete code you can get it from my Github repository.

9. References:

  1. https://blog.floydhub.com/asking-questions-to-images-with-deep-learning/
  2. https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/
  3. https://github.com/sominwadhwa/vqamd_floyd

The Future of AI in Dental Technology

As we develop more advanced technology, we begin to learn that artificial intelligence can have more and more of an impact on our lives and industries that we have gotten used to being the same over the past decades. One of those industries is dentistry. In your lifetime, you’ve probably not seen many changes in technology, but a boom around artificial intelligence and technology has opened the door for AI in dental technologies.

How Can AI Help?

Though dentists take a lot of pride in their craft and career, most acknowledge that AI can do some things that they can’t do or would make their job easier if they didn’t have to do. AI can perform a number of both simple and advanced tasks. Let’s take a look at some areas that many in the dental industry feel that AI can be of assistance.

Repetitive, Menial Tasks

The most obvious area that AI can help out when it comes to dentistry is with repetitive and menial simple tasks. There are many administrative tasks in the dentistry industry that can be sped up and made more cost-effective with the use of AI. If we can train a computer to do some of these tasks, we may be able to free up more time for our dentists to focus on more important matters and improve their job performance as well. One primary use of AI is virtual consultations that offices like Philly Braces are offering. This saves patients time when they come in as the Doctor already knows what the next steps in their treatment will be.

Using AI to do some basic computer tasks is already being done on a small scale by some, but we have yet to see a very large scale implementation of this technology. We would expect that to happen soon, with how promising and cost-effective the technology has proven to be.

Reducing Misdiagnosis

One area that many think that AI can help a lot in is misdiagnosis. Though dentists do their best, there is still a nearly 20% misdiagnosis rate when reading x-rays in dentistry. We like to think that a human can read an x-ray better, but this may not be the case. AI technology can certainly be trained to read an x-ray and there have been some trials to suggest that they can do it better and identify key conditions that we often misread.

A world with AI diagnosis that is accurate and quicker will save time, money, and lead to better dental health among patients. It hasn’t yet come to fruition, but this seems to be the next major step for AI in dentistry.

Artificial Intelligence Assistants

Once it has been demonstrated that AI can perform a range of tasks that are useful to dentists, the next logical step is to combine those skills to make a fully-functional AI dental assistant. A machine like this has not yet been developed, but we can imagine that it would be an interface that could be spoken to similar to Alexa. The dentist would request vital information and other health history data from a patient or set of patients to assist in the treatment process. This would undoubtedly be a huge step forward and bring a lot of computing power into the average dentist office.

Conclusion

It’s clear that AI has a bright future in the dental industry and has already shown some of the essential skills that it can help with in order to provide more comprehensive and accurate care to dental patients. Some offices like Westwood Orthodontics already use AI in the form of a virtual consult to diagnose issues and provide treatment options before patients actually step foot in the office. Though not nearly all applications that AI can provide have been explored, we are well on our way to discovering the vast benefits of artificial intelligence for both patients and practices in the dental healthcare industry.


Lisa Gao, DDS, MS | Westwood Orthodontics 1033 Gayley Ave #106, Los Angeles California 90024, 310-870-1823

Accelerate your AI Skills Today: A Million Dollar Job!

The skyrocketing salaries ($1m per year) of AI engineers is not a hype. It is the fact of current corporate world, where you will witness a shift that is inevitable.

We’ve already set our feet at the edge of the technological revolution. A revolution that is at the verge of altering the way we live and work. As the fact suggests, humanity has fundamentally developed human production in three revolutions, and we’re now entering the fourth revolution. In its scope, the fourth revolution projects a transformation that is unlike anything we humans have ever experienced.

  • The first revolution had the world transformed from rural to urban
  • the emergence of mass production in the second revolution
  • third introduced the digital revolution
  • The fourth industrial revolution is anxious to integrate technologies into our lives.

And all thanks to artificial intelligence (AI). An advanced technology that surrounds us, from virtual assistants to software that translates to self-driving cars.

The rise of AI at an exponential rate has disrupted almost every industry. So much so that AI is being rated as one-million-dollar profession.

Did this grab your attention? It did?

Now, what if we were to tell you that the salary compensation for AI experts has grown dramatically. AI and machine learning are fields that have a mountain of demand in the tech industry today but has sparse supply.

AI field is growing at a quicker pace and salaries are skyrocketing! Read it for yourself to know what AI experts, AI researchers and any other AI talent are commanding today.

  • A top-class AI research laboratory, OpenAI says that techies in the AI field are projected to earn a salary compensation ranging between $300 to $500k for fresh graduates. However, expert professionals could earn anywhere up to $1m.
  • Whopping salary package of above 100 million yen that amounts to $1m is being offered to AI geniuses by a Japanese firm, Start Today. A firm that operates a fashion shopping website named Zozotown.

Does this leave you with a question – Is this a right opportunity for you to jump in the field and make hay while the sun is shining? 

And the answer to this question is – yes, it is the right opportunity for any developer seeking a role in the AI industry. It can be your chance to bridge the skill shortage in the AI field either by upskilling or reskilling yourself in the field of AI.

There are a wide varieties of roles available for an AI enthusiast like you. And certain areas are like AI Engineers and AI Researchers are high in demand, as there are not many professionals who have robust AI knowledge.

According to a job report, “The Future of Jobs 2018,” a prediction was made suggesting that machines and algorithms will create around 133 million new job roles by 2022.

AI and machine learning will dominate the tech world. The World Economic Forum says that several sectors have started embracing AI and machine learning to tackle challenges in certain fields such as advertising, supply chain, manufacturing, smart cities, drones, and cybersecurity.

Unraveling the AI realm

From chatbots to financial planners, AI is impacting the way businesses function on a day-today basis. AI makes the work simpler, as it provides variables, which makes the work more streamlined.

Alright! You know that

  • the demand for AI professionals is rising exponentially and that there is just a trickle of supply
  • the AI professionals are demanding skyrocketing salaries

However, beyond that how much more do you know about AI?

Considering the fact that our lives have already been touched by AI (think Alexa, and Siri), it is just a matter of time when AI will become an indispensable part of our lives.

As Gartner predicts that 2020 will be an important year for business growth in AI. Thus, it is possible to witness significant sparks for employment growth. Though AI predicts to diminish 1.8 million jobs, it is also said to replace it with 2.3 million jobs that will be created. As we look forward to stepping into 2020, AI-related job roles are set to make positive progress of achieving 2 million net-new employments by 2025.

With AI promising to score fat paychecks that would reach millions, AI experts are struggling to find new ways to pick up nouveau skills. However, one of the biggest impacts that affect the job market today is the scarcity of talent in this field.

The best way to stay relevant and employable in AI is probably by “reskilling,” and “upskilling.” And  AI certifications is considered ideal for those in the current workforce.

Looking to upskill yourself – here’s how you can become an AI engineer today.

Top three ways to enhance your artificial intelligence career:

  1. Acquire skills in Statistics and Machine Learning: If you’re getting into the field of machine learning, it is crucial that you have in-depth knowledge of statistics. Statistics is considered a prerequisite to the ML field. Both the fields are tightly related. Machine learning models are created to make accurate predictions while statistical models do the job of interpreting the relationship between variables. Many ML techniques heavily rely on the theory obtained through statistics. Thus, having extensive knowledge in statistics help initiate the first step towards an AI career.
  2. Online certification programs in AI skills: Opting for AI certifications will boost your credibility amongst potential employers. Certifications will also enhance your earning potential and increase your marketability. If you’re looking for a change and to be a part of something impactful; join the AI bandwagon. The IT industry is growing at breakneck speed; it is now that businesses are realizing how important it is to hire professionals with certain skillsets. Specifically, those who are certified in AI are becoming sought after in the job market.
  3. Hands-on experience: There’s a vast difference in theory and practical knowledge. One needs to familiarize themselves with the latest tools and technologies used by the industry. This is possible only if the individual is willing to work on projects and build things from scratch.

Despite all the promises, AI does prove to be a threat to job holders, if they don’t upskill or reskill themselves. The upcoming AI revolution will definitely disrupt the way we work, however, it will leave room for humans to perform more creative jobs in the future corporate world.

So a word of advice is to be prepared and stay future ready.

Visual Question Answering with Keras – Part 1

This is Part I of II of the Article Series Visual Question Answering with Keras

Making Computers Intelligent to answer from images

If we look closer in the history of Artificial Intelligence (AI), the Deep Learning has gained more popularity in the recent years and has achieved the human-level performance in the tasks such as Speech Recognition, Image Classification, Object Detection, Machine Translation and so on. However, as humans, not only we but also a five-year child can normally perform these tasks without much inconvenience. But the development of such systems with these capabilities has always considered an ambitious goal for the researchers as well as for developers.

In this series of blog posts, I will cover an introduction to something called VQA (Visual Question Answering), its available datasets, the Neural Network approach for VQA and its implementation in Keras and the applications of this challenging problem in real life. 

Table of Contents:

1 Introduction

2 What is exactly Visual Question Answering?

3 Prerequisites

4 Datasets available for VQA

4.1 DAQUAR Dataset

4.2 CLEVR Dataset

4.3 FigureQA Dataset

4.4 VQA Dataset

5 Real-life applications of VQA

6 Conclusion

 

  1. Introduction:

Let’s say you are given a below picture along with one question. Can you answer it?

I expect confidently you all say it is the Kitchen without much inconvenience which is also the right answer. Even a five-year child who just started to learn things might answer this question correctly.

Alright, but can you write a computer program for such type of task that takes image and question about the image as an input and gives us answer as output?

Before the development of the Deep Neural Network, this problem was considered as one of the difficult, inconceivable and challenging problem for the AI researcher’s community. However, due to the recent advancement of Deep Learning the systems are capable of answering these questions with the promising result if we have a required dataset.

Now I hope you have got at least some intuition of a problem that we are going to discuss in this series of blog posts. Let’s try to formalize the problem in the below section.

  1. What is exactly Visual Question Answering?:

We can define, “Visual Question Answering(VQA) is a system that takes an image and natural language question about the image as an input and generates natural language answer as an output.”

VQA is a research area that requires an understanding of vision(Computer Vision)  as well as text(NLP). The main beauty of VQA is that the reasoning part is performed in the context of the image. So if we have an image with the corresponding question then the system must able to understand the image well in order to generate an appropriate answer. For example, if the question is the number of persons then the system must able to detect faces of the persons. To answer the color of the horse the system need to detect the objects in the image. Many of these common problems such as face detection, object detection, binary object classification(yes or no), etc. have been solved in the field of Computer Vision with good results.

To summarize a good VQA system must be able to address the typical problems of CV as well as NLP.

To get a better feel of VQA you can try online VQA demo by CloudCV. You just go to this link and try uploading the picture you want and ask the related question to the picture, the system will generate the answer to it.

 

  1. Prerequisites:

In the next post, I will walk you through the code for this problem using Keras. So I assume that you are familiar with:

  1. Fundamental concepts of Machine Learning
  2. Multi-Layered Perceptron
  3. Convolutional Neural Network
  4. Recurrent Neural Network (especially LSTM)
  5. Gradient Descent and Backpropagation
  6. Transfer Learning
  7. Hyperparameter Optimization
  8. Python and Keras syntax
  1. Datasets available for VQA:

As you know problems related to the CV or NLP the availability of the dataset is the key to solve the problem. The complex problems like VQA, the dataset must cover all possibilities of questions answers in real-world scenarios. In this section, I will cover some of the datasets available for VQA.

4.1 DAQUAR Dataset:

The DAQUAR dataset is the first dataset for VQA that contains only indoor scenes. It shows the accuracy of 50.2% on the human baseline. It contains images from the NYU_Depth dataset.

Example of DAQUAR dataset

Example of DAQUAR dataset

The main disadvantage of DAQUAR is the size of the dataset is very small to capture all possible indoor scenes.

4.2 CLEVR Dataset:

The CLEVR Dataset from Stanford contains the questions about the object of a different type, colors, shapes, sizes, and material.

It has

  • A training set of 70,000 images and 699,989 questions
  • A validation set of 15,000 images and 149,991 questions
  • A test set of 15,000 images and 14,988 questions

Image Source: https://cs.stanford.edu/people/jcjohns/clevr/?source=post_page

 

4.3 FigureQA Dataset:

FigureQA Dataset contains questions about the bar graphs, line plots, and pie charts. It has 1,327,368 questions for 100,000 images in the training set.

4.4 VQA Dataset:

As comapred to all datasets that we have seen so far VQA dataset is relatively larger. The VQA dataset contains open ended as well as multiple choice questions. VQA v2 dataset contains:

  • 82,783 training images from COCO (common objects in context) dataset
  • 40, 504 validation images and 81,434 validation images
  • 443,757 question-answer pairs for training images
  • 214,354 question-answer pairs for validation images.

As you might expect this dataset is very huge and contains 12.6 GB of training images only. I have used this dataset in the next post but a very small subset of it.

This dataset also contains abstract cartoon images. Each image has 3 questions and each question has 10 multiple choice answers.

  1. Real-life applications of VQA:

There are many applications of VQA. One of the famous applications is to help visually impaired people and blind peoples. In 2016, Microsoft has released the “Seeing AI” app for visually impaired people to describe the surrounding environment around them. You can watch this video for the prototype of the Seeing AI app.

Another application could be on social media or e-commerce sites. VQA can be also used for educational purposes.

  1. Conclusion:

I hope this explanation will give you a good idea of Visual Question Answering. In the next blog post, I will walk you through the code in Keras.

If you like my explanations, do provide some feedback, comments, etc. and stay tuned for the next post.