6 Steps of Process Mining – Infographic

Many Process Mining projects revolve mainly around selecting and introducing the right Process Mining tools. Relying on the right tool is of course an important aspect of a Process Mining project. Depending on whether the process analysis is a one-time exercise or continuous process monitoring, different tools come into question. Whether, for example, a BI system is already established, and whether a sophisticated authorization concept is required for the process analyses, also play a role in the selection, as do many other factors.

Nevertheless, it should not be forgotten that process mining is not primarily a tool but an analysis method: the first step is the reconstruction of the processes from operational IT systems into a process log (event log); the second step is a (core) graph analysis that visualizes the process flows, enriched with additional analysis and reporting elements. Companies that keep this perspective on process mining in sight can save considerable costs, because it allows them to concentrate on solution-oriented concepts.

However, completely independent of the tools, there is a very general procedure in data-driven process analysis that you should understand and that we describe in the following infographic:

DATANOMIQ Process Mining - 6 Steps of Doing Process Mining Analysis

6 Steps of Process Mining – Infographic PDF Download.

Interested in introducing Process Mining to your organization? Do not hesitate to get in touch with us!

DATANOMIQ is the independent consulting and service partner for business intelligence, process mining and data science. We are opening up the diverse possibilities offered by big data and artificial intelligence in all areas of the value chain. We rely on the best minds and the most comprehensive method and technology portfolio for the use of data for business optimization.

Top 5 Email Verification and Validation APIs for your Product

If you have spent some time running a website or online business, you know the importance of email.

What many see as a declining communication medium still holds immense value for digital marketers.

More than 330 billion emails are sent every day, even in 2022.

While email marketing is very effective, it is very difficult to do right. One of the key reasons is the many problems that email marketers face with their email lists. Are the email IDs correct? Do they contain spam traps? Are these disposable email addresses? There are a multitude of questions to deal with in email marketing and newsletter campaigns.

Email verification and validation APIs help us deal with these problems. These APIs integrate with your platform and automatically check all email addresses for spam traps, mistyped addresses, fake email IDs, and so on.


Today we will talk about the 5 best APIs you can use to validate and verify the email addresses in your mailing list. Using an API can be a game changer for many email marketers. Before we get into the top 5, let’s discuss why these APIs are so effective and how they work.

Why APIs are so efficient

The major reason APIs work so efficiently is that they do not require human supervision. APIs run automatically, and users do not have to configure them manually each time. Ease of use is one of many reasons to start using an email verification and validation API.

If you maintain a mailing list, you also want to know where your effort is going. All email marketers spend considerable time perfecting their emails. On top of that, they need to use an email marketing platform like Klaviyo. An API ensures that your hard work does not go in vain. By filtering out fake and disposable email IDs, you get a better idea of where your mailing list stands. As a result, when you use a platform like Klaviyo along with an email verification API, the results are much better. In case you want something other than Klaviyo, you can learn more about Klaviyo alternatives here.

How email verification and validation APIs work

Email verification and validation APIs work primarily in 7 ways:

  • Syntax Check
  • Address Name Detection
  • DEA (disposable email address) Detection
  • Spam Trap Detection
  • DNSBL and URI DNSBL Check
  • MX Record Lookup
  • Active Mailbox Check
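As a minimal sketch of two of these checks, the snippet below implements a syntax check and a DEA (disposable email address) check in plain Python. The regex and the disposable-domain list are illustrative assumptions, not a production rule set; real APIs also perform DNS and mailbox-level checks that cannot be done offline.

```python
import re

# Simplistic syntax rule: local part, "@", dotted domain.
# Real-world validators implement the much richer RFC 5321/5322 grammar.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")

# Hypothetical sample of disposable-email providers (illustrative only).
DISPOSABLE_DOMAINS = {"mailinator.com", "tempmail.dev"}

def check_email(address: str) -> dict:
    """Return the outcome of basic syntax and DEA checks."""
    syntax_ok = EMAIL_RE.match(address) is not None
    domain = address.rsplit("@", 1)[-1].lower() if "@" in address else ""
    return {
        "syntax_ok": syntax_ok,
        "disposable": domain in DISPOSABLE_DOMAINS,
    }

print(check_email("jane.doe@example.com"))
print(check_email("user@mailinator.com"))
```

The MX record lookup and active mailbox checks require live DNS and SMTP queries, which is exactly why delegating all seven checks to a hosted API is attractive.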

With the help of these email verification and validation methods, you will see much better results from your email marketing campaigns. On top of that, your emails will not be flagged as spam, which helps in building reputation and authority.

Now that we have some idea about what email verification APIs are and what they do, let’s head over to the list. 

1. Abstract API

Abstract API is one of the most popular email verification and validation APIs out there. Here are some of its key features:

  • MX record check
  • GDPR and CCPA compliant
  • Does not store any email
  • Role email check

If you have looked for email address validation API on the internet, you must have come across Abstract API. It is among the best in the business and also comes with affordable subscription plans.

Abstract API helps with bounce rate detection, spam signups, differentiating between personal and business email IDs, and a lot more. However, the most significant feature of Abstract API is that it allows up to 500 free email checks every month. That’s a great way to see whether the product works for you before subscribing to it.

Abstract API is user-friendly and budget-friendly, which makes it a top choice for many email marketers. Anyone new to using these tools can easily learn about them from Abstract API. For these reasons, Abstract API has the number one spot on our list. 

2. SendGrid Validation API

After Abstract API, the second product with top-notch features is SendGrid Validation API. Here are its key features:

  • Uses machine learning to verify email addresses in real-time
  • Accurately identifies all inactive or inaccurate email addresses
  • You can check how your email appears in different mailboxes
  • Gives risk scores for all email addresses

While most email verification and validation APIs work similarly, SendGrid Validation API takes it a notch higher with machine learning and artificial intelligence. Despite having advanced features and functionalities, SendGrid Validation API is not difficult to use.

SendGrid Validation API operates in the cloud and does not store any of your email addresses. On top of that, there are easy settings and configuration options that users can tweak. However, SendGrid Validation API does not have a free offering. There are only two plans, Pro and Premier, and users have to pay $89.95 per month to access SendGrid Validation API.

If you are looking for an advanced email verification and validation API, there is no need to look beyond SendGrid Validation API. It has everything you would need for a solid email marketing campaign, along with many additional features.

3. Captain Verify

Another email verification and validation API – Captain Verify – is a one-stop solution for all email verification needs. Here are its key features:

  • Get reports on the overall quality of your email address database
  • Affordable plans
  • Compliant with GDPR regulations
  • Export encrypted CSV files

Unlike other email verification and validation APIs, Captain Verify does not stop after checking emails for spam, fake or invalid addresses, and so on. It helps email marketers understand how their campaigns are performing and gives detailed reports on return on investment. It is one of the best APIs available for the overall growth of your mailing campaign.

If you are looking for something simple yet powerful, Captain Verify will be a great option. Along with the features we mentioned already, it also lets users filter and refine their email lists. It can help you understand the overall quality of your mailing list much better.

As you can see, Captain Verify ticks most of the boxes to be one of the best email verification and validation APIs out there. Anyone looking for a good email API should give it a go. The best thing is that users get all this and more at only $7 per 1000 emails. 

4. Mailgun

Mailgun earns the fourth spot on our list. However, that does not mean it is in any way inferior to the options discussed above. Here’s what it offers:

  • RFC standards compliant
  • Daily and hourly tracking of API usage
  • Bulk list validation tools for faster operations
  • Supports both CSV and JSON formats
  • Track bounce and unsubscribe rates

Email marketers around the world prefer Mailgun for all their email verification and validation needs. It has multiple features that allow users to check their mailing list for fakes and scams. Apart from that, it also gives users a good idea of how their marketing campaign is performing.

Mailgun enjoys high ratings across review platforms like Capterra and G2. People use it for a wide range of purposes, but email verification and validation remain the most important. Mailgun keeps track of bounce rates, hard bounce rates, and unsubscribe rates. With the help of these stats, email marketers can measure how their campaign is doing.

If you are looking for a simple email verification and validation tool, Mailgun can be a good choice. It is worth trying for anyone who wants to take their email marketing to the next level.

5. Hunter

Our last entry to the list is Hunter. It is a well-known API that is widely used by email marketers. Here’s what it gets right:

  • Compare your mailing list with the Hunter mailing list for comparative quality analysis
  • SMTP checks, domain information verification, and multi-layer validation
  • Easy integration with Google Sheets
  • Supports both CSV and .txt formats

Hunter gives what it calls confidence scores which represent how strong or weak your mailing list is. This email verification and validation tool follows all the checks that we mentioned earlier, including SMTP verification, gibberish detection, MX record checks, and more. These features have worked together to make Hunter one of the most popular email verification and validation tools.

Hunter email verification API integrates easily with any platform and has a user-friendly interface. It also has a free plan that lets users check up to 50 emails for free. Giving it a try without spending money is very useful for anyone looking for a new email verification and validation API.

If you are looking for an email finder and email verifier rolled into one, Hunter is the best solution. With so many features and functionalities, it is one of the favorite email verification and validation APIs for thousands of marketers and entrepreneurs.


When used correctly, email verification and validation APIs can give any online business a significant boost. As an email marketer, digital marketer, website owner, or entrepreneur, you should be using one of these APIs. If you aren’t using one already, find your top pick from our list of the 5 best email verification and validation APIs.

How to choose the best pre-trained model for your Convolutional Neural Network?

Introduction to Transfer Learning 

Let’s start by defining this term that is increasingly used in Data Science:

Transfer Learning refers to the set of methods that allow the transfer of knowledge acquired from solving a given problem to another problem.

Transfer Learning has been very successful with the rise of Deep Learning. Indeed, the models used in this field often require long computation times and substantial resources. However, by using pre-trained models as a starting point, Transfer Learning makes it possible to quickly develop high-performance models and efficiently solve complex problems in Computer Vision.

Usual Machine Learning Approach vs Transfer Learning

Like most Deep Learning techniques, Transfer Learning is strongly inspired by the process by which humans learn.

Let’s take the example of someone who masters the guitar and wants to learn to play the piano. He can capitalize on his knowledge of music to learn to play a new instrument. In the same way, a car recognition model can be quickly adapted to truck recognition.

How is Transfer Learning concretely implemented to solve Computer Vision problems?

Now that we have defined Transfer Learning, let’s look at its application to Deep Learning problems, a field in which it is currently enjoying great success.

The use of Transfer Learning methods in Deep Learning consists mainly in exploiting pre-trained neural networks.

Generally, these models correspond to very powerful algorithms that have been developed and trained on large databases and are now freely shared.

In this context, 2 types of strategies can be distinguished:

  1. Use of pre-trained models as feature extractors:

The architecture of Deep Learning models is very often presented as a stack of layers of neurons. These layers learn different features depending on the level at which they are located. The last layer (usually a fully connected layer, in the case of supervised learning) is used to obtain the final output. The figure below illustrates the architecture of a Deep Learning model used for cat/dog detection. The deeper the layer, the more specific features can be extracted.


Architecture of CNN

The idea is to reuse a pre-trained network without its final layer. This new network then works as a fixed feature extractor for other tasks.

To illustrate this strategy, let’s take the case where we want to create a model able to identify the species of a flower from its image. It is then possible to use the first layers of the convolutional neural network model AlexNet, initially trained on the ImageNet image database for image classification.

  2. Fine-tuning of pre-trained models:

This is a more complex technique, in which not only the last layer is replaced to perform classification or regression, but other layers are also selectively re-trained. Indeed, deep neural networks are highly configurable architectures with various hyperparameters. Moreover, while the first layers capture generic features, the last layers focus more on the specific task at hand.

So the idea is to freeze (i.e. fix the weights of) some layers during training and refine the rest to fit the problem.

This strategy makes it possible to reuse the knowledge embodied in the global architecture of the network and to exploit its state as a starting point for training. It thus yields better performance with a shorter training time.

The figure below summarizes the main Transfer Learning approaches commonly used in Deep Learning.


Re-use of pre-trained machine learning models in transfer learning

How to choose your pre-trained CNN ?

TensorFlow and PyTorch provide very accessible libraries of pre-trained models that integrate easily into your pipelines, making it simple to leverage the power of Transfer Learning.
In the first part you discovered what a pre-trained model is; let’s now dig into how to choose from the (very) large catalog of models available in open source.

An unresolved question:

As you might have expected, there is no simple answer to this question. In practice, many developers just stick to the models they are used to and that performed well in their previous projects.
However, a few guidelines can still help you decide.


The two main aspects to take into account are the same as for most machine learning tasks:

  • Accuracy: the higher, the better
  • Speed: the faster, the better

The dream is a model that trains super fast and achieves excellent accuracy. But as you might expect, better accuracy usually requires a deeper model, and therefore a model that takes more time to train. The goal is thus to optimize the tradeoff between accuracy and complexity. You can observe this tradeoff in the following graph, taken from the original EfficientNet paper.

Accuracy on Imagenet

As you can observe in this graph, bigger models are not always better. There is always a risk that a more complex model overfits your data, because it can give too much importance to subtle details in the features. That is why the standard practice in industry is to start with the smallest model: a “good-enough” model that is small, and therefore quickly trained, is preferred. Of course, if you aim for great accuracy and do not care about training time, you can target a large model and even try ensemble techniques that combine the power of multiple models.

Most performant models at this time :

Here are a few models that are widely used today in the field of computer vision. From image classification to complex image captioning, these architectures offer great performance :

  • ResNet50
  • EfficientNet
  • InceptionV3

ResNet50 : ResNet was developed by Microsoft and aims at resolving the “vanishing gradient” problem. It allows the creation of very deep models (up to a hundred layers).

Top-1 accuracy : 74.9%

Top-5 accuracy : 92.1%

Size : 98MB

Parameters : 26 million

EfficientNet : This model is a state-of-the art convolutional neural network trained by Google. It is based on the same construction as ResNet but with an intelligent rescaling method.

Top-1 accuracy : 77.1%

Top-5 accuracy : 93.3%

Size : 29MB

Parameters : 5 million

InceptionV3 : Inception networks (GoogLeNet/Inception v1) have proved to be computationally efficient, both in terms of the number of parameters generated by the network and the economic cost incurred. It is based on factorized convolutions.

Top-1 accuracy : 77.9%

Top-5 accuracy : 93.7%

Size : 92MB

Parameters : 24 million

Final Note: 

To summarize, in this article we have seen that Transfer Learning is the ability to use existing knowledge, developed to solve a given problem, to solve a new problem. We saw the top 3 state-of-the-art pre-trained models for image classification. The table below summarizes the performance and some details of each of these models.

Table of pre-trained AI models

However, as you will have understood by now, this is a continuously growing domain, and there is always a new model to look forward to that pushes the boundaries further. The best way to keep up is to read the papers introducing new model architectures and to try the best-performing new releases.


Understanding the “simplicity” of reinforcement learning: comprehensive tips to take the trouble out of RL

This is the first article of my article series “My elaborate study notes on reinforcement learning.”

*I adjusted the mathematical notations in this article to be as close as possible to those of “Reinforcement Learning: An Introduction.” This book by Sutton and Barto is said to be almost mandatory for those studying reinforcement learning. I also tried to avoid mathematical notation as much as possible, introducing some intuitive examples instead. In case any descriptions are confusing or unclear, letting me know via posts or email would be appreciated.


First of all, I have to emphasize that I am new to reinforcement learning (RL); my current field is object detection, more concretely transfer learning in object detection. Thus this article series is itself a kind of study note for me. Reinforcement learning (RL) is often briefly compared with human trial and error, and RL is in fact rooted in neuroscience and psychology as well as neural networks (I am not sure about these fields though). The word “reinforcement” roughly means associating rewards with certain actions. Some experiments related to RL were conducted on animals, widely known as the Skinner box or, more classically, Pavlov’s dogs. In short, you can encourage animals to do something by giving them food as a reward, just as many people have done with their dogs. Before animals find the linkage between certain actions and the food rewarded for those actions, they just keep trying by trial and error. We can think of RL as a family of algorithms that mimics this behavior of animals trying to obtain as much reward as possible.

*My cats would never go out of their way to entertain me for food, though.

RL showed conspicuous success in the field of video games, such as Atari, and in defeating the world champion of Go, one of the most complicated board games. RL can be applied not only to video games or board games but also to various other fields, such as business intelligence, medicine, and finance, but I am still very much fascinated by its application to video games. I am now studying a field which could bridge the world of video games and the real world. I would like to mention this in one of the upcoming articles.

So far I have gotten the impression that learning RL is more challenging than learning classical machine learning or deep learning, for the following reasons.

  1. RL is a field of how to train models, rather than how to design the models themselves. That means you have to consider a variety of problem settings, and you can easily forget which situation you are discussing.
  2. You need prerequisite knowledge about the model components of RL, for example neural networks, which are usually the main topics of machine/deep learning textbooks.
  3. It is confusing what can be learned through RL, depending on the type of task.
  4. Even after looking over the formulations of RL, it is still hard to imagine how RL enables computers to do trial and error.

*For now I would like you to keep in mind that basically values and policies are what is calculated during RL.

And I personally believe you should always keep the following points in your mind in order not to be at a loss in the process of learning RL.

  1. RL can basically be applied only to a very limited type of situation, called a Markov decision process (MDP). In MDP settings your next state depends only on your current state and action, regardless of what you have done so far.
  2. You are ultimately interested in learning decision-making rules in MDPs, which are called policies.
  3. In the first stages of learning RL, you consider surprisingly simple situations. They can be as simple as mazes in kids’ picture books.
  4. RL is in its early days of development.

Let me explain a bit more about what I meant by the third point above. I have been learning RL mainly with a very precise Japanese textbook named 「機械学習プロフェッショナルシリーズ 強化学習」 (Machine Learning Professional Series: Reinforcement Learning). As I mentioned in an article of my series on RNNs, I sometimes dislike Western textbooks because they tend to beat around the bush with simple examples before getting to the point at a more abstract level. That is why I prefer reading books of this series in Japanese. But the RL volume of the series was especially bulky, abstract, and overbearing to a spectacular degree. It had so many precise mathematical notations, without leaving room for ambiguity, that it took me a long time to notice that the book was merely discussing simple situations like mazes in kids’ picture books. I mean, the settings discussed were so simple that they can be expressed as tabular data, that is, as Excel sheets.

*I did not notice that until the beginning of the 6th of the 8 chapters. The 6th chapter discusses the use of function approximators, with which you can approximate such tabular data. My articles will not dig into this topic of approximation precisely, but the use of deep learning models, which I am going to explain someday, is one type of this approximation of RL models.

You might find that many explanations of RL rely on examples of how to make computers navigate simple mazes or play video games, which are mostly impractical in the real world. However, as I will explain later, these are actually helpful examples for learning RL. As I show later, the relations between an agent and an environment are basically the same in more complicated tasks. Reading some code or actually implementing RL would be very effective, especially for understanding the simplicity of the situations in the beginning parts of RL textbooks.

Given that you can do a lot of impressive and practical stuff with current deep learning libraries, you might get bored or disappointed by the simple applications of RL in many textbooks. But as I mentioned above, RL is in its early days of development, at least at a public level. And in order to show its potential power, I am going to explain one of the most successful and complicated applications of RL in the next article: I am planning to explain how AlphaGo and AlphaZero, RL-based AIs, enabled computers to defeat the world champion of Go, one of the most complicated board games.

*RL was not used in the chess AI which defeated Kasparov in 1997. A combination of decision trees and supercomputers, without RL, was enough for the “simplicity” of chess. But the use of a decision-tree method named Monte Carlo Tree Search enabled AlphaGo to read some steps ahead more effectively. It is said that deep learning gave AlphaGo intuition about games, Monte Carlo Tree Search gave it the ability to predict some steps ahead, and RL taught it how to learn from experience.

1, What is RL?

In conclusion, as far as I could understand so far as a beginner of RL, I would interpret RL as follows: RL is a sub-field of training AI models, in which optimal rules for decision making in an environment are learned, weakly supervised by rewards over a certain period of time. When and how to evaluate decisions is task-specific, and this is often realized by trial-and-error-like behaviors of agents. Rules for decision making are called policies in the context of RL. And optimization problems over policies are called sequential decision-making problems.

You are more or less going to see what I meant by my definition throughout my article series.

*An agent in RL means an entity which makes decisions, interacting with the environment through actions. And the actions are made based on policies.

You can find various types of charts explaining relations of RL with AI, and I personally found the chart below the most plausible.

“Models” in the chart are often hyped as “AI” in the media today. But AI is a more comprehensive field that tries to realize human-like intellectual behaviors with computers. And machine learning has been the most central sub-field of AI in the last decades. Around 2006 there was a breakthrough in deep learning, and due to that breakthrough machine learning gained much better performance with deep learning models. I would say people have been calling the popular “models” of each era “AI.” And importantly, RL is one way of training models, besides supervised learning and unsupervised learning, rather than a field of designing “AI” models. Some people say supervised or unsupervised learning is preferable to RL because these trainings are currently more likely to be successful in a wide range of fields. And usually, the more data you have, the more likely supervised or unsupervised learning is to succeed.

*The word “models” is used in another meaning later. Please keep in mind that the “models” above are something like general functions, while the “models” which show up frequently later are functions modeling environments in RL.

*In case you’re totally new to AI and don’t understand what “supervising” means in these contexts, imagine instructing students at school. If a teacher tells students, “We have a Latin conjugation test next week, so you must check this section in the textbook,” that’s “supervised learning.” The students who take the exam are the “models.” Apt students, like machine learning models, would show excellent performance, but they might fail to apply the knowledge elsewhere; I mean, they might fail to properly conjugate words in unseen sentences. Next, if the students share the idea “It’s comfortable to get together with people alike,” they might be clustered into several groups. That might lead to a division into “cool guys” and “not cool guys.” This is done without any explicit answers, and it corresponds to “unsupervised learning.” In this case, I would say certain functions of the students’ brains, or the atmosphere there, which put similar students together, are the “models.” And finally, if teachers tell the students, “Be a good student,” that’s what I meant by “weakly supervising.” Most people would respond, “How?” RL could correspond to such ultimate goals of education, and as in education, you have to consider how to give rewards and how to evaluate the students/agents. And the “models” can vary. But such rewards often lead to unexpected results.

2, RL and Markov decision process

As I mentioned in a former section, you have to keep in mind that RL can basically be applied only to a limited class of sequential decision-making problems, called Markov decision processes (MDPs). A Markov decision process is a process in which the next state of an agent depends only on the current state and the action taken in the current state. I will only roughly explain MDPs in this article, with a little formulation.

You might find MDPs very simple. But some people would find that their daily lives can in fact be described well as an MDP. The figure below is a state transition diagram of an everyday routine at an office, and it is nothing but an MDP. I think many workers basically have only four states, “Chat,” “Coffee,” “Computer,” and “Home,” almost every day. The numbers in black are the probabilities of transitions at each state, and each corresponding number in orange is the reward you get when the action is taken. The diagram shows that when you just keep using a computer, you are likely to get high rewards. On the other hand, chatting with your colleagues would just continue into another round of chatting with a probability of 50%, and that undermines productivity by giving a reward of -1. And having some coffee is very likely to lead to a chat. In practice, you optimize which action to take in each situation: you adjust the probabilities at each state, that is, you adjust a policy, through planning or trial and error.

Source: https://subscription.packtpub.com/book/data/9781788834247/1/ch01lvl1sec12/markov-decision-processes
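An office routine in this spirit can be simulated in a few lines. In the sketch below, only the Chat → Chat case (probability 0.5, reward -1) comes from the description above; the other transition probabilities and rewards are illustrative stand-ins, not the values from the diagram.

```python
import random

# state -> list of (next_state, probability, reward); "Home" is terminal.
TRANSITIONS = {
    "Chat":     [("Chat", 0.5, -1), ("Computer", 0.3, 5), ("Coffee", 0.2, 1)],
    "Coffee":   [("Chat", 0.7, -1), ("Computer", 0.3, 5)],
    "Computer": [("Computer", 0.6, 5), ("Coffee", 0.2, 1), ("Home", 0.2, 0)],
}

def step(state, rng):
    """Sample (next_state, reward). Markov: it depends only on `state`."""
    u, cum = rng.random(), 0.0
    for next_state, prob, reward in TRANSITIONS[state]:
        cum += prob
        if u < cum:
            return next_state, reward
    return TRANSITIONS[state][-1][0], TRANSITIONS[state][-1][2]

rng = random.Random(0)
state, total_reward = "Computer", 0
for _ in range(1000):      # cap the episode length, just in case
    if state == "Home":    # terminal state reached
        break
    state, reward = step(state, rng)
    total_reward += reward
print("episode ended in state", state, "with total reward", total_reward)
```

Adjusting which action you prefer in each state, i.e. reshaping these probabilities, is exactly what "adjusting a policy" means.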

*Even if you say “Be a good student,” school kids in puberty would act far from a Markov decision process. Even though I took the example of school earlier, I am sure education is a much more complicated process which requires constant patience.

Of course you have to consider much more complicated MDPs in most RL problems, and in most cases you do not have known models like state transition diagrams. Or rather, I should say that RL enables you to estimate such diagrams, which are usually called models in the context of RL, by trial and error. When you study RL, for the most part you will see a chart like the one below. I think it is important to understand what this kind of chart means, whatever study materials on RL you consult. I said RL is basically a training method for finding optimal decision-making rules called policies. In RL settings, agents estimate such policies by taking actions in the environment. The environment determines a reward and the next state based on the current state and the current action of the agent.

Let’s take a closer look at the chart above in a slightly mathematical manner. I made it based on “Machine Learning Professional Series: Reinforcement Learning.” The agent exerts an action a in the environment, and the agent receives a reward r and the next state s'. r and s' are consequences of taking the action a in the state s. The action a is taken based on a conditional probability given s, denoted as \pi(a|s). This probability function \pi(a|s) is the very function representing policies, which we want to optimize in RL.

*Please do not think too much about the differences between \sim and = in the chart. Actions, rewards, and transitions of states can be either deterministic or probabilistic. In the chart above, with the notation a \sim \pi (a|s) I meant that the action a is taken with a probability of \pi (a|s). Whether they are probabilistic or deterministic is task-specific. Also you should keep in mind that all the values in the chart are realized values of random variables, as I show in the chart on the right side.
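The loop in the chart can be written down in a few lines of code. Everything below is a made-up toy (a uniform random policy and a dummy environment); the only point is the order of events in the diagram: the agent samples a ~ π(a|s), then the environment returns r and s'.

```python
import random

ACTIONS = ("left", "right")

def policy(state):
    """A uniform random policy pi(a|s): each action with probability 1/2."""
    return random.choice(ACTIONS)

def env_step(state, action):
    """A dummy environment deciding (reward, next state) from (s, a)."""
    if action == "right":
        return 1.0, state + 1
    return 0.0, state

s = 0
trajectory = []
for t in range(5):
    a = policy(s)                # a ~ pi(a|s)
    r, s_next = env_step(s, a)   # environment returns r and s'
    trajectory.append((s, a, r))
    s = s_next                   # s <- s' for the next loop
```

Optimizing the policy would mean replacing the `random.choice` with a rule learned from the collected trajectory.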

In the textbook "Reinforcement Learning: An Introduction" by Richard S. Sutton and Andrew G. Barto, which is almost mandatory reading for all RL learners, the RL process is displayed as on the left side of the figure below. Each capital letter in the chart denotes a random variable. Relations of random variables can also be displayed as graphical models, like on the right side of the chart. The graphical model is a time-series expansion of the chart of RL loops on the left side. The chart below shows almost the same idea as the one above; whether they use random variables or realized values is the only difference between them. My point is that decision making is simplified in RL into the models I have explained. Even if some situations are not strictly MDPs, in many cases the problems are approximated as MDPs in practice so that RL can be applied.

*I personally think you do not have to care so much about the differences between random variables and their realized values in RL unless you discuss RL mathematically. But if you do not know there are two types of notation, which are strictly different ideas, you might get confused while reading textbooks on RL. At least in my article series, I will strictly distinguish them only when their differences matter.

*In case you are not sure about the differences between random variables and their realizations, please roughly grasp the terms as follows: random variables X are probabilistic tools, for example dice. On the other hand, their realized values x are records of them, for example (4, 1, 6, 6, 2, 1, …). The probability that a random variable X takes on the value x is denoted as Pr\{X = x\}. And X \sim p means the random variable X is sampled from the distribution p(x) \doteq \text{Pr} \{X=x\}. In case X is a "die," p(x) = \frac{1}{6} for any x.
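In code, the distinction is simply the difference between a sampling procedure and the numbers it produces. A minimal sketch with a fair die:

```python
import random

def X():
    """The random variable X: a fair die, p(x) = 1/6 for x in 1..6."""
    return random.randint(1, 6)

# Realized values x: concrete records obtained by sampling X
realizations = [X() for _ in range(12)]

# X itself is a procedure; each element of `realizations` is a fixed number
```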

3, Planning and RL

We have seen that RL is a family of training algorithms which optimize rules for choosing A_t = a in sequential decision-making problems, usually assuming them to be MDPs. However I have to emphasize that RL is not the only way to optimize such policies. In sequential decision-making problems, when the model of the environment is known, policies can also be optimized through planning, without collecting data from the environment. On the other hand, when the model of the environment is unknown, policies have to be optimized based on data which an agent collects from the environment through trial and error. This is the very case called RL. You might find planning problems very simple and unrealistic in practical cases. But RL is based on planning of sequential decision-making problems with MDP settings, so studying planning problems is inevitable. As far as I have seen so far, RL is a family of algorithms for approximating techniques of planning problems through trial and error in environments. To be more concrete, in the next article I am going to explain dynamic programming (DP) in RL contexts as a major example of planning problems, and a formula called the Bellman equation plays a crucial role in planning. And after that we are going to see that RL algorithms are more or less approximations of the Bellman equation by agents sampling data from environments.


As an intuitive example, I would like to take a case of navigating a robot, which is explained in a famous textbook on robotics named "Probabilistic Robotics." In this case, the state set \mathcal{S} is the whole space on the map where the robot can move around, and the action set is \mathcal{A} = \{\rightarrow, \searrow, \downarrow, \swarrow, \leftarrow, \nwarrow, \uparrow, \nearrow \}. If the robot never fails to take actions and there are no unexpected obstacles, manipulating the robot on the map is an MDP. In this example, the robot has to be navigated from the start point, the green dot, to the goal, the red dot. The blue arrows can be obtained through planning or RL. Each blue arrow denotes the action taken in each place, following the estimated policy; in other words, the function \pi is the flow of the blue arrows. But policies can vary even in the same problem. If you just want the robot to reach the goal as soon as possible, you might get the blue arrows in the figure at the top after planning. But that means the robot has to pass through a narrow street, and it is likely to bump into the walls. If you prefer to avoid such risks, you should adopt a policy of choosing wider streets, like the blue arrows in the figure at the bottom.

*In the textbook on probabilistic robotics, this case is classified as a planning problem rather than an RL problem because it assumes that the robot has a complete model of the environment, and RL is not introduced in the textbook. In robotics, one major way of making a model, or rather a map, is SLAM (Simultaneous Localization and Mapping). With SLAM, a map of the environment can be made based only on what has been seen with a moving camera, like in the figure below. The first half of the textbook is about self-localization of robots and gaining maps of environments, and the latter part is about planning in the gained map. RL is also based on planning problems, as I explained. I would say RL is another branch of techniques for gaining such models/maps and proper plans in the environment through trial and error.

Source: Engel et al., LSD-SLAM: Large-Scale Direct Monocular SLAM, ECCV 2014

In the example of robotics above, we have not considered rewards R_t in the course of navigating the agent. That means the reward is given only when the robot reaches the goal. But agents can get lost if they get a reward only at the goal. Thus in many cases you optimize a policy \pi(a|s) such that it maximizes the sum of rewards R_1 + R_2 + \cdots + R_T, where T is the length of the whole MDP sequence in this case. More concretely, at every time step t, agents have to estimate G_t \doteq R_{t+1} + R_{t+2} + \cdots + R_T. The G_t is called a return. But you usually have to consider the uncertainty of future rewards, so in practice you multiply the rewards by a discount rate \gamma \quad (0\leq \gamma \leq 1) at every time step. Thus in practice agents estimate a discounted return at every time step as follows.

G_t \doteq R_{t+1} + \gamma R_{t+2} + \gamma ^2 R_{t+3} + \cdots + \gamma ^ {T-t-1} R_T = \sum_{k=0}^{T-t-1}{\gamma ^{k}R_{t+k+1}}
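The discounted return above is straightforward to compute once a sequence of upcoming rewards R_{t+1}, …, R_T is given; the reward values below are made up for illustration.

```python
def discounted_return(rewards, gamma):
    """G_t = sum over k of gamma^k * R_{t+k+1}, for a list of upcoming rewards."""
    return sum(gamma ** k * r for k, r in enumerate(rewards))

upcoming = [1.0, 0.0, 2.0]                               # R_{t+1}, R_{t+2}, R_{t+3}
g_undiscounted = discounted_return(upcoming, gamma=1.0)  # plain sum: 3.0
g_discounted = discounted_return(upcoming, gamma=0.5)    # 1 + 0 + 0.25*2 = 1.5
```

With γ = 1 the discounted return reduces to the plain sum of rewards; smaller γ shrinks the weight of rewards further in the future.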

If agents blindly try to maximize the immediate upcoming reward R_t in a greedy way, that can lead to a smaller amount of rewards in the long run. Policies in RL have to be optimized so that they maximize the return G_t, the sum of upcoming rewards, at every time step. But still, it is not realistic to take all the upcoming rewards R_{t+1}, R_{t+2}, \dots directly into consideration. These rewards have to be calculated recursively and probabilistically at every time step. To be exact, the values of states are calculated this way. The value of a state in the context of RL means how likely agents are to get higher returns if they start from that state. And how to calculate values is formulated as the Bellman equation.

*If you are not sure what "recursively" and "probabilistically" mean, please do not think too much. I am going to explain that as precisely as possible in the next article.

I am going to explain the Bellman equation, or the Bellman operator to be exact, in the next article. For now I would like you to keep in mind that the Bellman operator calculates the value of a state by considering future actions and their following states and rewards. The Bellman equation is often displayed as a decision-tree-like chart, as below. I would say planning and RL are a matter of repeatedly applying the Bellman equation to the values of states. In planning problems, the model of the environment is known; that is, all the connections of the nodes of the graph on the left side of the figure below are known. On the other hand, in RL those connections are not completely known, thus they need to be estimated in certain ways by agents collecting data from the environment.


*I guess almost no one explains RL ideas with graphs like the ones above, and actually I am in search of effective and correct ways of visualizing RL. But so far, I think the graphs above describe how values are updated in RL problem settings with discrete data. You are going to see what these graphs mean little by little in upcoming articles. I am also planning to introduce Bellman operators to formulate RL so that you do not have to think about decision-tree-like graphs all the time.

4, Examples of how RL problems are modeled

You might find that so many explanations of RL rely on examples of how to make computers navigate themselves through simple mazes or play video games, which are mostly impractical in the real world. But I think uses of RL in letting computers play video games are good examples when you study RL. The video game industry is one of the most developed and sophisticated areas which have produced environments for RL. OpenAI provides some "playgrounds" where agents can actually move around, and there are also some ports of Atari games. I guess once you understand how RL can be modeled in those simulations, that helps you understand how other, more practical tasks are implemented.

*It is a pity that there is no E.T. the Extra-Terrestrial. It is a notorious video game which put an end to the reign of Atari. And after that came the era of the Nintendo Entertainment System.

In the second section of this article, I showed the most typical diagram of the fundamental RL idea. The diagrams below show the correspondence of each element of some simple RL examples to the diagram of general RL. Multi-armed bandit problems are a family of the most straightforward RL tasks, and I am going to explain them a bit more precisely later in this article. An agent solving a maze is also a very common example of RL tasks. In this case states s\in \mathcal{S} are locations where an agent can move. Rewards r \in \mathcal{R} are goals or bonuses the agent gets in the course of the maze. And in this case \mathcal{A} = \{\rightarrow, \downarrow,\leftarrow, \uparrow \}.

If the environments are more complicated, deep learning is needed to make more complicated functions to model each component of RL. Such RL is called deep reinforcement learning. The examples below are some successful cases of deep RL. I think it is easy to imagine that the case of solving a maze is close to RL playing video games. In this case \mathcal{A} is all the possible commands with an Atari controller, like in the figure below. Deep Q-Networks use deep learning in an RL algorithm named Q-learning. The development of convolutional neural networks (CNNs) enabled computers to comprehend what is displayed on video game screens. Thanks to that, video games do not need to be simplified like mazes. Even though playing video games, especially complicated ones today, might not be strictly an MDP, Deep Q-Networks simplify the process of playing Atari as an MDP. That is why the process of playing video games can be simplified as the chart below, and this simplified MDP model can surpass human performance. AlphaGo and AlphaZero are other successful cases of deep RL. AlphaGo is the first RL model which defeated the world Go champion, and some training schemes were simplified and extended to other board games like chess in AlphaZero. Even though they were sensations in the media, as if they were menaces to human intelligence, they are also based on MDPs: a policy network calculates which tactics to take to enhance the probability of winning board games. But they use much more sophisticated and complicated techniques, and it is almost impossible to try training them unless you own a tech company or something with servers mounted with TPUs. But I am going to roughly explain how they work in one of my upcoming articles.

5, Some keywords for organizing terms of RL

As I am also going to explain in the next two articles, RL algorithms are totally different frameworks for training machine learning models compared to supervised/unsupervised learning. I think the pairs of keywords below are helpful in classifying the RL algorithms you are going to encounter.

(1) “Model-based” or “model-free.”

I said planning problems are the basics of RL problems, and in many cases RL algorithms approximate the Bellman equation or related ideas. I also said planning problems can be solved by repeatedly applying Bellman equations on the states of a model of an environment. But in RL problems, models are usually unknown, and agents can only move in an environment which gives a reward and the next state to an agent. The agent gains richer information about the environment time step by time step in RL, but this procedure can be roughly classified into two types: model-free and model-based. In the model-free type, models of the environment are not explicitly made, and policies are updated based on data collected from the environment. On the other hand, in model-based types the models of the environment are estimated, and policies are calculated based on the model.


*To be honest, I am still not sure about the differences between model-free RL and model-based RL.

*AlphaGo and AlphaZero are examples of model-based RL. Phases of board games can be modeled with CNNs. Planning in this case corresponds to reading some moves ahead in games, and it is enabled by Monte Carlo tree search. They are the only examples of model-based RL which I can come up with. I also had the impression that many study materials on RL focus on model-free types of RL.

(2) “Values” or “policies.”

I mentioned that in RL, values and policies are optimized. Values are functions giving the value of each state. The value here means how likely an agent is to get high rewards in the future, starting from the state. Policies are functions for calculating the actions to take in each state, which I showed as the blue arrows in the example of robotics above. In RL, these two functions are updated in turn, and often they reach the optimal functions when they converge. The figure below describes the idea well.

These are essential components of RL, and there are too many variations of how to calculate them, for example the timing of updating them, or whether to update them probabilistically or deterministically. Whatever RL algorithm I talk about, how values and policies are updated will be of the greatest interest. Only briefly mentioning them would just be more and more confusing, so let me briefly take examples from dynamic programming (DP).

Let's consider DP on a simple grid map which I showed in the preface. This is a planning problem, and agents have a perfect model of the map, so they do not have to actually move around there. Agents can move on any cells except for blocks, and they get positive rewards at treasure cells, and negative rewards at danger cells. With policy iteration, the agents can iteratively update the policies and values of all the states of the map. The chart below shows how the policies and values of cells are updated.


You do not necessarily have to calculate policies at every iteration, and this type of DP is called value iteration. But as the chart below suggests, value iteration takes more time to converge.
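To make the value side of this concrete, here is value iteration on a tiny made-up example: a four-cell corridor instead of the 2-D map, with a single treasure cell and no danger cells. The layout, rewards, and γ are all invented; only the update rule, repeatedly applying the Bellman optimality update V(s) ← max_a [r + γV(s')], is the point.

```python
# Value iteration on a tiny invented corridor: cells 0..3, cell 3 is a
# terminal treasure giving reward +1 on entry; all other moves give 0.
gamma = 0.9
states = [0, 1, 2, 3]
terminal = {3}

def move(s, a):
    """Deterministic move a in {-1, +1}, clipped to the corridor."""
    s2 = min(max(s + a, 0), 3)
    r = 1.0 if (s2 == 3 and s != 3) else 0.0
    return r, s2

V = {s: 0.0 for s in states}
for _ in range(50):  # sweeps of the Bellman optimality update
    for s in states:
        if s in terminal:
            continue
        V[s] = max(r + gamma * V[s2] for r, s2 in (move(s, -1), move(s, +1)))

# The values decay geometrically with distance to the treasure:
# V[2] = 1.0, V[1] = 0.9, V[0] = 0.81
```

The greedy policy with respect to the converged V (always move toward the neighbor with the highest r + γV(s')) is exactly the arrows drawn on the maps above.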


I am going to explain the differences between values and policies in DP tasks much more precisely in the next article.

(3) “Exploration” or “exploitation”

RL agents are not explicitly supervised with the correct answer for each behavior. They just receive rough signals of "good" or "bad." One of the most typical failure cases of RL is that agents become myopic. I mean, once agents find some actions which constantly give a good reward, they tend to miss other actions which produce better rewards more effectively. One good way of avoiding this is adding some exploration, that is, taking some risks to discover other actions.

I mentioned that multi-armed bandit problems are a simple setting of RL problems. They also help in understanding the trade-off between exploration and exploitation. In a multi-armed bandit problem, an agent chooses which slot machine to run at every time step. Each slot machine gives out coins, or rewards r, with a probability of p. The number of trials is limited, so the agent has to find the machine which gives out coins most efficiently within the limited number of trials. In this problem, the key is the balance between trying to find other effective slot machines and just trying to get as many coins as possible from the machine which for now seems to be the best. This is the trade-off between "exploration" and "exploitation." One simple way to implement the exploration-exploitation trade-off is the ε-greedy algorithm. This is quite simple: with a probability of \epsilon, agents just randomly choose actions which are not currently thought to be the best.

*Casino owners are not so stupid. Just as with insurance, I am sure it is designed so that you would lose in the long run, and before your "exploration" is complete, you will be "exploited."

Let's take a look at a simple simulation of a multi-armed bandit problem. There are two "casinos," I mean sets of slot machines. In casino A, all the slot machines give out the same reward 1, thus agents only need to find the machine which is the most likely to give out coins. But casino B is not that simple. In this casino, slot machines with small odds give higher rewards.

I prepared four types of "multi-armed bandits," I mean octopus agents. Each of them has its own value of \epsilon, and the \epsilons reflect their "curiosity," or maybe "how inconsistent they are." The graphs below show the average reward over 1000 simulations. In each simulation each agent can try the slot machines 250 times in total. In casino A, it seems the agent with the curiosity of \epsilon = 0.3 gets the best rewards in the short term. But in the long run, the more stable agent, whose \epsilon is 0.1, gets more rewards. On the other hand, in casino B, no one seems to make outstanding results.
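The original simulation code is not reproduced here, but an ε-greedy agent for a casino-A-style bandit (every machine pays reward 1, with different hidden odds) can be sketched as follows. The payout probabilities, the incremental-mean value update, and the seed are my own assumptions, not the settings of the original simulation.

```python
import random

random.seed(1)

# "Casino A" style: each machine pays reward 1 with its own hidden probability
p_win = [0.3, 0.5, 0.8]

def pull(machine):
    return 1.0 if random.random() < p_win[machine] else 0.0

def run_bandit(epsilon, n_pulls=250):
    """One episode of epsilon-greedy play; returns the total reward."""
    q = [0.0] * len(p_win)   # estimated value of each machine
    n = [0] * len(p_win)     # number of pulls per machine
    total = 0.0
    for _ in range(n_pulls):
        if random.random() < epsilon:                     # explore
            a = random.randrange(len(p_win))
        else:                                             # exploit the best so far
            a = max(range(len(p_win)), key=lambda i: q[i])
        r = pull(a)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]   # incremental mean of observed rewards
        total += r
    return total

# Average total reward over repeated episodes for a few "curiosity" levels
results = {eps: sum(run_bandit(eps) for _ in range(100)) / 100
           for eps in (0.0, 0.1, 0.3)}
```

With ε = 0, the agent can get stuck on whichever machine happened to pay first; a small positive ε lets it keep checking the alternatives, which is exactly the behavior of the octopus agents above.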

*I will not concretely explain how the values of each slot machine are updated in this article. I think I am going to explain multi-armed bandit problems with Monte Carlo tree search in one of the upcoming articles, to explain the algorithm of AlphaGo/AlphaZero.

(4) "Achievement" or "estimation"

The last pair of keywords is "achievement" or "estimation," and it might be better to instead see them as a comparison of "Monte Carlo" and "temporal-difference (TD)." I said RL algorithms often approximate the Bellman equation based on data an agent has collected. Agents moving around in environments can be viewed as sampling data from the environment: agents sample data of states, actions, and rewards. At the same time, agents constantly estimate the value of each state. Thus agents can modify their estimations of values using values calculated from sampled data. This is how agents make use of their "experiences" in RL. There are several variations of when to update estimations of values, but roughly they are classified into Monte Carlo and temporal-difference (TD) types. Monte Carlo is based on the achievements of agents after one episode of actions, while TD is based more on constant estimation of values at every time step. Which approach to take depends on the task, but it seems many major RL algorithms adopt TD types, and it is also said that evaluating actions by TD has some analogies with how the brain is "reinforced." Above all, according to the book by Sutton and Barto, "If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning." An intermediate idea between Monte Carlo and TD can also be formulated as eligibility traces.
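The contrast between the two can be seen in the update rules themselves. In the sketch below the episode, the step size α, and γ are invented; Monte Carlo waits for the episode's end and moves V(s) toward the observed return G_t, while TD(0) updates at every step toward the bootstrapped target r + γV(s').

```python
alpha, gamma = 0.5, 1.0

# One made-up episode: (state, reward received on leaving the state),
# terminating after s2
episode = [("s0", 0.0), ("s1", 0.0), ("s2", 1.0)]

# Monte Carlo: compute the return G_t backwards, then update each state
V_mc = {"s0": 0.0, "s1": 0.0, "s2": 0.0}
g = 0.0
for state, reward in reversed(episode):
    g = reward + gamma * g                   # return observed from this state
    V_mc[state] += alpha * (g - V_mc[state])

# TD(0): update immediately at every step, bootstrapping from V(s')
V_td = {"s0": 0.0, "s1": 0.0, "s2": 0.0, "end": 0.0}
for (s, r), s_next in zip(episode, ["s1", "s2", "end"]):
    V_td[s] += alpha * (r + gamma * V_td[s_next] - V_td[s])

# After a single episode, Monte Carlo has moved every visited state
# (all values become 0.5), while TD(0) has only propagated the final
# reward one step back (V_td["s2"] == 0.5, the earlier states stay 0.0)
```

Over many episodes the TD updates keep propagating the reward further back, which is why both methods can converge to the same value function.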




In this article I have briefly covered all the topics I am planning to explain in this series. This article is the start of a long-term journey of studying RL, also for me. Any feedback on this series, as posts or emails, would be appreciated. The next article is going to be about dynamic programming, which is a major way of solving planning problems. In the context of RL, dynamic programming problems are solved by repeatedly applying the Bellman equation on the values of the states of a model of an environment. Thus I think it is no exaggeration to say dynamic programming is the backbone of RL algorithms.


The code I used for the multi-armed bandit simulation. Just copy and paste it into a Jupyter Notebook.

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, including grammatical errors, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.


[1] Morimura Tetsuro, "Machine Learning Professional Series: Reinforcement Learning," Kodansha, (2019)
森村哲郎 著, 「機械学習プロフェッショナルシリーズ 強化学習」, 講談社, (2019)

[2] Richard S. Sutton and Andrew G. Barto, "Reinforcement Learning: An Introduction, Second Edition," MIT Press, (2018)

[3] Kubo Takahiro, “Machine Learning Startup Series: Reinforcement Learning with Python,” Kodansha, (2019)
久保隆宏 著, 「機械学習スタートアップシリーズ Pythonで学ぶ強化学習 改訂第2版」, 講談社, (2019)

[4] Sebastian Thrun, Wolfram Burgard and Dieter Fox, “Probabilistic Robotics,” MIT Press, (2015), pp 487-510

[5] Eiichi Hurukawa, "AlphaZero: Deep Learning, Reinforcement Learning, and Search – A Practical Introduction to AI Programming," Born Digital, (2019)
布留川英一 著, 「AlphaZero 深層学習・強化学習・探索 人工知能プログラミング 実践入門」, ボーンデジタル, (2019)

Predictive Maintenance – Concept and Opportunities

(Machine) time is precious. This is especially true for manufacturing companies, where every standstill of a plant costs valuable production capacity. Machine downtimes cannot be avoided completely, but with the right predictive maintenance concept (abbreviated as PdM in the following) they can be reduced and made easier to plan. With intelligent add-ons such as IoT connectivity of machines in an Industry 4.0 environment and integration into a well-planned predictive maintenance system, costs can be saved.

What Is Predictive Maintenance?

PdM refers to features in the operation of a plant or machine that learn from historical data and, in combination with current or even real-time data, make predictions about upcoming events. From these calculations, for example, upcoming maintenance work or imminent component failures can be derived. Spare parts no longer have to be stocked as a precaution, but can be ordered based on actual need.

Advantages of Using Predictive Maintenance

Another plus is the good plannability of maintenance intervals. Operators as well as manufacturers can arrange their schedules around the predicted maintenance dates. As an operator, availability can be correlated with the capacities required for production. As a manufacturer, you can order spare parts on time and in the quantities actually needed, and you no longer have to keep a stock of every possible spare part.

Technical Concept and Considerations

For the implementation of a PdM system, whether you want to build it yourself or buy it as a product, a technical concept is the starting point. Here we outline the key points of these considerations, so that they can serve as a working basis for developing the concept.

Roughly speaking, a PdM concept consists of the following components:

Predictive Maintenance Concept

  • Machine: this is where the data to be transferred into the PdM system is generated. The data should be tapped from the machines as quickly and directly as possible, in order to upload the values to the cloud and make them available on end devices. Usually only limited storage space is available on the machines, and the data can only be buffered there briefly.
  • Data agent and transfer protocol: the transfer of the data to the cloud platform is carried out by a software component. This can either be supplied by the manufacturer and already be integrated in the machine, or it is added as part of the PdM concept.
    Its task is to transfer the data securely to the cloud platform. If the network connection fails, the data can be spooled locally and transferred in bulk afterwards.
    Besides data transfer, it must also be possible to register new machines at this point. It must not be possible to transfer the data agent to a new machine simply by cloning it; suitable protection, e.g. via hardware IDs, must be implemented here.
  • Data aggregator: if the machine cannot or should not transfer data to the cloud directly, the data of several machines can be combined by a data aggregator, from where the encrypted transfer to the cloud platform takes place.
    Reasons for using a data aggregator could be that the end customer does not allow individual machines to transfer data to the network, or that "legacy" machines for which no plugins for direct data transfer exist have to be connected, e.g. a machine with an older PLC for which direct transfer to the cloud is technically impossible.
  • Cloud/web platform: the transferred data must be stored centrally in a suitable environment. The actual insights and predictions of a PdM system are gained from this collected data. AI and self-learning algorithms can process the data further. The results gained from the analyzed machine data form the basis of the PdM system and are presented to the users graphically or delivered as information and warnings via messages.
  • End device: the access point for the user. The PdM data is presented as an app or as a web application.

The data agent / data aggregator can be given local intelligence by means of edge computing. Data can then be pre-evaluated and aggregated, which reduces the amount of data transferred.

Which Values Should Be Transferred?

The goal of PdM, by tapping and evaluating machine data, is ultimately a connection to the information-processing systems of a company, e.g. an ERP (Enterprise Resource Planning) or an MES (Manufacturing Execution System). There, the resources and capacities for production are planned on the basis of the PdM data.

Typical data to transfer in a PdM system are:

  • Temperature
  • Pressure
  • Speeds
  • Distances traveled
  • Switching cycles
  • Viscosities
  • Liquid levels
  • Vibrations

Which data you can tap depends on whether you are a user, i.e. a production company, or a manufacturer, i.e. a machine builder. As a user, you normally have less deep access to the data and parameters of the machines; only the values provided and documented by the manufacturer are accessible, and these tend to sit at the application layer. As a manufacturer, you can access any values, including things such as switching cycles or distances traveled by motors.

Transfer those data into the PdM system from which you can determine the maintenance work of your machine.
For a glass tempering plant, for example, this would be the operating hours of the ceramic rollers, the distances traveled by the V-belts, or the switch-on times of the heating elements.
For an automation system for PCB production, it would be the operating hours of the suction cups or the distances traveled by the drive belts.

How Often Should Values Be Transferred?

If you do not have to expect any restrictions of the network connection in terms of bandwidth or data limits, choose a relatively fine granularity for the transfer interval. Choose it so that problems at the machine can still be analyzed afterwards and their trigger can be found.
In most Industry 4.0 environments the data volume should not play a major role. If you are working in an IoT environment with, for example, a LoRaWAN connection, divide the data into categories by priority, e.g. high, medium, low, or production, standby. The transfer of the categories can then be differentiated depending on the operating state, determining when which category is important and should be transferred with priority.


Implementing a predictive maintenance concept helps you make production more agile. Schedules based on predicted maintenance times of the plants allow more precise and tighter capacity planning. This effect has a positive impact on production costs.

A PdM system also offers great savings potential for the upcoming CO2 tax models. With the collected data, you can perform exact calculations of the energy consumed per produced workpiece and thus save CO2 tax.

With smart services such as PdM, you as a machine manufacturer can earn money on a lasting basis. You generate further revenue from your customers and at the same time increase customer loyalty. Predictive maintenance makes your customers more satisfied with your products.


Predictive maintenance has potential for end users as well as for manufacturers. For the end user, the savings potential is in the foreground; for the manufacturer, customer satisfaction. With intelligent edge computing components, PdM solutions can be scaled well and the data volume reduced.

Implementing a predictive maintenance solution is not tied to the installation or development of a new plant. Machines already in operation can also easily be integrated into a PdM system.


Seq2seq models and simple attention mechanism: backbones of NLP tasks

This is the second article of my article series “Instructions on Transformer for people outside NLP field, but with examples of NLP.”

1 Machine translation and seq2seq models

I think machine translation is one of the most iconic and commercialized tasks of NLP. With modern machine translation you can translate relatively complicated sentences, if you tolerate some grammatical errors. As I mentioned in the third article of my series on RNNs, research on machine translation started as early as the 1950s, with a focus on translation between English and Russian, highly motivated by the Cold War. In the initial phase, machine translation was rule-based, like what most students do in their foreign language classes: a lot of translation rules were simply implemented by hand. In the next phase, machine translation was statistics-based, achieving better performance by using statistics for constructing sentences. At any rate, both of them relied heavily on feature engineering; I mean, you needed to consider numerous rules of translation and manually implement them. After those endeavors, neural machine translation appeared. The advent of neural machine translation was an earthshaking change in the machine translation field. Neural machine translation soon outperformed the conventional techniques, and it is still the state of the art. Some of you might have felt that machine translation became more or less reliable around that time.

Source: Monty Python’s Life of Brian (1979)

I think you have learnt at least one foreign or classical language in school. I don't know how good you were in those classes, but I think you had to learn some conjugations, and I believe that was tiresome to most students. For example, as a foreigner, I still cannot use "der", "die", "das" properly. Some of my friends recommended that I not worry about them for the time being while I speak, but I usually care about grammar very much. This method of learning a language is close to rule-based machine translation, and modern neural machine translation basically does not rely on such rules.

As far as I understand, machine translation is pattern recognition learned from a large corpus. Basically, no one explicitly teaches computers how grammar works. Machine translation learns a very complicated mapping from a source language to a target language, based on a lot of examples of word or sentence pairs. I am not sure, but this might be close to how bilingual kids learn how the two languages are related. You do not need to walk the translator through specific grammatical rules.

Source: Monty Python’s Flying Circus (1969)

Since machine translation does not rely on manually programmed grammatical rules, you basically do not need to prepare a different network architecture for each pair of languages. The same method can be applied to any pair of languages, as long as you have a large enough corpus for it. You do not have to think about translation rules between other pairs of languages.

Source: Monty Python’s Flying Circus (1969)

*I do not follow the cutting-edge studies on machine translation, so I am not sure, but I guess there are some heuristic methods for machine translation; that is, designing a network depending on the pair of languages could be effective. When it comes to grammatical word order, English and Japanese have totally different structures: English is basically SVO and Japanese is basically SOV. In many cases, sentences with the same meaning in the two languages are structured almost like reflections in a mirror. A lot of languages have structures similar to English, even in Asia, for example Chinese. On the other hand, relatively few languages have Japanese-like structures, for example Korean and Turkish. I guess there would be some grammatical-structure-aware machine translation networks.

Not only machine translation but also several other NLP tasks, such as summarization and question answering, use a model named the seq2seq model (sequence-to-sequence model). Like many other deep learning architectures, seq2seq models are composed of an encoder and a decoder. In the case of seq2seq models, you use RNNs in both the encoder and the decoder. For the RNN cells, you usually use a gated RNN such as LSTM or GRU, because simple RNNs would suffer from the vanishing gradient problem when inputs or outputs are long, and those in translation tasks are long enough. In the encoder part, you just pass in input sentences. To be exact, you input them from the first time step to the last time step, every time giving an output and passing information to the next cell via recurrent connections.

*I think you would be confused without some understanding of how RNNs propagate forward. You do not need to understand this part that much if you just want to learn about Transformer. In order to learn the Transformer model, the attention mechanism, which I explain in the next section, is more important. If you want to know how basic RNNs work, an article of mine should help you.

*In the encoder part of the figure below, the cells also propagate information backward. I assumed an encoder part with bidirectional RNNs, which “forward propagate” information backwards. But in the code below, we do not consider such a complex situation. Please just keep in mind that a seq2seq model could use bidirectional RNNs.

At the last time step in the encoder part, you pass the hidden state of the RNN to the decoder part, which I show as a yellow cell in the figure below; this yellow cell/layer is the initial hidden layer of the first RNN cell of the decoder part. Just like normal RNNs, the decoder part starts giving out outputs and passing information via recurrent connections. At every time step you choose a token to emit from the vocabulary you use in the task. That means each cell of the decoder RNN does a classification task and decides which word to write out at that time step. Also, very importantly, in the decoder part, the output at one time step is the input at the next time step, as I show with dotted lines in the figure below.

*The translation algorithm I explained depends on greedy decoding, which has to decide on a token at every time step. However, it is easy to imagine that this is not how you translate a sentence: you usually erase earlier words or keep several possibilities in your mind. Actually, for better translations you would need decoding strategies such as beam search, but that is out of the scope of at least this article. Thus we are going to make a very simplified translator based on greedy decoding.

2 Learning by making

*It would take some hours on your computer to train the translator if you do not use a GPU. I recommend starting the training first and continuing to read this article while it runs.

Seq2seq models do not have that complicated a structure, and for now you just need to understand the points I mentioned above. Rather than just formulating the models, I think it would be better to understand this model by actually writing code. If you copy and paste the code in this Github page or the official Tensorflow tutorial, after installing the necessary libraries, it will start training the seq2seq model for a Spanish-English translator. In the Github page, I just added comments to the code in the official tutorial so that it is more understandable. If you can understand the code in the tutorial without difficulty, I have to say this article itself is below your level. Otherwise, I am going to help you understand the tutorial with my original figures. I made this article so that it would help you read the next article. If you have no idea what an RNN is, at least the second article of my RNN series should be helpful to some extent.

*If you try to read the whole article series of mine on RNN, I think you should get prepared. I mean, you should prepare some pieces of paper and a pen. It would be nice if you have some stocks of coffee and snacks. Though I do not think you have to do that to read this article.

2.1 The corpus and datasets

In the code in the Github page, please ignore the part sandwiched by “######”. Handling language data is not the focus of this article. All you have to know is that the code below first creates datasets from the Spanish-English corpus at http://www.manythings.org/anki/ , and you get datasets for training the translator as the tensors below.

Each token is encoded as an integer by the code below; thus, after encoding, the Spanish sentence “Todo sobre mi madre.” is [1, 74, 514, 19, 237, 3, 2].
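As a rough illustration of what that encoding step does, here is a tiny pure-Python sketch. The indices it produces are made up from a toy vocabulary, not the tutorial’s actual ones, and the real implementation uses a Keras tokenizer rather than this hand-rolled one:

```python
# Build a toy vocabulary and encode a sentence as integers, roughly what
# the tokenizer in the tutorial does. Index 0 is reserved for padding.
def build_vocab(sentences):
    vocab = {"<pad>": 0, "<start>": 1, "<end>": 2}
    for s in sentences:
        for w in s.lower().replace(".", " .").split():
            vocab.setdefault(w, len(vocab))
    return vocab

def encode(sentence, vocab, max_len=16):
    tokens = ["<start>"] + sentence.lower().replace(".", " .").split() + ["<end>"]
    ids = [vocab[w] for w in tokens]
    return ids + [0] * (max_len - len(ids))   # pad up to a fixed length

vocab = build_vocab(["Todo sobre mi madre ."])
print(encode("Todo sobre mi madre.", vocab))
# → [1, 3, 4, 5, 6, 7, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

The padding up to a fixed length of 16 matters later: it is why the encoder always runs for 16 time steps, even for shorter sentences.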

2.2 The encoder

The encoder part is relatively simple. All you have to keep in mind is that you put in input sentences and pass the hidden layer of the last cell to the decoder part. To be more concrete, an RNN cell receives an input word at every time step and gives out an output vector at each time step, passing hidden states to the next cell. You make a chain of RNN cells by this process, like in the figure below. In this case, “time steps” means the indexes of the order of the words. If you more or less understand how RNNs work, I think this is nothing difficult. The encoder part passes the hidden state, which is in yellow in the figure below, to the decoder part.

Let’s see how encoders are implemented in the code below. We use a type of RNN named GRU (Gated Recurrent Unit). GRU is simpler than LSTM (Long Short-Term Memory). One GRU cell gets an input at every time step and passes one hidden state via recurrent connections. Like LSTM, GRU is a gated RNN, so it can mitigate vanishing gradient problems. GRU was invented after LSTM for smaller computation costs. At time step (t) one GRU cell gets an input \boldsymbol{x}^{(t)} and passes its hidden state/vector \boldsymbol{h}^{(t)} to the next cell, like in the figure below. But in the implementation, you put in the whole input sentence as a 16-dimensional vector whose elements are integers, as you saw in the figure in the last subsection 2.1. That means the ‘Encoder’ class in the implementation below makes a chain of 16 GRU cells every time you put in an input sentence in Spanish, even if the input sentence has fewer than 16 tokens.

*To be very honest, I am not sure why the encoder part of seq2seq models is implemented this way in the code below. In the implementation below, the total number of time steps in the encoder part is fixed to 16. If input sentences have fewer than 16 tokens, it seems the RNN cells get no inputs after the time step of the token “<end>”. As far as I could check, if RNN cells get no inputs, they repeatedly give out similar 1024-d vectors. I think in this implementation, the RNN cells after the <end> token, which I showed as the dotted RNN cells in the figure above, do not change so much. And the encoder part passes the hidden state of the 16th RNN cell, which is in yellow, to the decoder.
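For reference, one forward step of a single GRU cell can be sketched in NumPy as below. This follows one common formulation (the convention where the update gate z decides how much of the old state to keep; textbooks differ in which gate plays which role), and the weights are random placeholders, not the tutorial’s trained parameters:

```python
import numpy as np

def gru_step(x, h, params):
    """One forward step of a GRU cell: z and r are the update and reset
    gates; the new hidden state mixes the old state and a candidate."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))  # candidate hidden state
    return z * h + (1.0 - z) * h_tilde        # new hidden state

rng = np.random.default_rng(0)
d_in, d_h = 8, 1024                           # 1024-d hidden state, as in the tutorial
params = [rng.normal(0, 0.1, (d_h, d_in)) if i % 2 == 0
          else rng.normal(0, 0.1, (d_h, d_h)) for i in range(6)]
h = np.zeros(d_h)
for t in range(3):                            # chain three time steps
    h = gru_step(rng.normal(size=d_in), h, params)
print(h.shape)  # → (1024,)
```

In the actual code you never write this arithmetic yourself; a Keras GRU layer does it internally for all 16 time steps at once.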

2.3 The decoder

The decoder part is also not that hard to understand. As I briefly explained in the last section, you initialize the first cell of the decoder using the hidden layer of the last cell of the encoder. During decoding, I mean while writing a translation, at the beginning you put in the token “<start>” as the first input of the decoder. Given the input “<start>”, the first cell outputs “all” in the example in the figure below, and the output “all” is the input of the next cell. The output of the next cell, “about”, is also passed to the next cell, and you repeat this till the decoder gives out the token “<end>”.

A more important point is how to get losses in the decoder part during training. We use a technique named teacher forcing while training the decoder part of a seq2seq model. This is also quite simple: you just have to make sure you input the correct answer to the RNN cells, regardless of the output generated by the cell at the last time step. You force the decoder to get the correct input at every time step, and that is what teacher forcing is all about.
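The idea is easy to demonstrate in isolation. In the toy loop below, `decoder_step` is again a hypothetical stand-in for the decoder cell; the only point is that its input at step t is the ground-truth token target[t-1], never its own previous prediction:

```python
# Toy demonstration of teacher forcing: the decoder's input at step t is
# the ground-truth token at step t-1, NOT whatever the model predicted.
target = ["<start>", "all", "about", "my", "mother", ".", "<end>"]

def decoder_step(input_token, state):
    # Hypothetical stand-in: a real cell would return a probability
    # distribution over the vocabulary, computed from its hidden state.
    prediction = "???"   # possibly wrong, especially early in training
    return prediction, state

state, inputs_seen = None, []
for t in range(1, len(target)):
    dec_input = target[t - 1]          # teacher forcing: always the truth
    prediction, state = decoder_step(dec_input, state)
    # the loss at step t compares `prediction` against target[t]
    inputs_seen.append(dec_input)

print(inputs_seen)  # → ['<start>', 'all', 'about', 'my', 'mother', '.']
```

At inference time there is no ground truth, so the loop instead feeds each prediction back in, as in the greedy decoding sketch earlier.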

You can see how the decoder part and teacher forcing are implemented in the code below. You have to keep in mind that, unlike the ‘Encoder’ class, you put one token into the ‘Decoder’ class at every time step. To be exact, you also need the outputs of the encoder part to calculate attentions in the decoder part. I am going to explain that in the next subsection.

2.4 Attention mechanism

I think you have learned at least one foreign language, and you usually have to translate some sentences. Remember the process of writing a translation of a sentence in another language. Imagine that you are about to write a new word after writing some. If you are not used to translation in that language, you must have cared about which parts of the original sentence correspond to the very word you are going to write. You have to pay “attention” to the original sentence. This is what the attention mechanism is all about.

*I would like you to pay “attention” to this section. As you can see from the fact that the original paper on Transformer model is named “Attention Is All You Need,” attention mechanism is a crucial idea of Transformer.

In the decoder part you initialize the hidden layer with the last hidden layer of the encoder, and its first input is “<start>”. The decoder part starts decoding, as I explained in the last subsection. If you use the attention mechanism in the seq2seq model, you calculate attentions at every time step. Let’s consider an example in the figure below, where the next input in the decoder is “my”. Given the token “my”, the GRU cell calculates a hidden state at that time step. The hidden state is the “query” in this case, and you compare the “query” with the 6 outputs of the encoder, which are the “keys”. You get weights/scores, I mean “attentions”, which are shown as the histogram in the figure below.

And you reweight the “values” with the weights in the histogram. In this case the “values” are the outputs of the encoder themselves. You use the reweighted “values” to calculate the hidden state of the decoder at the time step again. And you use the hidden state updated by the attentions to predict the next word.

*In the implementation, however, the size of the output of the ‘Encoder’ class is always (16, 1024). You calculate attentions for all those 16 output vectors, but virtually only the first 6 1024-d output vectors are important.

Summing up the points I have explained: you compare the “query” with the “keys” and get scores/weights for the “values.” Each score/weight is, in short, the relevance between the “query” and each “key”. Then you reweight the “values” with the scores/weights and take the summation of the reweighted “values.” In the case of the attention mechanism in this article, we can say that the “values” and the “keys” are the same. You will also see that more clearly in the implementation below.

You especially have to pay attention to the terms “query”, “key”, and “value.” “Keys” and “values” are basically in the same language, and in the case above, they are in Spanish. “Queries” and “keys” can be in either the same language or different ones. In the example above, the “query” is in English, and the “keys” are in Spanish.

You can compare a “query” with “keys” in various ways. The implementation uses the one called Bahdanau’s additive style, while in Transformer, you use more straightforward ways. You do not have to care about how Bahdanau’s additive style calculates those attentions. It is much more important to learn the relations of “queries”, “keys”, and “values” for now.
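Still, for the curious, the query/key/value arithmetic is small enough to sketch in NumPy. Below is Bahdanau-style additive scoring with random placeholder weights; the shapes mirror the tutorial (16 encoder outputs of 1024 dimensions each, which serve as both keys and values), but this is an illustration, not the actual trained layer:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def additive_attention(query, keys, W1, W2, v):
    """Bahdanau-style additive attention: score = v^T tanh(W1 key + W2 query)."""
    scores = np.tanh(keys @ W1.T + query @ W2.T) @ v   # one score per key
    weights = softmax(scores)                          # the "attentions"
    context = weights @ keys                           # keys double as values here
    return context, weights

rng = np.random.default_rng(0)
d, units = 1024, 64
keys = rng.normal(size=(16, d))        # 16 encoder outputs = keys = values
query = rng.normal(size=d)             # the decoder hidden state
W1 = rng.normal(size=(units, d))
W2 = rng.normal(size=(units, d))
v = rng.normal(size=units)

context, weights = additive_attention(query, keys, W1, W2, v)
print(weights.shape, context.shape)    # → (16,) (1024,)
```

The 16 softmax weights are exactly what the heat maps in subsection 2.5 visualize, and the context vector is what gets combined with the decoder state to predict the next word.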

*A problem is that Bahdanau’s additive style is slightly different from the figure above. It seems that in Bahdanau’s additive style, at time step (t) in the decoder part, the query is the hidden state at time step (t-1). You would notice that if you closely look at the implementation below. As you can see in the figure above, you would have to calculate the hidden state of the decoder cell twice at time step (t): first to generate a “query”, and second to predict the translated word at that time step. That would not be so computationally efficient, and I guess that is why Bahdanau’s additive style uses the hidden layer at the last time step as a query rather than calculating hidden layers twice.

2.5 Translating and displaying attentions

After training the translator for 20 epochs, I could translate Spanish sentences, and the implementation also displays attention scores between the input and output sentences. For example, the translations of the inputs “Todo sobre mi madre.” and “Habre con ella.” were “all about my mother .” and “i talked to her .” respectively, and the results seem fine. One powerful advantage of using the attention mechanism is that you can easily display this type of word alignment, I mean correspondences of words in a sentence, as in the heat maps below. The yellow parts show high attention scores, and you can see that the distributions of relatively high scores are more or less diagonal, which implies that English and Spanish have similar word orders.

For other inputs like “Mujeres al borde de un ataque de nervios.” or “Volver.”, the translations are not good.

You might have noticed there is one big problem in this implementation: you can only use the words that appear in the corpus. And actually I had to manually add some pairs of sentences with the word “borde” to the corpus to get the translation in the figure.


[1] “Neural machine translation with attention,” Tensorflow Core

[2] Tsuboi Yuuta, Unno Yuuya, Suzuki Jun, “Machine Learning Professional Series: Natural Language Processing with Deep Learning,” (2017), pp. 72-85, 91-94
坪井祐太、海野裕也、鈴木潤 著, 「機械学習プロフェッショナルシリーズ 深層学習による自然言語処理」, (2017), pp. 72-85, 191-193

[3] ”Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 8 – Translation, Seq2Seq, Attention”, stanfordonline, (2019)

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, including grammatical errors, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

AI Voice Assistants are the Next Revolution: How Prepared are You?

By 2022, voice-based shopping is predicted to rise to USD 40 billion, according to data from OC&C Strategy Consultants. We’re in an era of ‘voice’, in which AI and voice recognition are drastically changing the way we live.

According to the survey, the surge of voice assistants is driven by the number of homes using smart speakers, which is expected to rise from 13% to 55%. Amazon, holding the largest market share, will be one of the leaders dominating this new channel.

Perhaps this is the first time you’ve heard about the voice revolution. Based on multiple research reports, the voice assistant market is estimated to grow from USD 2.5 billion in 2018 to USD 8 billion by 2023.

But what is voice revolution or voice assistant or voice search?

It is only recently that consumers have started learning about voice assistants, and their presence is predicted to keep growing in the future.

You’ve heard of Alexa, Cortana, Siri, and Google Assistant; these technologies are some of the world’s greatest examples of voice assistants. They will further drive consumer behavior and push companies to adjust to industry demands. Consumers can now change the way they act and search, and companies the way they advertise their brand, through voice technology.

Voice search is a technology that helps users or consumers perform a search on a website by simply asking a question on their smartphone, computer, or smart device.

The voice assistant awareness: Why now?

As surveyed by PwC, amongst the 90% of respondents aware of voice assistants, about 72% have used one, while merely 10% said they were clueless about voice-enabled devices and products. Notably, the adoption of voice-enabled devices was mainly driven by children, young consumers, and households earning above USD 100k.

Let us glance at the devices that are mainly used for voice assistance:

  • Smartphone – 57%
  • Desktop – 29%
  • Tablet – 29%
  • Laptop – 29%
  • Speaker – 27%
  • TV remote – 21%
  • Car navigation – 20%
  • Wearable – 14%

According to the survey, most consumers who use voice assistants were of the younger generation, aged between 18 and 24.

Individuals between the ages of 25 and 49, meanwhile, use these technologies much more regularly and are called the “heavy users.”

Significance of mobile voice assistants: What is the need?

Although mobile devices are accessible everywhere, three out of four consumers (74%) use mobile voice assistants in their household.

Mobile-based AI chatbots have taken our lives by storm, providing solutions to both customers and agents in varied areas: insurance, travel, education, etc.

A certain group of individuals said they needed privacy while speaking to their device and that sending a voice command in public is weird.

Well, this simply explains why individuals in the 18-24 age group use voice assistants less: this age group tends to spend more time out of their homes.

Situations where voice assistants can be used – standalone speakers vs. mobile


  • Standalone speakers – 65%
  • Mobile – 37%


  • Standalone speakers – 62%
  • Mobile – 12%

Watching TV

  • Standalone speakers – 57%
  • Mobile – 43%

In bed

  • Standalone speakers – 38%
  • Mobile – 37%


  • Standalone speakers – 29%
  • Mobile – 25%


  • Standalone speakers – 0%
  • Mobile – 40%

By the end of 2020, nearly half of all the searches made will be voice-based, as predicted by Comscore, a media analytics firm.

Don’t you think voice-based assistants are changing the way businesses function? Thanks to the advent of AI!

  • A 2018 study on AI chatbots and voice assistants by Spiceworks said that 24% of large businesses and 16% of smaller businesses have already started using AI technologies in their workplaces, while 25% of businesses are expected to adopt AI within the next 12 months.

Surprisingly, voice-based assistants such as Siri, Google Assistant, and Cortana are some of the most prominent technologies these businesses use in their workplaces.

Where will the next AI voice revolution take us?

Voice-authorized transactions

Paypal, an online payment gateway, now leverages Siri’s and Alexa’s voice recognition capabilities, allowing users to make payments, check their balance, and request payments from people via voice command.

Voice remote control – AI-powered

Comcast, an American telecommunications and media conglomerate, introduced its X1 voice remote control, which provides both natural language processing and voice recognition.

With the help of deep learning, the X1 can easily come up with better search results: just press the button and tell your television what to do next.

Voice AI-enabled memos and analytics

Salesforce recently unveiled Einstein Voice, an AI assistant that enters critical data the moment it hears it via voice command. This AI assistant can also interpret voice memos. Besides this, the voice bots accompanying Einstein Voice help companies create their own customized voice bots to answer customer queries.

Voice-activated ordering

It is astonishing to see how Domino’s is using a voice-activated feature to automate orders made over the phone by customers. Well, welcome to the era of the voice revolution.

This app, developed by Nuance Communications, already has a Siri-like voice recognition feature that allows customers to place their orders just as they would at the cash counter, making ordering more efficient.

As more businesses look to break down the roadblocks between consumer and brand, voice search is projected to become an impactful technology for bridging the gap.

Simple RNN

A gentle introduction to the tiresome part of understanding RNN

Just as a normal conversation in a random pub or bar in Berlin, people often ask me “Which language do you use?” I always answer “LaTeX and PowerPoint.”

I have been doing an internship at DATANOMIQ and trying to make straightforward but precise study materials on deep learning. I myself started learning machine learning in April of 2019, and I have been self-studying during this one-year-vacation of mine in Berlin.

Many study materials give good explanations on densely connected layers or convolutional neural networks (CNNs). But when it comes to back propagation of CNN and recurrent neural networks (RNNs), I think there’s much room for improvement to make the topic understandable to learners.

Many study materials avoid the points I want to understand, and that was as frustrating to me as listening to answers to questions in the Japanese Diet, or listening to speeches from the current Japanese minister of the environment. With the slightest common sense, you would always get the feeling “How?” after reading an RNN chapter in any book.

This blog series focuses on the introductory level of recurrent neural networks. By “introductory”, I mean prerequisites for a better and more mathematical understanding of RNN algorithms.

I am going to keep these posts as visual as possible, avoiding equations, but I am also going to attach some links to check more precise mathematical explanations.

This blog series is composed of five posts:

  1. Prerequisites for understanding RNN at a more mathematical level
  2. Simple RNN: the first foothold for understanding LSTM
  3. A brief history of neural nets: everything you should know before learning LSTM
  4. Understanding LSTM forward propagation in two ways
  5. LSTM back propagation: following the flows of variables


Business Data is changing the world’s view towards Green Energy

Energy conservation is one of the most stressed points all around the globe. In the past 30 years, research in the field of energy conservation, and especially green energy, has risen to another level. The positive outcomes of this research have given us a gamut of technologies that can aid in preserving and utilizing green energy. It has also reduced companies’ over-dependency on fossil fuels such as oil, coal, and natural gas.

Business data and analytics have the power and the potential to take business organizations forward into the future and conquer new frontiers. Seizing the opportunities presented by green energy, market leaders such as Intel and Google have already implemented it, and now they enjoy the rich benefits of green energy sources.

Business data enables organizations to measure the positive outcomes of adopting green energies. According to the World Energy Outlook report, global wind energy capacity will increase by 85% by the year 2020, reaching 1400 TWh. Moreover, at the Paris Summit, more than 170 countries around the world agreed on reducing the impact of global warming by harnessing energy from green energy sources. And for this to work, Big Data analytics will play a pivotal role.

Overview of Green energy

In simpler terms, green energy is energy coming from natural sources such as wind, sun, plants, tides, and geothermal heat. In contrast to fossil fuels, green energy resources can be replenished in a short period and can be used for longer periods. Green energy sources have a minimal ill effect on the environment compared to fossil fuels. In addition, fossil fuels can be replaced by green energy sources in many areas, like providing electricity, fuel for motor vehicles, etc.

With the help of business data, organizations throughout the world can change the view of green energy. Big Data can show how different types of green energy sources can help businesses and accelerate sustainable expansion.

Below are the different types of green energy sources:

  • Wind Power
  • Solar Power
  • Geothermal Energy
  • Hydropower
  • Biofuels
  • Bio-mass

Now we present a list of advantages that green energy and renewable energy sources have brought to new-age businesses.

Profits on the rise

If the energy produced is more than the energy used, organizations can sell it back to the grid and earn a profit from it. Green energy sources are renewable, and with precise data, companies will get an overall estimation of their energy requirements.

With Big Data, organizations can know the history of a demographic location before setting up a factory. For example, if your company is planning to set up a factory in a coastal region, tidal and wind energy would be more beneficial than solar power. Business data will give a complete analysis of the wind flow so that companies can ascertain the best location for a windmill; this will allow them to store energy in advance and use it as per their requirements. It not only saves money but also provides an extra source of income to the companies. With green energy sources, production in the company can increase to an unprecedented level with sustainable growth over the years.

Synchronizing the maintenance process

If there is a rapid inflow of solar and wind energy sources, the amount of power produced will be huge. Many solar panels and windmills operate in a solar power plant or a wind farm, and with so much equipment, it becomes too complex to manage. Big Data analytics will assist companies in streamlining all their everyday operations to a large extent without any hassle.

Moreover, the analytics tools will convey the performance of renewable energy sources under different weather conditions. Thus, companies will get a clear idea about the performance of their green energy sources, enabling them to take the necessary actions as and when required.

Lowering the attrition rate

Researchers have found that more employees want to be associated with companies that support green energy. By opting for green energy sources and investing in them, companies are indirectly investing in keeping their workforce intact and lowering the attrition rate. The stats point in the same direction: nearly 50% of working professionals, and almost two thirds of the millennial population, want to be associated with companies that opt for green energy sources and have a positive impact on environmental conservation.

The employees will not only wish to stay with the organization for a long time but will also work hard for its betterment. Therefore, you can concentrate on expanding the business rather than thinking about replacing employees.

Lowering the risk due to Power Outage

Business data analytics will continuously keep updating the power requirements needed to run the company. Thus, organizations can cut down the risk of a power outage and the expenses related to it. The companies will know when to halt energy transmission, as they will know whether the grid is under strain.

Business analytics and green energy allow companies to plan power outages, which is cost-efficient and thus can decrease product development costs. Apart from this, companies can store energy for later usage. This practice will help save a lot of money in the long run, proving that investment in green energy sources is a smart investment.

Reducing the maintenance cost

An increasing number of organizations are using renewable sources of energy, as they play a vital role in decreasing production and maintenance costs. Predictive analysis technology helps renewable energy sources produce more energy at less cost, thus reducing infrastructure costs.

Moreover, data analytics will make green energy sources more bankable for companies. As organizations will have a concrete amount of data related to their energy sources, they can use it wisely on a more productive basis.

Escalating Energy Storage

Green energy sources can be stored in bulk and used as per requirement by the business organizations. Using green energy on a larger basis will even allow companies to completely get rid of fossil fuels and thus work towards the betterment of the environment. Big Data analytics with AI and cloud-enabled systems help organizations store renewable energies such as Wind and Solar.

Moreover, it gathers information for the businesses and gives the complete analysis of the exact amount of energy required to complete a particular task. The data will also automate cost savings as it can predict the client’s needs. Based on business data, companies can store renewable energy sources in a better manner.

With Business data analytics, the companies can store energy when it is cheap and use it according to the needs when the energy rates go higher. Although predicting the requirement of storage is a complicated process, with Artificial Intelligence (AI) at work, you can analyze the data efficiently.

Bundling Up

Green energy sources will play a pivotal role in deciding the future of businesses, as fossil fuels are available only in limited supply. Moreover, astute business data analysts will assist organizations not only in using renewable energy sources in a better manner but also in forming a formidable workforce. Data support in the green energy sector will also provide sustainable growth to companies, monitor their efforts, and assist them in the long run.

Predictive Analytics World 2020 Healthcare

Difficult times call for creative measures

Predictive Analytics World for Healthcare will go virtual and you still have time to join us!

What do you have in store for me?

We will provide a live-streamed virtual version of Predictive Analytics World for Healthcare Munich 2020 on 11-12 May 2020: you will be able to attend sessions and to interact and connect with the speakers and fellow members of the data science community, including sponsors and exhibitors, from your home or your office.

What about the workshops?

The workshops will also be held virtually on the planned date:
13 May, 2020.

Get a complimentary virtual sneak preview!

If you would like to join us for a virtual sneak preview of the workshop “Data Thinking” on Thursday, April 16, so you can familiarise yourself with the quality of the virtual edition of both conference and workshops and with how the interaction with speakers and attendees works, please send a request to registration@risingmedia.com.

Don’t have a ticket yet?

It‘s not too late to join the data science community.
Register by 10 May to receive access to the livestream and recordings.


We’re looking forward to seeing you – virtually!

This year Predictive Analytics World for Healthcare runs alongside Deep Learning World and Predictive Analytics World for Industry 4.0.