Geschriebene Artikel über Big Data Analytics

Data Analytics and Mining for Dummies

Data Analytics and Mining is often perceived as an extremely tricky task cut out for Data Analysts and Data Scientists having a thorough knowledge encompassing several different domains such as mathematics, statistics, computer algorithms and programming. However, there are several tools available today that make it possible for novice programmers or people with no absolutely no algorithmic or programming expertise to carry out Data Analytics and Mining. One such tool which is very powerful and provides a graphical user interface and an assembly of nodes for ETL: Extraction, Transformation, Loading, for modeling, data analysis and visualization without, or with only slight programming is the KNIME Analytics Platform.

KNIME, or the Konstanz Information Miner, was developed by the University of Konstanz and is now popular with a large international community of developers. Initially KNIME was originally made for commercial use but now it is available as an open source software and has been used extensively in pharmaceutical research since 2006 and also a powerful data mining tool for the financial data sector. It is also frequently used in the Business Intelligence (BI) sector.

KNIME as a Data Mining Tool

KNIME is also one of the most well-organized tools which enables various methods of machine learning and data mining to be integrated. It is very effective when we are pre-processing data i.e. extracting, transforming, and loading data.

KNIME has a number of good features like quick deployment and scaling efficiency. It employs an assembly of nodes to pre-process data for analytics and visualization. It is also used for discovering patterns among large volumes of data and transforming data into more polished/actionable information.

Some Features of KNIME:

  • Free and open source
  • Graphical and logically designed
  • Very rich in analytics capabilities
  • No limitations on data size, memory usage, or functionalities
  • Compatible with Windows ,OS and Linux
  • Written in Java and edited with Eclipse.

A node is the smallest design unit in KNIME and each node serves a dedicated task. KNIME contains graphical, drag-drop nodes that require no coding. Nodes are connected with one’s output being another’s input, as a workflow. Therefore end-to-end pipelines can be built requiring no coding effort. This makes KNIME stand out, makes it user-friendly and make it accessible for dummies not from a computer science background.

KNIME workflow designed for graduate admission prediction

KNIME workflow designed for graduate admission prediction

KNIME has nodes to carry out Univariate Statistics, Multivariate Statistics, Data Mining, Time Series Analysis, Image Processing, Web Analytics, Text Mining, Network Analysis and Social Media Analysis. The KNIME node repository has a node for every functionality you can possibly think of and need while building a data mining model. One can execute different algorithms such as clustering and classification on a dataset and visualize the results inside the framework itself. It is a framework capable of giving insights on data and the phenomenon that the data represent.

Some commonly used KNIME node groups include:

  • Input-Output or I/O:  Nodes in this group retrieve data from or to write data to external files or data bases.
  • Data Manipulation: Used for data pre-processing tasks. Contains nodes to filter, group, pivot, bin, normalize, aggregate, join, sample, partition, etc.
  • Views: This set of nodes permit users to inspect data and analysis results using multiple views. This gives a means for truly interactive exploration of a data set.
  • Data Mining: In this group, there are nodes that implement certain algorithms (like K-means clustering, Decision Trees, etc.)

Comparison with other tools 

The first version of the KNIME Analytics Platform was released in 2006 whereas Weka and R studio were released in 1997 and 1993 respectively. KNIME is a proper data mining tool whereas Weka and R studio are Machine Learning tools which can also do data mining. KNIME integrates with Weka to add machine learning algorithms to the system. The R project adds statistical functionalities as well. Furthermore, KNIME’s range of functions is impressive, with more than 1,000 modules and ready-made application packages. The modules can be further expanded by additional commercial features.

Simple RNN

Simple RNN: the first foothold for understanding LSTM

*In this article “Densely Connected Layers” is written as “DCL,” and “Convolutional Neural Network” as “CNN.”

In the last article, I mentioned “When it comes to the structure of RNN, many study materials try to avoid showing that RNNs are also connections of neurons, as well as DCL or CNN.” Even if you manage to understand DCL and CNN, you can be suddenly left behind once you try to understand RNN because it looks like a different field. In the second section of this article, I am going to provide a some helps for more abstract understandings of DCL/CNN , which you need when you read most other study materials.

My explanation on this simple RNN is based on a chapter in a textbook published by Massachusetts Institute of Technology, which is also recommended in some deep learning courses of Stanford University.

First of all, you should keep it in mind that simple RNN are not useful in many cases, mainly because of vanishing/exploding gradient problem, which I am going to explain in the next article. LSTM is one major type of RNN used for tackling those problems. But without clear understanding forward/back propagation of RNN, I think many people would get stuck when they try to understand how LSTM works, especially during its back propagation stage. If you have tried climbing the mountain of understanding LSTM, but found yourself having to retreat back to the foot, I suggest that you read through this article on simple RNNs. It should help you to gain a solid foothold, and you would be ready for trying to climb the mountain again.

*This article is the second article of “A gentle introduction to the tiresome part of understanding RNN.”

1, A brief review on back propagation of DCL.

Simple RNNs are straightforward applications of DCL, but if you do not even have any ideas on DCL forward/back propagation, you will not be able to understand this article. If you more or less understand how back propagation of DCL works, you can skip this first section.

Deep learning is a part of machine learning. And most importantly, whether it is classical machine learning or deep learning, adjusting parameters is what machine learning is all about. Parameters mean elements of functions except for variants. For example when you get a very simple function f(x)=a + bx + cx^2 + dx^3, then x is a variant, and a, b, c, d are parameters. In case of classical machine learning algorithms, the number of those parameters are very limited because they were originally designed manually. Such functions for classical machine learning is useful for features found by humans, after trial and errors(feature engineering is a field of finding such effective features, manually). You adjust those parameters based on how different the outputs(estimated outcome of classification/regression) are from supervising vectors(the data prepared to show ideal answers).

In the last article I said neural networks are just mappings, whose inputs are vectors, matrices, or sequence data. In case of DCLs, inputs are vectors. Then what’s the number of parameters ? The answer depends on the the number of neurons and layers. In the example of DCL at the right side, the number of the connections of the neurons is the number of parameters(Would you like to try to count them? At least I would say “No.”). Unlike classical machine learning you no longer need to do feature engineering, but instead you need to design networks effective for each task and adjust a lot of parameters.

*I think the hype of AI comes from the fact that neural networks find features automatically. But the reality is difficulty of feature engineering was just replaced by difficulty of designing proper neural networks.

It is easy to imagine that you need an efficient way to adjust those parameters, and the method is called back propagation (or just backprop). As long as it is about DCL backprop, you can find a lot of well-made study materials on that, so I am not going to cover that topic precisely in this article series. Simply putting, during back propagation, in order to adjust parameters of a layer you need errors in the next layer. And in order calculate the errors of the next layer, you need errors in the next next layer.

*You should not think too much about what the “errors” exactly mean. Such “errors” are defined in this context, and you will see why you need them if you actually write down all the mathematical equations behind backprops of DCL.

The red arrows in the figure shows how errors of all the neurons in a layer propagate backward to a neuron in last layer. The figure shows only some sets of such errors propagating backward, but in practice you have to think about all the combinations of such red arrows in the whole back propagation(this link would give you some ideas on how DCLs work).

These points are minimum prerequisites for continuing reading this  RNN this article. But if you are planning to understand RNN forward/back propagation at  an abstract/mathematical level that you can read academic papers,  I highly recommend you to actually write down all the equations of DCL backprop. And if possible you should try to implement backprop of three-layer DCL.

2, Forward propagation of simple RNN

In fact the simple RNN which we are going to look at in this article has only three layers. From now on imagine that inputs of RNN come from the bottom and outputs go up. But RNNs have to keep information of earlier times steps during upcoming several time steps because as I mentioned in the last article RNNs are used for sequence data, the order of whose elements is important. In order to do that, information of the neurons in the middle layer of RNN propagate forward to the middle layer itself. Therefore in one time step of forward propagation of RNN, the input at the time step propagates forward as normal DCL, and the RNN gives out an output at the time step. And information of one neuron in the middle layer propagate forward to the other neurons like yellow arrows in the figure. And the information in the next neuron propagate forward to the other neurons, and this process is repeated. This is called recurrent connections of RNN.

*To be exact we are just looking at a type of recurrent connections. For example Elman RNNs have simpler recurrent connections. And recurrent connections of LSTM are more complicated.

Whether it is a simple one or not, basically RNN repeats this process of getting an input at every time step, giving out an output, and making recurrent connections to the RNN itself. But you need to keep the values of activated neurons at every time step, so virtually you need to consider the same RNNs duplicated for several time steps like the figure below. This is the idea of unfolding RNN. Depending on contexts, the whole unfolded DCLs with recurrent connections is also called an RNN.

In many situations, RNNs are simplified as below. If you have read through this article until this point, I bet you gained some better understanding of RNNs, so you should little by little get used to this more abstract, blackboxed  way of showing RNN.

You have seen that you can unfold an RNN, per time step. From now on I am going to show the simple RNN in a simpler way,  based on the MIT textbook which I recomment. The figure below shows how RNN propagate forward during two time steps (t-1), (t).

The input \boldsymbol{x}^{(t-1)}at time step(t-1) propagate forward as a normal DCL, and gives out the output \hat{\boldsymbol{y}} ^{(t)} (The notation on the \boldsymbol{y} ^{(t)} is called “hat,” and it means that the value is an estimated value. Whatever machine learning tasks you work on, the outputs of the functions are just estimations of ideal outcomes. You need to adjust parameters for better estimations. You should always be careful whether it is an actual value or an estimated value in the context of machine learning or statistics). But the most important parts are the middle layers.

*To be exact I should have drawn the middle layers as connections of two layers of neurons like the figure at the right side. But I made my figure closer to the chart in the MIT textbook, and also most other study materials show the combinations of the two neurons before/after activation as one neuron.

\boldsymbol{a}^{(t)} is just linear summations of \boldsymbol{x}^{(t)} (If you do not know what “linear summations” mean, please scroll this page a bit), and \boldsymbol{h}^{(t)} is a combination of activated values of \boldsymbol{a}^{(t)} and linear summations of \boldsymbol{h}^{(t-1)} from the last time step, with recurrent connections. The values of \boldsymbol{h}^{(t)} propagate forward in two ways. One is normal DCL forward propagation to \hat{\boldsymbol{y}} ^{(t)} and \boldsymbol{o}^{(t)}, and the other is recurrent connections to \boldsymbol{h}^{(t+1)} .

These are equations for each step of forward propagation.

  • \boldsymbol{a}^{(t)} = \boldsymbol{b} + \boldsymbol{W} \cdot \boldsymbol{h}^{(t-1)} + \boldsymbol{U} \cdot \boldsymbol{x}^{(t)}
  • \boldsymbol{h}^{(t)}= g(\boldsymbol{a}^{(t)})
  • \boldsymbol{o}^{(t)} = \boldsymbol{c} + \boldsymbol{V} \cdot \boldsymbol{h}^{(t)}
  • \hat{\boldsymbol{y}} ^{(t)} = f(\boldsymbol{o}^{(t)})

*Please forgive me for adding some mathematical equations on this article even though I pledged not to in the first article. You can skip the them, but for some people it is on the contrary more confusing if there are no equations. In case you are allergic to mathematics, I prescribed some treatments below.

*Linear summation is a type of weighted summation of some elements. Concretely, when you have a vector \boldsymbol{x}=(x_0, x_1, x_2), and weights \boldsymbol{w}=(w_0,w_1, w_2), then \boldsymbol{w}^T \cdot \boldsymbol{x} = w_0 \cdot x_0 + w_1 \cdot x_1 +w_2 \cdot x_2 is a linear summation of \boldsymbol{x}, and its weights are \boldsymbol{w}.

*When you see a product of a matrix and a vector, for example a product of \boldsymbol{W} and \boldsymbol{v}, you should clearly make an image of connections between two layers of a neural network. You can also say each element of \boldsymbol{u}} is a linear summations all the elements of \boldsymbol{v}} , and \boldsymbol{W} gives the weights for the summations.

A very important point is that you share the same parameters, in this case \boldsymbol{\theta \in \{\boldsymbol{U}, \boldsymbol{W}, \boldsymbol{b}, \boldsymbol{V}, \boldsymbol{c} \}}, at every time step. 

And you are likely to see this RNN in this blackboxed form.

3, The steps of back propagation of simple RNN

In the last article, I said “I have to say backprop of RNN, especially LSTM (a useful and mainstream type or RNN), is a monster of chain rules.” I did my best to make my PowerPoint on LSTM backprop straightforward. But looking at it again, the LSTM backprop part still looks like an electronic circuit, and it requires some patience from you to understand it. If you want to understand LSTM at a more mathematical level, understanding the flow of simple RNN backprop is indispensable, so I would like you to be patient while understanding this step (and you have to be even more patient while understanding LSTM backprop).

This might be a matter of my literacy, but explanations on RNN backprop are very frustrating for me in the points below.

  • Most explanations just show how to calculate gradients at each time step.
  • Most study materials are visually very poor.
  • Most explanations just emphasize that “errors are back propagating through time,” using tons of arrows, but they lack concrete instructions on how actually you renew parameters with those errors.

If you can relate to the feelings I mentioned above, the instructions from now on could somewhat help you. And I am going to share some study materials on simple RNNs in an external link so that you can gain a clear and mathematical understanding on how simple RNNs work.

Backprop of RNN , as long as you are thinking about simple RNNs, is not so different from that of DCLs. But you have to be careful about the meaning of errors in the context of RNN backprop. Back propagation through time (BPTT) is one of the major methods for RNN backprop, and I am sure most textbooks explain BPTT. But most study materials just emphasize that you need errors from all the time steps, and I think that is very misleading and confusing.

You need all the gradients to adjust parameters, but you do not necessarily need all the errors to calculate those gradients. Gradients in the context of machine learning mean partial derivatives of error functions (in this case J) with respect to certain parameters, and mathematically a gradient of J with respect to \boldsymbol{\theta \in \{\boldsymbol{U}, \boldsymbol{W}, \boldsymbol{b}^{(t)}, \boldsymbol{V}, \boldsymbol{c} \}}is denoted as ( \frac{\partial J}{\partial \boldsymbol{\theta}}  ). And another confusing point in many textbooks, including the MIT one, is that they give an impression that parameters depend on time steps. For example some study materials use notations like \frac{\partial J}{\partial \boldsymbol{\theta}^{(t)}}, and I think this gives an impression that this is a gradient with respect to the parameters at time step (t). In my opinion this gradient rather should be written as ( \frac{\partial J}{\partial \boldsymbol{\theta}} )^{(t)} . But many study materials denote gradients of those errors in the former way, so from now on let me use the notations which you can see in the figures in this article.

In order to calculate the gradient \frac{\partial J}{\partial \boldsymbol{x}^{(t)}} you need errors from time steps s (s \geq t) \quad (as you can see in the figure, in order to calculate a gradient in a colored frame, you need all the errors in the same color).

*Another confusing point is that the \frac{\partial J}{\partial \boldsymbol{\ast ^{(t)}}}, \boldsymbol{\ast} \in \{\boldsymbol{a}^{(t)}, \boldsymbol{h}^{(t)}, \boldsymbol{o}^{(t)}, \dots \} are correct notations, because \boldsymbol{\ast} are values of neurons after forward propagation. They depend on time steps, and these are very values which I have been calling “errors.” That is why parameters do not depend on time steps, whereas errors depend on time steps.

As I mentioned before, you share the same parameters at every time step. Again, please do not assume that parameters are different from time step to time step. It is gradients/errors (you need errors to calculate gradients) which depend on time step. And after calculating errors at every time step, you can finally adjust parameters one time, and that’s why this is called “back propagation through time.” (It is easy to imagine that this method can be very inefficient. If the input is the whole text on a Wikipedia link, you need to input all the sentences in the Wikipedia text to renew parameters one time. To solve this problem there is a backprop method named “truncated BPTT,” with which you renew parameters based on a part of a text. )

And after calculating those gradients \frac{\partial J}{\partial \boldsymbol{\theta}^{(t)}} you can take a summation of them: \frac{\partial J}{\partial \boldsymbol{\theta}}=\sum_{t=0}^{t=\tau}{\frac{\partial J}{\partial \boldsymbol{\theta}^{(t)}}}. With this gradient \frac{\partial J}{\partial \boldsymbol{\theta}} , you can finally renew the value of \boldsymbol{\theta} one time.

At the beginning of this article I mentioned that simple RNNs are no longer for practical uses, and that comes from exploding/vanishing problem of RNN. This problem was one of the reasons for the AI winter which lasted for some 20 years. In the next article I am going to write about LSTM, a fancier type of RNN, in the context of a history of neural network history.

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, including grammatical errors, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

Severity of lockdowns and how they are reflected in mobility data

The global spread of the SARS-CoV-2 at the beginning of March 2020 forced majority of countries to introduce measures to contain the virus. The governments found themselves facing a very difficult tradeoff between limiting the spread of the virus and bearing potentially catastrophic economical costs of a lockdown. Notably, considering the level of globalization today, the response of countries varied a lot in severity and response latency. In the overwhelming amount of media and social media information feed a lot of misinformation and anecdotal evidence surfaced and remained in people’s mind. In this article, I try to have a more systematic view on the topics of severity of response from governments and change in people’s mobility due to the pandemic.

I want to look at several countries with different approach to restraining the spread of the virus. I will look at governmental regulations, when, and how they were introduced. For that I am referring to an index called Oxford COVID-19 Government Response Tracker (OxCGRT)[1]. The OxCGRT follows, records, and rates the actions taken by governments, that are available publicly. However, looking just at the regulations and taking them for granted does not provide that we have the whole picture. Therefore, equally interesting is the investigation of how the recommended levels of self-isolation and social distancing is reflected in the mobility data and we will look at it first.

The mobility dataset

The mobility data used in this article was collected by Google and made freely accessible[2]. The data reflects how the number of visits and their length changed as compared to a baseline from before the pandemic. The baseline is the median value for the corresponding day of the week in the period from 3.01.2020 – 6.02.2020. The dataset contains data in six categories. Here we look at only 4 of them: public transport stations, places of residence, workplaces, and retail/recreation (including shopping centers, libraries, gastronomy, culture). The analysis intentionally omits parks (public beaches, gardens etc.) and grocery/pharmacy category. Mobility in parks is excluded due to huge weather change confound. The baseline was created in winter and increased/decreased (depending on the hemisphere) activity in parks is expected as the weather changes. It would be difficult to detangle tis change from the change caused by the pandemic without referring to a different baseline. The grocery shops and pharmacies are excluded because the measures regarding the shopping were very similar across the countries.

Amid the Covid-19 pandemic a lot of anecdotal information surfaced, that some countries, like Sweden, acted completely against the current by not introducing a lockdown. It was reported that there were absolutely no restrictions and Sweden can be basically treated as a control group for comparing the different approaches to lockdown on the spread of the coronavirus. Looking at the mobility data (below), we can see however, that there was a change in the mobility of Swedish citizens in comparison to the baseline.

Fig. 1 Moving average (+/- 6 days) of the mobility data in Sweden in four categories.

Fig. 1 Moving average (+/- 6 days) of the mobility data in Sweden in four categories.

Looking at the change in mobility in Sweden, we can see that the change in the residential areas is small, but it is indicating some change in behavior. A change in the retail and recreational sector is more noticeable. Most interestingly it is approaching the baseline levels at the beginning of June. The most substantial changes, however, are in the workplaces and transit categories. They are also much slower to come back to the baseline, although a trend in that direction starts to be visible.

Next, let us have a look at the change in mobility in selected countries, separately for each category. Here, I compare Germany, Sweden, Italy, and New Zealand. (To see the mobility data for other countries visit https://covid19.datanomiq.de/#section-mobility).

Fig. 2 Moving average (+/- 6 days) of the mobility data.

Fig. 2 Moving average (+/- 6 days) of the mobility data.

Looking at the data, we can see that the change in mobility in Germany and Sweden was somewhat similar in orders of magnitude, in comparison to changes in mobility in countries like Italy and New Zealand. Without a doubt, the behavior in Sweden changed the least from the baseline in all the categories. Nevertheless, claiming that people’s reaction to the pandemic in Sweden in Germany were polar opposites is not necessarily correct. The biggest discrepancy between Sweden and Germany is in the retail and recreation sector out of all categories presented. The changes in Italy and New Zealand reached very comparable levels, but in New Zealand they seem to be much more dynamic, especially in approaching the baseline levels again.

The government response dataset

Oxford COVID-19 Government Response Tracker records regulations from number of countries, rates them and categorizes into a few indices. The number between 1 and 100 reflects the level of the action taken by a government. Here, I focus on the Containment and Health sub-index that includes 11 indicators from categories: containment and closure policies and health system policies[3]. The actions included in the index are for example: school and workplace closing, restrictions on public events, travel restrictions, public information campaigns, testing policy and contact tracing.

Below, we look at a plot with the Containment and Health sub-index value for the four aforementioned countries. Data and documentation is available here[4]

Fig. 3 Oxford COVID-19 Government Response Tracker, the Containment and Health sub-index.

Fig. 3 Oxford COVID-19 Government Response Tracker, the Containment and Health sub-index.

Here the difference between Sweden and the other countries that we are looking at becomes more apparent. Nevertheless, the Swedish government did take some measures in order to condemn the spread of the SARS-CoV-2. At the highest, the index reached value 45 points in Sweden, 73 in Germany, 92 in Italy and 94 in New Zealand. In all these countries except for Sweden the index started dropping again, while the drop is the most dynamic in New Zealand and the index has basically reached the level of Sweden.

Conclusions

As we have hopefully seen, the response to the COVID-19 pandemic from governments differed substantially, as well as the resulting change in mobility behavior of the inhabitants did. However, the discrepancies were probably not as big as reported in the media.

The overwhelming presence of the social media could have blown some of the mentioned differences out of proportion. For example, the discrepancy in the mobility behavior between Sweden and Germany was biggest in recreation sector, that involves cafes, restaurants, cultural resorts, and shopping centers. It is possible, that those activities were the ones that people in lockdown missed the most. Looking at Swedes, who were participating in them it was easy to extrapolate on the overall landscape of the response to the virus in the country.

It is very hard to say which of the world country’s approach will bring the best effects for the people’s well-being and the economies. The ongoing pandemic will remain a topic of extensive research for many years to come. We will (most probably) eventually find out which approach to the lockdown was the most optimal (or at least come close to finding out). For the time being, it is however important to remember that there are many factors in play and looking into one type of data might be misleading. Comparing countries with different history, weather, political and economic climate, or population density might be misleading as well. But it is still more insightful than not looking into the data at all.

[1] Hale, Thomas, Sam Webster, Anna Petherick, Toby Phillips, and Beatriz Kira (2020). Oxford COVID-19 Government Response Tracker, Blavatnik School of Government. Data use policy: Creative Commons Attribution CC BY standard.

[2] Google LLC “Google COVID-19 Community Mobility Reports”. https://www.google.com/covid19/mobility/ retrived: 04.06.2020

[3] See documentation https://github.com/OxCGRT/covid-policy-tracker/tree/master/documentation

[4] https://github.com/OxCGRT/covid-policy-tracker  retrieved on 04.06.2020

 How Text to Speech Voices Are Used In Data Science

To speak on voices, text to speech platforms are bringing versatility to a new scale by implementing voices that sound more personal and less like a robot. As these services gain traction, vocal quality, and implementation improve to give sounds that feel like they’re speaking to you from a human mouth.

The intention of most text to speech platforms have always been to provide experiences that users feel comfortable using. Voices are a huge part of that, so great strides have been taken to ensure that they sound right.

How Voices are Utilized

Voices in the text to speech are generated by a computer itself. As the computer-generated voices transcribe the text into oral responses, they make up what we hear as dialogue read to us. These voice clips initially had the problem of sounding robotic and unpersonable as they were pulled digitally together. Lately, though, the technology has improved to bring faster response time in transcribing words, as well as seamlessly stringing together. This has brought the advantage of making a computer-generated voice sound much more natural and human. As people seek to connect more with the works they read, having a human-sounding voice is a huge step in letting listeners relate to their works.

To give an example of where this works, you might have a GPS in your car. The GPS has a function where it will transcribe the car’s route and tell you each instruction. Some GPS companies have made full use of this feature and added fun voices to help entertain drivers. These include Darth Vader and Yoda from Star Wars or having Morgan Freeman and Homer Simpson narrate your route. Different voice types are utilized in services depending on the situation. Professional uses like customer service centers will keep automated voices sounding professional and courteous when assisting customers. Educational systems will keep softer and kinder sounding voices to help sound more friendly with students.

When compared to older solutions, the rise of vocal variety in Text to Speech services has taken huge leaps as more people see the value of having a voice that they can connect to. Expressive voices and emotional variance are being applied to voices to help further convey this, with happy or sad sounding voices being implemented wherever appropriate. As time goes on, these services will get better at the reading context within sentences to apply emotion and tone at the correct times, and improve overall vocal quality as well. These reinvent past methods by advancing the once static and robotic sounds that used to be commonplace among text to voice services.

More infrastructures adopt these services to expand their reach to consumers who might not have the capabilities to utilize their offerings.  Having clear and relatable voices matter because customers and users will be drawn to them considerably more than if they chose not to offer them at all. In the near future text to speech voices will develop even further, enhancing the way people of all kinds connect to the words they read.

Simple RNN

Prerequisites for understanding RNN at a more mathematical level

Writing the A gentle introduction to the tiresome part of understanding RNN Article Series on recurrent neural network (RNN) is nothing like a creative or ingenious idea. It is quite an ordinary topic. But still I am going to write my own new article on this ordinary topic because I have been frustrated by lack of sufficient explanations on RNN for slow learners like me.

I think many of readers of articles on this website at least know that RNN is a type of neural network used for AI tasks, such as time series prediction, machine translation, and voice recognition. But if you do not understand how RNNs work, especially during its back propagation, this blog series is for you.

After reading this articles series, I think you will be able to understand RNN in more mathematical and abstract ways. But in case some of the readers are allergic or intolerant to mathematics, I tried to use as little mathematics as possible.

Ideal prerequisite knowledge:

  • Some understanding on densely connected layers (or fully connected layers, multilayer perception) and how their forward/back propagation work.
  •  Some understanding on structure of Convolutional Neural Network.

*In this article “Densely Connected Layers” is written as “DCL,” and “Convolutional Neural Network” as “CNN.”

1, Difficulty of Understanding RNN

I bet a part of difficulty of understanding RNN comes from the variety of its structures. If you search “recurrent neural network” on Google Image or something, you will see what I mean. But that cannot be helped because RNN enables a variety of tasks.

Another major difficulty of understanding RNN is understanding its back propagation algorithm. I think some of you found it hard to understand chain rules in calculating back propagation of densely connected layers, where you have to make the most of linear algebra. And I have to say backprop of RNN, especially LSTM, is a monster of chain rules. I am planing to upload not only a blog post on RNN backprop, but also a presentation slides with animations to make it more understandable, in some external links.

In order to avoid such confusions, I am going to introduce a very simplified type of RNN, which I call a “simple RNN.” The RNN displayed as the head image of this article is a simple RNN.

2, How Neurons are Connected

    \begin{equation*}   1 = 3 - 2 \end{equation*}

How to connect neurons and how to activate them is what neural networks are all about. Structures of those neurons are easy to grasp as long as that is about DCL or CNN. But when it comes to the structure of RNN, many study materials try to avoid showing that RNNs are also connections of neurons, as well as DCL or CNN(*If you are not sure how neurons are connected in CNN, this link should be helpful. Draw a random digit in the square at the corner.). In fact the structure of RNN is also the same, and as long as it is a simple RNN, and it is not hard to visualize its structure.

Even though RNN is also connections of neurons, usually most RNN charts are simplified, using blackboxes. In case of simple RNN, most study material would display it as the chart below.

But that also cannot be helped because fancier RNN have more complicated connections of neurons, and there are no longer advantages of displaying RNN as connections of neurons, and you would need to understand RNN in more abstract way, I mean, as you see in most of textbooks.

I am going to explain details of simple RNN in the next article of this series.

3, Neural Networks as Mappings

If you still think that neural networks are something like magical spider webs or models of brain tissues, forget that. They are just ordinary mappings.

If you have been allergic to mathematics in your life, you might have never heard of the word “mapping.” If so, at least please keep it in mind that the equation y=f(x), which most people would have seen in compulsory education, is a part of mapping. If you get a value x, you get a value y corresponding to the x.

But in case of deep learning, x is a vector or a tensor, and it is denoted with \boldsymbol{x} . If you have never studied linear algebra , imagine that a vector is a column of Excel data (only one column), a matrix is a sheet of Excel data (with some rows and columns), and a tensor is some sheets of Excel data (each sheet does not necessarily contain only one column.)

CNNs are mainly used for image processing, so their inputs are usually image data. Image data are in many cases (3, hight, width) tensors because usually an image has red, blue, green channels, and the image in each channel can be expressed as a hight*width matrix (the “height” and the “width” are number of pixels, so they are discrete numbers).

The convolutional part of CNN (which I call “feature extraction part”) maps the tensors to a vector, and the last part is usually DCL, which works as classifier/regressor. At the end of the feature extraction part, you get a vector. I call it a “semantic vector” because the vector has information of “meaning” of the input image. In this link you can see maps of pictures plotted depending on the semantic vector. You can see that even if the pictures are not necessarily close pixelwise, they are close in terms of the “meanings” of the images.

In the example of a dog/cat classifier introduced by François Chollet, the developer of Keras, the CNN maps (3, 150, 150) tensors to 2-dimensional vectors, (1, 0) or (0, 1) for (dog, cat).

Wrapping up the points above, at least you should keep two points in mind: first, DCL is a classifier or a regressor, and CNN is a feature extractor used for image processing. And another important thing is, feature extraction parts of CNNs map images to vectors which are more related to the “meaning” of the image.

Importantly, I would like you to understand RNN this way. An RNN is also just a mapping.

*I recommend you to at least take a look at the beautiful pictures in this link. These pictures give you some insight into how CNN perceive images.

4, Problems of DCL and CNN, and needs for RNN

Taking an example of RNN task should be helpful for this topic. Probably machine translation is the most famous application of RNN, and it is also a good example of showing why DCL and CNN are not proper for some tasks. Its algorithms is out of the scope of this article series, but it would give you a good insight of some features of RNN. I prepared three sentences in German, English, and Japanese, which have the same meaning. Assume that each sentence is divided into some parts as shown below and that each vector corresponds to each part. In machine translation we want to convert a set of the vectors into another set of vectors.

Then let’s see why DCL and CNN are not proper for such task.

  • The input size is fixed: In case of the dog/cat classifier I have mentioned, even though the sizes of the input images varies, they were first molded into (3, 150, 150) tensors. But in machine translation, usually the length of the input is supposed to be flexible.
  • The order of inputs does not mater: In case of the dog/cat classifier the last section, even if the input is “cat,” “cat,” “dog” or “dog,” “cat,” “cat” there’s no difference. And in case of DCL, the network is symmetric, so even if you shuffle inputs, as long as you shuffle all of the input data in the same way, the DCL give out the same outcome . And if you have learned at least one foreign language, it is easy to imagine that the orders of vectors in sequence data matter in machine translation.

*It is said English language has phrase structure grammar, on the other hand Japanese language has dependency grammar. In English, the orders of words are important, but in Japanese as long as the particles and conjugations are correct, the orders of words are very flexible. In my impression, German grammar is between them. As long as you put the verb at the second position and the cases of the words are correct, the orders are also relatively flexible.

5, Sequence Data

We can say DCL and CNN are not useful when you want to process sequence data. Sequence data are a type of data which are lists of vectors. And importantly, the orders of the vectors matter. The number of vectors in sequence data is usually called time steps. A simple example of sequence data is meteorological data measured at a spot every ten minutes, for instance temperature, air pressure, wind velocity, humidity. In this case the data is recorded as 4-dimensional vector every ten minutes.

But this “time step” does not necessarily mean “time.” In case of natural language processing (including machine translation), which you I mentioned in the last section, the numberings of each vector denoting each part of sentences are “time steps.”

And RNNs are mappings from a sequence data to another sequence data.

*At least I found a paper on the RNN’s capability of universal approximation on many-to-one RNN task. But I have not found any papers on universal approximation of many-to-many RNN tasks. Please let me know if you find any clue on whether such approximation is possible. I am desperate to know that. 

6, Types of RNN Tasks

RNN tasks can be classified into some types depending on the lengths of input/output sequences (the “length” means the times steps of input/output sequence data).

If you want to predict the temperature in 24 hours, based on several time series data points in the last 96 hours, the task is many-to-one. If you sample data every ten minutes, the input size is 96*6=574 (the input data is a list of 574 vectors), and the output size is 1 (which is a value of temperature). Another example of many-to-one task is sentiment classification. If you want to judge whether a post on SNS is positive or negative, the input size is very flexible (the length of the post varies.) But the output size is one, which is (1, 0) or (0, 1), which denotes (positive, negative).

*The charts in this section are simplified model of RNN used for each task. Please keep it in mind that they are not 100% correct, but I tried to make them as exact as possible compared to those in other study materials.

Music/text generation can be one-to-many tasks. If you give the first sound/word you can generate a phrase.

Next, let’s look at many-to-many tasks. Machine translation and voice recognition are likely to be major examples of many-to-many tasks, but here name entity recognition seems to be a proper choice. Name entity recognition is task of finding proper noun in a sentence . For example if you got two sentences “He said, ‘Teddy bears on sale!’ ” and ‘He said, “Teddy Roosevelt was a great president!” ‘ judging whether the “Teddy” is a proper noun or a normal noun is name entity recognition.

Machine translation and voice recognition, which are more popular, are also many-to-many tasks, but they use more sophisticated models. In case of machine translation, the inputs are sentences in the original language, and the outputs are sentences in another language. When it comes to voice recognition, the input is data of air pressure at several time steps, and the output is the recognized word or sentence. Again, these are out of the scope of this article but I would like to introduce the models briefly.

Machine translation uses a type of RNN named sequence-to-sequence model (which is often called seq2seq model). This model is also very important for other natural language processes tasks in general, such as text summarization. A seq2seq model is divided into the encoder part and the decoder part. The encoder gives out a hidden state vector and it used as the input of the decoder part. And decoder part generates texts, using the output of the last time step as the input of next time step.

Voice recognition is also a famous application of RNN, but it also needs a special type of RNN.

*To be honest, I don’t know what is the state-of-the-art voice recognition algorithm. The example in this article is a combination of RNN and a collapsing function made using Connectionist Temporal Classification (CTC). In this model, the output of RNN is much longer than the recorded words or sentences, so a collapsing function reduces the output into next output with normal length.

You might have noticed that RNNs in the charts above are connected in both directions. Depending on the RNN tasks you need such bidirectional RNNs.  I think it is also easy to imagine that such networks are necessary. Again, machine translation is a good example.

And interestingly, image captioning, which enables a computer to describe a picture, is one-to-many-task. As the output is a sentence, it is easy to imagine that the output is “many.” If it is a one-to-many task, the input is supposed to be a vector.

Where does the input come from? I told you that I was obsessed with the beauty of the last vector of the feature extraction part of CNN. Surprisingly the the “beautiful” vector, which I call a “semantic vector” is the input of image captioning task (after some transformations, depending on the network models).

I think this articles includes major things you need to know as prerequisites when you want to understand RNN at more mathematical level. In the next article, I would like to explain the structure of a simple RNN, and how it forward propagate.

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

As Businesses Struggle With ML, Automation Offers a Solution

In recent years, machine learning technology and the business solutions it enables has developed into a big business in and of itself. According to the industry analysts at IDC, spending on ML and AI technology is set to grow to almost $98 billion per year by 2023. In practical terms, that figure represents a business environment where ML technology has become a key priority for companies of every kind.

That doesn’t mean that the path to adopting ML technology is easy for businesses. Far from it. In fact, survey data seems to indicate that businesses are still struggling to get their machine learning efforts up and running. According to one such survey, it currently takes the average business as many as 90 days to deploy a single machine learning model. For 20% of businesses, that number is even higher.

From the data, it seems clear that something is missing in the methodologies that most companies rely on to make meaningful use of machine learning in their business workflows. A closer look at the situation reveals that the vast majority of data workers (analysts, data scientists, etc.) spend an inordinate amount of time on infrastructure work – and not on creating and refining machine learning models.

Streamlining the ML Adoption Process

To fix that problem, businesses need to turn to another growing area of technology: automation. By leveraging the latest in automation technology, it’s now possible to build an automated machine learning pipeline (AutoML pipeline) that cuts down on the repetitive tasks that slow down ML deployments and lets data workers get back to the work they were hired to do. With the right customized solution in place, a business’s ML team can:

  • Reduce the time spent on data collection, cleaning, and ingestion
  • Minimize human errors in the development of ML models
  • Decentralize the ML development process to create an ML-as-a-service model with increased accessibility for all business stakeholders

In short, an AutoML pipeline turns the high-effort functions of the ML development process into quick, self-adjusting steps handled exclusively by machines. In some use cases, an AutoML pipeline can even allow non-technical stakeholders to self-create ML solutions tailored to specific business use cases with no expert help required. In that way, it can cut ML costs, shorten deployment time, and allow data scientists to focus on tackling more complex modelling work to develop custom ML solutions that are still outside the scope of available automation techniques.

The Parts of an AutoML Pipeline

Although the frameworks and tools used to create an AutoML pipeline can vary, they all contain elements that conform to the following areas:

  • Data Preprocessing – Taking available business data from a variety of sources, cleaning it, standardizing it, and conducting missing value imputation
  • Feature Engineering – Identifying features in the raw data set to create hypotheses for the model to base predictions on
  • Model Selection – Choosing the right ML approach or hyperparameters to produce the desired predictions
  • Tuning Hyperparameters – Determining which hyperparameters help the model achieve optimal performance

As anyone familiar with ML development can tell you, the steps in the above process tend to represent the majority of the labour and time-intensive work that goes into creating a model that’s ready for real-world business use. It is also in those steps where the lion’s share of business ML budgets get consumed, and where most of the typical delays occur.

The Limitations and Considerations for Using AutoML

Given the scope of the work that can now become part of an AutoML pipeline, it’s tempting to imagine it as a panacea – something that will allow a business to reduce its reliance on data scientists going forward. Right now, though, the technology can’t do that. At this stage, AutoML technology is still best used as a tool to augment the productivity of business data teams, not to supplant them altogether.

To that end, there are some considerations that businesses using AutoML will need to keep in mind to make sure they get reliable, repeatable, and value-generating results, including:

  • Transparency – Businesses must establish proper vetting procedures to make sure they understand the models created by their AutoML pipeline, so they can explain why it’s making the choices or predictions it’s making. In some industries, such as in medicine or finance, this could even fall under relevant regulatory requirements.
  • Extensibility – Making sure the AutoML framework may be expanded and modified to suit changing business needs or to tackle new challenges as they arise.
  • Monitoring and Maintenance – Since today’s AutoML technology isn’t a set-it-and-forget-it proposition, it’s important to establish processes for the monitoring and maintenance of the deployment so it can continue to produce useful and reliable ML models.

The Bottom Line

As it stands today, the convergence of automation and machine learning holds the promise of delivering ML models at scale for businesses, which would greatly speed up the adoption of the technology and lower barriers to entry for those who have yet to embrace it. On the whole, that’s great news both for the businesses that will benefit from increased access to ML technology, as well as for the legions of data professionals tasked with making it all work.

It’s important to note, of course, that complete end-to-end ML automation with no human intervention is still a long way off. While businesses should absolutely explore building an automated machine learning pipeline to speed up development time in their data operations, they shouldn’t lose sight of the fact that they still need plenty of high-skilled data scientists and analysts on their teams. It’s those specialists that can make appropriate and productive use of the technology. Without them, an AutoML pipeline would accomplish little more than telling the business what it wants to hear.

The good news is that the AutoML tools that exist right now are sufficient to alleviate many of the real-world problems businesses face in their road to ML adoption. As they become more commonplace, there’s little doubt that the lead time to deploy machine learning models is going to shrink correspondingly – and that businesses will enjoy higher ROI and enhanced outcomes as a result.

Six properties of modern Business Intelligence

Regardless of the industry in which you operate, you need information systems that evaluate your business data in order to provide you with a basis for decision-making. These systems are commonly referred to as so-called business intelligence (BI). In fact, most BI systems suffer from deficiencies that can be eliminated. In addition, modern BI can partially automate decisions and enable comprehensive analyzes with a high degree of flexibility in use.

Let us discuss the six characteristics that distinguish modern business intelligence, which mean taking technical tricks into account in detail, but always in the context of a great vision for your own company BI:

1. Uniform database of high quality

Every managing director certainly knows the situation that his managers do not agree on how many costs and revenues actually arise in detail and what the margins per category look like. And if they do, this information is often only available months too late.

Every company has to make hundreds or even thousands of decisions at the operational level every day, which can be made much more well-founded if there is good information and thus increase sales and save costs. However, there are many source systems from the company’s internal IT system landscape as well as other external data sources. The gathering and consolidation of information often takes up entire groups of employees and offers plenty of room for human error.

A system that provides at least the most relevant data for business management at the right time and in good quality in a trusted data zone as a single source of truth (SPOT). SPOT is the core of modern business intelligence.

In addition, other data on BI may also be made available which can be useful for qualified analysts and data scientists. For all decision-makers, the particularly trustworthy zone is the one through which all decision-makers across the company can synchronize.

2. Flexible use by different stakeholders

Even if all employees across the company should be able to access central, trustworthy data, with a clever architecture this does not exclude that each department receives its own views of this data. Many BI systems fail due to company-wide inacceptance because certain departments or technically defined employee groups are largely excluded from BI.

Modern BI systems enable views and the necessary data integration for all stakeholders in the company who rely on information and benefit equally from the SPOT approach.

3. Efficient ways to expand (time to market)

The core users of a BI system are particularly dissatisfied when the expansion or partial redesign of the information system requires too much of patience. Historically grown, incorrectly designed and not particularly adaptable BI systems often employ a whole team of IT staff and tickets with requests for change requests.

Good BI is a service for stakeholders with a short time to market. The correct design, selection of software and the implementation of data flows / models ensures significantly shorter development and implementation times for improvements and new features.

Furthermore, it is not only the technology that is decisive, but also the choice of organizational form, including the design of roles and responsibilities – from the technical system connection to data preparation, pre-analysis and support for the end users.

4. Integrated skills for Data Science and AI

Business intelligence and data science are often viewed and managed separately from each other. Firstly, because data scientists are often unmotivated to work with – from their point of view – boring data models and prepared data. On the other hand, because BI is usually already established as a traditional system in the company, despite the many problems that BI still has today.

Data science, often referred to as advanced analytics, deals with deep immersion in data using exploratory statistics and methods of data mining (unsupervised machine learning) as well as predictive analytics (supervised machine learning). Deep learning is a sub-area of ​​machine learning and is used for data mining or predictive analytics. Machine learning is a sub-area of ​​artificial intelligence (AI).

In the future, BI and data science or AI will continue to grow together, because at the latest after going live, the prediction models flow back into business intelligence. BI will probably develop into ABI (Artificial Business Intelligence). However, many companies are already using data mining and predictive analytics in the company, using uniform or different platforms with or without BI integration.

Modern BI systems also offer data scientists a platform to access high-quality and more granular raw data.

5. Sufficiently high performance

Most readers of these six points will probably have had experience with slow BI before. It takes several minutes to load a daily report to be used in many classic BI systems. If loading a dashboard can be combined with a little coffee break, it may still be acceptable for certain reports from time to time. At the latest, however, with frequent use, long loading times and unreliable reports are no longer acceptable.

One reason for poor performance is the hardware, which can be almost linearly scaled to higher data volumes and more analysis complexity using cloud systems. The use of cloud also enables the modular separation of storage and computing power from data and applications and is therefore generally recommended, but not necessarily the right choice for all companies.

In fact, performance is not only dependent on the hardware, the right choice of software and the right choice of design for data models and data flows also play a crucial role. Because while hardware can be changed or upgraded relatively easily, changing the architecture is associated with much more effort and BI competence. Unsuitable data models or data flows will certainly bring the latest hardware to its knees in its maximum configuration.

6. Cost-effective use and conclusion

Professional cloud systems that can be used for BI systems offer total cost calculators, such as Microsoft Azure, Amazon Web Services and Google Cloud. With these computers – with instruction from an experienced BI expert – not only can costs for the use of hardware be estimated, but ideas for cost optimization can also be calculated. Nevertheless, the cloud is still not the right solution for every company and classic calculations for on-premise solutions are necessary.

Incidentally, cost efficiency can also be increased with a good selection of the right software. Because proprietary solutions are tied to different license models and can only be compared using application scenarios. Apart from that, there are also good open source solutions that can be used largely free of charge and can be used for many applications without compromises.

However, it is wrong to assess the cost of a BI only according to its hardware and software costs. A significant part of cost efficiency is complementary to the aspects for the performance of the BI system, because suboptimal architectures work wastefully and require more expensive hardware than neatly coordinated architectures. The production of the central data supply in adequate quality can save many unnecessary processes of data preparation and many flexible analysis options also make redundant systems unnecessary and lead to indirect savings.

In any case, a BI for companies with many operational processes is always cheaper than no BI. However, if you take a closer look with BI expertise, cost efficiency is often possible.

Simplify Vendor Onboarding with Automated Data Integration

Vendor onboarding is a key business process that involves collecting and processing large data volumes from one or multiple vendors. Business users need vendor information in a standardized format to use it for subsequent data processes. However, consolidating and standardizing data for each new vendor requires IT teams to write code for custom integration flows, which can be a time-consuming and challenging task.

In this blog post, we will talk about automated vendor onboarding and how it is far more efficient and quicker than manually updating integration flows.

Problems with Manual Integration for Vendor Onboarding

During the onboarding process, vendor data needs to be extracted, validated, standardized, transformed, and loaded into the target system for further processing. An integration task like this involves coding, updating, and debugging manual ETL pipelines that can take days and even weeks on end.

Every time a vendor comes on board, this process is repeated and executed to load the information for that vendor into the unified business system. Not just this, but because vendor data is often received from disparate sources in a variety of formats (CSV, Text, Excel), these ETL pipelines frequently break and require manual fixes.

All this effort is not suitable, particularly for large-scale businesses that onboard hundreds of vendors each month. Luckily, there is a faster alternative available that involves no code-writing.

Automated Data Integration

The manual onboarding process can be automated using purpose-built data integration tools.

To help you better understand the advantages, here is a step-by-step guide on how automated data integration for vendor onboarding works:

  1. Vendor data is retrieved from heterogeneous sources such as databases, FTP servers, and web APIs through built-in connectors available in the solution.
  2. The data from each file is validated by passing it through a set of predefined quality rules – this step helps in eliminating records with missing, duplicate, or incorrect data.
  3. Transformations are applied to convert input data into the desired output format or screen vendors based on business criteria. For example, if the vendor data is stored in Excel sheets and the business uses SQL Server for data storage, then the data has to be mapped to the relevant fields in the SQL Server database, which is the destination.
  4. The standardized, validated data is then loaded into a unified enterprise database that you can use as the source of information for business processes. In some cases, this can be a staging database where you can perform further filtering and aggregation to build a consolidated vendor database.
  5. This entire ETL pipeline (Step 1 through Step 4) can then be automated through event-based or time-based triggers in a workflow. For instance, you may want to run the pipeline once every day, or once a new file/data point is available in your FTP server.

Why Build a Consolidated Database for Vendors?

Once the ETL pipeline runs, you will end up with a consolidated database with complete vendor information. The main benefit of having a unified database is that it would have filtered information regarding vendors.

Most businesses have a strict process for screening vendors that follows a set of predefined rules. For example, you may want to reject vendors that have a poor credit history automatically. With manual data integration, you would need to perform this filtering by writing code. Automated data integration allows you to apply pre-built filters directly within your ETL pipeline to flag or remove vendors with a credit score lower than the specified threshold.

This is just one example; you can perform a wide range of tasks at this level in your ETL pipeline including vendor scoring (calculated based on multiple fields in your data), filtering (based on rules applied to your data), and data aggregation (to add measures to your data) to build a robust vendor database for decision-making and subsequent processes.

Conclusion

Automated vendor onboarding offers cost-and-time benefits to your organization. Making use of enterprise-grade data integration tools ensures a seamless business-to-vendor data exchange without the need for reworking and upgrading your ETL pipelines.

Interview – Predictive Maintenance and how it can unleash cost savings

Interview with Dr. Kai Goebel, Principal Scientist at PARC, a Xerox Company, about Predictive Maintenance and how it can unleash cost savings.

Dr. Kai Goebel is principal scientist as PARC with more than two decades experience in corporate and government research organizations. He is responsible for leading applied research on state awareness, prognostics and decision-making using data analytics, AI, hybrid methods and physics-base methods. He has also fielded numerous applications for Predictive Maintenance at General Electric, NASA, and PARC for uses as diverse as rocket launchpads, jet engines, and chemical plants.

Data Science Blog: Mr. Goebel, predictive maintenance is not just a hype since industrial companies are already trying to establish this use case of predictive analytics. What benefits do they really expect from it?

Predictive Maintenance is a good example for how value can be realized from analytics. The result of the analytics drives decisions about when to schedule maintenance in advance of an event that might cause unexpected shutdown of the process line. This is in contrast to an uninformed process where the decision is mostly reactive, that is, maintenance is scheduled because equipment has already failed. It is also in contrast to a time-based maintenance schedule. The benefits of Predictive Maintenance are immediately clear: one can avoid unexpected downtime, which can lead to substantial production loss. One can manage inventory better since lead times for equipment replacement can be managed well. One can also manage safety better since equipment health is understood and safety averse situations can potentially be avoided. Finally, maintenance operations will be inherently more efficient as they shift significant time from inspection to mitigation of.

Data Science Blog: What are the most critical success factors for implementing predictive maintenance?

Critical for success is to get the trust of the operator. To that end, it is imperative to understand the limitations of the analytics approach and to not make false performance promises. Often, success factors for implementation hinge on understanding the underlying process and the fault modes reasonably well. It is important to be able to recognize the difference between operational changes and abnormal conditions. It is equally important to recognize rare events reliably while keeping false positives in check.

Data Science Blog: What kind of algorithm does predictive maintenance work with? Do you differentiate between approaches based on classical machine learning and those based on deep learning?

Well, there is no one kind of algorithm that works for Predictive Mantenance everywhere. Instead, one should look at the plurality of all algorithms as tools in a toolbox. Then analyze the problem – how many examples for run-to-failure trajectories are there; what is the desired lead time to report on a problem; what is the acceptable false positive/false negative rate; what are the different fault modes; etc – and use the right kind of tool to do the job. Just because a particular approach (like the one you mentioned in your question) is all the hype right now does not mean it is the right tool for the problem. Sometimes, approaches from what you call “classical machine learning” actually work better. In fact, one should consider approaches even outside the machine learning domain, either as stand-alone approach as in a hybrid configuration. One may also have to invent new methods, for example to perform online learning of the dynamic changes that a system undergoes through its (long) life. In the end, a customer does not care about what approach one is using, only if it solves the problem.

Data Science Blog: There are several providers for predictive analytics software. Is it all about software tools? What makes the difference for having success?

Frequently, industrial partners lament that they have to spend a lot of effort in teaching a new software provider about the underlying industrial processes as well as the equipment and their fault modes. Others are tired of false promises that any kind of data (as long as you have massive amounts of it) can produce any kind of performance. If one does not physically sense a certain modality, no algorithmic magic can take place. In other words, it is not just all about the software. The difference for having success is understanding that there is no cookie cutter approach. And that realization means that one may have to role up the sleeves and to install new instrumentation.

Data Science Blog: What are coming trends? What do you think will be the main topic 2020 and 2021?

Predictive Maintenance is slowly evolving towards Prescriptive Maintenance. Here, one does not only seek to inform about an impending problem, but also what to do about it. Such an approach needs to integrate with the logistics element of an organization to find an optimal decision that trades off several objectives with regards to equipment uptime, process quality, repair shop loading, procurement lead time, maintainer availability, safety constraints, contractual obligations, etc.

Conversion Rate Optimization: Understanding the Sales Funnel

Are you capturing the attention of consumers or prospects with your content? Do they trust you enough to give you their contact information? Will they come back and buy from you again? Knowing how the sales funnel works and what you can do to improve it will take you down the road of success.

Business 101

As a business owner, your goal is to turn a prospect (meaning a prospective buyer) into a loyal customer. Nobody wants to lose a possible customer after putting a lot of effort into the attempt of establishing a relationship. Once you understand the different stages of the sales funnel, it will be easier to find cracks and holes within. The following sections unpack how sales funnel management can help you optimize your conversion rate and build a successful long-term relationship with your customers and website users.

The Sales Funnel

The sales funnel describes the path a customer takes on the way to buying a product or service. It visualizes the typical journey they go through and in which stage of the buying decision prospects are at the moment. As one of the core concepts in digital marketing, sales funnel management can help you to understand your audience and prevent them from dropping out before a sale is made. It is about giving every potential customer the treatment they are looking for. If you don’t understand your sales funnel, you can’t optimize it. What matters most when it comes to a sales funnel is website optimization.

Prospects move from the top of the funnel to the bottom as they become more familiar with what you have to offer. The sales funnel narrows as visitors move through it, and the number of people in your funnel will continue to decrease the closer you get to sealing the deal. It starts at the top with all the prospects who landed on your website one way or another, while the narrow bottom represents loyal customers.

The 4 Stages of the Sales Funnel

Moving people through the funnel can be a challenge. A stratagem to keep in mind is that your goal should be to solve the “problems” of your customers, or potentially make them aware of a problem they didn’t even know existed. Start by creating content that attracts your prospect’s attention, followed by offering an irresistible solution to the problem. All you have to do then is watch the magic happen.

Truthfully, that is easier said than done, but if you follow the four stages of a prospective customer’s mindset, you will reach your goal sooner than later. The different stages can be easily explained using the AIDA (Awareness, Interest, Decision, Action) strategy. To understand what moves a buying decision, we have to take a closer look at each stage and the approach it requires.

Awareness

To end up with a strong bond with your prospect, you have to gain attention first. Depending on how they found you (organic search results, recommendations, advertisements, or just pure luck), people will put different amounts of trust in your business. If you are lucky and all circumstances fall perfectly into place, a prospect turns into a customer immediately. More often though, the awareness stage does exactly what it sounds like; it creates awareness of your business and your products or services. At this point, all you are trying to do is lead prospects into the next stage, which will make them return for more.

Interest

Once a potential customer is aware of you, you need to build their interest. In this stage potential customers are interested in what you have to offer and are doing research or comparison. It is the perfect time to show off authority in your field and support them with helpful content that does not yet try to sell to them. Make sure your message stays consistent throughout the whole process and do not try to push too hard from the beginning. The interest stage should only lead them to be able to make an informed decision.

Decision

For the most part, the majority of people do not like making decisions and, therefore, getting a prospect to make a buying decision is not an easy feat. At this stage, you have to bring on your A-game and make them an offer they can’t refuse. Whether this means offering free premium shipping, a discount code, or a free month of your services is totally up to you; you just have to make sure that your potential customer wants to take advantage of it. Showcasing positive reviews or social proof is another powerful way that you can get people to take action.

 Action

Now your prospect turns into a customer. When he or she purchases your product or takes advantage of your service, that customer becomes part of your business’s ecosystem. But just because they reached the final stage of the sales funnel and the AIDA principle doesn’t mean your work is all said and done. Starting to build a long-term relationship with someone who already trusts your company is easier than starting the sales funnel all over again with a new prospect.

Sales Funnel Management

At this point, you should understand why sales funnel management is so important. Even the best prospects can get lost along the way if expectations aren’t met. It takes time to build a sales funnel that represents what your audience is looking for. The best way to optimize a sales funnel is to start with the results and work your way up. Another point of interest is the timing when people move from one point to the next within the funnel. This can help you find out where, when, and why you’re losing potential customers.

Too slow: New leads are nine times more likely to convert if someone follows up within the first five minutes. On the other hand, a lead is 21 times less likely to turn into a sale after 30 minutes have passed. To react within tight response times like that, you need to implement sales funnel management automation.

Too impatient: It can be tempting to dump a lead that isn’t converting right away and move on to the next. You should ask yourself the question if you are patient enough and if you are following up as much as you should. A marketing automation funnel also helps to stay in touch with the prospect over time.

Too fast: Instead of asking people to buy from you right away, you should cultivate them over time. If you adjust your sales approach to the different stages, you don’t just avoid chasing them away; you also find out what is working and what is a waste of your time.

How can you optimize your conversion rate?

There are countless ways you can improve your conversion rate and turn a “no, thank you” into a “yes, please.” In sales, a no often simply means “not until later” or “try again, I’m just not totally convinced yet.” Any time you encounter problems like that, you can use one or multiple of the following, mostly automated sales techniques, to reach your goals.

Target your Audience

To lead people into your sales funnel, you have to put the right content in front of your prospects. How and where you do that depends on your target audience. Be creative with your content, but make sure it mimics your offer and the call-to-action you are using. Customer relationship management (CRM) can help you track interactions with current and future customers.

Build a Landing Page

A landing page offers content that addresses a specific problem, ideally with a single call-to-action, and should steer your visitor towards becoming a customer. A/B testing your landing pages will help you figure out what your audience responds to best and what language, imagery, or layouts can help you improve conversion rates. Experienced hosting companies like 101domain can help you along the way. Additionally, you can use pay-per-click campaigns to drive traffic to your landing page and contact forms to gain subscribers to a mailing list.

Targeting Soft Conversions

When considering which page to use as a landing page, you can increase your conversion rate by bringing leads to an on-site resource to gain a “soft conversion.”

 To illustrate the importance of a good landing page and soft conversions, consider the following data:

RED: Cost per conversion BLUE: Number of conversions X-AXIS: Time (Screenshot supplied by Howard Ahmanson)

The initial strategy represented in this graph was to take visitors directly to a sales page. This resulted in a very low number of conversions, about a rate of 1%,, which in turn drove the cost per conversion way up. Later, the landing page was switched to an on-site resource, such as  a form fill of “get the free retirement planning guide.” This prompted a few soft conversions, or in other words email addresses. Upon doing this, the average number of conversions per month increased from about 10 to between 30 and 45, which in turn dropped the total cost per conversion from a median of about $400 to about $100. This is an approximately 300% increase in conversions at 50% of the cost.

But how does increased conversions translate in terms of sales numbers? To see an example of this, consider the data from the Ken Tamplin Vocal Academy:

RED: Total conversion, including soft conversions
BLUE: Sales conversions
X-AXIS: Time

When running ads for Ken, the initial strategy was to bring prospects directly to a sales page. Later, this was switched out for a “Yes! I want Ken’s free lessons!” page.

This led to an increase in the number of soft conversions, which led to a tightly correlated increase in sales. There was an increase from around 30 conversions per month up to over 225, which is an increase of 750%.

Create an Email Drip Campaign

Email drip campaigns are used to send a pre-written set of emails to subscribers or customers over time. You can use those campaigns to educate the receiver as well as make them aware of sales or offers. Last but not least, don’t forget about existing customers. This technique is ideal for building up loyalty and making them feel like part of the family.