Data Science im Vertrieb

Data Science im Vertrieb – Praxisbeispiel

Wie Sie mit einer automatisierten Lead-Priorisierung zu erfolgreichen Geschäftsabschlüssen kommen.

Die Fragestellung:

Ein Softwareunternehmen generierte durch Marketing- und Sales-Aktivitäten eine große Anzahl potenzieller Leads, die nicht alle gleichzeitig bearbeitet werden konnten. Die zentrale Frage war nun: Wie kann eine Priorisierung der Leads erfolgen, sodass erfolgsversprechende Leads zuerst bearbeitet werden können?

Definition: Ein Lead bezeichnet einen Kontakt zu einem/einer potenziellen Kund:in, die/der sich für ein Produkt oder eine Dienstleistung eines Unternehmens interessiert und deren/dessen Kontaktdaten dem Unternehmen vorliegen. Solche Leads können durch Online- und Offline-Werbemaßnahmen gewonnen werden.

In der Vergangenheit beruhte die Priorisierung und somit auch die Bearbeitung der Leads in dem Unternehmen häufig auf der persönlichen Erfahrung der zuständigen Vertriebsmitarbeiter:innen. Diese Vorgehensweise ist  jedoch sehr ressourcenintensiv und stark abhängig von der Erfahrung einzelner Vertriebsmitarbeiter:innen.

Aus diesem Grund beschloss das Unternehmen, ein KI-gestütztes System zu entwickeln, welches zum einen erfolgsversprechende Leads datenbasiert priorisiert und zum anderen Handlungsempfehlungen für die Vertriebsmitarbeiter:innen bereitstellt.

Das Vorgehen

Grundlage dieses Projektes waren bereits vorhandene Daten zu früheren Leads sowie CRM-Daten zu bereits geschlossenen Aufträgen und Deals mit diesen früheren Leads. Dazu gehörten beispielsweise:

  • Firma des Leads
  • Firmengröße des Leads
  • Branche des Leads
  • Akquisekanal, über den der Lead generiert wurde
  • Dauer bis Antwort durch Vertriebsmitarbeiter:in
  • Wochentag der Antwort
  • Kanal der Antwort

Diese Daten aus der Vergangenheit konnten zunächst einer explorativen Datenanalyse unterzogen werden, bei der untersucht wurde, inwiefern die Eigenschaften der Leads und das Verhalten der Vertriebsmitarbeiter:innen in der Vergangenheit einen Einfluss darauf hatten, ob es mit einem Lead zu einem Geschäftsabschluss kam oder nicht.

Diese Erkenntnisse aus den vergangenen Leads sollten jedoch nun auch auf aktuelle bzw. zukünftige Leads und die damit verbundenen Vertriebsaktivitäten übertragen werden. Deshalb ergaben sich aus der explorativen Datenanalyse zwei weiterführende Fragen:

  • Durch welche Merkmale zeichnen sich Leads aus, die mit einer hohen Wahrscheinlichkeit zu einem Geschäftsabschluss führen?
  • Welche Aktivitäten der Vertriebsmitarbeiter:innen führen zu einem Geschäftsabschluss?

Leads priorisieren

Durch die explorative Datenanalyse konnte das Unternehmen bereits erste Einblicke in die verschiedenen Eigenschaften der Leads erlangen. Bei einigen dieser Eigenschaften ist anzunehmen, dass sie die Wahrscheinlichkeit erhöhen, dass ein:e potenzielle:r Kund:in Interesse am Produkt des Unternehmens zeigt. Es gibt mehrere Wege, um die Erkenntnisse aus der explorativen Datenanalyse nun für zukünftiges Verhalten der Vertriebsmitarbeiter:innen zu nutzen.

Regelbasiertes Vorgehen

Auf Grundlage der explorativen Datenanalyse und der dort gewonnenen Erkenntnisse könnte das Unternehmen, z. B. dessen Vertriebsleitung, bestimmte Regeln oder Kriterien definieren, wie beispielsweise die Unternehmensgröße des Kunden oder die Branche. So könnte die Vertriebsleitung anordnen, dass Leads aus größeren Unternehmen oder aus Unternehmen aus dem Energiesektor priorisiert behandelt werden sollten, weil diese Leads auch in der Vergangenheit zu erfolgreichen Geschäftsabschlüssen geführt haben.

Der Vorteil eines solchen regelbasierten Vorgehens ist, dass es einfach zu definieren und schnell umzusetzen ist.

Der Nachteil ist jedoch, dass die hier definierten Regeln sehr starr sind und dass Menschen meist nicht in der Lage sind, mehr als zwei oder drei der Eigenschaften gleichzeitig zu betrachten. Obwohl sich die Regeln dann zwar grundsätzlich an den Erkenntnissen aus den Daten orientieren, hängen sie doch immer noch stark vom Bauchgefühl der Vertriebsleitung ab.

Clustering

Ein besserer Ansatz war es, die vergangenen Leads anhand aller verfügbaren Eigenschaften in Gruppen einzuteilen, innerhalb derer die Leads sich einander stark ähneln. Hierfür kommt ein maschinelles Lernverfahren namens Clustering zum Einsatz, welches genau dieses Ziel verfolgt: Beim Clustering werden Datenpunkte, also in diesem Falle die Leads, anhand ihrer Eigenschaften, also beispielsweise die Unternehmensgröße oder die Branche, aber auch ob es zu einem Geschäftsabschluss kam oder nicht, zusammengefasst.

Beispiel: Leads aus Unternehmen zwischen 500 und 999 Mitarbeitern aus der Energiebranche kauften 250 Lizenzen der Software A.

Kommt nun ein neuer Lead hinzu, kann er anhand seiner bereits bekannten Eigenschaften einem Cluster zugeordnet werden. Anschließend können die Vertriebsmitarbeiter:innen jene Leads priorisieren, die einem Cluster zugeordnet worden sind, in dem in der Vergangenheit bereits häufig erfolgreich Geschäfte abgeschlossen worden sind.

Der Vorteil eines solchen datenbasierten Vorgehens ist, dass eine Vielzahl an Kriterien gleichzeitig in die Priorisierung einbezogen werden kann.

Erfolgsführende Aktivitäten identifizieren

Process Mining

Im zweiten Schritt wurde eine weitere Frage gestellt: Welche Aktivitäten der Vertriebsmitarbeiter:innen führen zu einem erfolgreichen Geschäftsabschluss mit einem Lead? Dabei standen nicht nur die Leistungen einzelner Mitarbeiter:innen im Fokus, sondern auch die übergreifenden Muster, die beim Vergleich der verschiedenen Mitarbeiter:innen deutlich wurden. Mithilfe von Process Mining konnte festgestellt werden, welche Maßnahmen und Aktivitäten der Vertriebler:innen im Umgang mit einem Lead zum Erfolg bzw. zu einem Misserfolg geführt hatten. Weniger erfolgsversprechende Maßnahmen konnten somit in der Zukunft vermieden werden.

Vor allem zeitliche Aspekte spielten hierbei eine Rolle: Parameter, die aussagten, wie schnell oder an welchem Wochentag Leads eine Antwort erhielten, waren entscheidend für erfolgreiche Geschäftsabschlüsse. Diese Erkenntnisse konnte das Unternehmen dann in zukünftige Sales Trainings sowie die Sales-Strategie einfließen lassen.

Die Ergebnisse

In diesem Projekt konnte die Sales-Abteilung des Softwareunternehmens durch zwei verschiedene Ansätze die Priorisierung der Leads und damit die Geschäftsabschlüsse deutlich verbessern:

  • Priorisierung der Leads

Mithilfe des Clustering war es möglich, Leads in Gruppen einzuteilen, die sich in ihren Eigenschaften ähneln, u.a. auch in der Eigenschaft, ob es zu einem Geschäftsabschluss kommt oder nicht. Neue Leads wurden den verschiedenen Clustern zuordnen. Leads, die einem Cluster mit hoher Erfolgswahrscheinlichkeit zugeordnet wurden, konnten nun priorisiert bearbeitet werden.

  • Erfolgsversprechende Aktivitäten identifizieren

Mithilfe von Process Mining wurden erfolgsversprechende Aktivitäten der Sales-Mitarbeiter:innen identifiziert und skaliert. Umgekehrt wurden wenig erfolgsversprechende Aktivitäten erkannt und eliminiert, um Ressourcen zu sparen.

Infolgedessen konnte das Softwareunternehmen Leads erfolgreicher bearbeiten und höhere Umsätze erzielen. 

How to speed up claims processing with automated car damage detection

AI drives automation, not only in industrial production or for autonomous driving, but above all in dealing with bureaucracy. It is an realy enabler for lean management!

One example is the use of Deep Learning (as part of Artificial Intelligence) for image object detection. A car insurance company checks the amount of the damage by a damage report after car accidents. This process is actually performed by human professionals. With AI, we can partially automate this process using image data (photos of car damages). After an AI training with millions of photos in relation to real costs for repair or replacement, the cost estimation gets suprising accurate and supports the process in speed and quality.

AI drives automation and DATANOMIQ drives this automation with you! You can download the Infographic as PDF.

How to speed up claims processing with automated car damage detection

How to speed up claims processing
with automated car damage detection

Download this Infographic as PDF now by clicking here!

We wrote this article in cooperation with pixolution, a company for computer vision and AI-bases visual search. Interested in introducing AI / Deep Learning to your organization? Do not hesitate to get in touch with us!

DATANOMIQ is the independent consulting and service partner for business intelligence, process mining and data science. We are opening up the diverse possibilities offered by big data and artificial intelligence in all areas of the value chain. We rely on the best minds and the most comprehensive method and technology portfolio for the use of data for business optimization.

How Do Various Actor-Critic Based Deep Reinforcement Learning Algorithms Perform on Stock Trading?

Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy

Abstract

Deep Reinforcement Learning (DRL) is a blooming field famous for addressing a wide scope of complex decision-making tasks. This article would introduce and summarize the paper “Deep Reinforcement Learning for Automated Stock Trading: An Ensemble Strategy”, and discuss how these actor-critic based DRL learning algorithms, Proximal Policy Optimization (PPO), Advantage Actor Critic (A2C), and Deep Deterministic Policy Gradient (DDPG), act to accomplish automated stock trading by boosting investment return.

1 Motivation and Related Technology

It has long been challenging to design a comprehensive strategy for capital allocation optimization in a complex and dynamic stock market. With development of Artificial Intelligence, machine learning coupled with fundamentals analysis and alternative data has been in trend and provides better performance than conventional methodologies. Reinforcement Learning (RL) as a branch of it, is able to learn from interactions with environment, during which the agent continuously absorbs information, takes actions, and learns to improve its policy regarding rewards or losses obtained. On top of that, DRL utilizes neural networks as function approximators to approximate the Q-value (the expected reward of each action) in RL, which in return adjusts RL for large-scale data learning.

In DRL, the critic-only approach is capable for solving discrete action space problems, calculating Q-value to learn the optimal action-selection policy. On the other side, the actor-only approach, used in continuous action space environments, directly learns the optimal policy itself. Combining both, the actor-critic algorithm simultaneously updates the actor network representing the policy, and critic network representing the value function. The critic estimates the value function, while the actor updates the policy guided by the critic with policy gradients.

Overview of reinforcement learning-based stock theory.

Figure 1: Overview of reinforcement learning-based stock theory.

2 Mathematical Modeling

2.1 Stock Trading Simulation

Given the stochastic nature of stock market, the trading process is modeled as a Markov Decision Process (MDP) as follows:

  • State s = [p, h, b]: a vector describing the current state of the portfolio consists of D stocks, includes stock prices vector p, the stock shares vector h, and the remaining balance b.
  • Action a: a vector of actions which are selling, buying, or holding (Fig.2), resulting in decreasing, increasing, and no change of shares h, respectively. The number of shares been transacted is recorded as k.
  • Reward r(s, a, s’): the reward of taking action a at state s and arriving at the new state s’.
  • Policy π(s): the trading strategy at state s, which is the probability distribution of actions.
  • Q-value : the expected reward of taking action a at state s following policy π.
A starting portfolio value with three actions result in three possible portfolios.

A starting portfolio value with three actions result in three possible portfolios. Note that “hold” may lead to different portfolio values due to the changing stock prices.

Besides, several assumptions and constraints are proposed for practice:

  • Market liquidity: the orders are rapidly executed at close prices.
  • Nonnegative balance: the balance at time t+1 after taking actions at t, equals to the original balance plus the proceeds of selling minus the spendings of buying:
  • Transaction cost: assume the transaction costs to be 0.1% of the value of each trade:
  • Risk-aversion: to control the risk of stock market crash caused by major emergencies, the financial turbulence index that measures extreme asset price movements is introduced:

    where  denotes the stock returns, µ and Σ are respectively the average and covariance of historical returns. When  exceeds a threshold, buying will be halted and the agent sells all shares. Trading will be resumed once  returns to normal level.

2.2 Trading Goal: Return Maximation

The goal is to design a trading strategy that raises agent’s total cumulative compensation given by the reward function:

and then considering the transition of the shares and the balance defined as:

the reward can be further decomposed:

where:

At inception, h and Q_{\pi}(s,a) are initialized to 0, while the policy π(s) is uniformly distributed among all actions. Afterwards, everything is updated through interacting with the stock market environment. By the Bellman Equation, Q_{\pi}(s_t, a_t) is the expectation of the sum of direct reward r(s_t,a_t,s_{t+1} and the future reqard Q_{\pi}(s{t+1}, a_{a+1}) at the next state discounted by a factor γ, resulting in the state-action value function:

2.3 Environment for Multiple Stocks

OpenAI gym is used to implement the multiple stocks trading environment and to train the agent.

  1. State Space: a vector [b_t, p_t, h_t, M_t, R_t, C_t, X_t] storing information about
    b_t: Portfolio balance
    p_t: Adjusted close prices
    h_t: Shares owned of each stock
    M_t: Moving Average Convergence Divergence
    R_t: Relative Strength Index
    C_t: Commodity Channel Index
    X_t: Average Directional Index
  2. Action Space: {−k, …, −1, 0, 1, …, k} for a single stock, whose elements representing the number of shares to buy or sell. The action space is then normalized to [−1, 1], since A2C and PPO are defined directly on a Gaussian distribution.
Overview of the load-on-demand technique.

Overview of the load-on-demand technique.

Furthermore, a load-on-demand technique is applied for efficient use of memory as shown above.

  1. Algorithms Selection

This paper mainly uses the following three actor-critic algorithms:

  • A2C: uses parallel copies of the same agent to update gradients for different data samples, and a coordinator to pass the average gradients over all agents to a global network, which can update the actor and the critic network, with the objective function:
  • where \pi_{\theta}(a_t|s_t) is the policy network, and A(S_t|a_t) is the advantage function to reduce the high variance of it:
  • V(S_t)is the value function of state S_t, regardless of actions. DDPG: combines the frameworks of Q-learning and policy gradients and uses neural networks as function approximators; it learns directly from the observations through policy gradient and deterministically map states to actions. The Q-value is updated by:
    Critic network is then updated by minimizing the loss function:
  • PPO: controls the policy gradient update to ensure that the new policy does not differ too much from the previous policy, with the estimated advantage function and a probability ratio:

    The clipped surrogate objective function:

    takes the minimum of the clipped and normal objective to restrict the policy update at each step and improve the stability of the policy.

An ensemble strategy is finally proposed to combine the three agents together to build a robust trading strategy. After training and testing the three agents concurrently, in the trading stage, the agent with the highest Sharpe ratio in one period will be automatically selected to use in the next period.

  1. Implementation: Training and Validation

The historical daily trading data comes from the 30 DJIA constituent stocks.

Stock data splitting in-sample and out-of-sample

Stock data splitting in-sample and out-of-sample.

  • In-sample training stage: data from 01/01/2009 – 09/30/2015 used to train 3 agents using PPO, A2C, and DDPG;
  • In-sample validation stage: data from 10/01/2015 – 12/31/2015 used to validate the 3 agents by 5 metrics: cumulative return, annualized return, annualized volatility, Sharpe ratio, and max drawdown; tune key parameters like learning rate and number of episodes;
  • Out-of-sample trading stage: unseen data from 01/01/2016 – 05/08/2020 to evaluate the profitability of algorithms while continuing training. In each quarter, the agent with the highest Sharpe ratio is selected to act in the next quarter, as shown below.

    Table 1 - Sharpe Ratios over time.

    Table 1 – Sharpe Ratios over time.

  1. Results Analysis and Conclusion

From Table II and Fig.5, one can notice that PPO agent is good at following trend and performs well in chasing for returns, with the highest cumulative return 83.0% and annual return 15.0% among the three agents, indicating its appropriateness in a bullish market. A2C agent is more adaptive to handle risk, with the lowest annual volatility 10.4% and max drawdown −10.2%, suggesting its capability in a bearish market. DDPG generates the lowest return among the three, but works fine under risk, with lower annual volatility and max drawdown than PPO. Apparently all three agents outperform the two benchmarks.

Table 2 - Performance Evaluation Comparison.

Table 2 – Performance Evaluation Comparison.

Moreover, it is obvious in Fig.6 that the ensemble strategy and the three agents act well during the 2020 stock market crash, when the agents successfully stops trading, thus cutting losses.

Performance during the stock market crash in the first quarter of 2020.

Performance during the stock market crash in the first quarter of 2020.

From the results, the ensemble strategy demonstrates satisfactory returns and lowest volatilities. Although its cumulative returns are lower than PPO, it has achieved the highest Sharpe ratio 1.30 among all strategies. It is reasonable that the ensemble strategy indeed performs better than the individual algorithms and baselines, since it works in a way each elemental algorithm is supplementary to others while balancing risk and return.

For further improvement, it will be inspiring to explore more models such as Asynchronous Advantage Actor-Critic (A3C) or Twin Delayed DDPG (TD3), and to take more fundamental analysis indicators or ESG factors into consideration. While more sophisticated models and larger datasets are adopted, improvement of efficiency may also be a challenge.

Automated product quality monitoring using artificial intelligence deep learning

How to maintain product quality with deep learning

Deep Learning helps companies to automate operative processes in many areas. Industrial companies in particular also benefit from product quality assurance by automated failure and defect detection. Computer Vision enables automation to identify scratches and cracks on product item surfaces. You will find more information about how this works in the following infografic from DATANOMIQ and pixolution you can download using the link below.

How to maintain product quality with automatic defect detection - Infographic

How to maintain product quality with automatic defect detection – Infographic

Variational Autoencoders

After Deep Autoregressive Models and Deep Generative Modelling, we will continue our discussion with Variational AutoEncoders (VAEs) after covering up DGM basics and AGMs. Variational autoencoders (VAEs) are a deep learning method to produce synthetic data (images, texts) by learning the latent representations of the training data. AGMs are sequential models and generate data based on previous data points by defining tractable conditionals. On the other hand, VAEs are using latent variable models to infer hidden structure in the underlying data by using the following intractable distribution function: 

(1)   \begin{equation*} p_\theta(x) = \int p_\theta(x|z)p_\theta(z) dz. \end{equation*}

The generative process using the above equation can be expressed in the form of a directed graph as shown in Figure ?? (the decoder part), where latent variable z\sim p_\theta(z) produces meaningful information of x \sim p_\theta(x|z).

Architectures AE and VAE based on the bottleneck architecture. The decoder part work as a generative model during inference.

Figure 1: Architectures AE and VAE based on the bottleneck architecture. The decoder part work as
a generative model during inference.

Autoencoders

Autoencoders (AEs) are the key part of VAEs and are an unsupervised representation learning technique and consist of two main parts, the encoder and the decoder (see Figure ??). The encoders are deep neural networks (mostly convolutional neural networks with imaging data) to learn a lower-dimensional feature representation from training data. The learned latent feature representation z usually has a much lower dimension than input x and has the most dominant features of x. The encoders are learning features by performing the convolution at different levels and compression is happening via max-pooling.

On the other hand, the decoders, which are also a deep convolutional neural network are reversing the encoder’s operation. They try to reconstruct the original data x from the latent representation z using the up-sampling convolutions. The decoders are pretty similar to VAEs generative models as shown in Figure 1, where synthetic images will be generated using the latent variable z.

During the training of autoencoders, we would like to utilize the unlabeled data and try to minimize the following quadratic loss function:

(2)   \begin{equation*} \mathcal{L}(\theta, \phi) = ||x-\hat{x}||^2, \end{equation*}


The above equation tries to minimize the distance between the original input and reconstructed image as shown in Figure 1.

Variational autoencoders

VAEs are motivated by the decoder part of AEs which can generate the data from latent representation and they are a probabilistic version of AEs which allows us to generate synthetic data with different attributes. VAE can be seen as the decoder part of AE, which learns the set parameters \theta to approximate the conditional p_\theta(x|z) to generate images based on a sample from a true prior, z\sim p_\theta(z). The true prior p_\theta(z) are generally of Gaussian distribution.

Network Architecture

VAE has a quite similar architecture to AE except for the bottleneck part as shown in Figure 2. in AES, the encoder converts high dimensional input data to low dimensional latent representation in a vector form. On the other hand, VAE’s encoder learns the mean vector and standard deviation diagonal matrix such that z\sim \matcal{N}(\mu_z, \Sigma_x) as it will be performing probabilistic generation of data. Therefore the encoder and decoder should be probabilistic.

Training

Similar to AGMs training, we would like to maximize the likelihood of the training data. The likelihood of the data for VAEs are mentioned in Equation 1 and the first term p_\theta(x|z) will be approximated by neural network and the second term p(x) prior distribution, which is a Gaussian function, therefore, both of them are tractable. However, the integration won’t be tractable because of the high dimensionality of data.

To solve this problem of intractability, the encoder part of AE was utilized to learn the set of parameters \phi to approximate the conditional q_\phi (z|x). Furthermore, the conditional q_\phi (z|x) will approximate the posterior p_\theta (z|x), which is intractable. This additional encoder part will help to derive a lower bound on the data likelihood that will make the likelihood function tractable. In the following we will derive the lower bound of the likelihood function:

(3)   \begin{equation*} \begin{flalign} \begin{aligned} log \: p_\theta (x) = & \mathbf{E}_{z\sim q_\phi(z|x)} \Bigg[log \: \frac{p_\theta (x|z) p_\theta (z)}{p_\theta (z|x)} \: \frac{q_\phi(z|x)}{q_\phi(z|x)}\Bigg] \\ = & \mathbf{E}_{z\sim q_\phi(z|x)} \Bigg[log \: p_\theta (x|z)\Bigg] - \mathbf{E}_{z\sim q_\phi(z|x)} \Bigg[log \: \frac{q_\phi (z|x)} {p_\theta (z)}\Bigg] + \mathbf{E}_{z\sim q_\phi(z|x)} \Bigg[log \: \frac{q_\phi (z|x)}{p_\theta (z|x)}\Bigg] \\ = & \mathbf{E}_{z\sim q_\phi(z|x)} \Big[log \: p_\theta (x|z)\Big] - \mathbf{D}_{KL}(q_\phi (z|x), p_\theta (z)) + \mathbf{D}_{KL}(q_\phi (z|x), p_\theta (z|x)). \end{aligned} \end{flalign} \end{equation*}


In the above equation, the first line computes the likelihood using the logarithmic of p_\theta (x) and then it is expanded using Bayes theorem with additional constant q_\phi(z|x) multiplication. In the next line, it is expanded using the logarithmic rule and then rearranged. Furthermore, the last two terms in the second line are the definition of KL divergence and the third line is expressed in the same.

In the last line, the first term is representing the reconstruction loss and it will be approximated by the decoder network. This term can be estimated by the reparametrization trick \cite{}. The second term is KL divergence between prior distribution p_\theta(z) and the encoder function q_\phi (z|x), both of these functions are following the Gaussian distribution and has the closed-form solution and are tractable. The last term is intractable due to p_\theta (z|x). However, KL divergence computes the distance between two probability densities and it is always positive. By using this property, the above equation can be approximated as:

(4)   \begin{equation*} log \: p_\theta (x)\geq \mathcal{L}(x, \phi, \theta) , \: \text{where} \: \mathcal{L}(x, \phi, \theta) = \mathbf{E}_{z\sim q_\phi(z|x)} \Big[log \: p_\theta (x|z)\Big] - \mathbf{D}_{KL}(q_\phi (z|x), p_\theta (z)). \end{equation*}

In the above equation, the term \mathcal{L}(x, \phi, \theta) is presenting the tractable lower bound for the optimization and is also termed as ELBO (Evidence Lower Bound Optimization). During the training process, we maximize ELBO using the following equation:

(5)   \begin{equation*} \operatorname*{argmax}_{\phi, \theta} \sum_{x\in X} \mathcal{L}(x, \phi, \theta). \end{equation*}

.

Furthermore, the reconstruction loss term can be written using Equation 2 as the decoder output is assumed to be following Gaussian distribution. Therefore, this term can be easily transformed to mean squared error (MSE).

During the implementation, the architecture part is straightforward and can be found here. The user has to define the size of latent space, which will be vital in the reconstruction process. Furthermore, the loss function can be minimized using ADAM optimizer with a fixed batch size and a fixed number of epochs.

Figure 2: The results obtained from vanilla VAE (left) and a recent VAE-based generative model NVAE (right)

Figure 2: The results obtained from vanilla VAE (left) and a recent VAE-based generative
model NVAE (right)

In the above, we are showing the quality improvement since VAE was introduced by Kingma and
Welling [KW14]. NVAE is a relatively new method using a deep hierarchical VAE [VK21].

Summary

In this blog, we discussed variational autoencoders along with the basics of autoencoders. We covered
the main difference between AEs and VAEs along with the derivation of lower bound in VAEs. We
have shown using two different VAE based methods that VAE is still active research because in general,
it produces a blurry outcome.

Further readings

Here are the couple of links to learn further about VAE-related concepts:
1. To learn basics of probability concepts, which were used in this blog, you can check this article.
2. To learn more recent and effective VAE-based methods, check out NVAE.
3. To understand and utilize a more advance loss function, please refer to this article.

References

[KW14] Diederik P Kingma and Max Welling. Auto-encoding variational bayes, 2014.
[VK21] Arash Vahdat and Jan Kautz. Nvae: A deep hierarchical variational autoencoder, 2021.

Air Quality Forecasting Python Project

You will find the full python code and all visuals for this article here in this gitlab repository. The repository contains a series of analysis, transforms and forecasting models frequently used when dealing with time series. The aim of this repository is to showcase how to model time series from the scratch, for this we are using a real usecase dataset

This project forecast the Carbon Dioxide (Co2) emission levels yearly. Most of the organizations have to follow government norms with respect to Co2 emissions and they have to pay charges accordingly, so this project will forecast the Co2 levels so that organizations can follow the norms and pay in advance based on the forecasted values. In any data science project the main component is data, for this project the data was provided by the company, from here time series concept comes into the picture. The dataset for this project contains 215 entries and two components which are Year and Co2 emissions which is univariate time series as there is only one dependent variable Co2 which depends on time. from year 1800 to year 2014 Co2 levels were present in the dataset.

The dataset used: The dataset contains yearly Co2 emmisions levels. data from 1800 to 2014 sampled every 1 year. The dataset is non stationary so we have to use differenced time series for forecasting.

After getting data the next step is to analyze the time series data. This process is done by using Python. The data was present in excel file so first we need to read that excel file. This task is done by using Pandas which is python libraries to creates Pandas Data Frame. After that preprocessing like changing data types of time from object to DateTime performed for the coding purpose. Time series contain 4 main components Level, Trend, Seasonality and Noise. To study this component, we need to decompose our time series so that we can batter understand our time series and we can choose the forecasting model accordingly because each component behave different on the model. also by decomposing we can identify that the time series is multiplicative or additive.

CO2 emissions – plotted via python pandas / matplotlib

Decomposing time series using python statesmodels libraries we get to know trend, seasonality and residual component separately. the components multiply together to make the time series multiplicative and in additive time series components added together. Taking the deep dive to understand the trend component, moving average of 10 steps were applied which shows nonlinear upward trend, fit the linear regression model to check the trend which shows upward trend. talking about seasonality there were combination of multiple patterns over time period which is common in real world time series data. capturing the white noise is difficult in this type of data. the time series contains values from 1800 where the Co2 values are less then 1 because of no human activities so levels were decreasing. By the time numbers of industries and human activities are rapidly increasing which causes Co2 levels rapidly increasing. In time series the highest Co2 emission level was 18.7 in 1979. It was challenging to decide whether to consider this values which are less then 0.5 as white noise or not because 30% of the Co2 values were less then 1, in real world looking at current scenario the chances of Co2 emission level being 0 is near to impossible still there are chances that Co2 levels can be 0.0005. So considering each data point as a valuable information we refused to remove that entries.

Next step is to create Lag plot so we can see the correlation between the current year Co2 level and previous year Co2 level. the plot was linear which shows high correlation so we can say that the current Co2 levels and previous levels have strong relationship. the randomness of the data were measured by plotting autocorrelation graph. the autocorrelation graph shows smooth curves which indicates the time series is nonstationary thus next step is to make time series stationary. in nonstationary time series, summary statistics like mean and variance change over time.

To make time series stationary we have to remove trend and seasonality from it. Before that we use dickey fuller test to make sure our time series is nonstationary. the test was done by using python, and the test gives pvalue as output. here the null hypothesis is that the data is nonstationary while alternate hypothesis is that the data is stationary, in this case the significance values is 0.05 and the pvalues which is given by dickey fuller test is greater than 0.05 hence we failed to reject null hypothesis so we can say the time series is nonstationery. Differencing is one of the techniques to make time series stationary. On this time series, first order differencing technique applied to make the time series stationary. In first order differencing we have to subtract previous value from current value for all the data points. also different transformations like log, sqrt and reciprocal were applied in the context of making the time series stationary. Smoothing techniques like simple moving average, exponential weighted moving average, simple exponential smoothing and double exponential smoothing techniques can be applied to remove the variation between time stamps and to see the smooth curves.

Smoothing techniques also used to observe trend in time series as well as to predict the future values. But performance of other models was good compared to smoothing techniques. First 200 entries taken to train the model and remaining last for testing the performance of the model. performance of different models measured by Root Mean Squared Error (RMSE) and Mean Absolute Error (MAE) as we are predicting future Co2 emissions so basically it is regression problem. RMSE is calculated by root of the average of squared difference between actual values and predicted values by the model on testing data. Here RMSE values were calculated using python sklearn library. For model building two approaches are there, one is datadriven and another one is model based. models from both the approaches were applied to find the best fitted model. ARIMA model gives the best results for this kind of dataset as the model were trained on differenced time series. The ARIMA model predicts a given time series based on its own past values. It can be used for any nonseasonal series of numbers that exhibits patterns and is not a series of random events. ARIMA takes 3 parameters which are AR, MA and the order of difference. Hyper parameter tuning technique gives best parameters for the model by trying different sets of parameters. Although The autocorrelation and partial autocorrelation plots can be use to decide AR and MA parameter because partial autocorrelation function shows the partial correlation of a stationary time series with its own lagged values so using PACF we can decide the value of AR and from ACF we can decide the value of MA parameter as ACF shows how data points in a time series are related.

Yearly difference of CO2 emissions – ARIMA Prediction

Apart from ARIMA, few other model were trained which are AR, ARMA, Simple Linear Regression, Quadratic method, Holts winter exponential smoothing, Ridge and Lasso Regression, LGBM and XGboost methods, Recurrent neural network (RNN) Long Short Term Memory (LSTM) and Fbprophet. I would like to mention my experience with LSTM here because it is another model which gives good result as ARIMA. the reason for not choosing LSTM as final model is its complexity. As ARIMA is giving appropriate results and it is simple to understand and requires less dependencies. while using lstm, lot of data preprocessing and other dependencies required, the dataset was small thus we used to train the model on CPU, otherwise gpu is required to train the LSTM model. we face one more challenge in deployment part. the challenge is to get the data into original form because the model was trained on differenced time series, so it will predict the future values in differenced format. After lot of research on the internet and by deeply understanding mathematical concepts finally we got the solution for it. solution for this issue is we have to add previous value from the original data from into first order differencing and then we have to add the last value of this time series into predicted values. To create the user interface streamlit was used, it is commonly used python library. the pickle file of the ARIMA model were used to predict the future values based on user input. The limit for forecasting is the year 2050. The project was uploaded on google cloud platform. so the flow is, first the starting year from which user want to forecast was taken and the end year till which year user want to forecast was taken and then according to the range of this inputs the prediction takes place. so by taking the inputs the pickle file will produce the future Co2 emissions in differenced format, then the values will be converted to original format and then the original values will be displayed on the user interface as well as the interactive line graph were displayed on the interface.

You will find the full python code and all visuals for this article here in this gitlab repository.

How to ensure occupational safety using Deep Learning – Infographic

In cooperation between DATANOMIQ, my consulting company for data science, business intelligence and process mining, and Pixolution, a specialist for computer vision with deep learning, we have created an infographic (PDF) about a very special use case for companies with deep learning: How to ensure occupational safety through automatic risk detection using using Deep Learning AI.

How to ensure occupational safety through automatic risk detection using Deep Learning - Infographic

How to ensure occupational safety through automatic risk detection using Deep Learning – Infographic

How Deep Learning drives businesses forward through automation – Infographic

In cooperation between DATANOMIQ, my consulting company for data science, business intelligence and process mining, and Pixolution, a specialist for computer vision with deep learning, we have created an infographic (PDF) about a very special use case for companies with deep learning: How to protect the corporate identity of any company by ensuring consistent branding with automated font recognition.

How to ensure consistent branding with automatic font recognition - Infographic

How to ensure consistent branding with automatic font recognition – Infographic

The infographic is available as PDF download:

My elaborate study notes on reinforcement learning

I will not tell you why, but all of a sudden I was in need of writing an article series on Reinforcement Learning. Though I am also a beginner in reinforcement learning field. Everything I knew was what I learned from one online lecture conducted in a lazy tone in my college. However in the process of learning reinforcement learning, I found a line which could connect the two dots, one is reinforcement learning and the other is my studying field. That is why I made up my mind to make an article series on reinforcement learning seriously.

To be a bit more concrete, I imagine that technologies in our world could be enhanced by a combination of reinforcement learning and virtual reality. That means companies like Toyota or VW might come to invest on visual effect or video game companies more seriously in the future. And I have been actually struggling with how to train deep learning with cgi, which might bridge the virtual world and the real world.

As I am also a beginner in reinforcement learning, this article series would a kind of study note for me. But as I have been doing in my former articles, I prefer exhaustive but intuitive explanations on AI algorithms, thus I will do my best to make my series as instructive and effective as existing tutorial on reinforcement learning.

This article is going to be composed of the following contents.

In this article I would like to share what I have learned about RL, and I hope you could get some hints of learning this fascinating field. In case you have any comments or advice on my “study note,” leaving a comment or contacting me via email would be appreciated.

How Microsoft Azure Is Impacting Financial Companies

Microsoft Azure has taken a large chunk of the cloud marketplace, transforming companies with the speed and security of the cloud. Microsoft has over the years used Azure to cushion companies against risk, deal with fraud and differentiate their customer experience. 

With Microsoft Cloud App Security, customers experience 75% automatic threat elimination because of increased visibility and automated threat protection. With all these and more amazing benefits of using Azure, its market share is bound to increase even more over the coming years.

https://www.flickr.com/photos/91869083@N05/8493934839/in/photolist-dWzCUp-efhrzk-29k3oWh-9zALPj-9zALPh-9aXgpG-91z6Eo-6pABZ8-2htjpWP-Wrr2UG-aNxVLK-4z3omV-2kEyM6k-9GvMhf-Rf9aM7-4z7CQJ-aS8oqx-ekXUoo-9aU3wz-9aXjnw-aS8HTZ-LPgq61-2kjSEYf-2hamKDd-2h6JfeX-2h7gxoF-Fx6eAM-pQ6Ken-fbNckF-2iMRZSS-2hTUA6v-b8ayve-b8awer-dZwwJ7-2i3mmqV-e1dGQz-2dZwNg6-b8aoSH-b8arkc-6ztgDn-b8asCZ-efwZLM-b8atnM-b8attr-2kGQugq-2iowpX5-6zbcAC-dAQCVY-b8aoq8-517Jxq

Image Source

Financial companies have not been left behind by the Azure bandwagon. The financial industry is using Microsoft Azure to enhance its core functionsinvest money by making informed decisions, and minimize risk while maximizing returns. 

Azure facilitates these core functions by helping with the storage of huge amounts of data—  some dating back to decades ago—, data retrieval and data security. 

It also helps financial companies to keep up with regulatory compliance.

Microsoft Azure is not the only cloud services provider. But here’s why it is the most outstanding when it comes to helping financial companies achieve their business goals.

Azure Offers Hybrid and Multi-Cloud Computing for Financial Companies

The financial services industry is extremely dynamic. Organizations offering financial services have to constantly test the market and come up with new and innovative products and services. 

They are also often under pressure to extend their services across borders. Remember they have to do all of this while at the same time managing their existing customers, containing their risk, and dealing with fraud.

Financial regulations also keep changing. As financial companies increasingly embrace new technology for their services— including intelligent cloud computing— and they have to comply with industry regulations. They cannot afford to leave loopholes as they take on their journey with the cloud.

The financial services industry is highly competitive and keeps up with modernity. These companies have had to resort to the dynamic hybrid, multi-cloud computing, and public cloud strategies to keep up with the trend.

This is how a hybrid cloud model worksit enables existing on-premises applications to be extended through a connection to the public cloud. 

This allows financial companies to enjoy the speed, elasticity, and scale of the public cloud without necessarily having to remodel their entire applications. These organizations are afforded the flexibility of deciding what parts of their application remains in an existing data center and which one resides in the cloud.

Cloud computing with Azure allows financial organizations to operate more efficiently by providing end-to-end protection to information, allowing the digitization of financial services, and providing data security. 

Data security is particularly important to financial firms because they are often targeted by fraudsters and cyber threats. They, therefore, need to protect crucial information which they achieve by authenticating their data centers using Azure.

Here’s why financial companies cannot think of doing without Azure’s hybrid cloud computing even for just a day.

https://unsplash.com/collections/28744506/work?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText

Photo by Windows on Unsplash

  • The ability to expand their geographic reach

Azure enables financial companies to establish data centers in new locations to meet globally growing demand. This allows them to open and explore new markets. They can then use Azure DevOps pipelines to maintain their data factories and keep everything consistent.

  • Consistent Infrastructure management

The hybrid cloud model promotes a consistent approach to infrastructure management across all locations, whether it is on-premises, public cloud, or the edge.

  • Increased Elasticity

Financial firms and banks utilizing Azure services can respond with great agility to transactional changes or changes in demand by provisioning or de-provisioning as the situation at hand demands. 

In cases where the organization requires high computation such as complex risk modeling, a hybrid strategy allows it to expand its capacity beyond its data center without overwhelming its servers.

  • Flexibility

A hybrid strategy allows financial organizations to choose cloud services that fall within their budget, match their needs, and suit their features.

  • Data security and enhanced regulatory compliance

Hybrid and multi-cloud strategies are a superb alternative for strictly on-premises strategies when one considers resiliency, data portability, and data security.

  • Reduces CapEx Expenses

Managing on-premises infrastructure is expensive. Financial companies utilizing Azure do not need to spend large amounts of money setting them up and managing them. 

With the increased elasticity of the hybrid system, financial organizations only pay for the resources they actually use, at a relatively lower cost.

Financial Organizations Have Access to an Analytics Platform

As we mentioned earlier, financial companies have the core function of making financial decisions in order to invest money and gain maximum returns at the least possible risk. 

Having been entrusted with their customers’ assets, the best way to ensure success in making profits is by using an analytics system.

Getting the form of analytics that helps with solving this investment problem is the kind of headache that does not go away by taking a tablet of ibuprofen and a glass of waterintegrating data is not an easy task. Besides, building a custom analytics solution from scratch is quite expensive.

Luckily for financial companies, Azure has a dedicated analytics platform for the financial services industry. It is custom-made just for these types of organizations. 

Their system is quite intuitive and easy to use. Companies not only get to save the resources they would have otherwise used to build a custom solution, but they get to learn about their investment risks and get instant results at cloud speed. 

They can mitigate against negatively impactful market occurrences and gain profits even when operating in adverse market conditions.

https://unsplash.com/collections/28744506/work?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText

Image by Headway on Unsplash

Financial Companies Get Advanced Data Management

Good analytics goes hand-in-hand with a great data management system. Financial companies need to have good data, create an organized data warehouse, and have a secure data storage system.

In addition to storing your data, Microsoft Azure ensures your storage can be optimized to support advanced applications, for example, machine learning and forecasting. 

Azure even allows you to compress and store documents for long periods of time when you write the data to Microsoft Azure Blob Storage. These documents can be retrieved anytime when the need arises for auditors’, regulators’, and lawyers’ perusal. 

Conclusion

Microsoft has over time managed to gain the trust of many industries, the financial services industry inclusive. Using its cloud computing giant, Azure, it has empowered these companies to carry out their functions efficiently and at the lowest cost and risk possible.

Azure’s hybrid cloud computing strategy has made financial operations flexible, opened doors for financial companies to establish their services in multiple locations, and provided them with consistent infrastructure management, among many other benefits.

With their futuristic model and commitment to growth, it’s only prudent to assume that Microsoft Azure will continue carrying the mantle as the best cloud services provider in the financial services industry.