Introduction to Recommendation Engines

This is the second article in the series Getting started with the top eCommerce use cases. If you are interested in reading the first article, you can find it here.

What are Recommendation Engines?

Recommendation engines are automated systems that suggest similar items whenever a user selects something online, be it on Netflix, Amazon, Spotify, Facebook or YouTube. All of these companies now use some sort of recommendation engine to improve their user experience. A recommendation engine not only helps to predict whether a user will prefer an item, but also helps to increase sales, understand customer behavior, grow the number of registered users and save users time. For instance, Netflix will suggest which movie you might want to watch, or Amazon will suggest which other products you might want to buy. All the mentioned platforms operate using the same basic algorithms in the background, and in this article we are going to discuss the ideas behind them.

What are the techniques?

There are two fundamental techniques that come into play when there is a need to generate recommendations. In the next sections, these techniques are discussed in detail.

Content-Based Filtering

The idea behind content-based filtering is to analyse a set of features which provides a measure of similarity between the items themselves, i.e. between two movies, two products or two songs. Comparing these features yields a similarity score that can be used as a reference for the recommendations.

There are several steps involved in arriving at this similarity score, and the first is to construct a profile for each item by representing some of its important features. In other words, this step requires defining a set of characteristics that are easily discovered. For instance, consider an article which a user has already read; once you know that the user liked this article, you may want to show them recommendations of similar articles. Using the content-based filtering technique, you can find those similar articles. The easiest way to do that is to define some features for the article, such as publisher, genre and author. Based on these features, similar articles can be recommended to the user (as illustrated in Figure 1). Three common similarity measures one could use to find similar articles are described below.

 

Figure 1: Content-Based Filtering

 

 

Minkowski distance

Minkowski distance between two vectors x and y can be calculated as:

d(x,y) = \left( \sum_{i=1}^{n}{|x_{i} - y_{i}|^{p}} \right)^{1/p}
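As a quick illustration, the distance can be computed directly in Python (a minimal sketch using NumPy; the feature vectors below are invented for demonstration):

```python
import numpy as np

def minkowski_distance(x, y, p=2):
    """Minkowski distance between two feature vectors.

    p=1 gives the Manhattan distance, p=2 the Euclidean distance."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

# Two hypothetical item-feature vectors
a = [1.0, 0.0, 2.0]
b = [0.0, 1.0, 4.0]
print(minkowski_distance(a, b, p=2))  # Euclidean distance
```

Smaller distances mean more similar items, so for recommendations the candidates with the lowest distance to the reference item would be ranked first.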

 

Cosine Similarity

Cosine similarity between two vectors can be calculated as:

  \mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}}

 

Jaccard Similarity

 

  J(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}

 

These measures can be used to create a matrix which gives the similarity between each pair of articles, and then a function can be defined to return the top 10 most similar articles.
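Putting the measures together, a small recommender can be sketched as follows (illustrative only; the article names and one-hot feature vectors are invented for the example):

```python
import numpy as np

def cosine_similarity(x, y):
    """Cosine similarity between two feature vectors."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def jaccard_similarity(x, y):
    """Jaccard similarity between two collections of features (e.g. tags)."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)

def top_similar(item_vectors, query, k=10):
    """Return the k items most similar to `query` by cosine similarity."""
    scores = {name: cosine_similarity(vec, item_vectors[query])
              for name, vec in item_vectors.items() if name != query}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Hypothetical article profiles (e.g. one-hot features for publisher/genre/author)
articles = {
    "article_a": [1, 0, 1, 0],
    "article_b": [1, 0, 1, 1],
    "article_c": [0, 1, 0, 1],
}
print(top_similar(articles, "article_a", k=2))  # ['article_b', 'article_c']
```

The same `top_similar` skeleton works with any of the three measures; only the scoring function (and, for distances, the sort direction) changes.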

 

Collaborative filtering

This filtering method focuses on finding how similar two users or two products are by analyzing user behavior or preferences rather than the content of the items themselves. For instance, consider three users A, B and C. If we want to recommend movies to user A, our first approach would be to find similar users, check which movies they have watched that user A has not, and recommend those movies to user A. This approach, where we try to find similar users, is called User-User Collaborative Filtering.

The other approach is to find similar movies based on the ratings given by other users; this type is called Item-Item Collaborative Filtering. Research shows that item-item collaborative filtering works better than user-user collaborative filtering, as user behavior is dynamic and changes over time. Also, the number of users is large and growing every day, while item characteristics remain the same. To calculate the similarities we can again use cosine similarity.
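A minimal sketch of item-item collaborative filtering on a small user-item rating matrix (the ratings below are invented; 0 marks an unrated item, and the sketch assumes every item has at least one rating):

```python
import numpy as np

def item_item_scores(ratings, user):
    """Score a user's unrated items via item-item cosine similarity.

    `ratings` is a user x item matrix; 0 means the item is unrated."""
    R = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(R, axis=0)         # per-item column norms
    sim = (R.T @ R) / np.outer(norms, norms)  # item-item cosine similarity
    user_ratings = R[user]
    scores = sim @ user_ratings               # weight items by the user's ratings
    scores[user_ratings > 0] = -np.inf        # exclude already-rated items
    return scores

# Rows: users A, B, C; columns: movies m0..m3
ratings = [[5, 4, 0, 0],
           [5, 0, 4, 1],
           [0, 4, 5, 0]]
best = int(np.argmax(item_item_scores(ratings, user=0)))  # recommendation for user A
```

Here the highest-scoring unrated movie is recommended to user A; in practice the similarity matrix would be precomputed once, since item similarities change far more slowly than user behavior.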

 

Figure 2: Collaborative Filtering

 

Recently, some companies have started to take advantage of both content-based and collaborative filtering techniques to build hybrid recommendation engines. The results from both models are combined into one hybrid model which provides more accurate recommendations.

Five steps are involved in making a recommendation engine work: collecting the data, storing the data, analyzing the data, filtering the data and providing the recommendations. Collecting user data involves many attributes, including browsing history, page views, search logs, order history and marketing channel touch points, which requires a strong data architecture. The collection of data is fairly straightforward, but it can be overwhelming to analyze this amount of data. Storing the data can also get tricky, as you need a scalable database for this kind of data. With the rise of graph databases, this area is improving for many use cases, including recommendation engines. Graph databases like Neo4j can also help to analyze the data and find similar users and the relationships among them. Analyzing the data can be carried out in different ways: depending on how strong and scalable your architecture is, you can run real-time, batch or near-real-time analysis. The fourth step involves filtering the data, and here you can use any of the approaches mentioned above to find similarities and finally provide the recommendations.

Building a good recommendation engine can be time-consuming initially, but it is definitely beneficial in the long run. It not only helps to generate revenue but also helps to improve your product catalog and customer service.

Six ways process mining in 2020 can save your business transformation

The lack of information about existing processes kills 70% of large transformation projects and around 50% of RPA projects: alarming statistics. Triggering this failure rate is a lack of understanding about existing processes, and the disconnect between the discovery, visualization, analysis, and execution of existing data. So, banish the process guesswork! Utilizing process mining technology unlocks the information, visibility, and quantifiable numbers needed to improve end-to-end processes for sustainable transformation.


Read this article in German:

Wie Process Mining 2020 Ihre erfolgreiche Geschäftstransformation 2020 sicherstellt

 


Process mining in 2020

Your data fingerprint

If we consider the figures again (from McKinsey and Ernst & Young (EY) respectively), the digitization of products and services is forcing companies of all shapes and sizes, and in all industries, to dramatically reconsider their existing business models and the processes they implement. Because all activities are different, process mining uses the unique data—your company’s business fingerprint—to automatically piece together a digital representation of all your existing business processes.

This digital evidence enables us to visualize exactly how processes are operating (both the conventional path and variable executions) down to individual process instances. In other words, you can unearth processes which lie unseen or dormant, revealing hidden value, and providing an instant understanding of complex processes in minutes rather than days.

Triggering dormant success

Then, with standardized and configurable notifications and KPIs, you can further understand the immediate impact of any process change made—meaning that failure rates decrease, and company confidence is improved. And that’s not all: everyone from new employees to the C-suite can better visualize, understand, and explain their organization’s processes. This ensures that the right process change is secure and that improvement has the intended impact, every time.

Unleash the power of process

In business, we all answer to somebody, and it is critical to connect problems to real solutions. The everyday functions of companies—the processes upon which they are built—are the connection to business tech, from “process” mining to robotic “process” automation. Without process understanding, the tech is redundant because we have no idea how work has flowed in an existing application. Process is the lifeblood of operations.

Process mining: your point of differentiation

Transformative digital technology integration

In addition to the DVA of process mining—discover, visualize, analyze—is the power to monitor real-time process execution automatically from your existing data. This simple point and click assessment can provide an instant understanding of complex processes. Within transformation projects, which by their very nature require the profound transformation of business and organizational activities, processes, competencies, and models, process mining provides the visual map to facilitate immediate action.

This self-sustaining approach across an entire organization is what leads to genuinely sustainable outcomes, and builds a process culture within an organization. By taking this holistic approach, digital transformation and excellence professionals will find it easier to leverage processes, justify their projects and programs, and address behavioral change challenges.

This includes the facilitation of transformative digital technology integration, operational agility and flexibility, leadership and culture, and workforce enablement.

Three ways process mining can save your business in a transformation project:

  • You require 100% operational transparency: To chart all your transactions requires complete process transparency. This capability allows the direct comparison of actual operations to the ways that processes were designed to occur. This conformance checking can automatically identify the highest priority issues and tasks, and highlight root causes, so we can take immediate action.
  • You must reduce costs and increase efficiency: Signavio research shows that almost 60% of companies incurred additional charges from suppliers due to process inefficiencies. Process mining can help your business reduce costs because it finds vulnerabilities and deviations, whilst highlighting what is slowing you down, including the bottlenecks and inefficiencies hampering revenue. Process mining beefs up operational health via process improvements and pre-emptive strategies.
  • You must optimize the buying and selling cycle: Is shipping taking too long? Which of your suppliers supports you least? Who is outperforming whom? Process mining is your one-click trick to finding these answers and identifying which units are performing best and which are wasting time and money.

Process mining and robotic process automation (RPA)

The beneficial fusion of both technologies

Robotic process automation (RPA) provides a virtual workforce to automate manual, repetitive, and error-prone tasks. However, successful process automation requires exact knowledge about the intended (and potential) benefits, effective training of the robots, and continuous monitoring of their performance. Here, process mining supports organizations throughout the lifecycle of RPA initiatives by monitoring and benchmarking robots to ensure sustainable benefits.

Upgrade robot-led automation

These insights are especially valuable for process miners and managers with a particular interest in process automation. To further upgrade the impact of robot-led automation, there is also a need for a solid understanding of legacy systems, and an overview of automation opportunities. Process mining tools provide critical insights throughout the entire RPA journey, from defining the strategy to continuous improvement and innovation.

Three ways process mining can save your business in an RPA lifecycle project:

  • You require process overviews, based on specific criteria: Providing a complete overview of end-to-end processes involves identifying high-ROI processes suitable for RPA implementation. This, in turn, helps determine the best-case process flow/path, enabling you to spot redundant processes you may not be aware of before automating.
  • You are unsure how best to optimize human-digital worker cycles: By mining the optimal process flow/path, we can better discover inefficient human-robot hand-off, providing quantifiable data on the financial impact of any digital worker or process. This way, we can compare human vs. digital labor in terms of accuracy, efficiency, cost, and project duration.
  • You need to understand better how RPA supports legacy processes and systems: RPA enables enterprises to keep legacy systems by integrating them with cloud and web/app-based services, transforming their ability to connect legacy with modern tools, applications, and even mobile apps. Efficiency and effectiveness will be improved across crucial departments, including HR, finance, and legal.

Process mining for improved customer experience and mapping

Reconfigure customer delight

The integration of process mining with other technologies is also essential in growing the process excellence and management market. With process management, we already talk about customer engagement, which empowers companies to shift away from lopsided efficiency goals, which often frustrate customers, towards all-inclusive effectiveness goals, built around delighting customers at the lowest organizational cost possible.

Further, the application of process mining within customer journey mapping (CJM)—especially when linked to the underlying processes—offers the bundled capability of better business understanding and outside-in customer perspective, connected to the processes that deliver them. So, by connecting process mining with a customer-centric view across producing, marketing, selling, and providing products and services, customer delight becomes a strategic catalyst for success.

Unlock the full potential of process

Trigger process mining initiatives with Signavio Process Intelligence, and see how it can help your organization uncover the hidden value of process, generate fresh ideas, and save time and money. Discover more in our white paper, Managing Successful Process Mining Initiatives with Signavio Process Intelligence.

5 Things You Should Know About Data Mining

The majority of people spend about twenty-four hours online every week. In that time, they give out enough information for big data systems to learn a lot about them. Having companies collect and compile your data might seem scary, but it may well have been helpful for you in the past.

 

If you have ever been surprised to find an ad targeted toward something you were talking about earlier or an invention made based on something you were googling, then you already know that data mining can be helpful. Advanced education in data mining can be an awesome resource, so it may pay to have a personal tutor skilled in the area to help you understand. 

 

It is understandable to be unsure of a system that collects your information online so that it can learn more about you. Luckily, so much data is put out every day that it is unlikely data mining is focusing on any of your important information. Here are a few things you should know about data mining.

 

1. Data Mining Is Used In Crime Scenes

Using a variation of earthquake prediction software and data, the Los Angeles police department and researchers were able to predict crime within five hundred feet. As they learn how to compile and understand more data patterns, crime detecting will become more accurate.

 

Using their data, the Los Angeles police department was able to reduce theft by thirty-three percent and violent crime by about twenty-one percent. Those are not perfect numbers, but they are better than before and will become even more impressive as time goes on.

 

The fact that data mining is able to pick up on crime statistics and compile all of that data to give an accurate picture of where crime is likely to occur is amazing. It gives a place to look and is able to help stop crime as it starts.

 

2. Data Mining Helps With Sales

A great story about data mining in sales is the example of Walmart putting beer near the diapers. The story claims that through measuring statistics and mining data it was found that when men purchase diapers they are also likely to buy a pack of beer. Walmart collected that data and put it to good use by putting the beer next to the diapers.

 

The amount of truth in that story is debatable, but it has made data mining popular in retail. Finding which products are often bought together gives insight into where to place products in a store. This practice has increased sales of both items immensely, simply because people tend to purchase items placed near one another more than they would if they had to walk across the store for the second item.

 

Putting a lot of stock in the data-gathering teams that big stores build does not always work; there have been plenty of times when data teams failed and sales plummeted. More often, however, the benefits outweigh the potential for failure, and many stores now use data mining to make big decisions about their sales.

 

3. It’s Helping With Predicting Disease 

 

In 2009, Google began work on predicting the winter flu. Google went through the fifty million most common search terms and compared them with what the CDC had recorded during the 2003-2008 flu seasons. With that information, Google was able to help predict the next winter flu outbreak, even down to the states it hit the hardest.

 

Since 2009, data mining has gotten much better at predicting disease. The internet is a relatively new invention and is still growing, and data mining is improving along with it. Hopefully, in the future, we will be able to predict disease outbreaks quickly and accurately.

 

With new data mining techniques and research in the medical field, there is hope that doctors will be able to narrow down problems in the heart. As the information grows and more data is entered the medical field gets closer to solving problems through data. It is something that is going to help cure diseases more quickly and find the root of a problem.

 

4. Some Data Mining Gets Ignored

Interestingly, very little of the data that companies collect from you is actually used: "big data" companies leave about eighty-eight percent of the data they have untouched. It is incredibly difficult to use all of the millions of bits of data that flow through big data companies every day.

 

The more people who work in data mining, and the more data companies are actually able to filter through, the better the online experience will be. It might be a bit frightening to think of someone going through what you are doing online, but no one is touching the information that you keep private. Big data companies use the information you put out into the world to draw conclusions and, ideally, make the world a better place.

 

There is so much information being put onto the internet at all times. Twenty-four hours a week is the average amount of time a single person spends on the internet, but there are plenty of people who spend more time than that. All of that information takes a lot of people to sift through, and there are currently not enough people in the data mining industry to go through the majority of the data being put online.

 

5. Too Many Data Mining Jobs

Interestingly, the data industry is booming. In general, there are an amazing number of careers opening up on the internet every day. The industry is growing so quickly that there are not enough people to fill the jobs being created.

 

The lack of talent in the industry means there is plenty of room for new people who want to go into data mining. It was predicted that by 2018 there would be a shortage of 140,000 people with deep analytical skills. With all the discussion about a lack of jobs elsewhere, it is remarkable that there is such a shortage in the data industry.

 

If big data companies are only able to wade through less than half of the data being collected, then we are wasting a resource. The more people who go into analytics or computing careers, the more information we will be able to collect and utilize. There are currently more jobs than there are people in the data mining field, and that needs to be corrected.

 

To Conclude

The data mining industry is making great strides. Big data companies are trying to use the information they collect not only to sell more things to you but also to improve the world. And there is something very convenient about your computer knowing the type of things you want to buy and showing them to you immediately.

 

Data mining has been able to help predict crime in Los Angeles and lower crime rates. It has also helped companies know what items are commonly purchased together so that stores can be organized more efficiently. Data mining has even been able to predict the outbreak of disease down to the state.

 

Even with so much data being ignored and so many jobs left unfilled, data mining is doing incredible things. The internet is constantly growing, and data mining is growing right along with it. As the industry climbs and more people build careers mining data, the more we will learn and the more facts we will find.

 

Python vs R: Which Language to Choose for Deep Learning?

Data science is increasingly becoming essential for every business to operate efficiently in the modern world, shaping the processes used to deliver the outputs clients require. While machine learning and deep learning sit at the core of data science, understanding the concepts of deep learning is essential, as it can help increase the accuracy of final outputs. And when it comes to data science, R and Python are the most popular programming languages used to instruct the machines.

Python and R: Primary Languages Used for Deep Learning

Deep learning and machine learning differ in the type of input data they use. While machine learning typically depends upon structured data, deep learning uses neural networks to process the data during learning. Deep learning can be described as a subset of machine learning in which the data to be processed need not come in a predefined structure.

R was developed specifically to support the concepts and implementation of data science, and the support provided by the language is incredible, as writing code becomes much easier with its simple syntax.

Python is already a very popular programming language that can serve more than one development niche without straining. Using Python to program machine learning algorithms is very popular, and development is typically faster than in languages such as C or Java while still producing accurate results. And because of its extended support for data science, it is a tough competitor for R.

If we compare popularity charts, Python is obviously more popular among data scientists and developers because of its versatility and ease of use when implementing algorithms. On the other hand, R outruns Python when it comes to the statistical packages it offers developers. Therefore, to decide which one is best, let's take an overview of the features and limits of both languages.

Python

Python was first introduced by Guido van Rossum, who developed it as the successor of the ABC programming language. Python makes whitespace significant, which increases the readability of the code. It is a general-purpose programming language that supports a wide range of development needs.

Python's packages include support for web development, software development, GUI (Graphical User Interface) development and machine learning. Using these packages and putting the best development skills forward, excellent solutions can be developed. According to Stack Overflow, Python ranks as the fourth most popular programming language among developers.

Benefits for performing enhanced deep learning using Python are:

  • Concise and Readable Code
  • Extended Support from Large Community of Developers
  • Open-source Programming Language
  • Encourages Collaborative Coding
  • Suitable for small and large-scale products

The latest stable version, Python 3.8.0, was released on 14th October 2019. Developing a software solution in Python becomes much easier, as the extended support offered through its packages drives better development and answers nearly every need.

R

R is a language specifically designed for the development of statistical software and for statistical data analysis. Its primary user base consists of statisticians and data scientists who analyze data. Supported by the R Foundation for Statistical Computing, the language is not intended for the development of websites or applications. R is also an open-source environment that can be used for mining large amounts of data.

The R programming language focuses on output generation rather than speed; the execution speed of programs written in R is comparatively low, as producing the required outputs is the aim, not the speed of the process. To use R for development or mining tasks, you need to install its operating-system-specific binary version, after which programs can be run directly from the command line.

R also has its own development environment, RStudio, and offers numerous libraries that help in crafting efficient programs to execute mining tasks on the provided data.

The benefits offered by R are pretty common and similar to what Python has to offer:

  • Open-source programming language
  • Supports all operating systems
  • Supports extensions
  • R can be integrated with many of the languages
  • Extended Support for Visual Data Mining

Although R ranks at the 17th position in Stack Overflow's list of most popular programming languages, the support it offers for statistics has no match. After all, R was developed by statisticians for statisticians!

Python vs R: Should They be Really Compared?

Even when provided with the best technical support and efficient tools, developers will not be able to deliver quality outputs if they do not possess the required skills. The point here is that technical skills rank higher than the resources provided. A head-to-head comparison of these two programming languages is not really advisable, as each holds its own set of advantages. Few developers consider using both together, but those who do obtain the maximum benefit.

Both languages also have some limitations in common. For example, if a client asks whether you can provide technical support for developing an Uber clone, you will have to decline in either case, as neither Python nor R supports native mobile app development. To benefit the most and develop excellent solutions using both these programming languages, it is advisable to stop comparing and start collaborating!

R and Python: How to Fit Both In a Single Program

Anticipating the future needs of the development industry, there has been significant work on combining these two excellent programming languages. There are two approaches: either include an R script in Python code, or vice versa.

Using the available interfaces, packages and extended support from Python, we can include R scripts in Python code and enhance its productivity. The availability of PypeR, pyRserve and other resources helps run the two languages together efficiently.

The other way around, the functions and packages made available for integrating Python into R are also effective at providing better results. With R packages like rJython, rPython, reticulate and PythonInR, integrating Python into the R language is very easy.

Therefore, by using development skills at their best and making the most of such resources, Python and R can be used together to enhance end results and provide accurate deep learning support.

Conclusion

Python and R are both great in their own right and in their own place. However, because Python is applied so widely across almost every operation, the annual salary packages offered to Python developers tend to be lower than those offered to developers skilled in R. This does not, however, diminish the usability of either language. The ultimate choice between the two depends upon the data scientists or developers and their mining requirements.

And if a developer or data scientist decides to develop skills in both Python- and R-based development, it will prove beneficial in the near future. Choosing one or both for your project depends on the project requirements and the expert support on hand.

Looking for the ‘aha moment’: An expert’s insights on process mining

Henny Selig is a specialist in process mining, with significant expertise in the implementation of process mining solutions and supporting customers with process analysis. As a Solution Owner at Signavio, Henny is also well versed in bringing Signavio Process Intelligence online for businesses of all shapes and sizes. In this interview, Henny shares her thoughts about the challenges and opportunities of process mining. 


Read this interview in German:

Im Interview mit Henny Selig zu Process Mining: “Für den Kunden sind solche Aha-Momente toll“

 


Henny, could you give a simple explanation of the concept of process mining?

Basically, process mining is a combination of data analysis and business process management. IT systems support almost every business process, meaning they leave behind digital traces. We extrapolate all the data from the IT systems connected to a particular process, then visualize and evaluate it with the help of data science technology.

In short, process mining builds a bridge between employees, process experts and management, allowing for a data-driven and fact-based approach to business process optimization. This helps avoid thinking in siloes, as well as enabling transparent design of handovers and process steps that cross departmental boundaries within an organization.

When a business starts to analyze their process data, what are the sorts of questions they ask? Do they have at least have some expectation about what process mining can offer?

That’s a really good question! There isn’t really a single good answer, as it is different for different companies. For example, there was one procurement manager we were presenting the complete data set to, and it turned out there was an approval at one point in the process when it should have been at another. He was really surprised, but we weren’t, because we sat outside the process itself and were able to take a broader view. 

We also had different questions that the company hadn’t considered, things like what was the process flow if an order amount is below 1000 euros, and how often that occurs—just questions that seem clear to an outsider but often do not occur to process owners.

So do people typically just have an idea that something is wrong, or do they generally understand there is a specific problem in one area, and they want to dive deeper? 

There are those people who know that a process is running well, but they know a particular problem pops up repeatedly. Usually, even if people say they don’t have a particular focus or question, most of them actually do because they know their area. They already have some assumptions and ideas, but it is sometimes so deep in their mind they can’t actually articulate it.

Often, if you ask people directly how they do things, it can put pressure on them, even if that’s not the intention. If this happens, people may hide things without meaning to, because they already have a feeling that the process or workflow they are describing is not perfect, and they want to avoid blame. 

The approvals example I mentioned above is my favorite because it is so simple. We had a team who all said, over and over, “We don’t approve this type of request.” However, the data said they did; the team didn’t even know.

We then talked to the manager, who was interested in totally different ideas, like all these risks, approvals, are they happening, how many times this, how many times that — the process flow in general. Just by having this conversation, we were able to remove the mismatch between management and the team, and that is before we even optimized the actual process itself. 

So are there other common issues or mismatches that people should be aware of when beginning their process mining initiative?

The one I often return to is that not every variation that is out of line with the target model is necessarily negative. Very few processes, apart from those that run entirely automatically, actually conform 100% to the intended process model—even when the environment is ideal. For this reason, there will always be exceptions requiring a different approach. This is the challenge in projects: finding out which variations are desirable, and where to make necessary exceptions.

So would you say that data-based process analysis is a team effort?

Absolutely! In every phase of a process mining project, all sorts of project members are included. IT makes the data available and helps with the interpretation of the data. Analysts then carry out the analysis and discuss the anomalies they find with IT, the process owners, and experts from the respective departments. Sometimes there are good reasons to explain why a process is behaving differently than expected. 

In this discussion, it is incredibly helpful to document the thought process of the team with technical means, such as Signavio Process Intelligence. In this way, it is possible to break down the analysis into individual processes and to bring the right person into the discussion at the right point without losing the thread. The next colleague who picks up the topic can then see the thread of the analysis and properly classify the results.

At the very least, we can provide some starting points. Helping people reach an “aha moment” is one of the best parts of my job!

To find out more about how process mining can help you understand and optimize your business processes, visit the Signavio Process Intelligence product page. If you would like to get a group effort started in your organization right now, why not sign up for a free 30-day trial with Signavio, today.

How Finance Organizations Are Dealing with The Growing Demand for Instant Response Times

The financial industry is one of the most innovative industries and has evolved at an incredibly fast pace over the past decade. Finance is a complex industry that requires a delicate balance between optimal convenience and security.

With security the most important aspect, the role of AI has grown in importance, and various financial organizations are taking strides to innovate unique solutions that meet the growing demand for instant response times.

A recent study found that automation and digital intelligence save US banks over $1 trillion on an annual basis. Globally, more and more countries are adopting AI tools to meet the growing demand for instant response times.

The client experience

Despite the fast rate of digital integration into various industries, clients still want to feel a personal connection to a brand experience. The advances in machine learning have allowed for a vast improvement in personalized services using customer data. This feature uses AI tools to better understand and respond to client needs. 

A feature of this nature allows financial organizations to develop improved products and increase response speeds. The client not only experiences faster service but also gains access to products that are relevant to their needs and interests.

Customer experience has also improved through eliminating the need to visit a financial institution's physical office to solve a problem. The incorporation of chatbots for customer service allows clients to easily solve queries remotely.

A recent example is Bank of America's chatbot, known as Erica, which is accessible at all times of the day and currently used by a million people. This removes the need to deal with human assistants, making it easier to access solutions. Customer service is one of the areas that allow financial institutions to thrive, and clients increasingly demand that it be optimal.

Improved security and fraud prevention 

More financial organizations are making use of biometric data to record customer data. Some financial institutions have decided to replace passwords entirely, simplifying client verification. Despite this simplicity, biometrics offer a higher level of security than a simple PIN code.

In the future, clients are anticipated to simply use their biometrics to access their funds at an ATM or the bank. Another aspect of improving response times is limiting cybercrime and preventing fraud by identifying client patterns. Knowing these patterns allows clients to be contacted quickly in the event of unusual activity.

Disruption from startup innovation

The term disruption has transformed into a positive one in the past decade, because disruptors have created technology that speeds up and streamlines payments, improves product maintenance for clients, and increases value across the chain.

Financial institutions are finding ways to work collaboratively with disruptors and innovative FinTech companies to create improved technology-driven solutions. The culture of disruption has allowed financial institutions to deliver more innovative money management solutions and simple avenues to process transactions with minimal delays. 

Disruptors generally evolve at a rapid pace, and some are becoming standalone financial service providers in their own right. The expanded competition gives clients a wider range of institutions to choose from, each dedicated to solving their problems.

Using robotics to eliminate the risk

The growing alliance between financial services and technology companies focused on AI allows the financial industry to have a better understanding of consumer patterns to develop products relevant to them. 

Incorporating AI tools means that the client does not have to resort to interacting with a bank teller to solve an issue. The integration of AI tools is also a good way to ensure that tasks are performed with minimal human error, eliminating hurdles that arise from inaccuracies.

NLP technology has also helped financial institutions make informed decisions through a range of useful apps. For example, some apps use NLP to gather data on influencers, marketers, and blog posts; that data is then used to advise financiers on how to invest. Other software helps digitize financial documentation processes using NLP, and these are just a few examples among many.

Taking advantage of the sharing economy 

A recent innovation in finance has been the recognition of the power of a shared economy, which has already been realized in industries such as transport and hospitality. Clients are always looking for the fastest and cheapest possible ways to meet their needs.

The rise of digital currencies and the decentralized model have shown banks that people respond to a system that allows for decentralized asset sharing. 

With the rise of cryptocurrency, financial institutions have also started exploring the potential of employing blockchain to create a system that presents a public ledger and improves internal operations within an organization to deliver at high speed.

Moving infrastructure to the cloud

Financial institutions are increasingly using the cloud to manage their operations, which allows for easier management. They realize the importance of automating processes such as data management, CRM, accounting, and even HR.

Using analytical tools allows for the fast-tracking of data gathering and the delivery of solutions to clients. This allows functions like client payments, statement generation, credit checks and more to become automated and more accurate.

Once again, the issue of cybersecurity comes to the fore when machines ‘take over’; the concern stems from the fact that the software is sourced from third parties, while requirements in the industry are highly sophisticated.

The rapid growth of data-driven solutions has placed pressure on financial institutions to work with trustworthy service providers or to develop in-house data management systems that avoid third-party interactions.

Conclusion

The language of convenience is one that is universal; everyone wants everything to work faster, be delivered to their doorstep and accommodate their needs. The financial industry is no exception to these expectations from customers. Finance organizations are taking the leap into incorporating AI tools to partly manage operations because it simplifies monitoring, reporting and processing large volumes of data. 

The sophistication of analytical tools ensures that issues are resolved before they become larger issues that are beyond an organization’s control. It is certainly exciting to see how financial industries and organizations will transform in 2020 to incorporate tech tools to streamline security and operations. 

Multi-touch attribution: A data-driven approach

Customers’ shopping behavior has changed drastically when it comes to online shopping; nowadays, customers like to do thorough market research on a product before making a purchase.

What is Multi-touch attribution?

This makes it really hard for marketers to correctly determine the contribution of each marketing channel to which a customer was exposed. The path a customer takes from the first search to the purchase is known as the customer journey, and this path consists of multiple marketing channels, or touchpoints. It is therefore highly important to distribute the budget between these channels so as to maximize return. This is known as the multi-touch attribution problem, and the right attribution model helps steer the marketing budget efficiently. The problem is well known among marketers, and you might be thinking that if it is so well known, there must be an algorithm out there to deal with it. Well, there are some traditional models, but each has its own limitations, which are discussed in the next section.

Types of attribution models

Most eCommerce companies have a performance marketing department to make sure that the marketing budget is spent in an agile way. There are multiple heuristic attribution models pre-existing in Google Analytics; however, there are several issues with each of them. These models are:

Traditional attribution models

First touch attribution model

100% credit is given to the first channel as it is considered that the first marketing channel was responsible for the purchase.

Figure 1: First touch attribution model

Last touch attribution model

100% credit is given to the last channel as it is considered that the last marketing channel was responsible for the purchase.

Figure 2: Last touch attribution model

Linear-touch attribution model

In this attribution model, equal credit is given to all the marketing channels present in the customer journey, as it is considered that each channel is equally responsible for the purchase.

Figure 3: Linear attribution model

U-shaped or Bathtub attribution model

This model is most common in eCommerce companies; it assigns 40% of the credit each to the first and the last touchpoint, while the remaining 20% is distributed equally among the rest.

Figure 4: Bathtub or U-shape attribution model
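The four heuristic models above can be sketched in a few lines of Python. The journey and channel names are illustrative, and the conventions for one- and two-touchpoint journeys in the U-shaped model are assumptions, since the article does not specify them:

```python
# Each model takes a journey (ordered list of channel names) and returns a
# dict mapping each channel to its share of the credit (shares sum to 1.0).

def first_touch(journey):
    return {journey[0]: 1.0}

def last_touch(journey):
    return {journey[-1]: 1.0}

def linear_touch(journey):
    share = 1.0 / len(journey)
    credit = {}
    for ch in journey:
        credit[ch] = credit.get(ch, 0.0) + share
    return credit

def u_shaped(journey):
    # 40% each to the first and last touchpoints, 20% spread over the middle
    # (single- and two-touch journeys handled by assumed conventions)
    if len(journey) == 1:
        return {journey[0]: 1.0}
    if len(journey) == 2:
        return {journey[0]: 0.5, journey[1]: 0.5}
    credit = {journey[0]: 0.4}
    credit[journey[-1]] = credit.get(journey[-1], 0.0) + 0.4
    middle_share = 0.2 / (len(journey) - 2)
    for ch in journey[1:-1]:
        credit[ch] = credit.get(ch, 0.0) + middle_share
    return credit

journey = ["Display", "Email", "Search", "Direct"]
print(u_shaped(journey))
# {'Display': 0.4, 'Direct': 0.4, 'Email': 0.1, 'Search': 0.1}
```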

Data driven attribution models

Traditional attribution models follow a somewhat naive approach, assigning credit to one or all of the marketing channels involved, and it is not easy for every company simply to pick one of these models and implement it. The multi-touch attribution problem comes with many challenges, such as customer journey duration, overestimation of branded channels, vouchers, and the cross-platform issue.

Switching from traditional models to data-driven models gives us more flexibility and more insight, as the major part here is defining rules to prepare the data so that it fits your business. These rules can be defined by performing an ad hoc analysis of customer journeys. In the next section, I will discuss the Markov chain concept as an attribution model.

Markov chains

The Markov chain concept revolves around probability. For the attribution problem, every customer journey can be seen as a chain (a sequence of marketing channels) from which a Markov graph can be computed, as illustrated in Figure 5. Every channel is represented as a vertex, and the edges represent the probability of hopping from one channel to another. A separate, more detailed article will explain the concepts behind different data-driven attribution models and how to apply them.

Figure 5: Markov chain example
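The edge probabilities of such a Markov graph can be estimated by counting channel-to-channel transitions across all journeys. A minimal sketch, with made-up journeys and synthetic "start"/"conversion" states:

```python
from collections import defaultdict

# Illustrative converting journeys (channel names are made up)
journeys = [
    ["Search", "Display", "Search"],
    ["Display", "Email"],
    ["Search", "Email"],
]

# Count transitions, padding each journey with synthetic start/end states
counts = defaultdict(lambda: defaultdict(int))
for path in journeys:
    states = ["start"] + path + ["conversion"]
    for a, b in zip(states, states[1:]):
        counts[a][b] += 1

# Normalize counts into transition probabilities (edges of the Markov graph)
transitions = {
    a: {b: n / sum(nexts.values()) for b, n in nexts.items()}
    for a, nexts in counts.items()
}

print({b: round(p, 3) for b, p in transitions["start"].items()})
# {'Search': 0.667, 'Display': 0.333}
```

The probabilities out of each vertex sum to one, which is what makes the graph a proper Markov chain; attributing credit from this graph (e.g. via removal effects) is left to the follow-up article mentioned above.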

Challenges during the Implementation

Transitioning from a traditional attribution model to a data-driven one may sound exciting, but the implementation is rather challenging, as there are several issues that cannot be resolved just by changing the type of model. Before implementation, marketers should perform a customer journey analysis to gain insight into their customers and try to determine:

  1. The length of a typical customer journey.
  2. On average, how many branded and non-branded channels (distinct and non-distinct) a typical customer journey contains.
  3. The most common upper-funnel and lower-funnel channels.
  4. How vouchers are used within branded and non-branded channels.

When you are done with the analysis and able to answer all of the above questions, the next step is to define rules for handling the user data according to your business needs. Some of the issues that arise during implementation are discussed below, along with their solutions.

Customer journey duration

Assuming that you are a retailer, let’s try to understand this issue with an example. In May 2016, your company started a Facebook advertising campaign for a particular product category which “attracted” a lot of customers, including Chris. He saw your Facebook ad while working in the office and clicked on it, which took him to your website. As soon as he registered on your website, his boss called him (probably because he was on Facebook while working), so he closed everything and went to the meeting. After coming back, he started working and completely forgot about your ad and products. A few days later, he received an email with some offers for your products, which he also ignored, until he saw an ad again on TV in January 2019 (almost three years later). At this moment, he started doing his research about your products and finally bought one of them through an Instagram campaign. It took Chris almost three years to make his first purchase.

Figure 6: Chris journey

Now, take a minute and think: if you analyzed the entire journey of customers like Chris, you would realize that you are still assigning some of the credit to touchpoints that happened three years ago. This can be solved by using an attribution window. Figure 7 illustrates that 83% of customers make a purchase within 30 days, which means the attribution window here could be 30 days. In simple words, it is safe to remove touchpoints that occurred more than 30 days before the purchase. This parameter can also be changed to 45 or 60 days, depending on the use case.

Figure 7: Length of customer journey
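The 30-day attribution window can be sketched as a simple filter on timestamped touchpoints. The timestamps and channels below are illustrative, loosely following Chris's journey:

```python
from datetime import datetime, timedelta

ATTRIBUTION_WINDOW = timedelta(days=30)

def apply_window(touchpoints, purchase_time, window=ATTRIBUTION_WINDOW):
    """Keep only touchpoints within `window` before the purchase.

    touchpoints: list of (timestamp, channel) tuples, in any order.
    """
    return [
        (ts, channel)
        for ts, channel in sorted(touchpoints)
        if purchase_time - ts <= window
    ]

purchase = datetime(2019, 1, 20)
touchpoints = [
    (datetime(2016, 5, 3), "Facebook"),    # ad click almost 3 years earlier
    (datetime(2016, 5, 10), "Email"),      # ignored offer email
    (datetime(2019, 1, 5), "TV"),          # ad that restarted the journey
    (datetime(2019, 1, 19), "Instagram"),  # campaign that led to purchase
]

# Only the TV and Instagram touchpoints survive the 30-day window
print(apply_window(touchpoints, purchase))
```

Changing the window to 45 or 60 days is just a matter of passing a different `window` argument.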

Removal of direct marketing channel

A well-known issue that every marketing analyst is aware of is that customers who are already aware of the brand usually come to the website directly. This leads to overestimation of the direct channel, and branded channels start getting more credit. In this case, you can set a threshold (say, 7 days) and remove these branded channels from the customer journey.

Figure 8: Removal of branded channels
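One possible reading of the threshold rule above (the article does not pin down the exact logic, so this is an assumption): drop a branded or direct touchpoint if another channel appeared within the preceding 7 days, on the grounds that the customer was plausibly driven by that earlier contact. Channel names are illustrative:

```python
from datetime import datetime, timedelta

BRANDED = {"Direct", "Branded Search"}  # illustrative branded channel set
THRESHOLD = timedelta(days=7)

def drop_branded(touchpoints):
    """touchpoints: chronologically sorted (timestamp, channel) tuples."""
    kept = []
    for ts, channel in touchpoints:
        # Was there a non-branded touchpoint within the threshold window?
        recently_touched = any(
            ts - prev_ts <= THRESHOLD
            for prev_ts, prev_ch in kept
            if prev_ch not in BRANDED
        )
        if channel in BRANDED and recently_touched:
            continue  # credit stays with the earlier non-branded channel
        kept.append((ts, channel))
    return kept

journey = [
    (datetime(2019, 3, 1), "Display"),
    (datetime(2019, 3, 3), "Direct"),   # 2 days after Display: removed
    (datetime(2019, 3, 20), "Direct"),  # no recent non-branded touch: kept
]
print([ch for _, ch in drop_branded(journey)])  # ['Display', 'Direct']
```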

Cross platform problem

If some of your customers are using different devices to explore your products and you are not able to track them, retargeting becomes really difficult. In a perfect world these sessions belong to the same journey, and if they cannot be combined then all paths except one would be considered “non-converting paths”. For the attribution problem, the device could be thought of as a touchpoint to include in the path, but tracking customers across all devices remains challenging. A brief introduction to deterministic and probabilistic cross-device tracking can be found here.

Figure 9: Cross platform clash

How to account for Vouchers?

To better account for vouchers, a voucher can be added as a ‘dummy’ touchpoint of the type of voucher used (CRM, social media, affiliate, pricing, etc.). In our case, we tried adding these vouchers both as the first touchpoint and as the last touchpoint, but no significant difference was found. Also, if the marketing channel through which the voucher was used was already in the path, the dummy touchpoint was not added.

Figure 10: Addition of Voucher as a touchpoint
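The voucher rule above can be sketched as a small helper; this version appends the dummy touchpoint at the end of the journey (one of the two placements the article tested), and the channel names are illustrative:

```python
def add_voucher_touchpoint(journey, voucher_type):
    """Append the voucher's channel type as a dummy last touchpoint,
    unless that channel is already present in the journey."""
    if voucher_type is None or voucher_type in journey:
        return journey  # channel already credited, no dummy needed
    return journey + [voucher_type]

print(add_voucher_touchpoint(["Search", "Display"], "CRM"))
# ['Search', 'Display', 'CRM']
print(add_voucher_touchpoint(["Search", "CRM"], "CRM"))
# ['Search', 'CRM']
```

Placing the dummy touchpoint first instead of last would only change the `journey + [voucher_type]` line; as noted above, the two placements made no significant difference in practice.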