5 AI Tricks to Grow Your Online Sales

The way people shop is changing, which means online stores have to keep optimizing to stay competitive and meet customers’ needs. In this post, we’ll cover five ways you can use artificial intelligence in an online store to grow your revenue. Let’s begin!

1. Personalization with AI

The first AI trend worth covering is a step up in personalization. According to a survey by Accenture, more than 90% of shoppers are more likely to buy from stores and brands that offer relevant product recommendations.

This is exactly where artificial intelligence can lend a hand. The technology analyzes the behavior of each customer individually, taking their browsing and purchase history into account. From this data, the AI infers which products the user is likely to want and recommends them.

Look at the example below, where a block on the page shows a carousel of suggested products. A block like this can give a real boost to average cart size.

Screenshot taken on the official Reebok website


2. Smarter Search Options

With the growing popularity of AI voice assistants and the leap in technology in general, the way people look for things on the web has changed. Everything is moving towards saving time and getting faster, better results.

One such trend is the adoption of voice search (speech to text) and image search. Have you noticed how many search bars now have a microphone icon for speaking your request?

On a similar note, numerous sites have made a big jump forward by adding search by picture. In this case, the uploaded photo is analyzed by artificial intelligence: the system works out what the image shows and cross-checks it against the products sold in the store. Within seconds, the user gets a selection of similar products.

This helps users find what they are looking for much faster. Instead of opening dozens of product pages on multiple sites to track down an item they have a screenshot or photo of, they can simply search with the picture.

Check out how such a feature works on the official Amazon website by taking a look at the screenshots of StyleSnap provided below.

Screenshot taken on the official Amazon StyleSnap website


3. Assisting Clients via Chatbots

The next point on the list is AI chatbots. They can work wonders for customer support, which in turn benefits online sales.

Human customer support specialists usually aren’t available 24/7. And since most requests cover the same recurring topics, having a chatbot handle many of those questions instantly is a neat way to take work off human agents.

Such chatbots use machine learning to get better at understanding and processing client queries. How do they work? They’re “taught” with scripts and conversation scenarios, so the more data you supply them with, the more topics they’ll be able to cover.

Case in point: such a chat is available on the official Victoria’s Secret website. When the user launches the Digital Assistant, the messenger bot starts the conversation. The user picks a topic from the options offered, and the bot steers the discussion accordingly.

Screenshot taken on the official Victoria’s Secret website


4. Determining Top-Selling Product Combos

This use case is similar to the one in the first point: it becomes much easier to cross-sell products when artificial intelligence “cracks” which items actually sell best together. According to findings by Sumo, you can boost your revenue by 10 to 30% if you upsell wisely!

The product catalog of an online store grows by the month, making it harder to know for certain which items go well together and complement each other. With AI on your analytics team, you don’t have to guess which products people are likely to buy along with the item they’re browsing at the moment. The work of extracting these combinations from the data can be done for you.
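As a rough illustration of how such product combos could be found, here is a minimal Python sketch that simply counts how often two products appear in the same order. The order data and the helper name `bought_together` are made up for this example; a real store would read orders from its own database and would likely use a more refined model.

```python
from collections import Counter
from itertools import combinations

# Hypothetical order history: each order is the set of products bought together.
orders = [
    {"lipstick", "lip liner", "makeup remover"},
    {"lipstick", "lip liner"},
    {"foundation", "brush"},
    {"lipstick", "makeup remover"},
    {"foundation", "brush", "lipstick"},
]

# Count how often each unordered pair of products shows up in the same order.
pair_counts = Counter()
for order in orders:
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1


def bought_together(product, top_n=3):
    """Return the products most often bought with `product`, with their counts."""
    scores = Counter()
    for (first, second), count in pair_counts.items():
        if first == product:
            scores[second] += count
        elif second == product:
            scores[first] += count
    return scores.most_common(top_n)


print(bought_together("lipstick", top_n=2))
```

The pairs with the highest counts are natural candidates for a “goes well with” carousel on the product page.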

As seen in the screenshot from the official MAC Cosmetics website, the upselling section on the product page presents complementary items in a carousel. This raises the chance of these products being added to the shopping cart, compared with a situation where the client would have to search the site and find them on their own.

Screenshot taken on the official MAC Cosmetics website


5. “Try It On” with a Camera

The fifth AI technology on this list is virtual try-on, which brings the power of augmented reality to the world of sales.

Especially in fields like cosmetics or accessories, it is important to help clients make up their minds and encourage them to buy an item without testing it physically. If you want, you can try out this real-time functionality yourself and put on makeup using your camera on the official Maybelline New York site.

Ultimately, consumers are happier because this solution removes frustration and unnecessary doubt. People no longer have to take a shot in the dark about what will be a good match; they can see it for themselves.

Screenshot taken on the official Maybelline New York website


In Closing

To sum up, artificial intelligence can make a real difference for online retail. Incorporating AI-powered features into an online store can lead to visible growth in conversions.

What is Data Warehousing and Data Mining – Know the difference between Data Warehousing and Data Mining

Getting started

Before we start with Data Warehousing and Data Mining, let us first set the scene. This will help in understanding why we need them in the first place. By the end of the post, you should feel much better acquainted with the two topics at hand. So here goes!

With the exponential growth in the generation and consumption of data, organizations have to deal with enormous volumes of it. We have all heard that data is the new oil, and this is increasingly turning out to be true. Data is an extremely valuable asset for every organization, and organizations try to put it to good use. It helps them make business decisions that can generate significant revenue and understand the current requirements of the market, which for some organizations is vital to staying in business. It is therefore essential for organizations to store the data somewhere so that it can later be used for analytical purposes.

 

Introduction to Data Warehousing

Data Warehousing is a process for collecting and managing data from a variety of sources. The data can come from different departments of an organization, such as Finance or Marketing. The idea of building a Data Warehouse is to be able to use the data for analytical purposes and to make decisions based on that analysis. Data warehouses are a pivotal component of analytical and business intelligence operations at an organization.

Because organizations generate data in many places, we might need different tools to bring the data together in a single location. The process of data warehousing generally involves ETL (Extract, Transform, Load) tools that extract the data from the different sources, transform it into a suitable format, and load it into one place. Services such as Google BigQuery and Amazon Redshift can connect to a vast number of sources to store the data in one place.

Data warehouses can be implemented on-premise as well as in the cloud. On-premise data warehouses run on the organization’s local network, while cloud data warehouses are hosted by a provider and accessed over the internet. Choosing between them always involves a trade-off, because multiple factors have to be considered, such as scalability, initial investment, recurring costs, security and speed.
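To make the Extract, Transform, Load idea more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the two source extracts, their column names and the table names are invented, and an in-memory SQLite database from Python’s standard library stands in for a real warehouse such as BigQuery or Redshift.

```python
import sqlite3

# Extract: hypothetical rows as they might arrive from two departments.
finance_rows = [
    {"customer": "ACME ", "amount": "1200.50"},
    {"customer": "Globex", "amount": "300"},
]
marketing_rows = [
    {"Customer_Name": "acme", "campaign": "spring"},
    {"Customer_Name": "Initech", "campaign": "summer"},
]


# Transform: bring both sources into one consistent format.
def transform_finance(row):
    return (row["customer"].strip().lower(), float(row["amount"]))


def transform_marketing(row):
    return (row["Customer_Name"].strip().lower(), row["campaign"])


# Load: write the cleaned rows into a single store (SQLite as a stand-in).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE revenue (customer TEXT, amount REAL)")
warehouse.execute("CREATE TABLE campaigns (customer TEXT, campaign TEXT)")
warehouse.executemany(
    "INSERT INTO revenue VALUES (?, ?)",
    [transform_finance(row) for row in finance_rows],
)
warehouse.executemany(
    "INSERT INTO campaigns VALUES (?, ?)",
    [transform_marketing(row) for row in marketing_rows],
)
warehouse.commit()

# Once everything is in one place, data from both departments can be combined.
report = warehouse.execute(
    "SELECT r.customer, r.amount, c.campaign "
    "FROM revenue r JOIN campaigns c ON c.customer = r.customer"
).fetchall()
print(report)
```

Note that the join only works because the transform step normalized the customer names; without it, "ACME " and "acme" would not match.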

General steps to implement a data warehouse

  1. Determining the business objectives.

Every organization can have different business objectives that define success on its own terms. Some organizations operate in a constantly changing market, which requires a large number of data sources, while others might just need the data for better administration. Hence the key step in starting a data warehouse project is to involve the stakeholders and determine their business objectives.

  2. Analyzing and obtaining the information regarding the objectives.

Once the business objectives are decided, the information regarding those objectives needs to be obtained. Depending on the organization, it can come from periodic reports, a CRM application, and so on. Extensive interaction with the supervisors who handle that information can be crucial for this process. Talking to the people who deal with it day to day can yield a lot of information: they tend to know the details of the entire task, and what they share can lead to a better implementation of the data warehouse. This step also helps in identifying the key performance indicators for the desired objectives.

  3. Identifying the concerned departments in the organization.

The key performance indicators can be different for different organizations. For example, in an organization that deals with the manufacturing of different products, if their objective were to increase the revenue then the number of units sold would be one of the key performance indicators. Based on these indicators, the involvement of the concerned departments proves to be significant.

  4. Creating a layout of the data warehouse.

With all the key performance indicators discovered, create a layout of the data warehouse. It will help in providing an overview of the entire data warehouse and the data that will be stored in it. It will determine what key indicators are being stored in the data warehouse and whether all the indicators required for our objectives exist or not. As the data will be pulled in periodically, think about all of the investment costs including the hardware costs and recurring costs.

  5. Locating the sources of data and its transformation.

After finalizing the layout, we need to locate the sources of the data and figure out how the data can be extracted from them. The data may sit in a CRM application or in a database, so we need to either export it or use an ETL tool that can connect to the data source. As the data comes from different sources, all of it needs to be consolidated. There is also a high chance that the data is not clean and needs some transformation. If some data cannot be extracted, we need to reconsider the layout of the data warehouse. For this reason, these two steps are often performed in parallel.

  6. Implementing the data warehouse layout.

Once the objectives are set, the concerned stakeholders are looped into the plan, the information is collected and analyzed, the layout of the data warehouse is planned, and the data sources are located and transformed, it is time to put everything to work. Once the data in the warehouse is accessible, new data needs to be pulled from the sources periodically. We also need to monitor the data warehouse continuously and check for any irregularities.

 

Introduction to Data Mining

Data Mining is a process for extracting insightful information from a large amount of raw data. The intent is to find trends and patterns that help organizations make data-driven decisions. It is one of the steps of KDD, or Knowledge Discovery in Databases, which also includes sub-processes such as data cleaning, data transformation and data pre-processing.

There are a number of tools that we can use for Data Mining, Tableau and Power BI being two of them. We can also use certain packages in Python and R to extract information. Data Mining helps in analyzing a huge amount of data in a short amount of time. It is an essential step in any data science project because it provides exploratory insights that can tell us which features are important for prediction and which carry very little information.

Data Mining helps the organization in various ways, such as analyzing market expenditure, resource management and fraud detection. Data Mining techniques include Association rules, Classification, Clustering, Regression, Outlier Detection and more. It is a very cost-effective solution for the organization, and it can regularly provide new information and analysis, depending on the skill set available.

The process of data mining also begins with understanding the business and the data. Developing thorough knowledge of the business and its data is pivotal for analysts to be able to work with it.

Examples of Data Mining

  1. Have you ever received recommendations in an e-commerce store while buying a product? For example, when you buy a smartphone, the website shows you phone cases or accessories. This is known as Basket Analysis. It analyzes the buying patterns of customers and what they tend to buy along with other products. It is useful not only on e-commerce websites but also in supermarkets and grocery stores. It can be done with association-based learning, which derives rules from the purchase data.
  2. Fraud detection is one of the most vital use cases of Data Mining. Banks have a lot at stake with fraudulent transactions because they have to bear the losses. Data Mining can help in analyzing the data and catching these fraudulent activities. Although it is a tedious task, organizations can try to extract patterns that help in tracking down the fraudsters.
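The smartphone example from the first item can be sketched with the usual association-rule measures: support, confidence and lift. The transactions below are invented for illustration, and the three helper functions are a minimal hand-written version rather than the API of any particular library.

```python
# Hypothetical purchase baskets for illustration only.
transactions = [
    {"smartphone", "phone case"},
    {"smartphone", "phone case", "charger"},
    {"smartphone", "charger"},
    {"laptop", "mouse"},
    {"phone case"},
]


def support(itemset):
    """Fraction of all transactions that contain every item in `itemset`."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def confidence(antecedent, consequent):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent) / support(antecedent)


def lift(antecedent, consequent):
    """How much more often the two occur together than expected if they were independent."""
    return confidence(antecedent, consequent) / support(consequent)


rule = ({"smartphone"}, {"phone case"})
print("support:", support(rule[0] | rule[1]))
print("confidence:", confidence(*rule))
print("lift:", lift(*rule))
```

A lift above 1 suggests that buying the first item makes the second more likely, which is the kind of rule a store would use for recommendations.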

 

The link between Data Warehousing and Data Mining

Before we look at the differences between the two terms, let us see how data warehousing and data mining are linked. In fact, the two work in conjunction with each other. As mentioned earlier in the article, data mining is the extraction of useful information from large volumes of data. So where does the analyst obtain the data for data mining? Their search ends at the data warehouse, as it is a single point of contact for all their data needs. The data warehouse provides the data mining process with all the information required for the analysis. In many instances, the analytical team is able to extract useful information from data merged from two completely different departments of the organization, or even from different offices of the same department.

Distinguishing between Data Warehousing and Data Mining

Parameter | Data Warehousing | Data Mining
--- | --- | ---
Process | It is a process of storing data from multiple sources. | It is a process of using different methods to analyze the raw data.
Ideology | The idea behind it was to centralize all the sources of data into one location for ease of use in analytical processes. | The idea behind it was to use the data to find some trends and patterns and help the organization in making good decisions.
Requirements | To implement a data warehouse, we need to locate the different sources and work out how the data can be extracted from them. | In order to perform data mining, a data warehouse needs to be implemented to be able to look at all the sources and then analyze the raw data.
Maintenance | The data pipelines need to be maintained and monitored to prevent any loss of data. | The methods used for extracting information need to be maintained and monitored in order to check whether they provide any useful information.
Periodicity | The data is extracted and stored in the data warehouse periodically. | The data needs to be analyzed periodically to keep extracting useful information.
Tools | Tools used for this process include Google BigQuery, Amazon Redshift, etc. | Tools used for this process include Tableau, Power BI, etc.
Benefits | Easy access to the historic data of the organization. | Helps in detecting fraudulent operations, in financial and market analysis, etc.

 

End Notes

Any organization that plans to use its data for analytical purposes needs to implement a data warehouse and apply data mining techniques. Making good use of them requires considerable skills and resources. Another vital element in the data analysis process is the interpretation of the analysis. One should be able to interpret correctly what the data is saying, because all decisions are based on these interpretations. Bad decisions can cost organizations a fortune, while decisions that are spot-on can earn them one. This explains the increasing demand for roles such as Data Engineers, Data Scientists, and Data Analysts.

This article gave an overview of data warehousing and data mining. It described the steps generally involved in implementing a data warehouse, illustrated a couple of examples of Data Mining, and explained how data warehousing and data mining are linked. Lastly, it listed some parameters that distinguish data warehousing from data mining.

Process Mining with MEHRWERK - Article Series

This article in the Process Mining Tools series covers the vendor MEHRWERK. Founded in 2008 and now led by three managing directors, the company offers business intelligence consulting and services around the products of the BI software vendor QlikTech. Around ten years later, in 2018, the company also entered the process mining market as a partial software vendor. MEHRWERK ProcessMining, MPM for short, is a process mining solution built on the widely used BI tool Qlik Sense.

Solution packages: Standard license
Target group: Medium-sized and large enterprises
Data sources: Any, via the standard connectors of Qlik Sense
Data volume: Unlimited
Architecture: On-premise, cloud or multi-cloud

Using MEHRWERK ProcessMining requires Qlik Sense Enterprise. It can be installed directly on-premise on the company’s own Windows servers, deployed via Kubernetes in containers (likewise on-premise or in a multi-cloud environment), run even more simply directly in the Qlik Cloud, or, where data protection matters, operated as a hybrid deployment that combines it with the high scalability of the cloud.

Usability and adaptability of the analyses

The assessment of usability depends almost entirely on how one rates the usability of Qlik Sense, since MPM is based on this common BI tool. In Qlik Sense terminology, developers work in a hub and create apps, each of which can contain one or more worksheets that can be paged through horizontally. Incidentally, besides storytelling boards, Qlik technology also makes it possible to embed entire dashboards or individual visualizations in web pages via mashups.

Each app can be published in a particular stream. Apps and streams are used to extend, restrict or otherwise organize user access. Access to apps can be controlled and restricted through security rules, which is important for a company’s data governance and also makes the solution multi-tenant capable.

Figure 1 - Overview of the most important buttons of a Qlik Sense app

Anyone already familiar with Qlik Sense as a BI tool will find their way around immediately and can get started right away with process mining, a form of analysis that is increasingly becoming an integral part of powerful BI systems. By default, every app starts in view mode. The Qlik Sense user role “Analyzer User” is authorized only for this view and can use apps in read-only fashion. The app is nevertheless interactive, so all dimensions available in the app can be clicked and used as filters. The special feature here is the associative data model provided by Qlik’s in-memory engine. It overcomes the limitations of relational databases and SQL queries. With that traditional approach, data sources must be combined using SQL join statements, and assumptions must be made in advance about the kinds of questions users will ask. If a user wants to run an analysis that was not planned for, the data has to be rebuilt, which means executing complex queries and causes some waiting time. The associative engine, by contrast, enables on-the-fly calculations and aggregations that deliver immediate insights into the processes under consideration.

For users who are less familiar with the filter options, Qlik also offers associative search. It lets users enter search terms, much as they would with Google. The associative engine then determines possible hits and connections in the data, which is then filtered accordingly.

The “Professional User” role can additionally open any published app in edit mode and create its own worksheets and analyses based on centrally defined master items (measures and dimensions). Existing dashboards can also be duplicated and adapted to one’s own needs, for example by modifying or deleting tables and charts. No data is duplicated in the process, because Qlik Sense follows a so-called server-side authoring approach. The master item concept additionally ensures that data governance is preserved. Professional Users can in turn publish the worksheets they create. All other users are guaranteed to see these “community sheets” only with the data of their own authorization context.

Figure 2 - A Qlik Sense app in edit mode for “Professional User”.

Each page of the app can be designed freely, including in such a way that read-only users with the standard license get many options for reading and filtering data.

Figure 3 - A page of the app designed solely for filtering dimensions: filtering process graphs by case number, product and/or process variant

MEHRWERK ProcessMining ships templates as a standard app that already provide typical analysis scenarios such as the process flow diagram and filters for throughput times, frequencies and variants, which makes it easier to get started. The template app also comes with very extensive process mining functions such as conformance checking, automated root cause analysis, process pattern queries and continuous process monitoring. In addition, shifts, process hierarchies and target processes, among other things, can be configured.

Only users with the Qlik Sense “Professional User” license can also view, create and adapt the data models in edit mode. As in classic business intelligence, data models are decisive for the analysis in process mining, here in the form of so-called event logs, and they are also the prerequisite for the MPM app.

Figure 4 - Sample event log from MEHRWERK’s sample template app.

In addition to the three must-haves for process mining (case ID, activity description and timestamp), the event log can and should carry any number of further helpful attributes in additional columns. Only then can deviations, anomalies or other conspicuous findings in the process be put into context so that targeted measures can be taken.
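To illustrate what such an event log contains, here is a minimal Python sketch, independent of Qlik Sense and MPM. The cases, activities, timestamps and the extra product column are invented for this example. It groups the events by case ID, orders them by timestamp, and derives two typical process mining figures: the process variants and the throughput time per case.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical event log rows: (case ID, activity, timestamp, extra attribute).
event_log = [
    ("C1", "Create order", "2021-03-01 09:00", "Product A"),
    ("C1", "Approve order", "2021-03-01 11:30", "Product A"),
    ("C1", "Ship goods", "2021-03-02 09:00", "Product A"),
    ("C2", "Create order", "2021-03-01 10:00", "Product B"),
    ("C2", "Ship goods", "2021-03-03 10:00", "Product B"),
    ("C3", "Create order", "2021-03-02 08:00", "Product A"),
    ("C3", "Approve order", "2021-03-02 09:00", "Product A"),
    ("C3", "Ship goods", "2021-03-02 20:00", "Product A"),
]

# Group the events by case ID and parse the timestamps.
cases = defaultdict(list)
for case_id, activity, timestamp, _product in event_log:
    cases[case_id].append((datetime.strptime(timestamp, "%Y-%m-%d %H:%M"), activity))

variants = Counter()   # sequence of activities -> number of cases following it
throughput_hours = {}  # case ID -> hours between first and last event
for case_id, events in cases.items():
    events.sort()  # order the events of a case by timestamp
    variants[tuple(activity for _, activity in events)] += 1
    duration = events[-1][0] - events[0][0]
    throughput_hours[case_id] = duration.total_seconds() / 3600

print(variants.most_common())
print(throughput_hours)
```

The extra column (here the product) is what later allows a deviation such as the skipped approval in case C2 to be put into context.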

Integration capability

How well and how easily MEHRWERK ProcessMining fits into corporate IT depends on whether Qlik Sense is already part of the IT infrastructure or is used, for example, as a cloud solution. Companies that have not relied on Qlik Sense so far would first have to address the basic question of the prerequisites of QlikTech’s tool. For completeness, however, it should be noted that, according to MEHRWERK, about 40% of its customers had no Qlik Sense in use beforehand and that installing Qlik Sense is not a major hurdle.

An essential aspect of integration capability, however, is not only integrating the software into the IT infrastructure but also how easily data can be converted into the required data format (event log). It is possible to use Qlik Sense with MPM exclusively for data analysis and visualization and to carry out the data modeling with other tools (databases, ETL). However, Qlik Sense itself comes with a large number of connectors to many data sources. As with any process mining tool, there are two approaches to data preparation. One option is to load, consolidate and prepare the data in a data warehouse (DWH) that already transforms the data into event logs. In this case, MPM can import the data via a standard Qlik Sense connector, post-process it into an MPM-specific event log and then start the analysis directly. Qlik Sense does not need its own database for data storage; it processes the data in highly compressed form in its own patented in-memory engine.

Figure 5 - Qlik Sense Standard Connectors

The other approach to data preparation is to use Qlik Sense as the data management tool as well. Here the standard connectors are used to connect data to Qlik Sense as directly as possible. In this case, the use-case-specific event log has to be built in Qlik Sense as a process-log-like data model. This can be done in a procedural script using Qlik’s own scripting language, which is reminiscent of Microsoft’s DAX language as well as SQL. The script can be divided into several sections, and its execution can be automated and scheduled. MEHRWERK ProcessMining offers standardized ETL best practices for this that use rule sets to greatly simplify event log generation. A major advantage is the interlocking of process mining functionality with the ETL process, which allows early, visual validation already during loading.

Figure 6 - Loading and modeling data can be done visually, to a limited extent, with clickable interfaces. The Qlik script editor, however, offers more possibilities.

Scalability

Traditionally, Qlik Sense Server was installed on-premise in the company’s own IT infrastructure. The Qlik Sense software is available only as a server version. Qlik Sense relies on a patented in-memory technology. Technically, the performance of Qlik Sense is limited only by the hardware.

Today, Qlik Sense Server can also be used directly via the Qlik Cloud or deployed via Kubernetes to a company’s own servers or to a multi-cloud. Operation with typical cloud providers such as Amazon, Google or Microsoft is possible without problems, which technically makes it scalable as needed.

Future viability

The future viability of MPM lies primarily in the further development of Qlik Sense by QlikTech. In Gartner’s 2020 Magic Quadrant for BI and analytics tools, Qlik ranks among the top three leaders, after Tableau and Microsoft.

Because of the large Qlik community and its wide adoption as a BI tool, MEHRWERK’s solution is probably a very future-proof one with many possibilities for further development. The community and other BI companies provide many extensions for Qlik Sense that improve its functionality, from connectivity to other tools to simpler or visually more attractive analysis. There are many further vendors offering various extensions for Qlik Sense, as well as Qlik’s own and compatible companion solutions for master data management and data governance. Integrating data science tools via programming languages such as Python or R is also possible and extends the platform toward advanced analytics.

Independently of this, MEHRWERK itself also continues to develop the process mining solution; for example, machine learning is increasingly used to detect process anomalies and to predict process throughput times.

Pricing

MEHRWERK does not communicate its pricing transparently; in our experience it sits in the mid-range compared with other process mining tools. In addition to the MPM-specific costs, user licenses for Qlik Sense are also required. Further possible costs depend on whether the Qlik Cloud, another cloud platform or an on-premise installation is planned.

Conclusion

For companies that rely fully on Qlik Sense as their BI tool, MEHRWERK ProcessMining is a real option for a fast and powerful entry into this special analysis methodology. Employees who already know Qlik Sense find their way around almost immediately and can get started right away, provided event logs are available. Building event logs in Qlik Sense, however, requires some experience with data preparation and modeling in Qlik Sense as well as knowledge of Qlik script.

Bag of Words: Convert text into vectors

In this blog, we will study a model that represents text as numbers: the Bag of Words (BOW). The bag-of-words model has seen great success in problems such as language modeling and document classification, as it is simple to understand and implement.

After reading this blog, you will have an overview of what the bag-of-words model is and why it is important for representing text, how to develop a bag-of-words model for a collection of documents, and how to use bag of words to prepare a vocabulary and apply it in a model using a programming language.

 

The problem and its solution…

The biggest problem with modeling text is that it is unstructured, while most statistical algorithms, including machine learning and deep learning techniques, prefer well-defined numeric data. They cannot work with raw text directly, so we have to convert text into numbers.

Numeric text representations such as word embeddings are commonly used in many Natural Language Processing (NLP) tasks because they capture useful information about words and often lead to better performance. A large number of approaches exist, among which some of the most widely used are Bag of Words, TF-IDF, fastText, GloVe and word2vec. Libraries such as Scikit-Learn and NLTK make many of these techniques easy to use, often in just a few lines of code. But it is important to understand the working principle behind them. In this blog, we look at Bag of Words, and the best way to understand it is to implement it from scratch in Python. Before we start coding, let’s go through the theory behind the approach.

Theory Behind Bag of Words Approach

In simple words, bag of words is a Natural Language Processing technique for text modelling, or in other words a method of extracting features from text documents. It involves two things: first, a vocabulary of known words, and second, a measure of the presence of those known words.

The process of converting text into numbers is called vectorization in machine learning. There are several ways to convert text into vectors, two of which are:

  1. Counting the number of times each word appears in a document.
  2. Calculating the frequency with which each word appears in a document, out of all the words in the document.

Understanding using an example

To understand the bag of words approach, let’s see how this technique converts text into vectors with the help of an example. Suppose we have a corpus with three sentences:

  1. “I like to eat mangoes”
  2. “Did you like to eat jellies?”
  3. “I don’t like to eat jellies”

Step 1: Firstly, we go through all the words in the above three sentences and make a list of all of the words present in our model vocabulary.

  1. I
  2. like
  3. to
  4. eat
  5. mangoes
  6. Did
  7. you
  8. like
  9. to
  10. eat
  11. Jellies
  12. I
  13. don’t
  14. like
  15. to
  16. eat
  17. jellies

Step 2: Let’s find out the frequency of each word without preprocessing our text.

But this is not the best way to build a bag of words. In the above example, the words Jellies and jellies are counted as two different words even though they have the same meaning. So let us make some changes and see how preprocessing our text makes ‘bag of words’ more effective.

Step 3: Let’s find out the frequency of each word after preprocessing our text. Preprocessing is important because it brings the text into a form that is easier to understand, predict and analyze for our task.

First, we convert the above sentences into lowercase, because for this task case does not carry useful information. Then we remove any special characters and punctuation present in the document, since they would otherwise make the conversion messier.
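To make Steps 1 to 3 concrete, here is a minimal sketch in plain Python that counts the words of the three example sentences with and without preprocessing and then builds one count vector per sentence. The helper names (raw_tokens, clean_tokens) are made up for this illustration, and the preprocessing simply drops every character that is not a letter or whitespace, so “don’t” becomes “dont”. Other choices are possible.

```python
import re
from collections import Counter

sentences = [
    "I like to eat mangoes",
    "Did you like to eat jellies?",
    "I don't like to eat jellies",
]

def raw_tokens(sentence):
    # No preprocessing: split on whitespace only.
    return sentence.split()

def clean_tokens(sentence):
    # Lowercase, drop everything except letters and whitespace, then split.
    return re.sub(r"[^a-z\s]", "", sentence.lower()).split()

raw_counts = Counter(tok for s in sentences for tok in raw_tokens(s))
clean_counts = Counter(tok for s in sentences for tok in clean_tokens(s))

# Vocabulary in order of first appearance (Counter keeps insertion order).
vocab = list(clean_counts)

# One count vector per sentence, one dimension per vocabulary word.
vectors = [[clean_tokens(s).count(word) for word in vocab] for s in sentences]

print(len(raw_counts), "distinct raw tokens,", len(clean_counts), "after preprocessing")
print(vocab)
for vector in vectors:
    print(vector)
```

Without preprocessing, “jellies?” and “jellies” end up as two separate entries, which is the same kind of duplication as Jellies versus jellies. After preprocessing they are merged into a single vocabulary word.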

From the above explanation, we can say that the major advantage of Bag of Words is that it is easy to understand and simple to implement on our datasets. But this approach has some disadvantages too, such as:

  1. Bag of words leads to a high-dimensional feature vector because of the large vocabulary size.
  2. Bag of words assumes all words are independent of each other, i.e., it doesn’t leverage co-occurrence statistics between words.
  3. It leads to a highly sparse vector, as nonzero values appear only in the dimensions corresponding to words that occur in the sentence.

Bag of Words Model in Python Programming

The first thing we need is a proper dataset for our Bag of Words model. In the sections above, we manually created a bag-of-words model from three sentences. Now we use a Wikipedia article as the corpus, for example ‘https://en.wikipedia.org/wiki/Bag-of-words_model’.

Step 1: The very first step is to import the required libraries: nltk, numpy, random, string, bs4, urllib.request and re.

Step 2: Once the libraries are imported, we use the Beautifulsoup4 library to parse the data from Wikipedia. Along with that, we use Python’s regex library, re, for preprocessing the document. So, we scrape the Wikipedia article on Bag of Words.

Step 3: In this step we import the raw HTML of the Wikipedia article, filter out the text inside the paragraph tags and, finally, create a complete corpus by merging all the paragraphs.
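Steps 1 to 3 can be sketched as follows. Note that this sketch swaps in the standard library’s html.parser for Beautifulsoup4 so that it runs without extra installs. The class ParagraphExtractor and the helpers html_to_corpus and fetch_html are names invented for this illustration, and the example runs on a small inline HTML string rather than on the live Wikipedia page.

```python
import re
import urllib.request
from html.parser import HTMLParser

class ParagraphExtractor(HTMLParser):
    """Collects the text found inside <p>...</p> elements."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data

def html_to_corpus(html):
    # Keep only paragraph text and merge all paragraphs into one string.
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.paragraphs)
    return re.sub(r"\s+", " ", text).strip()

def fetch_html(url):
    # Network call (not executed below); some sites expect a User-Agent header.
    request = urllib.request.Request(url, headers={"User-Agent": "bow-demo"})
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")

# Inline stand-in for a downloaded page.
sample_html = (
    "<html><body><h1>Bag of words</h1>"
    "<p>The <b>bag-of-words</b> model is simple.</p>\n"
    "<p>It ignores   word order.</p></body></html>"
)
corpus = html_to_corpus(sample_html)
print(corpus)
```

To work on the real article, you would pass the result of fetch_html('https://en.wikipedia.org/wiki/Bag-of-words_model') to html_to_corpus instead of the inline sample.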

Step 4: The very next step is to split the corpus into individual sentences by using the sent_tokenize function from the NLTK library.

Step 5: Our text contains punctuation marks that are unnecessary for our word frequency dictionary. We convert the text to lower case and then remove all punctuation, which leaves multiple empty spaces that can in turn be removed using regex.

Step 6: Once the preprocessing is done, let’s find out the number of sentences present in our corpus and then, print one sentence from our corpus to see how it looks.
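Steps 4 to 6 might look like the sketch below. It uses a naive regex-based sentence splitter as a stand-in for NLTK’s sent_tokenize, so that no tokenizer data has to be downloaded, and it works on a short made-up corpus string. Both are assumptions for illustration only.

```python
import re

# Made-up stand-in for the scraped corpus.
corpus = ("The bag-of-words model is a simplifying representation. "
          "It is used in NLP! Does word order matter?  Not here.")

def split_sentences(text):
    # Naive stand-in for nltk.sent_tokenize:
    # split after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def preprocess(sentence):
    sentence = sentence.lower()
    sentence = re.sub(r"\W", " ", sentence)   # punctuation -> space
    sentence = re.sub(r"\s+", " ", sentence)  # collapse repeated spaces
    return sentence.strip()

sentences = [preprocess(s) for s in split_sentences(corpus)]

print(len(sentences))   # number of sentences in the corpus
print(sentences[0])     # one cleaned sentence
```

A real tokenizer handles abbreviations and other edge cases far better than this regex, which is why the original steps rely on NLTK.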

Step 7: We can observe that the text doesn’t contain any special characters or multiple empty spaces, so our corpus is ready. The next step is to tokenize each sentence in the corpus and create a dictionary containing each word and its corresponding frequency.

We create a dictionary called wordfreq. Next, we iterate through each word in the sentence and check whether it exists in the wordfreq dictionary. If it does not exist yet, we add the word as a key and set its value to 1. Otherwise, we increment the word’s count by 1.
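The dictionary logic described above can be sketched like this. The two-sentence corpus is made up for the example, and str.split() stands in for an NLTK word tokenizer.

```python
# Made-up, already preprocessed corpus (a list of sentences).
corpus = [
    "the bag of words model ignores word order",
    "the model counts words",
]

wordfreq = {}
for sentence in corpus:
    for token in sentence.split():   # stand-in for nltk.word_tokenize
        if token not in wordfreq:
            wordfreq[token] = 1      # first occurrence: add the key
        else:
            wordfreq[token] += 1     # seen before: increment the count

print(wordfreq)
```

The same result could be obtained with collections.Counter, but the explicit loop mirrors the description step by step.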

Step 8: Our corpus has more than 500 words in total, so we filter down to the 200 most frequently occurring words by using Python’s heapq module.
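A minimal sketch of this filtering step with heapq.nlargest is shown below. The word counts are invented toy values and the helper name most_frequent is hypothetical. With the real corpus you would pass n=200.

```python
import heapq

# Hypothetical toy counts standing in for the real wordfreq dictionary.
wordfreq = {"the": 12, "of": 9, "words": 7, "bag": 6, "model": 5, "order": 1}

def most_frequent(wordfreq, n):
    # Keys with the n largest counts, highest count first.
    return heapq.nlargest(n, wordfreq, key=wordfreq.get)

top_words = most_frequent(wordfreq, 3)
print(top_words)
```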


Step 9: Now comes the final step of converting the sentences in our corpus into their corresponding vector representations. Our model is in the form of a list of lists, which can easily be converted to matrix form.
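As a rough sketch of this last step, the code below builds one vector per sentence and converts the list of lists into a NumPy matrix with np.asarray. It assumes binary entries (1 if the word is present, 0 otherwise), and the corpus and the filtered vocabulary most_freq are made-up toy values. Using tokens.count(word) instead would give count vectors.

```python
import numpy as np

# Made-up corpus and hypothetical filtered vocabulary.
corpus = [
    "the bag of words model ignores word order",
    "the model counts words",
]
most_freq = ["the", "model", "words", "order"]

sentence_vectors = []
for sentence in corpus:
    tokens = sentence.split()
    # 1 if the vocabulary word occurs in the sentence, otherwise 0.
    sentence_vectors.append([1 if word in tokens else 0 for word in most_freq])

# Convert the list of lists into a (num_sentences, vocabulary_size) matrix.
matrix = np.asarray(sentence_vectors)
print(matrix.shape)
print(matrix)
```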

Multi-head attention mechanism: “queries”, “keys”, and “values,” over and over again

This is the third article of my article series named “Instructions on Transformer for people outside NLP field, but with examples of NLP.”

In the last article, I explained how the attention mechanism works in simple seq2seq models with RNNs: it basically calculates the correspondences between the decoder’s hidden state at every time step and all the outputs of the encoder. However, I would say the attention mechanisms of RNN seq2seq models use only one standard for comparing them. Using only one standard is not enough for understanding languages, as you notice especially when you learn a foreign language. You sometimes find it difficult to explain how to translate a word in your language into another language. Even if a pair of languages are very similar to each other, translation cannot be a simple switching of vocabulary. Usually a single token in one language is related to several tokens in the other language, and vice versa. How they correspond to each other depends on several criteria, for example “what”, “who”, “when”, “where”, “why”, and “how”. It is easy to imagine that you should compare tokens on several criteria.

The Transformer model was first introduced in the paper named “Attention Is All You Need,” and from the title you can easily see that the attention mechanism plays an important role in this model. When you learn about the Transformer model, you will see the figure below, which is used in the original paper on Transformer. This is the simplified overall structure of one layer of the Transformer model, and you stack this layer N times. In one layer of Transformer, there are three multi-head attention blocks, which are displayed as boxes in orange. These are the very parts that compare the tokens on several standards. I made the head image of this article series inspired by this multi-head attention mechanism.

The figure below is also from the original paper on Transformer. If you can understand how the multi-head attention mechanism works from the explanations in the paper, and if you have no trouble understanding the code in the official Tensorflow tutorial, I have to say this article is not for you. However, I bet that is not true of the majority of people, and at least I needed one article to clearly explain how multi-head attention works. Please keep in mind that this article covers only the architectures in the two figures below. Still, multi-head attention mechanisms are crucial components of the Transformer model, and throughout this article you will not only see how they work but also get a little control over them at an implementation level.

1 Multi-head attention mechanism

When you learn the Transformer model, I recommend that you first pay attention to multi-head attention. And when you learn multi-head attention, before seeing what scaled dot-product attention is, you should understand the whole structure of multi-head attention, which is at the right side of the figure above. In order to calculate attentions with a “query”, as I said in the last article, “you compare the ‘query’ with the ‘keys’ and get scores/weights for the ‘values.’ Each score/weight is in short the relevance between the ‘query’ and each ‘key’. And you reweight the ‘values’ with the scores/weights, and take the summation of the reweighted ‘values’.” Sooner or later, you will notice that I am just repeating these phrases over and over again throughout this article, in several ways.

*Even if you are not sure what “reweighting” means in this context, please keep reading. I think you will see what it means little by little, especially in the next section.

The overall process of calculating multi-head attention, displayed in the figure above, is as follows (Please just keep reading. Please do not think too much.): first you split the V: “values”, K: “keys”, and Q: “queries”, and second you transform those divided “values”, “keys”, and “queries” with densely connected layers (“Linear” in the figure). Next you calculate attention weights, reweight the “values”, take the summation of the reweighted “values”, and concatenate the resulting summations. At the end you pass the concatenated “values” through another densely connected layer. The mechanism of scaled dot-product attention is just a matter of how to concretely calculate those attentions and reweight the “values”.

*In the last article I briefly mentioned that “keys” and “queries” can be in the same language. They can even be the same sentence in the same language, and in this case the resulting attentions are called self-attentions, which we are mainly going to see. I think most people calculate “self-attentions” unconsciously when they speak. You constantly care about what “she”, “it”, “the”, or “that” refers to in your own sentence, and we can say self-attention is how these everyday processes are implemented.

Let’s see the whole process of calculating multi-head attention at a slightly abstract level. From now on, we consider an example of calculating multi-head self-attentions, where the input is the sentence “Anthony Hopkins admired Michael Bay as a great director.” In this example, the number of tokens is 9, each token is encoded as a 512-dimensional embedding vector, and the number of heads is 8. In this case, as you can see in the figure below, the input sentence is expressed as a $9\times 512$ matrix. You first split each token into 8 vectors of $512/8=64$ dimensions each, as I colored in the figure below. In other words, the input matrix is divided into 8 colored chunks, which are all $9\times 64$ matrices, and each colored matrix expresses the same sentence. You calculate self-attentions of the input sentence independently in the 8 heads, and you reweight the “values” according to the attentions/weights. After this, you stack the sums of the reweighted “values” in each colored head, and you concatenate the stacked tokens of all the colored heads. The size of each colored chunk does not change even after reweighting the tokens. According to Ashish Vaswani, one of the inventors of the Transformer model, each head compares “queries” and “keys” on its own standard. If a Transformer model has 4 layers with 8-head multi-head attention, its encoder alone has $4\times 8 = 32$ heads, so the encoder learns the relations of tokens of the input on 32 different standards.
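The splitting step can be checked with a few lines of NumPy. This is only a sketch: the random matrix x is a stand-in for the embedded sentence, and the variable names are chosen for this illustration.

```python
import numpy as np

num_tokens, d_model, num_heads = 9, 512, 8
depth = d_model // num_heads                 # 64 dimensions per head

rng = np.random.default_rng(0)
x = rng.normal(size=(num_tokens, d_model))   # stand-in for the embedded sentence

# Cut the 512 columns into 8 consecutive chunks of 64 columns each.
heads = np.split(x, num_heads, axis=1)       # list of 8 arrays, each (9, 64)

# Concatenating the chunks side by side gives back the original matrix.
restored = np.concatenate(heads, axis=1)

print(len(heads), heads[0].shape, restored.shape)
```

Every chunk still has 9 rows, one per token, which is why each colored matrix expresses the whole sentence.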

I think you now have rough insight into how you calculate multi-head attentions. In the next section I am going to explain the process of reweighting the tokens, that is, I am finally going to explain what those colorful lines in the head image of this article series are.

*Each head is randomly initialized, so they learn to compare tokens with different criteria. The standards might be straightforward like “what” or “who”, or maybe much more complicated. In attention mechanisms in deep learning, you do not need feature engineering for setting such standards.

2 Calculating attentions and reweighting “values”

If you have read the last article or if you understand the attention mechanism to some extent, you should already know that the attention mechanism calculates attentions, or relevance between “queries” and “keys.” In the last article, I showed the idea of weights as a histogram, and in that case the “query” was the hidden state of the decoder at every time step, whereas the “keys” were the outputs of the encoder. In this section, I am going to explain the attention mechanism in a more abstract way, and we consider comparing more general “tokens”, rather than concrete outputs of certain networks. In this section each $[\cdots]$ denotes a token, which is usually an embedding vector in practice.

Please remember this mantra of the attention mechanism: “you compare the ‘query’ with the ‘keys’ and get scores/weights for the ‘values.’ Each score/weight is in short the relevance between the ‘query’ and each ‘key’. And you reweight the ‘values’ with the scores/weights, and take the summation of the reweighted ‘values’.” The figure below shows an overview of a case where “Michael” is a query. In this case you compare the query with the “keys”, that is, the input sentence “Anthony Hopkins admired Michael Bay as a great director.” and you get the histogram of attentions/weights. Importantly, the sum of the weights is 1. With the attentions you have just calculated, you can reweight the “values,” which also denote the same input sentence. After that you can finally take a summation of the reweighted values, and this summation is what you use.

*I have been repeating the phrase “reweighting ‘values’ with attentions,” but in practice you calculate the sum of those reweighted “values.”

Assume that compared to the “query” token “Michael”, the weights of the “key” tokens “Anthony”, “Hopkins”, “admired”, “Michael”, “Bay”, “as”, “a”, “great”, and “director.” are respectively 0.06, 0.09, 0.05, 0.25, 0.18, 0.06, 0.09, 0.06, 0.15. In this case the sum of the reweighted tokens is 0.06 × “Anthony” + 0.09 × “Hopkins” + 0.05 × “admired” + 0.25 × “Michael” + 0.18 × “Bay” + 0.06 × “as” + 0.09 × “a” + 0.06 × “great” + 0.15 × “director.”, and this sum is what we actually use.

*Of course the tokens are embedding vectors in practice. You calculate the reweighted vector in actual implementation.
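As a small sketch of this weighted sum with actual vectors, the code below uses the weights from the example and random 4-dimensional vectors as hypothetical stand-ins for the embeddings. Note that the listed weights add up to 0.99 rather than exactly 1, presumably because they are rounded.

```python
import numpy as np

tokens = ["Anthony", "Hopkins", "admired", "Michael", "Bay",
          "as", "a", "great", "director."]
weights = np.array([0.06, 0.09, 0.05, 0.25, 0.18, 0.06, 0.09, 0.06, 0.15])

rng = np.random.default_rng(0)
values = rng.normal(size=(len(tokens), 4))   # hypothetical 4-dimensional embeddings

# Reweight each "value" vector by its attention weight, then add them up.
reweighted = weights[:, None] * values       # shape (9, 4)
summed = reweighted.sum(axis=0)              # shape (4,)

# The same result written as a vector-matrix product.
assert np.allclose(summed, weights @ values)
print(summed.shape)
```

The single product weights @ values is the form that reappears in the matrix formulas of the next section.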

You repeat this process for all the “queries.” As you can see in the figure below, you get 9 summations of reweighted “values” because you use every token of the input sentence “Anthony Hopkins admired Michael Bay as a great director.” as a “query.” You stack the sums of reweighted “values” like the matrix in purple in the figure below, and this is the output of one head of multi-head attention.

3 Scaled-dot product

This section is only a matter of linear algebra. Maybe it is not even as sophisticated as linear algebra: you just have to do lots of Excel-like operations. A tutorial on Transformer by Jay Alammar is also very nice study material for understanding this topic with simpler examples. I tried my best so that you can clearly understand multi-head attention at a more mathematical level, and all you need to know in order to read this section is how to calculate products of matrices and vectors, which you would see in the first few pages of textbooks on linear algebra.

We have seen that in order to calculate multi-head attentions, we prepare 8 sets of “queries”, “keys”, and “values”, which I showed in 8 different colors in the figure in the first section. We calculate attentions and reweight “values” independently in 8 different heads, and in each head the reweighted “values” are calculated with this very simple formula of scaled dot-product: $\mathrm{Attention}(\boldsymbol{Q}, \boldsymbol{K}, \boldsymbol{V}) = \mathrm{softmax}\left(\frac{\boldsymbol{Q} \boldsymbol{K}^T}{\sqrt{d_k}}\right)\boldsymbol{V}$. Let’s take an example of calculating a scaled dot-product in the blue head.

At the left side of the figure below is a figure from the original paper on Transformer, which explains one head of multi-head attention. If you have read through this article so far, the figure at the right side should be more straightforward to understand. You divide the input sentence into 8 chunks of matrices, and you independently put those chunks into eight heads. In one head, you convert the input matrix with three different fully connected layers, which are “Linear” in the figure below, and prepare three matrices $\boldsymbol{Q}$, $\boldsymbol{K}$, $\boldsymbol{V}$, which are “queries”, “keys”, and “values” respectively.

*Whichever color the attention heads are in, the processes are all the same.

*You divide $\boldsymbol{Q} \boldsymbol{K}^T$ by $\sqrt{d_k}$ in the formula. According to the original paper, re-scaling $\boldsymbol{Q} \boldsymbol{K}^T$ by $\sqrt{d_k}$ is found to be effective. I am not going to discuss why in this article.

As you can see in the figure below, calculating $\mathrm{Attention}(\boldsymbol{Q}, \boldsymbol{K}, \boldsymbol{V})$ is virtually just multiplying three matrices of the same size (only $\boldsymbol{K}$ is transposed, though). The resulting $9\times 64$ matrix is the output of the head.
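The formula can be written in a few lines of NumPy. This is a minimal sketch for one head, with random matrices standing in for the learned “queries”, “keys”, and “values”. The function names are chosen for this illustration and are not taken from any tutorial.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (num_queries, num_keys)
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(9, 64))
K = rng.normal(size=(9, 64))
V = rng.normal(size=(9, 64))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape, weights.shape)
```

The weights matrix has one row per “query” and one column per “key”, and the output has the same shape as the “values”.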

$\mathrm{softmax}\left(\frac{\boldsymbol{Q} \boldsymbol{K}^T}{\sqrt{d_k}}\right)$ is calculated as in the figure below. The softmax function normalizes each row of the re-scaled product $\frac{\boldsymbol{Q} \boldsymbol{K}^T}{\sqrt{d_k}}$ so that it sums to 1, and the resulting $9\times 9$ matrix is a kind of heat map of self-attentions.

The process of comparing one “query” with the “keys” is done with a simple multiplication of a vector and a matrix, as you can see in the figure below. You get a histogram of attentions for each query, and the resulting 9-dimensional vector is a list of attentions/weights, which is the list of blue circles in the figure below. That means, in the Transformer model, you can compare a “query” and a “key” just by calculating an inner product. After re-scaling the vectors by dividing them by $\sqrt{d_k}$ and normalizing them with a softmax function, you stack those vectors, and the stacked vectors are the heat map of attentions.

You can reweight the “values” with the heat map of self-attentions by simple multiplication. It would be more straightforward if you consider the transposed scaled dot-product $\boldsymbol{V}^T \cdot \mathrm{softmax}\left(\frac{\boldsymbol{Q} \boldsymbol{K}^T}{\sqrt{d_k}}\right)^T$. This should also be easy to understand if you know the basics of linear algebra.

One column of the resulting matrix $\boldsymbol{V}^T \cdot \mathrm{softmax}\left(\frac{\boldsymbol{Q} \boldsymbol{K}^T}{\sqrt{d_k}}\right)^T$ can be calculated with a simple multiplication of a matrix and a vector, as you can see in the figure below. This corresponds to the process of “taking a summation of reweighted ‘values’,” which I have been repeating. And I would like you to remember that you got those weights (blue circles) by comparing a “query” with the “keys.”
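Written out, the relation between the transposed product and the weighted sum looks as follows. The symbols $\boldsymbol{A}$, $a_{ij}$ and $\boldsymbol{v}_j$ are notation introduced here for convenience: $\boldsymbol{A}$ is the $9\times 9$ heat map, $a_{ij}$ is the weight of the $j$-th “key” for the $i$-th “query”, and $\boldsymbol{v}_j$ is the $j$-th “value” (the $j$-th row of $\boldsymbol{V}$ written as a column vector).

```latex
\begin{align}
\boldsymbol{A} &= \mathrm{softmax}\left(\frac{\boldsymbol{Q}\boldsymbol{K}^T}{\sqrt{d_k}}\right),
\qquad \sum_{j=1}^{9} a_{ij} = 1 \\
\left(\boldsymbol{A}\boldsymbol{V}\right)^T &= \boldsymbol{V}^T \boldsymbol{A}^T \\
\left(\boldsymbol{V}^T \boldsymbol{A}^T\right)_{:,\,i} &= \sum_{j=1}^{9} a_{ij}\,\boldsymbol{v}_j
\end{align}
```

The last line says that the $i$-th column of the transposed output is exactly the sum of the “values” reweighted with the attentions of the $i$-th “query”.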

Again and again, let’s repeat the mantra of attention mechanism together: “you compare the ‘query’ with the ‘keys’ and get scores/weights for the ‘values.’ Each score/weight is in short the relevance between the ‘query’ and each ‘key’. And you reweight the ‘values’ with the scores/weights, and take the summation of the reweighted ‘values’.” If you have been patient enough to follow my explanations, I bet you have got a clear view on how multi-head attention mechanism works.

We have been looking at the case of the blue head, but you can do exactly the same procedure in every head at the same time, and this is what enables parallelization of the multi-head attention mechanism. You concatenate the outputs of all the heads, and you pass the concatenated matrix through a fully connected layer.
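Putting the pieces together, the whole procedure can be sketched in NumPy as below. The sketch follows the order described in this article (split the input into chunks, then apply per-head linear layers). Library implementations may instead apply one full-width linear layer and split afterwards. The weight matrices are random stand-ins for trained parameters, biases are omitted, and all names are chosen for this illustration.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def multi_head_self_attention(x, params, num_heads):
    chunks = np.split(x, num_heads, axis=1)            # 8 chunks of shape (9, 64)
    outputs, heat_maps = [], []
    for chunk, (Wq, Wk, Wv) in zip(chunks, params["heads"]):
        Q, K, V = chunk @ Wq, chunk @ Wk, chunk @ Wv   # "Linear" layers, biases omitted
        weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))
        outputs.append(weights @ V)                    # sums of reweighted "values"
        heat_maps.append(weights)
    concat = np.concatenate(outputs, axis=1)           # (9, 512)
    return concat @ params["Wo"], np.stack(heat_maps)  # final fully connected layer

rng = np.random.default_rng(0)
num_tokens, d_model, num_heads = 9, 512, 8
depth = d_model // num_heads

# Random stand-ins for trained weights.
params = {
    "heads": [
        tuple(rng.normal(scale=depth ** -0.5, size=(depth, depth)) for _ in range(3))
        for _ in range(num_heads)
    ],
    "Wo": rng.normal(scale=d_model ** -0.5, size=(d_model, d_model)),
}

x = rng.normal(size=(num_tokens, d_model))
output, heat_maps = multi_head_self_attention(x, params, num_heads)
print(output.shape, heat_maps.shape)
```

The loop over heads is written out for readability. In practice the heads are usually computed at once with batched tensor operations.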

If you have been reading this article from the beginning, I think this section has also shown the same idea that I have repeated, and I bet you now have a clearer view of how the multi-head attention mechanism works. In the next section we are going to see how this is implemented.

4 Tensorflow implementation of multi-head attention

Let’s see how multi-head attention is implemented in the official Tensorflow tutorial. If you have read through this article so far, this should not be so difficult. I also added code for displaying heat maps of self-attentions. With the code in this Github page, you can display self-attention heat maps for any input sentence in English.

The multi-head attention mechanism is implemented as below. If you understand Python code and Tensorflow to some extent, I think this part is relatively easy. The multi-head attention part is implemented as a class because you need to train the weights of some fully connected layers, whereas scaled dot-product attention is just a function.

*I am going to explain the create_padding_mask() and create_look_ahead_mask() functions in upcoming articles. You do not need them this time.

Let’s see a case of using the multi-head attention mechanism on a (1, 9, 512) sized input tensor, just as we have been considering throughout this article. The first axis of (1, 9, 512) corresponds to the batch size, so this tensor is virtually a (9, 512) sized tensor, which means the input is composed of 9 512-dimensional vectors. In the results below, you can see how the shape of the input tensor changes after each procedure of calculating multi-head attention. You can also see that the output of the multi-head attention has the same shape as the input, and that you get a $9\times 9$ attention heat map for each attention head.

I guess the most complicated part of the implementation above is the split_head() function, especially if you are not used to tensor arithmetic. This part corresponds to splitting the input tensor into 8 different colored matrices, as in one of the figures above. If you cannot understand what is going on in the function, I recommend that you prepare a sample tensor as below.

This is just a simple (1, 9, 512) sized tensor with sequential integer elements. The first row (1, 2, …, 512) corresponds to the first input token, and (4097, 4098, …, 4608) to the last one. You should try converting this sample tensor to see how multi-head attention is implemented. For example you can try the operations below.
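A sketch of such an experiment is shown here in NumPy, used as a stand-in for the corresponding Tensorflow reshape and transpose operations. The variable names are chosen for this illustration and are not taken from the tutorial.

```python
import numpy as np

num_tokens, d_model, num_heads = 9, 512, 8
depth = d_model // num_heads

# (1, 9, 512) tensor with sequential integers 1, 2, ..., 4608.
sample = np.arange(1, num_tokens * d_model + 1).reshape(1, num_tokens, d_model)

# Split the last axis into (heads, depth), then move the head axis forward.
split = sample.reshape(1, num_tokens, num_heads, depth)   # (1, 9, 8, 64)
split = split.transpose(0, 2, 1, 3)                       # (1, 8, 9, 64)

first_head = split[0][0]                                  # (9, 64)
print(split.shape)
print(first_head[0, :3], first_head[1, :3])   # start of token 1 and token 2 in head 1
print(split[0][1][0, :3])                     # start of token 1 in head 2
```

Because the elements are sequential, you can read off directly which slice of each token ends up in which head: the first head holds elements 1 to 64 of the first token, 513 to 576 of the second token, and so on.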

These operations correspond to splitting the input into 8 heads, whose sizes are all (9, 64). The second axis of the resulting (1, 8, 9, 64) tensor corresponds to the index of the heads. Thus sample_sentence[0][0] corresponds to the first head, the blue $9\times 64$ matrix. Some Tensorflow functions enable linear calculations in each attention head independently, as in the code below.

Very importantly, we have only been considering the case of calculating self-attentions, where all “queries”, “keys”, and “values” come from the same sentence in the same language. However, as I showed in the last article, in translation tasks the “queries” are usually in a different language from the “keys” and “values”, while the “keys” and “values” are in the same language. And as you can imagine, the “queries” usually have a different number of tokens from the “keys” and “values.” You also need to understand this case, which is not calculating self-attentions. If you have followed this article so far, this case is not that hard for you. Let’s briefly see an example where the input sentence in the source language is composed of 9 tokens, while the output is composed of 12 tokens.
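The shapes in this non-self-attention case can be checked with a short NumPy sketch for one head. The random matrices are stand-ins for the already transformed tokens of the two sentences, and the names source and target are chosen for this illustration.

```python
import numpy as np

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
source = rng.normal(size=(9, 64))    # "keys" and "values": 9 source-language tokens
target = rng.normal(size=(12, 64))   # "queries": 12 target-language tokens

Q, K, V = target, source, source
weights = softmax(Q @ K.T / np.sqrt(K.shape[-1]))   # (12, 9): one row per "query"
output = weights @ V                                # (12, 64): one row per "query"

print(weights.shape, output.shape)
```

The heat map is no longer square: it has one row per “query” token and one column per “key” token, while the output has as many rows as there are “queries”.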

As I mentioned, one of the outputs of each multi-head attention class is a $9\times 9$ matrix of attention heat maps, which I displayed as a matrix composed of blue circles in the last section. To the implementation in the official Tensorflow tutorial, I have added code to display actual heat maps for any input sentence in English.

*If you want to try displaying them by yourself, download or just copy and paste the code in this Github page. Please make a “datasets” directory in the same directory as the code. Please download “spa-eng.zip” from this page, and unzip it. After that please put “spa.txt” in the “datasets” directory. Also, please download the “checkpoints_en_es” folder from this link, and place the folder in the same directory as the file in the Github page. In the upcoming articles, you will need similar steps to run my code.

After running the code in the Github page, you can display heat maps of self-attentions. Let’s input the sentence “Anthony Hopkins admired Michael Bay as a great director.” You would get heat maps like this.

In fact, my toy implementation cannot handle proper nouns such as “Anthony” or “Michael.” So let’s consider a simpler input sentence, “He admired her as a great director.” In each layer, you get 8 self-attention heat maps.

I think we can see some tendencies in those heat maps. The heat maps in the early layers, which are close to the input, are blurry. And the distributions of the heat maps come to concentrate more or less diagonally. At the end, presumably they learn to pay attention to the start and the end of sentences.

You have finally finished reading this article. Congratulations.

You should be proud of having been patient, and you have passed the most tiresome part of learning the Transformer model. You must be ready for making a toy English-German translator in the upcoming articles. Also I am sure you have understood that Michael Bay is a great director, no matter what people say.

*Hannibal Lecter, I mean Anthony Hopkins, also wrote a letter to the staff of “Breaking Bad,” and he told them the TV show let him regain his passion. He seems to admire people left and right, and I am a little worried that he might be getting senile. He played the role of a father forgetting his daughter in his new film “The Father.” I must see it to check whether that is really acting or not.

[References]

[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, “Attention Is All You Need” (2017)

[2] “Transformer model for language understanding,” Tensorflow Core
https://www.tensorflow.org/overview

[3] “Neural machine translation with attention,” Tensorflow Core
https://www.tensorflow.org/tutorials/text/nmt_with_attention

[4] Jay Alammar, “The Illustrated Transformer,”
http://jalammar.github.io/illustrated-transformer/

[5] “Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 14 – Transformers and Self-Attention,” stanfordonline, (2019)
https://www.youtube.com/watch?v=5vcj8kSwBCY

[6] Tsuboi Yuuta, Unno Yuuya, Suzuki Jun, “Machine Learning Professional Series: Natural Language Processing with Deep Learning,” (2017), pp. 91-94
坪井祐太、海野裕也、鈴木潤 著, 「機械学習プロフェッショナルシリーズ 深層学習による自然言語処理」, (2017), pp. 191-193

[7] “Stanford CS224N: NLP with Deep Learning | Winter 2019 | Lecture 8 – Translation, Seq2Seq, Attention,” stanfordonline, (2019)
https://www.youtube.com/watch?v=XXtpJxZBa2c

[8] Rosemary Rossi, “Anthony Hopkins Compares ‘Genius’ Michael Bay to Spielberg, Scorsese,” yahoo! entertainment, (2017)
https://www.yahoo.com/entertainment/anthony-hopkins-transformers-director-michael-bay-guy-genius-010058439.html

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, including grammatical errors, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

7 Ways To Advance Your Data Science Knowledge and Expertise

As a data scientist, your knowledge and expertise power industries. Businesses in all sectors of the economy now rely on data to inform their business processes. As many as 53% of companies have already adopted big data analytics, highlighting the upward trend in data science within the private sector.

Businesses rely on data scientists to stay competitive in this market. But how can you advance your data science knowledge and expertise to bring the most value to your work?

These seven strategies will help you build your resources and improve your opportunities to grow.

1. Recognize the Need for Growth

It may seem disheartening at first to realize that there is always more to learn in data science. There is simply too much to master in just a few years. However, what this really means is that there is no end to the progress and advancement you can make as a data scientist.

Consider the breadth of what there is to know. Skills to master include probability, new programming languages, data visualization, data intuition, and so much more. Recognize the scope of your field to open the door to learning opportunities in data science.

2. Brush Up on the Latest Trends

Your opportunities as a data scientist are largely dependent on how well you can utilize new software and data analytics trends. Modern data analytics relies on artificial intelligence and machine learning processes to drive insights with unprecedented detail. Meanwhile, data communication and storage platforms like blockchain are emerging to supplement data management infrastructures.

An awareness of these modern developments paired with basic general knowledge and qualifications will be key to getting hired as a data scientist in 2021 and beyond. As companies across industries look to pivot to new tech and competitive data strategies, it is more important than ever to keep abreast of the latest data science trends.

3. Enroll in Data Science Bootcamps

Data science is a constantly changing field, driven by technological innovation. At the same time, the breadth of opportunities that exist in a tech field invites career flexibility. Data scientists can make the most of these advancement and flexibility opportunities by enrolling in boot camps and training courses designed to fill in skills gaps.

These programs cover a range of topics within the field of data science. No matter your level of expertise and education, engaging in supplemental training can help you advance your expertise and bring value-building benefits to your role as a data scientist.

4. Look for Guidance Online

Because of the increasingly virtual nature of all kinds of work and education, opportunities for data science growth may be better sought out online. There are many ways you can go about increasing your data science expertise on a virtual platform. From finding a mentor through social media like LinkedIn to participating in training courses crafted by other data science professionals, you can expand your knowledge base.

First, however, ensure that you have a productive workspace at home that will allow you to learn and grow while staying motivated. This means setting up a home office to accommodate the virtual shift, complete with a comfortable chair and desk set up to avoid neck strain and health problems.

With virtual guidance in a productive environment, you can advance your expertise to secure the value of your position.

5. Expand Your Horizons

Data science is a multifaceted arena. The role of a data scientist typically consists of harnessing and categorizing raw data to draw out useful and predictive insights. Meanwhile, other positions in analytics and IT contribute to more powerful data results.

Customer analytics, for example, is another subset of data science that involves harnessing information to describe and predict customer journeys. This entails focusing on customer demographics and behaviors to assemble more carefully targeted buyer personas, which can then be used to increase customer engagement and conversion rates.

Through broadening your data skills to account for areas like customer analytics, you can advance your professional opportunities.

6. Let Your Passions Inspire You

Every data scientist has a reason they got into their field. Your passions and inspirations can inform new avenues of exploration into the many designations surrounding data science. For example, big data analysts, machine learning specialists, and data visualization experts all play vital roles in modern business.

Finding your niche and specialization can come down to what drove you into data science in the first place. Perhaps you have a talent for creating clear visuals that convey exactly the point you want your audience to take away. Alternatively, diving deep into the ins and outs of algorithmic functions may be what inspires you most.

Explore your passions and commit to a lifetime of learning and growing.

7. Never Stop Improving

With rapid technological change, data scientists must maintain their awareness of new systems and processes at all times. Innovations in AI, for example, have created a skills gap in the market. Eighty percent of business leaders say that lack of talent is the biggest obstacle in AI implementation.

For data scientists, closing this skills gap can be a simple matter of improving your technological training over time. Learning how machine learning works, for example, can help you apply it and increase the value you add to your business.

Never stop improving through new courses and credentials that explore changing technology and how these changes affect the world of data science. With a commitment to lifelong learning, your skills as a data scientist will never go out of vogue.

These seven strategies can help you formulate a plan to expand your expertise into new territory, leading to new opportunities and a lucrative financial future.

Five Ways Data Science Is Used in Fintech

Data science experts process and act upon data that digital resources produce. In the fintech world, data comes from mobile apps, transactions, conversations and financial standings. With this data for fintech, experts can improve the experience and success of businesses and customers alike.

Apps like PayPal, Venmo and Cash App have led the way for other fintech organizations, big and small, to grow. In fact, roughly 65% of Americans are already using digital banking in some capacity, whether it’s an app or online service. This growth, in turn, brings benefits. From personalization to integrating robotic advisors, here are five ways data scientists help fintech brands.

1. Personalization

Finance is one of the most personal industries out there as it deals with your private accounts and data. To match this uniqueness, fintechs can use data science for personalization. That way, customer service caters to individual needs.

As the fintech company gathers data from individual transactions, communications, behavior and interests, data scientists can then use said data to curate a better experience for the customer. They can advertise products and services that the customer may need to help with savings, for instance.

Contis is one example of a fintech that has integrated personalization into its services. Customers receive specific recommendations to create an efficient experience.

2. Fundraising

Fundraising had an interesting year in 2020. Amid racial justice protests and movements, crowdfunding took off on fintechs like GoFundMe and Kickstarter. These platforms helped provide funding for those who needed it. From here, data scientists can support fundraising in several ways.

They can help raise money by targeting people who have donated in the past, or who are likely to donate based on spending habits. This data makes for a more well-rounded fundraising campaign.

Then, once they do have donors, they can again use data to segment contributors by interest, demographic or engagement history. This segmentation helps advertise in a more personal, interest-specific way.
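The segmentation described above can be sketched in a few lines of Python. This is only a minimal example under stated assumptions: the donor records, the interest labels, the tier names and the 180-day and 365-day cut-offs are all hypothetical, and the reference date is fixed so the result is reproducible.

```python
from collections import defaultdict
from datetime import date

# Hypothetical donor records; names, interests and dates are invented.
donors = [
    {"name": "donor_a", "interest": "education", "last_gift": date(2020, 11, 2), "gifts": 5},
    {"name": "donor_b", "interest": "health", "last_gift": date(2020, 3, 15), "gifts": 1},
    {"name": "donor_c", "interest": "education", "last_gift": date(2019, 6, 1), "gifts": 2},
    {"name": "donor_d", "interest": "education", "last_gift": date(2020, 12, 20), "gifts": 4},
]


def engagement_tier(donor, today):
    """Classify engagement history from gift count and recency (cut-offs are assumptions)."""
    days_since_gift = (today - donor["last_gift"]).days
    if donor["gifts"] >= 3 and days_since_gift <= 180:
        return "active"
    if days_since_gift <= 365:
        return "occasional"
    return "lapsed"


def segment_donors(records, today):
    """Map each (interest, engagement tier) pair to the donors that fall into it."""
    segments = defaultdict(list)
    for donor in records:
        key = (donor["interest"], engagement_tier(donor, today))
        segments[key].append(donor["name"])
    return dict(segments)


# Fixed reference date so the example gives the same result whenever it runs.
segments = segment_donors(donors, today=date(2021, 1, 1))
for key, names in sorted(segments.items()):
    print(key, names)
```

Each resulting segment could then receive its own message, such as an update on education projects for active education donors or a re-engagement note for lapsed ones.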

3. Fraud Detection

Cybercriminals thrive on an abundance of digital interactions. With the rise in digital banking — and the pandemic-driven shift to technology — fintechs could potentially see high rates of fraud. In fact, by the end of 2020, the United States saw about $11 billion in lost funds from credit card fraud alone.

Data for fintech brands will help address and prevent fraud like this in the future. As customers produce data from their transactions and interactions, that data builds a clearer picture of their normal behavior. If an activity deviates from that pattern, it signals that fraud may be occurring.

If fraud does occur, data scientists can then use that instance to learn and properly recognize how data behaves during cybercriminal activity.
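One simple way to picture the deviation check described above is a z-score test: compare a new transaction with the customer's own history and flag it when it sits far from the usual range. The sketch below is an assumption-laden illustration, not how any particular fintech detects fraud. The function name, the sample amounts and the threshold of three standard deviations are all hypothetical, and real systems weigh many more signals than the amount alone.

```python
from statistics import mean, stdev


def flag_unusual(history, new_amount, threshold=3.0):
    """Return True when new_amount lies more than `threshold` sample
    standard deviations from the mean of the customer's past amounts."""
    if len(history) < 2:
        return False  # too little history to estimate normal behavior
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Every past amount was identical, so any different amount is a deviation.
        return new_amount != mu
    return abs(new_amount - mu) / sigma > threshold


# Hypothetical past transaction amounts for one customer.
history = [42.0, 38.5, 51.0, 47.25, 40.0, 44.75]

typical_flag = flag_unusual(history, 45.0)   # close to the usual range
outlier_flag = flag_unusual(history, 900.0)  # far outside the usual range
print(typical_flag, outlier_flag)
```

Confirmed fraud cases can then be fed back in as labeled examples, which is where the learning step mentioned above comes in.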

4. Robo-Advisors

With more people using fintech services, employees have a lot on their hands. They must properly address the customers’ needs and provide solutions. However, in the online world, employees are now getting some robotic assistance.

Robo-advisors use machine learning algorithms to interact with customers online or on mobile apps. They ask questions, understand the problems and provide solutions. They also collect data like customer goals and financial plans, which they can report back to data scientists for analysis.

Overall, roughly 75% of large banks and 46% of small banks are implementing artificial intelligence to some degree. This data-driven revolution is one to keep your eye on.

5. Blockchain Governance

Blockchain governance is a somewhat newer way that experts can use data for fintech services. The blockchain is commonly known for its support of cryptocurrency services. Though crypto assets like Bitcoin and Ethereum are on the rise, the blockchain itself is still getting its footing.

Now, fintechs like PayPal are offering crypto services, which means data scientists will be able to expand what’s possible for digital banking. As customers transfer crypto funds, data scientists can monitor their activity and get a better handle on the data that exists on the blockchain. From there, they can provide personalization and prevent fraud in the same ways as they would with standard digital banking.

A Changing Landscape

As data scientists continue to help fintech services grow, you’ll notice each of these five areas becoming more common. Some, like personalization and fraud detection, are already key focuses for fintech companies. However, alongside robo-advisors, fundraising and blockchain governance, they all have room to grow through the use of data science.