Main Category Archives

Articles written about Big Data Analytics

5 Applications for Location-Based Data in 2020

March 12, 2020/in Data Science, Uncategorized/by Kaylae Matthews

Location-based data makes it possible to give people relevant information based on where they are at any given moment. Here are five location data applications to look for in 2020 and beyond.

1. Increasing Sales and Reducing Frustration

One 2019 report indicated that 89% of the marketers who used geo data saw increased sales within their customer bases. Sometimes, the ideal way to boost sales is to convert what would be a frustration into something positive.

A French campaign associated with the Actimel yogurt brand achieved this by sending targeted, encouraging messages to drivers who used the Waze navigation app and appeared to have made a wrong turn or got caught in traffic.

For example, a driver might get a message that said, “Instead of getting mad and honking your horn, pump up the jams! #StayStrong.” The three-month campaign saw a 140% increase in ad recall.

More recently, home furnishing brand IKEA launched a campaign in Dubai where people can get free stuff for making a long trip to a store. The freebies get more valuable as a person’s commute time increases. The catch is that participants have to activate location settings on their phones and enable Google Maps. Driving five minutes to a store got a person a free veggie hot dog, and they’d get a complimentary table for traveling 49 minutes.

2. Offering Tailored Ad Targeting in Medical Offices

Pharmaceutical companies are starting to rely on companies that send targeted ads to patients connected to the Wi-Fi in doctors’ offices. One such provider is Semcasting. A recent effort involved sending ads to cardiology offices for a type of drug that lowers cholesterol levels in the blood.

The company has taken a similar approach for an over-the-counter pediatric drug and a medication to relieve migraine headaches, among others. Such initiatives cause a 10% boost in the halo effect, plus a 1.5% uptick in sales. The first perk relates to the favoritism that people feel towards other products a company makes once they like one of them.

However, location data applications related to health care arguably require special attention regarding privacy. Patients may feel uneasy if they believe that companies are watching them and know they need a particular kind of medical treatment.

3. Facilitating the Deployment of the 5G Network

The 5G network is coming soon, and network operators are working hard to roll it out. Statistics indicate that the 5G infrastructure investment will total $275 billion over seven years. Geodata can help network brands decide where to deploy 5G connectivity first.

Moreover, once a company offers 5G in an area, marketing teams can use location data to determine which neighborhoods to target when contacting potential customers. Most companies that currently have 5G within their product lineups have carefully chosen which areas are at the top of the list to receive 5G, and that practice will continue throughout 2020.

It’s easy to envision a scenario whereby people can send error reports to 5G providers by using location data. For example, a company could say that having location data collection enabled on a 5G-powered smartphone allows a technician to determine if there’s a persistent problem with coverage.

Since the 5G network is still new, it’s impossible to predict all the ways that a telecommunications operator might use location data to make its installations maximally profitable. However, the potential is there for forward-thinking brands to seize.

4. Helping People Know About the Events in Their Areas

SoundHound, Inc. and Wcities recently announced a partnership that will rely on location-based data to keep people in the loop about upcoming local events. People can use a conversational intelligence platform that has information about more than 20,000 cities around the world.

Users also don’t need to mention their locations in voice queries. They could say, for example, “Which bands are playing downtown tonight?” or “Can you give me some events happening on the east side tomorrow?” They can also ask something associated with a longer timespan, such as “Are there any wine festivals happening this month?”

People can say follow-up commands, too. They might ask what the weather forecast is after hearing about an outdoor event they want to attend. The system also supports booking an Uber, letting people get to the happening without hassles.

5. Using Location-Based Data for Matchmaking

In honor of Valentine’s Day 2020, students from more than two dozen U.S. colleges signed up for a matchmaking opportunity that relies, at least in part, on their location data.

Participants answer school-specific questions, and their responses help them find a friend or something more. The platform uses algorithms to connect people with like-minded individuals.

However, the company that provides the service can also give a breakdown of which residence halls have the most people taking part, or whether people generally live off-campus. This example is not the first time a university used location data by any means, but it’s different from the usual approach.

Location Data Applications Abound

These five examples show there are no limits to how a company might use location data. However, they must do so with care, protecting user privacy while maintaining a high level of data quality.

Integrate Unstructured Data into Your Enterprise to Drive Actionable Insights

March 5, 2020/in Big Data, Data Engineering, Data Mining, Data Science, Main Category/by Tehreem Naeem

In an ideal world, all enterprise data is structured – classified neatly into columns, rows, and tables, easily integrated and shared across the organization.

The reality is far from it! Datamation estimates that unstructured data accounts for more than 80% of enterprise data, and it is growing at a rate of 55 – 65 percent annually. This includes information stored in images, emails, spreadsheets, etc., that cannot fit into databases.

Therefore, it becomes imperative for a data-driven organization to leverage its non-traditional information assets to derive business value. We have outlined a simple 3-step process that can help organizations integrate unstructured sources into their data ecosystem:

1. Determine the Challenge

The primary step is narrowing down the challenges you want to solve through the unstructured data flowing in and out of your organization. Financial organizations, for instance, use call reports, sales notes, or other text documents to get real-time insights from the data and make decisions based on the trends. Marketers make use of social media data to evaluate their customers’ needs and shape their marketing strategy.

Figuring out which process your organization is trying to optimize through unstructured data can help you reach your goal faster.

2. Map Out the Unstructured Data Sources Within the Enterprise

An actionable plan starts with identifying the range of data sources that are essential to creating a truly integrated environment. This enables organizations to align the sources with business objectives and streamline their data initiatives.

Deciding which data should be extracted, analyzed, and stored should be a primary concern in this regard. Even if you can ingest data from any source, it doesn’t mean that you should.

Collecting a large volume of unstructured data is not enough to generate insights. It needs to be properly organized and validated for quality before integration. Full, incremental, online, and offline extraction methods are generally used to mine valuable information from unstructured data sources.
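As a rough illustration of the difference between full and incremental extraction, here is a minimal Python sketch. The document list, the `modified` field, and both function names are hypothetical and only serve the example; online and offline extraction are not covered.

```python
from datetime import datetime

# Hypothetical catalog of unstructured documents with a last-modified timestamp.
DOCUMENTS = [
    {"id": "note-1", "modified": datetime(2020, 3, 1, 9, 0)},
    {"id": "note-2", "modified": datetime(2020, 3, 3, 14, 30)},
    {"id": "note-3", "modified": datetime(2020, 3, 5, 8, 15)},
]


def full_extract(documents):
    """Full extraction: return every document, regardless of when it changed."""
    return list(documents)


def incremental_extract(documents, last_run):
    """Incremental extraction: return only documents changed since the last run."""
    return [doc for doc in documents if doc["modified"] > last_run]


last_run = datetime(2020, 3, 2)
changed = incremental_extract(DOCUMENTS, last_run)
```

Incremental extraction moves less data per run, but it depends on the source exposing a reliable change marker such as a timestamp.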

3. Transform Unstructured Assets into Decision-Ready Insights

Now that you have all the puzzle pieces, the next step is to create a complete picture. This may require making changes in your organization’s infrastructure to derive meaning from your unstructured assets and get a 360-degree business view.

IDC recommends creating a company culture that promotes the collection, use, and sharing of both unstructured and structured business assets. Therefore, finding an enterprise-grade integration solution that offers enhanced connectivity to a range of data sources, ideally structured, unstructured, and semi-structured, can help organizations generate the most value out of their data assets.

Automation is another feature that can help speed up integration processes, minimize error probability, and generate time-and-cost savings. Features like job scheduling, auto-mapping, and workflow automation can optimize the process of extracting information from XML, JSON, Excel or audio files, and storing it into a relational database or generating insights.
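To make the idea of moving information from a JSON source into a relational database concrete, here is a small sketch using only the Python standard library. The sample payload, the `notes` table, and the `load_notes` function are assumptions made for illustration, not part of any particular integration product.

```python
import json
import sqlite3

# Hypothetical JSON payload, e.g. exported sales notes.
RAW = """[
  {"id": 1, "customer": "Ada", "note": "asked about pricing"},
  {"id": 2, "customer": "Lin", "note": "renewal due"}
]"""


def load_notes(raw_json, conn):
    """Parse JSON records and store them in a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS notes "
        "(id INTEGER PRIMARY KEY, customer TEXT, note TEXT)"
    )
    rows = [(r["id"], r["customer"], r["note"]) for r in json.loads(raw_json)]
    # INSERT OR REPLACE keeps the load repeatable if the job is scheduled again.
    conn.executemany("INSERT OR REPLACE INTO notes VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_notes(RAW, conn)
customers = [row[0] for row in conn.execute("SELECT customer FROM notes ORDER BY id")]
```

A scheduler could call a function like `load_notes` at fixed intervals, which is the kind of job automation described above.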

The push to become a data-forward organization has enterprises re-evaluating the way to leverage unstructured data assets for decision-making. With an actionable plan in place to integrate these sources with the rest of the data, organizations can take advantage of the opportunities offered by analytics and stand out from the competition.

Introduction to Recommendation Engines

February 29, 2020/in Big Data, Data Science, Database, Graph Database, Machine Learning, Neo4J, Python, Tutorial, Use Case/by Aakash Chugh

This is the second article in the series Getting started with the top eCommerce use cases. If you are interested in reading the first article, you can find it here.

What are Recommendation Engines?

Recommendation engines are automated systems that suggest similar items whenever a user selects something online. Netflix, Amazon, Spotify, Facebook, and YouTube all use some form of recommendation engine to improve their user experience. A recommendation engine not only helps predict whether a user will like an item, but also helps increase sales, understand customer behavior, grow the number of registered users, and save users time. For instance, Netflix suggests movies you might want to watch, and Amazon suggests other products you might want to buy. All of these platforms build on the same basic idea in the background, and this article discusses that idea.

What are the techniques?

Two fundamental techniques come into play when generating recommendations. The next sections discuss them in detail.

Content-Based Filtering

The idea behind content-based filtering is to analyze a set of features that captures the similarity between the items themselves, i.e. between two movies, two products, or two songs. Comparing these features yields a similarity score that can be used as a reference for the recommendations.

Several steps are involved in getting to this similarity score. The first is to construct a profile for each item that represents some of its important features. In other words, this step requires defining a set of characteristics that are easy to discover. For instance, suppose a user has already read an article and you know that the user likes it; you may want to recommend similar articles. With content-based filtering, you can find them. The easiest way is to define some features for the article, such as publisher, genre, and author. Based on these features, similar articles can be recommended to the user (as illustrated in Figure 1). Three main similarity measures, described below, can be used to find similar articles.

Figure 1: Content-Based Filtering

Minkowski distance

Minkowski distance between two variables can be calculated as:

$d(X,Y) = \left(\sum_{i=1}^{n}{|X_{i} - Y_{i}|^{p}}\right)^{1/p}$
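A minimal Python sketch of this formula follows. The feature vectors are made-up values; setting p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.

```python
def minkowski_distance(x, y, p=2):
    """Minkowski distance between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1 / p)


# Hypothetical numeric feature vectors for two articles.
article_a = [1.0, 0.0, 3.0]
article_b = [4.0, 4.0, 3.0]

manhattan = minkowski_distance(article_a, article_b, p=1)  # |1-4| + |0-4| + |3-3|
euclidean = minkowski_distance(article_a, article_b, p=2)  # sqrt(9 + 16 + 0)
```

Note that this is a distance, so smaller values mean more similar items.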

Cosine Similarity

Cosine similarity between two variables can be calculated as:

$\mbox{Cosine Similarity} = \frac{\sum_{i=1}^{n}{x_{i} y_{i}}} {\sqrt{\sum_{i=1}^{n}{x_{i}^{2}}} \sqrt{\sum_{i=1}^{n}{y_{i}^{2}}}}$
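The same formula can be written as a short Python function. Returning 0.0 for an all-zero vector is a convention chosen for this sketch, since the formula is undefined in that case.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # assumption: treat a zero vector as similar to nothing
    return dot / (norm_x * norm_y)


parallel = cosine_similarity([1, 2, 3], [2, 4, 6])   # same direction
orthogonal = cosine_similarity([1, 0], [0, 1])       # no overlap
```

Because only the direction of the vectors matters, an item with uniformly larger feature values still counts as similar.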

Jaccard Similarity

$J(X,Y) = \frac{|X \cap Y|}{|X \cup Y|}$
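For set-valued features such as tags, the Jaccard formula translates almost directly into Python. The tag sets below are invented for illustration, and returning 0.0 for two empty sets is a choice made for this sketch.

```python
def jaccard_similarity(x, y):
    """Size of the intersection divided by the size of the union."""
    x, y = set(x), set(y)
    union = x | y
    if not union:
        return 0.0  # assumption: two empty sets are treated as dissimilar
    return len(x & y) / len(union)


# Hypothetical tag sets for two articles: 2 shared tags out of 4 distinct tags.
tags_a = {"tech", "ai", "python"}
tags_b = {"ai", "python", "cloud"}
score = jaccard_similarity(tags_a, tags_b)
```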

These measures can be used to create a matrix that gives the similarity between each pair of articles, and a function can then be defined to return the top 10 most similar articles.
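One possible way to build such a matrix and a top-N lookup is sketched below, using Jaccard similarity over publisher, genre, and author features. The article catalog and the function names are hypothetical.

```python
from itertools import combinations

# Hypothetical catalog: each article is described by a set of features.
ARTICLES = {
    "a1": {"publisher:acme", "genre:tech", "author:lee"},
    "a2": {"publisher:acme", "genre:tech", "author:kim"},
    "a3": {"publisher:orbit", "genre:travel", "author:kim"},
    "a4": {"publisher:orbit", "genre:tech", "author:lee"},
}


def jaccard(x, y):
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def build_similarity_matrix(items):
    """Return a nested dict with the similarity of every pair of items."""
    matrix = {item_id: {} for item_id in items}
    for a, b in combinations(items, 2):
        score = jaccard(items[a], items[b])
        matrix[a][b] = score
        matrix[b][a] = score
    return matrix


def top_similar(matrix, item_id, n=10):
    """Return up to n item ids, most similar first (ties broken by id)."""
    ranked = sorted(matrix[item_id].items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:n]]


matrix = build_similarity_matrix(ARTICLES)
suggestions = top_similar(matrix, "a1", n=2)
```

For a large catalog the full pairwise matrix grows quadratically, so in practice it is often computed offline or approximated.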

Collaborative filtering

This filtering method focuses on finding how similar two users or two products are by analyzing user behavior or preferences rather than the content of the items. For instance, consider three users A, B, and C. To recommend movies to user A, a first approach is to find users similar to A and recommend the movies they liked that A has not yet watched. This approach, in which we look for similar users, is called User-User Collaborative Filtering.
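The user-user idea can be sketched in a few lines of Python. The ratings, the choice of cosine similarity between users, and the similarity-weighted scoring are all assumptions for this example; real systems usually add rating normalization and many more users.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "A": {"Alien": 5, "Brazil": 4},
    "B": {"Alien": 5, "Brazil": 5, "Casablanca": 4},
    "C": {"Alien": 1, "Dune": 5},
}


def cosine(u, v):
    """Cosine similarity between two sparse rating dicts."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[m] * v[m] for m in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_user_user(ratings, target, k=2):
    """Rank unseen movies by the similarity-weighted ratings of the k nearest users."""
    similarities = sorted(
        ((cosine(ratings[target], ratings[u]), u) for u in ratings if u != target),
        reverse=True,
    )
    seen = set(ratings[target])
    scores = {}
    for similarity, user in similarities[:k]:
        for movie, rating in ratings[user].items():
            if movie not in seen:
                scores[movie] = scores.get(movie, 0.0) + similarity * rating
    return sorted(scores, key=lambda m: (-scores[m], m))


recommendations = recommend_user_user(RATINGS, "A")
```

Here user B rates the same movies as A in a similar way, so B's other movie is ranked ahead of the one that only the dissimilar user C has seen.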

The other approach is to find similar movies based on the ratings given by other users; this is called Item-Item Collaborative Filtering. Research suggests that item-item collaborative filtering often works better than user-user collaborative filtering, because user behavior is dynamic and changes over time. Also, the number of users is large and grows every day, whereas item characteristics remain the same. To calculate the similarities, we can use cosine similarity.
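For comparison, here is a minimal item-item sketch: each movie is represented by the ratings it received, and movies are compared with cosine similarity. The ratings and function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
RATINGS = {
    "A": {"Alien": 5, "Brazil": 4},
    "B": {"Alien": 5, "Brazil": 5, "Casablanca": 4},
    "C": {"Alien": 1, "Casablanca": 5, "Dune": 5},
}


def item_vectors(ratings):
    """Invert user -> movie ratings into movie -> {user: rating}."""
    vectors = {}
    for user, user_ratings in ratings.items():
        for item, rating in user_ratings.items():
            vectors.setdefault(item, {})[user] = rating
    return vectors


def cosine(u, v):
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_items(ratings, item, n=3):
    """Return up to n movies whose rating patterns are closest to the given movie."""
    vectors = item_vectors(ratings)
    scores = {
        other: cosine(vectors[item], vec)
        for other, vec in vectors.items()
        if other != item
    }
    return sorted(scores, key=lambda o: (-scores[o], o))[:n]


neighbours = similar_items(RATINGS, "Brazil", n=2)
```

Since item-to-item similarities change slowly, they can be precomputed and reused across many users.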

Figure 2: Collaborative Filtering

Recently, some companies have started to combine content-based and collaborative filtering techniques into a hybrid recommendation engine. The results from both models are combined into one hybrid model, which can provide more accurate recommendations.

Five steps are involved in making a recommendation engine work: collecting the data, storing the data, analyzing the data, filtering the data, and providing recommendations. Collecting user data involves many attributes, including browsing history, page views, search logs, order history, and marketing channel touch points, which requires a strong data architecture. Collecting the data is fairly straightforward, but analyzing this amount of data can be overwhelming. Storing the data can also get tricky, as you need a scalable database for it. With the rise of graph databases, this area is improving for many use cases, including recommendation engines. Graph databases like Neo4j can also help to analyze the data and find similar users and the relationships among them. The analysis can be carried out in different ways: depending on how strong and scalable your architecture is, you can run real-time, near-real-time, or batch analysis. The fourth step is filtering the data, where you can use any of the approaches mentioned above to find similarities and finally provide the recommendations.
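One simple way to combine the two kinds of scores into a hybrid ranking is a weighted average, sketched below. The weight `alpha`, the score dictionaries, and the function names are assumptions for illustration; production systems may blend models in quite different ways.

```python
def hybrid_scores(content_scores, collaborative_scores, alpha=0.5):
    """Blend two score dicts; alpha is the weight of the content-based score."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    items = set(content_scores) | set(collaborative_scores)
    return {
        item: alpha * content_scores.get(item, 0.0)
        + (1 - alpha) * collaborative_scores.get(item, 0.0)
        for item in items
    }


def rank(scores):
    """Return item ids sorted by descending score (ties broken by id)."""
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical scores from the two models for three candidate items.
content = {"x": 0.9, "y": 0.2}
collaborative = {"y": 0.8, "z": 0.6}
ranking = rank(hybrid_scores(content, collaborative, alpha=0.5))
```

An item that only one model knows about still gets a score, which helps with new items that have no ratings yet.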

Building a good recommendation engine can be time-consuming initially, but it is definitely beneficial in the long run. It not only helps to generate revenue but also helps to improve your product catalog and customer service.

Six ways process mining in 2020 can save your business transformation

February 27, 2020/in Process Mining/by Signavio

The lack of information about existing processes kills 70% of large transformation projects and around 50% of RPA projects…alarming statistics. Triggering this failure rate is a lack of understanding about existing processes, and the disconnect between the discovery, visualization, analysis, and execution of existing data. So, banish the process guesswork! Utilizing process mining technology unlocks the information, visibility, and quantifiable numbers needed to improve end-to-end processes for sustainable transformation.

Read this article in German:

Wie Process Mining 2020 Ihre erfolgreiche Geschäftstransformation 2020 sicherstellt

Process mining in 2020

Your data fingerprint

If we consider the figures again (from McKinsey and Ernst & Young (EY) respectively), the digitization of products and services is forcing companies of all shapes and sizes, and in all industries, to dramatically reconsider their existing business models and the processes they implement. Because all activities are different, process mining uses the unique data—your company’s business fingerprint—to automatically piece together a digital representation of all your existing business processes.

This digital evidence enables us to visualize exactly how processes are operating (both the conventional path and variable executions) down to individual process instances. In other words, you can unearth processes which lie unseen or dormant, revealing hidden value, and providing an instant understanding of complex processes in minutes rather than days.
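To give a feel for how process instances and their variants can be reconstructed from raw data, here is a simplified Python sketch. The event log, its `(case_id, timestamp, activity)` layout, and the function name are hypothetical; commercial process mining tools work on far richer logs.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, timestamp, activity).
EVENT_LOG = [
    ("c1", 1, "Receive order"), ("c1", 2, "Check credit"), ("c1", 3, "Ship"),
    ("c2", 1, "Receive order"), ("c2", 2, "Ship"),
    ("c3", 1, "Receive order"), ("c3", 2, "Check credit"), ("c3", 3, "Ship"),
]


def discover_variants(event_log):
    """Group events by case, order them by time, and count each distinct path."""
    traces = defaultdict(list)
    for case_id, _timestamp, activity in sorted(event_log):
        traces[case_id].append(activity)
    return Counter(tuple(trace) for trace in traces.values())


variants = discover_variants(EVENT_LOG)
most_common_path, frequency = variants.most_common(1)[0]
```

The most frequent variant corresponds to the conventional path, while rarer variants are the variable executions worth investigating.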

Triggering dormant success

Then, with standardized and configurable notifications and KPIs, you can further understand the immediate impact of any process change made—meaning that failure rates decrease, and company confidence is improved. And that’s not all: everyone from new employees to the C-suite can better visualize, understand, and explain their organization’s processes. This ensures that the right process change is secure and that improvement has the intended impact, every time.

Unleash the power of process

In business, we all answer to somebody, and it is critical to connect problems to real solutions. The everyday functions of companies—the processes upon which they are built—are the connection to business tech, from “process” mining to robotic “process” automation. Without process understanding, the tech is redundant because we have no idea how work has flowed in an existing application. Process is the lifeblood of operations.

Process mining: your point of differentiation

Transformative digital technology integration

In addition to the DVA of process mining—discover, visualize, analyze—is the power to monitor real-time process execution automatically from your existing data. This simple point and click assessment can provide an instant understanding of complex processes. Within transformation projects, which by their very nature require the profound transformation of business and organizational activities, processes, competencies, and models, process mining provides the visual map to facilitate immediate action.

This self-sustaining approach across an entire organization is what leads to genuinely sustainable outcomes, and builds a process culture within an organization. By taking this holistic approach, digital transformation and excellence professionals will find it easier to leverage processes, justify their projects and programs, and address behavioral change challenges.

This includes the facilitation of transformative digital technology integration, operational agility and flexibility, leadership and culture, and workforce enablement.

Three ways process mining can save your business in a transformation project:

You require 100% operational transparency: To chart all your transactions requires complete process transparency. This capability allows the direct comparison of actual operations to the ways that processes were designed to occur. This conformance checking can automatically identify the highest priority issues and tasks, and highlight root causes, so we can take immediate action.
You must reduce costs and increase efficiency: Signavio research shows that almost 60% of companies incurred additional charges from suppliers due to process inefficiencies. Process mining can help your business reduce costs because it finds vulnerabilities and deviations, whilst highlighting what is slowing you down, including the bottlenecks and inefficiencies hampering revenue. Process mining beefs up operational health via process improvements and pre-emptive strategies.
You must optimize the buying and selling cycle: Is shipping taking too long? Which of your suppliers supports you least? Who is outperforming whom? Process mining is your one-click trick to finding these answers and identifying which units are performing best and which are wasting time and money.
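The conformance checking mentioned in the first point can be illustrated with a deliberately simplified sketch: the designed process is written down as allowed transitions, and each actual trace is checked against it. The model, activity names, and function are hypothetical, and real tools use more sophisticated techniques than this direct-succession check.

```python
# Hypothetical designed process: activity -> activities allowed to follow it.
ALLOWED = {
    "START": {"Receive order"},
    "Receive order": {"Check credit"},
    "Check credit": {"Ship"},
    "Ship": {"END"},
}


def deviations(trace, allowed):
    """Return every step in the trace that the designed process does not permit."""
    path = ["START", *trace, "END"]
    return [
        (current, following)
        for current, following in zip(path, path[1:])
        if following not in allowed.get(current, set())
    ]


conforming = deviations(["Receive order", "Check credit", "Ship"], ALLOWED)
skipped_check = deviations(["Receive order", "Ship"], ALLOWED)
```

Counting how often each deviation occurs across all cases is one way to prioritize which issues to address first.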

Process mining and robotic process automation (RPA)

The beneficial fusion of both technologies

Robotic process automation (RPA) provides a virtual workforce to automate manual, repetitive, and error-prone tasks. However, successful process automation requires exact knowledge about the intended (and potential) benefits, effective training of the robots, and continuous monitoring of their performance. With this, process mining supports organizations throughout the lifecycle of RPA initiatives by monitoring and benchmarking robots to ensure sustainable benefits.

Upgrade robot-led automation

These insights are especially valuable for process miners and managers with a particular interest in process automation. To further upgrade the impact of robot-led automation, there is also a need for a solid understanding of legacy systems, and an overview of automation opportunities. Process mining tools provide critical insights throughout the entire RPA journey, from defining the strategy to continuous improvement and innovation.

Three ways process mining can save your business in an RPA lifecycle project

You require process overviews, based on specific criteria: Providing a complete overview of end-to-end processes involves identifying high-ROI processes suitable for RPA implementation. This, in turn, helps determine the best-case process flow/path, enabling you to spot redundant processes, which you may not be aware of, before automating.
You are unsure how best to optimize human-digital worker cycles: By mining the optimal process flow/path, we can better discover inefficient human-robot hand-offs, providing quantifiable data on the financial impact of any digital worker or process. This way, we can compare human vs. digital labor in terms of accuracy, efficiency, cost, and project duration.
You need to understand better how RPA supports legacy processes and systems: RPA enables enterprises to keep legacy systems by integrating them with cloud and web/app-based services, making it possible to connect legacy systems with modern tools, applications, and even mobile apps. Efficiency and effectiveness will be improved across crucial departments, including HR, finance, and legal.

Process mining for improved customer experience and mapping

Reconfigure customer delight

The integration of process mining with other technologies is also essential in growing the process excellence and management market. With process management, we already talk about customer engagement, which empowers companies to shift away from lopsided efficiency goals, which often frustrate customers, towards all-inclusive effectiveness goals, built around delighting customers at the lowest organizational cost possible.

Further, the application of process mining within customer journey mapping (CJM)—especially when linked to the underlying processes—offers the bundled capability of better business understanding and outside-in customer perspective, connected to the processes that deliver them. So, by connecting process mining with a customer-centric view across producing, marketing, selling, and providing products and services, customer delight becomes a strategic catalyst for success.

Unlock the full potential of process

Trigger process mining initiatives with Signavio Process Intelligence, and see how it can help your organization uncover the hidden value of process, generate fresh ideas, and save time and money. Discover more in our white paper, Managing Successful Process Mining Initiatives with Signavio Process Intelligence.

5 Things You Should Know About Data Mining

February 22, 2020/in Data Mining, Uncategorized/by Halley Johnson

The majority of people spend about twenty-four hours online every week. In that time they give out enough information for big data to know a lot about them. Having people collecting and compiling your data might seem scary but it might have been helpful for you in the past.

If you have ever been surprised to find an ad targeted toward something you were talking about earlier or an invention made based on something you were googling, then you already know that data mining can be helpful. Advanced education in data mining can be an awesome resource, so it may pay to have a personal tutor skilled in the area to help you understand.

It is understandable to be unsure of a system that collects information online in order to learn more about you. Luckily, so much data is put out every day that it is unlikely data mining is focusing on any of your important information. Here are a few things you should know about data mining.

1. Data Mining Is Used In Crime Scenes

Using a variation of earthquake prediction software and data, the Los Angeles police department and researchers were able to predict crime within five hundred feet. As they learn how to compile and understand more data patterns, crime detecting will become more accurate.

Using their data, the Los Angeles police department was able to reduce theft by thirty-three percent. They were also able to predict violent crime by about twenty-one percent. Those are not perfect numbers, but they are better than before and will get even more impressive as time goes on.

The fact that data mining is able to pick up on crime statistics and compile all of that data to give an accurate picture of where crime is likely to occur is amazing. It gives a place to look and is able to help stop crime as it starts.

2. Data Mining Helps With Sales

A great story about data mining in sales is the example of Walmart putting beer near the diapers. The story claims that through measuring statistics and mining data it was found that when men purchase diapers they are also likely to buy a pack of beer. Walmart collected that data and put it to good use by putting the beer next to the diapers.

The amount of truth in that story/example is debatable, but it has made data mining popular in most retail stores. Finding which products are often bought together can give insight into where to put products in a store. This practice has increased sales in both items immensely just because people tend to purchase items near one another more than they would if they had to walk to get the second item.
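Finding products that are often bought together can start with something as simple as counting pairs across baskets. The baskets below are invented for illustration, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per purchase.
BASKETS = [
    {"diapers", "beer", "wipes"},
    {"diapers", "beer"},
    {"bread", "milk"},
    {"diapers", "wipes"},
]


def pair_counts(baskets):
    """Count how many baskets contain each pair of products."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def support(pair, baskets):
    """Share of all baskets that contain both products of the pair."""
    return pair_counts(baskets)[tuple(sorted(pair))] / len(baskets)


counts = pair_counts(BASKETS)
beer_diaper_support = support(("diapers", "beer"), BASKETS)
```

Pairs with high support are candidates for being placed near each other, although a high count alone does not show that one purchase causes the other.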

Putting a lot of stock in the data-gathering teams that big stores build does not always work. There have been plenty of times when data teams failed and sales plummeted. Often, the benefits outweigh the potential failure, however, and many stores now use data mining to make a lot of big decisions about their sales.

3. It’s Helping With Predicting Disease

In 2009, Google began work on predicting the winter flu. Google went through the fifty million most-searched terms and compared them with what the CDC had found during the 2003-2008 flu seasons. With that information, Google was able to help predict the next winter flu outbreak, even down to the states it hit the hardest.
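As a rough illustration of what comparing search terms with health data can mean, the sketch below computes the Pearson correlation between two weekly series. This is not Google's actual method, and both series are made-up numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical weekly data: share of searches for a flu-related term,
# and reported flu-related doctor visits for the same weeks.
search_share = [1.0, 2.0, 4.0, 3.0]
reported_visits = [10.0, 19.0, 42.0, 30.0]
r = pearson(search_share, reported_visits)
```

Search terms whose frequency tracks the reported figures closely are candidates for a prediction model, though correlation on past seasons does not guarantee accuracy in future ones.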

Since 2009, data mining has gotten much better at predicting disease. Since the internet is a newer invention, it is still growing and data mining is still getting better. Hopefully, in the future, we will be able to predict disease outbreaks quickly and accurately.

With new data mining techniques and research in the medical field, there is hope that doctors will be able to narrow down problems in the heart. As the information grows and more data is entered the medical field gets closer to solving problems through data. It is something that is going to help cure diseases more quickly and find the root of a problem.

4. Some Data Mining Gets Ignored

Interestingly, very little of the data that companies collect from you is actually used. Big data companies do not use about eighty-eight percent of the data they have. It is incredibly difficult to use all of the millions of bits of data that go through big data companies every day.

The more people that are used for data mining and the more data companies are actually able to filter through, the better the online experience will be. It might be a bit frightening to think of someone going through what you are doing online, but no one is touching any of the information that you keep private. Big data is using the information you put out into the world and using that data to come to conclusions and make the world a better place.

There is so much information being put onto the internet at all times. Twenty-four hours a week is the average amount of time a single person spends on the internet, but there are plenty of people who spend more time than that. All of that information takes a lot of people to sift through, and there are currently not enough people in the data mining industry to go through the majority of the data being put online.

5. Too Many Data Mining Jobs

Interestingly, the data industry is booming. In general, there are an amazing number of careers opening on the internet every day. The industry is growing so quickly that there are not enough people to fill the jobs that are being created.

The lack of talent in the industry means there is plenty of room for new people who want to go into the data mining industry. It was predicted that by 2018 there would be a shortage of 140,000 people with deep analytical skills. With the lack of jobs that is so often discussed, it is amazing that there is such a shortage in the data industry.

If big data is only able to wade through less than half of the data being collected then we are wasting a resource. The more people who go into an analytics or computer career the more information we will be able to collect and utilize. There are currently more jobs than there are people in the data mining field and that needs to be corrected.

To Conclude

The data mining industry is making great strides. Big data is trying to use the information they collect to sell more things to you but also to improve the world. Also, there is something very convenient about your computer knowing the type of things you want to buy and showing you them immediately.

Data mining has been able to help predict crime in Los Angeles and lower crime rates. It has also helped companies know what items are commonly purchased together so that stores can be organized more efficiently. Data mining has even been able to predict the outbreak of disease down to the state.

Even with so much data being ignored and so many jobs left empty, data mining is doing incredible things. The entire internet is constantly growing, and data mining is growing right along with it. As the data mining industry climbs and more people build careers mining data, the more we will learn and the more facts we will find.

Python vs R: Which Language to Choose for Deep Learning?

February 18, 2020/in Data Mining, Data Science, Insights, Python, R Statistics/by Deep Moteria

Data science is increasingly becoming essential for every business to operate efficiently in this modern world. This influences the processes composed together to obtain the required outputs for clients. While machine learning and deep learning sit at the core of data science, the concepts of deep learning are essential to understand, as they can help increase the accuracy of final outputs. And when it comes to data science, R and Python are the most popular programming languages used to instruct the machines.

Python and R: Primary Languages Used for Deep Learning

Deep learning and machine learning differ in the type of input data they use. While machine learning depends on structured data, deep learning uses neural networks to store and process the data during learning. Deep learning can be described as a subset of machine learning in which the data to be processed is organized in a different structure than usual.

R was developed specifically to support the concepts and implementation of data science, and the support this language provides is incredible: its simple syntax makes writing code much easier.

Python is already a very popular programming language that serves more than one development niche without strain. Python is widely used for programming machine learning algorithms, and development is often quicker than in other languages such as C or Java. And because of its extensive support for implementing data science concepts, it is a tough competitor for R.

However, if we compare the charts of popularity, Python is obviously more popular among data scientists and developers because of its versatility and easier usage during algorithm implementation. However, R outruns Python when it comes to the packages offered to developers specifically expertise in R over Python. Therefore, to conclude which one of them is the best, let’s take an overview of the features and limits offered by both languages.

Python

Python was created by Guido van Rossum as the successor to the ABC programming language. Python makes whitespace significant, which increases the readability of the code. It is a general-purpose programming language that supports a wide range of development needs.

Python’s packages include support for web development, software development, GUI (Graphical User Interface) development and machine learning. Using these packages and putting the best development skills forward, excellent solutions can be developed. According to Stack Overflow, Python ranks fourth among the most popular programming languages among developers.

Benefits for performing enhanced deep learning using Python are:

Concise and Readable Code
Extended Support from Large Community of Developers
Open-source Programming Language
Encourages Collaborative Coding
Suitable for small and large-scale products

At the time of writing, the latest stable version is Python 3.8.0, released on 14 October 2019. Developing a software solution in Python is much easier thanks to the extensive support offered through its packages.

R

R is a language used specifically for developing statistical software and for statistical data analysis. Its primary user base consists of statisticians and data scientists who analyze data. Supported by the R Foundation for Statistical Computing, the language is not suited to developing websites or applications. R is also an open-source environment that can be used for mining very large amounts of data.

R focuses on producing the required output rather than on speed, so programs written in R tend to execute comparatively slowly. To use R for any development or mining task, you first need to install the binary version for your operating system; programs can then be run directly from the command line.

R also has a dedicated development environment, RStudio, and offers several libraries that help in crafting efficient programs to execute mining tasks on the provided data.

The benefits offered by R are pretty common and similar to what Python has to offer:

Open-source programming language
Supports all operating systems
Supports extensions
R can be integrated with many of the languages
Extended Support for Visual Data Mining

Although R ranks 17th in Stack Overflow’s list of the most popular programming languages, the support offered by this language has no match. After all, the R language was developed by statisticians for statisticians!

Python vs R: Should They be Really Compared?

Even when provided with the best technical support and efficient tools, a developer will not be able to deliver quality outputs without the required skills. The point is that technical skills rank higher than the resources provided. Comparing these two programming languages is not advisable, as each has its own set of advantages. Few developers use both together, but those who do obtain the maximum benefit from the process.

The two languages also have some limits in common. For example, if a representative asks whether you can lend technical support for developing an Uber clone, you are going to decline, as neither Python nor R is geared towards mobile app development. To benefit the most and develop excellent solutions using both programming languages, it is advisable to stop comparing and start collaborating!

R and Python: How to Fit Both In a Single Program

Anticipating the future needs of the development industry, there has been significant work on combining these two excellent programming languages. There are two approaches: either we include R script in Python code, or vice versa.

Using the available interfaces and packages, we can include R script in Python code and enhance its productivity. Packages such as PypeR and pyRserve help run the two languages together efficiently while handling the background work.
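The packages named above each have their own APIs, which are not reproduced here. As a dependency-free illustration of the general idea (Python handing a snippet of R to an R process and reading back the result), the sketch below shells out to the `Rscript` command-line tool that ships with R. The helper names `r_command` and `run_r` are hypothetical, and the approach is a stand-in for those packages, not a description of how they work internally.

```python
import shutil
import subprocess


def r_command(expression):
    """Build the argument list that asks Rscript to evaluate one R expression."""
    return ["Rscript", "-e", expression]


def run_r(expression):
    """Run an R expression and return what it printed, or None if R is not installed."""
    if shutil.which("Rscript") is None:
        return None
    result = subprocess.run(
        r_command(expression), capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# Ask R for the mean of a small vector; this prints None on machines without R.
print(run_r("cat(mean(c(1, 2, 3, 4)))"))
```

Starting a new R process for every call is slow, which is one reason dedicated bridge packages exist.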

Conversely, the functions and packages available for integrating Python into R are also effective at providing better results. With R packages such as rJython, rPython, reticulate and PythonInR, integrating Python into R is very easy.

Therefore, with development skills at their best and good use of these resources, Python and R can be used together to enhance end results and provide accurate deep learning support.

Conclusion

Python and R are both great in their own right and in their own places. However, because Python is applied in almost every kind of operation, the annual packages offered to Python developers are lower than those offered to developers skilled in R. That alone, however, says nothing about the usability of R. The ultimate choice between these two languages depends on the data scientists or developers and their mining requirements.

If a developer or data scientist decides to develop skills in both Python and R, it will prove beneficial in the near future. Whether to use one or both in your project depends on the project requirements and the expert support on hand.

Looking for the ‘aha moment’: An expert’s insights on process mining

February 17, 2020/in Interviews, Process Mining/by Signavio

Henny Selig is a specialist in process mining, with significant expertise in the implementation of process mining solutions and supporting customers with process analysis. As a Solution Owner at Signavio, Henny is also well versed in bringing Signavio Process Intelligence online for businesses of all shapes and sizes. In this interview, Henny shares her thoughts about the challenges and opportunities of process mining.

Read this interview in German:

Im Interview mit Henny Selig zu Process Mining: “Für den Kunden sind solche Aha-Momente toll“

Henny, could you give a simple explanation of the concept of process mining?

Basically, process mining is a combination of data analysis and business process management. IT systems support almost every business process, meaning they leave behind digital traces. We extrapolate all the data from the IT systems connected to a particular process, then visualize and evaluate it with the help of data science technology.

In short, process mining builds a bridge between employees, process experts and management, allowing for a data-driven and fact-based approach to business process optimization. This helps avoid thinking in siloes, as well as enabling transparent design of handovers and process steps that cross departmental boundaries within an organization.
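As a rough illustration of the "digital traces" idea, the sketch below takes a tiny event log, rebuilds each case’s path by ordering its events in time, and counts how often each path (variant) occurs. The log entries, the tuple layout and the function name `variants` are invented for this example; it is not a description of how Signavio Process Intelligence works.

```python
from collections import Counter, defaultdict


def variants(event_log):
    """Group events by case, order them by timestamp, and count each distinct path."""
    cases = defaultdict(list)
    for case_id, activity, timestamp in event_log:
        cases[case_id].append((timestamp, activity))
    return Counter(
        tuple(activity for _, activity in sorted(events))
        for events in cases.values()
    )


# Hypothetical event log: (case id, activity, timestamp)
event_log = [
    ("order-1", "create", 1), ("order-1", "approve", 2), ("order-1", "pay", 3),
    ("order-2", "create", 1), ("order-2", "pay", 2),
    ("order-3", "create", 1), ("order-3", "approve", 2), ("order-3", "pay", 3),
]
for path, n in variants(event_log).most_common():
    print(n, " -> ".join(path))
```

A path that skips or adds a step, such as the order paid without approval here, is the kind of deviation an analyst would then discuss with the process owners.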

When a business starts to analyze its process data, what sorts of questions does it ask? Does it at least have some expectation about what process mining can offer?

That’s a really good question! There isn’t really a single good answer to it, as it is different for different companies. For example, there was one procurement manager, and we were presenting the complete data set to him, and it turned out there was an approval at one point, but it should have been at another. He was really surprised, but we weren’t, because we sat outside the process itself and were able to take a broader view.

We also had different questions that the company hadn’t considered, things like what was the process flow if an order amount is below 1000 euros, and how often that occurs—just questions that seem clear to an outsider but often do not occur to process owners.

So do people typically just have an idea that something is wrong, or do they generally understand there is a specific problem in one area, and they want to dive deeper?

There are those people who know that a process is running well, but they know a particular problem pops up repeatedly. Usually, even if people say they don’t have a particular focus or question, most of them actually do because they know their area. They already have some assumptions and ideas, but it is sometimes so deep in their mind they can’t actually articulate it.

Often, if you ask people directly how they do things, it can put pressure on them, even if that’s not the intention. If this happens, people may hide things without meaning to, because they already have a feeling that the process or workflow they are describing is not perfect, and they want to avoid blame.

The approvals example I mentioned above is my favorite because it is so simple. We had a team who all said, over and over, “We don’t approve this type of request.” However, the data said they did–the team didn’t even know.

We then talked to the manager, who was interested in totally different ideas, like all these risks, approvals, are they happening, how many times this, how many times that — the process flow in general. Just by having this conversation, we were able to remove the mismatch between management and the team, and that is before we even optimized the actual process itself.

So are there other common issues or mismatches that people should be aware of when beginning their process mining initiative?

The one I often return to is that not every variation that is out of line with the target model is necessarily negative. Very few processes, apart from those that run entirely automatically, actually conform 100% to the intended process model—even when the environment is ideal. For this reason, there will always be exceptions requiring a different approach. This is the challenge in projects: finding out which variations are desirable, and where to make necessary exceptions.

So would you say that data-based process analysis is a team effort?

Absolutely! In every phase of a process mining project, all sorts of project members are included. IT makes the data available and helps with the interpretation of the data. Analysts then carry out the analysis and discuss the anomalies they find with IT, the process owners, and experts from the respective departments. Sometimes there are good reasons to explain why a process is behaving differently than expected.

In this discussion, it is incredibly helpful to document the thought process of the team with technical means, such as Signavio Process Intelligence. In this way, it is possible to break down the analysis into individual processes and to bring the right person into the discussion at the right point without losing the thread. The next colleague who picks up the topic can then follow the thread of the analysis and properly classify the results.

At the very least, we can provide some starting points. Helping people reach an “aha moment” is one of the best parts of my job!

To find out more about how process mining can help you understand and optimize your business processes, visit the Signavio Process Intelligence product page. If you would like to get a group effort started in your organization right now, why not sign up for a free 30-day trial with Signavio today?

Multi-touch attribution: A data-driven approach

February 4, 2020/in Data Science, Gerneral, Insights, Python, R Statistics, Tutorial, Use Case, Use Cases, Visualization/by Aakash Chugh

Customers’ shopping behavior has changed drastically when it comes to online shopping: nowadays, customers like to research a product thoroughly before making a purchase.

What is Multi-touch attribution?

This makes it really hard for marketers to correctly determine the contribution of each marketing channel a customer was exposed to. The path a customer takes from the first search to the purchase is known as a customer journey, and this path consists of multiple marketing channels or touchpoints. It is therefore highly important to distribute the budget between these channels in a way that maximizes return. This is known as the multi-touch attribution problem, and the right attribution model helps to steer the marketing budget efficiently. The problem is well known among marketers, so you might think there must be an algorithm out there to deal with it. There are indeed some traditional models, but every model has its own limitations, which are discussed in the next section.

Types of attribution models

Most eCommerce companies have a performance marketing department to make sure that the marketing budget is spent in an agile way. Several heuristic attribution models are already available in Google Analytics; however, each of them has its issues. These models are:

Traditional attribution models

First touch attribution model

100% credit is given to the first channel as it is considered that the first marketing channel was responsible for the purchase.

Figure 1: First touch attribution model

Last touch attribution model

100% credit is given to the last channel as it is considered that the last marketing channel was responsible for the purchase.

Figure 2: Last touch attribution model

Linear-touch attribution model

In this attribution model, equal credit is given to all the marketing channels in the customer journey, as each channel is considered equally responsible for the purchase.

Figure 3: Linear attribution model

U-shaped or Bath tub attribution model

This is the most common model in eCommerce companies. It assigns 40% of the credit to the first touch and 40% to the last touch, and the remaining 20% is distributed equally among the touchpoints in between.

Figure 4: Bathtub or U-shape attribution model
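The four heuristic models above can be sketched in a few lines each. This is a minimal illustration, not the Google Analytics implementation: the function names and channel labels are made up, and the handling of journeys with fewer than three touchpoints in the U-shaped model (all credit to a single channel, an even split for two) is an assumption, since the text does not cover those cases.

```python
def _add(credit, channel, share):
    """Accumulate credit, so a channel that appears twice in a path is summed."""
    credit[channel] = credit.get(channel, 0.0) + share


def first_touch(path):
    return {path[0]: 1.0}


def last_touch(path):
    return {path[-1]: 1.0}


def linear(path):
    credit = {}
    for channel in path:
        _add(credit, channel, 1.0 / len(path))
    return credit


def u_shaped(path, end_share=0.4):
    credit = {}
    if len(path) == 1:
        _add(credit, path[0], 1.0)  # assumption: a single touch gets everything
    elif len(path) == 2:
        _add(credit, path[0], 0.5)  # assumption: no middle, so split evenly
        _add(credit, path[1], 0.5)
    else:
        middle = path[1:-1]
        _add(credit, path[0], end_share)
        _add(credit, path[-1], end_share)
        for channel in middle:
            _add(credit, channel, (1.0 - 2 * end_share) / len(middle))
    return credit


# Hypothetical converting journey, ordered from first to last touch
path = ["Social", "Email", "Display", "Search"]
for model in (first_touch, last_touch, linear, u_shaped):
    print(model.__name__, model(path))
```

Summing these per-journey credits over all converting journeys gives each channel’s attributed share under the chosen model.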

Data driven attribution models

Traditional attribution models follow a somewhat naive approach, assigning credit to one or all of the marketing channels involved. It is also not easy for every company to simply pick one of these models and implement it. The multi-touch attribution problem comes with many challenges, such as customer journey duration, overestimation of branded channels, vouchers and cross-platform issues.

Switching from traditional models to data-driven models gives us more flexibility and more insights, as the major part here is defining rules to prepare the data in a way that fits your business. These rules can be defined by performing an ad hoc analysis of customer journeys. In the next section, I will discuss the Markov chain concept as an attribution model.

Markov chains

Markov chain concepts revolve around probability. For the attribution problem, every customer journey can be seen as a chain (a sequence of marketing channels), and together the journeys produce a Markov graph as illustrated in Figure 5. Every channel is represented as a vertex, and the edges represent the probability of hopping from one channel to another. There will be another detailed article explaining the concepts behind different data-driven attribution models and how to apply them.

Figure 5: Markov chain example
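As a rough sketch of how such a graph can be built, the code below estimates the edge probabilities by counting transitions in observed journeys. The journeys are invented, and the extra `start`, `conversion` and `null` states are a common convention assumed here rather than something taken from the figure. Turning these probabilities into channel credit is a further step that is not shown.

```python
from collections import Counter, defaultdict


def transition_probabilities(journeys):
    """journeys: list of (path, converted) pairs. Returns nested dict P[src][dst]."""
    counts = defaultdict(Counter)
    for path, converted in journeys:
        # Assumed convention: frame every journey with artificial start and end states
        states = ["start", *path, "conversion" if converted else "null"]
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


# Hypothetical journeys: (ordered channels, did the customer buy?)
journeys = [
    (["Social", "Search"], True),
    (["Social"], False),
    (["Search", "Email"], True),
    (["Email"], False),
]
probs = transition_probabilities(journeys)
for src, dsts in probs.items():
    print(src, dsts)
```

Each inner dictionary sums to 1, so it can be read directly as the outgoing edge weights of one vertex in the graph.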

Challenges during the Implementation

Transitioning from traditional attribution models to a data-driven one may sound exciting, but the implementation is rather challenging, as there are several issues that cannot be resolved just by changing the type of model. Before implementation, marketers should perform a customer journey analysis to gain insights about their customers and try to determine the following:

Length of customer journey.
On average, how many branded and non-branded channels (distinct and non-distinct) appear in a typical customer journey?
Which channels are the most upper-funnel and the most lower-funnel?
Voucher analysis: within branded and non-branded channels.

When you are done with the analysis and able to answer all of the above questions, the next step would be to define some rules in order to handle the user data according to your business needs. Some of the issues during the implementation are discussed below along with their solution.

Customer journey duration

Assuming that you are a retailer, let’s try to understand this issue with an example. In May 2016, your company started a Facebook advertising campaign for a particular product category which “attracted” a lot of customers, including Chris. He saw your Facebook ad while working in the office and clicked on it, which took him to your website. As soon as he registered on your website, his boss called him (probably because he was on Facebook while working), so he closed everything and went to the meeting. After coming back, he started working and completely forgot about your ad and your products. A few days later, he received an email with some offers on your products, which he also ignored until he saw an ad again on TV in January 2019 (almost 3 years later). At this point, he started researching your products and finally bought one of them through an Instagram campaign. It took Chris almost 3 years to make his first purchase.

Figure 6: Chris’s journey

Now, take a minute and think: if you analyse the entire journey of customers like Chris, you will realize that you are still assigning some of the credit to touchpoints that happened 3 years ago. This can be solved by using an attribution window. Figure 7 illustrates that 83% of the customers make a purchase within 30 days, which means the attribution window here could be 30 days. In simple words, it is safe to remove the touchpoints that happened more than 30 days before the purchase. This parameter can also be changed to 45 days or 60 days, depending on the use case.

Figure 7: Length of customer journey
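A minimal sketch of both steps, assuming each touchpoint is stored as a (channel, date) pair: `share_within` helps pick a window by showing what fraction of journeys fit inside it, and `apply_window` drops everything older than the window. The function names, the exact dates for Chris’s journey and the list of journey durations are invented for illustration.

```python
from datetime import date, timedelta


def share_within(durations_in_days, window_days):
    """Fraction of journeys that lasted no longer than window_days."""
    return sum(d <= window_days for d in durations_in_days) / len(durations_in_days)


def apply_window(touchpoints, purchase_date, window_days=30):
    """Keep only touchpoints from the last window_days before the purchase."""
    cutoff = purchase_date - timedelta(days=window_days)
    return [(ch, day) for ch, day in touchpoints if cutoff <= day <= purchase_date]


# Chris's journey with made-up days (the text only gives months and years)
purchase = date(2019, 1, 20)
chris = [
    ("Facebook", date(2016, 5, 10)),
    ("Email", date(2016, 5, 14)),
    ("TV", date(2019, 1, 5)),
    ("Instagram", date(2019, 1, 20)),
]
print(apply_window(chris, purchase))

# Hypothetical journey durations in days, used to sanity-check a 30-day window
print(share_within([2, 10, 25, 29, 40, 400], 30))
```

With a 30-day window, only the TV and Instagram touchpoints remain in Chris’s path, so the 2016 Facebook ad and email no longer receive credit.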

Removal of direct marketing channel

A well-known issue that every marketing analyst is aware of is that customers who already know the brand usually come to the website directly. This leads to overestimation of the direct channel, and branded channels start getting more credit. In this case, you can set a threshold (say 7 days) and remove these branded channels from the customer journey.

Figure 8: Removal of branded channels
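The text does not say exactly what the threshold is measured against, so the sketch below makes one possible reading explicit: a branded touchpoint is dropped when it follows a non-branded touchpoint by no more than the threshold, on the assumption that the earlier channel is what brought the customer back. The channel labels, the `BRANDED` set and the function name are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical labels for channels treated as branded
BRANDED = {"Direct", "Brand search"}


def drop_recent_branded(touchpoints, threshold_days=7):
    """Remove branded touchpoints that follow a non-branded one within the threshold.

    touchpoints: list of (channel, date) pairs, assumed sorted by date.
    """
    kept = []
    last_non_branded = None
    for channel, day in touchpoints:
        if channel in BRANDED:
            recent = (
                last_non_branded is not None
                and day - last_non_branded <= timedelta(days=threshold_days)
            )
            if recent:
                continue  # assumed to be caused by the earlier non-branded touch
        else:
            last_non_branded = day
        kept.append((channel, day))
    return kept


journey = [
    ("Display", date(2020, 3, 1)),
    ("Direct", date(2020, 3, 3)),   # 2 days after Display: removed
    ("Direct", date(2020, 3, 20)),  # 19 days after Display: kept
]
print(drop_recent_branded(journey))
```

Journeys that consist only of branded touchpoints are left untouched under this rule, so no path ends up empty.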

Cross platform problem

If some of your customers use different devices to explore your products and you are not able to track them, retargeting becomes really difficult. In a perfect world, these visits belong to the same journey; if they cannot be combined, all but one of the resulting paths would be considered “non-converting paths”. For the attribution problem, the device could be thought of as a touchpoint to include in the path, but tracking these customers across all devices would still be challenging. A brief introduction to deterministic and probabilistic ways of cross device tracking can be found here.

Figure 9: Cross platform clash

How to account for Vouchers?

To better account for vouchers, a voucher can be added as a ‘dummy’ touchpoint named after the type of voucher used (CRM, social media, affiliate, pricing, etc.). In our case, we tried adding these vouchers as the first touchpoint and also as the last touchpoint, but no significant difference was found. Also, if the marketing channel through which the voucher was used was already in the path, the dummy touchpoint was not added.

Figure 10: Addition of Voucher as a touchpoint
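The voucher rule can be sketched as follows. The function name, the `Voucher: ...` label format and the assumption that voucher types share their names with marketing channels are all illustrative choices, not taken from the original implementation.

```python
def add_voucher_touchpoint(path, voucher_channel, position="last"):
    """Add a dummy voucher touchpoint unless the voucher's channel is already in the path.

    path: ordered list of channel names; voucher_channel: e.g. "CRM", or None
    when no voucher was used; position: "first" or "last".
    """
    if voucher_channel is None or voucher_channel in path:
        return list(path)
    dummy = f"Voucher: {voucher_channel}"  # hypothetical naming scheme
    return [dummy, *path] if position == "first" else [*path, dummy]


print(add_voucher_touchpoint(["Display", "Search"], "CRM"))
print(add_voucher_touchpoint(["Display", "Search"], "CRM", position="first"))
print(add_voucher_touchpoint(["CRM", "Search"], "CRM"))
```

Keeping the position as a parameter makes it easy to repeat the first-versus-last comparison described above on your own data.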

Stop processing the same mistakes! Four steps to business & IT alignment

January 30, 2020/in Business Analytics, Uncategorized/by Signavio

Digitization. Agility. Tech-driven. Just three strategy buzzwords that promise IT transformation and business alignment, but often fade out into merely superficial change. In fact, aligning business and IT still vexes many organizations because company leaders often forget that transformation is not a move from A to B, or even from A to Z––it’s a move from a fixed starting point, to a state of continual change.

Read this article in German:

Mit den richtigen Prozessen zum Erfolg: vier Schritte zum Business-IT Alignment

Within this state of perpetual flux, adaptive technology is necessary, not only to keep up with industry developments but also with the expansion of technology-enabled customer experiences. After all, alignment assumes that business and technology are separate entities, when in fact they are inextricably linked!

Metrics that matter: From information technology to business technology

Information technology is continuing to challenge the way companies organize their business processes, communicate with customers and potential customers, and deliver services. Although there is no single dominant reorganization strategy, common company structures lean towards decentralizing IT, shifting it closer to end-users and melding the knowledge-base with business strategy. Business-IT alignment is more than ever vital for market impact and growth.

This tactic means as business goals pivot, IT can more readily respond with permanent solutions to support and maintain enterprise momentum. In turn, technological advances and improvements are hardwired into current and future strategies and initiatives. As working ecosystems replace strict organizational structures, the traditional question “Which department do you work in?” has been replaced by, “How do you work?”

But how does IT prove its value and win the trust of the C-suite? Well, according to Gartner, almost 20% of companies have already invested in tools capable of monitoring business-relevant metrics, with this number predicted to reach 60% by 2021. The problem is many infrastructure and operations (I&O) leaders don’t know where to begin when initiating an IT monitoring strategy.

Reach beyond the everyday: Four challenges to alignment

With this, CIOs are under mounting pressure to address digital needs that grow and transform, as well as to renovate the operational environment with new functions. They also must still demonstrate how IT is meeting a given business strategy. So looking forward, no matter how big or small your business is, technology can deliver tangible and intangible benefits (like speed and performance) to hit revenue and operational targets efficiently, and meet your customers’ expectations of innovation.

Put simply, having a good technological infrastructure enriches the culture, efficiency, and relationships of your business.

Business and IT alignment: The rate of change

This continuous strategic loop means enterprises function better, make more profit, and see better ROI because they achieve their goals with less effort. And while there may be no standard way to align successfully, an organization where IT and business strategy are in lock-step can further improve agility and operational efficiencies. This battle of the ‘effs’, efficiency vs. effectiveness, has never been so critical to business survival.

In fact, successful companies are those that dive deeper; such is the importance of this synergy. Amazon and Apple are prime examples—technology and technological innovation is embedded and aligned within their operational structure. In several cases, they created the integral technology and business strategies themselves!

Convergence and Integration

These types of aligned companies have also increased the efficiency of technology investments and significantly reduced the financial and operational risks associated with business and technical change.

However, if this rate of change and business agility is as fast as we continually say, we need to be talking about convergence and integration, not just alignment. In other words, let’s do the research and learn, but empower next-level thinking so we can focus on the co-creation of “true value” and respond quickly to customers and users.

Granular strategies

Without this granular strategy, companies may spend too much on technology without ever solving the business challenges they face, simply due to differing departmental objectives, cultures, and incentives. Simply put, business-IT alignment integrates technology with the strategy, mission, and goals of an organization. For example:

Faster time-to-market
Increased profitability
Better customer experience
Improved collaboration
Greater industry and IT agility
Strategic technological transformation

Hot topic

View webinar recording Empowering Collaboration Between Business and IT, with Fabio Gammerino, Signavio Pre-Sales Consultant.

The power of process: Four steps to better business-IT alignment

While it may seem intuitive, many organizations struggle to achieve the elusive goal of business-IT alignment. This is not only because alignment is a cumbersome and lengthy process, but because the overall process is made up of many smaller sub-processes. Each of these sub-processes lacks a definitive start and endpoint. Instead, each one comprises some “learn and do” cycles that incrementally advance the overall goal.

These cycles aren’t simple fixes, and this explains why issues still exist in the modern digital world. But by establishing a common language, building internal business relationships, ensuring transparency, and developing precise corporate plans of action, the bridge between the two stabilizes.

Four steps to best position your business-IT alignment strategy:

Plan: Translate business objectives into measurable IT services, so resources are effectively allocated to maximize turnover and ROI – This step requires ongoing communication between business and IT leaders.
Model: IT designs infrastructure to increase business value and optimize operations – IT must understand business needs and ensure that they are implementing systems critical to business services.
Manage: Service is delivered based on company objectives and expectations – IT must act as a single point-of-service request, and prioritize those requests based on pre-defined priorities.
Measure: Improvement of cross-organization visibility and service level commitments – While metrics are essential, it is crucial that IT ensures a business context to what they are measuring, and keeps a clear relationship between the measured parameter and business goals.

Signavio Says

Temporarily rotating IT employees within business operations is a top strategy in reaching business-IT alignment because it circulates company knowledge. This cross-pollination encourages better relationships between the IT department and other silos and broadens skill-sets, especially for entry-level employees. Better knowledge depth gives the organization more flexibility with well-rounded employees who can fill various roles as demand arises.

Get in touch

Discover how Signavio can lead your business to IT transformation and operational excellence with the Signavio Business Transformation Suite. Try it for yourself by registering now for a free 30-day trial.

How Data Analytics In The Cloud Transforms Your Business

January 29, 2020/in Business Analytics/by Gracie Myers

Businesses have started to turn to cloud-based technology to solve their growing data problems. But before we dive deep into the reasons behind it, let’s look at why data analytics is such a powerful tool. It all comes back to businesses like Netflix, Amazon, Google, and Facebook. All of these businesses are using data analytics to understand their customers and are making an absolute fortune. They also have so much data coming in that they needed to manage it somehow, so they turned to the cloud.

Let’s use Netflix as an example here. They have over 115 million subscribers and have become the absolute king of the online streaming industry. Their rise to the top was no fluke. They developed state-of-the-art methods of data analytics and then gathered the information needed to provide the right entertainment to the right people.

Amazon uses data to learn about its customers. They analyze all behavior on their website and then target customers based on that data.

Cloud-based technologies are designed to reduce the costs associated with older data analytics methods. Businesses like Netflix, Amazon, Google, and Facebook have all started relying on the cloud because they know it’s the future. They have based their entire business models around it.

But smaller businesses still have a long way to go. Only 40% of businesses are using data as the core piece of their business strategy.

Now let’s look at some ways that data analytics has transformed business.

It Gave Birth to Strategic Analytics

Strategic analytics is the backbone of your entire data plan. It is a detailed analysis of the entire system that is used to determine how you are funneling customers into your system. It will reveal weak points and show you the strengths so that you can develop data-driven strategies moving forward. It also helps you understand the behavior of your market.

Strategic analytics follows a three-step process:

Identify your business model’s strengths and weaknesses in comparison with your competition.
Diagnose all of your business processes to determine areas that might need to be improved.
Analyze individuals within the company to make sure you are properly using them. You would be surprised at the number of businesses wasting their employees’ talents on inefficient tasks.

At the end of it all, your business should be able to determine areas of your marketing where you can pull out more value, as well as data that you need to start gathering.

Fuel your Decisions with Platform Analytics

The goal here is to combine data analytics with your decision-making processes so that your business operates more efficiently at its very core. If money is the lifeblood of your business, then decisions are the heart that keeps that money flowing. So think of analytics as a healthy diet. It keeps every area of your business healthy and operating at peak efficiency. Platform analytics asks some important questions like:

How can data analytics be efficiently added to our everyday business processes?
Are there any areas that we can automate that will improve efficiency?
Which back-end systems will benefit from learning more about our customers?

In most cases, businesses will find that the cloud will enhance their overall data plan, no matter which point they have reached in their growth. Think of it like checking your blood pressure. If there are problems, then you know that you’ll need a diagnosis.

Helps Businesses Transform their Model

Businesses will need to use data in parallel with their model to keep up with the changing times as we move forward. In layman’s terms, businesses need to update their core business processes so that they use data to create opportunities. This opens up a whole new world for their customers, products, and services.

Companies that can forecast using data will see improvements across the board – from their recruitment to their marketing. But there is a specific data-centric approach that must be taken.

Possess an overall vision that includes data and capitalizes on the opportunities presented.
Develop a culture that is centered on data and is not afraid to experiment with it.
Leverage new technologies to manage their data. Right now, the latest technology is cloud-based so businesses must learn to leverage it.
Use data to build trust with consumers.
Find innovative ways to gain insight into upcoming trends and tap into them as quickly as possible.

Management of Enterprise Information

Enterprise information management (known as EIM) is an important part of data-driven processes. Most data in businesses is stored in an unmanaged location like a server or some other in-house database. Cloud-based technologies have created a more secure way to store data, but you will still need a data management system in place.

By developing agile data management systems, you will be able to gather and distribute data more efficiently. EIM systems allow businesses to:

Streamline all of their processes in a way that simplifies everyone’s job.
Improve collaboration among different teams.
Improve the productivity of employees.

Creates a Data-Centric Business

This is the most important factor in business today, and it’s the reason why all businesses must start using the latest data analytics strategies. The more useful data a business can generate, the more of an advantage they are going to have. Again, look at leaders like Netflix and Amazon to see this in action. They are generating essential information from everyone who browses their systems. Their entire business models are centered on data, and it’s the number one reason why they are at the top of their respective industries.

Insight, optimization, and innovation are the three main categories of data analytics.

Final Thoughts

The Research Optimus Team understands that having the right data migration system is going to benefit all businesses, both large and small. It’s why their focus has turned to cloud-based technologies. Cloud-enabled businesses gain a competitive advantage over those that are still relying on older data technologies.

Business moves at supersonic speeds now, so if you are not staying current with the latest technology, you are going to fall behind.