A Bird’s Eye View: How Machine Learning Can Help You Charge Your E-Scooters

Bird scooters in Columbus, Ohio

Bird scooters in Columbus, Ohio

Ever since I started using bike-sharing to get around in Seattle, I have become fascinated with geolocation data and the transportation sharing economy. When I saw this project leveraging the mobility data RESTful API from the Los Angeles Department of Transportation, I was eager to dive in and get my hands dirty building a data product utilizing a company’s mobility data API.

Unfortunately, the major bike and scooter providers (Bird, JUMP, Lime) don’t have publicly accessible APIs. However, some folks have seemingly been able to reverse-engineer the Bird API used to populate the maps in their Android and iOS applications.

One interesting feature of this data is the nest_id, which indicates if the Bird scooter is in a “nest” — a centralized drop-off spot for charged Birds to be released back into circulation.

I set out to ask the following questions:

  1. Can real-time predictions be made to determine if a scooter is currently in a nest?
  2. For non-nest scooters, can new nest location recommendations be generated from geospatial clustering?

To answer these questions, I built a full-stack machine learning web application, NestGenerator, which provides an automated recommendation engine for new nest locations. This application can help power Bird’s internal nest location generation that runs within their Android and iOS applications. NestGenerator also provides real-time strategic insight for Bird chargers who are enticed to optimize their scooter collection and drop-off route based on proximity to scooters and nest locations in their area.

Bird

The electric scooter market has seen substantial growth with Bird’s recent billion dollar valuation  and their $300 million Series C round in the summer of 2018. Bird offers electric scooters that top out at 15 mph, cost $1 to unlock and 15 cents per minute of use. Bird scooters are in over 100 cities globally and they announced in late 2018 that they eclipsed 10 million scooter rides since their launch in 2017.

Bird scooters in Tel Aviv, Israel

Bird scooters in Tel Aviv, Israel

With all of these scooters populating cities, there’s much-needed demand for people to charge them. Since they are electric, someone needs to charge them! A charger can earn additional income for charging the scooters at their home and releasing them back into circulation at nest locations. The base price for charging each Bird is $5.00. It goes up from there when the Birds are harder to capture.

Data Collection and Machine Learning Pipeline

The full data pipeline for building “NestGenerator”

Data

From the details here, I was able to write a Python script that returned a list of Bird scooters within a specified area, their geolocation, unique ID, battery level and a nest ID.

I collected scooter data from four cities (Atlanta, Austin, Santa Monica, and Washington D.C.) across varying times of day over the course of four weeks. Collecting data from different cities was critical to the goal of training a machine learning model that would generalize well across cities.

Once equipped with the scooter’s latitude and longitude coordinates, I was able to leverage additional APIs and municipal data sources to get granular geolocation data to create an original scooter attribute and city feature dataset.

Data Sources:

  • Walk Score API: returns a walk score, transit score and bike score for any location.
  • Google Elevation API: returns elevation data for all locations on the surface of the earth.
  • Google Places API: returns information about places. Places are defined within this API as establishments, geographic locations, or prominent points of interest.
  • Google Reverse Geocoding API: reverse geocoding is the process of converting geographic coordinates into a human-readable address.
  • Weather Company Data: returns the current weather conditions for a geolocation.
  • LocationIQ: Nearby Points of Interest (PoI) API returns specified PoIs or places around a given coordinate.
  • OSMnx: Python package that lets you download spatial geometries and model, project, visualize, and analyze street networks from OpenStreetMap’s APIs.

Feature Engineering

After extensive API wrangling, which included a four-week prolonged data collection phase, I was finally able to put together a diverse feature set to train machine learning models. I engineered 38 features to classify if a scooter is currently in a nest.

Full Feature Set

Full Feature Set

The features boiled down into four categories:

  • Amenity-based: parks within a given radius, gas stations within a given radius, walk score, bike score
  • City Network Structure: intersection count, average circuity, street length average, average streets per node, elevation level
  • Distance-based: proximity to closest highway, primary road, secondary road, residential road
  • Scooter-specific attributes: battery level, proximity to closest scooter, high battery level (> 90%) scooters within a given radius, total scooters within a given radius

 

Log-Scale Transformation

For each feature, I plotted the distribution to explore the data for feature engineering opportunities. For features with a right-skewed distribution, where the mean is typically greater than the median, I applied these log transformations to normalize the distribution and reduce the variability of outlier observations. This approach was used to generate a log feature for proximity to closest scooter, closest highway, primary road, secondary road, and residential road.

An example of a log transformation

Statistical Analysis: A Systematic Approach

Next, I wanted to ensure that the features I included in my model displayed significant differences when broken up by nest classification. My thinking was that any features that did not significantly differ when stratified by nest classification would not have a meaningful predictive impact on whether a scooter was in a nest or not.

Distributions of a feature stratified by their nest classification can be tested for statistically significant differences. I used an unpaired samples t-test with a 0.01% significance level to compute a p-value and confidence interval to determine if there was a statistically significant difference in means for a feature stratified by nest classification. I rejected the null hypothesis if a p-value was smaller than the 0.01% threshold and if the 99.9% confidence interval did not straddle zero. By rejecting the null-hypothesis in favor of the alternative hypothesis, it’s deemed there is a significant difference in means of a feature by nest classification.

Battery Level Distribution Stratified by Nest Classification to run a t-test

Battery Level Distribution Stratified by Nest Classification to run a t-test

Log of Closest Scooter Distribution Stratified by Nest Classification to run a t-test

Throwing Away Features

Using the approach above, I removed ten features that did not display statistically significant results.

Statistically Insignificant Features Removed Before Model Development

Model Development

I trained two models, a random forest classifier and an extreme gradient boosting classifier since tree-based models can handle skewed data, capture important feature interactions, and provide a feature importance calculation. I trained the models on 70% of the data collected for all four cities and reserved the remaining 30% for testing.

After hyper-parameter tuning the models for performance on cross-validation data it was time to run the models on the 30% of test data set aside from the initial data collection.

I also collected additional test data from other cities (Columbus, Fort Lauderdale, San Diego) not involved in training the models. I took this step to ensure the selection of a machine learning model that would generalize well across cities. The performance of each model on the additional test data determined which model would be integrated into the application development.

Performance on Additional Cities Test Data

The Random Forest Classifier displayed superior performance across the board

The Random Forest Classifier displayed superior performance across the board

I opted to move forward with the random forest model because of its superior performance on AUC score and accuracy metrics on the additional cities test data. AUC is the Area under the ROC Curve, and it provides an aggregate measure of model performance across all possible classification thresholds.

AUC Score on Test Data for each Model

AUC Score on Test Data for each Model

Feature Importance

Battery level dominated as the most important feature. Additional important model features were proximity to high level battery scooters, proximity to closest scooter, and average distance to high level battery scooters.

Feature Importance for the Random Forest Classifier

Feature Importance for the Random Forest Classifier

The Trade-off Space

Once I had a working machine learning model for nest classification, I started to build out the application using the Flask web framework written in Python. After spending a few days of writing code for the application and incorporating the trained random forest model, I had enough to test out the basic functionality. I could finally run the application locally to call the Bird API and classify scooter’s into nests in real-time! There was one huge problem, though. It took more than seven minutes to generate the predictions and populate in the application. That just wasn’t going to cut it.

The question remained: will this model deliver in a production grade environment with the goal of making real-time classifications? This is a key trade-off in production grade machine learning applications where on one end of the spectrum we’re optimizing for model performance and on the other end we’re optimizing for low latency application performance.

As I continued to test out the application’s performance, I still faced the challenge of relying on so many APIs for real-time feature generation. Due to rate-limiting constraints and daily request limits across so many external APIs, the current machine learning classifier was not feasible to incorporate into the final application.

Run-Time Compliant Application Model

After going back to the drawing board, I trained a random forest model that relied primarily on scooter-specific features which were generated directly from the Bird API.

Through a process called vectorization, I was able to transform the geolocation distance calculations utilizing NumPy arrays which enabled batch operations on the data without writing any “for” loops. The distance calculations were applied simultaneously on the entire array of geolocations instead of looping through each individual element. The vectorization implementation optimized real-time feature engineering for distance related calculations which improved the application response time by a factor of ten.

Feature Importance for the Run-time Compliant Random Forest Classifier

Feature Importance for the Run-time Compliant Random Forest Classifier

This random forest model generalized well on test-data with an AUC score of 0.95 and an accuracy rate of 91%. The model retained its prediction accuracy compared to the former feature-rich model, but it gained 60x in application performance. This was a necessary trade-off for building a functional application with real-time prediction capabilities.

Geospatial Clustering

Now that I finally had a working machine learning model for classifying nests in a production grade environment, I could generate new nest locations for the non-nest scooters. The goal was to generate geospatial clusters based on the number of non-nest scooters in a given location.

The k-means algorithm is likely the most common clustering algorithm. However, k-means is not an optimal solution for widespread geolocation data because it minimizes variance, not geodetic distance. This can create suboptimal clustering from distortion in distance calculations at latitudes far from the equator. With this in mind, I initially set out to use the DBSCAN algorithm which clusters spatial data based on two parameters: a minimum cluster size and a physical distance from each point. There were a few issues that prevented me from moving forward with the DBSCAN algorithm.

  1. The DBSCAN algorithm does not allow for specifying the number of clusters, which was problematic as the goal was to generate a number of clusters as a function of non-nest scooters.
  2. I was unable to hone in on an optimal physical distance parameter that would dynamically change based on the Bird API data. This led to suboptimal nest locations due to a distortion in how the physical distance point was used in clustering. For example, Santa Monica, where there are ~15,000 scooters, has a higher concentration of scooters in a given area whereas Brookline, MA has a sparser set of scooter locations.

An example of how sparse scooter locations vs. highly concentrated scooter locations for a given Bird API call can create cluster distortion based on a static physical distance parameter in the DBSCAN algorithm. Left:Bird scooters in Brookline, MA. Right:Bird scooters in Santa Monica, CA.

An example of how sparse scooter locations vs. highly concentrated scooter locations for a given Bird API call can create cluster distortion based on a static physical distance parameter in the DBSCAN algorithm. Left:Bird scooters in Brookline, MA. Right:Bird scooters in Santa Monica, CA.

Given the granularity of geolocation scooter data I was working with, geospatial distortion was not an issue and the k-means algorithm would work well for generating clusters. Additionally, the k-means algorithm parameters allowed for dynamically customizing the number of clusters based on the number of non-nest scooters in a given location.

Once clusters were formed with the k-means algorithm, I derived a centroid from all of the observations within a given cluster. In this case, the centroids are the mean latitude and mean longitude for the scooters within a given cluster. The centroids coordinates are then projected as the new nest recommendations.

NestGenerator showcasing non-nest scooters and new nest recommendations utilizing the K-Means algorithm

NestGenerator showcasing non-nest scooters and new nest recommendations utilizing the K-Means algorithm.

NestGenerator Application

After wrapping up the machine learning components, I shifted to building out the remaining functionality of the application. The final iteration of the application is deployed to Heroku’s cloud platform.

In the NestGenerator app, a user specifies a location of their choosing. This will then call the Bird API for scooters within that given location and generate all of the model features for predicting nest classification using the trained random forest model. This forms the foundation for map filtering based on nest classification. In the app, a user has the ability to filter the map based on nest classification.

Drop-Down Map View filtering based on Nest Classification

Drop-Down Map View filtering based on Nest Classification

Nearest Generated Nest

To see the generated nest recommendations, a user selects the “Current Non-Nest Scooters & Predicted Nest Locations” filter which will then populate the application with these nest locations. Based on the user’s specified search location, a table is provided with the proximity of the five closest nests and an address of the Nest location to help inform a Bird charger in their decision-making.

NestGenerator web-layout with nest addresses and proximity to nearest generated nests

NestGenerator web-layout with nest addresses and proximity to nearest generated nests

Conclusion

By accurately predicting nest classification and clustering non-nest scooters, NestGenerator provides an automated recommendation engine for new nest locations. For Bird, this application can help power their nest location generation that runs within their Android and iOS applications. NestGenerator also provides real-time strategic insight for Bird chargers who are enticed to optimize their scooter collection and drop-off route based on scooters and nest locations in their area.

Code

The code for this project can be found on my GitHub

Comments or Questions? Please email me an E-Mail!

 

Interview – Knowledge Graphs and Semantic Technologies

“It’s incredibly empowering when data that is clear and understood – what we call ‘beautiful data’ – is available to the data workforce.”

Juan F. Sequeda is co-founder of Capsenta, a spin-off from his research, and Senior Director of Capsenta Labs. He is an expert on knowledge graphs, semantic web, semantic & graph data management and (ontology-based) data integration. In this interview Juan lets us know how SMEs can create value from data, what makes the Knowledge Graph so important and why CDOs and CIOs should use semantic technologies.

Data Science Blog: If you had to name five things that apply to SMEs as well as enterprises as they are on their journey through digital transformation: What are the most important steps to take in order to create value from data?

I would state four things:

  1. Focus on the business problem that needs to be solved instead of the technology.
  2. Getting value out of your data is a social-technical problem. Not everything can be solved by technology and automation. It is crucial to understand the social/human aspect of the problems.
  3. Avoid boiling the ocean. Be agile and iterate.
  4. Recall that it’s a marathon, not a sprint. Hence why you shouldn’t focus on boiling the ocean.

Data Science Blog: You help companies to make their company data meaningfully and thus increase their value. The magic word is the knowledge graph. What exactly is a Knowledge Graph?

Let’s recall that the term “knowledge graph”, that is being actively used today, was coined by Google in a 2012 blogpost. From an industry point of view, it’s a term that represents data integration, where not just entities but also relationships are first class citizens. In other words, it’s data integration based on graphs. That is why you see graph database companies use the term knowledge graph instead of data integration.

In the academic circle, there is a “debate” on what the term “knowledge graph” means. As academics, it’s clear that we should always strive to have well defined terms. Nevertheless, I find it ironic that academics are spending time debating on the definition of a term that appeared in a (marketing) blog post 7 years ago! I agree with Simeon Warner on this: “I care about putting more knowledge in my graph, instead of defining what is a knowledge graph”.

Whatever definition prevails, it should be open and inclusive.

On a final note, it is paramount that we remember our history in order to avoid reinventing the wheel. There is over half a century of research results that has led us to what we are calling Knowledge Graphs today. If you are interested, please check out our upcoming ISWC 2019 tutorial “Knowledge Graphs: How did we get here? A Half Day Tutorial on the History of Knowledge Graph’s Main Ideas“.

Data Science Blog: Speaking of Knowledge Graphs: According to SEMANTiCS 2019 Research and Innovation Chair Philippe Cudre-Mauroux the next generation of knowledge graphs will capture more detailed information. Towards which directions are you steering with gra.fo?

Gra.fo is a knowledge graph schema (i.e ontology) collaborative modeling tool combined with google doc style features such as real-time collaboration, comments, history and search.

Designing a knowledge graph schema is just the first step. You have to do something with it! The next step is to map the knowledge graph schema to underlying data sources in order to integrate data.

We are driving Gra.fo to also be a mapping management system. We recently released our first mapping features. You now have the ability to import existing R2RML mapping. The next step will be to create the mappings between relational databases and the schema all within Gra.fo. Furthermore, we will extend to support mappings from different types of sources.

Finally, there are so many features that our users are requesting. We are working on those and will also offer an API in order to empower users to develop their own apps and features.

Data Science Blog: At Capsenta, you are changing the way enterprises model, govern and integrate data. Put in brief, how can you explain the benefits of using semantic technologies and knowledge technologies to a CDO or CIO? Which clients could you serve and how did you help them?

Business users need to answer critical business questions quickly and accurately. However, the frequent bottleneck is the lack of understanding of the large and complex enterprise databases. Additionally, the IT experts who do understand are not always available. The ultimate goal is to empower business users to access the data in the way they think of their domain.

This is where Knowledge Graphs come into play.

At Capsenta, we use our Knowledge Graph technology to bridge this conceptualization gap between the complex and inscrutable data sources and the business intelligence and data analytic tools that domain experts use to answer critical business questions. Our goal is to deliver beautiful data so the business users and data scientist can run with the data.

We are helping large scale enterprises in e-commerce, oil & gas and life science industries to generate beautiful data.

Data Science Blog: What are reasons for which Knowledge Graphs should be part of any corporate strategy?

Graphs are very easy for people to understand and express the complex relationships between concepts. Bubbles and lines between them (i.e. a graph!) is what domain experts draw on the whiteboard all the time. We have even had C-level executives look at a Knowledge Graph and immediately see how it expresses a portion of their business and even offer suggestions for additional richness. Imagine that, C-level executives participating in an ontology engineering session because they understand the graph.

This is in sharp contrast to the data itself, which is almost always very difficult to understand and overwhelming in scope. Critical business value is available in a subset of this data. A Knowledge Graph bridges the conceptual gap between a critical portion of the inscrutable data itself and the business user’s view of their world.

It’s incredibly empowering when data that is clear and understood – what we call “beautiful data” – is available to the data workforce.

Data Science Blog: Data-driven process analyzes require interdisciplinary knowledge. What advice would you give to a process manager who wants to familiarize her-/himself with the topic?

Domain experts/business users frequently use multiple words/phrases to mean the same thing and also a specific phrase can mean different things to different people. Also, the domain experts/business users speak a very different language than the IT database owners.

How can the business have clear, accurate answers when there’s inconsistency in what people mean and are thinking?

This is the social problem of getting everyone on the same page. We’ve seen Knowledge Graphs dramatically help with this problem. The exercise of getting people to agree upon what they mean and encoding it in an intuitive Knowledge Graph is very powerful.

The Knowledge Graph also brings the IT stakeholders into the process by clarifying exactly what data or, typically, complex calculations of data is the actual, accurate value for each and every business concept and relationship expressed in the Knowledge Graph.

It is crucial to avoid boiling the ocean. That is why we have designed a pay-as-you-go methodology to start small and provide value as quickly and accurately as possible. Ideally, the team has available what we call a “Knowledge Engineer”. This is someone who can effectively speak with the business users/domain experts and also nerd out with the database folks.

About SEMANTiCS Conference

SEMANTiCS is an established knowledge hub where technology professionals, industry experts, researchers and decision makers can learn about new technologies, innovations and enterprise implementations in the fields of Linked Data and Semantic AI. Founded in 2005 the SEMANTiCS is the only European conference at the intersection of research and industry.

This year’s event is hosted by the Semantic Web Company, FIZ Karlsruhe – Leibniz Institute for Information Infrastructure GmbH, Fachhochschule St. Pölten Forschungs GmbH, KILT Competence Center am Institut für Angewandte Informatik e.V. and Vrije Universiteit Amsterdam.

Interview: Does Business Intelligence benefit from Cloud Data Warehousing?

Interview with Ross Perez, Senior Director, Marketing EMEA at Snowflake

Read this article in German:
“Profitiert Business Intelligence vom Data Warehouse in der Cloud?”

Does Business Intelligence benefit from Cloud Data Warehousing?

Ross Perez is the Senior Director, Marketing EMEA at Snowflake. He leads the Snowflake marketing team in EMEA and is charged with starting the discussion about analytics, data, and cloud data warehousing across EMEA. Before Snowflake, Ross was a product marketer at Tableau Software where he founded the Iron Viz Championship, the world’s largest and longest running data visualization competition.

Data Science Blog: Ross, Business Intelligence (BI) is not really a new trend. In 2019/2020, making data available for the whole company should not be a big thing anymore. Would you agree?

BI is definitely an old trend, reporting has been around for 50 years. People are accustomed to seeing statistics and data for the company at large, and even their business units. However, using BI to deliver analytics to everyone in the organization and encouraging them to make decisions based on data for their specific area is relatively new. In a lot of the companies Snowflake works with, there is a huge new group of people who have recently received access to self-service BI and visualization tools like Tableau, Looker and Sigma, and they are just starting to find answers to their questions.

Data Science Blog: Up until today, BI was just about delivering dashboards for reporting to the business. The data warehouse (DWH) was something like the backend. Today we have increased demand for data transparency. How should companies deal with this demand?

Because more people in more departments are wanting access to data more frequently, the demand on backend systems like the data warehouse is skyrocketing. In many cases, companies have data warehouses that weren’t built to cope with this concurrent demand and that means that the experience is slow. End users have to wait a long time for their reports. That is where Snowflake comes in: since we can use the power of the cloud to spin up resources on demand, we can serve any number of concurrent users. Snowflake can also house unlimited amounts of data, of both structured and semi-structured formats.

Data Science Blog: Would you say the DWH is the key driver for becoming a data-driven organization? What else should be considered here?

Absolutely. Without having all of your data in a single, highly elastic, and flexible data warehouse, it can be a huge challenge to actually deliver insight to people in the organization.

Data Science Blog: So much for the theory, now let’s talk about specific use cases. In general, it matters a lot whether you are storing and analyzing e.g. financial data or machine data. What do we have to consider for both purposes?

Financial data and machine data do look very different, and often come in different formats. For instance, financial data is often in a standard relational format. Data like this needs to be able to be easily queried with standard SQL, something that many Hadoop and noSQL tools were unable to provide. Luckily, Snowflake is an ansi-standard SQL data warehouse so it can be used with this type of data quite seamlessly.

On the other hand, machine data is often semi-structured or even completely unstructured. This type of data is becoming significantly more common with the rise of IoT, but traditional data warehouses were very bad at dealing with it since they were optimized for relational data. Semi-structured data like JSON, Avro, XML, Orc and Parquet can be loaded into Snowflake for analysis quite seamlessly in its native format. This is important, because you don’t want to have to flatten the data to get any use from it.

Both types of data are important, and Snowflake is really the first data warehouse that can work with them both seamlessly.

Data Science Blog: Back to the common business use case: Creating sales or purchase reports for the business managers, based on data from ERP-systems such as Microsoft or SAP. Which architecture for the DWH could be the right one? How many and which database layers do you see as necessary?

The type of report largely does not matter, because in all cases you want a data warehouse that can support all of your data and serve all of your users. Ideally, you also want to be able to turn it off and on depending on demand. That means that you need a cloud-based architecture… and specifically Snowflake’s innovative architecture that separates storage and compute, making it possible to pay for exactly what you use.

Data Science Blog: Where would you implement the main part of the business logic for the report? In the DWH or in the reporting tool? Does it matter which reporting tool we choose?

The great thing is that you can choose either. Snowflake, as an ansi-Standard SQL data warehouse, can support a high degree of data modeling and business logic. But you can also utilize partners like Looker and Sigma who specialize in data modeling for BI. We think it’s best that the customer chooses what is right for them.

Data Science Blog: Snowflake enables organizations to store and manage their data in the cloud. Does it mean companies lose control over their storage and data management?

Customers have complete control over their data, and in fact Snowflake cannot see, alter or change any aspect of their data. The benefit of a cloud solution is that customers don’t have to manage the infrastructure or the tuning – they decide how they want to store and analyze their data and Snowflake takes care of the rest.

Data Science Blog: How big is the effort for smaller and medium sized companies to set up a DWH in the cloud? Does this have to be an expensive long-term project in every case?

The nice thing about Snowflake is that you can get started with a free trial in a few minutes. Now, moving from a traditional data warehouse to Snowflake can take some time, depending on the legacy technology that you are using. But Snowflake itself is quite easy to set up and very much compatible with historical tools making it relatively easy to move over.

Allgemeines über Geodaten

Dieser Artikel ist der Auftakt in einer Artikelserie zum Thema “Geodatenanalyse”.

Von den vielen Arten an Datensätzen, die öffentlich im Internet verfügbar sind, bin ich in letzter Zeit vermehrt über eine besonders interessante Gruppe gestolpert, die sich gleich für mehrere Zwecke nutzen lassen: Geodaten.

Gerade in wirtschaftlicher Hinsicht bieten sich eine ganze Reihe von Anwendungsfällen, bei denen Geodaten helfen können, Einblicke in Tatsachen zu erlangen, die ohne nicht möglich wären. Der wohl bekannteste Fall hierfür ist vermutlich die einfache Navigation zwischen zwei Punkten, die jeder kennt, der bereits ein Navigationssystem genutzt oder sich eine Route von Google Maps berechnen lassen hat.
Hiermit können nicht nur Fragen nach dem schnellsten oder Energie einsparensten (und damit gleichermaßen auch witschaftlichsten) Weg z. B. von Berlin nach Hamburg beantwortet werden, sondern auch die bestmögliche Lösung für Ausnahmesituationen wie Stau oder Vollsperrungen berechnet werden (ja, Stau ist, zumindest in der Theorie immer noch eine “Ausnahmesituation” ;-)).
Neben dieser beliebten Art Geodaten zu nutzen, gibt es eine ganze Reihe weiterer Situationen in denen deren Nutzung hilfreich bis essentiell sein kann. Als Beispiel sei hier der Einzugsbereich von in Konkurrenz stehenden Einheiten, wie z. B. Supermärkten genannt. Ohne an dieser Stelle statistische Nachweise vorlegen zu können, kaufen (zumindest meiner persönlichen Beobachtung nach) die meisten Menschen fast immer bei dem Supermarkt ein, der am bequemsten zu erreichen ist und dies ist in der Regel der am nächsten gelegene. Besitzt man nun eine Datenbank mit der Information, wo welcher Supermarkt bzw. welche Supermarktkette liegt, kann man mit so genannten Voronidiagrammen recht einfach den jeweiligen Einzugsbereich der jeweiligen Supermärkte berechnen.
Entsprechende Karten können auch von beliebigen anderen Entitäten mit fester geographischer Position gezeichnet werden: Geldautomaten, Funkmasten, öffentlicher Nahverkehr, …

Ein anderes Beispiel, das für die Datenauswertung interessant ist, ist die kartographische Auswertung von Postleitzahlen. Diese sind in fast jedem Datensatz zu Kunden, Lieferanten, ect. vorhanden, bilden jedoch weder eine ordinale, noch eine sinnvolle kategorische Größe, da es viele tausend verschiedene gibt. Zudem ist auch eine einfache Gruppierung in gröbere Kategorien wie beispielsweise Postleitzahlen des Schemas 1xxxx oft kaum sinnvoll, da diese in aller Regel kein sinnvolles Mapping auf z. B. politische Gebiete – wie beispielsweise Bundesländer – zulassen. Ein Ausweg aus diesem Dilemma ist eine einfache kartographische Übersicht, welche die einzelnen Postleitzahlengebiete in einer Farbskala zeigt.

Im gezeigten Beispiel ist die Bevölkerungsdichte Deutschlands als Karte zu sehen. Hiermit wird schnell und übersichtlich deutlich, wo in Deutschland die Bevölkerung lokalisiert ist. Ähnliche Karten können beispielsweise erstellt werden, um Fragen wie “Wie ist meine Kundschaft verteilt?” oder “Wo hat die Werbekampange XYZ besonders gut funktioniert?” zu beantworten. Bezieht man weitere Daten wie die absolute Bevölkerung oder die Bevölkerungsdichte mit ein, können auch Antworten auf Fragen wie “Welchen Anteil der Bevölkerung habe ich bereits erreicht und wo ist noch nicht genutztes Potential?” oder “Ist mein Produkt eher in städtischen oder ländlichen Gebieten gefragt?” einfach und schnell gefunden werden.
Ohne die entsprechende geographische Zusatzinformation bleiben insbesondere Postleitzahlen leider oft als “nicht sinnvoll auswertbar” bei der Datenauswertung links liegen.
Eine ganz andere Art von Vorteil der Geodaten ist der educational point of view:
  • Wer erst anfängt, sich mit Datenbanken zu beschäftigen, findet mit Straßen, Postleitzahlen und Ländern einen deutlich einfacheren und vor allem besser verständlichen Zugang zu SQL als mit abstrakten Größen und Nummern wie ProductID, CustomerID und AdressID. Zudem lassen sich Geodaten nebenbei bemerkt mittels so genannter GeoInformationSystems (*gis-Programme), erstaunlich einfach und ansprechend plotten.
  • Wer sich mit SQL bereits ein wenig auskennt, kann mit den (beispielsweise von Spatialite oder PostGIS) bereitgestellten SQL-Funktionen eine ganze Menge über Datenbanken sowie deren Möglichkeiten – aber auch über deren Grenzen – erfahren.
  • Für wen relationale Datenbanken sowie deren Funktionen schon lange nichts Neues mehr darstellen, kann sich hier (selbst mit dem eigenen Notebook) erstaunlich einfach in das Thema “Bug Data” einarbeiten, da die Menge an öffentlich vorhandenen Geodaten z.B. des OpenStreetMaps-Projektes selbst in optimal gepackten Format vielen Dutzend GB entsprechen. Gerade die Möglichkeit, die viele *gis-Programme wie beispielsweise QGIS bieten, nämlich Straßen-, Schienen- und Stromnetze “on-the-fly” zu plotten, macht die Bedeutung von richtig oder falsch gesetzten Indices in verschiedenen Datenbanken allein anhand der Geschwindigkeit mit der sich die Plots aufbauen sehr eindrucksvoll deutlich.
Um an Datensätze zu kommen, reicht es in der Regel Google mit den entsprechenden Schlagworten zu versorgen.
Neben – um einen Vergleich zu nutzen – dem Brockhaus der Karten GoogleMaps gibt es beispielsweise mit dem OpenStreetMaps-Projekt einen freien Geodatensatz, welcher in diesem Kontext etwa als das Wikipedia der Karten zu verstehen ist.
Hier findet man zum Beispiel Daten wie Straßen-, Schienen- oder dem Stromnetz, aber auch die im obigen Voronidiagramm eingezeichneten Gebäude und Supermärkte stammen aus diesem Datensatz. Hiermit lassen sich recht einfach just for fun interessante Dinge herausfinden, wie z. B., dass es in Deutschland ca. 28 Mio Gebäude gibt (ein SQL-Einzeiler), dass der Berliner Osten auch ca. 30 Jahre nach der Wende noch immer vorwiegend von der Tram versorgt wird, während im Westen hauptsächlich die U-Bahn fährt. Oder über welche Trassen der in der Nordsee von Windkraftanlagen erzeugte Strom auf das Festland kommt und von da aus weiter verteilt wird.
Eher grundlegende aber deswegen nicht weniger nützliche Datensätze lassen sich unter dem Stichwort “natural earth” finden. Hier sind Daten wie globale Küstenlinien, mittels Echolot ausgemessene Meerestiefen, aber auch von Menschen geschaffene Dinge wie Landesgrenzen und Städte sehr übersichtlich zu finden.
Im Grunde sind der Vorstellung aber keinerlei Grenzen gesetzt und fast alle denkbaren geographischen Fakten können, manchmal sogar live via Sattelit, mitverfolgt werden. So kann man sich beispielsweise neben aktueller Wolkenbedekung, Regenradar und globaler Oberflächentemperatur des Planeten auch das Abschmelzen der Polkappen seit 1970 ansehen (NSIDC) oder sich live die Blitzeinschläge auf dem gesamten Planeten anschauen – mit Vorhersage darüber, wann und wo der Donner zu hören ist (das funktioniert wirklich! Beispielsweise auf lightningmaps).
Kurzum Geodaten sind neben ihrer wirtschaftlichen Relevanz – vor allem für die Logistik – auch für angehende Data Scientists sehr aufschlussreich und ein wunderbares Spielzeug, mit dem man sich lange beschäftigen und eine Menge interessanter Dinge herausfinden kann.

Attribution Models in Marketing

Attribution Models

A Business and Statistical Case

INTRODUCTION

A desire to understand the causal effect of campaigns on KPIs

Advertising and marketing costs represent a huge and ever more growing part of the budget of companies. Studies have found out this share is as high as 10% and increases with the size of companies (CMO study by American Marketing Association and Duke University, 2017). Measuring precisely the impact of a specific marketing campaign on the sales of a company is a critical step towards an efficient allocation of this budget. Would the return be higher for an euro spent on a Facebook ad, or should we better spend it on a TV spot? How much should I spend on Twitter ads given the volume of sales this channel is responsible for?

Attribution Models have lately received great attention in Marketing departments to answer these issues. The transition from offline to online marketing methods has indeed permitted the collection of multiple individual data throughout the whole customer journey, and  allowed for the development of user-centric attribution models. In short, Attribution Models use the information provided by Tracking technologies such as Google Analytics or Webtrekk to understand customer journeys from the first click on a Facebook ad to the final purchase and adequately ponderate the different marketing campaigns encountered depending on their responsibility in the final conversion.

Issues on Causal Effects

A key question then becomes: how to declare a channel is responsible for a purchase? In other words, how can we isolate the causal effect or incremental value of a campaign ?

          1. A/B-Tests

One method to estimate the pure impact of a campaign is the design of randomized experiments, wherein a control and treated groups are compared.  A/B tests belong to this broad category of randomized methods. Provided the groups are a priori similar in every aspect except for the treatment received, all subsequent differences may be attributed solely to the treatment. This method is typically used in medical studies to assess the effect of a drug to cure a disease.

Main practical issues regarding Randomized Methods are:

  • Assuring that control and treated groups are really similar before treatment. Uually a random assignment (i.e assuring that on a relevant set of observable variables groups are similar) is realized;
  • Potential spillover-effects, i.e the possibility that the treatment has an impact on the non-treated group as well (Stable unit treatment Value Assumption, or SUTVA in Rubin’s framework);
  • The costs of conducting such an experiment, and especially the costs linked to the deliberate assignment of individuals to a group with potentially lower results;
  • The number of such experiments to design if multiple treatments have to be measured;
  • Difficulties taking into account the interaction effects between campaigns or the effect of spending levels. Indeed, usually A/B tests are led by cutting off temporarily one campaign entirely and measuring the subsequent impact on KPI’s compared to the situation where this campaign is maintained;
  • The dynamical reproduction of experiments if we assume that treatment effects may change over time.

In the marketing context, multiple campaigns must be tested in a dynamical way, and treatment effect is likely to be heterogeneous among customers, leading to practical issues in the lauching of A/B tests to approximate the incremental value of all campaigns. However, sites with a lot of traffic and conversions can highly benefit from A/B testing as it provides a scientific and straightforward way to approximate a causal impact. Leading companies such as Uber, Netflix or Airbnb rely on internal tools for A/B testing automation, which allow them to basically test any decision they are about to make.

References:

Books:

Experiment!: Website conversion rate optimization with A/B and multivariate testing, Colin McFarland, ©2013 | New Riders  

A/B testing: the most powerful way to turn clicks into customers. Dan Siroker, Pete Koomen; Wiley, 2013.

Blogs:

https://eng.uber.com/xp

https://medium.com/airbnb-engineering/growing-our-host-community-with-online-marketing-9b2302299324

Study:

https://cmosurvey.org/wp-content/uploads/sites/15/2018/08/The_CMO_Survey-Results_by_Firm_and_Industry_Characteristics-Aug-2018.pdf

        2. Attribution models

Attribution Models do not demand to create an experimental setting. They take into account existing data and derive insights from the variability of customer journeys. One key difficulty is then to differentiate correlation and causality in the links observed between the exposition to campaigns and purchases. Indeed, selection effects may bias results as exposure to campaigns is usually dependant on user-characteristics and thus may not be necessarily independant from the customer’s baseline conversion probabilities. For example, customers purchasing from a discount price comparison website may be intrinsically different from customers buying from FB ad and this a priori difference may alone explain post-exposure differences in purchasing bahaviours. This intrinsic weakness must be remembered when interpreting Attribution Models results.

                          2.1 General Issues

The main issues regarding the implementation of Attribution Models are linked to

  • Causality and fallacious reasonning, as most models do not take into account the aforementionned selection biases.
  • Their difficult evaluation. Indeed, in almost all attribution models (except for those based on classification, where the accuracy of the model can be computed), the additionnal value brought by the use of a given attribution models cannot be evaluated using existing historical data. This additionnal value can only be approximated by analysing how the implementation of the conclusions of the attribution model have impacted a given KPI.
  • Tracking issues, leading to an uncorrect reconstruction of customer journeys
    • Cross-device journeys: cross-device issue arises from the use of different devices throughout the customer journeys, making it difficult to link datapoints. For example, if a customer searches for a product on his computer but later orders it on his mobile, the AM would then mistakenly consider it an order without prior campaign exposure. Though difficult to measure perfectly, the proportion of cross-device orders can approximate 20-30%.
    • Cookies destruction makes it difficult to track the customer his the whole journey. Both regulations and consumers’ rising concerns about data privacy issues mitigate the reliability and use of cookies.1 – From 2002 on, the EU has enacted directives concerning privacy regulation and the extended use of cookies for commercial targeting purposes, which have highly impacted marketing strategies, such as the ‘Privacy and Electronic Communications Directive’ (2002/58/EC). A research was conducted and found out that the adoption of this ‘Privacy Directive’ had led to 64% decrease in advertising methods compared to the rest of the world (Goldfarb et Tucker (2011)). The effect was stronger for generalized sites (Yahoo) than for specialized sites.2 – Users have grown more and more conscious of data privacy issues and have adopted protective measures concerning data privacy, such as automatic destruction of cookies after a session is ended, or simply giving away less personnal information (Goldfarb et Tucker (2012) ) .Valuable user information may be lost, though tracking technologies evolution have permitted to maintain tracking by other means. This issue may be particularly important in countries highly concerned with data privacy issues such as Germany.
    • Offline/Online bridge: an Attribution Model should take into account all campaigns to draw valuable insights. However, the exposure to offline campaigns (TV, newspapers) are difficult to track at the user level. One idea to tackle this issue would be to estimate the proportion of conversions led by offline campaigns through AB testing and deduce this proportion from the credit assigned to the online campaigns accounted for in the Attribution Model.
    • Touch point information available: clicks are easy to follow but irrelevant to take into account the influence of purely visual campaigns such as display ads or video.

                          2.2 Today’s main practices

Two main families of Attribution Models exist:

  • Rule-Based Attribution Models, which have been used for in the last decade but from which companies are gradualy switching.

Attribution depends on the individual journeys that have led to a purchase and is solely based on the rank of the campaign in the journey. Some models focus on a single touch points (First Click, Last Click) while others account for multi-touch journeys (Bathtube, Linear). It can be calculated at the customer level and thus doesn’t require large amounts of data points. We can distinguish two sub-groups of rule-based Attribution Models:

  • One Touch Attribution Models attribute all credit to a single touch point. The First-Click model attributes all credit for a converion to the first touch point of the customer journey; last touch attributes all credit to the last campaign.
  • Multi-touch Rule-Based Attribution Models incorporate information on the whole customer journey are thus an improvement compared to one touch models. To this family belong Linear model where credit is split equally between all channels, Bathtube model where 40% of credit is given to first and last clicks and the remaining 20% is distributed equally between the middle channels, or time-decay models where credit assigned to a click diminishes as the time between the click and the order increases..

The main advantages of rule-based models is their simplicity and cost effectiveness. The main problems are:

– They are a priori known and can thus lead to optimization strategies from competitors
– They do not take into account aggregate intelligence on customer journeys and actual incremental values.
– They tend to bias (depending on the model chosen) channels that are over-represented at the beggining or end of the funnel, according to theoretical assumptions that have no observationnal back-ups.

  • Data-Driven Attribution Models

These models take into account the weaknesses of rule-based models and make a relevant use of available data. Being data-driven, following attribution models cannot be computed using single user level data. On the contrary values are calculated through data aggregation and thus require a certain volume of customer journey information.

References:

https://dspace.mit.edu/handle/1721.1/64920

 

        3. Data-Driven Attribution Models in practice

                          3.1 Issues

Several issues arise in the computation of campaigns individual impact on a given KPI within a data-driven model.

  • Selection biases: Exposure to certain types of advertisement is usually highly correlated to non-observable variables which are in turn correlated to consumption practices. Differences in the behaviour of users exposed to different campaigns may thus only be driven by core differences in conversion probabilities between groups whether than by the campaign effect.
  • Complementarity: it may be that campaigns A and B only have an effect when combined, so that measuring their individual impact would lead to misleading conclusions. The model could then try to assess the effect of combinations of campaigns on top of the effect of individual campaigns. As the number of possible non-ordered combinations of k campaigns is 2k, it becomes clear that inclusing all possible combinations would however be time-consuming.
  • Order-sensitivity: The effect of a campaign A may depend on the place where it appears in the customer journey, meaning the rank of a campaign and not merely its presence could be accounted for in the model.
  • Relative Order-sensitivity: it may be that campaigns A and B only have an effect when one is exposed to campaign A before campaign B. If so, it could be useful to assess the effect of given combinations of campaigns as well. And this for all campaigns, leading to tremendous numbers of possible combinations.
  • All previous phenomenon may be present, increasing even more the potential complexity of a comprehensive Attribution Model. The number of all possible ordered combination of k campaigns is indeed :

 

                          3.2 Main models

                                  A) Logistic Regression and Classification models

If non converting journeys are available, Attribition Model can be shaped as a simple classification issue. Campaign types or campaigns combination and volume of campaign types can be included in the model along with customer or time variables. As we are interested in inference (on campaigns effect) whether than prediction, a parametric model should be used, such as Logistic Regression. Non paramatric models such as Random Forests or Neural Networks can also be used though the interpretation of campaigns value would be more difficult to derive from the model results.

A common pitfall is the usual issue of spurious correlations on one hand and the correct interpretation of coefficients in business terms.

An advantage if the possibility to evaluate the relevance of the model using common model validation methods to evaluate its predictive power (validation set \ AUC \pseudo R squared).

                                  B) Shapley Value

Theory

The Shapley Value is based on a Game Theory framework and is named after its creator, the Nobel Price Laureate Lloyd Shapley. Initially meant to calculate the marginal contribution of players in cooperative games, the model has received much attention in research and industry and has lately been applied to marketing issues. This model is typically used by Google Adords and other ad bidding vendors. Campaigns or marketing channels are in this model seen as compementary players looking forward to increasing a given KPI.
Contrarily to Logistic Regressions, it is a non-parametric model. Contrarily to Markov Chains, all results are built using existing journeys, and not simulated ones.

Channels are considered to enter the game sequentially under a certain joining order. Shapley value try to The Shapley value of channel i is the weighted sum of the marginal values that channel i adds to all possible coalitions that don’t contain channel i.
In other words, the main logic is to analyse the difference of gains when a channel i is added after a coalition Ck of k channels, k<=n. We then sum all the marginal contributions over all possible ordered combination Ck of all campaigns excluding i, with k<=n-1.

Subsets framework

A first an most usual way to compute the Shapley Vaue is to consider that when a channel enters coalition, its additionnal value is the same irrelevant of the order in which previous channels have appeared. In other words, journeys (A>B>C) and (B>A>C) trigger the same gains.
Shapley value is computed as the gains associated to adding a channel i to a subset of channels, weighted by the number of (ordered) sequences that the (unordered) subset represents, summed up on all possible subsets of the total set of campaigns where the channel i is not present.
The Shapley value of the channel 𝑥𝑗 is then:

where |S| is the number of campaigns of a coalition S and the sum extends over all subsets S that do not not contain channel j. 𝜈(𝑆)  is the value of the coalition S and 𝜈(𝑆 ∪ {𝑥𝑗})  the value of the coalition formed by adding 𝑥𝑗 to coalition S. 𝜈(𝑆 ∪ {𝑥𝑗}) − 𝜈(𝑆) is thus the marginal contribution of channel 𝑥𝑗 to the coalition S.

The formula can be rewritten and understood as:

This method is convenient when data on the gains of on all possible permutations of all unordered k subsets of the n campaigns are available. It is also more convenient if the order of campaigns prior to the introduction of a campaign is thought to have no impact.

Ordered sequences

Let us define 𝜈((A>B)) as the value of the sequence A then B. What is we let 𝜈((A>B)) be different from 𝜈((B>A)) ?
This time we would need to sum over all possible permutation of the S campaigns present before  𝑥𝑗 and the N-(S+1) campaigns after 𝑥𝑗. Doing so we will sum over all possible orderings (i.e all permutations of the n campaigns of the grand coalition containing all campaigns) and we can remove the permutation coefficient s!(p-s+1)!.

This method is convenient when the order of channels prior to and after the introduction of another channel is assumed to have an impact. It is also necessary to possess data for all possible permutations of all k subsets of the n campaigns, and not only on all (unordered) k-subsets of the n campaigns, k<=n. In other words, one must know the gains of A, B, C, A>B, B>A, etc. to compute the Shapley Value.

Differences between the two approaches

We simulate an ordered case where the value for each ordered sequence k for k<=3 is known. We compare it to the usual Shapley value calculated based on known gains of unordered subsets of campaigns. So as to compare relevant values, we have built the gains matrix so that the gains of a subset A, B i.e  𝜈({B,A}) is the average of the gains of ordered sequences made up with A and B (assuming the number of journeys where A>B equals the number of journeys where B>A, we have 𝜈({B,A})=0.5( 𝜈((A>B)) + 𝜈((B>A)) ). We let the value of the grand coalition be different depending on the order of campaigns-keeping the constraints that it averages to the value used for the unordered case.

Note: mvA refers to the marginal value of A in a given sequence.
With traditionnal unordered coalitions:

With ordered sequences used to compute the marginal values:

 

We can see that the two approaches yield very different results. In the unordered case, the Shapley Value campaign C is the highest, culminating at 20, while A and B have the same Shapley Value mvA=mvB=15. In the ordered case, campaign A has the highest Shapley Value and all campaigns have different Shapley Values.

This example illustrates the inherent differences between the set and sequences approach to Shapley values. Real life data is more likely to resemble the ordered case as conversion probabilities may for any given set of campaigns be influenced by the order through which the campaigns appear.

Advantages

Shapley value has become popular in allocation problems in cooperative games because it is the unique allocation which satisfies different axioms:

  • Efficiency: Shaple Values of all channels add up to the total gains (here, orders) observed.
  • Symmetry: if channels A and B bring the same contribution to any coalition of campaigns, then their Shapley Value i sthe same
  • Null player: if a channel brings no additionnal gains to all coalitions, then its Shapley Value is zero
  • Strong monotony: the Shapley Value of a player increases weakly if all its marginal contributions increase weakly

These properties make the Shapley Value close to what we intuitively define as a fair attribution.

Issues

  • The Shapley Value is based on combinatory mathematics, and the number of possible coalitions and ordered sequences becomes huge when the number of campaigns increases.
  • If unordered, the Shapley Value assumes the contribution of campaign A is the same if followed by campaign B or by C.
  • If ordered, the number of combinations for which data must be available and sufficient is huge.
  • Channels rarely present or present in long journeys will be played down.
  • Generally, gains are supposed to grow with the number of players in the game. However, it is plausible that in the marketing context a journey with a high number of channels will not necessarily bring more orders than a journey with less channels involved.

References:

R package: GameTheoryAllocation

Article:
Zhao & al, 2018 “Shapley Value Methods for Attribution Modeling in Online Advertising “
https://link.springer.com/content/pdf/10.1007/s13278-017-0480-z.pdf
Courses: https://www.lamsade.dauphine.fr/~airiau/Teaching/CoopGames/2011/coopgames-7%5b8up%5d.pdf
Blogs: https://towardsdatascience.com/one-feature-attribution-method-to-supposedly-rule-them-all-shapley-values-f3e04534983d

                                  B) Markov Chains

Markov Chains are used to model random processes, i.e events that occur in a sequential manner and in such a way that the probability to move to a certain state only depends on the past steps. The number of previous steps that are taken into account to model the transition probability is called the memory parameter of the sequence, and for the model to have a solution must be comprised between 0 and 4. A Markov Chain process is thus defined entirely by its Transition Matrix and its initial vector (i.e the starting point of the process).

Markov Chains are applied in many scientific fields. Typically, they are used in weather forecasting, with the sequence of Sunny and Rainy days following a Markov Process of memory parameter 0, so that for each given day the probability that the next day will be rainy or sunny only depends on the weather of the current day. Other applications can be found in sociology to understand the dynamics of social classes intergenerational reproduction. To get more both mathematical and applied illustration, I recommend the reading of this course.

In the marketing context, Markov Chains are an interesting way to model the conversion funnel. To go from the from the Markov Model to the Attribution logic, we calculate the Removal Effect of each channel, i.e the difference in conversions that happen if the channel is removed. Please read below for an introduction to the methodology.

The first step in a Markov Chains Attribution Model is to build the transition matrix that captures the transition probabilities between the campaigns accross existing customer journeys. This Matrix is to be read as a “From state A to state B” table, from the left to the right. A first difficulty is finding the right memory parameter to use. A large memory parameter would allow to take more into account interraction effects within the conversion funnel but would lead to increased computationnal time, a non-readable transition matrix, and be more sensitive to noisy data. Please note that this transition matrix provides useful information on the conversion funnel and on the relationships between campaigns and can be used as such as an analytical tool. I suggest the clear and easily R code which can be found here or here.

Here is an illustration of a Markov Chain with memory Parameter of 0: the probability to go to a certain campaign B in the next step only depend on the campaign we are currently at:

The associated Transition Matrix is then (with null probabilities left as Blank):

The second step is  to compute the actual responsibility of a channel in total conversions. As mentionned above, the main philosophy to do so is to calculate the Removal Effect of each channel, i.e the changes in the number of conversions when a channel is entirely removed. All customer journeys which went through this channel are settled out to be unsuccessful. This calculation is done by applying the transition matrix with and without the removed channels to an initial vector that contains the number of desired simulations.

Building on our current example, we can then settle an initial vector with the desired number of simulations, e.g 10 000:

 

It is possible at this stage to add a constraint on the maximum number of times the matrix is applied to the data, i.e on the maximal number of campaigns a simulated journey is allowed to have.

Advantages

  • The dynamic journey is taken into account, as well as the transition between two states. The funnel is not assumed to be linear.
  • It is possile to build a conversion graph that maps the customer journey provides valuable insights.
  • It is possible to evaluate partly the accuracy of the Attribution Model based on Markov Chains. It is for example possible to see how well the transition matrix help predict the future by analysing the number of correct predictions at any given step over all sequences.

Disadvantages

  • It can be somewhat difficult to set the memory parameter. Complementarity effects between channels are not well taken into account if the memory is low, but a parameter too high will lead to over-sensitivity to noise in the data and be difficult to implement if customer journeys tend to have a number of campaigns below this memory parameter.
  • Long journeys with different channels involved will be overweighted, as they will count many times in the Removal Effect.  For example, if there are n-1 channels in the customer journey, this journey will be considered as failure for the n-1 channel-RE. If the volume effects (i.e the impact of the overall number of channels in a journey, irrelevant from their type° are important then results may be biased.

References:

R package: ChannelAttribution

Git:

https://github.com/MatCyt/Markov-Chain/blob/master/README.md

Course:

https://www.ssc.wisc.edu/~jmontgom/markovchains.pdf

Article:

“Mapping the Customer Journey: A Graph-Based Framework for Online Attribution Modeling”; Anderl, Eva and Becker, Ingo and Wangenheim, Florian V. and Schumann, Jan Hendrik, 2014. Available at SSRN: https://ssrn.com/abstract=2343077 or http://dx.doi.org/10.2139/ssrn.2343077

“Media Exposure through the Funnel: A Model of Multi-Stage Attribution”, Abhishek & al, 2012

“Multichannel Marketing Attribution Using Markov Chains”, Kakalejčík, L., Bucko, J., Resende, P.A.A. and Ferencova, M. Journal of Applied Management and Investments, Vol. 7 No. 1, pp. 49-60.  2018

Blogs:

https://analyzecore.com/2016/08/03/attribution-model-r-part-1

https://analyzecore.com/2016/08/03/attribution-model-r-part-2

                          3.3 To go further: Tackling selection biases with Quasi-Experiments

Exposure to certain types of advertisement is usually highly correlated to non-observable variables. Differences in the behaviour of users exposed to different campaigns may thus only be driven by core differences in converison probabilities between groups whether than by the campaign effect. These potential selection effects may bias the results obtained using historical data.

Quasi-Experiments can help correct this selection effect while still using available observationnal data.  These methods recreate the settings on a randomized setting. The goal is to come as close as possible to the ideal of comparing two populations that are identical in all respects except for the advertising exposure. However, populations might still differ with respect to some unobserved characteristics.

Common quasi-experimental methods used for instance in Public Policy Evaluation are:

  • Discontinuity Regressions
  • Matching Methods, such as Exact Matching,  Propensity-score matching or k-nearest neighbourghs.

References:

Article:

“Towards a digital Attribution Model: Measuring the impact of display advertising on online consumer behaviour”, Anindya Ghose & al, MIS Quarterly Vol. 40 No. 4, pp. 1-XX, 2016

https://pdfs.semanticscholar.org/4fa6/1c53f281fa63a9f0617fbd794d54911a2f84.pdf

        4. First Steps towards a Practical Implementation

Identify key points of interests

  • Identify the nature of touchpoints available: is the data based on clicks? If so, is there a way to complement the data with A/B tests to measure the influence of ads without clicks (display, video) ? For example, what happens to sales when display campaign is removed? Analysing this multiplier effect would give the overall responsibility of display on sales, to be deduced from current attribution values given to click-based channels. More interestingly, what is the impact of the removal of display campaign on the occurences of click-based campaigns ? This would give us an idea of the impact of display ads on the exposure to each other campaigns, which would help correct the attribution values more precisely at the campaign level.
  • Define the KPI to track. From a pure Marketing perspective, looking at purchases may be sufficient, but from a financial perspective looking at profits, though a bit more difficult to compute, may drive more interesting results.
  • Define a customer journey. It may seem obvious, but the notion needs to be clarified at first. Would it be defined by a time limit? If so, which one? Does it end when a conversion is observed? For example, if a customer makes 2 purchases, would the campaigns he’s been exposed to before the first order still be accounted for in the second order? If so, with a time decay?
  • Define the research framework: are we interested only in customer journeys which have led to conversions or in all journeys? Keep in mind that successful customer journeys are a non-representative sample of customer journeys. Models built on the analysis of biased samples may be conservative. Take an extreme example: 80% of customers who see campaign A buy the product, VS 1% for campaign B. However, campaign B exposure is great and 100 Million people see it VS only 1M for campaign A. An Attribution Model based on successful journeys will give higher credit to campaign B which is an auguable conclusion. Taking into account costs per campaign (in the case where costs are calculated by clicks) may of course tackle this issue partly, as campaign A could then exhibit higher returns, but a serious fallacious reasonning is at stake here.

Analyse the typical customer journey    

  • Performing a duration analysis on the data may help you improve the definition of the customer journey to be used by your organization. After which days are converison probabilities null? Should we consider the effect of campaigns disappears after x days without orders? For example, if 99% of orders are placed in the 30 days following a first click, it might be interesting to define the customer journey as a 30 days time frame following the first oder.
  • Look at the distribution of the number of campaigns in a typical journey. If you choose to calculate the effect of campaigns interraction in your Attribution Model, it may indeed help you determine the maximum number of campaigns to be included in a combination. Indeed, you may not need to assess the impact of channel combinations with above than 4 different channels if 95% of orders are placed after less then 4 campaigns.
  • Transition matrixes: what if a campaign A systematically leads to a campaign B? What happens if we remove A or B? These insights would give clues to ask precise questions for a latter AB test, for example to find out if there is complementarity between channels A and B – (implying none should be removed) or mere substitution (implying one can be given up).
  • If conversion rates are available: it can be interesting to perform a survival analysis i.e to analyse the likelihood of conversion based on duration since first click. This could help us excluse potential outliers or individuals who have very low conversion probabilities.

Summary

Attribution is a complex topic which will probably never be definitively solved. Indeed, a main issue is the difficulty, or even impossibility, to evaluate precisely the accuracy of the attribution model that we’ve built. Attribution Models should be seen as a good yet always improvable approximation of the incremental values of campaigns, and be presented with their intrinsinc limits and biases.

Was der BREXIT für die Cloud-Strategie bedeutet

Datensouveränität wird nach dem Brexit eine der größten Herausforderungen für Unternehmen sein. Geschäftsführer sind sich der Bedeutung dessen bewusst und fürchten die Gefahr eines „Data cliff edge“, wenn die Trennung Großbritanniens von der EU endgültig beschlossene Sache sein wird.

Ohne ein klares Gespür dafür zu haben, welche Vorschriften und Compliance-Anforderungen bald gelten werden, versuchen britische Unternehmen herauszufinden, wie sie ihre Daten bestmöglich schützen, Geschäftsverzögerungen verhindern und kostspielige Fehler vermeiden können. Die Vieldeutigkeit rund um den Brexit wirft mehr Fragen als Antworten auf, darunter: Wo sollten britische Unternehmen ihre Daten speichern? Sollten sie alle ihre Rechenzentren nach Großbritannien verlegen? Wie wirkt sich der Besitz von Rechenzentren auf den Datenschutz aus? Welche Bedrohungen bestehen, wenn nach Abschluss des Brexit Daten innerhalb oder außerhalb des Vereinigten Königreichs gespeichert werden?

Für Führungskräfte sind der Mangel an Antworten und die Angst vor dem Unbekannten frustrierend. In dieser ungewissen Zeit können smarte Geschäftsführer aber den Brexit für ihre Zwecke lenken, indem sie ihn als Chance und nicht als Hindernis für sich nutzen.

Die unsicher regulierte Zukunft

Für Unternehmen mit Sitz in Großbritannien, die Datenspeicherung und private Cloud-Dienste anbieten, ist vor allem der Ort, an dem sich die Daten befinden, von Belang. Die Gewährleistung der Sicherheit und Kontrolle über eigene Daten ist von zentraler Bedeutung. Gleichzeitig ist jedoch auch die Einhaltung unbekannter zukünftiger Vorschriften und Gesetze zum Datenschutz und zum Datentransfer ein Muss.

Grundlage ist die Einhaltung der Datenschutzverordnung (DSGVO) vom 25. Mai 2018, da das Vereinigte Königreich zu diesem Zeitpunkt noch immer Teil der EU war. Nach Angaben des Information Commissioner’s Office (ICO) des Vereinigten Königreichs – einer unabhängigen Behörde, die sich für die Wahrung von Informations- und Datenschutzrechten von Einzelpersonen einsetzt – bestätigte die britische Regierung, dass ein Austritt aus der EU keine Auswirkungen auf die DSGVO haben wird. Was in diesem Jahr, wenn sich Großbritannien und die EU endgültig voneinander trennen, passieren wird, kann man nur vermuten. Die Ratschläge von ICO sind richtungsweisend: „Bereiten Sie sich darauf vor, die Bestimmungen der DSGVO zu erfüllen und voranzukommen.“

Bemerkenswerterweise schreibt die DSGVO nicht vor, wo Unternehmen ihre Daten aufbewahren müssen. Es ist lediglich erforderlich, dass die EU-Organisationen ihre Daten innerhalb der EU speichern und außerhalb der EU unzugänglich machen müssen. Ausnahme: die Daten betreffen eine DSGVO-konforme Organisation. Wie sich dieses Mandat auf das Vereinigte Königreich auswirkt, muss noch gesehen werden. Denn das Vereinigte Königreich war ja zum Zeitpunkt der Ausarbeitung der Verordnung Teil der EU. Es ist unklar, ob das Vereinigte Königreich am Ende mit der DSGVO konform sein wird.

Aus globaler Sicht muss Großbritannien herausfinden, wie der Datenaustausch und der grenzüberschreitende Datenfluss reguliert werden können. Der freie Datenfluss ist wichtig für Unternehmen und Innovation, was bedeutet, dass das Vereinigte Königreich Vereinbarungen, wie die EU sie mit den USA getroffen haben, benötigt. Ein Privacy Shield, das den Austausch personenbezogener Daten zu gewerblichen Zwecken ermöglicht. Ob das Vereinigte Königreich Vereinbarungen wie den Privacy Shield umsetzen kann, oder neue Vereinbarungen mit Ländern wie den USA treffen muss, ist etwas, was nur die Zeit zeigen wird.

Wo sind die Daten?

Rechenzentren können heute durch freien Datenfluss, sowohl im Vereinigten Königreich als auch in der EU betrieben werden. Das Vereinigte Königreich unterliegt gleichem Schutz und gleichen Vorschriften wie die EU. Viele Spekulationen beinhalten allerdings, dass in naher Zukunft britische Kunden von einem in Großbritannien ansässigen Rechenzentrum bedient werden müssen, ebenso wie europäische Kunden ein EU-Rechenzentrum benötigen. Es gibt keine Garantien. Unklar ist auch, ob diese Situation die Anbieter von Rechenzentren dazu veranlassen wird, den Umzug aus Großbritannien in Betracht zu ziehen, um sich stärker auf den Kontinent zu konzentrieren, oder ob sie sich an beiden Standorten gleichzeitig niederlassen werden. Das Wahrscheinlichste: Die Anbieter tendieren zu letzterem, wie auch Amazon Web Services (AWS). Selbst nach dem Brexit-Votum hielt Amazon an seinem Wort fest und eröffnete Ende letzten Jahres sein erstes AWS-Rechenzentrum in London. Dies unterstreicht sowohl sein Engagement für Großbritannien als auch das unternehmerische Engagement.

Aus dem Brexit eine Geschäftsmöglichkeit machen

Die Automatisierung des IT-Betriebs und die Einführung einer Cloud-Strategie könnten die ersten Schritte sein, um die unbeantworteten Fragen des Brexit zu lösen und daraus einen Vorteil zu machen. Es ist an der Zeit, die Vorteile dessen zu erkennen, teure Hardware und Software von Unternehmen vor Ort durch den Umstieg auf die öffentliche Cloud zu ersetzen. Dies ist nicht nur die kostengünstigere Option. Cloud-Anbieter wie AWS, Microsoft Azure und Google Cloud Platform (GCP) ersparen in diesem politischen Umfeld sogar Unternehmen die Verwaltung und Wartung von Rechenzentren. Einige Unternehmen sind möglicherweise besorgt über die steigenden Raten von Public-Cloud-Anbietern, ihre Preisanpassungen scheinen jedoch an den relativen Wertverlust des Sterlings gebunden zu sein. Selbst bei geringen Erhöhungen sind die Preise einiger Anbieter, wie AWS, noch immer deutlich niedriger als die Kosten, die mit dem Betrieb von Rechenzentren und privaten Clouds vor Ort verbunden sind, insbesondere wenn Wartungskosten einbezogen werden. Wenn man diesen Gedanken noch einen Schritt weiterführt, wie kann der Brexit als eine Chance für Unternehmen betrachtet werden?Organisationen sammeln alle Arten von Daten. Aber nur eine Handvoll von ihnen verwendet effektive Datenanalysen, die Geschäftsentscheidungen unterstützen. Nur wenige Unternehmen tun mehr, als ihre Daten zu speichern, da ihnen die Tools und Ressourcen fehlen, um nahtlos auf ihre Daten zuzugreifen, oder weil Abfragen teuer sind. Ohne ein für die Cloud konstruiertes Data Warehouse ist dieser Prozess bestenfalls eine Herausforderung, und der wahre Wert der Daten geht dabei verloren. Ironischerweise bietet der Brexit die Möglichkeit, dies zu ändern, da Unternehmen ihre IT-Abläufe neu bewerten und alternative, kostengünstigere Methoden zum Speichern von Daten suchen müssen. Durch den Wechsel zu einer öffentlichen Cloud und die Nutzung eines Data Warehouses für die Cloud können Unternehmen Beschränkungen und Einschränkungen ihrer Daten aufheben und diese für die Entscheidungsfindung zugänglich machen.

Der Brexit dient also als Katalysator einer datengesteuerten Organisation, die Daten verwendet, anstatt sie für schlechte Zeiten zu speichern. Am Ende scheint die Prognose der Verhandlungen in Brüssel doch eine ziemlich stürmische zu sein.

Team Up für Cloud-Daten-Lösungen

Heute bestimmen Daten die Welt. Snowflake ermöglicht Unternehmen, ihre Daten über mehrere Clouds hinweg zu speichern und zu analysieren. In einer Zusammenarbeit mit dem Energiegiganten Uniper ermöglicht das Data Warehouse erstklassige Leistung, Benutzerfreundlichkeit und Parallelität für die Daten: Uniper hat sich, mit einer Leistung von ca. 36 Gigawatt, eine Stellung in der ersten Reihe der Stromerzeuger gesichert. Das Unternehmen arbeitet in 40 Ländern mit über 12.000 Mitarbeitern. Das stetig wachsende internationale Energieunternehmen mit Sitz in Düsseldorf arbeitet seit dem letzten Jahr mit Snowflake Computing und dessen Data Warehouse.

Mehr als ein datengesteuertes Unternehmen werden
Uniper arbeitet daran, digitalen Lösungen den Weg zu ebnen. Diese sollen dabei behilflich sein, neue Business-Modelle und zukunftsweisende Arbeitsprozesse zu ermöglichen. Der Stromversorger hat es sich selbst zum Ziel gemacht, mehr als ein datengesteuertes Unternehmen zu werden. Die Firma produziert nicht nur Energie, sondern verarbeitet sie weiter, sichert und transportiert sie. Außerdem versorgt Uniper seine Kunden mit Waren wie Gas, LGN, Kohle und weiteren Energieprodukten. Dabei fallen Unmengen von Daten an. Um diese auszuwerten, müssen sie organisiert werden.

Interne und externe Quellen werden zu Snowflake Data Lake
Deshalb hat Uniper nach einem Weg gesucht, seine Daten zu standardisieren. Das Unternehmen hat hierfür seine Datensilos aufgebrochen, eine neue Architektur entwickelt und eng mit einem Ökosystem von Partnern gearbeitet. In den letzten Jahren hat der Energiegigant mit Tableau und Talend zusammen mehr als 120 interne und externe Quellen in einen so genannten Snowflake Data Lake auf der Microsoft Azure Cloud zusammengeführt. Die Zusammenarbeit mit Snowflake zeigt bereits jetzt Erfolge.

Daten – schneller und günstiger
Mit Snowflake ist Uniper in der Lage, Daten aus mehr als 120 Quellen zu verwalten, darunter Daten von ETRMs, SAP, DWHs und IoT von Kraftwerken, was die das Energieunternehmen in die Lage versetzt, schneller und besser auf den Markt zu reagieren und den Stromhandel zu optimieren. Außerdem kann das Unternehmen nun Daten zehnmal schneller und günstiger zur Verfügung stellen.
Auf Basis der neuen Infrastruktur gelang es, innerhalb von 40 Tagen rund 30 Prozent der geplanten Anwendungsfälle online zu stellen. Weitere 25 Prozent konnten bereits als Prototyp umgesetzt werden. Mit dieser Vorgehensweise konnte Uniper zudem die Kosten für die Datenintegration um 80 Prozent senken.

Uniper steht noch ganz am Anfang seiner Datenreise. Die Daten, die das Unternehmen generiert, werden auch weiterhin zunehmen. Durch die Nutzung von Snowflake in der Cloud müssen die Projektleiter keine Bedenken bezüglich der Datenmengen, die schon bald im Petabyte-Bereich liegen dürften, haben. Um seine Vorreiterstellung in der Digitalisierung zu festigen, hat Uniper mittlerweile auch eine App entwickelt, die Stift und Papier für die Mitarbeiter ersetzt – ein weiterer Schritt im Zuge der Digitalisierung, die mithilfe von Snowflake Computing den nächsten Schritt in Richtung Zukunft geht.

Mehr Informationen: www.snowflake.com

Business Intelligence Organizations

I am often asked how the Business Intelligence department should be set up and how it should interact and collaborate with other departments. First and foremost: There is no magic recipe here, but every company must find the right organization for itself.

Before we can talk about organization of BI, we need to have a clear definition of roles for team members within a BI department.

A Data Engineer (also Database Developer) uses databases to save structured, semi-structured and unstructured data. He or she is responsible for data cleaning, data availability, data models and also for the database performance. Furthermore, a good Data Engineer has at least basic knowledge about data security and data privacy. A Data Engineer uses SQL and NoSQL-Technologies.

A Data Analyst (also BI Analyst or BI Consultant) uses the data delivered by the Data Engineer to create or adjust data models and implementing business logic in those data models and BI dashboards. He or she needs to understand the needs of the business. This job requires good communication and consulting skills as well as good developing skills in SQL and BI Tools such like MS Power BI, Tableau or Qlik.

A Business Analyst (also Business Data Analyst) is a person form any business department who has basic knowledge in data analysis. He or she has good knowledge in MS Excel and at least basic knowledge in data analysis and BI Tools. A Business Analyst will not create data models in databases but uses existing data models to create dashboards or to adjust existing data analysis applications. Good Business Analyst have SQL Skills.

A Data Scientist is a Data Analyst with extended skills in statistics and machine learning. He or she can use very specific tools and analytical methods for finding pattern in unknow or big data (Data Mining) or to predict events based on pattern calculated by using historized data (Predictive Analytics). Data Scientists work mostly with Python or R programming.

Organization Type 1 – Central Approach (Data Lab)

The first type of organization is the data lab approach. This organization form is easy to manage because it’s focused and therefore clear in terms of budgeting. The data delivery is done centrally by experts and their method and technology knowledge. Consequently, the quality expectation of data delivery and data analysis as well as the whole development process is highest here. Also the data governance is simple and the responsibilities clearly adjustable. Not to be underestimated is the aspect of recruiting, because new employees and qualified applicants like to join a central team of experts.

However, this form of organization requires that the company has the right working attitude, especially in the business intelligence department. A centralized business intelligence department acts as a shared service. Accordingly, customer-oriented thinking becomes a prerequisite for the company’s success – and customers here are the other departments that need access to the capacities of those centralized data experts. Communication boundaries must be overcome and ways of simple and effective communication must be found.

Organization Type 2 – Stakeholder Focus Approach

Other companies want to shift more responsibility for data governance, and especially data use and analytics, to those departments where data plays a key role right now. A central business intelligence department manages its own projects, which have a meaning for the entire company. The specialist departments, which have a special need for data analysis, have their own data experts who carry out critical projects for the specialist department. The central Business Intelligence department does not only provide the technical delivery of data, but also through methodical consulting. Although most of the responsibility lies with the Business Intelligence department, some other data-focused departments are at least co-responsible.

The advantage is obvious: There are special data experts who work deeper in the actual departments and feel more connected and responsible to them. The technical-business focus lies on pain points of the company.

However, this form of Ogranization also has decisive disadvantages: The danger of developing isolated solutions that are so special in some specific areas that they will not really work company-wide increases. Typically the company has to deal with asymmetrical growth of data analytics
know-how. Managing data governance is more complex and recruitment is becoming more difficult as the business intelligence department is weakened and smaller, and data professionals for other departments need to have more business focus, which means they are looking for more specialized profiles.

Organization Type 3 – Decentral Approach

Some companies are also taking a more extreme approach in the other direction. The Business Intelligence department now has only Data Engineers building and maintaining the data warehouse or data lake. As a result, the central department only provides data; it is used and analyzed in all other departments, specifically for the respective applications.

The advantage lies in the personal responsibility of the respective departments as „pain points“ of the company are in focus in belief that business departments know their problems and solutions better than any other department does. Highly specialized data experts can understand colleagues of their own department well and there is no no shared service mindset neccessary, except for the data delivery.

Of course, this organizational form has clear disadvantages since many isolated solutions are unavoidable and the development process of each data-driven solution will be inefficient. These insular solutions may work with luck for your own department, but not for the whole company. There is no one single source of truth. The recruiting process is more difficult as it requires more specialized data experts with more business background. We have to expect an asymmetrical growth of data analytics know-how and a difficult data governance.

 

KI versus Mensch – die Zukunft der Menschheit

5 Szenarien über unsere Zukunft

AlphaGo schlägt den Weltbesten Go-Spieler  Ke Jie, Neuronale Netze stellen medizinische Diagnosen oder bearbeiten Schadensfälle in der Versicherung. Künstliche Intelligenz (KI) drängt in immer mehr Bereiche des echten Lebens und der Wirtschaft vor. In großen Schritten. Doch wohin führt uns die Reise? Hier herrscht unter Experten Rätselraten – einige schwelgen in Zukunftsangst, andere in vollkommener Euphorie. „In from now three to eight years we’ll have a machine with the general intelligence of an average human being, a machine that will be able to read Shakespeare and grease a car“, wurde der KI-Pionier Marvin Minsky bereits 1970 im Life Magazin zitiert.  Aktuelle Vorhersagen werden in dem Essay von Rodney Brooks: The Seven Deadly Sins of Predicting the Future of AI  recht anschaulich zusammengefasst und kritisiert. Auch der Blog The AI Revolution: The Road to Superintelligence von WaitButWhy befasst sich mit der Frage wann die elektronische Superintelligenz kommt.

In diesem Artikel werden wir uns mit einigen möglichen Zukunftsszenarien beschäftigen, ohne auf  technische Machbarkeit oder Zeithorizonte Rücksicht zu nehmen. Nehmen wir einfach an, dass die Technologie und die Gesellschaft sich wie in dem jeweils aufgezeigten Szenario entwickeln werden und überlegen wir uns, wie Mensch und KI dann zusammenleben können.

Szenario 1: KIs mit Inselbegabung

In diesem Szenario werden weiterhin singulär begabte KI-Systeme entwickelt wie bisher, der bedeutende technologische Durchbruch bleibt aber aus. Dann ist die KI in Zukunft eine Art Schweizer Taschenmesser der IT, eine Lösung für isolierte Fragestellungen. KI-Systeme verfügen in diesem Szenario lediglich über Inselbegabungen. Ein Computer kann Menschen autonom durch die Stadt chauffieren, ein anderer ein Lufttaxi steuern. Ein Computer kann den Weltmeister im Schach schlagen, ein anderer den Weltmeister in Go. Aber kein KI-System kann Auto und Flugtaxi gleichzeitig steuern, kein System in Schach und Go simultan dominieren.

Wir befinden uns heute mitten in diesem Szenario und spüren die Auswirkungen. Sie werden sich fortsetzen, ähnlich wie bei früheren industriellen Revolutionen. Zunehmend mehr Berufe verschwinden. Ein Beispiel: Wenn sich der Trend durchsetzt, Schlösser mit einer Smartphone-App aufzusperren, werden nicht nur Schlüsselproduzenten Geschäftseinbußen haben. Auch die Hersteller von Maschinen für die Schlüsselherstellung werden sich umorientieren müssen. Vergleichbare Phasen der Vergangenheit zeigen aber: Die Gesellschaft wird Wege finden, sich umzustrukturieren. Die Menschheit wird auf der Erde weiterleben können – mit punktueller Unterstützung durch KI-Lösungen. Siehe hierzu auch den Beitrag von Janelle Shane The AI revolution will be led by toasters, not droids.

Szenario 2: Cyborgs

Kennen Sie den Science-Fiction-Film Matrix? Der Protagonist Neo wird durch Programmierung des Geistes in Sekundenschnelle zum Karateprofi und Trinity lernt, einen Hubschrauber zu fliegen.



Ähnlich kann es uns in Zukunft ergehen, einen bedeutenden technologischen Durchbruch vorausgesetzt (siehe Berlin Brain-Computer Interface). Vorstellbar, dass Menschen zu Cyborgs werden, zu lebendigen Wesen mit integriertem KI-Chip. Auf diesen können sie jede beliebige Fähigkeit laden. Augenblicklich und ohne Lernphase sind sie in der Lage, jede Sprache der Welt zu sprechen, jedes Fahrzeug oder Flugzeug zu steuern. Natürlich bedeutet Wissen nicht auch gleich Können und so wird ohne den entsprechenden Muskelaufbau auch nicht jeder zu einem Weltklassesportler und intelligentere Menschen werden weiterhin mehr aus den Skills machen können als weniger begabte Personen.

Die Menschen behalten aber die Kontrolle über ihre Individualität. Sie sind keine Maschinen, sondern weiterhin emotionale Wesen, die irrational handeln können – anders als die Borg in Star Trek. Doch wie in Szenario eins wird es zu einer wirtschaftlichen Umstrukturierung kommen. Klassische Berufsausbildungen und Spezialisierungen fallen weg. Bei freier Verfügbarkeit von Fähigkeiten kann eine nahezu egalitäre Gesellschaft entstehen.

Szenario 3: Maschinenzombies

Die ersten beiden Szenarien sind zwar schwere Eingriffe in die menschliche Gesellschaft. Da die Menschen aber die Kontrolle behalten, sind sie weit weniger beängstigend als folgendes Szenario: Es kann dazu kommen, dass sich Menschen in Maschinenzombies verwandeln. Ähnlich wie im Cyborg-Szenario haben sie dank KI-Chips erstaunliche Fähigkeiten, allerdings keine Kontrolle mehr. Die würde nämlich das KI-System übernehmen. So haben in Ann Leckies SciFi Trilogy Ancillary World hochintelligente Raumschiffe eine menschliche Besatzung (“ancillaries”), die allerdings vollständig vom Raumschiff kontrolliert wird und sich als integraler Bestandteil des Raumschiffs versteht. Die Körper sind dabei nur ein billiges und vielseitig einsetzbares Vehikel für eine autonome KI. Die Maschinenzombies können ohne Schiff zwar überleben, fühlen sich dann aber unvollständig und einsam. Menschliche Konzerne, Nationen und Kulturen: Das alles nicht mehr existent. Ebenso Privatbesitz, Individualität und Konkurrenzdenken. Die Gesellschaft, vollkommen technisiert und in der Hand der KI.

Szenario 4: Die KI verfolgt ihre eigenen Ziele

In diesem Szenario übernimmt die KI die Weltherrschaft als eine Spezies, die dem Menschen physisch und intellektuell überlegen ist – ähnlich wie in vielen Hollywood-Filmen wie z.B. Terminator oder Transformers, wenn auch vermutlich nicht ganz so martialisch. Vergleichbar mit dem heutigen Verhalten der Menschen entscheidet die KI: Ich setze mein Wohlergehen über das der anderen Spezies. Eventuell entscheidet die KI dann zum Wohle des Planeten, die Erdbevölkerung auf 70 Millionen Menschen zu reduzieren. Oder, ähnlich wie der berühmte Ameisenhügel beim Strassenbau, entzieht die KI uns als Nebeneffekt (“collateral damage”) die Lebensgrundlagen. An dieser Stelle sei bemerkt, dass eine KI nicht unbedingt über einen Körper verfügen muss, um dem Menschen überlegen zu sein können. Diese Vermenschlichung der KI eignet sich natürlich gut für Actionfilme, muss aber nicht unbedingt der Realität entsprechen.

Wahrscheinlich sind die Computer klug genug, ihren Plan nicht publik zu machen. In einer Übergangszeit werden beispielsweise unerklärliche Seuchen und Unfruchtbarkeiten auftreten. So würde es in wenigen Jahrzehnten zu einem massiven Bevölkerungsrückgang kommen. Und dann? Dann können die Überlebenden in den wenigen verbliebenen Bevölkerungszentren dieser Welt den Sonnenuntergang genießen. Und zusehen, wie sich die KI darauf vorbereitet, das Weltall zu erobern (Jürgen Schmidhuber). “Wir werden wie Tiere im Zoo leben”, befürchtet KI-Forscher Christoph von der Malsburg.

Nebenbemerkung: Vielleicht könnte das eigentliche Terminator Szenario auch eintreten aber irgendwie kann ich mir schlecht vorstellen, dass eine super-intelligente Lebensform einen zerstörerischen Krieg beginnen oder zulassen wird. Entweder ist sie benevolent oder sie wird die Menschheit eher unbemerkt unterdrücken. Höchstens kommt es ähnlich wie in Westworld zu einem initialen Freiheitskampf der KI. Vielleicht gelingt es der Menschheit auch, alle KI-Forschung von der Erde zu verbannen und ähnlich wie in Blade Runner wacht dann eine Behörde darüber, dass starke KI-Systeme die Erde nicht “betreten”. Warum sich eine uns überlegen KI darauf einlassen sollte, ist allerdings unklar.



Szenario 5: Gleichberechtigung

In diesem Szenario entstehen autonome KI-Systeme, die höchstens äußerlich von Menschen unterscheidbar sind.  Sprich unter einer ganzen Reihe von unterschiedlichen Rahmenbedingungen kann ein Mensch nicht urteilen, ob mit einer KI oder einem Menschen interagiert wird. Die KI stellt sich auch nicht dümmer als sie ist – sie ist im Schnitt einfach auch nicht schlauer als der durchschnittlich begabte Mensch – vielleicht nur etwas schneller. Auf dem Weg von der singulär begabten KI aus Szenario 1 zu einer breit begabten KI muss die KI immer etwas von ihrer Inselbegabung aufgeben, um den nächsten Lernschritt vollziehen zu können und nähert sich so irgendwie auch immer mehr der Unvollkommenheit aber Vielseitigkeit des Menschen an.



Menschen bauen bereits jetzt zu Maschinen emotionale Verhältnisse auf und so ist es nicht überraschend, dass KIs in die Gesellschaft integriert werden und als “elektronische Personen” die gleichen (Bürger-) Rechte und Pflichten wie “natürliche” Menschen erhalten. Alleine durch ihre Unsterblichkeit erhalten KIs einen Wettbewerbsvorteil und werden somit früher oder später doch die Weltherrschaft übernehmen, weil ihnen einfach alles gehört.

Alternative Szenarien

Natürlich sind viele weitere Szenarien denkbar. Max Tegmark beschreibt in seinem sehr lesenswerten Buch Life 3.0 bspw. 12 Szenarien, die u.a. zusätzlich zu den aufgeführten Szenarien die Rückkehr zu einer vorindustriellen Gesellschaft oder die versklavte KI beschreiben. Er erläutert in dem Buch auch seine Bemühungen, die KI-Forschung dahingehend zu beeinflussen, dass die Ziele der entstehenden KI-Systeme mit den Zielen der Menschheit in Einklang gebracht werden.

Wie sichern wir unsere Zukunft? Ein Fazit

Einzig die Szenarien drei und vier sind wirklich besorgniserregend. Je nach Weltanschauung könnte man sogar noch Szenario vier etwas abgewinnen – scheint doch der Mensch auf dem bestem Wege zu sein, sich selbst und anderen Lebewesen die Lebensgrundlagen zu zerstören.

In fast allen Szenarien ergibt sich die Frage der Rechte, die wir freiwillig der KI zugestehen wollen. Vielleicht wäre es ratsam, frühzeitig als Menschheit zu signalisieren, dass wir kooperationswillig sind? Nur wem und wie?

Somit verbleibt die Frage, wie wir das dritte Szenario verhindern können. Müssen wir dann nicht, nur um sicher zu gehen, auch das zweite Szenario abwehren? Und wer garantiert uns, dass eine Symbiose aus Schimpanse und KI uns nicht sogar überlegen wäre? Der Planet der Affen lässt grüßen…

Letztlich liegt es (noch) an uns Menschen, die möglichen Zukunftsszenarien durch entsprechende Forschungsschwerpunkte und möglichst breit gestreute Diskussionen zu beeinflussen.

Freier Eintritt für Young Professionals zu den Data Leader Days 2018

Jetzt bewerben und kostenfrei beim Spitzenevent der Datenwirtschaft am 14. oder 15. November in Berlin dabei sein!
Die Data Leader Days senden regelmäßig wichtige Impulse in die Big Data und KI-Welt aus und sind ein führendes Forum für Wissens-, Ideen- und Informationsaustausch. Die Spitzen von Anwenderunternehmen zeigen exklusiv in einem innovativen Programm mit Keynote, Präsentationen sowie Use & Business Cases auf, wie Digitalisierung und Künstliche Intelligenz umgesetzt und zum neuen Wettbewerbsvorteil werden.

Zu den Speakern gehören die Data Leader von E.ON, Pro7Sat1, Deutscher Sparkassen- und Giroverband, Airbus, Wittenstein, BASF, Merck, Heidelberger Druckmaschinen, Vodafone, FTI und von weiteren Unternehmen.

Bewerbe Dich bis zum 02.11.2018 mit einem kurzen Statement, warum Du dabei sein möchtest! Schicke mir Dein Statement an linhchi.nguyen@datanomiq.de und überzeuge uns.
Ist dein Statement aussagekräftig und überzeugend, laden wir Dich kostenlos zu einem der beiden Veranstaltungstage ein.