Was ist eine Vektor-Datenbank? Und warum spielt sie für AI eine so große Rolle?

Wie können Unternehmen und andere Organisationen sicherstellen, dass kein Wissen verloren geht? Intranet, ERP, CRM, DMS oder letztendlich einfach Datenbanken mögen die erste Antwort darauf sein. Doch Datenbanken sind nicht gleich Datenbanken, ganz besonders, da operative IT-Systeme meistens auf relationalen Datenbanken aufsetzen. In diesen geht nur leider dann doch irgendwann das Wissen verloren… Und das auch dann, wenn es nie aus ihnen herausgelöscht wird!

Die meisten Datenbanken sind darauf ausgelegt, Daten zu speichern und wieder abrufbar zu machen. Neben den relationalen Datenbanken (SQL) gibt es auch die NoSQL-Datenbanken wie den Key-Value-Store, Dokumenten- und Graph-Datenbanken mit recht speziellen Anwendungsgebieten. Vektor-Datenbanken sind ein weiterer Typ von Datenbank, die unter Einsatz von AI (Deep Learning, n-grams, …) Wissen in Vektoren übersetzen und damit vergleichbarer und wieder auffindbarer machen. Diese Funktion der Datenbank spielt seinen Vorteil insbesondere bei vielen Dimensionen aus, wie sie Text- und Bild-Daten haben.

Databases Types: Vector Database, Graph Database, Key-Value-Database, Document Database, Relational Database with Row or Column oriented table structures

Datenbank-Typen in grobkörniger Darstellung. Es gibt in der Realität jedoch viele Feinheiten, Übergänge und Überbrückungen zwischen den Datenbanktypen, z. B. zwischen emulierter und nativer Graph-Datenbank. Manche Dokumenten- Vektor-Datenbanken können auch relationale Datenmodellierung. Und eigentlich relationale Datenbanken wie z. B. PostgreSQL können mit Zusatzmodulen auch Vektoren verarbeiten.

Vektor-Datenbanken speichern Daten grundsätzlich nicht relational oder in einer anderen Form menschlich konstruierter Verbindungen. Dennoch sichert die Datenbank gewissermaßen Verbindungen indirekt, die von Menschen jedoch – in einem hochdimensionalen Raum – nicht mehr hergeleitet werden können und sich auf bestimmte Kontexte beziehen, die sich aus den Daten selbst ergeben. Maschinelles Lernen kommt mit der nummerischen Auflösung von Text- und Bild-Daten (und natürlich auch bei ganz anderen Daten, z. B. Sound) am besten zurecht und genau dafür sind Vektor-Datenbanken unschlagbar.

Was ist eine Vektor-Datenbank?

Eine Vektordatenbank speichert Vektoren neben den traditionellen Datenformaten (Annotation) ab. Ein Vektor ist eine mathematische Struktur, ein Element in einem Vektorraum, der eine Reihe von Dimensionen hat (oder zumindest dann interessant wird, genaugenommen starten wir beim Null-Vektor). Jede Dimension in einem Vektor repräsentiert eine Art von Information oder Merkmal. Ein gutes Beispiel ist ein Vektor, der ein Bild repräsentiert: jede Dimension könnte die Intensität eines bestimmten Pixels in dem Bild repräsentieren.
Auf dieseVektor Datenbank Illustration (vereinfacht, symbolisch) Weise kann eine ganze Sammlung von Bildern als eine Sammlung von Vektoren dargestellt werden. Noch gängiger jedoch sind Vektorräume, die Texte z. B. über die Häufigkeit des Auftretens von Textbausteinen (Wörter, Silben, Buchstaben) in sich einbetten (Embeddings). Embeddings sind folglich Vektoren, die durch die Projektion des Textes auf einen Vektorraum entstehen.

Vektor-Datenbanken sind besonders nützlich, wenn man Ähnlichkeiten zwischen Vektoren finden muss, z. B. ähnliche Bilder in einer Sammlung oder die Wörter “Hund” und “Katze”, die zwar in ihren Buchstaben keine Ähnlichkeit haben, jedoch in ihrem Kontext als Haustiere. Mit Vektor-Algorithmen können diese Ähnlichkeiten schnell und effizient aufgespürt werden, was sich mit traditionellen relationalen Datenbanken sehr viel schwieriger und vor allem ineffizienter darstellt.

Vektordatenbanken können auch hochdimensionale Daten effizient verarbeiten, was in vielen modernen Anwendungen, wie zum Beispiel Deep Learning, wichtig ist. Einige Beispiele für Vektordatenbanken sind Elasticsearch / Vector Search, Weaviate, Faiss von Facebook und Annoy von Spotify.

Viele Lernalgorithmen des maschinellen Lernens basieren auf Vektor-basierter Ähnlichkeitsmessung, z. B. der k-Nächste-Nachbarn-Prädiktionsalgorithmus (Regression/Klassifikation) oder K-Means-Clustering. Die Ähnlichkeitsbetrachtung erfolgt mit Distanzmessung im Vektorraum. Die dafür bekannteste Methode, die Euklidische Distanz zwischen zwei Punkten, basiert auf dem Satz des Pythagoras (Hypotenuse ist gleich der Quadratwurzel aus den beiden Dimensions-Katheten im Quadrat, im zwei-dimensionalen Raum). Es kann jedoch sinnvoll sein, aus Gründen der Effizienz oder besserer Konvergenz des maschinellen Lernens andere als die Euklidische Distanz in Betracht zu ziehen.

Vectore-based distance measuring methods: Euclidean Distance L2-Norm, Manhatten Distance L1-Norm, Chebyshev Distance and Cosine Distance

Vectore-based distance measuring methods: Euclidean Distance L2-Norm, Manhatten Distance L1-Norm, Chebyshev Distance and Cosine Distance

Vektor-Datenbanken für Deep Learning

Der Aufbau von künstlichen Neuronalen Netzen im Deep Learning sieht nicht vor, dass ganze Sätze in ihren textlichen Bestandteilen in das jeweilige Netz eingelesen werden, denn sie funktionieren am besten mit rein nummerischen Input. Die Texte müssen in diese transformiert werden, eventuell auch nach diesen in Cluster eingeteilt und für verschiedene Trainingsszenarien separiert werden.

Vektordatenbanken werden für die Datenvorbereitung (Annotation) und als Trainingsdatenbank für Deep Learning zur effizienten Speicherung, Organisation und Manipulation der Texte genutzt. Für Natural Language Processing (NLP) benötigen Modelle des Deep Learnings die zuvor genannten Word Embedding, also hochdimensionale Vektoren, die Informationen über Worte, Sätze oder Dokumente repräsentieren. Nur eine Vektordatenbank macht diese effizient abrufbar.

Vektor-Datenbank und Large Language Modells (LLM)

Ohne Vektor-Datenbanken wären die Erfolge von OpenAI und anderen Anbietern von LLMs nicht möglich geworden. Aber fernab der Entwicklung in San Francisco kann jedes Unternehmen unter Einsatz von Vektor-Datenbanken und den APIs von Google, OpenAI / Microsoft oder mit echten Open Source LLMs (Self-Hosting) ein wahres Orakel über die eigenen Unternehmensdaten herstellen. Dazu werden über APIs die Embedding-Engines z. B. von OpenAI genutzt. Wir von DATANOMIQ nutzen diese Architektur, um Unternehmen und andere Organisationen dazu zu befähigen, dass kein Wissen mehr verloren geht.
Vektor-Datenbank für KI-Applikation (z. B. OpenAI ChatGPT)

Mit der DATANOMIQ Enterprise AI Architektur, die auf jeder Cloud ausrollfähig ist, verfügen Unternehmen über einen intelligenten Unternehmens-Repräsentanten als KI, der für Mitarbeiter relevante Dokumente und Antworten auf Fragen liefert. Sollte irgendein Mitarbeiter im Unternehmen bereits einen bestimmten Vorgang, Vorfall oder z. B. eine technische Konstruktion oder einen rechtlichen Vertrag bearbeitet haben, der einem aktuellen Fall ähnlich ist, wird die AI dies aufspüren und sinnvollen Kontext, Querverweise oder Vorschläge oder lückenauffüllende Daten liefern.

Die AI lernt permanent mit, Unternehmenswissen geht nicht verloren. Das ist Wissensmanagement auf einem neuen Level, dank Vektor-Datenbanken und KI.

Big Data – Das Versprechen wurde eingelöst

Big Data tauchte als Buzzword meiner Recherche nach erstmals um das Jahr 2011 relevant in den Medien auf. Big Data wurde zum Business-Sprech der darauffolgenden Jahre. In der Parallelwelt der ITler wurde das Tool und Ökosystem Apache Hadoop quasi mit Big Data beinahe synonym gesetzt. Der Guardian verlieh Apache Hadoop mit seinem Konzept des Distributed Computing mit MapReduce im März 2011 bei den MediaGuardian Innovation Awards die Auszeichnung “Innovator of the Year”. Im Jahr 2015 erlebte der Begriff Big Data in der allgemeinen Geschäftswelt seine Euphorie-Phase mit vielen Konferenzen und Vorträgen weltweit, die sich mit dem Thema auseinandersetzten. Dann etwa im Jahr 2018 flachte der Hype um Big Data wieder ab, die Euphorie änderte sich in eine Ernüchterung, zumindest für den deutschen Mittelstand. Die große Verarbeitung von Datenmassen fand nur in ganz bestimmten Bereichen statt, die US-amerikanischen Tech-Riesen wie Google oder Facebook hingegen wurden zu Daten-Monopolisten erklärt, denen niemand das Wasser reichen könne. Big Data wurde für viele Unternehmen der traditionellen Industrie zur Enttäuschung, zum falschen Versprechen.

Von Big Data über Data Science zu AI

Einer der Gründe, warum Big Data insbesondere nach der Euphorie wieder aus der Diskussion verschwand, war der Leitspruch “Shit in, shit out” und die Kernaussage, dass Daten in großen Mengen nicht viel wert seien, wenn die Datenqualität nicht stimme. Datenqualität hingegen, wurde zum wichtigen Faktor jeder Unternehmensbewertung, was Themen wie Reporting, Data Governance und schließlich dann das Data Engineering mehr noch anschob als die Data Science.

Google Trends - Big Data (blue), Data Science (red), Business Intelligence (yellow) und Process Mining (green).

Google Trends – Big Data (blue), Data Science (red), Business Intelligence (yellow) und Process Mining (green). Quelle: https://trends.google.de/trends/explore?date=2011-03-01%202023-01-03&geo=DE&q=big%20data,data%20science,Business%20Intelligence,Process%20Mining&hl=de

Small Data wurde zum Fokus für die deutsche Industrie, denn “Big Data is messy!”1 und galt als nur schwer und teuer zu verarbeiten. Cloud Computing, erst mit den Infrastructure as a Service (IaaS) Angeboten von Amazon, Microsoft und Google, wurde zum Enabler für schnelle, flexible Big Data Architekturen. Zwischenzeitlich wurde die Business Intelligence mit Tools wie Qlik Sense, Tableau, Power BI und Looker (und vielen anderen) weiter im Markt ausgebaut, die recht neue Disziplin Process Mining (vor allem durch das deutsche Unicorn Celonis) etabliert und Data Science schloss als Hype nahtlos an Big Data etwa ab 2017 an, wurde dann ungefähr im Jahr 2021 von AI als Hype ersetzt. Von Data Science spricht auf Konferenzen heute kaum noch jemand und wurde hype-technisch komplett durch Machine Learning bzw. Artificial Intelligence (AI) ersetzt. AI wiederum scheint spätestens mit ChatGPT 2022/2023 eine neue Euphorie-Phase erreicht zu haben, mit noch ungewissem Ausgang.

Big Data Analytics erreicht die nötige Reife

Der Begriff Big Data war schon immer etwas schwammig und wurde von vielen Unternehmen und Experten schnell auch im Kontext kleinerer Datenmengen verwendet.2 Denn heute spielt die Definition darüber, was Big Data eigentlich genau ist, wirklich keine Rolle mehr. Alle zuvor genannten Hypes sind selbst Erben des Hypes um Big Data.

Während vor Jahren noch kleine Datenanalysen reichen mussten, können heute dank Data Lakes oder gar Data Lakehouse Architekturen, auf Apache Spark (dem quasi-Nachfolger von Hadoop) basierende Datenbank- und Analysesysteme, strukturierte Datentabellen über semi-strukturierte bis komplett unstrukturierte Daten umfassend und versioniert gespeichert, fusioniert, verknüpft und ausgewertet werden. Das funktioniert heute problemlos in der Cloud, notfalls jedoch auch in einem eigenen Rechenzentrum On-Premise. Während in der Anfangszeit Apache Spark noch selbst auf einem Hardware-Cluster aufgesetzt werden musste, kommen heute eher die managed Cloud-Varianten wie Microsoft Azure Synapse oder die agnostische Alternative Databricks zum Einsatz, die auf Spark aufbauen.

Die vollautomatisierte Analyse von textlicher Sprache, von Fotos oder Videomaterial war 2015 noch Nische, gehört heute jedoch zum Alltag hinzu. Während 2015 noch von neuen Geschäftsmodellen mit Big Data geträumt wurde, sind Data as a Service und AI as a Service heute längst Realität!

ChatGPT und GPT 4 sind King of Big Data

ChatGPT erschien Ende 2022 und war prinzipiell nichts Neues, keine neue Invention (Erfindung), jedoch eine große Innovation (Marktdurchdringung), die großes öffentliches Interesse vor allem auch deswegen erhielt, weil es als kostenloses Angebot für einen eigentlich sehr kostenintensiven Service veröffentlicht und für jeden erreichbar wurde. ChatGPT basiert auf GPT-3, die dritte Version des Generative Pre-Trained Transformer Modells. Transformer sind neuronale Netze, sie ihre Input-Parameter nicht nur zu Klasseneinschätzungen verdichten (z. B. ein Bild zeigt einen Hund, eine Katze oder eine andere Klasse), sondern wieder selbst Daten in ähnliche Gestalt und Größe erstellen. So wird aus einem gegeben Bild ein neues Bild, aus einem gegeben Text, ein neuer Text oder eine sinnvolle Ergänzung (Antwort) des Textes. GPT-3 ist jedoch noch komplizierter, basiert nicht nur auf Supervised Deep Learning, sondern auch auf Reinforcement Learning.
GPT-3 wurde mit mehr als 100 Milliarden Wörter trainiert, das parametrisierte Machine Learning Modell selbst wiegt 800 GB (quasi nur die Neuronen!)3.

ChatGPT basiert auf GPT3.5 und wurde in 3 Schritten trainiert. Neben Supervised Learning kam auch Reinforcement Learning zum Einsatz.

ChatGPT basiert auf GPT-3.5 und wurde in 3 Schritten trainiert. Neben Supervised Learning kam auch Reinforcement Learning zum Einsatz. Quelle: openai.com

GPT-3 von openai.com war 2021 mit 175 Milliarden Parametern das weltweit größte Neuronale Netz der Welt.4 

Größenvergleich: Parameteranzahl GPT-3 vs GPT-4

Größenvergleich: Parameteranzahl GPT-3 vs GPT-4 Quelle: openai.com

Der davor existierende Platzhirsch unter den Modellen kam von Microsoft mit “nur” 10 Milliarden Parametern und damit um den Faktor 17 kleiner. Das nun neue Modell GPT-4 ist mit 100 Billionen Parametern nochmal 570 mal so “groß” wie GPT-3. Dies bedeutet keinesfalls, dass GPT-4 entsprechend 570 mal so fähig sein wird wie GPT-3, jedoch wird der Faktor immer noch deutlich und spürbar sein und sicher eine Erweiterung der Fähigkeiten bedeuten.

Was Big Data & Analytics heute für Unternehmen erreicht

Auf Big Data basierende Systeme wie ChatGPT sollte es – der zuvor genannten Logik folgend – jedoch eigentlich gar nicht geben dürfen, denn die rohen Datenmassen, die für das Training verwendet wurden, konnten nicht im Detail auf ihre Qualität überprüft werden. Zum Einen mittelt die Masse an Daten die in ihnen zu findenden Fehler weitgehend raus, zum Anderen filtert Deep Learning selbst relevante Muster und unliebsame Ausreißer aus den Datenmassen heraus. Neuronale Netze, der Kern des Deep Learning, können durchaus als große Filter verstanden und erklärt werden.

Davon abgesehen, dass die neuen ChatBot-APIs von den Cloud-Providern Microsoft, Google und auch Amazon genutzt werden können, um Arbeitsprozesse und Kommunikation zu automatisieren, wird Big Data heute in vielen Unternehmen dazu eingesetzt, um Unternehmens-/Finanzkennzahlen auszuwerten und vorherzusagen, um Produktionsqualität zu überwachen, um Maschinen-Sensordaten mit den Geschäftsdaten aus ERP-, MES- und CRM-Systemen zu verheiraten, um operative Prozesse über mehrere IT-Systeme hinweg zu rekonstruieren und auf Schwachstellen hin zu untersuchen und um Schlussendlich auch den weiteren Datenhunger zu stillen, z. B. über Text-Extraktion aus Webseiten (Intelligence Gathering), die mit NLP und Computer Vision mächtiger wird als je zuvor.

Big Data hält sein Versprechen dank AI

Die frühere Enttäuschung aus Big Data resultierte aus dem fehlenden Vermittler zwischen Big Data (passive Daten) und den Applikationen (z. B. Industrie 4.0). Dieser Vermittler ist der aktive Part, die AI und weiterführende Datenverarbeitung (z. B. Lakehousing) und Analysemethodik (z. B. Process Mining). Davon abgesehen, dass mit AI über Big Data bereits in Medizin und im Verkehrswesen Menschenleben gerettet wurden, ist Big Data & AI längst auch in gewöhnlichen Unternehmen angekommen. Big Data hält sein Versprechen für Unternehmen doch noch ein und revolutioniert Geschäftsmodelle und Geschäftsprozesse, sichert so Wettbewerbsfähigkeit. Zumindest, wenn Unternehmen sich auf diesen Weg tatsächlich einlassen.

Quellen:

  1. Edd Dumbill: What is big data? An introduction to the big data landscape. (Memento vom 23. April 2014 im Internet Archive) auf: strata.oreilly.com.
  2. Fergus Gloster: Von Big Data reden aber Small Data meinen. Computerwoche, 1. Oktober 2014
  3. Bussler, Frederik (July 21, 2020). “Will GPT-3 Kill Coding?”. Towards Data Science. Retrieved August 1, 2020.2022
  4. developer.nvidia.com, 1. Oktober 2014
Interview Benjamin Aunkofer - Datenstrategien und Data Teams entwickeln!

Interview – Datenstrategie und Data Teams entwickeln!

Das Format Business Talk am Kudamm in Berlin führte ein drittes Interview mit Benjamin Aunkofer zum Thema “Datenstrategie und Data Team Organisation”.

In dem Interview erklärt Benjamin Aunkofer, was Unternehmen Datenstrategien entwickeln, um Ihren Herausforderungen gerecht zu werden. Außerdem gibt er Tipps, wie Unternehmen ein fähiges Data Team aufbauen, qualifizieren und halten.

Nachfolgend das Interview auf Youtube sowie die schriftliche Form zum Nachlesen:


Interview – Datenstrategien und Aufbau von Data Teams

  1. Herr Aunkofer, Sie unterstützen Unternehmen u.a. bei der Entwicklung von Datenstrategien und dem Aufbau von Data Teams. Was genau ist denn eine Datenstrategie?Eine Datenstrategie ist eine Strategie über die Nutzung von Daten zur Geschäftsoptimierung. Man kann auch sagen: Eine Datenstrategie ist ein Business Plan darüber, wie Daten richtig im Unternehmen genutzt werden sollen.Abgesehen vom Aufbau neuer eigener Geschäftsmodelle mit Daten, können grundsätzlich drei Faktoren im Unternehmen mit der Nutzung von Daten optimiert werden.1. Umsätze, also die Erhöhung der Umsätze durch bessere Produkte oder durch besseres Verständnis der Kunden
    2. Die Reduktion von Kosten und
    3. die verbesserte Risikoerkennung und -bewertung, z. B. in der Wirtschaftsprüfung.Eine Datenstrategie ist abgerichtet auf die Unternehmensziele und ist der Masterplan dafür, diese auch zu erreichen.
  2. Und was sind die typischen Ziele mit denen Kunden an Sie herantreten?Das hängt stark von der Branche ab, also Handelsunternehmen wollen vor allem die Kunden besser verstehen, Marketing besser ausrichten oder auch Produkte verbessern. Immobilienunternehmen wollen stets DIE Markttransparenz für sich und industrielle Unternehmen, also Maschinenbau, Zulieferer, Pharma usw. wollen meistens intelligente Produkte, oder mehr noch, schlanke Prozesse zur Kosteneinsparung, aber auch, um mehr Umsatz zu machen, denn Schnelligkeit heißt Wettbewerbsfähigkeit.Am Ende ist das aber auch alles sehr individuell von Unternehmen zu Unternehmen.
  3. Die Entwicklung einer Datenstrategie erfordert sicherlich ein systematisches Vorgehen. Was sind die wichtigsten Schritte?Ja genau, wir haben da eine generelle Vorgehensweise. Verkürzt erläutert, in fünf Schritten, wollen wir zu Anfang erstmal die Unternehmensvision für die nächste Zeit wissen und diese, wenn nicht schon gegeben, in klare Unternehmensziele herunter gebrochen haben. Das ist der erste und wichtigste Schritt.Weil, wenn wir das haben, dann können wir die dafür relevanten Daten und Datenquellen identifizieren. Das sind vielleicht unternehmensinterne Daten aus den IT-Systemen, ERP, CRM usw. und manchmal auch noch Daten aus unternehmensexternen Quellen, z. B. aus dem Social Media, Marktplattformen, Open Data usw. In manchen Fällen dreht sich auch alles nur um interne oder nur um externe Daten. Auch prüfen wir natürlich, ob Daten erst noch generiert oder gesammelt werden müssen und wie es um den rechtlichen Rahmen bzgl. der Nutzung steht. Das war der zweite Schritt.Wenn die relevanten Datenquellen identifiziert sind, sind im dritten Schritt die richtigen Methoden der Datennutzung auszumachen, z. B. der Aufbau einer Datenplattform, vielleicht ein Data Warehouse zur Datenkonsolidierung, Process Mining zur Prozessanalyse oder Predictive Analytics für den Aufbau eines bestimmten Vorhersagesystems, KI zur Anomalieerkennung oder je nach Ziel etwas ganz anderes.Der vierte Schritt ist die Überlegung, wie das ganze organisatorisch gelöst werden soll, also z. B. über eine zentrale verantwortliche Stelle im Unternehmen oder dezentral in bestimmten Fachabteilungen? Stehen die dafür richtigen Mitarbeiter zur Verfügung? Müssen Qualifizierungsmaßnahmen getroffen werden? Im Grunde kennt das wohl jeder, dass Unternehmen einfach z. B. ein Tool eingeführt haben, dass dann aber nicht genutzt wird. Dies müssen wir zu verwenden wissen.Tja und wenn das auch erledigt ist, muss das alles nur nochmal aufgeschrieben und in eine Planung mit Meilensteinen gebracht werden. Budgets, Staffing, Make or Buy usw. kommt da alles rein. Und voila, dann haben wir unsere Datenstrategie.
  4. Unterstützen Sie auch bei der Umsetzung der Datenstrategien?Ja klar, schon viel gemacht, sogar in verschiedensten Branchen. Diese Arbeit macht sogar großen Spaß für alle Beteiligten und es gibt nichts Spannenderes, als diesen Plan in die Zukunft zu gestalten.
  5. Sie arbeiten nicht nur als externer Dienstleister, sondern bietet auch Hilfestellung beim Aufbau und der Ausbildung eigener Data Teams. Welche Weiterbildungsformate bieten Sie an?Also wenn es hier einen Fachkräftemangel gibt, dann definitiv bei den Datenexperten. Übrigens nicht mehr so stark bei den Data Scientists, auch wenn richtig gute Mitarbeiter ebenfalls rar gesät sind, den größten Bedarf haben Unternehmen eher bei den Data Engineers. Das sind die Kollegen, die die Data Warehouses oder Data Lakes aufbauen und pflegen.Es gibt aber viele junge Leute, die da gerne einsteigen wollen. Das Problem auf der anderen Seite ist jedoch, dass Unternehmen natürlich eher erfahrene Leute suchen, die schneller und besser mit den großen Praxisproblemen klarkommen, die in den Datenarchitekturen sich nun mal so einschleichen. Diese erfahrenen Experten sind aber schwer zu finden und Stellen daher meistens sehr lange unbesetzt, oder dann mit Mitarbeitern, die kein Deutsch sprechen können.Wo wir von DATANOMIQ helfen können: Durch uns als Coach können Unternehmen auf ihrer Suche dem DEM Superexperten auch einfach günstigere, unerfahrene, aber motivierte Leute einstellen. Motivation der Mitarbeiter ist nicht zu unterschätzen! Als externer Dienstleister können wir dann unterstützen und schulen zu gleich. Und das machen wir über drei verschiedene Stufen:Trainings, Workshops und Coachings.Beim Training arbeiten wir mit Didaktik. Die Daten sind einfach gehalten und beispielhaft, denn wir möchten nicht zu lange über sie reden, sondern über die richtige Methodik der Datenaufbereitung oder Datenanalyse.Beim Workshop behandeln wir das reale Problem mit den echten Daten, mit denen der Mitarbeiter im Unternehmen konfrontiert ist. Hier schauen wir erstmal gemeinsam blöd aus der Wäsche, aber erarbeiten uns dann gemeinsam zügig die Lösung.

    Und beim Coaching schauen wir dann eigentlich nur zu und geben Ratschläge, wie man besser an die Aufgabenstellung herangehen könnte. Der Mitarbeiter hat also selbst das Zepter in der Hand und das Doing.Wir sind dann nur der Support.

    So können wir Stellen schnell besetzen und niemand muss Sorge haben, dass die Kompetenz nicht ausreicht. Auf diese Weise habe ich schon mehrere Data Teams für Kunden aufgebaut und parallel natürlich auch mein eigenes.

 

Sehen Sie die zwei anderen Video-Interviews von Benjamin Aunkofer:

 

 

 

 

 

 


 

How to tackle lack of data: an overview on transfer learning

1, Data is the new oil, but labeled data might be closer to it

Even though we have been in the 3rd AI boom and machine learning is showing concrete effectiveness at a commercial level, after the first two AI booms we are facing a problem: lack of labeled data or data themselves. The increasing number of papers on deep learning demonstrate that researches on AI have developed rapidly recently. If architectures of neural networks and supervised learning are all you know about deep learning, you will be overwhelmed by complications of topics studied these days, for example generative models, making more compact neural net models by for example knowledge distillation, and explainable AI (XAI). Those researches are often conducted on easily available benchmark datasets which you can easily download, often with corresponding ground truth data (label data) necessary for training. However once you try to apply the techniques to more specific data, you usually cannot prepare enough label data which theoretical researches assume. Thus among fascinating deep learning topics, in this article I am going to pick up how to tackle lack of label or data themselves, and transfer learning. Transfer learning is a technique of machine learning to take advantages of knowledge learned in one dataset to deal with a task in another dataset. Presumably due to this fact, Andrew Ng, in his presentation in NeurIPS 2016, gave a rough and abstract predictions of how transfer learning in machine learning would make commercial success like white lines in the figure below. The explanation is straightforward, and given the trends in topics of researches on machine learning these days, this prediction is actually right. But at the same time, in my opinion supervised learning, transfer learning, and unsupervised learning cannot be clearly separated like the graph originally suggested by Andrew Ng. Those fields complement each other, and one can easily shift to another.

Source: https://ruder.io/transfer-learning/ The lines and texts in white are based on explanations by Andrew Ng. The orange cells are placed at random, so not that they represent commercial success of each field.

Along with the rapid progress of deep learning mentioned above, a lot of hypes and catchphrases regarding big data and machine learning were made, and an interesting one is “Data is the new oil.” That might have been said only because big data is sources of various industries. But I would say, the characteristic is more striking in training data for machine learning. Distributions of training data for machine learning are more complicated like various energy resources besides oil in the world. Labeled data might be also like uranium. Just as uranium-235 accounting for only less than one percent of uranium in the world can be used to generate energy, only a part of massive data in the world is labeled such that they can be used for supervised machine learning. And as uranium-235 is used effectively jointly with less active uranium-238, labeled data show greater potentials with unlabeled data. And training data for machine learning have another unpleasant analogy to energy resources. Like most mainstream energy resources, only limited companies or institutions would be able to mine and refine huge labeled datasets with gigantic computation resources, and most people more or less need to rely on that for their business. Even though alternative renewable energy resources are proposed, principal energy resources are indispensable for making industries stable. As well, even though a lot of techniques actually have been proposed to lack of data, it often turns out just fine-tuning pre-trained models is the most practical, which need huge datasets and rich computational resources. And I think recent success in for example BERT or GPT made this trend more visible.

*I am sorry in a case I am mistaken about energy resources. I just wanted to come up with some cool metaphors.

But I still think knowing about transfer learning more comprehensively would be effective. That is partly because I have been working on relatively unique data which are hard to even label. As I was studying computer vision (CV) in plant science field, I frequently saw relatively unique data obtained with special apparatuses. Such data are for the most part look far from very general dataset, which huge pre-trained models are trained on. At the same time such plant data have very complicated structures and hard to label. And also in my work, have to detect certain values in various formats in very specific documents, in German. Such data are far from general datasets, and even labeling is hard in that case. We have to carefully tackle lack of data every time on each type of data in that case.

In this article I would first like to explain in the first place what it is like to lack data and next introduce representative techniques to tackle lack of labeled data. Many of them are classified to transfer learning, but other techniques like unsupervised learning or self-supervised learning are used in them or share a lot in their ideas. Thus my main purpose of writing this article is to let you have a richer view on transfer learning. And you would see “transfer learning” these days are mainly about fine-tuning of pre-trained models. Also how to tackle lack of data or labels is in other words how to efficiently achieve good performance in machine learning. Thus even if tons of high quality labeled data are at your disposal, learning those ideas would be still effective to you. I hope you could find some hints of machine learning through my articles.

2, What does lack of data or labels mean in the first place?

We need to first consider what lack of labels or data means, and my answer to the title of this section is “It depends.” The more data you have, the better performances you get. And the bigger machine learning models are, the more data they usually need for training. I assume that people reading this article more or less understand neural networks and how they are trained with back propagation. But let’s review the process here. Most machine learning frameworks are more or less expressed like the figure below unless reinforcement learning is considered. The ultimate purpose of machine learning is to train a model f(\boldsymbol{x}_n;\boldsymbol{\theta}) by adjusting parameters \boldsymbol{\theta}. And the parameters \boldsymbol{\theta} are optimized so that a loss function L is minimized. If it is a supervised learning, the a value of a loss function is denoted L(f(\boldsymbol{x}_n, \boldsymbol{\theta}), \boldsymbol{y}_n) =L(\hat{\boldsymbol{y}}_n, \boldsymbol{y}_n), and it gets smaller as f(\boldsymbol{x}_n, \boldsymbol{\theta}) gets closer to \boldsymbol{y}_n. That is, \boldsymbol{y}_n is giving supervision to adjust f(\boldsymbol{\theta}) via L(\hat{\boldsymbol{y}}_n, \boldsymbol{y}_n). And in a case of unsupervised learning, a loss function is L(\hat{\boldsymbol{y}}_n), which is often heuristically handcrafted.

The very first problem from lacking training data you would learn is overfitting. That is, a machine learning model can be specialized too much for a training dataset, and it loses generalization to other data from the same dataset. It is like students with little imaginations and flexibility gradually memorizing all the answers in a textbook and failing to answer new questions they have not encountered yet. Overfitting is judged by relations of training and validation loss like in the graph below. Training loss in blue indicates how the students adjust to the textbook. The smaller the training loss is, the more they memorizes from the textbook and the less flexible they are. The orange line indicates their performance in newly appeared questions in tests. The smaller the validation loss is, the better the students perform on tests. Thus the students should stop learning with the textbook when the validation loss is about to increase. This is called early stopping in machine learning. And if you increase training data, the orange graph usually shifts to the right side, usually providing smaller validation loss, namely better performance. An important point is, this ideal relations of training and validation losses will not appear if sizes or expressivity of a model is not enough. Thus the more training data you use, the more parameters you need for the model to enhance its expressivity.

 

*Depending on sizes of training data, the curve of training loss also changes, so please bear it in mind that this graph is not correct and is very simplified.

What I said so far might sound too elementary. My point is, the more data you have, and the bigger computation resource you have, the better performance you get. In other words, machine learning has scalability with data and parameters. This characteristic is clearly observed in models in natural language processing (NLP) and computer vision (CV) like in the graphs below. When I read some papers,often I am very fascinated by their performances. But sometimes it turns out that the methods are mainly creatively in terms of how they increase training data, which is personally boring. And even if performance of GPT looks astonishing, I cannot really like them because of this simple fact.

However another important point is, conversely you don’t need to increase training data or parameters of a model once it achieves an ideal score in metrics. When you make a toy model with small training data, as long as your clients or co-researchers are already happy, that is enough. Therefore lack of data or labels has to be discussed depending on sizes of machine learning and their performances you expect. Given those points mentioned so far, my answer to the question “What does lack of data or labels mean?” would rephrased like “If your model is properly designed to reach the performance you expect and it starts overfitting, you are facing lack of data.” And such decisions basically has to be made based on experiments.

3, Types of lack of data

Even though I explained lack of labels or data is a contextual matter, the problems actually exist at any case. That is, you often fail to achieve ideas accuracy partly due to lack of training data. I would like to classify types of situations of data of label shortage as below.

We should first think about the case where lack of labels does not matter in the first place. If you can analyze data with statistical knowledge or unsupervised machine learning, just extracting data without labeling would be enough. And sometimes ad hoc analysis with simple data visualization will help your decision makings. And some dashboards made from those unlabeled data will already give you some insights into data.

The next case is that, popular machine learning fields with enough investments usually have huge datasets that huge academic institutes or companies have been preparing.  For example KITTI dataset, which include labels like trajectories and depth data, is by Karlsruhe Institute of Technology and Toyota Technological Institute. Such datasets are useful for self-driving-related researches, and many types of ground truth data are provided such as odometry, depth, opticla flow, detection. This kind of data might be considered “enough” only because they are enough for training machine learning models and quantitatively evaluating them in papers, regardless of practical usefulness at a commercial level. But at any rate, popular fields with large benchmark datasets are likely to get investments for commercial uses.

Next let’s see cases of data shortage. You should also keep it in mind that there are also several types of situations of data shortage. In fact there are cases where certain labels are supposed to be scarce such as classifications of imbalanced data, for example anomaly detection, judging spam mails,  or medical examination. In those problems only some percent of data are classified as “errors,” “spam,” or “disease,” and others are classified as “normal.” Just keeping classifying data into “normal” would give maybe more than 95% accuracy. But finding the rest some percent accurately is much more important. In this case model performances need to be evaluated with ROC curves, namely relations of true positives and false positives.

The next type is more related to cases assumed in transfer learning. Some data are in the first place very expensive to obtain. For example CT images have to be stored by special medical apparatuses as you know. And even if a lot of CT images are already obtained, annotating the images often needs professional skills, thus its annotations cost is high. Another case of high annotation cost is for example detection or segmentation of objects in images. Even if you can collect numerous images on the Internet, annotating bounding boxes or pixel-wise segments require a lot of time. Annotating around 1000 images  for classification might be ok, but annotating them at a pixel level is really time consuming. If you have a tablet, I would like you to paint each segment of objects in a picture with different colors. And you should multiply the time spent by 80,000, as many as the training images needed for Mask R-CNN, a popular model for instance segmentation. As you can imagine, it is a huge tediou work. Even preparing some 50 labeled images for fine-tuning is paiful, and even annotations for computer vision tasks itself is also a field of deep learning.

*I would say medical image processing is a relatively popular field in CV with deep learning, and there are several famous datasets on this field.

4, An overview on ways for dealing with lack of labeled data

I am going to first roughly introduce what kind of approaches can be taken to deal with lack of labeled data or data itself, but you should also keep it in mind that they are not clearly separated. Just as I am going to explain, one type of techniques can easily shift to another type. You should flexibly switch among them depending on your situations. And also please keep it in mind that these are well-studied areas, and tons of ingenious papers are announced one after another, usually giving slight changes in their performances. Problems I point out about each technique might not be a problem anymore with recently published researches on researches currently peer-read. It is hard to prove that something does not exist. Given those points, I think it is convenient to classify technique of dealing with label or data shortage as below.

Through this article, ideas of domains are important. A domain simply means a combination of a dataset and a task with it. Transfer learning is a family of machine learning techniques to make uses of knowledge learned in a domain to another domain, and the former is called a source domain, the latter a target domain. And discrepancies between a source domain and a target domain is called a domain shift. The figure below abstractly visualize examples of domains and domain shifts. Intuitively it is easy to imagine that face a CV task and an NLP task have bigger domain shifts than domains of leaf images taken from different angles, but quantitatively evaluating domain shifts is in practice hard, and I am not going to introduce the topic because that will need a lot of mathematics.

Instead of formulating transfer learning, I would like to take learning languages as an intuitive example of transfer learning. Most people master at least one native language before learning another one. Baby brains are a kind of fantastic machine learning models, and after overcoming many obstacles they master native languages. And people take advantages of their mother tongues to learn another language. Usually they learn foreign languages by comparing structures of translated sentences. And naturally, if both a foreign language and your language have analogies like grammatical cases or genders in common, language learning would be easy. In other words, proficiency in one language is helpful in leaning some language. But it is also possible that your native language badly affects learning the second language, due to grammatical structures, pronunciations. The case of a source domain deteriorating performances in a target domain is called negative transfer and contexts of transfer learning.

*I know similarities languages are not the sole and definite barometers of effectiveness in learning foreign languages. Sizes of economy or markets in a country would also affects English language acquisition of people there. But at least it is unfair to compare for example German or Dutch people learning English with Japanese, Chinese people learning it. Unlike Eastern Asian people who have to learn thousands of characters to at least read decent texts or who use very different grammars, European people obviously can use “transfer learning” to learn English.

5, Increasing training data

When you lack data or labels, the most straightforward and often quick solution is to just increase data. The two topics I will cover in this section are mainly conducted in one domain.

Data augmentation

Data augmentation is one of the first techniques you would learn to mitigate overfitting of machine learning, which is in short caused by lack of data. The idea is very simple and it is implemented well in deep learning libraries, so I would only briefly talk about it here. The idea of data augmentation is simply transforming input data by for example flipping, rotating, zooming, changing colors. By doing so for example an input image \boldsymbol{x}_n of a butterfly below with a label of \boldsymbol{y}_n = \text{Butterfly} can be converted to more than 6 images. This corresponds to getting a converted \boldsymbol{x}'_n= g(\boldsymbol{x}_n) in the machine learning outline in the last section. And this process is the same as increasing the size of a dataet \mathcal {D}. And one point you have to be careful is, you must not change \boldsymbol{x}_n too much to change corresponding \boldsymbol{y}_n. For example if \boldsymbol{x}_n is distorted too much, it cannot be recognized as \boldsymbol{y}_n anymore even by humans. Or if you rotate an image of a digit 6 180 degrees, its becomes 9. Recent researches focus on automatically find what kind of data augmentation is effective by using for example reinforcement learning.

Here let me take an example of data augmentation technique that would be contrary to your intuition. A technique named mixup literally mix up data with different classes and their labels. In classification problems, labels are expressed as one-hot vectors, that is only an element corresponding to a correct element is 1 and the others are 0. In a case of binary dog-or-cat classification, each label is \boldsymbol{y}_n = (1, 0)^T or \boldsymbol{y}_n = (0, 1)^T, respectively. In data augmentation, distorting data too much is a taboo because label data is contaminated, but in mixup you literally mix up labels. Randomly choosing a two inputs \boldsymbol{x}_n , \boldsymbol{x}_{n'} and a  number \lambda \in [0,1], you prepare a input and label pair (\lambda \boldsymbol{x}_n + (1 - \lambda) \boldsymbol{x}_{n'},  \lambda \boldsymbol{y}_n + (1 - \lambda) \boldsymbol{y}_{n'}). The figure below is an example of a mixing up a cat input and a dog input, and corresponding labels. It is known augmenting training data like this improves classification performances. It is said this is partly due to machine learning models effectively learning decision boundaries. In classification ambiguous inputs are bottlenecks, so learning to giving ambiguous outputs to ambiguous inputs can enhance classification abilities.

*One-hot-encoded labels are called hard labels, and otherwise soft labels. Recent topics in deep learning, such as lottery hypothesis, knowledge distillation, imply that whether supervising labels are hard or not is important in deep learning. Hopefully I would like to explain why little by little in my articles.

6, Active learning

Active learning is about how to annotate data and get labeled data efficiently. Labels of data do not equally contribute to enhancing machine learning models, and labels actually have qualities. Even if you give apparently similar images with the same label to machine learning models during training, the models cannot learn so much from the pair of data. You need to efficiently dig data to know its distribution by giving labels to samples. I think a good metaphor is geological survey by excavating with some boring. In order to know substances or features of ground, some earth need to be sampled with boring. But you cannot freely penetrate everywhere mainly due to costs. They need to be sampled one by one due to uncertainty about the ground.

 

Similar approaches are often taken in machine learning or statistics, that is estimating distributions of data with a small size of samples is an important idea. A basic idea for doing that is you sample or annotate data which decreases uncertainty of your model the most. The figure simply exhibits the idea. We want to regress a data distribution with the red curve, and the cross marks can be sampled from the distribution. And the part filled with light blue shows uncertainty of the model to predict a value of y for a x. When you want to regress the data with as few samples as possible, data points should be sampled from the parts with great uncertainties. And by doing so, you can see that the data is regressed efficiently with few samples.

We have seen that modeling uncertainty is the key to active learning, and that can be applied to annotations of data in deep learning. An example of the process is displayed below, and in this case a deep neural network model (DNN model) is trained with some labeled data, and you give some signals for data annotations based on uncertainty of outputs of DNN models. And human annotators prioritize giving labels to the data. Such uncertainly can be estimated by using entropy of outputs or modeling data distributions.

 

But when you get a certain amount of labels, the situation will be the same as semi-supervised learning, which I will explain next. That is, you might be already able to make the most of the labels so far with the help of unlabeled data. You should consider stopping labeling and start labeling depending on situations. And importantly, starting naively annotating data might become a quick solution rather than thinking about how to make uses of limited labels if extracting data itself is easy and does not cost so much. “Shut up and annotate!” could be often the best practice in practice. And annotations would be an effective way for exploratory data analysis (EDA), so I recommend you to immediately start annotating about 10 random samples at any rate.

7, Dealing with lack of labels in a single domain

In many cases, data themselves are easily available, and only annotations costs matter. The following two topics consider such cases, and again only one domain is considered. But by the end of this article you would see that other techniques covered in this article have a lot of analogies with topics introduced here.

Semi-supervised learning

Semi-supervised learning is a type of supervised learning where only limited labels are available in one domain. This is important in because many of other techniques in this article can be seen as semi-supervised learning from certain points of views. The figure below shows an intuition on semi-supervised learning in a case of classification task. In this case, original data distribution have two clusters of circles and triangles and a clear border can be drawn between them. But only with limited labeled data, decision boundaries would be ambiguous. However in fact, with a help of unlabeled data in dotted lines, machine learning model might be able to recognize two clusters with a help of unlabeled data. In other words, unlabeled data help models learn distribution of data. this might be natural as clusters of data can be estimated with unsupervised learning.

*As I have already mentioned, active learning could soon shift to semi-supervised learning, and it might be worth trying it before finishing labeling. But suspending labeling and resuming it later might not be efficient. At any rate you need to be flexible depending on situations.

Semi-supervised learning is applicable to several tasks, not only classification. I explained that normal supervised learning is adjusting parameters \boldsymbol{\theta} of a model f(\boldsymbol{\theta}) so that it minimize loss function L(\boldsymbol{\theta}, \mathcal{D}_{\text{L}}) for a labeled dataset \mathcal{D}_{\text{L}}. In semi-supervised learning, we assume that usually a bigger unsupervised dataset \mathcal{D}_{\text{UL}} is available in the same domain. And semi-supervised learning optimize \boldsymbol{\theta} by jointly minimizing L(\boldsymbol{\theta}, \mathcal{D}_{\text{L}}) + L'(\boldsymbol{\theta}, \mathcal{D}_{\text{UL}}) after designing a loss function L'(\boldsymbol{\theta}, \mathcal{D}_{\text{UL}}) for the unlabeled dataset. There are following 3 major ways of semi-supervised learning depending on how you design a L'(\boldsymbol{\theta}, \mathcal{D}_{\text{UL}}).

  • Consistency regularization: adding slight changes to data \boldsymbol{x}_{\text{UL}} in \mathcal{D}_{\text{UL}} and get \boldsymbol{x}'_{\text{UL}}. And training f(\boldsymbol{\theta}) so that f(\boldsymbol{\theta}, \boldsymbol{x}_{\text{UL}}) and f(\boldsymbol{\theta}, \boldsymbol{x}'_{\text{UL}}) give out a consistent output.
  • Pseudo label: after training f(\boldsymbol{\theta}) with \mathcal{D}_{\text{L}}, using some estimations f(\boldsymbol{\theta}, \boldsymbol{x}_{\text{UL}}) as labels of \mathcal{D}_{\text{UL}} .
  • Entropy minimization: encouraging outputs f(\boldsymbol{\theta}, \boldsymbol{x}_{\text{UL}}) to have smaller entropy.

More or less similar ideas show up in different transfer learning techniques, so it would be effective to learn the three semi-supervised learning ideas above.

Self-supervised learning

Self-supervised learning is often counted as unsupervised learning. Both unsupervised and self-supervised learning do not need label data, but especially when labels generated by processing themselves, that is often called self-supervised learning. A representative case of using self-supervised learning is auto-encoder. Simpler labels can be generated from input data themselves with elementary data processing. For example in a case of image processing, by rotating an input image 0, 90, 180, 270 degrees respectively, a classification task of estimating rotation degrees can be made. Another case is estimating the original input image after some simple image processing (for example colorization).  These simple tasks generated solely from an input is called pretext task. And in a case of image processing, deep learning models can be prompted to learn image features .

Source: https://atcold.github.io/pytorch-Deep-Learning/en/week10/10-1/

Pretext tasks are applicable also to other fields for example NLP. A very simple task is hiding a part of an input sentence, and let neural networks estimate the blank word. And this is a basic idea of how to train BERTs, famous pre-trained NLP models. BERT models are trained this way with a huge and very general corpus without any specific topics. By doing so BERT model can already learn to detect some clusters of meanings in texts, as I visualize in the next section. But if you fine-tune BERT models with labeled texts with very specific topics, that often fails to achieve satisfying performance. In that case, the BERT models have to “get used to” the new dataset. In that case, BERT can “get used to” the new dataset by applying self-supervised learning on the new dataset. This tutorial of Huggingface demonstrates this with an example of adjusting a BERT model trained with Wikipedia to the IMDb dataset.

In the case above, the BERT model is fine-tuned with relatively lots of unlabeled data and after that trained with fewer labels. As a whole this can be seen as semi-supervised learning ,with fewer labels of the IMBb dataset and more unlabeled data. Also the ideas of pretext tasks, which prompt models to give consistent outputs given preprocessed inputs, have some analogies with consistency regularization in semi-supervised learning.

*The Huggingface tutorial says, they fine-tune a pre-trained BERT model trained in a self-supervised way to adjus it, and they call it “domain adaptation.” As you can see from the statement, distinctions of topics covered in this article can be just ambiguous.

8, Dealing with lack of data or labels over several domains

Another approach for tackling label or data shortage is taking advantages of other domains, which are usually larger and have enough labels. And such techniques is called transfer learning as I mentioned. It seems like transfer learning in business refers to “fine-tuning” explained below, but in academic contexts it is often also said transfer learning is almost synonym to “domain adaptation.” At any rate, my point is it would be more important to have comprehensive view on the techniques rather than clearly distinguishing them.

Fine tuning

Fine tuning would be the easiest way of transfer learning, and at the same time it is very powerful. Even though I am going to introduce other technique of transfer learning, more often than not it turns out that fine tuning can compensate them. Here I will only explain what it is like to use fine-tuning. I would say using fine-tuning is easy like using instant coffee. Conventionally you needed to train your original model with your own data, and that is very affected by sizes of data you have. I would say, that was like making coffee or coffee cakes from coffee you made from beans. But by using pre-trained models already trained somewhere with huge datasets, you can use models which can already more or less recognize data. The idea was very normal already in the field of CV, and NLP got the same idea with the advent of BERT, or already with word embeddings. That is like people learned to use instant coffee instead of roasting and brewing coffee every time.

How such instant coffee is made depends on which type of deep learning is used on a huge dataset. Backbone CNN is usually trained on ImageNet dataset with supervised learning of a classification task. In case of BERT, it is trained with a huge corpus with a pretext task of estimating blank words of input sentences, which is classified to self-supervised learning. Let me more practically what the “coffee syrup” means. Machine learning is at any rate just mapping of tensors or vectors. In CV, an input images as a tensor is converted into a a vector or a tensor, and tasks like image classification are conducted with the converted tensor or vector. In case of an NLP task, usually a sequence of vectors is converted to a vector or another sequence of vectors. And these resulting tensors of vectors from models are the very “coffee syrup” I am talking about. An important point is, fine-tuning also considers transfer learning between different tasks. Backbone CNNs are usually trained with classification, BERT with self-supervised learning, but the there are a variety of final tasks. They are called downstream tasks. In other words, you don’t necessarily drink instant coffee as coffee.

 

The two figures below are visualizations what the “instant coffee syrup” means. I processed random N images in a dataset with a pre-trained backbone CNN, and I got corresponding D dimensional vectors, that is a N\times D tensor. And I applied t-SNE to reduce its dimension from D to 2 and got a N\times 2 tensor.  The figure below shows arrangements of input images in the 2 dimensional space. As you can see, semantically similar images get closer.

Just as well, if you process random texts with BERT and apply a dimension reduction, you get a visualization like below. As well as the figure above, texts in similar topic get closer.

To make it catchy I expressed them as “coffee syrup” but this is a kind of how so-called AI sees data. Images and texts are just vectors or tensors on computer, and AI process another set of tensors of vectors in spaces which make sense to them.

Fine-tuning is quite easy. You have only to train a pre-trained model you downloaded just like normal supervised learning with your dataset. And when you train CV models with backbone CNN, the backbone is almost automatically downloaded. You have to be careful about some points, for example you have to set learning rate smaller. Let me avoid too detailed points in this article. Hopefully in the future, I’d like to write about more practical fine-tuning tips.

Domain adaptation

Domain adaptation is another family of techniques to make uses of knowledge gained in one domain in another domain. Domain adaptation is a Domain adaptation is these days often used as almost a synonym of transfer learning. But papers on domain adaptation usually assume to handle the same tasks both in a source and a target domain. So I would say domain adaptation is a subfield of transfer learning. Domain adaptation is more of how to tackle deterioration of machine learning performances when trained models are applied in different domains. Based on how much labels are available in each domain, domain adaptation is classified to several types. And unsupervised domain adaptation (UDA), where labels are available only in a source domain, is considered as the most challenging and studied well.

*Another explanation I often hear about domain adaptation is, when a models trained on a dataset is trained on another data, domain adaptation can be used to mitigate decreases in performance. I think in this context, performance of the model on the source domain is not discussed. When you apply some retraining with a new dataset, performance of the model on the source domain often drastically decrease. This is called catastrophic forgetting, and techniques like continuous learning are studied to tackle this problem. I have not really seen continuous learning in contexts of domain adaptation, but I thin these are related.

There several approaches in domain adaptation, and one frequently used approach is using adversarial loss. As we saw with the example of getting “coffee syrup,” data is first mapped into a certain space, and this is often called feature extraction. And outputs with the feature extractor are processed are processed more to give task-specific results with some networks. Often in domain adaptation, we put a domain discriminator network right after the feature extractor. And the domain discriminator classifies whether the features extracted come from the source or target domain. The feature extractor tries to extract features the domain have in common, and the domain discriminator tries to distinguish them, and two networks compete. In this way, the feature extractor and the domain discriminator form generative adversarial network (GAN), and the feature extractor learns to extract features that are hard to distinguish their domains. Feature extractor is trained so that it extract domain invariant features, for example edges and silhouette.

As well as in other transfer learning techniques, one ultimate goal of UDA is training a deep learning model only with synthetic labeled data, for example CGI, and apply the model on a totally unlabeled dataset. Converting a source domain to look like a target domain with Cycle GAN is an often used approach in domain adaptation. In domain adaptation a source domain is supposed to be easier to annotate. The figure below is an example of converting a black and white cell images  to colored images.

*You could easily try converting data with Cycle GAN by preparing two datasets, and I made the converted data by myself. But you need at least one GPU to try that.

However some people insist that usefulness of UDA is very questionable. In the first place, if you do not have any labels on the target domain, that means you cannot evaluate anything qualitatively on the dataset of interest. And if you can prepare some of evaluation data or labels, applying other techniques like fine-tuning might be enough.

Meta learning and few-shot learning

One simple way to explain meta learning is that, it is a machine learning technique teach models to learn efficiently. We can also say that it is a transfer learning case where target domains are unknown.  A famous meta learning method is Model-Agnostic Meta-Learning (MAML). MAML is used to get an ideal parameter \boldsymbol{\theta} which can be quickly and effectively used to new tasks. Like in the figure below, \boldsymbol{\theta} reaches the generally convenient parameter shown as the black dot. And the parameter can quickly reach the parameters \theta_{i}^{\ast}, which effective for each task.

Another interesting application of meta learning is few-shot learning. Few-shot learning trains a classification model to learn to acquire classification ability based only on a very few samples. By letting the models learn classification tasks over many episodes, the model learn comes to learn efficiently from limited data samples at a test phase. The figure below shows a case of few-shot learning, where a model learns some episodes of 3-class classifications with only 4 samples per class. Few-shot learning attempts to enable human-level flexibility of perception. MAML is known to be effective also for few-shot learning.

However, studies these days do also show that fine tuning pre-trained models with a few sample data show competitive results to those by few-shot learning. Similar things can be said about large language models like GPT. Chat GPT or GPT-3/GPT-4 for example can be fine-tuned with small extra training samples, and the logic behind is different from meta learning. Fine-tuning pre-trained models rather might be closer to human learning. Humans can effectively learn new topics based on what they have experienced so far. Thus again here, fine-tuning models can be an easier and realistic solution.

I have explained an overview of machine learning techniques for handling lack of data, and as you might have noticed, fine-tuning models could be enough in many cases. I am not sure how much other transfer learning technique would be widely as useful as fine-tuning at a business level. At least, I hope this article would be a rough guideline for machine learning tasks with small sizes of data or labels. And if you have a chance to work on very unique data with very few labels, you wouldn’t be able to rely so much on only naive fine tuning of pre-trained models. In that case, you tasks have your own problem, and you would have to be careful about your EDA, data cleaning, and labeling. In that case you should consider some techniques introduced here. Hopefully someday I would like to write more detailed tutorials with each transfer learning technique. And I hope you would be able to apply a variety of transfer learning locally, not only relying on huge resources of gigantic entities.  And that would lead to a more secure future, I guess.

Benjamin Aunkofer - Interview über AI as a Service

Interview – Daten vermarkten, nicht verkaufen!

Das Format Business Talk am Kudamm in Berlin führte ein Interview mit Benjamin Aunkofer zu den Themen “Daten vermarkten, nicht verkaufen!”.

In dem Interview erklärt Benjamin Aunkofer, warum der Datenschutz für die meisten Anwendungsfälle keine Rolle spielt und wie Unternehmen mit Data as a Service oder AI as a Service Ihre Daten zu Geld machen, selbst dann, wenn diese Daten nicht herausgegeben werden können.

Nachfolgend das Interview auf Youtube sowie die schriftliche Form zum Nachlesen:


Nachfolgend das Transkript zum Interview:

1 – Herr Aunkofer, Daten gelten als der wichtigste Rohstoff des 22. Jahrhunderts. Bei der Vermarktung datengestützter Dienstleistung tun sich deutsche Unternehmen im Vergleich zur Konkurrenz aus den USA oder Asien aber deutlich schwerer. Woran liegt das?

Ach da will ich keinen Hehl draus machen. Die Unterschiede liegen in den verschiedenen Kulturen begründet. In den USA herrscht in der Gesellschaft ein sehr freiheitlicher Gedanke, der wohl eher darauf hinausläuft, dass wer Daten sammelt, über diese dann eben auch weitgehend verfügt.

In Asien ist die Kultur eher kollektiv ausgerichtet, um den Einzelnen geht es dort ja eher nicht so.

In Deutschland herrscht auch ein freiheitlicher Gedanke – Gott sei Dank – jedoch eher um den Schutz der personenbezogenen Daten.

Das muss nun aber gar nicht schlimm sein. Zwar mag es in Deutschland etwas umständlicher und so einen Hauch langsamer sein, Daten nutzen zu dürfen. Bei vielen Anwendungsfällen kann man jedoch sehr gut mit korrekt anonymisierten Massendaten arbeiten und bei gesellschaftsfördernen Anwendungsfällen, man denke z. B. an medizinische Vorhersagen von Diagnosen oder Behandlungserfolgen oder aber auch bei der Optimierung des öffentlichen Verkehrs, sind ja viele Menschen durchaus bereit, ihre Daten zu teilen.

 Gesellschaftlichen Nutzen haben wir aber auch im B2B Geschäft, bei dem wir in Unternehmen und Institutionen die Prozesse kundenorientierter und schneller machen, Maschinen ausfallsicherer machen usw.. Da haben wir meistens sogar mit gar keinen personenbezogenen Daten zu tun.

2 – Sind die Bedenken im Zusammenhang mit Datenschutz und dem Schutz von Geschäftsgeheimnissen nicht berechtigt?

Also mit Datenschutz ist ja der gesetzliche Datenschutz gemeint, der sich nur auf personenbezogene Daten bezieht. Für Anwendungsfälle z. B. im Customer Analytics, also da, wo man Kundendaten analysieren möchte, geht das nur über die direkte Einwilligung oder eben durch anonymisierte Massendaten. Bei betrieblicher Prozessoptimierung, Anlagenoptimierung hat man mit personenbezogenen Daten aber fast nicht zu tun bzw. kann diese einfach vorher wegfiltern.

Ein ganz anderes Thema ist die Datensicherheit. Diese schließt die Sicherheit von personenbezogenen Daten mit ein, betrifft aber auch interne betriebliche Angelegenheiten, so wie etwas Lieferanten, Verträge, Preise… vielleicht Produktions- und Maschinendaten, natürlich auch Konstruktionsdaten in der Industrie.

Dieser Schutz ist jedoch einfach zu gewährleisten, wenn man einige Prinzipien der Datensicherheit verfolgt. Wir haben dafür Checklisten, quasi wie in der Luftfahrt. Bevor der Flieger abhebt, gehen wir die Checks durch… da stehen so Sachen drauf wie Passwortsicherheit, Identity Management, Zero Trust, Hybrid Cloud usw.

3 – Das Rückgrat der deutschen Wirtschaft sind die vielen hochspezialisierten KMU. Warum sollte sich beispielsweise ein Maschinenbauer darüber Gedanken machen, datengestützte Geschäftsmodelle zu entwickeln?

Nun da möchte ich dringend betonen, dass das nicht nur für Maschinenbauer gilt, aber es stimmt schon, dass Unternehmen im Maschinenbau, in der Automatisierungstechnik und natürlich der Werkzeugmaschinen richtig viel Potenzial haben, ihre Geschäftsmodelle mit Daten auszubauen oder sogar Datenbestände aufzubauen, die dann auch vermarktet werden können, und das so, dass diese Daten das Unternehmen gar nicht verlassen und dabei geheim bleiben.

4 – Daten verkaufen, ohne diese quasi zu verkaufen? Wie kann das funktionieren?

Das verrate ich gleich, aber reden wir vielleicht kurz einmal über das Verkaufen von Daten, die man sogar gerne verkauft. Das Verkaufen von Daten ist nämlich gerade so ein Trend. Das Konzept dafür heißt Data as a Service und bezieht sich dabei auf öffentliche Daten aus Quellen der Kategorie Open Data und Public Data. Diese Daten können aus dem Internet quasi gesammelt, als Datenbasis dann im Unternehmen aufgebaut werden und haben durch die Zusammenführung, Bereinigung und Aufbereitung einen Wert, der in die Millionen gehen kann. Denn andere Unternehmen brauchen vielleicht auch diese Daten, wollen aber nicht mehr warten, bis sie diese selbst aufbauen. Beispiele dafür sind Daten über den öffentlichen Verkehr, Infrastruktur, Marktpreise oder wir erheben z. B. für einen Industriekonzern Wasserqualitätsdaten beinahe weltweit aus den vielen vielen regionalen Veröffentlichungen der Daten über das Trinkwasser. Das sind zwar hohe Aufwände, aber der Wert der zusammengetragenen Daten ist ebenfalls enorm und kann an andere Unternehmen weiterverkauft werden. Und nur an jene Unternehmen, an die man das eben zu tun bereit ist.

5 – Okay, das sind öffentliche Daten, die von Unternehmen nutzbar gemacht werden. Aber wie ist es nun mit Daten aus internen Prozessen?

Interne Daten sind Geschäftsgeheimnisse und dürfen keinesfalls an Dritte weitergegeben werden. Dazu gehören beispielsweise im Handel die Umsatzkurven für bestimmte Produktkategorien sowie aber auch die Retouren und andere Muster des Kundenverhaltens, z. B. die Reaktion auf die Konfiguration von Online-Marketingkampagnen. Die Unternehmen möchten daraus jedoch Vorhersagemodelle oder auch komplexere Anomalie-Erkennung auf diese Daten trainieren, um sie für sich in ihren operativen Prozessen nutzbar zu machen. Machine Learning, übrigens ein Teilgebiet der KI (Künstlichen Intelligenz), funktioniert ja so, dass man zwei Algorithmen hat. Der erste Algorithmus ist ein Lern-Algorithmus. Diesen muss man richtig parametrisieren und überhaupt erstmal den richtigen auswählen, es gibt nämlich viele zur Auswahl und ja, die sind auch miteinander kombinierbar, um gegenseitige Schwächen auszugleichen und in eine Stärke zu verwandeln. Der Lernalgorithmus erstellt dann, über das Training mit den Daten, ein Vorhersagemodell, im Grunde eine Formel. Das ist dann der zweite Algorithmus. Dieser Algorithmus entstand aus den Daten und reflektiert auch das in den Daten eingelagerte Wissen, kanalisiert als Vorhersagemodell. Und dieses kann dann nicht nur intern genutzt werden, sondern auch anderen Unternehmen zur Nutzung zur Verfügung gestellt werden.

6 – Welche Arten von Problemen sind denn geeignet, um aus Daten ein neues Geschäftsmodell entwickeln zu können?

Alle operativen Geschäftsprozesse und deren Unterformen, also z. B. Handels-, Finanz-, Produktions- oder Logistikprozesse generieren haufenweise Daten. Das Problem für ein Unternehmen wie meines ist ja, dass wir zwar Analysemethodik kennen, aber keine Daten. Die Daten sind quasi wie der Inhalt einer Flasche oder eines Ballons, und der Inhalt bestimmt die Form mit. Unternehmen mit vielen operativen Prozessen haben genau diese Datenmengen.Ein Anwendungsfallgebiet sind z. B. Diagnosen. Das können neben medizinischen Diagnosen für Menschen auch ganz andere Diagnosen sein, z. B. über den Zustand einer Maschine, eines Prozesses oder eines ganzen Unternehmens. Die Einsatzgebiete reichen von der medizinischen Diagnose bis hin zu der Diagnose einer Prozesseffizienz oder eines Zustandes in der Wirtschaftsprüfung.Eine andere Kategorie von Anwendungsfällen sind die Prädiktionen durch Text- oder Bild-Erkennung. In der Versicherungsindustrie oder in der Immobilienbranche B. gibt es das Geschäftsmodell, dass KI-Modelle mit Dokumenten trainiert werden, so dass diese automatisiert, maschinell ausgelesen werden können. Die KI lernt dadurch, welche Textstellen im Dokument oder welche Objekte im Bild eine Rolle spielen und verwandelt diese in klare Aussagen.

Die Industrie benutzt KI zur generellen Objekterkennung z. B. in der Qualitätsprüfung. Hersteller von landwirtschaftlichen Maschinen trainieren KI, um Unkraut über auf Videobildern zu erkennen. Oder ein Algorithmus, der gelernt hat, wie Ultraschalldaten von Mirkochips zu interpretieren sind, um daraus Beschädigungen zu erkennen, so als Beispiel, den kann man weiterverkaufen.

Das Verkaufen erfolgt dabei idealerweise hinter einer technischen Wand, abgeschirmt über eine API. Eine API ist eine Schnittstelle, über die man die KI verwenden kann. Daraus wird dann AI as a Service, also KI als ein Service, den man Dritten gegen Bezahlung nutzen lassen kann.

7 – Gehen wir mal in die Praxis: Wie lassen sich aus erhobenen Daten Modelle entwickeln, die intern genutzt oder als Datenmodell an Kunden verkauft werden können?

Zuerst müssen wir die Idee natürlich richtig auseinander nehmen. Nach einer kurzen Euphorie-Phase, wie toll die Idee ist, kommt ja dann oft die Ernüchterung. Oft überwinden wir aber eben diese Ernüchterung und können starten. Der einzige Knackpunkt sind meistens fehlende Daten, denn ja, wir reden hier von großen Datenhistorien, die zum Einen überhaupt erstmal vorliegen müssen, zum anderen aber auch fast immer aufbereitet werden müssen.Wenn das erledigt ist, können wir den Algorithmus trainieren, ihn damit auf eine bestimmte Problemlösung sozusagen abrichten.Übrigens können Kunden oder Partner die KI selbst nachtrainieren, um sie für eigene besondere Zwecke besser vorzubereiten. Nehmen wir das einfache Beispiel mit der Unkrauterkennung via Bilddaten für landwirtschaftliche Maschinen. Nun sieht Unkraut in fernen Ländern sicherlich ähnlich, aber doch eben anders aus als hier in Mitteleuropa. Der Algorithmus kann jedoch nachtrainiert werden und sich der neuen Situation damit anpassen. Hierfür sind sehr viel weniger Daten nötig als es für das erstmalige Anlernen der Fall war.

8 – Viele Unternehmen haben Bedenken wegen des Zeitaufwands und der hohen Kosten für Spezialisten. Wie hoch ist denn der Zeit- und Kostenaufwand für die Implementierung solcher KI-Modelle in der Realität?

Das hängt sehr stark von der eigentlichen Aufgabenstellung ab, ob die Daten dafür bereits vorliegen oder erst noch generiert werden müssen und wie schnell das alles passieren soll. So ein Projekt dauert pauschal geschätzt gerne mal 5 bis 8 Monate bis zur ersten nutzbaren Version.

Sehen Sie die zwei anderen Video-Interviews von Benjamin Aunkofer:

 

 

 

 

 

 


 

How to speed up claims processing with automated car damage detection

AI drives automation, not only in industrial production or for autonomous driving, but above all in dealing with bureaucracy. It is an realy enabler for lean management!

One example is the use of Deep Learning (as part of Artificial Intelligence) for image object detection. A car insurance company checks the amount of the damage by a damage report after car accidents. This process is actually performed by human professionals. With AI, we can partially automate this process using image data (photos of car damages). After an AI training with millions of photos in relation to real costs for repair or replacement, the cost estimation gets suprising accurate and supports the process in speed and quality.

AI drives automation and DATANOMIQ drives this automation with you! You can download the Infographic as PDF.

How to speed up claims processing with automated car damage detection

How to speed up claims processing
with automated car damage detection

Download this Infographic as PDF now by clicking here!

We wrote this article in cooperation with pixolution, a company for computer vision and AI-bases visual search. Interested in introducing AI / Deep Learning to your organization? Do not hesitate to get in touch with us!

DATANOMIQ is the independent consulting and service partner for business intelligence, process mining and data science. We are opening up the diverse possibilities offered by big data and artificial intelligence in all areas of the value chain. We rely on the best minds and the most comprehensive method and technology portfolio for the use of data for business optimization.

Interview – Mehr Business-Nerds, bitte!

Die Haufe Akademie im Gespräch mit Prof. Dr. Stephan Matzka, Hochschulprofessor an der HTW Berlin und Trainer der Haufe Akademie darüber, wie Data Science und KI verdaulich vermittelt werden können und was eigentlich passiert, wenn man es nicht tut.

Sie beschäftigen sich mit Data Science, Algorithmen und Machine Learning – Hand aufs Herz: Sind Sie ein Nerd, Herr Prof. Matzka?   

Stephan Matzka: (lacht) Ich bin ein neugieriger Mensch und möchte gerne mehr über die Menschen und Dinge erfahren, die mich umgeben. Dafür benötige ich Informationen, die ich einordnen und bewerten kann und nichts anderes macht Data Science. Wenn Neugier also einen Nerd ausmacht, bin ich gerne ein Nerd.

Aber all die Buzzwords, die Sie gerade genannt haben, wie Machine Learning oder Algorithmen, blenden mehr als sie helfen. Ich spreche lieber von menschlicher und künstlicher Intelligenz. Deren Gemeinsamkeiten und Unterschiede sind gut zu erklären und dieses Verständnis ist der Schlüssel für alles Weitere.

Ist das Verständnis für Data Science und Machine Learning auch der Schlüssel für den Zukunftserfolg von Unternehmen oder wird die Businessrelevanz von Data Science überschätzt?

Stephan Matzka: Zunächst mal ist Machine Learning größtenteils einfach Statistik, die sehr clever angewandt wird. Damit wir Benutzer:innen nicht wie in der Schule mit der Hand rechnen müssen, gibt es Algorithmen, die uns die Arbeit abnehmen. Die Theorie ist also altbekannt. Aber die technischen Möglichkeiten haben sich geändert.

Sie können das mit Strom vergleichen, den gibt es schon länger. Aber erst mit einem Elektromotor können Sie Power auf die Straße bringen. Daten sind also altbekannte Rohstoffe, die Algorithmen und Rechenleistung von heute aber ein völlig neuer Motor.

Wenn Sie sehen, wie radikal die Dampfmaschine und der Elektromotor die Wirtschaft beeinflusst haben, dann gewinnen Sie einen Eindruck, was gerade im Bereich künstliche Intelligenz abgeht, und das über alle Unternehmensgrößen und Branchen hinweg.

So eine Dampfmaschine ist für viele wahrscheinlich deutlich einfacher zu greifen als das tech-lastige Universum Data Science. Das ist schon sehr abstrakt. Ist es so schwierig, wie es aussieht?

Stephan Matzka: Data Science kann man, wie alle Dinge im Leben, kompliziert oder einfach machen. Und es gibt auch auf diesem Feld Menschen, die Schwieriges einfach aussehen lassen. Das sind die Vorbilder, von denen wir alle lernen können.

Künstliche Intelligenz, oder kurz KI, bietet Menschen und Unternehmen große Chancen, wenn Sie sich rechtzeitig damit beschäftigen. Dabei geht es um nicht weniger als die Frage, ob wir in unserer Arbeitswelt zukünftig KI für uns arbeiten lassen oder abwarten, bis uns ein Algorithmus vorgibt, was wir als Nächstes tun sollen. Mit der richtigen Unterstützung ist der Aufwand jedoch überschaubar und der Nutzen für Unternehmen und Organisationen enorm.

Viele Mitarbeiter:innen hören nach „Wir sind jetzt agil“ neuerdings „Mach‘ mal KI“ – was raten Sie den Kolleg:innen und Entscheider:innen in mittelständischen Unternehmen für den Umgang mit dem Thema?

Stephan Matzka: Es braucht zum einen Impulse „von außen“, um sich mit diesem wichtigen Thema auseinanderzusetzen und einen Start zu finden. Und zum anderen braucht es Mitarbeiter:innen, die datenaffin sind, sich mit dem Thema bereits auseinandergesetzt haben und Use Cases entwickeln sowie hinterfragen können. Meine Berufserfahrung zeigt: Gerade am Anfang ist es noch sehr leicht, bei den klassischen „Low Hanging Fruits“ Erfolge zu erzielen. Das motiviert für das nächste Projekt und schon ist das Momentum im Unternehmen.

Was sind die Minimalanforderungen in einem Unternehmen, um mit Data Science und Machine Learning einen echten Mehrwert zu schaffen und die „Low Hanging Fruits“ zu ernten?  

Stephan Matzka: Der Rohstoff sind Daten in digitaler Form, ob in Excel-Listen, in SAP oder einer Datenbank ist erst mal zweitrangig. Für die Auswertung brauchen Sie passende Software und Menschen, die diese Software bedienen können.

In jedem Unternehmen gibt es solche Daten, die Software ist häufig kostenlos, der eigentliche Engpass sind aktuell die Expert:innen.

Könnte ich mir nicht die Arbeit sparen und Beratungsunternehmen einsetzen?

Stephan Matzka: Das könnten Sie, und Beratungsunternehmen können Ihnen oft auch die richtigen Themen aufzeigen. Gleichzeitig wirft dies zwei wesentliche Fragen auf: Wie können Sie die Qualität und den Preis einer Lösung beurteilen, die Ihnen ein externer Dienstleister anbietet? Und zweitens, wie verankern Sie nachhaltig das Wissen in Ihrem Unternehmen?

Damit die Beratungsleistung Ihnen also wirklich weiterhilft, benötigen Sie Beurteilungskompetenz auf dem Gebiet der künstlichen Intelligenz im eigenen Unternehmen. Diese Beurteilungskompetenz im Businesskontext zu schaffen ist aus meiner Sicht ein wesentlicher Erfolgsfaktor für Unternehmen und sollte eher kurz- als mittelfristig angegangen werden.

Haufe Akademie: Nochmal zurück zu den Daten: Woher weiß ich, ob ich genug Daten habe? Sonst bilde ich jemanden aus oder stelle jemanden ein, der mich Geld kostet, aber nichts zu tun hat.

Stephan Matzka: Mit den Daten ist es ein wenig so wie mit den Finanzen, kann ich jemals „genug Budget“ im Unternehmen haben? Natürlich ist es mit großen Datenmengen leichter möglich, bessere Resultate zu erzielen, genauso wie mit mehr Projektbudget. Aber wir alle haben schon erlebt, wie kleine Projekte Erstaunliches bewegt haben und Großprojekte spektakulär gescheitert sind.

Genau wie Budgets sind Daten meist in dem Umfang vorhanden, in dem sie eben verfügbar sind. Die vorhandenen Daten klug zu nutzen: Das ist das Ziel.

Ein Beispiel aus der Praxis: Es gibt sehr große Firmen mit riesigen Datenmengen, die mir, nachdem ich bei ihnen einen Drucker gekauft habe, weiter Werbung für andere Drucker zeigen anstatt Werbung für passende Toner. So eine KI würde mir kein mittelständisches Unternehmen abnehmen.

Gleichzeitig werden Sie sich wundern, welches Wissen oft schon in einfachen Excel-Tabellen schlummert. Wissen Sie zum Beispiel, was Ihnen der höchste Umsatz eines Kunden in den letzten zwölf Monaten und die Zeitabstände der letzten drei Bestellungen schon jetzt über die nächste Bestellung verraten?

In meinen Recherchen zum Thema bin ich oft an hohen Einstiegshürden gescheitert. Trotzdem habe ich gespürt, dass das Thema wichtig ist. Das war mitunter frustrierend. Welche Fragen sollte ich mir als Mitarbeiter:in stellen, wenn ich mich für Data Science interessiere, aber keine Vorkenntnisse habe?

Stephan Matzka: Das Wichtigste ist erstmal, sich nicht abschrecken zu lassen. 80% der Themen lassen sich zum Beispiel komplett ohne Mathematik erklären. Nochmal 15% sind Stoff der Sekundarstufe, bleiben noch 5% übrig. Die haben es tatsächlich in sich und dann können Sie sich immer noch entscheiden: Finde ich das Thema so spannend (und habe ich die Zeit), dass ich mich auch da noch reinarbeite. Oder reichen mir die 95% Verständnis für die zuverlässige Lösung meiner Business-Fragestellungen aus. Viel entscheidender ist für mich, sich dem Thema mutig anzunehmen, die ersten Erfolge zu feiern und mit diesem Rückenwind die nächsten Schritte zu tun.

Vielen Dank für das Gespräch, Herr Prof. Matzka!

Data Dimensionality Reduction Series: Random Forest

Hello lovely individuals, I hope everyone is doing well, is fantastic, and is smiling more than usual. In this blog we shall discuss a very interesting term used to build many models in the Data science industry as well as the cyber security industry.

SUPER BASIC DEFINITION OF RANDOM FOREST:

Random forest is a form of Supervised Machine Learning Algorithm that operates on the majority rule. For example, if we have a number of different algorithms working on the same issue but producing different answers, the majority of the findings are taken into account. Random forests, also known as random selection forests, are an ensemble learning approach for classification, regression, and other problems that works by generating a jumble of decision trees during training.

When it comes to regression and classification, random forest can handle both categorical and continuous variable data sets. It typically helps us outperform other algorithms and overcome challenges like overfitting and the curse of dimensionality.

QUICK ANALOGY TO UNDERSTAND THINGS BETTER:

Uncle John wants to see a doctor for his acute abdominal discomfort, so he goes to his pals for recommendations on the top doctors in town. After consulting with a number of friends and family members, Atlas chooses to visit the doctor who received the highest recommendations.

So, what does this mean? The same is true for random forests. It builds decision trees from several samples and utilises their majority vote for classification and average for regression.

HOW BIAS AND VARIANCE AFFECTS THE ALGORITHM?

  1. BIAS
  • The algorithm’s accuracy or quality is measured.
  • High bias means a poor match
  1. VARIANCE
  • The accuracy or specificity of the match is measured.
  • A high variance means a weak match

We would like to minimise each of these. But, unfortunately we can’t do this independently, since there is a trade-off

EXPECTED PREDICTION ERROR = VARIANCE + BIAS^2 + NOISE^2

Bias vs Variance Tradeoff

HOW IS IT DIFFERENT FROM OTHER TWO ALGORITHMS?

Every other data dimensionality reduction method, such as missing value ratio and principal component analysis, must be built from the scratch, but the best thing about random forest is that it comes with built-in features and is a tree-based model that uses a combination of decision trees for non-linear data classification and regression.

Without wasting much time, let’s move to the main part where we’ll discuss the working of RANDOM FOREST:

WORKING WITH RANDOM FOREST:

As we saw in the analogy, RANDOM FOREST operates on the basis of ensemble technique; however, what precisely does ensemble technique mean? It’s actually rather straightforward. Ensemble simply refers to the combination of numerous models. As a result, rather than a single model, a group of models is utilised to create predictions.

ENSEMBLE TECHNIQUE HAS 2 METHODS:

Ensemble Learning: Bagging and Boosting

1] BAGGING

2] BOOSTING

Let’s dive deep to understand things better:

1] BAGGING:

LET’S UNDERSTAND IT THROUGH A BETTER VIEW:

Bagging simply helps us to reduce the variance in a loud datasets. It works on an ensemble technique.

  1. Algorithm independent : general purpose technique
  2. Well suited for high variance algorithms
  3. Variance reduction is achieved by averaging a group of data.
  4. Choose # of classifiers to build (B)

DIFFERENT TRAINING DATA:

  1. Sample Training Data with Replacement
  2. Same algorithm on different subsets of training data

APPLICATION :

  1. Use with high variance algorithms (DT, NN)
  2. Easy to parallelize
  3. Limitation: Loss of Interpretability
  4. Limitation: What if one of the features dominates?

SUMMING IT ALL UP:

  1. Ensemble approach = Bootstrap Aggregation.
  2. In bagging a random dataset is selected as shown in the above figure and then a model is built using those random data samples which is termed as bootstrapping.
  3. Now, when we train this random sample data it is not mendidate to select data points only once, while training the sample data we can select the individual data point more then once.
  4. Now each of these models is built and trained and results are obtained.
  5. Lastly the majority results are being considered.

We can even calculate  the error from this thing know as random forest OOB error:

RANDOM FORESTS: OOB ERROR  (Out-of-Bag Error) :

▪ From each bootstrapped sample, 1/3rd of it is kept aside as “Test”

▪ Tree built on remaining 2/3rd

▪ Average error from each of the “Test” samples is called “Out-of-Bag Error”

▪ OOB error provides a good estimate of model error

▪ No need for separate cross validation

2] BOOSTING:

Boosting in short helps us to improve our prediction by reducing error in predictive data analysis.

Weak Learner: only needs to generate a hypothesis with a training accuracy greater than 0.5, i.e., < 50% error over any distribution.

KEY INTUITION:

  1. Strong learners are very difficult to construct
  2. Constructing weaker Learners is relatively easy influence with the empirical squared improvement when assigned to the model

APPROACH OUTLINE:

  1. Start with a ML algorithm for finding the rough rules of thumb (a.k.a. “weak” or “base” algorithm)
  2. Call the base algorithm repeatedly, each time feeding it a different subset of the training examples
  3. The basic learning algorithm creates a new weak prediction rule each time it is invoked.
  4. After several rounds, the boosting algorithm must merge these weak rules into a single prediction rule that, hopefully, is considerably more accurate than any of the weak rules alone.

TWO KEY DETAILS :

  1. In each round, how is the distribution selected ?
  2. What is the best way to merge the weak rules into a single rule?

BOOSTING is classified into two types:

1] ADA BOOST

2] XG BOOST

As far as the Random forest is concerned it is said that it follows the bagging method, not a boosting method. As the name implies, boosting involves learning from others, which in turn increases learning. Random forests have trees that run in parallel. While creating the trees, there is no interaction between them.

Boosting helps us reduce the error by decreasing the bias whereas, on other hand, Bagging is a manner to decrease the variance within the prediction with the aid of generating additional information for schooling from the dataset using mixtures with repetitions to provide multi-sets of the original information.

How Bagging helps with variance – A Simple Example

BAGGED TREES

  1. Decision Trees have high variance
  2. The resultant tree (model) is determined by the training data.
  3. (Unpruned) Decision Trees tend to overfit
  4. One option: Cost Complexity Pruning

BAG TREES

  1. Sample with replacement (1 Training set → Multiple training sets)
  2. Train model on each bootstrapped training set
  3. Multiple trees; each different : A garden ☺
  4. Each DT predicts; Mean / Majority vote prediction
  5. Choose # of trees to build (B)

ADVANTAGES

Reduce model variance / instability.

RANDOM FOREST : VARIABLE IMPORTANCE

VARIABLE IMPORTANCE :

▪ Each time a tree is split due to a variable m, Gini impurity index of the parent node is higher than that of the child nodes

▪ Adding up all Gini index decreases due to variable m over all trees in the forest, gives a measure of variable importance

IMPORTANT FEATURES AND HYPERPARAMETERS:

  1. Diversity :
  2. Immunity to the curse of dimensionality :
  3. Parallelization :
  4. Train-Test split :
  5. Stability :
  6. Gini significance (or mean reduction impurity) :
  7. Mean Decrease Accuracy :

FEATURES THAT IMPROVE THE MODEL’S PREDICTIONS and SPEED :

  1. maximum_features :

Increasing max features often increases model performance since each node now has a greater number of alternatives to examine.

  1. n_estimators :

The number of trees you wish to create before calculating the maximum voting or prediction averages. A greater number of trees improves speed but slows down your code.

  1. min_sample_leaf :

If you’ve ever designed a decision tree, you’ll understand the significance of the minimal sample leaf size. A leaf is the decision tree’s last node. A smaller leaf increases the likelihood of the model collecting noise in train data.

  1. n_jobs :

This option instructs the engine on how many processors it is permitted to utilise.

  1. random_state :

This argument makes it simple to duplicate a solution. If given the same parameters and training data, a definite value of random state will always provide the same results.

  1. oob_score:

A random forest cross validation approach is used here. It is similar to the leave one out validation procedure, except it is significantly faster.

LET’S SEE THE STEPS INVOLVED IN IMPLEMENTATION OF RANDOM FOREST ALGORITHM:

Step1: Choose T- number of trees to grow

Step2: Choose m<p (p is the number of total features) —number of features used to calculate the best split at each node (typically 30% for regression, sqrt(p) for classification)

Step3: For each tree, choose a training set by choosing N times (N is the number of training examples) with replacement from the training set

Step4: For each node, calculate the best split, Fully grown and not pruned.

Step5: Use majority voting among all the trees

Following is a full case study and implementation of all the principles we just covered, in the form of a jupyter notebook including every concept and all you ever wanted to know about RANDOM FOREST.

GITHUB Repository for this blog article: https://gist.github.com/Vidhi1290/c9a6046f079fd5abafb7583d3689a410

AI for games, games for AI

1, Who is playing or being played?

Since playing Japanese video games named “Demon’s Souls” and “Dark Souls” when they were released by From Software, I had played almost no video games for many years. During the period, From Software established one genre named soul-like games. Soul-like games are called  死にゲー in Japanese, which means “dying games,” and they are also called マゾゲー, which means “masochistic games.”  As the words imply, you have to be almost masochistic to play such video games because you have to die numerous times in them. And I think recently it has been one of the most remarkable times for From Software because in November of 2021 “Dark Souls” was selected the best video game ever by Golden Joystick Awards. And in the end of last February a new video game by From Software called “Elden Ring” was finally released. After it proved that Miyazaki Hidetaka, the director of Soul series, collaborated with George RR Martin, the author of the original of “Game of Thrones,” “Elden Ring” had been one of the most anticipated video games. In spite of its notorious difficulty as well as other soul-like games so far, “Elden Ring” became a big hit, and I think Miyazak Hidetaka is now the second most famous Miyazaki in the world.  A lot of people have been playing it, raging, and screaming. I was no exception, and it took me around 90 hours to finish the video game, breaking a game controller by the end of it. It was a long time since I had been so childishly emotional last time, and I was almost addicted to trial and errors the video game provides. At the same time, one question crossed my mind: is it the video game or us that is being played?

The childhood nightmare strikes back. Left: the iconic and notorious boss duo Ornstein and Smough in Dark Souls (2011), right: Godskin Duo in Elden Ring (2022).

Miyazaki Hidetaka entered From Software in 2004 and in the beginning worked as a programmer of game AI, which controls video games in various ways. In the same year an AI researcher Miyake Youichiro also joined From Software, and I studied a little about game AI by his book after playing “Elden Ring.” I found that he also joined “Demon’s Souls,” in which enemies with merciless game AI were arranged, and I had to conquer them to reach the demon in the end at every dungeon. Every time I died, even in the terminal place with the boss fight, I had to restart from the start, with all enemies reviving. That requires a lot of trial and errors, and that was the beginning of soul-like video games today.  In the book by the game AI researcher who was creating my tense and almost traumatizing childhood experiences, I found that very sophisticated techniques have been developed to force players to do trial and errors. They were sophisticated even at a level of controlling players at a more emotional level. Even though I am familiar with both of video games and AI at least more than average, it was not until this year that I took care about this field. After technical breakthroughs mainly made Western countries, video game industry showed rapid progress, and industry is now a huge entertainment industry, whose scale is now bigger that those of movies and music combined. Also the news that Facebook changed its named to Meta and that Microsoft announced to buy Activision Blizzard were sensational recently. However media coverage about those events would just give you impressions that those giant tech companies are making uses of the new virtual media as metaverse or new subscription services. At least I suspect these are juts limited sides of investments on the video game industry.

The book on game AI also made me rethink AI technologies also because I am currently writing an article series on reinforcement learning (RL) as a kind of my study note. RL is a type of training of an AI agent through trial-and-error-like processes. Rather than a labeled dataset, RL needs an environment. Such environment receives an action from an agent and gives the consequent state and next reward. From a view point of the agent, it give an action and gets the consequent next state and a corresponding reward, which looks like playing a video game. RL mainly considers a more simplified version of video-game-like environments called a Markov decision processes (MDPs), and in an MDP at a time step t an RL agents takes an action A_t, and gets the next state S_t and a corresponding reward R_t. An MDP is often displayed as a graph at the left side below or the graphical model at the right side.

Compared to a normal labeled dataset used for other machine learning, such environment is something hard to prepare. The video game industry has been a successful manufacturer of such environments, and as a matter of fact video games of Atari or Nintendo Entertainment System (NES) are used as benchmarks of theoretical papers on RL. Such video games might be too primitive for considering practical uses, but researches on RL are little by little tackling more and more complicated video games or simulations. But also I am sure creating AI that plays video games better than us would not be their goals. The situation seems like they are cultivating a form of more general intelligence inside computer simulations which is also effective to the real world. Someday, experiences or intelligence grown in such virtual reality might be dragged to our real world.

Testing systems in simulations has been a fascinating idea, and that is also true of AI research. As I mentioned, video games are frequently used to evaluate RL performances, and there are some tools for making RL environments with modern video game engines. Providing a variety of such sophisticated computer simulations will be indispensable for researches on AI. RL models need to be trained in simulations before being applied on physical devices because most real machines would not endure numerous trial and errors RL often requires. And I believe the video game industry has a potential of developing such experimental fields of AI fueled by commercial success in entertainment. I think the ideas of testing systems or training AI in simulations is getting a bit more realistic due to recent development of transfer learning.

Transfer learning is a subfield of machine learning which apply intelligence or experiences accumulated in datasets or tasks to other datasets or tasks. This is not only applicable to RL but also to more general machine learning tasks like regression or classification. Or rather it is said that transfer learning in general machine learning would show greater progress at a commercial level than RL for the time being. And transfer learning techniques like using pre-trained CNN or BERT is already attracting a lot of attentions. But I would say this is only about a limited type of transfer learning. According to Matsui Kota in RIKEN AIP Data Driven Biomedical Science Team, transfer learning has progressed rapidly after the advent of deep learning, but many types of tasks and approaches are scattered in the field of transfer learning. As he says, the term transfer learning should be more carefully used. I would like to say the type of transfer learning discussed these days are a family of approaches for tackling lack of labels. At the same time some of current researches on transfer learning is also showing possibilities that experiences or intelligence in computer simulations are transferable to the real world. But I think we need to wait for more progress in RL before such things are enabled.

Source: https://ruder.io/transfer-learning/

In this article I would like to explain how video games or computer simulations can provide experiences to the real world in two ways. I am first going to briefly explain how video game industry in the first place has been making game AI to provide game users with tense experiences. And next I will explain how RL has become a promising technique to beat such games which were originally invented to moderately harass human players. And in the end, I am going to briefly introduce ideas of transfer learning applicable to video games or computer simulations. What I can talk in this article is very limited for these huge study areas or industries. But I hope you would see the video game industry and transfer learning in different ways after reading this article, and that might give you some hints about how those industries interact to each other in the future. And also please keep it in mind that I am not going to talk so much about growing video game markets, computer graphics, or metaverse. Here I focus on aspects of interweaving knowledge and experiences generated in simulation or real physical worlds.

2, Game AI

The fact that “Dark Souls” was selected the best game ever at least implies that current video game industry makes much of experiences of discoveries and accomplishments while playing video games, rather than cinematic and realistic computer graphics or iconic and world widely popular characters. That is a kind of returning to the origin of video games. Video games used to be just hard because the more easily players fail, the more money they would drop in arcade games. But I guess this aspect of video games tend to be missed when you talk about video games as a video game fan. What you see in advertisements of video games are more of beautiful graphics, a new world, characters there, and new gadgets. And it has been actually said that qualities of computer graphics have a big correlation with video game sales. In the third article of my series on recurrent neural networks (RNN), I explained how video game industry laid a foundation of the third AI boom during the second AI winter in 1990s. To be more concrete, graphic cards developed rapidly to realize more photo realistic graphics in PC games, and the graphic card used in Xbox was one of the first programmable GPU for commercial uses. But actually video games developed side by side with computer science also outside graphics. Thus in this section I am going to how video games have developed by focusing on game AI, which creates intelligence in video games in several ways. And my explanations on game AI is going to be a rough introduction to a huge and great educational works by Miyake Youichiro.

Playing video games is made up by decision makings, and such decision makings are made in react to game AI. In other words, a display is input into your eyes or sight nerves, and sequential decision makings, that is how you have been moving fingers are outputs. Complication of the experiences, namely hardness of video games, highly depend on game AI.  Game AI is mainly used to design enemies in video games to hunt down players. Ideally game AI has to be both rational and human. Rational game AI implemented in enemies frustrate or sometimes despair users by ruining users’ efforts to attack them, to dodge their attacks, or to use items. At the same time enemies have to retain some room for irrationality, that is they have to be imperfect. If enemies can perfectly conquer players’ efforts by instantly recognizing their commands, such video games would be theoretically impossible to beat. Trying to defeat such enemies is nothing but frustrating. Ideal enemies let down their guard and give some timings for attacking and trying to conquer them. Sophisticated game AI is inevitable to make grownups all over the world childishly emotional while playing video games.

These behaviors of game AI are mainly functions of character AI, which is a part of game AI. In order to explain game AI, I also have to explain a more general idea of AI, which is not the one often called “AI” these days. Artificial intelligence (AI) is in short a family of technologies to create intelligence, with computers. And AI can be divided into two types, symbolism AI and connectionism AI. Roughly speaking, the former is manual and the latter is automatic. Symbolism AI is described with a lot of rules, mainly “if” or “else” statements in code. For example very simply “If the score is greater than 5, the speed of the enemy is 10.” Or rather many people just call it “programming.”

*Note that in contexts of RL, “game AI” often means AI which plays video games or board games. But “game AI” in video games is a more comprehensive idea orchestrating video games.

This meme describes symbolism AI well.

What people usually call “AI” in this 3rd AI boom is the latter, connictionism AI. Connectionism AI basically means neural networks, which is said to be inspired by connections of neurons. But the more you study neural networks, the more you would see such AI just as “functions capable of universal approximation based on data.” That means, a function f, which you would have learned in school such as y = f(x) = ax + b is replaced with a complicated black box, and such black box f is automatically learned with many combinations of (x, y). And such black boxes are called neural networks, and the combinations of (x, y) datasets. Connectionism AI might sound more ideal, but in practice it would be hard to design characters in AI based on such training with datasets.

*Connectionism, or deep learning is of course also programming. But in deep learning we largely depend on libraries, and a lot of parameters of AI models are updated automatically as long as we properly set datasets. In that sense, I would connectionism is more automatic. As I am going to explain, game AI largely depends on symbolism AI, namely manual adjustment of lesser parameters, but such symbolism AI would behave much more like humans than so called “AI” these days when you play video games.

Digital game AI today is application of the both types of AI in video games. It initially started mainly with symbolism AI till around 2010, and as video games get more and more complicated connectionism AI are also introduced in game AI. Video game AI can be classified to mainly navigation AI, character AI, meta AI, procedural AI, and AI outside video games. The figure below shows relations of general AI and types of game AI.

Very simply putting, video game AI traced a history like this: the initial video games were mainly composed of navigation AI showing levels, maps, and objects which move deterministically based on programming.  Players used to just move around such navigation AI. Sooner or later, enemies got certain “intelligence” and learned to chase or hunt down players, and that is the advent of character AI. But of course such “intelligence” is nothing but just manual programs. After rapid progress of video games and their industry, meta AI was invented to control difficulties of video games, thereby controlling players’ emotions. Procedural AI automatically generates contents of video games, so video games are these days becoming more and more massive. And as modern video games are too huge and complicated to debug or maintain manually, AI technologies including deep learning are used. The figure below is a chronicle of development of video games and AI technologies covered in this article. Let’s see a brief history of video games and game AI by taking a closer look at each type of game AI a little more precisely.

Navigation AI

Navigation AI is the most basic type of game AI, and that allows character AI to recognize the world in video games. Even though I think character AI, which enables characters in video games to behave like humans, would be the most famous type of game AI, it is said navigation AI has an older history. One important function of navigation AI is to control objects in video games, such as lifts, item blocks, including attacks by such objects. The next aspect of navigation AI is that it provides character AI with recognition of worlds. Unlike humans, who can almost instantly roughly recognize circumstances, character AI cannot do that as we do. Even if you feel as if the character you are controlling are moving around mountains, cities, or battle fields, sometimes escaping from attacks by other AI, for character AI that is just moving on certain graphs. The figure below are some examples of world representations adopted in some popular video games. There are a variety of such representations, and please let me skip explaining the details of them. An important point is, relatively wide and global recognition of worlds by characters in video games depend on how navigation AI is designed.

Source: Youichiro Miyake, “AI Technologies in Game Industry”, (2020)

The next important feature of navigation AI is path finding. If you have learned engineering or programming, you should be already familiar with pathfiniding algorithms. They had been known since a long time ago, but it was not until “Counter-Strike” in 2000 the techniques were implemented at an satisfying level for navigating characters in a 3d world. Improvements of pathfinding in video games released game AI from fixed places and enabled them to be more dynamic.

*According to Miyake Youichiro, the advent of pathfinding in video games released character AI from staying in a narrow space and enable much more dynamic and human-like movements of them. And that changed game AI from just static objects to more intelligent entity.

Navigation meshes in “Counter-Strike (2000).” Thanks to these meshes, continuous 3d world can be processed as discrete nodes of graphs. Source: https://news.denfaminicogamer.jp/interview/gameai_miyake/3

Character AI

Character AI is something you would first imagine from the term AI. It controls characters’ actions in video games. And differences between navigation AI and character AI can be ambiguous. It is said Pac-Man is one of the very first character AI. Compared to aliens in Space Invader deterministically moved horizontally, enemies in Pac-Man chase a player, and this is the most straightforward difference between navigation AI and character AI.

Source: https://en.wikipedia.org/wiki/Space_Invaders https://en.wikipedia.org/wiki/Pac-Man

Character AI is a bunch of sophisticated planning algorithms, so I can introduce only a limited part of it just like navigation AI. In this article I would like to take an example of “F.E.A.R.” released in 2005. It is said goal-oriented action planning (GOAP) adopted in this video game was a breakthrough in character AI. GOAP is classified to backward planning, and if there exists backward ones, there is also forward ones. Using a game tree is an examples of forward planning. The figure below is an example of a tree game of tic-tac-toe. There are only 9 possible actions at maximum at each phase, so the number of possible states is relatively limited.

https://en.wikipedia.org/wiki/Game_tree

But with more options of actions like most of video games, forward plannings have to deal much larger sizes of future action combinations. GOAP enables realistic behaviors of character AI with a heuristic idea of planning backward. To borrow Miyake Youichiro’s expression, GOAP processes actions like sticky notes. On each sticky note, there is a combination of symbols like “whether a target is dead,” “whether a weapon is armed,” or “whether the weapon is loaded.” A sticky note composed of such symbols form an action, and each action comprises a prerequisite, an action, and an effect. And behaviors of character AI is conducted with planning like pasting the sticky notes.

Based on: Youichiro Miyake, “AI Technologies in Game Industry”, (2020)

More practically sticky notes, namely actions are stored in actions pools. For a decision making, as displayed in the left side of the figure below, actions are connected as a chain. First an action of a goal is first set, and an action can be connected to the prerequisite of the goal via its effect. Just as well corresponding former actions are selected until the initial state.  In the example of chaining below, the goal is “kSymbol_TargetIsDead,” and actions are chained via “kSymbol_TargetIsDead,” “kSymbol_WeaponLoaded,” “kSymbol_WeaponArmed,” and “None.” And there are several combinations of actions to reach a certain goal, so more practically each action has a cost, and the most ideal behavior of character AI is chosen by pathfinding on a graph like the right side of the figure below. And the best planning is chosen by a pathfinding algorithm.

Based on: Youichiro Miyake, “AI Technologies in Game Industry”, (2020)

Even though many of highly intelligent behaviors of character AI are implemented as backward plannings as I explained, planning forward can be very effective in some situations. Board game AI is a good example. A searching algorithm named Monte Carlo tree search is said to be one breakthroughs in board game AI. The searching algorithm randomly plays a game until the end, which is called playout. Numerous times of playouts enables evaluations of possibilities of winning. Monte Carlo Tree search also enables more efficient searches of games trees.

Meta AI

Meta AI is a type of AI such that controls a whole video game to enhance player’s experiences. To be more concrete, it adjusts difficulties of video games by for example dynamically arranging enemies. I think differences between meta AI and navigation AI or character AI can be also ambiguous. As I explained, the earliest video games were composed mainly with navigation AI, or rather just objects. Even if there are aliens or monsters, they can be just part of interactive objects as long as they move deterministically. I said character AI gave some diversities to their behaviors, but how challenging a video game is depends on dynamic arrangements of such objects or enemies. And some of classical video games like “Xevious,” as a matter of fact implemented such adjustments of difficulties of game plays. That is an advent of meta AI, but I think they were not so much distinguished from other types of AI, and I guess meta AI has been unconsciously just a part of programming.

It is said a turning point of modern meta AI is a shooting game “Left 4 Dead” released in 2008, where zombies are dynamically arranged. As well as many masterpiece thriller films, realistic and tense terrors are made by combinations of intensities and relaxations. Tons of monsters or zombies coming up one after another and just shooting them look stupid or almost like comedies. .And analyzing the success of “Counter-Strike,” they realized that users liked rhythms of intensity and relaxation, so they implemented that explicitly in “Left 4 Dead.” The graphs below concisely shows how meta AI works in the video game. When the survivor intensity, namely players’ intensity is low, the meta AI arrange some enemies. Survivor intensity increases as players fight with zombies or something, and then meta AI places fewer enemies so that players can relax. While players re relatively relaxing, desired population of enemies increases when they actually show up in video games, again the phase of intensity comes.

Source: Michael Booth, “Replayable Cooperative Game Design: Left 4 Dead”, (2009), Valve

*Soul series video games do not seem to use meta AI so much. Characters in the games are rearranged in more or less the same ways every time players fail. Soul-like games make much of experiences that players find solutions by themselves, which means that manual but very careful arrangements of enemies and interactive objects are also very effective.

Meta AI can be used to make video games more addictive using data analysis. Recent social network games can record logs of game plays. Therefore if you can observe a trend that more users unsubscribe when they get less rewards in certain online events, operating companies of the game can adjust chances of getting “rare” items.

Procedural AI and AI outside video games

How clearly you can have an image of what I am going to explain in this subsection would depend how recently you have played video games. If your memories of playing video games stops with good old days of playing side-scrolling ones like Super Mario Brothers, you should at first look up some videos of playing open world games. Open world means a use of a virtual reality in which players can move an behave with a high degree of freedom. The term open world is often used as opposed to the linear games, where players have process games in the order they are given. Once you are immersed in photorealistic computer graphic worlds in such open world games, you would soon understand why metaverse is attracting attentions these days. Open world games for example like “Fallout 4” are astonishing in that you can even talk to almost everyone in them. Just as “Elden Ring” changed former soul series video games into an open world one, it seems providing open world games is one way to keep competitive in the video game industry. And such massive world can be made also with a help of procedural AI. Procedural AI can be seen as a part of meta AI, and it generates components of games such as buildings, roads, plants, and even stories. Thanks to procedural AI, video game companies with relatively small domestic markets like Poland can make a top-level open world game such as “The Witcher 3: Wild Hunt.”

An example of technique of procedural AI adopted in “The Witcher 3: Wild Hunt” for automatically creating the massive open world. Source: Marcin Gollent, “Landscape creation and rendering in REDengine 3”, (2014), Game Developers Conference

Creating a massive world also means needs of tons of debugging and quality assurance (QA). Combining works by programmers, designers, and procedural AI will cause a lot of unexpected troubles when it is actually played. AI outside game can be used to find these problems for quality assurance. Debugging and and QA have been basically done manually, and especially when it comes to QA, video game manufacturer have to employ a lot of gamer to let them just play prototype of their products. However as video games get bigger and bigger, their products are not something that can be maintained manually anymore. If you have played even one open world game, that would be easy to imagine, so automatic QA would remain indispensable in the video game industry. For example an open world game “Horizon Zero Dawn” is a video game where a player can very freely move around a massive world like a jungle. The QA team of this video game prepared bug maps so that they can visualize errors in video games. And they also adopted a system named “Apollo-Autonomous Automated Autobots” to let game AI automatically play the video game and record bugs.

As most video games both in consoles or PCs are connected to the internet these days, these bugs can be fixed soon with updates. In addition, logs of data of how players played video games or how they failed can be stored to adjust difficulties of video games or train game AI. As you can see, video games are not something manufacturers just release. They are now something develop interactively between users and developers, and players’ data is all exploited just as your browsing history on the Internet.

I have briefly explained AI used for video games over four topics. In the next two sections, I am going to explain how board games and video games can be used for AI research.

3, Reinforcement learning: we might be a sort of well-made game AI models

Machine learning, especially RL is replacing humans with computers, however with incredible computation resources. Invention of game AI, in this context including computers playing board games, has been milestones of development of AI for decades. As Western countries had been leading researches on AI, defeating humans in chess, a symbol of intelligence, had been one of goals. Even Alan Turing, one of the fathers of computers, programmed game AI to play chess with one of the earliest calculators. Searching algorithms with game trees were mainly studied in the beginning. Game trees are a type of tree graphs to show how games proceed, by expressing future possibilities with diverging tree structures. And searching algorithms are often used on tree graphs to ignore future steps which are not likely to be effective, which often looks like cutting off branches of trees. As a matter of fact, chess was so “simple” that searching algorithms alone were enough to defeat Garry Kasparov, the world chess champion at that time in 1997. That is, growing trees and trimming them was enough for the “simplicity” of chess as long as a super computer of IBM was available. After that computer defeated one of the top players of shogi, a Japanese version of chess, in 2013. And remarkably, in 2016 AlphaGo of DeepMind under Google defeated the world go champion. Game AI has been gradually mastering board games in order of increasing search space size.

Source: https://www.livescience.com/59068-deep-blue-beats-kasparov-progress-of-ai.html https://fortune.com/2016/03/21/google-alphago-win-artificial-intelligence/

We can say combinations of techniques which developed in different streams converged into game AI today, like I display in the figure below. In AlphaGo or maybe also general game AI, neural networks enable “intuition” on phases of board games, searching algorithms enables “foreseeing,” and RL “experiences.” And as almost no one can defeat computers in board games anymore, the next step of game AI is how to conquer other video games.  Since progress of convolutional neural network (CNN) in this 3rd AI boom, computers got “eyes” like we do, and the invention of ResNet in 2015 is remarkable. Thus we can now use displays of video games as inputs to neural networks. And combinations of reinforcement learning and neural networks like (CNN) is called deep reinforcement learning. Since the advent of deep reinforcement learning, many people are trying to apply it on various video games, and they show impressive results. But in general that is successful in bird’s-eye view games. Even if some of researches can be competitive or outperform human players, even in first person shooting video games, they require too much computational resources and heuristic techniques. And usually they take too much time and computer resource to achieve the level.

*Even though CNN is mainly used as “eyes” of computers, it is also used to process a phase of a board game. That means each phase of is processed like an arrangement of pixels of an image. This is what I mean by “intuition” of deep learning. Just as neural networks can recognize objects, depending on training methods they can recognize boards at a high level.

Now I would like you to think about what “smartness” means. Competency in board games tend to have correlations with mathematical skills. And actually in many cases people proficient in mathematics are also competent in board games. Even though AI can defeat incredibly smart top board game players to the best of my knowledge game AI has yet to play complicated video games with more realistic computer graphics. As I explained, behaviors of character AI is in practice implemented as simpler graphs, and tactics taken in such graphs will not be as complicated as game trees of competitive board games. And the idea of game AI playing video games itself not new, and it is also used in debugging of video games. Thus the difficulties of computers playing video games would come more from how to associate what they see on displays with more long-term and more abstract plannings. And currently, kids would more flexibly switch to other video games and play them more professionally in no time. I would say the difference is due to frames of tasks. A frame roughly means a domain or a range which is related to a task. When you play a board game, its frame is relatively small because everything you can do is limited in the rule of the game which can be expressed as simple data structure. But playing video games has a wider frame in that you have to recognize only the necessary parts important for playing video games from its constantly changing displays, namely sequences of RGB images. And in the real world, even a trivial action like putting a class on a table is selected from countless frames like what your room looks like, how soft the floor is, or what the temperature is. Human brains are great in that they can pick up only necessary frames instantly.

As many researchers would already realize, making smaller models with lower resources which can learn more variety of tasks is going to be needed, and it is a main topic these days not only in RL but also in other machine learning. And to be honest, I am skeptical about industrial or academic benefits of inventing specialized AI models for beating human players with gigantic computation resources. That would be sensational and might be effective for gathering attentions and funds. But as many AI researchers would already realize, inventing a more general intelligence which would more flexibly adjust to various tasks is more important. Among various topics of researches on the problem, I am going to pick up transfer learning in the next section, but in a more futuristic and dreamy sense.

4, Transfer learning and game for AI

In an event with some young shogi players, to a question “What would you like to request to a god?” Fujii Sota, the youngest top shogi player ever, answered “If he exists, I would like to ask him to play a game with me.” People there were stunned by the answer. The young genius, contrary to his sleepy face, has an ambition which only the most intrepid figures in mythology would have had. But instead of playing with gods, he is training himself with game AI of shogi. His hobby is assembling computers with high end CPUs, whose performance is monstrous for personal home uses. But in my opinion such situation comes from a fact that humans are already a kind of well-made machine learning models and that highly intelligent games for humans have very limited frames for computers.

*It seems it is not only computers that need huge energy consumption to play board games. Japanese media often show how gorgeous and high caloric shogi players’ meals are during breaks. And more often than not, how fancy their feasts are is the only thing most normal spectators like me in front of TVs can understand, albeit highly intellectual tactics made beneath the wooden boards.

As I have explained, the video game industry has been providing complicated simulational worlds with sophisticated ensemble of game AI in both symbolism and connectionism ways. And such simulations, initially invented to hunt down players, are these days being conquered especially by RL models, and the trend showed conspicuous progress after the advent of deep learning, that is after computers getting “eyes.” The next problem is how to transfer the intelligence or experiences cultivated in such simulations to the real world. Only humans can successfully train themselves with computer simulations today as far as I know, but more practically it is desired to transfer experiences with wider frames to more inflexible entities like robots. Such technologies would be ideal especially for RL because physical devices cannot make numerous trial and errors in the real world. They should be trained in advance in computer simulations. And transfer learning could be one way to take advantages of experiences in computer simulations to the real world. But before talking about such transfer learning, we need to be careful about the term “transfer learning.” Transfer learning is a family of machine learning technologies to makes uses of knowledge learned in a dataset, which is usually relatively huge, to another task with another dataset. Even though I have been emphasizing transferring experiences in computer simulations, transfer learning is a more general idea applicable to more general use cases, also outside computer simulations. Or rather, transfer learning is attracting a lot of attentions as a promising technique for tackling lack of data in general machine learning. And another problem is even though transfer learning has been rapidly developing recently, various research topics are scattered in the field called “transfer learning.” And arranging these topics would need extra articles or something. Thus in the rest of this article,  I would like to especially focus on uses of video games or computer simulations in transfer learning. When it comes to already popular and practical transfer learning techniques like fine tuning with pre-trained backbone CNN or BERT, I am planning to cover them with more practical introduction in one of my upcoming articles. Thus in this article, after simply introducing ideas of domains and transfer learning, I am going to briefly introduce transfer learning and explain domain adaptation/randomization.

Domain and transfer learning

There is a more strict definition of a domain in machine learning, but all you have to know is it means in short a type of dataset used for a machine learning task. And different domains have a domain shift, which in short means differences in the domains. A text dataset and an image dataset have a domain shift. An image dataset of real objects and one with cartoon images also have a smaller domain shift. Even differences in lighting or angles of cameras would cause a domain  shift. In general, even if a machine learning model is successful in tasks in a domain, even a domain shift which is trivial to humans declines performances of the model. In other words, intelligence learned in one domain is not straightforwardly applicable to another domain as humans can do. That is, even if you can recognize objects both a real and cartoon cars as a car, that is not necessarily true of machine learning models. As a family of techniques for tackling this problem, transfer learning makes a use of knowledge in a source domain (the dots in blue below), and apply the knowledge to a target domain. And usually, a source domain is assumed to be large and labeled, and on the other hand a target domain is assumed to be relatively small or even unlabeled. And tasks in a source or a target domain can be different. For example, CNN models trained on classification of ImageNet can be effectively used for object detection. Or BERT is trained on a huge corpus in a self-supervised way, but it is applicable to a variety of tasks in natural language processing (NLP).

*To people in computer vision fields, an explanation that BERT is a NLP version of pre-trained CNN would make the most sense. Just as a pre-trained CNN maps an image, arrangements of RGB pixels values, to a vector representing more of “meaning” of the image, BERT maps a text,  a sequence of one-hot encodings, into a vector or a sequence of vectors in a semantic field useful for NLP.

Transfer learning is a very popular topic, and it is hard to arrange and explain types of existing techniques. I think that is because many people are tackling more or less the similar problems with slightly different approaches. For now I would like you to keep it in mind that there are roughly three points below to consider in transfer learning

  1. What to transfer
  2. When to transfer
  3. How to transfer

The answer of the second point above “When to transfer” is simply “when domains are more or less alike.” Transfer learning assume similarities between target and source domains to some extent. “How to transfer” is very task-specific, so this is not something I can explain briefly here. I think the first point “what to transfer” is the most important for now to avoid confusions about what “transfer learning” means. “What to transfer” in transfer learning is also classified to the three types below.

  • Instance transfer (transferring datasets themselves)
  • Feature transfer (transferring extracted features)
  • Parameter transfer (transferring pre-trained models)

In fact, when you talk about already practical transfer learning techniques like using pre-trained CNN or BERT, they refer to only parameter transfer above. And please let me skip introducing it in this article. I am going to focus only on techniques related to video games in this article.

*I would like to give more practical introduction on for example BERT in one of my upcoming articles.

Domain adaptation or randomization

I first got interested in relations of video games and AI research because I was studying domain adaptation, which tackles declines of machine learning performance caused by a domain shift. Domain adaptation is sometimes used as a synonym to transfer learning. But compared to that general transfer learning also assume different tasks in different domains, domain adaptation assume the same task. Thus I would say domain adaptation is a subfield of transfer learning. There are several techniques for domain adaptation, and in this article I would like to take feature alignment as an example of frequently used approaches. Input datasets have a certain domain shift like blue and circle dots in the figure below. This domain shift cannot be changed if datasets themselves are not directly converted. Feature alignment make the domain shift smaller in a feature space after data being processed by the feature extractor. The features expressed as square dots in the figure are passed to task-specific networks just as normal machine learning. With sufficient labels in the source domain and with fewer or no labels in the target one, the task-specific networks are supervised. On the other hand, the features are also passed to the domain discriminator, and the discriminator predicts which domain the feature comes from. The domain discriminator is normally trained with supervision by classification loss, but the feature supervision is reversed when it trains the feature extractor. Due to the reversed supervision the feature extractor learns mix up features because that is worse for discriminating distinguishing the source or target domains. In this way, the feature extractor learns extract domain invariant features, that is more general features both domains have in common.

*The feature extractor and the domain discriminator is in a sense composing generative adversarial networks (GAN), which is often used in data generation. To know more about GAN, you could check for example this article.

One of motivations behind domain adaptation is that it enables training AI tasks with synthetic datasets made by for example computer graphics because they are very easy to annotate and prepare labels necessary for machine learning tasks. In such cases, domain invariant features like curves or silhouettes are expected to learn. And learning computer vision tasks from GTA5 dataset which are applicable to Cityscapes dataset is counted as one of challenging tasks in papers on domain adaptation. GTA of course stands for “Grand Theft Auto,” the video open-world video game series. If this research continues successfully developing, that would imply possibilities of capability of teaching AI models to “see” only with video games. Imagine that a baby first learns to play Grand Theft Auto 5 above all and learns what cars, roads, and pedestrians are.  And when you bring the baby outside, even they have not seen any real cars, they point to a real cars and people and say “car” and “pedestrians,” rather than “mama” or “dada.”

In order to enable more effective domain adaptation, Cycle GAN is often used. Cycle GAN is a technique to map texture in one domain to another domain. The figure below is an example of applying Cycle GAN on GTA5 dataset and Cityspaces Dataset, and by doing so shiny views from a car in Los Santos can be converted to dark and depressing scenes in Germany in winter. This instance transfer is often used in researches on domain adaptation.

Source: https://junyanz.github.io/CycleGAN/

Even if you mainly train depth estimation with data converted like above, the model can predict depth data of the real world domain without correct depth data. In the figure below, A is the target real data, B is the target domain converted like a source domain, and C is depth estimation on A.

Source: Abarghouei et al., “Real-Time Monocular Depth Estimation using Synthetic Data with Domain Adaptation via Image Style Transfer”, (2018), Computer Vision and Pattern Recognition Conference

Crowd counting is another field where making a labeled dataset with video games is very effective. A MOD for making a crowd arbitrarily is released, and you can make labeled datasets like below.

Source: https://gjy3035.github.io/GCC-CL/

*Introducing GTA mod into research is hilarious. You first need to buy PC software of Grand Theft Auto 5 and gaming PC at first. And after finishing the first tutorial in the video game, you need to find a place to place a camera, which looks nothing but just playing video games with public money.

Domain adaptation problems I mentioned are more of matters of how to let computers “see” the world with computer simulations. But the gap between the simulational worlds and the real world does not exist only in visual ways like in CV. How robots or vehicles are parametrized in computers also have some gaps from the real world, so even if you replace only observations with simulations, it would be hard to train AI. But surprisingly, some researches have already succeeded in training robot arms only with computer simulations. An approach named domain randomization seems to be more or less successful in training robot arms only with computer simulations and apply the learned experience to the real world. Compared to domain adaptation aligned source domain to the target domain, domain randomization is more of expanding the source domain by changing various parameters of the source domain. And the target domain, namely robot arms in the real world is in the end included in the expanded source domain. And such expansions are relatively easy with computer simulations.

For example a paper “Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience” proposes a technique to reflect real world feed back to simulations in domain randomization, and this pipeline enables a robot arm to do real world tasks in a few iteration of real world trainings.

Based on: Chebotar et al. , “Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience”, (2019), International Conference on Robotics and Automation

As the video shows, the ideas of training a robot with computer simulations is becoming more realistic.

The future of games for AI

I have been emphasizing how useful video games are in AI researches, but I am not sure if how much the field purely rely on video games like it is doing especially on RL. Autonomous driving is a huge research field, and modern video games like Grand Thef Auto are already good driving simulations in urban areas. But other realistic simulations like CARLA have been developed independent of video games. And in a paper “Exploring the Limitations of Behavior Cloning for Autonomous Driving,” some limitations of training self-driving cars in the simulation are reported. And some companies like Waymo switched to recurrent neural networks (RNN) for self-driving cars. It is natural that fields like self-driving, where errors of controls can be fatal, are not so optimistic about adopting RL for now.

But at the same time, Microsoft bought a Project Bonsai, which is aiming at applying RL to real world tasks. Also Microsoft has Project Malmo or AirSim, which respectively use Minecraft or Unreal Engine for AI reseraches. Also recently the news that Microsoft bought Activision Blizzard was a sensation last year, and media’s interests were mainly about metaverse or subscription service of video games. But Microsoft also bouth Zenimax Media, is famous for open world like Fallout or Skyrim series. Given that these are under Microsoft, it seems the company has been keen on merging AI reserach and developing video games.

As I briefly explained, video games can be expanded with procedural AI technologies. In the future AI might be trained in video game worlds, which are augmented with another form of AI. Combinations of transfer learning and game AI might possibly be a family of self-supervising technologies, like an octopus growing by eating its own feet. At least the biggest advantage of the video game industry is, even technologies themselves do not make immediate profits, researches on them are fueled by increasing video game fans all over the world. This is a kind of my sci-fi imagination of the world. Though I am not sure which is more efficient to manually design controls of robots or training AI in such indirect ways. And I prefer to enhance physical world to metaverse. People should learn to put their controllers someday and to enhance the real world. Highly motivated by “Elden Ring” I wrote this article. Some readers might got interested in the idea of transferring experiences in computer simulations to the real world. I am also going to write about transfer learning in general that is helpful in practice.

[1]三宅 陽一郎, 「ゲームAI技術入門 – 広大な人工知能の世界を体系的に学ぶ」, (2019), 技術評論社
Miyake Youichiro, “An Introduction to Game AI – Systematically Learning the Wide World of Artificial Intelligence”, (2019), Gijutsu-hyoron-sya

[2]三宅 陽一郎, 「21世紀に“洋ゲー”でゲームAIが遂げた驚異の進化史。その「敗戦」から日本のゲーム業界が再び立ち上がるには?【AI開発者・三宅陽一郎氏インタビュー】」, (2017), 電ファミニコゲーマー
Miyake Youichiro, ”The history of Astonishing Game AI which Western Video Games in 21st Century Traced. What Should the Japanese Video Game Industry Do to Recover from the ‘Defeat in War’? [An Interview with an AI Developer Miyake Yoichiro]”

[3]Rob Leane, “Dark Souls named greatest game of all time at Golden Joysticks”, (2021), RadioTimes.com

[4] Matsui Kota, “Recent Advances on Transfer Learning and Related Topics (ver.2)”, (2019), RIKEN AIP Data Driven Biomedical Science Team

[3] Sebastian Ruder, “Transfer Learning – Machine Learning’s Next Frontier”, (2017)
https://ruder.io/transfer-learning/

[4] Matsui Kota, “Recent Advances on Transfer Learning and Related Topics (ver.2)”, (2019), RIKEN AIP Data Driven Biomedical Science Team

[5] Youichiro Miyake, “AI Technologies in Game Industry”, (2020)
https://www.slideshare.net/youichiromiyake/ai-technologies-in-game-industry-english

[6] Michael Booth, “Replayable Cooperative Game Design: Left 4 Dead”, (2009), Valve

[7] Marcin Gollent, “Landscape creation and rendering in REDengine 3”, (2014), Game Developers Conference
https://www.gdcvault.com/play/1020197/Landscape-Creation-and-Rendering-in

* I make study materials on machine learning, sponsored by DATANOMIQ. I do my best to make my content as straightforward but as precise as possible. I include all of my reference sources. If you notice any mistakes in my materials, including grammatical errors, please let me know (email: yasuto.tamura@datanomiq.de). And if you have any advice for making my materials more understandable to learners, I would appreciate hearing it.

AI Role Analysis in Cybersecurity Sector

Cybersecurity as the name suggests is the process of safeguarding networks and programs from digital attacks. In today’s times, the world sustains on internet-connected systems that carry humungous data that is highly sensitive. Cyberthreats are on the rise with unscrupulous hackers taking over the entire industry by storm, with their unethical practices. This not only calls for more intense cyber security laws, but also the vigilance policies of the corporates, big and small, government as well as non-government; needs to be revisited.

With such huge responsibility being leveraged over the cyber-industry, more and more cyber-security enthusiasts are showing keen interest in the industry and its practices. In order to further the process of secured internet systems for all, unlike data sciences and other industries; the Cybersecurity industry has seen a workforce rattling its grey muscle with every surge they experience in cyber threats. Talking of AI impressions in Cybersecurity is still in its nascent stages of deployment as humans are capable of more; when assisted with the right set of tools.

Automatically detecting unknown workstations, servers, code repositories, and other hardware and software on the network are some of the tasks that could be easily managed by AI professionals, which were conducted manually by Cybersecurity folks. This leaves room for cybersecurity officials to focus on more urgent and critical tasks that need their urgent attention. Artificial intelligence can definitely do the leg work of processing and analyzing data in order to help inform human decision-making.

AI in cyber security is a powerful security tool for businesses. It is rapidly gaining its due share of trust among businesses for scaling cybersecurity. Statista, in a recent post, listed that in 2019, approximately 83% of organizations based in the US consider that without AI, their organization fails to deal with cyberattacks. AI-cyber security solutions can react faster to cyber security threats with more accuracy than any human. It can also free up cyber security professionals to focus on more critical tasks in the organization.

CHALLENGES FACED BY AI IN CYBER SECURITY

As it is said, “It takes a thief to catch a thief”. Being in its experimental stages, its cost could be an uninviting factor for many businesses. To counter the threats posed by cybercriminals, organizations ought to level up their internet security battle. Attacks backed by the organized crime syndicate with intentions to dismantle the online operations and damage the economy are the major threats this industry face today. AI is still mostly experimental and, in its infancy, hackers will find it much easy to carry out speedier, more advanced attacks. New-age automation-driven practices are sure to safeguard the crumbling internet security scenarios.

AI IN CYBER SECURITY AS A BOON

There are several advantageous reasons to embrace AI in cybersecurity. Some notable pros are listed below:

  •  Ability to process large volumes of data
    AI automates the creation of ML algorithms that can detect a wide range of cybersecurity threats emerging from spam emails, malicious websites, or shared files.
  • Greater adaptability
    Artificial intelligence is easily adaptable to contemporary IT trends with the ever-changing dynamics of the data available to businesses across sectors.
  • Early detection of novel cybersecurity risks
    AI-powered cybersecurity solutions can eliminate or mitigate the advanced hacking techniques to more extraordinary lengths.
  • Offers complete, real-time cybersecurity solutions
    Due to AI’s adaptive quality, artificial intelligence-driven cyber solutions can help businesses eliminate the added expenses of IT security professionals.
  • Wards off spam, phishing, and redundant computing procedures
    AI easily identifies suspicious and malicious emails to alert and protect your enterprise.

AI IN CYBERSECURITY AS A BANE

Alongside the advantages listed above, AI-powered cybersecurity solutions present a few drawbacks and challenges, such as:

  • AI benefits hackers
    Hackers can easily sneak into the data networks that are rendered vulnerable to exploitation.
  • Breach of privacy
    Stealing log-in details of the users and using them to commit cybercrimes, are deemed sensitive issues to the privacy of an entire organization.
  • Higher cost for talents
    The cost of creating an efficient talent pool is very high as AI-based technologies are in the nascent stage.
  • More data, more problems
    Entrusting our sensitive data to a third-party enterprise may lead to privacy violations.

AI-HUMAN MERGER IS THE SOLUTION

AI professionals backed with the best AI certifications in the world assist corporations of all sizes to leverage the maximum benefits of the AI skills that they bring along, for the larger benefit of the organization. Cybersecurity teams and AI systems cannot work in isolation. This communion is a huge step forward to leveraging maximum benefits for secured cybersecurity applications for organizations. Hence, this makes AI in cybersecurity a much-coveted aspect to render its offerings in the long run.