My elaborate study notes on reinforcement learning

I will not tell you why, but all of a sudden I needed to write an article series on reinforcement learning, even though I am still a beginner in the field. Everything I knew came from a single online lecture, delivered in a rather lazy tone, at my college. In the process of studying reinforcement learning, however, I found a line that could connect two dots: reinforcement learning and my own field of study. That is why I made up my mind to write this article series seriously.

To be a bit more concrete, I imagine that technologies in our world could be enhanced by a combination of reinforcement learning and virtual reality. That means companies like Toyota or VW might come to invest more seriously in visual effects or video game companies in the future. I have also been struggling with how to train deep learning models with CGI, which might bridge the virtual world and the real world.

As I am also a beginner in reinforcement learning, this article series will be a kind of study note for me. But as in my former articles, I prefer exhaustive yet intuitive explanations of AI algorithms, so I will do my best to make this series as instructive and effective as existing tutorials on reinforcement learning.

This article is going to be composed of the following contents.

In this article I would like to share what I have learned about RL, and I hope you can pick up some hints for learning this fascinating field. If you have any comments or advice on my “study note,” please leave a comment or contact me via email.

AI Platforms – A Comprehensive Guide

A comprehensive guide compiled to introduce readers to AI platforms, their types, and their benefits. A concluding section discusses AI platform selection strategy with Attri’s best-of-breed approach to building AI platforms.

Don’t you think this century is really fortunate? In my opinion, the answer is yes; we have witnessed technological transformations and their miracles that created substantial changes in our lifestyle. Among these life-changing revolutions, AI, or artificial intelligence, deserves a front seat due to its incredible contributions and capabilities. Everyone now knows AI has limitless potential, from creating funny face filters on a phone to making informed and intelligent business decisions. In the last 50 years, we have progressed by leaps and bounds in giving machines the ability to understand, help, and mimic us.

Artificial intelligence enables machines to imitate human intelligence across a variety of domains, ranging from problem-solving and reasoning to general intelligence and in-depth knowledge representation. With this tremendous progress in AI, another enabler came into existence and received attention: AI platforms. An AI platform is a layer that integrates all the tools and processes required to build, deploy, and monitor ML models. In this article, we go through the various aspects of AI platforms, covering topics such as AI platform types, the benefits such platforms bring, selection strategy, and a brief look at Attri’s industry contribution with its Open AI Platform.

Diving Deeper With AI Platforms

The AI platform acts as a layer over your current AI infrastructure and integrates all the tools and processes required to develop ML models. It gives you the flexibility to bring all your ML models under a single roof, so you can create and deploy several models on the platform and monitor them to confirm they are serving their intended purpose. An AI platform makes AI adoption easier by meeting the following requirements:

  • Use vast amounts of data to develop ML solutions
  • Ensure transparency and reproducibility within a project
  • Accelerate collaboration and governance within teams
  • Ensure scalability for ever-growing machine learning demands

An ideal AI platform should provide the following features to address different challenges.

  • Seamless access control: robust access control for team members, addressing the challenge of centralized data access in AI projects
  • Excellent monitoring: top-notch observability practices integrated into ML model development
  • Data- and technology-agnostic integration: a seamless experience for enterprises, with infrastructure setup handed over to the platform provider
  • All-inclusive platform: a single platform that facilitates all underlying tasks, from data preparation to model deployment
  • Continuous improvement: the ability to produce and deploy models as reproducible packages and thereby integrate changes into models already in production
  • Rapid processing: faster data preparation and powerful visual interfaces

AI Platform Classification

With loads of AI platform providers on the market, classifying AI platforms is a tough job: it requires looking separately at each platform’s offerings, features, and cost factors. You also need to check whether the solutions are open-source AI platforms or proprietary offerings.

We have decided to present an AI platform classification based on striking features and offerings. With this, we have classified AI platforms into three main classes:

  • AI cloud-based platforms
  • AI conversational platforms
  • No code AI platforms

Cloud based AI Platforms

All major cloud providers offer cloud-based AI platforms to boost businesses with AI capabilities. With cloud AI platforms, enterprises can leverage cloud providers’ matchless technical expertise to overcome the affordability and data requirement challenges associated with AI implementation. Cloud-based AI offerings benefit businesses with economical AI solutions, defined and pre-packaged services, lower risks, and modern technology.

Amazon Web Services

AWS offers a comprehensive set of AI solutions to conquer the major hurdles in a business's AI adoption journey, and it has been recognized as a top cloud AI partner thanks to its broad and capable portfolio. AWS pre-trained models cater to diverse use cases such as forecasting, recommendations, computer vision, language interpretation, customer engagement, and safety, and support deploying ML models at scale. Amazon also provides text analytics, NLP, chatbot, and document analysis solutions. Fully managed AWS packages amplify your experience with minimal resource requirements and a wizard-based, friendly model development experience.

Google Cloud

The Google Cloud Platform (GCP) is Google's offering for cloud computing services, devised to support multiple use cases such as hosting containerized applications, massive-scale data analytics, and applying ML and AI to business use cases. AI Platform is the Google Cloud offering that helps build, deploy, and manage machine learning models in the cloud.

Google brings enterprise AI experience from its consumer-facing products. Google helps improve customer satisfaction through Contact Center AI, and its Dialogflow CX offering is used to create advanced chatbots that handle customer messaging, responses, and voice recognition. Dialogflow is also applied to create virtual agents for messaging services, mobile apps, and IoT devices.

Google’s Cloud Vision API is useful for recognizing objects, logos, and landmarks within images. The Natural Language API brings more clarity to content classification, entities, syntax, and sentiment. Further, the Speech API helps convert audio to text and recognizes 110 languages.
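To make this concrete, here is a minimal sketch of calling the Cloud Vision API for label detection from Python. It assumes the google-cloud-vision client library is installed and credentials are configured; the file name is just a placeholder.

```python
# Hedged sketch: label a local image with the Cloud Vision API.
# Assumes `pip install google-cloud-vision` and GOOGLE_APPLICATION_CREDENTIALS set.
from google.cloud import vision

def label_image(path: str) -> list[str]:
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    # label_detection is one of several annotators (logo, landmark, text, ...)
    response = client.label_detection(image=image)
    return [label.description for label in response.label_annotations]

if __name__ == "__main__":
    print(label_image("storefront.jpg"))  # hypothetical input file
```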

Google’s Cloud ML services facilitate better decision-making with end-to-end ML solutions. Google offers an all-inclusive ML development platform backed by explainable AI, continuous evaluation, data labeling, pipelines, training, and the What-If Tool. The platform is based on the TensorFlow framework and enables building predictive models for various scenarios.

Kubeflow is a cloud-native, open-source platform that helps you build portable ML pipelines that can be executed on-premises or in the cloud. With it, you can access Google technologies such as TPUs, TensorFlow, and TFX tools as you deploy your ML models to production.
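As a rough illustration of what a portable pipeline definition looks like, here is a minimal sketch using the older kfp v1 SDK; the container images and commands are placeholders, not a recommended production pipeline.

```python
# Minimal Kubeflow Pipelines sketch (kfp v1 SDK style, assumed available).
import kfp
from kfp import dsl

@dsl.pipeline(name="demo-training-pipeline",
              description="Two-step pipeline compiled to a portable definition.")
def demo_pipeline():
    preprocess = dsl.ContainerOp(
        name="preprocess",
        image="python:3.9",
        command=["python", "-c", "print('preprocessing data')"],
    )
    train = dsl.ContainerOp(
        name="train",
        image="python:3.9",
        command=["python", "-c", "print('training model')"],
    )
    train.after(preprocess)  # run training only after preprocessing finishes

# The compiled file can be uploaded to any Kubeflow Pipelines instance,
# whether it runs on-premises or in the cloud.
kfp.compiler.Compiler().compile(demo_pipeline, "demo_pipeline.yaml")
```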

For expert ML developers, Google provides an open-source AI platform with TensorFlow models trained for various scenarios, along with an excellent prediction service for serving trained models.

Microsoft Azure

Similar to Amazon Web Services, Microsoft Azure's ML capabilities are grounded in real-time and live applications. Azure provides superior machine learning capabilities to develop, train, and deploy machine learning models through Azure Machine Learning, Azure Databricks, and ONNX.

  • Azure Machine Learning

A Python-based ML service that facilitates automated machine learning.

  • ONNX

An open-source model format that enables machine learning across various frameworks and hardware platforms of the user’s choice; a short export/inference sketch follows this list.

  • Azure Cognitive Search

Formerly known as Azure Search, this is the only cloud search service with built-in AI capabilities for exploring content effectively at scale. Microsoft complements it with cognitive services such as text analytics, translation, document analysis, custom vision, and Azure Machine Learning solutions.
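To illustrate the ONNX item above, here is a hedged sketch of exporting a small PyTorch model to the ONNX format and running it with ONNX Runtime; the model and tensor shapes are purely illustrative.

```python
# Sketch: export a tiny PyTorch model to ONNX, then run it framework-independently.
# Assumes torch and onnxruntime are installed.
import torch
import torch.nn as nn
import onnxruntime as ort

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()
dummy_input = torch.randn(1, 4)

# Export to the framework-neutral ONNX format
torch.onnx.export(model, dummy_input, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Run the exported model with ONNX Runtime, independent of PyTorch
session = ort.InferenceSession("model.onnx")
outputs = session.run(None, {"input": dummy_input.numpy()})
print(outputs[0].shape)  # (1, 2)
```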

IBM Cloud

IBM offers Watson Studio, a data analysis application, to accelerate innovation and ML-centric practices in business. The IBM Cloud AI platform offers 170 services, with emphasis on speech-data conversion and analytics. Watson Studio provides an all-inclusive suite to work with data and to train, build, and deploy ML models.

An innovative giant, IBM has also recently introduced an AI-based learning platform to aid academic stakeholders such as students, researchers, and teachers.

AI Conversational Platforms

Conversational AI opens new doors for automated conversations between an enterprise and its customers. These conversations take place over messaging or voice-based platforms, enabling text- or audio-based interaction.

Conversational platforms improve your customer experience with a range of applications such as follow-up, guidance, resolution of customer queries, and round-the-clock support. They also help drive more leads, increase conversions through cross-selling and up-selling, and support promotional efforts, customer research, query resolution, and feedback handling.

AI technology helps systems mimic human conversations to a certain level and with great accuracy. Natural language processing (NLP) is used to shape these conversations by understanding intent, text, speech, and language.

Intelligent Virtual Assistants

Intelligent virtual assistants represent an advanced level of conversational AI, and no discussion of them is complete without mentioning Siri and Alexa. The most popular intelligent virtual assistants include Siri by Apple, Alexa by Amazon, Google Assistant, and Bixby by Samsung. While Alexa performs as a voice assistant for the home, Siri and Bixby are mobile assistants supporting operations such as navigation, text-to-speech, weather responses, quick replies, and address search.

SAP Conversational AI

SAP Conversational AI is one of the leading conversational AI platforms. With its friendly UI and multiple versioning, it offers a better experience of mimicking human conversations. The platform uses NLP to help you develop chatbots that converse more naturally and serve your customers 24/7. Its striking features include:

  • Simple integration
  • NLP capabilities
  • Analytics tools
  • Multi-language support

Clinc

A powerful self-learning conversational AI platform enriched with NLP and machine learning capabilities. It secures a top position among conversational AI platforms because it learns from previous conversations and improves its responses over time. Its feature set includes:

  • No technical expertise required
  • Self-learning abilities
  • NLP capabilities

Kore.ai

An enterprise-grade conversational AI platform that caters to your consumer as well as staff needs. It helps build virtual chatbots for any suitable platform without compromising safety and security standards. Its major features cover:

  • High degree of customization for chatbots
  • Comprehensive analytics with FAQs and alerts
  • Simple integration with ML models and channels
  • Flexible deployment
  • Supported with a multi-pronged NLP engine

Mindmeld

It is an excellent option as a deep-domain conversational AI platform with NLP capabilities. It can be used for both text-based and voice-based virtual assistants, and it effectively caters to multiple industries and their numerous use cases. Its striking features include:

  • Open-source platform
  • NLP capabilities
  • Supports discovering on-demand video or music
  • Quick chat-based transactions

No Code AI Platforms

As discussed above, classifying AI platforms requires considering them from various perspectives. Here we introduce another category: no-code AI platforms. The motivation behind these platforms is to encourage enterprise AI adoption while keeping implementation costs low and minimizing dependence on skilled professionals. Many IT giants now offer no-code AI platforms to enterprises.

Google ML Kit

Google ML Kit is available for Android and iOS, and it facilitates integrating ML features with little code and minimal knowledge of machine learning algorithms. The SDK supports features such as text recognition, face detection, and landmark recognition.

RapidMiner Studio

RapidMiner Studio enables powerful data analytics with drag-and-drop features. It allows easy integration with databases, warehouses, and social media, so authorized users can access data easily.

ML Platform Selection Strategy

Having discussed the types of ML platforms, their features, and their offerings, the next question is: how do you select the best ML platform for enterprise AI adoption? To answer this million-dollar question, we need to consider a few key aspects, such as:

  • Who will use and benefit from the AI platform? Identify the platform's users, the data science team, the analytics team, developers, and how the platform will benefit each stakeholder.
  • Explore the skill levels of these users: are they experienced enough to handle ML development and analytics requirements?
  • Assess the users' proficiency with programming languages.
  • Decide between code-first and code-free approaches to streamline AI workflows. Study attributes such as ease of data preparation, feature engineering automation, available ML algorithms, ease of model deployment, and platform integration.

Once you have answers to these questions, you will be able to finalize the best AI platform selection strategy for your enterprise. It can be a single cloud platform, or it can be a hybrid solution with a “best-of-breed” approach.

The all-in-one platform strategy involves getting one end-to-end platform for the entire AI project lifecycle, from raw data preparation and ETL to building and operationalizing models, followed by monitoring and governance.

The best-of-breed approach allows using the preferred and custom tools for each phase of the lifecycle and aligning these tools together to build a customized platform solution for AI adoption.

This approach offers an excellent AI platform solution for organizations looking for flexible, inexpensive, change-oriented AI solutions and having a DIY spirit. With this mix-and-match approach, you can combine APIs offered by different cloud platforms and deliver AI solutions that cater to your use cases. Organizations using best-of-breed are more comfortable with technology shifts because they can use, adopt, and swap out tools as requirements change.

Business Process AI Transformation Simplified With Attri’s Open AI Platform

At Attri, we provide AI platform solutions to diverse industry verticals. With our flagship Open AI Platform, we enhance your AI adoption experience with a rich array of features such as:

  • Customizable best-of-breed architecture
  • Reuse of existing infrastructure
  • AI as a platform solution
  • Reduced effort in migrating to new technology
  • Centralized monitoring and governance
  • Explainable and responsible AI

We help you achieve your business process transformation goals with our unique AI offerings, the Open AI Platform and Open AI Solutions.

Our AI platform assures multiple benefits to your enterprise while keeping AI adoption costs low and ensuring faster implementation. The benefits of the Attri Open AI Platform can be summarized as follows:

No effort spent reinventing complete AI suites

Attri’s AI Platform integrates multiple AI services and eliminates the need for reinventing complete AI suites. The platform delights enterprises with scalability, the ability to reuse current infrastructure, and customizable architecture.

Accelerated Go To Market

Attri’s Open AI Platform ensures accelerated GTM through a thorough approach to testing, reviewing, and finalizing reference templates for different industries.

No vendor lock-in

With the Open AI Platform, we bring client-friendly policies such as no vendor lock-in and the flexibility to choose preferred tools and technologies.

High reliability

We keep our AI platform highly reliable through a comprehensive testing approach, and we meet the growing requirements of enterprises by ensuring high scalability.

Get connected with us for your enterprise AI adoption requirements.

Know more about our Open AI Platform…

Predictive Maintenance – Concept and Opportunities

(Machine) time is precious. This is especially true for manufacturing companies, where every standstill of a plant costs valuable production capacity. Machine downtime can never be avoided 100%, but with the right predictive maintenance concept (abbreviated as PdM in the rest of this text) it can be reduced and made easier to plan. Intelligent add-ons such as IoT connectivity of machines in an Industry 4.0 environment, integrated into a well-planned predictive maintenance system, can save costs.

What is Predictive Maintenance?

PdM refers to features in the operation of a plant or machine that learn from historical data and, combined with current or even real-time data, make predictions about upcoming events. From these calculations, for example, upcoming maintenance work or imminent component failures can be derived. Spare parts no longer need to be stocked as a precaution; they can be ordered based on actual need.

Benefits of Using Predictive Maintenance

A further plus is that maintenance intervals become easy to plan. Operators as well as manufacturers can arrange their schedules around the predicted maintenance dates. As an operator, machine availability can be correlated with the capacity needed for production. As a manufacturer, you can order spare parts on schedule and in the quantities actually needed, and you no longer have to keep a stock of every possible spare part.

Technical Concept and Considerations

Whether you want to implement PdM yourself or buy it as a product, a technical concept is the starting point. Here we outline the key points of these considerations so that they can serve as a working document for developing the concept.

Roughly speaking, a PdM concept consists of the following components:

Predictive maintenance concept

  • Machine: this is where the data that should flow into the PdM system is generated. The data should be captured from the machines as quickly and directly as possible so the values can be uploaded to the cloud and made available on the end device. Machines usually have only limited storage, so data can only be buffered there for a short time.
  • Data agent and transfer protocol: a software component transfers the data to the cloud platform. It can either be supplied by the manufacturer and already integrated into the machine, or it is added as part of the PdM concept.
    Its task is to transfer the data securely to the cloud platform. If the network connection fails, the data can be spooled locally and transferred in a batch afterwards (a small sketch follows below).
    Besides data transfer, it must also be possible to register new machines at this point. The data agent must not be transferable to a new machine simply by cloning it; suitable protection, e.g. via hardware IDs, is required here.
  • Data aggregator: if a machine cannot or should not transfer data to the cloud directly, the data of several machines can be consolidated by a data aggregator, which then performs the encrypted transfer to the cloud platform.
    Reasons for using a data aggregator might be that the end customer does not allow individual machines to send data to the network, or that "legacy" machines need to be connected for which no plugins for direct data transfer exist, e.g. a machine with an older PLC for which direct transfer to the cloud is technically impossible.
  • Cloud / web platform: the transferred data must be stored centrally in a suitable environment. The actual insights and predictions of a PdM system are derived from this collected data. AI and self-learning algorithms can process the data further. The results obtained from the analyzed machine data are the basis of the PdM system and are presented to users graphically or delivered as information and warnings via messages.
  • End device: the access point for the user. The PdM data is presented as an app or a web application.

The data agent / data aggregator can be given local intelligence by means of edge computing. Data can then be pre-evaluated and aggregated locally, which reduces the volume of transferred data.
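As a rough illustration of a data agent with local spooling, here is a minimal Python sketch based on MQTT. The broker host, topic, payload fields and the paho-mqtt 1.x client style are assumptions for illustration, not a specific vendor's API.

```python
# Illustrative data-agent sketch: buffer machine readings locally and forward
# them to a cloud MQTT broker; nothing here is a real product interface.
import json
import time
from collections import deque

import paho.mqtt.client as mqtt

BROKER = "mqtt.example-cloud.com"    # hypothetical cloud endpoint
TOPIC = "plant/press-01/telemetry"   # hypothetical per-machine topic

def read_sensors() -> dict:
    # Placeholder for real PLC / sensor access
    return {"ts": time.time(), "temperature_c": 71.3, "vibration_mm_s": 2.4}

buffer = deque(maxlen=10_000)                      # local spool if the uplink is down
client = mqtt.Client(client_id="press-01-agent")   # paho-mqtt 1.x constructor style
client.connect_async(BROKER, 1883)
client.loop_start()                                # background thread handles (re)connects

while True:
    buffer.append(read_sensors())
    # Flush the spool whenever the connection is up; otherwise readings
    # simply accumulate locally and are sent later in one batch.
    while buffer and client.is_connected():
        payload = buffer.popleft()
        info = client.publish(TOPIC, json.dumps(payload), qos=1)
        if info.rc != mqtt.MQTT_ERR_SUCCESS:
            buffer.appendleft(payload)             # keep it for the next attempt
            break
    time.sleep(10)
```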

Which values should be transferred?

The goal of PdM is, by capturing and evaluating machine data, ultimately to connect with the information-processing systems in a company, for example an ERP (Enterprise Resource Planning) or MES (Manufacturing Execution System). There, production resources and capacities are planned based on the PdM data.

Typical data to transfer for PdM includes:

  • Temperature
  • Pressure
  • Speeds
  • Distances travelled
  • Switching cycles
  • Viscosities
  • Fluid levels
  • Vibrations

Which data you can capture depends on whether you are a user (i.e. a production company) or a manufacturer (i.e. a machine builder). As a user, you normally have less deep access to the machines' data and parameters; only the values provided and documented by the manufacturer are accessible, and these tend to sit at the application layer. As a manufacturer, you can access any values you like, including things such as switching cycles or the distances travelled by motors.

Transfer to the PdM those data from which you can derive the maintenance work for your machine.
For a glass tempering plant, for example, that would be the operating hours of the ceramic rollers, the distances travelled by the V-belts, or the switch-on times of the heating elements.
For a PCB manufacturing automation line, it would be the operating hours of the suction cups or the distances travelled by the drive belts.

How often should values be transferred?

If your network connection imposes no constraints on bandwidth or data volume, choose a relatively fine-grained transmission interval. Choose it such that problems at the machine can still be analyzed afterwards and the trigger can be found.
In most Industry 4.0 environments, data volume should not be a major issue. If you are working in an IoT environment with, say, a LoRaWAN connection, divide the data into priority categories, e.g. high, medium, low, or production and standby. The transmission of the categories can then be differentiated by operating state: which category is important when, and which should be transmitted with priority (a small sketch follows below).
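A small sketch of what such priority categories might look like in configuration; the signal names, intervals and machine states are purely illustrative assumptions.

```python
# Illustrative priority/interval table for a bandwidth-constrained link.
TRANSMIT_INTERVALS_S = {
    # (priority, machine state) -> seconds between transmissions
    ("high", "production"): 10,
    ("high", "standby"): 60,
    ("medium", "production"): 60,
    ("medium", "standby"): 300,
    ("low", "production"): 300,
    ("low", "standby"): 3600,
}

SIGNAL_PRIORITY = {
    "vibration_mm_s": "high",
    "temperature_c": "medium",
    "fluid_level_pct": "low",
}

def due_for_transmission(signal: str, state: str, seconds_since_last: float) -> bool:
    """Return True if this signal should be transmitted in the current state."""
    interval = TRANSMIT_INTERVALS_S[(SIGNAL_PRIORITY[signal], state)]
    return seconds_since_last >= interval
```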

Opportunities

Implementing a predictive maintenance concept helps make your production more agile. Schedules based on predicted maintenance times allow more precise and tighter capacity planning, which has a positive effect on production costs.

PdM also has great savings potential with regard to upcoming CO2 tax models. With the collected data you can calculate exactly how much energy went into each produced workpiece and thus save CO2 tax.

With smart services such as PdM, you as a machine manufacturer can earn money on an ongoing basis. You generate additional revenue from your customers and increase customer loyalty at the same time. Predictive maintenance makes your customers more satisfied with your products.

Conclusion

Predictive maintenance has potential for end users as well as manufacturers. For the end user the savings potential is paramount; for the manufacturer, customer satisfaction. With intelligent edge computing components, PdM solutions scale well and the data volume can be reduced.

Implementing a predictive maintenance solution is not tied to the installation or development of a new plant. Machines already in operation can also easily be integrated into a PdM system.

 

How to Successfully Perform a Data Quality Assessment (DQA)

People generate an estimated 2.5 quintillion bytes of data every single day; other estimates put it at 1.7 megabytes created every second for each of Earth's 7.8 billion residents. A lot of that information is junk that can easily be discarded, but just as much can prove vital. How do you tell the difference?

According to industry experts, poor quality information costs the U.S. economy upwards of $3.1 trillion annually. That is why data quality assessments (DQAs) are so important.

A Brief Explanation of Data Quality Assessments

With companies around the globe generating massive amounts of data every second, it’s essential to have tools that help you sort through it all. Data quality assessments are usually carried out by software programmed with a predefined set of rules, which compares incoming information against those guidelines and produces reports.

This is a simplified explanation, but the goal of these DQA programs is to separate the wheat from the chaff. They eliminate any unnecessary or redundant data, leaving only the highest quality information.

The biggest challenge here is figuring out who determines what counts as quality. Data quality depends on three things: the individual or team that creates the requirements, how they complete that task, and how flexibly the program can meet those obligations.

How to Perform a DQA

Once you have your DQA program in place, performing an assessment is relatively simple; the challenge lies in establishing the program. The first step is to determine the scope of the data you’re trying to assess. The details of this step depend on your system and the amount of information you have to sort through. You could set up a program to assess a single data point at a time, but if your system generates a lot of information, that isn’t efficient.

Define your scope carefully to ensure the program does the job correctly without wasting time sorting through bytes one at a time.

Now that you have a framework to work from, you can move on to monitoring and cleansing data. Analyze your information against the scope and details you’ve established. Validate each point against your existing statistical measures, and determine its quality.

Next, ensure all the data requirements are available and correctly formatted. You may wish to provide training for any new team members entering information to ensure it’s in a format that the DQA system can understand.

Finally, make it a point to verify that your data is consistent with the rules you’ve established, as well as with your business goals. DQAs aren’t a one-and-done kind of program. Monitoring needs to be an ongoing process to prevent things from falling through the cracks and to keep bad information from potentially costing you millions of dollars.
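As a minimal illustration of rule-based checks like these, here is a sketch using pandas; the column names, rules, and thresholds are assumptions for illustration, not a standard rule set.

```python
# Minimal rule-based data quality check with pandas (illustrative rules only).
import pandas as pd

RULES = {
    "customer_id": {"required": True, "unique": True},
    "email":       {"required": True, "pattern": r"^[^@\s]+@[^@\s]+\.[^@\s]+$"},
    "age":         {"required": False, "min": 0, "max": 120},
}

def assess(df: pd.DataFrame) -> pd.DataFrame:
    """Return one report row per column: completeness, duplicates, validity."""
    report = []
    for col, rule in RULES.items():
        series = df[col]
        row = {"column": col, "missing_pct": round(100 * series.isna().mean(), 2)}
        if rule.get("unique"):
            row["duplicates"] = int(series.duplicated().sum())
        if "pattern" in rule:
            valid = series.dropna().astype(str).str.match(rule["pattern"])
            row["invalid_format"] = int((~valid).sum())
        if "min" in rule or "max" in rule:
            in_range = series.dropna().between(rule.get("min", float("-inf")),
                                               rule.get("max", float("inf")))
            row["out_of_range"] = int((~in_range).sum())
        report.append(row)
    return pd.DataFrame(report)

# df = pd.read_csv("customers.csv")   # hypothetical input
# print(assess(df))
```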

Benefits of DQA

A data quality assessment has various benefits, on both the commercial and consumer sides of your business. Accuracy is essential; it’s valuable for marketers who purchase demographic data, with 84% stating it plays a large role in their purchase decisions. Targeted marketing is one of the most popular forms of advertisement, and while it’s not always practical, its efficacy drops even further if the demographic data is incorrect.

High-quality data should be accurate, complete, relevant, valid, timely and consistent. Maintaining frequent and comprehensive quality assessments can help you do that and more. The goal of collecting this information is to produce results. The higher quality your data is, the easier and faster your system will work, with better results than you might manage without DQAs.

Data Quality Assessment vs. Data Profiling

When talking about data quality, you’ll often see the terms assessment and profiling used interchangeably. While the concepts are similar, they are not the same. Data profiling is a valuable tool for setting up your quality assessment program, giving you the information you’ll need to build your program in the future. It isn’t a step you can perform independently and expect to get the same results.

If you don’t already have a DQA in place, start with profiling to create the foundation for a comprehensive data quality assessment program.

The Growing Importance of Data Quality

Data quality has always been important. However, as the population generates more information every year, learning how to separate value from junk is more critical than ever.

Coffee Shop Location Predictor

As part of this article, we will explore the main steps involved in predicting the best location for a coffee shop in Vancouver. We want the coffee shop to be near a transit station and to have no Starbucks near it. While we're at it, let's also add an extra feature: making sure crime in the area is low.

Introduction

In this article, we will highlight the main steps involved in predicting a location for a coffee shop in Vancouver. We also want to make sure that the coffee shop is near a transit station and has no Starbucks near it. As an added feature, we will make sure that the crime concentration in the area is low. The entire program is implemented in Python, so let’s walk through the steps.

Steps Required

  • Get crime history for the last two years
  • Get locations of all transit stations and Starbucks in Vancouver
  • Check all the transit stations that do not have any Starbucks near them
  • Get all the data regarding crimes near the filtered transit stations
  • Create a grid of all possible coordinates around the transit station
  • Check crime around each created coordinate and display the top 5 locations.

Gathering Data

This covers the first two steps required to get data from the internet, both manually and automatically.

Getting all Crime History

We can get crime history for the past 14 years in Vancouver from here. The raw data comes as crime.csv, so we have to process it, filter out unneeded records, and write the processed information to the crime_processed.csv file.

Note: There are 530,653 records of crime in this file

In this program, we will use just the type and coordinates of each crime. There are many crime types, but we have classified them into three major categories, namely:

Theft (red), Break and Enter (orange) and Mischief (green)
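The preprocessing described above might look roughly like the following pandas sketch. The column names in crime.csv and the exact type-to-category mapping are assumptions for illustration, not the program's actual code.

```python
# Illustrative preprocessing sketch: keep only crime type and coordinates,
# map raw types to three categories, and write crime_processed.csv.
import pandas as pd

CATEGORY = {
    # hypothetical mapping of raw TYPE values to the three categories used here
    "Theft from Vehicle": "Theft",
    "Theft of Vehicle": "Theft",
    "Other Theft": "Theft",
    "Break and Enter Residential/Other": "Break and Enter",
    "Break and Enter Commercial": "Break and Enter",
    "Mischief": "Mischief",
}

df = pd.read_csv("crime.csv")
df = df[["TYPE", "Latitude", "Longitude"]].dropna()   # column names are assumptions
df["CATEGORY"] = df["TYPE"].map(CATEGORY)
df = df.dropna(subset=["CATEGORY"])                   # drop types we do not use
df.to_csv("crime_processed.csv", index=False)
```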

All of these crimes can be plotted on a graph, as displayed below.

This may seem very congested and full, so let’s look at a close-up image for future reference.

Getting Locations of all Rapid Transit Stations

We can get the coordinates of all transit stations in Vancouver from here. This dataset has the coordinates of the rapid transit stations on the three transit lines in Vancouver. There are a total of 23 of them, which we can use for further processing.

Getting Locations of all Starbucks

The Starbucks data is available here; we can scrape it easily and get the locations of all the Starbucks in Vancouver. We only need the Starbucks locations that are near transit stations, so we filter out the rest. There are a total of 24 Starbucks in Vancouver, and 10 of them are near transit stations.

Note: Besides the coordinates of transit stations and Starbucks, we also need the coordinates and type of each crime.

Transit Stations with no Starbucks

Now that we have all the required data, we move to the next step: finding the transit station locations that have no Starbucks near them. For that, we create an area of a particular radius around each transit station and check whether any Starbucks location falls within that area.

If no Starbucks is within a particular transit station’s area, we append the station to a list. At the end, we have a list of all transit locations with no Starbucks near them; there are a total of 6 such transit stations.
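A minimal sketch of that proximity check, assuming lists of (lat, lon) tuples for stations and Starbucks locations and an illustrative 500 m exclusion radius (the article does not state the exact radius used):

```python
# Proximity filter sketch: keep stations with no Starbucks within radius_m.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def stations_without_starbucks(stations, starbucks, radius_m=500):
    """Return the (lat, lon) stations that have no Starbucks within radius_m."""
    return [s for s in stations
            if all(haversine_m(s[0], s[1], b[0], b[1]) > radius_m for b in starbucks)]
```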

Crime near Transit Stations

Now let’s filter all crime records down to just what we are interested in: the crime near transit stations. For that we plot an area of a specific radius around each station and look at the crimes inside it. There are more than 110,000 such crime records.

Crime near located Transit Stations

We now have all the transit stations that don’t have a Starbucks near them, as well as the crime near all transit stations. Let’s combine this information to get the crime near the located transit stations: about 44,000 crime records.

This may seem correct at first glance, but the points overlap because of their abundance, so we create separate lists of crimes based on their type.

Theft

Break and Enter

Mischief

Generating all possible coordinates

Now that we have all the prerequisites, let’s get to the main task at hand: predicting the best coordinates for the coffee shop.

There may be many approaches to this problem; the one used in this program is to create a grid of all possible locations (coordinates) within a 1 km radius of each located transit station.

Initially I generated one coordinate for every metre, which results in about 1,000,000 coordinates per square kilometre. This is a huge number, and for the 6 located transit stations it becomes roughly 6 million. That may not seem like much at first glance, because computers can handle such data in a few seconds.

But for location prediction we need to compare each coordinate with the crime coordinates. Since the algorithm has to check roughly 7,000 thefts, 19,000 break-ins, and 17,000 mischief records around each generated coordinate, the program would have to perform an estimated 432.4 billion comparisons. That kind of execution takes many hours (sometimes days) on a normal computer.

The solution is to create one coordinate per 10 m instead, which results in about 10,000 coordinates per square kilometre. For the numbers of crimes mentioned above, the estimated number of comparisons is still several billion: significantly less time, but still a lot.

To bring this down further, we can remove duplicate crime coordinates and those within about 1 m of each other. Doing so, we are left with just 816 thefts, 2,654 break-ins, and 8,234 mischief records to compare against each generated coordinate.
The precision is not affected much, but the time and computational resources required are reduced a lot.
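A rough sketch of the candidate grid and the de-duplication step described above; the 10 m spacing and the rounding precision (~1 m) are the assumptions just discussed, and these helpers are not the program's actual code.

```python
# Illustrative grid generation around a station plus crime de-duplication.
import numpy as np

def candidate_grid(station_lat, station_lon, radius_m=1000, step_m=10):
    """Lat/lon grid with ~step_m spacing covering roughly radius_m around a station."""
    dlat = step_m / 111_320                                   # metres -> degrees latitude
    dlon = step_m / (111_320 * np.cos(np.radians(station_lat)))
    n = radius_m // step_m
    lats = station_lat + np.arange(-n, n + 1) * dlat
    lons = station_lon + np.arange(-n, n + 1) * dlon
    return [(la, lo) for la in lats for lo in lons]

def dedupe(points, precision=5):
    """Collapse crime coordinates that fall within roughly 1 m of each other."""
    return list({(round(lat, precision), round(lon, precision)) for lat, lon in points})
```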

 

Checking Crime near Generated coordinates

Now that we have all the candidate locations, we process them and check each coordinate against some constraints. They are, respectively:

  1. Filter out coordinates having Theft within 1 km
    We get 122,000 coordinates with no Thefts nearby (below, merged 1000 to 1)
  2. Filter out coordinates having Break Ins within 200 m
    We get 8,000 coordinates with no Break Ins nearby (below, merged 1000 to 1)
  3. Filter out coordinates having Mischief within 200 m
    We get 6,000 coordinates with no Mischief nearby (below, merged 1000 to 1)
    We now have 6 coordinates of the best locations that have passed all the constraints, so we order them. To do that, we check each one's distance from the nearest transit station; the nearest sits at the top of the list as the best possible location, then the second, and so on (a short ranking sketch follows this list). The generated list is:

    1. -123.0419406741792, 49.24824259252004
    2. -123.05887151659479, 49.24327221040713
    3. -123.05287151659476, 49.24327221040713
    4. -123.04994067417924, 49.239242592520064
    5. -123.0419406741792, 49.239242592520064
    6. -123.0409406741792, 49.239242592520064
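The ranking sketch referenced above might look like this; the candidate and station lists are placeholders, and the (longitude, latitude) ordering follows the list above.

```python
# Sketch of the final ordering step: sort surviving candidates by distance
# to their nearest transit station (closest first).
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in metres."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6_371_000 * asin(sqrt(a))

def rank_candidates(candidates, stations):
    """candidates: (lon, lat) tuples; stations: (lat, lon) tuples."""
    return sorted(candidates,
                  key=lambda c: min(haversine_m(c[1], c[0], s[0], s[1]) for s in stations))
```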

How can MindTrades help?

MindTrades Consulting Services, a leading marketing agency, provides in-depth analysis and insights for the global IT sector, including leading data integration brands such as Diyotta. From cloud migration, big data, digital transformation, agile delivery, and cyber security to analytics, MindTrades provides breakthrough ideas and prompt content delivery. For more information, refer to mindtrades.com.

Code

https://github.com/Mindtrades-Consulting/Coffee-Shop-Location-Predictor