Process Mining mit MEHRWERK – Artikelserie

Dieser Artikel der Artikelserie Process Mining Tools beschäftigt sich mit dem Anbieter MEHRWERK. Das im Jahr 2008 gegründete Unternehmen, heute geführt durch drei Geschäftsführer, bietet Business Intelligence als Beratung und Dienstleistung rund um die Produkte des BI-Software-Anbieters QlikTech an. Rund zehn Jahre später, 2018, stieg das Unternehmen auch als Teil-Software-Anbieter in Process Mining ein. MEHRWERK ProcessMining, kurz MPM, ist einen Process Mining Lösung auf der Basis des weit verbreiteten BI-Tools Qlik Sense.

Lösungspakete: Standard-Lizenz
Zielgruppe:  Für mittel- und große Unternehmen
Datenquellen: Beliebig über Standard-Konnektoren von Qlik Sense
Datenvolumen: Unlimitierte Datenmengen
Architektur: On-Premise, Cloud oder Multi-Cloud

Für den Einsatz von MEHRWERK ProcessMining wird Qlik Sense Enterprise benötigt, welches sowohl On-Premise auf unternehmenseigenen Windows-Servern direkt installiert werden kann, über Kubernetes via Container ebenfalls On-Premise oder in  sowie auch noch einfacher direkt in der Qlik Cloud oder aus Datenschutzgründen in Verbindung mit der Hochskalierbarkeit der Cloud als hybrides Deployment.

Bedienbarkeit und Anpassungsfähigkeit der Analysen

Die Beurteilung der Bedienbarkeit ist nahezu vollständig abhängig von der Einschätzung zur Bedienbarkeit von Qlik Sense, da MPM auf diesem gängigen BI-Tool basiert. Im Wording von Qlik Sense arbeiten Developer in einem Hub und erstellen Apps, die ein oder mehrere Worksheets (Arbeitsblätter) umfassen können, welche horizontal durchgeblättert werden können. Die Qlik-Technologie ermöglicht es dabei übrigens auch, neben Story-Telling-Boards ganze Dashboards oder einzelne Visualisierungen über Mashups in Webseiten einzubetten.

Jede App kann in einem bestimmten Stream veröffentlicht werden. Über die Apps und die Streams wird der Zugriff durch die Nutzer erweitert, beschränkt oder anderweitig organisiert. Die Zugriffe auf Apps können über Security Rules gesteuert und beschränkt werden, was für die Data Governance eines Unternehmens wichtig ist und die Lösung auch mandantenfähig macht.

Figure 1 - Übersicht über die wichtigsten Schaltflächen einer Qlik Sense-App

Figure 1 – Übersicht über die wichtigsten Schaltflächen einer Qlik Sense-App

Wer mit Qlik Sense als BI-Tool bereits vertraut ist, wird sich hier sofort zurechtfinden und kann direkt in Process Mining als Analyseform, die immer mehr zum festen Bestandteil leistungsstarker BI-Systeme wird, einsteigen. Standardmäßig startet jede App im Ansichtsmodus. Die Qlik Sense-User-Role „Analyzer User“ ist nur für diese Ansicht berechtigt und kann Apps nur lesend verwenden. Die App ist jedoch interaktiv nutzbar, so dass alle in der App verfügbaren Dimensionen anklickbar und als Filter nutzbar sind. Die Besonderheit ist hier das assoziative Datenmodell, welches durch Qlik’s inMemory Engine bereitgestellt wird. Diese überwindet die Einschränkungen relationaler Datenbanken und SQL-Abfragen. Bei diesem traditionellen Ansatz müssen Datenquellen mit SQL-Join-Befehlen kombiniert werden, und es müssen im Voraus Annahmen über die Art der Fragen getroffen werden, die die Anwender stellen werden. Wenn ein Benutzer eine Analyse durchführen möchte, die nicht geplant war, müssen die Daten neu aufgebaut werden, was die Ausführung komplexer Abfragen zur Folge hat und eine gewisse Wartezeit verursacht. Die assoziative Engine hingegen ermöglicht “on the fly”-Berechnungen und Aggregationen, die sofortige Erkenntnisse über die betrachteten Prozesse liefern.

Für Anwender, die mit den Filtermöglichkeiten nicht so vertraut sind, bietet Qlik auch die assoziative Suche an. Diese ermöglicht es, Suchbegriffe, ähnlich wie bei Google, einzugeben. Die Assoziative Engine ermittelt dann mögliche Treffer und Verbindungen in den Daten, welche daraufhin entsprechend gefiltert werden.

Die User-Role „Professional User“ kann jede veröffentlichte App zudem im Editier-Modus öffnen und eigene Arbeitsblätter und Analysen auf Basis zentral definierter Masteritems (Kennzahlen und Dimensionen) erstellen. Ebenfalls können bestehende Dashboards dupliziert werden, um diese für den eigenen Bedarf anzupassen, z. B. um Tabellen und Diagrammen anzupassen oder zu löschen. Dabei erfolgt jedoch keine Datenduplizierung, da Qlik Sense einen sogenannten Server Side Authoring Ansatz verfolgt. Durch das Konzept der Master Items wird zusätzlich sichergestellt, dass die Data Governance erhalten bleibt. Die erstellen Arbeitsblätter können durch die Professional User wiederrum veröffentlicht werden. Dabei ist sichergestellt, dass alle anderen Anwender diese „Community Sheets“ nur mit den Daten ihres Berechtigungskontexts sehen.

Figure 2 - Eine QlikSense App im Edit-Modus für "Professional User".

Figure 2 – Eine QlikSense App im Edit-Modus für “Professional User”.

Jede Seite der App kann beliebig gestaltet werden, auch so, dass Read-Only-Nutzer über die Standard-Lizenz viele Möglichkeiten des Ablesens und der Filterung von Daten erhalten.

Figure 3 - Hier eine Seite der App, die nur zur Filterung von Dimensionen gestaltet ist: Die Filterung von Prozessnetzen nach Vorgangsnummern, Produkten und/oder Prozess-Varianten

Figure 3 – Hier eine Seite der App, die nur zur Filterung von Dimensionen gestaltet ist: Die Filterung von Prozessnetzen nach Vorgangsnummern, Produkten und/oder Prozess-Varianten

MEHRWERK ProcessMining liefert Vorlagen als Standard-App, die typische Analyse-Szenarien wie das Prozess-Flussdiagramm und Filter für Durchlaufzeiten, Frequenzen und Varianten bereits vorgeben und somit den Einstieg erleichtern. Die Template App liefert außerdem sehr umfangreiche Process Mining Funktionen wie Conformance Checking, automatisierte Ursachenanalysen, Prozessmusterabfragen oder kontinuierliches Process Monitoring gleich mit aus. Außerdem können u.a. Schichten, Prozesshierarchien oder Sollprozesse konfiguriert werden.

Nur User mit der Qlik Sense „Professional User“ Lizenz können dazu im Editier-Modus auch die Datenmodelle einsehen, erstellen und anpassen. So wie auch in der klassischen Business Intelligence sind im Process Mining Datenmodelle in Form sogenannter Event-Logs entscheidend für die Analyse und die Vorbedingung auch für die MPM App.

Figure 4 - Beispielhaftes Event Log aus der Beispielvorlage-App von MEHRWERK.

Figure 4 – Beispielhaftes Event Log aus der Beispielvorlage-App von MEHRWERK.

Das Event Log kann und sollte neben den drei Must-Haves für Process Mining (Case-ID, Activity Description & Timestamp) noch beliebig viele weitere hilfreiche Informationen in weiteren Spalten aufführen. Denn nur so können Abweichungen, Anomalien oder andere Auffälligkeiten im Prozess in einen Kontext gesetzt werden, um gezielte Maßnahmen treffen zu können.

Integrationsfähigkeit

Die Frage, wie gut und leicht sich MEHRWERK ProcessMining in die Unternehmens-IT einfügen lässt, stellt sich mit der Frage, ob Qlik Sense bereits Teil der IT-Infrastruktur ist oder beispielsweise als Cloud-Lösung eingesetzt wird. Unternehmen, die bisher nicht auf Qlik Sense setzten, müssten hier die grundsätzliche Frage der Voraussetzungen des Tools von QlikTech stellen.  Vollständigerweise sei jedoch angemerkt, dass laut Aussage von MEHRWERK ca. 40% ihrer Kunden vorher kein Qlik Sense im Einsatz hatten und die Installation von Qlik Sense keine große Hürde darstellt.

Ein wesentlicher Aspekt der Integrationsfähigkeit ist jedoch nicht nur die Integration der Software in die IT-Infrastruktur, sondern auch, wie leicht sich Daten in das benötigte Datenformat (Event Log) überführen lässt. Es ist zwar möglich, Qlik Sense mit MPM ausschließlich für die Datenanalyse/-visualisierung zu verwenden, und die Datenmodellierung dann mit anderen Tools (Datenbanken, ETL) durchzuführen. Allerdings bringt Qlik Sense selbst eine Menge an Konnektoren zu vielen Datenquellen mit. Wie mit jedem Process Mining Tool ist gibt es dabei zwei Konzepte der Datenaufbereitung. Die eine Möglichkeit ist das Laden, Konsolidieren und Vorbereiten der Datenbank für ein Data Warehouse (DWH), das die Daten bereits in Event Logs transformiert. In diesem Fall kann MPM die Daten über einen Standard-Konnektor von Qlik Sense importieren, in ein MPM-spezifisches Event Log nachbereiten und dann direkt mit der Analyse starten. Dabei benötigt Qlik Sense keine eigene Datenbank für die Datenhaltung sondern verabeitet die Daten hochkomprimiert in der eigenen, patentierten InMemory-Engine.

Figure 5 - Qlik Sense Standard Connectors

Figure 5 – Qlik Sense Standard Connectors

Das andere Konzept der Datenaufbereitung ist die Nutzung von Qlik Sense auch als Tool für das Datenmanagement. Hierfür werden die Standard-Konnektoren genutzt, um Daten möglichst direkt an Qlik Sense anzubinden. In diesem Fall muss die Bildung des anwendungsfallspezifischen Event Logs als prozessprotokollartiges Datenmodell in Qlik Sense erfolgen. Dies lässt sich in einem prozeduralen Skript mit der Qlik-eigenen Skriptsprache, die an die Sprache DAX von Microsoft sowie an SQL erinnert, umsetzen. Dabei kann das Skript in mehrere Segmente unterteilt und die Ausführung automatisiert und ge-timed werden. MEHRWERK ProcessMining bietet hierfür standardisierte ETL-Best-Practices an, die erlauben mit Hilfe von Regelwerken die Eventloggenerierung stark zu vereinfachen. Ein großer Vorteil ist die Verzahnung von Process Mining Funktionalitäten während des ETL-Prozesses. Dies erlaubt frühzeitiges und visuelles Validieren schon bei der Beladung.

Figure 6 - Das Laden und Modellieren von Daten kann eingeschränkt visuell mit klickbaren Oberflächen erfolgen. Mehr Möglichkeiten bietet jedoch der Qlik Script Editor.

Figure 6 – Das Laden und Modellieren von Daten kann eingeschränkt visuell mit klickbaren Oberflächen erfolgen. Mehr Möglichkeiten bietet jedoch der Qlik Script Editor.

Skalierbarkeit

Klassischerweise wurde Qlik Sense Server On-Premise in der eigenen IT-Infrastruktur installiert. Die Software Qlik Sense ist nur als Server-Version verfügbar. Qlik Sense setzt auf eine patentierte In-Memory-Technologie. Technisch ist Qlik Sense in Sachen Performance nur durch die Hardware begrenzt.

Heute kann Qlik Sense Server auch direkt über die Qlik Cloud genutzt oder über Kubernetes auf eigene Server oder in die Multi-Cloud ausgeliefert werden. Ein Betrieb bei typischen Cloud-Anbietern wie von Amazon, Google oder Microsoft ist problemlos möglich und somit technisch auch beliebig skalierbar.

Zukunftsfähigkeit

Die Zukunftsfähigkeit von MPM liegt in erster Linie in der Weiterentwicklung von Qlik Sense durch QlikTech. Im Magic Quadrant von Gartner 2020 für BI- und Analytics-Tools zählt Qlik zu den top drei Anführern nach Tableau und Microsoft.

Auf Grund der großen Qlik-Community und der weiten Verbreitung als BI-Tool zählt die Lösung von MEHRWERK vermutlich zu einer sehr zukunftssicheren mit vielen Weiterentwicklungsmöglichkeiten. Aus der Community und von anderen BI-Unternehmen gibt es viele Erweiterungen für Qlik Sense, die den Funktionsumfang von der Konnektivität zu anderen Tools bis hin zur einfacheren oder visuell attraktiveren Analyse verbessern. Für Qlik Sense gibt es viele weitere Anbieter für diverse Erweiterungen sowie Qlik-eigene und kompatible Co-Lösungen für Master Data Management und Data Governance. Auch die Integration von Data Science Tools via Programmiersprachen wie Python oder R ist möglich und erweitert diese Plattform in Richtung Advanced Analytics.

Die Weiterentwicklung der Process Mining Lösung erfolgt unabhängig davon auch durch MEHRWERK selbst, so wird Machine Learning vermehrt dazu eingesetzt, Process Anomalien zu erkennen sowie Durchlaufzeiten von Prozessen zu prognostizieren.

Preisgestaltung

Die Preisgestaltung wird von MEHRWERK nicht transparent kommuniziert und liegt im Vergleich zu anderen Process Mining Tools erfahrungsgemäß im Mittelfeld. Neben den MPM spezifischen Kosten werden darüber hinaus auch User-Lizenzen für Qlik Sense fällig. Weitere mögliche Kosten hängen auch von der Wahl ab, ob die Qlik Cloud, eine andere Cloud-Plattform oder die On-Premise-Installation geplant wird.

Fazit

MEHRWERK Process Mining ist für Unternehmen, die voll und ganz auf QlikSense als BI-Tool setzen, eine echte Option für den schnellen und leistungsstarken Einstieg in diese spezielle Analysemethodik. Mitarbeiter, die Qlik Sense bereits kennen, finden sich hier beinahe sofort zurecht und können direkt starten, sofern Event-Logs vorliegen. Die Gestaltung von Event-Logs in Qlik Sense bedingt jedoch etwas Erfahrung mit der Datenaufbereitung und -modellierung in Qlik Sense und Kenntnisse in Qlik Script.

Five ways Data Science is used in Fintech

Data science experts process and act upon data that digital resources produce. In the fintech world, data comes from mobile apps, transactions, conversations and financial standings. With this data for fintech, experts can improve the experience and success of businesses and customers alike.

Apps like PayPal, Venmo and Cash App have led the way for other fintech organizations, big and small, to grow. In fact, roughly 65% of Americans are already using digital banking in some capacity, whether it’s an app or online service. This growth, in turn, brings benefits. From personalization to integrating robotic advisors, here are five ways data scientists help fintech brands.

1. Personalization

Finance is one of the most personal industries out there as it deals with your private accounts and data. To match this uniqueness, fintechs can use data science for personalization. That way, customer service caters to individual needs.

As the fintech company gathers data from individual transactions, communications, behavior and interests, data scientists can then use said data to curate a better experience for the customer. They can advertise products and services that the customer may need to help with savings, for instance.

Contis is one example of a fintech that has integrated personalization into its services. Customers receive specific recommendations to create an efficient experience.

2. Fundraising

Fundraising had an interesting year in 2020. Amid racial justice protests and movements, crowdfunding took off on fintechs like GoFundMe and Kickstarter. These platforms helped provide funding for those who needed it. From here, data scientists can use fundraising in unique ways.

They can help raise money by targeting people who have donated in the past, or who are likely to donate based on spending habits. This data provides a more well-rounded fundraising campaign.

Then, once they do have donors, they can again use data to segment contributors by interest, demographic or engagement history. This segmentation helps advertise in a more personal, interest-specific way.

3. Fraud Detection

Cybercriminals thrive on an abundance of digital interactions. With the rise in digital banking — and the pandemic-driven shift to technology — fintechs could potentially see high rates of fraud. In fact, by the end of 2020, the United States saw about $11 billion in lost funds from credit card fraud alone.

Data for fintech brands will help address and prevent fraud like this in the future. As customers produce data from their transactions and interactions, it provides a better picture of their behavior. If there’s deviance, the data then shows potential fraud may be occurring.

If fraud does occur, data scientists can then use that instance to learn and properly recognize how data behaves during cybercriminal activity.

4. Robo-Advisors

With more people using fintech services, employees have a lot on their hands. They must properly address the customers’ needs and provide solutions. However, in the online world, employees are now getting some robotic assistance.

Robo-advisors use machine learning algorithms to interact with customers online or on mobile apps. They ask questions, understand the problems and provide solutions. They also collect data like customer goals and financial plans, which they can report back to data scientists for analysis.

Overall, roughly 75% and 46% of large and small banks, respectively, are implementing artificial intelligence to some degree. This data-driven revolution is one to keep your eye on.

5. Blockchain Governance

Blockchain governance is a somewhat newer way that experts can use data for fintech services. The blockchain is commonly known for its support of cryptocurrency services. Though crypto assets like Bitcoin and Ethereum are on the rise, the blockchain itself is still getting its footing.

Now, fintechs like PayPal are offering crypto services, which means data scientists will be able to expand what’s possible for digital banking. As customers transfer crypto funds, data scientists can monitor their activity and get a better handle on the data that exists on the blockchain. From there, they can provide personalization and prevent fraud in the same ways as they would with standard digital banking.

A Changing Landscape

As data scientists continue to help fintech services grow, you’ll notice each of these five areas begins to become more common. Some, like personalization and fraud detection, are already key focuses for fintech companies. However, alongside robo-advisor, fundraising and blockchain, they all have room to grow through the use of data science.

Data Security for Data Scientists & Co. – Infographic

Data becomes information and information becomes knowledge. For this reason, companies are nowadays also evaluated with regard to their data and their data quality. Furthermore, data is also the material that is needed for management decisions and artificial intelligence. For this reason, IT Security is very important and special consulting and auditing companies offer their own services specifically for the security of IT systems.

However, every Data Scientist, Data Analyst and Data Engineer rarely only works with open data, but rather intensively with customer data. Therefore, every expert for the storage and analysis of data should at least have a basic knowledge of Data Security and work according to certain principles in order to guarantee the security of the data and the legality of the data processing.

There are a number of rules and principles for data security that must be observed. Some of them – in our opinion the most important ones – we from DATANOMIQ have summarized in an infographic for Data Scientists, Data Analysts and Data Engineers. You can download the infographic here: DataSecurity_Infographic

Data Security for Data Scientists, Data Analysts and Data Engineers

Data Security for Data Scientists, Data Analysts and Data Engineers

Download Infographic as PDF

Infographic - Data Security for Data Scientists, Data Analysts and Data Engineers

Infographic – Data Security for Data Scientists, Data Analysts and Data Engineers

In-memory Caching in Finance

Big data has been gradually creeping into a number of industries through the years, and it seems there are no exceptions when it comes to what type of business it plans to affect. Businesses, understandably, are scrambling to catch up to new technological developments and innovations in the areas of data processing, storage, and analytics. Companies are in a race to discover how they can make big data work for them and bring them closer to their business goals. On the other hand, consumers are more concerned than ever about data privacy and security, taking every step to minimize the data they provide to the companies whose services they use. In today’s ever-connected, always online landscape, however, every company and consumer engages with data in one way or another, even if indirectly so.

Despite the reluctance of consumers to share data with businesses and online financial service providers, it is actually in their best interest to do so. It ensures that they are provided the best experience possible, using historical data, browsing histories, and previous purchases. This is why it is also vital for businesses to find ways to maximize the use of data so they can provide the best customer experience each time. Even the more traditional industries like finance have gradually been exploring the benefits they can gain from big data. Big data in the financial services industry refers to complex sets of data that can help provide solutions to the business challenges financial institutions and banking companies have faced through the years. Considered today as a business imperative, data management is increasingly leveraged in finance to enhance processes, their organization, and the industry in general.

How Caching Can Boost Performance in Finance

In computing, caching is a method used to manage frequently accessed data saved in a system’s main memory (RAM). By using RAM, this method allows quick access to data without placing too much load on the main data stores. Caching also addresses the problems of high latency, network congestion, and high concurrency. Batch jobs are also done faster because request run times are reduced—from hours to minutes and from minutes to mere seconds. This is especially important today, when a host of online services are available and accessible to users. A delay of even a few seconds can lead to lost business, making both speed and performance critical factors to business success. Scalability is another aspect that caching can help improve by allowing finance applications to scale elastically. Elastic scalability ensures that a business is equipped to handle usage peaks without impacting performance and with the minimum required effort.

Below are the main benefits of big data and in-memory caching to financial services:

  • Big data analytics integration with financial models
    Predictive modeling can be improved significantly with big data analytics so it can better estimate business outcomes. Proper management of data helps improve algorithmic understanding so the business can make more accurate predictions and mitigate inherent risks related to financial trading and other financial services.
    Predictive modeling can be improved significantly with big data analytics so it can better estimate business outcomes. Proper management of data helps improve algorithmic understanding so the business can make more accurate predictions and mitigate inherent risks related to financial trading and other financial services.
  • Real-time stock market insights
    As data volumes grow, data management becomes a vital factor to business success. Stock markets and investors around the globe now rely on advanced algorithms to find patterns in data that will help enable computers to make human-like decisions and predictions. Working in conjunction with algorithmic trading, big data can help provide optimized insights to maximize portfolio returns. Caching can consequently make the process smoother by making access to needed data easier, quicker, and more efficient.
  • Customer analytics
    Understanding customer needs and preferences is the heart and soul of data management, and, ultimately, it is the goal of transforming complex datasets into actionable insights. In banking and finance, big data initiatives focus on customer analytics and providing the best customer experience possible. By focusing on the customer, companies are able to Ieverage new technologies and channels to anticipate future behaviors and enhance products and services accordingly. By building meaningful customer relationships, it becomes easier to create customer-centric financial products and seize market opportunities.
  • Fraud detection and risk management
    In the finance industry, risk is the primary focus of big data analytics. It helps in identifying fraud and mitigating operational risk while ensuring regulatory compliance and maintaining data integrity. In this aspect, an in-memory cache can help provide real-time data that can help in identifying fraudulent activities and the vulnerabilities that caused them so that they can be avoided in the future.

What Does This Mean for the Finance Industry?

Big data is set to be a disruptor in the finance sector, with 70% of companies citing big data as a critical factor of the business. In 2015 alone, financial service providers spent $6.4 billion on data-related applications, with this spending predicted to increase at a rate of 26% per year. The ability to anticipate risk and pre-empt potential problems are arguably the main reasons why the finance industry in general is leaning toward a more data-centric and customer-focused model. Data analysis is also not limited to customer data; getting an overview of business processes helps managers make informed operational and long-term decisions that can bring the company closer to its objectives. The challenge is taking a strategic approach to data management, choosing and analyzing the right data, and transforming it into useful, actionable insights.

How Data Science Is Helping to Detect Child Abuse

Image Source: pixabay.com

There is no good way to begin a conversation about child abuse or neglect. It is a sad and oftentimes sickening topic. But the fact of the matter is it exists in our world today and frequently goes unnoticed or unreported, leaving many children and young adults to suffer. Of the nearly 3.6 million events that do get reported, there are rarely enough resources to go around for thorough follow up investigations.

This means that at least some objective decisions have to be made by professionals in the field. These individuals must assess reports, review, and ultimately decide which cases to prioritize for investigation on a higher level, which ones are probably nothing, and which ones are worrisome but don’t quite meet the definition of abuse or neglect.

Overall, it can be a challenging job that wears on a person, and one that many think is highly based on subjective information and bias. Because of this, numerous data researchers have worked to develop risk assessment models that can help these professionals discover hidden patterns and/or biases and make more informed decisions.

Defining the Work Space

Defining exactly what falls under the category of child abuse or neglect can be a surprisingly sticky topic. Broadly, it means anything that causes lasting physical or mental harm to children and young adults or negligence that could potentially harm or threaten a child’s wellbeing. What exactly constitutes child abuse can depend upon the state you live in.

Ultimately, a lot of the defining aspects are gray. Does spanking count as physical abuse or is the line drawn when it becomes hitting with a closed fist? Likewise, are parents negligent if they must leave their kids home alone to go to work? Does living in poverty automatically make people bad parents because there may not always be enough food in the house?

Sadly, many of these questions have become more prevalent during the COVID-19 pandemic. With many families cooped up at home together, not going to work or to school, kids who live in violent households are more likely to be abused and fewer people are seeing the children regularly to observe and report signs of abuse. Unfortunately, limited statistical data is available at this point, but with so many people having lost jobs, especially amongst families that may have already been teetering on the edge of poverty, situations that could be defined as neglectful are thought to be exploding in prevalence.

Identifying Patterns

The idea of using data science to help determine the risk of abuse and neglect that many children face can be seen by many as a powerful means of tackling a difficult issue. Much like many other aspects of our world today — data has become a very useful and highly valued commodity that can work to help us understand some of the deeper or hidden patterns.

That is exactly what has been incorporated in Allegheny County, Pennsylvania. The algorithm that was developed assesses the “risk factor” for each maltreatment allegation that is made in the county. The system takes into account several factors including mental health and drug treatment services, criminal histories, past calls, and more. All of this ultimately adds up to helping employees take into account how at-risk a child may actually be and whether or not the case will be prioritized for further investigation. Generalized reports indicate that the program works well, but that even it can ‘learn’ to make decisions based on bias.

In this situation, the goal of the program isn’t to take all of the power away from the employees but rather to work as a tool to help them make a sounder decision. Some risk factors will automatically be referred to a case handler for further investigation, but most will allow for the case assessor to weigh the algorithm with research and other information that may not be well accounted for in the model. If there are significant differences between the case assessor’s conclusion and the model’s conclusion, a supervisor reviews the information and makes the final decision.

Dealing with Bias

Of course, using a model such as this one can be a double-edged sword. Certain things are difficult to account for. For instance, if parents take financial advantage of their children by using their clean Social SecuNumbers to open credit cards and other credit accounts, their children can then be saddled with poor credit they did not create. Because this type of financial abuse is difficult to prove, it can be difficult for young adults to repair their credit later in life or hold their parents or guardians responsible. And the algorithm may struggle to recognize this as abuse. But it can also take out some of the bias that many of the call takers can inadvertently have when they are assessing the risk level of certain cases. A difference in conclusion from the model can force them to take a second look at the hard facts.

But there is a flip side. All models are created based on some level of personal decisions from the algorithm designer — those decisions can carry into biases that get carried into the model’s outputs. For example, the Alleghany County model has come under significant scrutiny for being biased against people who are living in poverty and using government programs to get by.

Because some of the major components the model uses to assess risk are public data, statistically, more people who rely on government programs are likely to be flagged regardless of how serious a call that comes in appears. Whereas families in the upper and middle class with private health insurance, drug treatment centers, and food security may be less likely to be picked up.

***

Big data can play a profound role in our lives and has the potential to be a powerful tool in helping identify and address child abuse cases. The available models can aid caseworkers in prioritizing risk assessments for further investigation and make a difference in the lives of children that are facing unacceptable situations. Being aware of and working to address any biases in models is an ongoing issue and those that aid in child abuse detection are no exception. Ultimately, if used correctly, this can be a powerful tool.

Digital und Data braucht Vorantreiber

2020 war das Jahr der Trendwende hin zu mehr Digitalisierung in Unternehmen: Telekommunikation und Tools für Unified Communications & Collaboration (UCC) wie etwa Microsoft Teams oder Skype boomen genauso wie der digitale Posteingang und das digitale Signieren von Dokumenten. Die  Vernetzung und Automatisierung ganz im Sinne der Industrie 4.0 finden nicht nur in der Produktion und Logistik ihren Einzug, sondern beispielsweise auch in Form der Robot Process Automation (RPA) ins Büro – bei vielen Unternehmen ein aktuelles Top-Thema. Und in Zeiten, in denen der öffentliche Verkehr zum unangenehmen Gesundheitsrisiko wird und der Individualverkehr wieder cool ist, boomen digital unterstützte Miet- und Sharing-Angebote für Automobile mehr als je zuvor, gleichwohl autonome Fahrzeuge oder post-ausliefernde Drohnen nach wie vor schmerzlich vermisst werden.

Nahezu jedes Unternehmen muss in der heutigen Zeit nicht nur mit der Digitalisierung der Gesellschaft mithalten, sondern auch sich selbst digital organisieren können und bestenfalls eigene Innovationen vorantreiben. Hierfür ist sollte es mindestens eine verantwortliche Stelle geben, den Chief Digital Officer.

Chief Digital Officer gelten spätestens seit 2020 als Problemlöser in der Krise

Einem Running Gag zufolge haben wir den letzten Digitalisierungsvorschub keinem menschlichen Innovator, sondern der Corona-Pandemie zu verdanken. Und tatsächlich erzwang die Pandemie insbesondere die verstärkte Etablierung von digitalen Alternativen für die Kommunikation und Zusammenarbeit im Unternehmen sowie noch digitalere Shop- und Lieferdiensten oder auch digitale Qualifizierungs- und Event-Angebote. Dennoch scheint die Pandemie bisher noch mit überraschend wenig Innovationskraft verbunden zu sein, denn die meisten Technologien und Konzepte der Digitalisierung waren lange vorher bereits auf dem Erfolgskurs, wenn auch ursprünglich mit dem Ziel der Effizienzsteigerung im Unternehmen statt für die Einhaltung von Abstandsregeln. Die eigentlichen Antreiber dieser Digitalisierungsvorhaben waren bereits lange vorher die Chief Digital Officer (CDO).

Zugegeben ist der Grad an Herausforderung nicht für alle CDOs der gleiche, denn aus unterschiedlichen Branchen ergeben sich unterschiedliche Schwerpunkte. Die Finanzindustrie arbeitet seit jeher im Kern nur mit Daten und betrachtet Digitalisierung eher nur aus der Software-Perspektive. Die produzierende Industrie hat mit der Industrie 4.0 auch das Themenfeld der Vernetzung größere Hürden bei der umfassenden Digitalisierung, aber auch die Logistik- und Tourismusbranchen müssen digitalisieren, um im internationalen Wettbewerb nicht den Boden zu verlieren.

Digitalisierung ist ein alter Hut, aber aktueller denn je

Immer wieder wird behauptet, Digitalisierung sei neu oder – wie zuvor bereits behauptet – im Kern durch Pandemien getrieben. Dabei ist, je nach Perspektive, der Hauptteil der Digitalisierung bereits vor Jahrzehnten mit der Einführung von Tabellenkalkulations- sowie ERP-Software vollzogen. Während in den 1980er noch Briefpapier, Schreibmaschinen, Aktenordner und Karteikarten die Bestellungen auf Kunden- wie auf Lieferantenseite beherrschten, ist jedes Unternehmen mit mehr als hundert Mitarbeiter heute grundsätzlich digital erfasst, wenn nicht gar längst digital gesteuert. Und ERP-Systeme waren nur der Anfang, es folgten – je nach Branche und Funktion – viele weitere Systeme: MES, CRM, SRM, PLM, DMS, ITS und viele mehr.

Zwischenzeitlich kamen um die 2000er Jahre das Web 2.0, eCommerce und Social Media als nächste Evolutionsstufe der Digitalisierung hinzu. Etwa ab 2007 mit der Vorstellung des Apple iPhones, verstärkt jedoch erst um die 2010er Jahre durchdrangen mobile Endgeräte und deren mobile Anwendungen als weitere Befähiger und Game-Changer der Digitalisierung den Markt, womit auch Gaming-Plattformen sich wandelten und digitale Bezahlsysteme etabliert werden konnten. Zeitlich darauf folgten die Trends Big Data, Blockchain, Kryptowährungen, Künstliche Intelligenz, aber auch eher hardware-orientierte Themen wie halb-autonom fahrende, schwimmende oder fliegende Drohnen bis heute als nächste Evolutionsschritte der Digitalisierung.

Dieses Alter der Digitalisierung sowie der anhaltende Trend zur weiteren Durchdringung und neuen Facetten zeigen jedoch auch die Beständigkeit der Digitalisierung als Form des permanenten Wandels und dem Data Driven Thinking. Denn heute bestreben Unternehmen auch Mikroprozesse zu digitalisieren und diese besser mit der Welt interagieren zu lassen. Die Digitalisierung ist demzufolge bereits ein Prozess, der seit Jahrzehnten läuft, bis heute anhält und nur hinsichtlich der Umsetzungsschwerpunkte über die Jahre Verschiebungen erfährt – Daher darf dieser Digitalisierungsprozess keinesfalls aus dem Auge verloren werden. Digitalisierung ist kein Selbstzweck, sondern ein Innovationsprozess zur Erhaltung der Wettbewerbsfähigkeit am Markt.

Digital ist nicht Data, aber Data ist die Konsequenz aus Digital

Trotz der längst erreichten Etablierung des CDOs als wichtige Position im Unternehmen, gilt der Job des CDOs selbst heute noch als recht neu. Zudem hatte die Position des CDOs keinen guten Start, denn hinsichtlich der Zuständigkeit konkurriert der CDO nicht nur sowieso schon mit dem CIO oder CTO, er macht sich sogar selbst Konkurrenz, denn er ist namentlich doppelbesetzt: Neben dem Chief Digital Officer gibt es ebenso auch den noch etwas weniger verbreiteten Chief Data Officer. Doch spielt dieser kleine namentliche Unterschied eine Rolle? Ist beides nicht doch das gemeinsame Gleiche?

Die Antwort darauf lautet ja und nein. Der CDO befasst sich mit den zuvor bereits genannten Themen der Digitalisierung, wie mobile Anwendungen, Blockchain, Internet of Thing und Cyber Physical Systems bzw. deren Ausprägungen als vernetze Endgeräte entsprechend der Konzepte wie Industrie 4.0, Smart Home, Smart Grid, Smart Car und vielen mehr. Die einzelnen Bausteine dieser Konzepte generieren Daten, sind selbst jedoch Teilnehmer der Digitalisierungsevolution. Diese Teilnehmer aus Hardware und Software generieren über ihren Einsatz Daten, die wiederum in Datenbanken gespeichert werden können, bis hin zu großen Volumen aus heterogenen Datenquellen, die gelegentlich bis nahezu in Echtzeit aktualisiert werden (Big Data). Diese Daten können dann einmalig, wiederholt oder gar in nahezu Echtzeit automatisch analysiert werden (Data Science, KI) und die daraus entstehenden Einblicke und Erkenntnisse wiederum in die Verbesserung der digitalen Prozesse und Produkte fließen.

Folglich befassen sich Chief Digital Officer und Chief Data Officer grundsätzlich im Kern mit unterschiedlichen Themen. Während der Chief Digital Officer sich um die Hardware- und Software im Kontext zeitgemäßer Digitalisierungsvorhaben und deren organisatorische Einordnung befasst, tut dies der Chief Data Officer vor allem im Kontext der Speicherung und Analyse von Daten sowie der Data Governance.

Treffen werden sich Digital und Data jedoch immer wieder im Kreislauf der kontinuierlichen Verbesserung von Produkt und Prozess, insbesondere bei der Gestaltung und Analyse der Digital Journey für Mitarbeiter, Kunden und Partnern und Plattform-Entscheidungen wie etwas Cloud-Systeme.

Oftmals differenzieren Unternehmen jedoch gar nicht so genau und betrachten diese Position als Verantwortliche für sowohl Digital als auch für Data und nennen diese Position entweder nach dem einen oder nach dem anderen – jedoch mit Zuständigkeiten für beides. In der Tat verfügen heute nur sehr wenige Unternehmen über beide Rollen, sondern haben einen einzigen CDO. Für die meisten Anwender klingt das trendige Digital allerdings deutlich ansprechender als das nüchterne Data, so dass die Namensgebung der Position eher zum Chief Digital Officer tendieren mag. Nichtsdestotrotz sind Digital-Themen von den Data-Themen recht gut zu trennen und sind strategisch unterschiedlich einzuordnen. Daher benötigen Unternehmen nicht nur eine Digital-, sondern ebenso eine Datenstrategie – Doch wie bereits angedeutet, können CDOs beide Rollen übernehmen und sich für beide Strategien verantwortlich fühlen.

Die gemeinsame Verantwortung von Digital und Data kann sogar als vorteilhafte Nebenwirkung besonders konsistente Entscheidungen ermöglichen und so typische Digital-Themen wie Blockchain oder RPA mit typischen Data-Themen wie Audit-Datenanalysen oder Process Mining verbinden. Oder der Dokumenten-Digitalisierung und -Verwaltung in der kombinierten Betrachtung mit Visual Computing (Deep Learning zur Bilderkennung).

Vielfältige Kompetenzen und Verantwortlichkeiten eines CDOs

Chief Digital Officer befassen sich mit Innovationsthemen und setzen sie für ihr Unternehmen um. Sie sind folglich auch Change Manager. CDOs dürfen keinesfalls bequeme Schönwetter-Manager sein, sondern müssen den Wandel im Unternehmen vorantreiben, Hemmnissen entgegenstehen und bestehende Prozesse und Produkte hinterfragen. Die Schaffung und Nutzung von digitalen Produkten und Prozessen im eigenen Unternehmen sowie auch bei Kunden und Lieferanten generiert wiederum Daten in Massen. Der Kreislauf zwischen Digital und Data treibt einen permanenten Wandel an, den der CDO für das Unternehmen positiv nutzbar machen muss und dabei immer neue Karriereperspektiven für sich und seine Mitarbeiter schaffen kann.

Zugegeben sind das keine guten Nachrichten für Mitarbeiter, die auf Beständigkeit setzen. Die Iterationen des digitalen Wandels zirkulieren immer schneller und stellen Ingenieure, Software-Entwickler, Data Scientists und andere Technologieverantwortliche vor den Herausforderungen des permanenten und voraussichtlich lebenslangen Lernens. Umso mehr muss ein CDO hier lernbereit und dennoch standhaft bleiben, denn Gründe für den Aufschub von Veränderungen findet im Zweifel jede Belegschaft.

Ein CDO mit umfassender Verantwortung lässt auch das Thema der Datennutzung nicht aus und versteht Architekturen für Business Intelligence und Machine Learning. Um seiner Personalverantwortung gerecht zu werden, muss er sich mit diesen Themen auskennen und mit Experten für Digital und Data auf Augenhöhe sprechen können. Jeder CD sollte wissen, was zum Beispiel ein Data Engineer oder Data Scientist können muss, wie Business-Experten zu verstehen und Vorstände zu überzeugen sind – Denn als Innovator, Antreiber und Wandler fürchten gute CDOs nichts außer den Stillstand.

Select the Right career path between Software Developer and Data Scientist

In today’s digital day and age, a software development career is one of the most lucrative ones. Custom software developers abound, offering all sorts of services for business organizations anywhere in the world. Software developers of all kinds, vendors, full-time staff, contract workers, or part-time workers, all are important members of the Information Technology community. 

There are different career paths to choose from in the world of software development. Among the most promising ones include a software developer career and a data scientist career. What exactly are these?

Software developers are the brainstorming, creative masterminds behind all kinds of computer programs. Although there may be some that focus on a specific app or program, others build giant networks or underlying systems, which power and trigger other programs. That’s why there are two classifications of a software developer, the app software developer, and the developers of systems software.

On the other hand, data scientists are a new breed of experts in analytical data with the technical skills to resolve complex issues, as well as the curiosity to explore what problems require solving. Data scientists, in any custom software development service, are part trend-spotter, part mathematicians, and part computer scientists. And, since they bestraddle both IT and business worlds, they’re highly in-demand and of course well-paid. 

When it comes to the field of custom software development and software development in general, which career is the most promising? Let’s find out. 

Data Science and Software Development, the Differences

Although both are extremely technical, and while both have the same sets of skills, there are huge differences in how these skills are applied. Thus, to determine which career path to choose from, let’s compare and find the most critical differences. 

The Methodologies

Data Science Methodology

There are different places in which a person could come into the data science pipeline. If they are gathering data, then they probably are called a data engineer, and they would be pulling data from different resources, cleaning and processing it, and storing it in a database. Usually, this is referred to as the ETL process or the extract, transform, and load. 

If they use data to create models and perform analysis, probably they’re called a ‘data analyst’ or a ‘machine learning engineer’. The critical aspects of this part of the pipeline are making certain that any models made don’t violate the underlying assumptions, and that they are driving worthwhile insights. 

Methodology in Software Development 

In contrast, the development of software makes use of the SDLC methodology or the software development life cycle. The workflow or cycle is used in developing and maintaining software. The steps are planning, implementing, testing, documenting, deploying, and maintaining. 

Following one of the different SDLC models, in theory, could lead to software that runs at peak efficiency and would boost any future development. 

The Approaches

Data science is a very process-oriented field The practitioners consume and analyze sets of data to understand a problem better and come up with a solution. Software development is more of approaching tasks with existing methodologies and frameworks. For example, the Waterfall model is a popular method that maintains every software development life cycle phase that should be completed and reviewed before going to the next. 

Some frameworks used in development include the V-shaped model, Agile, and Spiral. Simply, there is no equal data science process, although a lot of data scientists are within one of the approaches as part of the bigger team. Pure developers of the software have a lot of roles to fill outside data science, from front-end development to DevOps and infrastructure roles. 

Moreover, although data analytics pays well, the roles of software developers of all kinds are still higher in demand. Thus, if machine learning isn’t your thing, then you could spend your spare time in developing expertise in your area of interest instead. 

The Tools

The wheelhouse of a data scientist has data analytics tools, machine learning, data visualization, working with databases, and predictive modeling. If you use plenty of data ingestion and storage they probably would use MongoDB, Amazon S3, PostgreSQL, or something the same. For building a model, there’s a great chance that they would be working with Scikit-learn or Statsmodels. 

Big data distributed processing needs Apache Spark. Software engineers use software to design and analyze tools, programming languages, software testing, web apps tools, and so on. With data science, many depend on what you’re attempting to accomplish. For actually creating TextWrangler, code Atom, Emacs, Visual Code Studio, and Vim are popular. 

Django by Python, Ruby on Rails, and Flask see plenty of use in the backend web development world. Vue.js emerged recently as one of the best ways of creating lightweight web apps, and similarly for AJAX when creating asynchronous-updating, creating dynamic web content. Everyone must know how to utilize a version control system like GitHub for instance. 

The Skills

To become a data scientist, some of the most important things to know include machine learning, programming, data visualization, statistics, and the willingness to learn. Various positions may need more than these skills, but it’s a safe bet to say that these are the bare minimum when you pursue a data science career. 

Often, the necessary skills to be a developer of the software will be a little more intangible. The ability of course to program and code in various programming languages is required, but you should also be able to work well in development teams, resolve an issue, adapt to various scenarios, and should be willing to learn. This again isn’t an exhaustive list of skills, but these certainly would serve you well if you are interested in this career. 

Conclusion

You should, at the end of the day must choose a career path that’s based on your strengths and interests. The salaries of data scientists and software developers  are the same to an average at least. However, before choosing which is better for you, consider experimenting with various projects and interact with different aspects of the business to determine where your skills and personality best fits in since that is where you’ll grow the most in the future.

Top 10 Python Libraries Of All Time

Python is a very popular and renowned language that has replaced several programming languages in the market. Its amazing collection of libraries makes it a convenient programming language for developers.

Python is an ocean of libraries serving an ample number of purposes and as a developer; you must possess sound knowledge of the 10 libraries. One needs to familiarize themselves with the libraries to go on and work on different projects. For the data scientist, it has been a charmer now.

Here today, for you this is a curated list of 10 Python libraries that can help you along with its significant features, when to use them, and also the benefits.

10 Best Python Libraries of All Times

  1. Pandas: Pandas is an open-source library that offers instant high performance, data analysis, and simple data structures. When can you use it? It can be used for data munging and wrangling. If one is looking for quick data visuals, aggregation, manipulation, and reading, then this library is suitable. You can impute the missing data files, plot the data, and make edits in the data column. Moreover, for renaming and merging, this tool can do wonders. It is a foundation library, and a data scientist should have in-depth knowledge about Pandas before any other library knowledge.
  1. TensorFlow: TensorFlow is developed by Google in collaboration with the Brain Team. Using this tool, you can instantly visualize any part of the graphical representation. It comes with modularity and offers high flexibility in its operations. This library is ideal for running and operating in large scale systems. So, as long as you have good internet connectivity, you can use it because it is an open-source platform. What is the beauty of this library? It comes with an unending list of applications associated with it.
  1. NumPy: NumPy is the most popular Python library used by developers. It is used by various libraries for conducting easy operations. What is the beauty of NumPy? Array Interface is the beauty of NumPy and it is always a highlighted feature. NumPy is interactive and very simple to use. It can instantly solve complicated mathematical problems. With this, you need not worry about daunting phases of coding and offering open-source contributions. This interface is widely used for expressing raw streams, sound waves, and other images. If you are looking to implement this into machine learning, you must possess in-depth knowledge about NumPy.
  1. Keras: Are you looking for a cool Python library? Well, Keras is the coolest machine learning python library. It runs smoothly on both CPU and GPU. Do you want to know where Keras is used? It is used in popular applications like Uber, Swiggy, Netflix, Square, and Yelp. Keras easily supports the fully connected, pooling, convolution, and recurrent neural networks. For any innovative research, it does fine because it is expressive and flexible. Keras is completely based on a framework, which enables easy debugging and exploring. Various large scientific organizations use Keras for innovative research.
  1. Scikit- Learn: If your project deals with complex data, it has to be the Scikit- Learn python library. This Python Machine Learning Library is associated with NumPy and SciPy. After various modifications, one such feather cross-validation is used for enabling more than one metric. It is used for extracting features and data from texts and images. It uses various algorithms to make changes in machine learning. What are its functions? It is used in model selection, classification, clustering, and regression. Various training methods like nearest neighbor and logistics regressions are subjected to minimal modification.
  1. PyTorch: PyTorch is the largest library which conducts various computations and accelerations. Also, it solves complicated application issues that are related to the neural networks. It is completely based on the machine language Torch, which is a free and open-source platform. PyTorch is new but gaining huge popularity and very much a favorite among the developers. Why such popularity? It comes with a hybrid end-user which ensures easy usage and flexibility. For processing natural language applications, this library is used. Do you know what the best part is? It is outperforming and taking the popularity of Tensor Flow in recent times.
  1. MoviePy: The MoviePy is a tool that offers unending functionality related to movies and visuals. It is used for exporting, modifying, and importing various video files. Do you want to add a title to your video or rotate it 90 degrees? Well, MoviePy helps you to do all such tasks related to videos. It is not a tool for manipulating data like Pillow. In any task related to movies and videos in python coding, you can no doubt rely on the functionality of MoviePy. It is designed to conduct all the aspects of a standard task and can get it done instantly. For any common task associated with videos, it has a MoviePy library.
  1. Matplotlib: Matplotlib is no doubt a quintessential python library whose presence can never be forgotten. You can visualize data and create innovative and interesting stories. When can you use it? You can use Matplotlib for embedding different plots into the application as it provides an object-oriented application program interface. Any sort of visualization, be it bar graph, histogram, pie chart, or graphs, Matplotlib can easily depict it. With this library, you can create any type of visualization. Do you want to know what visualizations you can create? You can create a histogram, Bar graph, pie chart, area plot, stem plot, and line plot. It also facilitates the legends, grids, and labels.
  2. Tkinter: Tkinter is a library that can help you create any Python application with the help of a graphical user interface. Tkinter is the most common and easy to use python library for developing apps with GUI. It binds python to the GUI tool kit which can be used in any modern operating system. To create a python GUI, Tkinter is the only best way to start instantly.
  3. Plotly: The Plotly is an essential graph plotting python library for developers. Users can import, copy, paste, export the data that needs to be analyzed and visualized. When can you use it? You can use Plotly to display and create figures and visual images. What is interesting is that it has amazing features for sending data to the various cloud servers.

What are the visual charts prepared with Plotly? You can create line pie, bubble, dot, scatter, and pie. One can also construct financial charts, contours, maps, subplots, carpet, radar, and logs. Do you have anything in your mind which needs to be represented visually? Use Plotly!

Finishing Up

In a nutshell, you have the best python libraries of recent times which contribute hugely to development. If your favorite python library didn’t make it in this list of the top 10 best python libraries, do not take offense.

Python comes with unending library packages, and these 10 are some of its popular and best-used ones. If you are a python developer, these are the best libraries you must have in-depth knowledge of.

New Era of Data Science in Today’s World

In today’s digital world, most organizations are flooded with data, both structured and unstructured. Data is a commodity now, and organizations should know how to monetize that data and derive a profit from the deluge. And valuing data is one of the best ways enterprises can become successful in distinguishing themselves in the marketplace.

Data is the new oil

Indeed, data itself has become a commodity, and the mere possession of abundant amounts of data is not enough. But the ability to monetize data effectively (and not merely hoard it) can undoubtedly be a source of competitive advantage in the digital economy. However, we need to refine this data. And refinement of this “new oil” will take a reasonable amount of time. In my opinion, we are still not there. As a result, “data refinement” remains a key factor for successful advanced analytics.

If we talk about the level of activity in data and analytics space in the last two years, most advanced analytics evolved around three categories:

  • Descriptive, or what has happened
  • Predictive, or what could happen, and
  • Prescriptive, or what we should do.

Descriptive analytics has been the core analytics for many years. In the past, we could only describe what has happened to historical data (such as that found in a data warehouse), with dashboard reporting, using traditional analytics. But with the advent of advanced analytics, machine learning (ML), and deep learning and artificial intelligence (AI), our focus has changed to real-time analytics. In the last two years, much work has been done in predictive analytics, and as we move forward into our analytics journey, data-centric organizations will now focus on prescriptive analytics. The use of prescriptive analytics, along with predictive analytics, is very important for any organization to be successful in the future.

Current and recent trends in data and analytics

The analytics trends revolve around AI and ML. The Analytics-as-a-Service model is an essential model for any smart, data-driven organization. We can make an impact on society and try to make a better place to live with the use of advanced analytics. At NTT DATA, we strive to solve these problems to improve the quality, safety and advancement of humanity. From a business perspective, we use data analytics and predictive modeling to help companies increase their sales and revenue.

Let me give you some examples. We have been involved with several technology partners in a project for the Smart City. This project involved the use of predictive analytics for the validation of critical alerts to help reduce the time and amount of data required to be processed. It used Internet of Things (IoT) devices, high-definition video cameras, and sound sensors, as well as video and sound data captured from specific locations. Eventually, the solution also integrated with available data from data sources such as crime, weather and social media. The overall objective of the Smart City project was to use and apply advanced analytics with cognitive computing to facilitate safety decision-making, and for a responder to respond earlier based on real-time data.

Another example is the Smart ICU System developed by NTT DATA for predictive detection of threats for seriously ill patients in an ICU, based on the data. This data was consolidated from various medical devices in the ICU into one platform. From that data, we developed a model that predicts the risk of complications that might occur within the next couple of hours or so of a medical event. We have also used advanced analytics provided by weather data forecasting and used predictive models to predict natural disasters.

Data and analytics strategy

A strategy is an essential aspect of any data-driven organization. It should cover data strategy for AI, ML, statistical modeling and other data science disciplines, such as predictive and prescriptive analytics. In general, advanced analytics is more predictive and actionable than retrospective. Smart organizations see positive results when they place a strategy for data and analytics in the hands of employees who are well-positioned to make decisions, such as those who interact with customers, oversee product development, or run production processes. With data-based insight and clear decision rules, employees can deliver more meaningful services, better assess and address customer demands, and optimize production.

Smart organizations must take time to clean and update their underlying modern data architecture — along with their data governance process, for a cleaner data and analytics strategy. A modern data architecture, combined with a good governance process, can leverage AI and ML to help organizations stay ahead of their competitors.

Data analytics innovation

Machine and Deep Learning, along with AI, are all very popular, but I would like to reiterate that advanced technologies like AI and machine learning will continue to transform data analytics. The next innovation could be the use of automated analytics, which machine learning tools can use to identify hidden patterns in data. For example, customer retention issues, customer default on loans, or predicting customers who are prone to auto accidents. Also, predictive analytics and prescriptive analytics are going to be the key for any future innovations in AI and ML.

We must make targeted investments in traditional business innovation tools, along with emerging data analytics tools to derive benefits from data-driven business initiatives. We need to invest in cloud and underlying IT infrastructure to support these analytics and business initiatives. Most importantly, we also need to invest in people — cross training skilled resources and empowering the people who work closely with clients to make the right decisions for analytics.

Connections Between Data Science & Finance

Image Source: pixabay.com

The world of finance is changing at an unprecedented rate. Data science has completely altered the face of traditional finance management. Though data has long been a critical component to finances, the introduction of big data and artificial intelligence have created new tools that are strengthening the predictive ability of many financial institutions.

These changes have led to a rapid increase in the need for financial professionals with data science skills. Nearly every sector in finances is converting to greater use of data science and management from the stock market and retirement accounts to credit score calculation. A greater understanding of the interplay between data and finance is a key skill gap.

Likewise, they have opened many doors for those that are interested in analyzing their personal finances. More and more people are taking their finances into their own hands and using the data tools available to make the best decisions for them. In today’s world, the sky’s the limit for financial analysis and management!

The Rise of the Financial Analyst

Financial analysts are the professionals who are responsible for the general management of money and investments both in an industrial and personal finance realm. Typically a financial analyst will spend time reviewing and understanding the overall stock portfolio and financial standing of a client including:

  • Stocks
  • Bonds
  • Retirement accounts
  • Financial history
  • Current financial statements and reports
  • Overarching business and industry trends

From there, the analyst will provide a recommendation with data-backed findings to the client on how they should manage their finances going into the future.

As you can imagine, with all of this data to analyze, the need for financial analysts to have a background or understanding of data science has never been higher! Finance jobs requiring skills such as artificial intelligence and big data increased by over 60% in the last year. Though these new jobs are typically rooted in computer science and data analytics, most professionals still need a background in financial management as well.

The unique skills required for a position like this means there is a huge (and growing) skills gap in the financial sector. Those professionals that are qualified and able to rise to fill the need are seeing substantial pay increases and hundreds of job opportunities across the nation and the globe.

A Credit Score Example

But where does all of this data science and professional financial account management come back to impact the everyday person making financial decisions? Surprisingly, pretty much in every facet of their lives. From things like retirement accounts to faster response times in financial analysis to credit scores — data science in the financial industry is like a cloaked hand pulling the strings in the background.

Take, for example, your credit score. It is one of the single most important numbers in your life, for better or worse. A high credit score can open all sorts of financial doors and get you better interest rates on the things you need loans for. A bad score can limit the amount lenders willing to qualify you for a loan and increase the interest rate substantially, meaning you will end up paying far more money in the end.

Your credit score is calculated by several things — though we understand the basic outline of what goes into the formula, the finer points are somewhat of a mystery. We know the big factors are:

  • Personal financial history
  • Debit-credit ratio
  • Length of credit history
  • Number of new credit hits or applications

All of this data and number crunching can have a real impact on your life, just one example of how data in the financial world is relevant.

Using Data Science in Personal Finance

Given all this information, you might be thinking to yourself that what you really need is a certificate in data science. Certainly, that will open a number of career doors for you in a multitude of realms, not just the finance industry. Data science is quickly becoming a cornerstone of how most major industries do business.

However, that isn’t necessarily required to get ahead on managing your personal finances. Just a little information about programs such as Excel can get you a long way. Some may even argue that Excel is the original online data management tool as it can be used to do things like:

  • Create schedules
  • Manage budgets
  • Visualize data in charts and graphs
  • Track revenues and expenses
  • Conditionally format information
  • Manage inventory
  • Identify trends in large data sets

There are even several tools and guides out there that will help you to get started!

***

Data analysis and management is here to stay, especially when it comes to the financial industry. The tools are likely to continue to become more important and skills in their use will increase in value. Though there are a lot of professional skills using big data to manage finances, there are still a lot of tools out there that are making it easier than ever to glean insights into your personal finances and make informed financial decisions.