Posts

Top 7 MBA Programs to Target for Business Analytics 

Business Analytics refers to the science of collecting, analysing, sorting, processing and compiling various available data pertaining to different areas and facets of business. It also includes studying and scrutinising the information for useful and deep insights into the functioning of a business which can be used smartly for making important business-related decisions and changes to the existing system of operations. This is especially helpful in identifying all loopholes and correcting them.

The job of a business analyst is spread across every domain and industry. It is one of the highest paying jobs in the present world due to the sheer shortage of people with great analytical minds and abilities. According to a report published by Ernst & Young in 2019, there is a 50% rise in how firms and enterprises use analytics to drive decision making at a broad level. Another reason behind the high demand is the fact that nowadays a huge amount of data is generated by all companies, large or small and it usually requires a big team of analysts to reach any successful conclusion. Also, the nature and high importance of the role compels every organisation and firm to look for highly qualified and educated professionals whose prestigious degrees usually speak for them.

An MBA in Business Analytics, which happens to be a branch of Business Intelligence, also prepares one for a successful career as a management, data or market research analyst among many others. Below, we list the top 7 graduate school programs in Business Analytics in the world that would make any candidate ideal for this high paying job.

1 New York University – Stern School of Business

Location: New York City, United States

Tuition Fees: $74,184 per year

Duration:  2 years (full time)

With a graduate acceptance rate of 23%, the NYU Stern School makes it to this list due to the diversity of the course structure that it offers in its MBA program in Business Analytics. One can specialise and learn the science behind econometrics, data mining, forecasting, risk management and trading strategies by being a part of this program. The School prepares its students and offers employability in fields of investment banking, marketing, consulting, public finance and strategic planning. Along with opportunities to study abroad for small durations, the school also offers its students ample chances to network with industry leaders by means of summer internships and career workshops. It is a STEM designated two-year, full time degree program.

2 University of Pennsylvania – Wharton School Business 

Location: Philadelphia, United States

Tuition fees: $81,378 per year

Duration: 20 months (full time, including internship)

The only Ivy-League school in the list with one of the best Business Analytics MBA programs in the world, Wharton has an acceptance rate of 19% only. The tough competition here is also characterised by the high range of GMAT scores that most successful applicants have – it lies between 540 and 790, averaging at a very high threshold of 732. Most of Wharton’s graduating class finds employment in a wide range of sectors including consulting, financial services, technology, real estate and health care among many others. The long list of Wharton’s alumni includes some of the biggest business entities in the world, them being – Warren Buffet, Elon Musk, Sundar Pichai, Ronald Perelman and John Scully.

The best part about Wharton’s program structure is its focus on building leadership and a strong sense of teamwork in every student.

3 Carnegie Mellon University – Tepper School of Business

Location: Pittsburgh, United States

Tuition Fees: $67,575

Duration: 18 months (online)

The Tepper School of Business in Carnegie Mellon University is the only graduate school in the list that offers an online Master of Science program in Business Analytics. The primary objectives of the program is to equip students with creative problem solving expertise and deep analytic skills. The highlights of the program include machine learning, programming in Python and R, corporate communication and the knowledge of various business domains like marketing, finance, accounting and operations.

The various sub courses offered within the program include statistics, data management, data analytics in finance, data exploration and optimization for prescriptive analytics. There are several special topics offered too, like Ethics in Artificial Intelligence and People Analytics among many others.

4 Massachusetts Institute of Technology – Sloan School of Management

Location: Cambridge, United States

Tuition Fees: $136,480

Duration: 12 months

The Master of Business Analytics program at MIT Sloan is a relatively new program but has made it to this list due to MIT’s promise and commitment of academic and all-rounder excellence. The program is offered in association with MIT’s Operations Research Centre and is customised for students who wish to pursue a career in the industry of data sciences. The program is easily comprehensible for students from any educational background. It is a STEM designated program and the curriculum includes several modules like machine learning, usage of analytics software tools like Python, R, SQL and Julia. It also includes courses on ethics, data privacy and a capstone project.

5 University of Chicago – Graham School

Location: Chicago, United States

Tuition Fees: $4,640 per course

Duration: 12 months (full time) or 4 years (part time)

The Graham School in the University of Chicago is mainly interested in candidates who show love and passion for analytics. An incoming class at Graham usually consists of graduates in science or social science, professionals in an early career who wish to climb higher in the job ladder and mid-career professionals who wish to better their analytical skills and enhance their decision-making prowess.

The curriculum at Graham includes introduction to statistics, basic levels of programming in analytics, linear and matrix algebra, machine learning, time series analysis and a compulsory core course in leadership skills. The acceptance rate of the program is relatively higher than the previous listed universities at 34%.

6 University of Warwick – Warwick Business School

Location: Coventry, United Kingdom

Tuition Fees: $34,500

Duration: 12 months (full time)

The only school to make it to this list from the United Kingdom and the only one outside of the United States, the Warwick Business School is ranked 7th in the world by the QS World Rankings for their Master of Science degree in Business Analytics. The course aims to build strong and impeccable quantitative consultancy skills in its candidates. One can also look forward to improving their business acumen, communication skills and commercial research experience after graduating out of this program.

The school has links with big corporates like British Airways, IBM, Proctor and Gamble, Tesco, Virgin Media and Capgemini among others where it offers employment for its students.

7 Columbia University – School of Professional Studies

Location: New York City, United States 

Tuition Fees: $2,182 per point

Duration: 1.5 years full time (three terms)

The Master of Sciences program in Applied Analytics at Columbia University is aimed for all decision makers and also favours candidates with strong critical thinking and logical reasoning abilities. The curriculum is not very heavy on pure stats and data sciences but it allows students to learn from extremely practical and real-life experiences and examples. The program is a blend of several online and on-campus classes with several week-long courses also. A large number of industry experts and guest lectures take regular classes, conduct workshops and seminars for exposing the students to the real-world scenario of Business Analytics. This also gives the students a solid platform to network and broaden their perspective.

Several interesting courses within the paradigm of the program includes storytelling with data, research design, data management and a capstone project.

The admission to every school listed above is extremely competitive and with very limited intake. However, as it is rightly said, hard work is the key to success, one can rest guaranteed that their career will never be the same if they make it into any of these programs.

How Important is Customer Lifetime Value?

This is the third article of article series Getting started with the top eCommerce use cases. If you are interested in reading the first article you can find it here.

Customer Lifetime Value

Many researches have shown that cost for acquiring a new customer is higher than the cost of retention of an existing customer which makes Customer Lifetime Value (CLV or LTV) one of the most important KPI’s. Marketing is about building a relationship with your customer and quality service matters a lot when it comes to customer retention. CLV is a metric which determines the total amount of money a customer is expected to spend in your business.

CLV allows marketing department of the company to understand how much money a customer is going  to spend over their  life cycle which helps them to determine on how much the company should spend to acquire each customer. Using CLV a company can better understand their customer and come up with different strategies either to retain their existing customers by sending them personalized email, discount voucher, provide them with better customer service etc. This will help a company to narrow their focus on acquiring similar customers by applying customer segmentation or look alike modeling.

One of the main focus of every company is Growth in this competitive eCommerce market today and price is not the only factor when a customer makes a decision. CLV is a metric which revolves around a customer and helps to retain valuable customers, increase revenue from less valuable customers and improve overall customer experience. Don’t look at CLV as just one metric but the journey to calculate this metric involves answering some really important questions which can be crucial for the business. Metrics and questions like:

  1. Number of sales
  2. Average number of times a customer buys
  3. Full Customer journey
  4. How many marketing channels were involved in one purchase?
  5. When the purchase was made?
  6. Customer retention rate
  7. Marketing cost
  8. Cost of acquiring a new customer

and so on are somehow associated with the calculation of CLV and exploring these questions can be quite insightful. Lately, a lot of companies have started to use this metric and shift their focuses in order to make more profit. Amazon is the perfect example for this, in 2013, a study by Consumers Intelligence Research Partners found out that prime members spends more than a non-prime member. So Amazon started focusing on Prime members to increase their profit over the past few years. The whole article can be found here.

How to calculate CLV?

There are several methods to calculate CLV and few of them are listed below.

Method 1: By calculating average revenue per customer

 

Figure 1: Using average revenue per customer

 

Let’s suppose three customers brought 745€ as profit to a company over a period of 2 months then:

CLV (2 months) = Total Profit over a period of time / Number of Customers over a period of time

CLV (2 months) = 745 / 3 = 248 €

Now the company can use this to calculate CLV for an year however, this is a naive approach and works only if the preferences of the customer are same for the same period of time. So let’s explore other approaches.

Method 2

This method requires to first calculate KPI’s like retention rate and discount rate.

 

CLV = Gross margin per lifespan ( Retention rate per month / 1 + Discount rate – Retention rate per month)

Where

Retention rate = Customer at the end of the month – Customer during the month / Customer at the beginning of the month ) * 100

Method 3

This method will allow us to look at other metrics also and can be calculated in following steps:

  1. Calculate average number of transactions per month (T)
  2. Calculate average order value (OV)
  3. Calculate average gross margin (GM)
  4. Calculate customer lifespan in months (ALS)

After calculating these metrics CLV can be calculated as:

 

CLV = T*OV*GM*ALS / No. of Clients for the period

where

Transactions (T) = Total transactions / Period

Average order value (OV) = Total revenue / Total orders

Gross margin (GM) = (Total revenue – Cost of sales/ Total revenue) * 100 [but how you calculate cost of sales is debatable]

Customer lifespan in months (ALS) = 1 / Churn Rate %

 

CLV can be calculated using any of the above mentioned methods depending upon how robust your company wants the analysis to be. Some companies are also using Machine learning models to predict CLV, maybe not directly but they use ML models to predict customer churn rate, retention rate and other marketing KPI’s. Some companies take advantage of all the methods by taking an average at the end.

Python vs R: Which Language to Choose for Deep Learning?

Data science is increasingly becoming essential for every business to operate efficiently in this modern world. This influences the processes composed together to obtain the required outputs for clients. While machine learning and deep learning sit at the core of data science, the concepts of deep learning become essential to understand as it can help increase the accuracy of final outputs. And when it comes to data science, R and Python are the most popular programming languages used to instruct the machines.

Python and R: Primary Languages Used for Deep Learning

Deep learning and machine learning differentiate based on the input data type they use. While machine learning depends upon the structured data, deep learning uses neural networks to store and process the data during the learning. Deep learning can be described as the subset of machine learning, where the data to be processed is defined in another structure than a normal one.

R is developed specifically to support the concepts and implementation of data science and hence, the support provided by this language is incredible as writing codes become much easier with its simple syntax.

Python is already much popular programming language that can serve more than one development niche without straining even for a bit. The implementation of Python for programming machine learning algorithms is very much popular and the results provided are accurate and faster than any other language. (C or Java). And because of its extended support for data science concept implementation, it becomes a tough competitor for R.

However, if we compare the charts of popularity, Python is obviously more popular among data scientists and developers because of its versatility and easier usage during algorithm implementation. However, R outruns Python when it comes to the packages offered to developers specifically expertise in R over Python. Therefore, to conclude which one of them is the best, let’s take an overview of the features and limits offered by both languages.

Python

Python was first introduced by Guido Van Rossum who developed it as the successor of ABC programming language. Python puts white space at the center while increasing the readability of the developed code. It is a general-purpose programming language that simply extends support for various development needs.

The packages of Python includes support for web development, software development, GUI (Graphical User Interface) development and machine learning also. Using these packages and putting the best development skills forward, excellent solutions can be developed. According to Stackoverflow, Python ranks at the fourth position as the most popular programming language among developers.

Benefits for performing enhanced deep learning using Python are:

  • Concise and Readable Code
  • Extended Support from Large Community of Developers
  • Open-source Programming Language
  • Encourages Collaborative Coding
  • Suitable for small and large-scale products

The latest and stable version of Python has been released as Python 3.8.0 on 14th October 2019. Developing a software solution using Python becomes much easier as the extended support offered through the packages drives better development and answers every need.

R

R is a language specifically used for the development of statistical software and for statistical data analysis. The primary user base of R contains statisticians and data scientists who are analyzing data. Supported by R Foundation for statistical computing, this language is not suitable for the development of websites or applications. R is also an open-source environment that can be used for mining excessive and large amounts of data.

R programming language focuses on the output generation but not the speed. The execution speed of programs written in R is comparatively lesser as producing required outputs is the aim not the speed of the process. To use R in any development or mining tasks, it is required to install its operating system specific binary version before coding to run the program directly into the command line.

R also has its own development environment designed and named RStudio. R also involves several libraries that help in crafting efficient programs to execute mining tasks on the provided data.

The benefits offered by R are pretty common and similar to what Python has to offer:

  • Open-source programming language
  • Supports all operating systems
  • Supports extensions
  • R can be integrated with many of the languages
  • Extended Support for Visual Data Mining

Although R ranks at the 17th position in Stackoverflow’s most popular programming language list, the support offered by this language has no match. After all, the R language is developed by statisticians for statisticians!

Python vs R: Should They be Really Compared?

Even when provided with the best technical support and efficient tools, a developer will not be able to provide quality outputs if he/she doesn’t possess the required skills. The point here is, technical skills rank higher than the resources provided. A comparison of these two programming languages is not advisable as they both hold their own set of advantages. However, the developers considering to use both together are less but they obtain maximum benefit from the process.

Both these languages have some features in common. For example, if a representative comes asking you if you lend technical support for developing an uber clone, you are directly going to decline as Python and R both do not support mobile app development. To benefit the most and develop excellent solutions using both these programming languages, it is advisable to stop comparing and start collaborating!

R and Python: How to Fit Both In a Single Program

Anticipating the future needs of the development industry, there has been a significant development to combine these both excellent programming languages into one. Now, there are two approaches to performing this: either we include R script into Python code or vice versa.

Using the available interfaces, packages and extended support from Python we can include R script into the code and enhance the productivity of Python code. Availability of PypeR, pyRserve and more resources helps run these two programming languages efficiently while efficiently performing the background work.

Either way, using the developed functions and packages made available for integrating Python in R are also effective at providing better results. Available R packages like rJython, rPython, reticulate, PythonInR and more, integrating Python into R language is very easy.

Therefore, using the development skills at their best and maximizing the use of such amazing resources, Python and R can be togetherly used to enhance end results and provide accurate deep learning support.

Conclusion

Python and R both are great in their own names and own places. However, because of the wide applications of Python in almost every operation, the annual packages offered to Python developers are less than the developers skilled in using R. However, this doesn’t justify the usability of R. The ultimate decision of choosing between these two languages depends upon the data scientists or developers and their mining requirements.

And if a developer or data scientist decides to develop skills for both- Python and R-based development, it turns out to be beneficial in the near future. Choosing any one or both to use in your project depends on the project requirements and expert support on hand.

Multi-touch attribution: A data-driven approach

Customers shopping behavior has changed drastically when it comes to online shopping, as nowadays, customer likes to do a thorough market research about a product before making a purchase.

What is Multi-touch attribution?

This makes it really hard for marketers to correctly determine the contribution for each marketing channel to which a customer was exposed to. The path a customer takes from his first search to the purchase is known as a Customer Journey and this path consists of multiple marketing channels or touchpoints. Therefore, it is highly important to distribute the budget between these channels to maximize return. This problem is known as multi-touch attribution problem and the right attribution model helps to steer the marketing budget efficiently. Multi-touch attribution problem is well known among marketers. You might be thinking that if this is a well known problem then there must be an algorithm out there to deal with this. Well, there are some traditional models  but every model has its own limitation which will be discussed in the next section.

Types of attribution models

Most of the eCommerce companies have a performance marketing department to make sure that the marketing budget is spent in an agile way. There are multiple heuristics attribution models pre-existing in google analytics however there are several issues with each one of them. These models are:

Traditional attribution models

First touch attribution model

100% credit is given to the first channel as it is considered that the first marketing channel was responsible for the purchase.

Figure 1: First touch attribution model

Last touch attribution model

100% credit is given to the last channel as it is considered that the first marketing channel was responsible for the purchase.

Figure 2: Last touch attribution model

Linear-touch attribution model

In this attribution model, equal credit is given to all the marketing channels present in customer journey as it is considered that each channel is equally responsible for the purchase.

Figure 3: Linear attribution model

U-shaped or Bath tub attribution model

This is most common in eCommerce companies, this model assigns 40% to first and last touch and 20% is equally distributed among the rest.

Figure 4: Bathtub or U-shape attribution model

Data driven attribution models

Traditional attribution models follows somewhat a naive approach to assign credit to one or all the marketing channels involved. As it is not so easy for all the companies to take one of these models and implement it. There are a lot of challenges that comes with multi-touch attribution problem like customer journey duration, overestimation of branded channels, vouchers and cross-platform issue, etc.

Switching from traditional models to data-driven models gives us more flexibility and more insights as the major part here is defining some rules to prepare the data that fits your business. These rules can be defined by performing an ad hoc analysis of customer journeys. In the next section, I will discuss about Markov chain concept as an attribution model.

Markov chains

Markov chains concepts revolves around probability. For attribution problem, every customer journey can be seen as a chain(set of marketing channels) which will compute a markov graph as illustrated in figure 5. Every channel here is represented as a vertex and the edges represent the probability of hopping from one channel to another. There will be an another detailed article, explaining the concept behind different data-driven attribution models and how to apply them.

Figure 5: Markov chain example

Challenges during the Implementation

Transitioning from a traditional attribution models to a data-driven one, may sound exciting but the implementation is rather challenging as there are several issues which can not be resolved just by changing the type of model. Before its implementation, the marketers should perform a customer journey analysis to gain some insights about their customers and try to find out/perform:

  1. Length of customer journey.
  2. On an average how many branded and non branded channels (distinct and non-distinct) in a typical customer journey?
  3. Identify most upper funnel and lower funnel channels.
  4. Voucher analysis: within branded and non-branded channels.

When you are done with the analysis and able to answer all of the above questions, the next step would be to define some rules in order to handle the user data according to your business needs. Some of the issues during the implementation are discussed below along with their solution.

Customer journey duration

Assuming that you are a retailer, let’s try to understand this issue with an example. In May 2016, your company started a Fb advertising campaign for a particular product category which “attracted” a lot of customers including Chris. He saw your Fb ad while working in the office and clicked on it, which took him to your website. As soon as he registered on your website, his boss called him (probably because he was on Fb while working), he closed everything and went for the meeting. After coming back, he started working and completely forgot about your ad or products. After a few days, he received an email with some offers of your products which also he ignored until he saw an ad again on TV in Jan 2019 (after 3 years). At this moment, he started doing his research about your products and finally bought one of your products from some Instagram campaign. It took Chris almost 3 years to make his first purchase.

Figure 6: Chris journey

Now, take a minute and think, if you analyse the entire journey of customers like Chris, you would realize that you are still assigning some of the credit to the touchpoints that happened 3 years ago. This can be solved by using an attribution window. Figure 6 illustrates that 83% of the customers are making a purchase within 30 days which means the attribution window here could be 30 days. In simple words, it is safe to remove the touchpoints that happens after 30 days of purchase. This parameter can also be changed to 45 days or 60 days, depending on the use case.

Figure 7: Length of customer journey

Removal of direct marketing channel

A well known issue that every marketing analyst is aware of is, customers who are already aware of the brand usually comes to the website directly. This leads to overestimation of direct channel and branded channels start getting more credit. In this case, you can set a threshold (say 7 days) and remove these branded channels from customer journey.

Figure 8: Removal of branded channels

Cross platform problem

If some of your customers are using different devices to explore your products and you are not able to track them then it will make retargeting really difficult. In a perfect world these customers belong to same journey and if these can’t be combined then, except one, other paths would be considered as “non-converting path”. For attribution problem device could be thought of as a touchpoint to include in the path but to be able to track these customers across all devices would still be challenging. A brief introduction to deterministic and probabilistic ways of cross device tracking can be found here.

Figure 9: Cross platform clash

How to account for Vouchers?

To better account for vouchers, it can be added as a ‘dummy’ touchpoint of the type of voucher (CRM,Social media, Affiliate or Pricing etc.) used. In our case, we tried to add these vouchers as first touchpoint and also as a last touchpoint but no significant difference was found. Also, if the marketing channel of which the voucher was used was already in the path, the dummy touchpoint was not added.

Figure 10: Addition of Voucher as a touchpoint

AI For Advertisers: How Data Analytics Can Change The Maths Of Advertising?

All Images Credit: Freepik

The task of understanding a customer’s journey and designing your marketing strategy accordingly can be difficult in this data-driven world. Today, the customer expresses their needs in myriad forms of requests.

Consumers express their needs and want attitudes, and values in various forms through search, comments, blogs, Tweets, “likes,” videos, and conversations and access such data across many channels like web, mobile, and face to face. Volume, variety, velocity and veracity of the data accumulated through these customer interactions are huge.

BigData and data analytics can be leveraged to understand several phases of the customer journey. There are risks involved in using Artificial Intelligence for the marketing data analysis of data breach and even manipulation. But, AI do have brighter prospects when it comes to marketing and advertiser applications.

As the CEO of a technology firm Chop Dawg and marketer, Joshua Davidson puts it, “AI-powered apps are going to be the future for us, and there are several industries that are ripe for this.” The mobile-first strategy of many enterprises has powered the use of AI for digital marketing and developing technologies and innovations to power industries with intelligent systems.

How AI and Machine learning are affecting customer journeys?

Any consumer journey begins with the recognition of a problem and then stages like initial consideration, active evaluation, purchase, and postpurchase come through up till the consumer journey is over. The need for identifying the purchasing and need patterns of the consumers and finding the buyer personas to strategize the marketing for them.

Need and Want Recognition:

Identifying a need is quite difficult as it is the most initial level of a consumer’s journey and it is more on the category level than at a brand level. Marketers and advertisers are relying on techniques like market research, web analytics, and data mining to build consumer profiles and buyer’s persona for understanding the needs and influencing the purchase of products. AI can help identify these wants and needs in real-time as the consumers usually express their needs and wants online and help build profiles more quickly.

AI technologies offered by several firms help in consumer profiling. Firms like Microsoft offers Azure that crunches billions of data points in seconds to determine the needs of consumers. It then personalizes web content on specific platforms in real-time to align with those status-updates. Consumer digital footprints are evolving through social media status updates, purchasing behavior, online comments and posts. Ai tends to update these profiles continuously through machine learning techniques.

Initial Consideration:

A key objective of advertising is to insert a brand into the consideration set of the consumers when they are looking for deliberate offerings. Advertising includes increasing the visibility of brands and emphasize on the key reasons for consideration. Advertisers currently use search optimization, paid search advertisements, organic search, or advertisement retargeting for finding the consideration and increase the probability of consumer consideration.

AI can leverage machine learning and data analytics to help with search, identify and rank functions of consumer consideration that can match the real-time considerations at any specific time. Take an example of Google Adwords, it analyzes the consumer data and helps advertisers make clearer distinctions between qualified and unqualified leads for better targeting.

Google uses AI to analyze the search-query data by considering, not only the keywords but also context words and phrases, consumer activity data and other BigData. Then, Google identifies valuable subsets of consumers and more accurate targeting.

Active Evaluation: 

When consumers narrow it down to a few choices of brands, advertisers need to insert trust and value among the consumers for brands. A common technique is to identify the higher purchase consumers and persuade them through persuasive content and advertisement. AI can support these tasks using some techniques:

Predictive Lead Scoring: Predictive lead scoring by leveraging machine learning techniques of predictive analytics to allow marketers to make accurate predictions related to the intent of purchase for consumers. A machine learning algorithm runs through a database of existing consumer data, then recognize trends and patterns and after processing the external data on consumer activities and interests, creates robust consumer profiles for advertisers.

Natural Language Generation: By leveraging the image, speech recognition and natural language generation, machine learning enables marketers to curate content while learning from the consumer behavior in real-time scenarios and adjusts the content according to the profiles on the fly.

Emotion AI: Marketers use emotion AI to understand consumer sentiment and feel about the brand in general. By tapping into the reviews, blogs or videos they understand the mood of customers. Marketers also use emotion AI to pretest advertisements before its release. The famous example of Kelloggs, which used emotion AI to help devise an advertising campaign for their cereal, eliminating the advertisement executions whenever the consumer engagement dropped.

Purchase: 

As the consumers decide which brands to choose and what it’s worth, advertising aims to move them out of the decision process and push for the purchase by reinforcing the value of the brand compared with its competition.

Advertisers can insert such value by emphasizing convenience and information about where to buy the product, how to buy the product and reassuring the value through warranties and guarantees. Many marketers also emphasize on rapid return policies and purchase incentives.

AI can completely change the purchase process through dynamic pricing, which encompasses real-time price adjustments on the basis of information such as demand and other consumer-behavior variables, seasonality, and competitor activities.

Post-Purchase: 

Aftersales services can be improved through intelligent systems using AI technologies and machine learning techniques. Marketers and advertisers can hire dedicated developers to design intelligent virtual agents or chatbots that can reinforce the value and performance of a brand among consumers.

Marketers can leverage an intelligent technique known as Propensity modeling to identify the most valuable customers on the basis of lifetime value, likelihood of reengagement, propensity to churn, and other key performance measures of interest. Then advertisers can personalize their communication with these customers on the basis of these data.

Conclusion:

AI has shifted the focus of advertisers and marketers towards the customer-first strategies and enhanced the heuristics of customer engagement. Machine learning and IoT(Internet of Things) has already changed the way customer interact with the brands and this transition has come at a time when advertisers and marketers are looking for new ways to tap into the customer mindset and buyer’s persona.

All Images Credit: Freepik

Best machine learning algorithms you should know

Machine learning is a key technology tool businesses use to build tools that enhance their operations. To do that, they take advantage of machine learning algorithms that come in different shapes and sizes, servicing different purposes and working on different data sets. Choosing the right algorithm for the job is what makes machine learning and deep learning projects successful. That’s why being aware of all the different types of machine learning algorithms is so important – that’s how you get better results and build more advanced solutions.

Here’s an overview of the best machine learning algorithms you should know before starting your project.

What is meant by machine learning algorithms?

First things first, what is machine learning and how do algorithms fit into the picture? A machine learning (ML) algorithm is a process or set of procedures that allow a model to adapt to the data with a specific objective set as the goal.

An ML algorithm specifies how the data is transformed from the input to output, helping the model to learn the appropriate mapping from input to output. That model specifies the mapping functions and holds the parameters in place, while the machine learning algorithm updates the parameters to help the model match its goal.

What are the algorithms used in machine learning?

Algorithms can model problems in many different ways. The easiest way to differentiate between different ML algorithms is by comparing them by learning styles that they can adapt. Generally, machine learning algorithms can adapt to several learning styles that help to solve different problems.

Here are four learning styles in machine learning you need to know:

1 Supervised learning

In supervised learning, the input data serves as training data and comes with a known label or result – for example, the price at a time or spam/not-spam.

In this variant, the training process is critical for preparing a model that makes predictions and then is corrected when the predictions are wrong. The training process continues until the model achieves the appropriate level of accuracy. Classification and regression are examples of problems for this learning type.

 

2 Unsupervised learning

In unsupervised learning, input data isn’t labeled and doesn’t come with a known result. Data scientists prepare models by deducing the structures in the input data to extract general rules or reduce redundancy through mathematical processes. Unsupervised learning addresses problems such as association rule learning, dimensionality reduction, and clustering.

3 Semi-supervised learning

In this learning style, the input data is a mixture of labeled and unlabeled examples. The prediction problem is known, but the model needs to learn the structures for organizing data and making predictions on its own. This learning style is used to address problems such as regression and classification.

4 Reinforcement learning

One of three basic machine learning paradigms together with supervised learning and unsupervised learning, reinforcement learning (RL) is an area of machine learning that focuses on the ways in which software agents should take actions to maximize a specified notion of cumulative reward in a given environment.

The best machine learning algorithms you should know

1 Linear Regression

Linear regression is an algorithm that correlates between two variables in the data set, examining the input and output sets to show a relationship between them. For example, the algorithm can show how changing one of the input variables affects the other variable. The relationship is represented by plotting a line on the graph.

Linear regression is one of the most popular algorithms in machine learning because it’s transparent and requires no tuning to work. Practical applications of this algorithm are risk assessment or sales forecasting solutions.

2 Logistic regression

Logistic regression is a type of constrained Linear Regression with a non-linearity application after you apply weights. Note that this algorithm is used for classification, not regression. The algorithm restricts the outputs close to +/- classes (and 1 and 0 in the case of sigmoid) and can be trained with Gradient Descent or L-BFGS.

Logistic regression is used in Natural Language Processing (NLP) applications, where it often appears under the name of Maximum Entropy Classifier.

3 Principal component analysis (PCA or LDA)

Principal component analysis is an unsupervised method that helps data scientists to understand better the global properties of a data set that consists of vectors. It analyzes the covariance matrix of data points to learn which dimensions/data points have high variance among themselves and low covariance with others. The algorithm helps data scientists to get data points with reduced dimensions.

4 K-means clustering

K- means clustering is a type of unsupervised clustering algorithm that sorts data sets through defined clusters. It offers results in the form of groups based on internal patterns.

For example, you can use a K-means algorithm for sorting web results for the word “cat,” and it will show all the results in the form of groups. The main advantage of this algorithm is its accuracy as it provides data groupings faster than other algorithms.

 

5 Decision trees

A decision tree is made of various branches that represent the outcome of many decisions. This algorithm collects and graphs data in multiple branches to predict response variables on the basis of past decisions. It comes in handy for mapping our decisions and presents results visually to communicate findings easily.

Decision trees work best for smaller data sets and relatively low-stake decisions – otherwise, the long-tail visuals can be hard to decipher. The key advantage of this algorithm is that it allows showing multiple outcomes and tests without having to involve data scientists – it’s easy to use.

6 Random forests

A random forest consists of a great number of individual decision trees where they all operate as an ensemble. An individual tree in the random forest generates a class prediction – the class which receives the highest number of votes becomes the model’s prediction. Having many relatively uncorrelated models (trees) operating as a committee easily outperforms individual constituent models.

The low correlation between these models is the strength of this approach because it allows producing ensemble predictions that are far more accurate than individual predictions. Note that decisions trees protect each other from individual errors. While some trees may generate false predictions, others will generate the right ones – as a group; they will be able to move in the right direction.

7 Support Vector Machine

Support Vector Machines (SVMs) are linear models similar to linear or logistic regression we’ve discussed earlier. However, there’s one difference – they have a different margin-based loss function, which can be optimized by using methods such as L-BFGS or SGD. SVMs internally analyze data sets into classes, which is helpful for future classifications.

The main idea behind SVM is separating data into classes and maximizing the margins of entering future data into classes. This type of algorithm works best for training data. However, it can also serve as a tool for processing nonlinear data. The financial sector makes use of Support Vector Machines thanks to its accuracy in classifying both current and future data sets.

8 Apriori

The Apriori algorithm is used a lot in market analysis. It’s based on the principle of Apriori and checks for positive and negative correlations between products after analyzing values in data sets.

For example, if two values often correlate in a data set, the algorithm will conclude that A will often lead to B, referring to the information in data sets. For example, if customers often buy product A and product B together, this relation will hold a high percentage and help companies like Google or Amazon to predict product searches and purchases.

9 Naive Bayes Classifier

This handy classification technique is based on Bayes’ Theorem, which assumes independence among predictors. The algorithm will assume that the presence of a specific feature in a class is not related to the presence of any other feature in the same class.

For example, a fruit may be considered a banana if it’s yellow, curved, and about 15 cm long. These features depend on each other, and on the existence of hooter features, they all independently contribute to the probability that this fruit is a banana. That’s why the algorithm bears the name “Naive.”

The algorithm offers a model that is easy to build and helpful in handling very large data sets. It can outperform the most sophisticated classification methods.

10 K-Nearest Neighbors (KNN)

This is one of the simplest algorithm types used in machine learning for classification and regression. KNN algorithms classify new data points on the basis of similarity measures, such as the distance function. They perform classification by using a majority vote of the data points’ neighbors. They then assign data to the class, which has the nearest neighbors. Together with increasing the number of nearest neighbors (the value of k), the accuracy may increase as well.

11 Ordinary Least Squares Regression (OLSR)

Ordinary Least Squares Regression (OLSR) is a generalized linear modeling technique data scientists use for estimating unknown parameters that are part of a linear regression model. OLSR describes the relationship between a dependent variable and one or more of its independent variables.

The algorithm is applied in diverse fields such as economics, finance, medicine, and social sciences. Companies use it in machine learning and predictive analytics to dynamically predict specific outcomes on the basis of variables that change dynamically.

We hope that this machine learning algorithms list helps you pick the right tools of the trade for your next machine learning project. If you’d like to learn more about Machine Learning, Data Science and Web Development, visit the Sunscrapers company blog.

Visual Question Answering with Keras – Part 2: Making Computers Intelligent to answer from images

Making Computers Intelligent to answer from images

This is my second blog on Visual Question Answering, in the last blog, I have introduced to VQA, available datasets and some of the real-life applications of VQA. If you have not gone through then I would highly recommend you to go through it. Click here for more details about it.

In this blog post, I will walk through the implementation of VQA in Keras.

You can download the dataset from here: https://visualqa.org/index.html. All my experiments were performed with VQA v2 and I have used a very tiny subset of entire dataset i.e all samples for training and testing from the validation set.

Table of contents:

  1. Preprocessing Data
  2. Process overview for VQA
  3. Data Preprocessing – Images
  4. Data Preprocessing through the spaCy library- Questions
  5. Model Architecture
  6. Defining model parameters
  7. Evaluating the model
  8. Final Thought
  9. References

NOTE: The purpose of this blog is not to get the state-of-art performance on VQA. But the idea is to get familiar with the concept. All my experiments were performed with the validation set only.

Full code on my Github here.


1. Preprocessing Data:

If you have downloaded the dataset then the question and answers (called as annotations) are in JSON format. I have provided the code to extract the questions, annotations and other useful information in my Github repository. All extracted information is stored in .txt file format. After executing code the preprocessing directory will have the following structure.

All text files will be used for training.

 

2. Process overview for VQA:

As we have discussed in previous post visual question answering is broken down into 2 broad-spectrum i.e. vision and text.  I will represent the Neural Network approach to this problem using the Convolutional Neural Network (for image data) and Recurrent Neural Network(for text data). 

If you are not familiar with RNN (more precisely LSTM) then I would highly recommend you to go through Colah’s blog and Andrej Karpathy blog. The concepts discussed in this blogs are extensively used in my post.

The main idea is to get features for images from CNN and features for the text from RNN and finally combine them to generate the answer by passing them through some fully connected layers. The below figure shows the same idea.

 

I have used VGG-16 to extract the features from the image and LSTM layers to extract the features from questions and combining them to get the answer.

3. Data Preprocessing – Images:

Images are nothing but one of the input to our model. But as you already may know that before feeding images to the model we need to convert into the fixed-size vector.

So we need to convert every image into a fixed-size vector then it can be fed to the neural network. For this, we will use the VGG-16 pretrained model. VGG-16 model architecture is trained on millions on the Imagenet dataset to classify the image into one of 1000 classes. Here our task is not to classify the image but to get the bottleneck features from the second last layer.

Hence after removing the softmax layer, we get a 4096-dimensional vector representation (bottleneck features) for each image.

Image Source: https://www.cs.toronto.edu/~frossard/post/vgg16/

 

For the VQA dataset, the images are from the COCO dataset and each image has unique id associated with it. All these images are passed through the VGG-16 architecture and their vector representation is stored in the “.mat” file along with id. So in actual, we need not have to implement VGG-16 architecture instead we just do look up into file with the id of the image at hand and we will get a 4096-dimensional vector representation for the image.

4. Data Preprocessing through the spaCy library- Questions:

spaCy is a free, open-source library for advanced Natural Language Processing (NLP) in Python. As we have converted images into a fixed 4096-dimensional vector we also need to convert questions into a fixed-size vector representation. For installing spaCy click here

You might know that for training word embeddings in Keras we have a layer called an Embedding layer which takes a word and embeds it into a higher dimensional vector representation. But by using the spaCy library we do not have to train the get the vector representation in higher dimensions.

 

This model is actually trained on billions of tokens of the large corpus. So we just need to call the vector method of spaCy class and will get vector representation for word.

After fitting, the vector method on tokens of each question will get the 300-dimensional fixed representation for each word.

5. Model Architecture:

In our problem the input consists of two parts i.e an image vector, and a question, we cannot use the Sequential API of the Keras library. For this reason, we use the Functional API which allows us to create multiple models and finally merge models.

The below picture shows the high-level architecture idea of submodules of neural network.

After concatenating the 2 different models the summary will look like the following.

The below plot helps us to visualize neural network architecture and to understand the two types of input:

 

6. Defining model parameters:

The hyperparameters that we are going to use for our model is defined as follows:

If you know what this parameter means then you can play around it and can get better results.

Time Taken: I used the GPU on https://colab.research.google.com and hence it took me approximately 2 hours to train the model for 5 epochs. However, if you train it on a PC without GPU, it could take more time depending on the configuration of your machine.

7. Evaluating the model:

Since I have used the very small dataset for performing these experiments I am not able to get very good accuracy. The below code will calculate the accuracy of the model.

 

Since I have trained a model multiple times with different parameters you will not get the same accuracy as me. If you want you can directly download mode.h5 file from my google drive.

 

8. Final Thoughts:

One of the interesting thing about VQA is that it a completely new field. So there is absolutely no end to what you can do to solve this problem. Below are some tips while replicating the code.

  1. Start with a very small subset of data: When you start implementing I suggest you start with a very small amount of data. Because once you are ready with the whole setup then you can scale it any time.
  2. Understand the code: Understanding code line by line is very much helpful to match your theoretical knowledge. So for that, I suggest you can take very few samples(maybe 20 or less) and run a small chunk (2 to 3 lines) of code to get the functionality of each part.
  3. Be patient: One of the mistakes that I did while starting with this project was to do everything at one go. If you get some error while replicating code spend 4 to 5 days harder on that. Even after that if you won’t able to solve, I would suggest you resume after a break of 1 or 2 days. 

VQA is the intersection of NLP and CV and hopefully, this project will give you a better understanding (more precisely practically) with most of the deep learning concepts.

If you want to improve the performance of the model below are few tips you can try:

  1. Use larger datasets
  2. Try Building more complex models like Attention, etc
  3. Try using other pre-trained word embeddings like Glove 
  4. Try using a different architecture 
  5. Do more hyperparameter tuning

The list is endless and it goes on.

In the blog, I have not provided the complete code you can get it from my Github repository.

9. References:

  1. https://blog.floydhub.com/asking-questions-to-images-with-deep-learning/
  2. https://tryolabs.com/blog/2018/03/01/introduction-to-visual-question-answering/
  3. https://github.com/sominwadhwa/vqamd_floyd

DATANOMIQ MeetUp: Interactive Data Exploration and GUI’s in JupyterNotebooks

After our first successful collaboration Meetup with Mister Spex, we straightly continue with our next partner: VW Digital Labs!

Join us on Wednesday, October 9 for our DATANOMIQ Data Science Meetup at VW Digital Labs and get inspired.

When:
Wednesday, October 9, time TBA

Where:
VW Digital Labs
Stralauer Allee 7, 10245 Berlin

 

AGENDA
18:30 doors open
19:00 Interactive Data Exploration and GUI’s in JupyterNotebooks – Christopher Kipp.
– using ipywidgets to get basic UI components and connet them
– qgrid to make Dataframes interactive (sortable, filterable, …)
– building interactive visualisations with bqplot

19:20 Q&A

10 minute break

19:40 second presentation
20:00 Q&A

20:15 networking

 

FREE ENTRY, snacks and drinks sponsored by VW digital labs.

Make sure to get your ticket: https://www.eventbrite.de/e/datanomiq-meetup-interactive-data-exploration-and-guis-in-jupyternotebook-tickets-72931655545

Entrance only with registration.

 

Join our MeetUp group: https://www.meetup.com/de-DE/DATANOMIQ-Data-Science-Berlin/

Visual Question Answering with Keras – Part 1

This is Part I of II of the Article Series Visual Question Answering with Keras

Making Computers Intelligent to answer from images

If we look closer in the history of Artificial Intelligence (AI), the Deep Learning has gained more popularity in the recent years and has achieved the human-level performance in the tasks such as Speech Recognition, Image Classification, Object Detection, Machine Translation and so on. However, as humans, not only we but also a five-year child can normally perform these tasks without much inconvenience. But the development of such systems with these capabilities has always considered an ambitious goal for the researchers as well as for developers.

In this series of blog posts, I will cover an introduction to something called VQA (Visual Question Answering), its available datasets, the Neural Network approach for VQA and its implementation in Keras and the applications of this challenging problem in real life. 

Table of Contents:

1 Introduction

2 What is exactly Visual Question Answering?

3 Prerequisites

4 Datasets available for VQA

4.1 DAQUAR Dataset

4.2 CLEVR Dataset

4.3 FigureQA Dataset

4.4 VQA Dataset

5 Real-life applications of VQA

6 Conclusion

 

  1. Introduction:

Let’s say you are given a below picture along with one question. Can you answer it?

I expect confidently you all say it is the Kitchen without much inconvenience which is also the right answer. Even a five-year child who just started to learn things might answer this question correctly.

Alright, but can you write a computer program for such type of task that takes image and question about the image as an input and gives us answer as output?

Before the development of the Deep Neural Network, this problem was considered as one of the difficult, inconceivable and challenging problem for the AI researcher’s community. However, due to the recent advancement of Deep Learning the systems are capable of answering these questions with the promising result if we have a required dataset.

Now I hope you have got at least some intuition of a problem that we are going to discuss in this series of blog posts. Let’s try to formalize the problem in the below section.

  1. What is exactly Visual Question Answering?:

We can define, “Visual Question Answering(VQA) is a system that takes an image and natural language question about the image as an input and generates natural language answer as an output.”

VQA is a research area that requires an understanding of vision(Computer Vision)  as well as text(NLP). The main beauty of VQA is that the reasoning part is performed in the context of the image. So if we have an image with the corresponding question then the system must able to understand the image well in order to generate an appropriate answer. For example, if the question is the number of persons then the system must able to detect faces of the persons. To answer the color of the horse the system need to detect the objects in the image. Many of these common problems such as face detection, object detection, binary object classification(yes or no), etc. have been solved in the field of Computer Vision with good results.

To summarize a good VQA system must be able to address the typical problems of CV as well as NLP.

To get a better feel of VQA you can try online VQA demo by CloudCV. You just go to this link and try uploading the picture you want and ask the related question to the picture, the system will generate the answer to it.

 

  1. Prerequisites:

In the next post, I will walk you through the code for this problem using Keras. So I assume that you are familiar with:

  1. Fundamental concepts of Machine Learning
  2. Multi-Layered Perceptron
  3. Convolutional Neural Network
  4. Recurrent Neural Network (especially LSTM)
  5. Gradient Descent and Backpropagation
  6. Transfer Learning
  7. Hyperparameter Optimization
  8. Python and Keras syntax
  1. Datasets available for VQA:

As you know problems related to the CV or NLP the availability of the dataset is the key to solve the problem. The complex problems like VQA, the dataset must cover all possibilities of questions answers in real-world scenarios. In this section, I will cover some of the datasets available for VQA.

4.1 DAQUAR Dataset:

The DAQUAR dataset is the first dataset for VQA that contains only indoor scenes. It shows the accuracy of 50.2% on the human baseline. It contains images from the NYU_Depth dataset.

Example of DAQUAR dataset

Example of DAQUAR dataset

The main disadvantage of DAQUAR is the size of the dataset is very small to capture all possible indoor scenes.

4.2 CLEVR Dataset:

The CLEVR Dataset from Stanford contains the questions about the object of a different type, colors, shapes, sizes, and material.

It has

  • A training set of 70,000 images and 699,989 questions
  • A validation set of 15,000 images and 149,991 questions
  • A test set of 15,000 images and 14,988 questions

Image Source: https://cs.stanford.edu/people/jcjohns/clevr/?source=post_page

 

4.3 FigureQA Dataset:

FigureQA Dataset contains questions about the bar graphs, line plots, and pie charts. It has 1,327,368 questions for 100,000 images in the training set.

4.4 VQA Dataset:

As comapred to all datasets that we have seen so far VQA dataset is relatively larger. The VQA dataset contains open ended as well as multiple choice questions. VQA v2 dataset contains:

  • 82,783 training images from COCO (common objects in context) dataset
  • 40, 504 validation images and 81,434 validation images
  • 443,757 question-answer pairs for training images
  • 214,354 question-answer pairs for validation images.

As you might expect this dataset is very huge and contains 12.6 GB of training images only. I have used this dataset in the next post but a very small subset of it.

This dataset also contains abstract cartoon images. Each image has 3 questions and each question has 10 multiple choice answers.

  1. Real-life applications of VQA:

There are many applications of VQA. One of the famous applications is to help visually impaired people and blind peoples. In 2016, Microsoft has released the “Seeing AI” app for visually impaired people to describe the surrounding environment around them. You can watch this video for the prototype of the Seeing AI app.

Another application could be on social media or e-commerce sites. VQA can be also used for educational purposes.

  1. Conclusion:

I hope this explanation will give you a good idea of Visual Question Answering. In the next blog post, I will walk you through the code in Keras.

If you like my explanations, do provide some feedback, comments, etc. and stay tuned for the next post.

A Bird’s Eye View: How Machine Learning Can Help You Charge Your E-Scooters

Bird scooters in Columbus, Ohio

Bird scooters in Columbus, Ohio

Ever since I started using bike-sharing to get around in Seattle, I have become fascinated with geolocation data and the transportation sharing economy. When I saw this project leveraging the mobility data RESTful API from the Los Angeles Department of Transportation, I was eager to dive in and get my hands dirty building a data product utilizing a company’s mobility data API.

Unfortunately, the major bike and scooter providers (Bird, JUMP, Lime) don’t have publicly accessible APIs. However, some folks have seemingly been able to reverse-engineer the Bird API used to populate the maps in their Android and iOS applications.

One interesting feature of this data is the nest_id, which indicates if the Bird scooter is in a “nest” — a centralized drop-off spot for charged Birds to be released back into circulation.

I set out to ask the following questions:

  1. Can real-time predictions be made to determine if a scooter is currently in a nest?
  2. For non-nest scooters, can new nest location recommendations be generated from geospatial clustering?

To answer these questions, I built a full-stack machine learning web application, NestGenerator, which provides an automated recommendation engine for new nest locations. This application can help power Bird’s internal nest location generation that runs within their Android and iOS applications. NestGenerator also provides real-time strategic insight for Bird chargers who are enticed to optimize their scooter collection and drop-off route based on proximity to scooters and nest locations in their area.

Bird

The electric scooter market has seen substantial growth with Bird’s recent billion dollar valuation  and their $300 million Series C round in the summer of 2018. Bird offers electric scooters that top out at 15 mph, cost $1 to unlock and 15 cents per minute of use. Bird scooters are in over 100 cities globally and they announced in late 2018 that they eclipsed 10 million scooter rides since their launch in 2017.

Bird scooters in Tel Aviv, Israel

Bird scooters in Tel Aviv, Israel

With all of these scooters populating cities, there’s much-needed demand for people to charge them. Since they are electric, someone needs to charge them! A charger can earn additional income for charging the scooters at their home and releasing them back into circulation at nest locations. The base price for charging each Bird is $5.00. It goes up from there when the Birds are harder to capture.

Data Collection and Machine Learning Pipeline

The full data pipeline for building “NestGenerator”

Data

From the details here, I was able to write a Python script that returned a list of Bird scooters within a specified area, their geolocation, unique ID, battery level and a nest ID.

I collected scooter data from four cities (Atlanta, Austin, Santa Monica, and Washington D.C.) across varying times of day over the course of four weeks. Collecting data from different cities was critical to the goal of training a machine learning model that would generalize well across cities.

Once equipped with the scooter’s latitude and longitude coordinates, I was able to leverage additional APIs and municipal data sources to get granular geolocation data to create an original scooter attribute and city feature dataset.

Data Sources:

  • Walk Score API: returns a walk score, transit score and bike score for any location.
  • Google Elevation API: returns elevation data for all locations on the surface of the earth.
  • Google Places API: returns information about places. Places are defined within this API as establishments, geographic locations, or prominent points of interest.
  • Google Reverse Geocoding API: reverse geocoding is the process of converting geographic coordinates into a human-readable address.
  • Weather Company Data: returns the current weather conditions for a geolocation.
  • LocationIQ: Nearby Points of Interest (PoI) API returns specified PoIs or places around a given coordinate.
  • OSMnx: Python package that lets you download spatial geometries and model, project, visualize, and analyze street networks from OpenStreetMap’s APIs.

Feature Engineering

After extensive API wrangling, which included a four-week prolonged data collection phase, I was finally able to put together a diverse feature set to train machine learning models. I engineered 38 features to classify if a scooter is currently in a nest.

Full Feature Set

Full Feature Set

The features boiled down into four categories:

  • Amenity-based: parks within a given radius, gas stations within a given radius, walk score, bike score
  • City Network Structure: intersection count, average circuity, street length average, average streets per node, elevation level
  • Distance-based: proximity to closest highway, primary road, secondary road, residential road
  • Scooter-specific attributes: battery level, proximity to closest scooter, high battery level (> 90%) scooters within a given radius, total scooters within a given radius

 

Log-Scale Transformation

For each feature, I plotted the distribution to explore the data for feature engineering opportunities. For features with a right-skewed distribution, where the mean is typically greater than the median, I applied these log transformations to normalize the distribution and reduce the variability of outlier observations. This approach was used to generate a log feature for proximity to closest scooter, closest highway, primary road, secondary road, and residential road.

An example of a log transformation

Statistical Analysis: A Systematic Approach

Next, I wanted to ensure that the features I included in my model displayed significant differences when broken up by nest classification. My thinking was that any features that did not significantly differ when stratified by nest classification would not have a meaningful predictive impact on whether a scooter was in a nest or not.

Distributions of a feature stratified by their nest classification can be tested for statistically significant differences. I used an unpaired samples t-test with a 0.01% significance level to compute a p-value and confidence interval to determine if there was a statistically significant difference in means for a feature stratified by nest classification. I rejected the null hypothesis if a p-value was smaller than the 0.01% threshold and if the 99.9% confidence interval did not straddle zero. By rejecting the null-hypothesis in favor of the alternative hypothesis, it’s deemed there is a significant difference in means of a feature by nest classification.

Battery Level Distribution Stratified by Nest Classification to run a t-test

Battery Level Distribution Stratified by Nest Classification to run a t-test

Log of Closest Scooter Distribution Stratified by Nest Classification to run a t-test

Throwing Away Features

Using the approach above, I removed ten features that did not display statistically significant results.

Statistically Insignificant Features Removed Before Model Development

Model Development

I trained two models, a random forest classifier and an extreme gradient boosting classifier since tree-based models can handle skewed data, capture important feature interactions, and provide a feature importance calculation. I trained the models on 70% of the data collected for all four cities and reserved the remaining 30% for testing.

After hyper-parameter tuning the models for performance on cross-validation data it was time to run the models on the 30% of test data set aside from the initial data collection.

I also collected additional test data from other cities (Columbus, Fort Lauderdale, San Diego) not involved in training the models. I took this step to ensure the selection of a machine learning model that would generalize well across cities. The performance of each model on the additional test data determined which model would be integrated into the application development.

Performance on Additional Cities Test Data

The Random Forest Classifier displayed superior performance across the board

The Random Forest Classifier displayed superior performance across the board

I opted to move forward with the random forest model because of its superior performance on AUC score and accuracy metrics on the additional cities test data. AUC is the Area under the ROC Curve, and it provides an aggregate measure of model performance across all possible classification thresholds.

AUC Score on Test Data for each Model

AUC Score on Test Data for each Model

Feature Importance

Battery level dominated as the most important feature. Additional important model features were proximity to high level battery scooters, proximity to closest scooter, and average distance to high level battery scooters.

Feature Importance for the Random Forest Classifier

Feature Importance for the Random Forest Classifier

The Trade-off Space

Once I had a working machine learning model for nest classification, I started to build out the application using the Flask web framework written in Python. After spending a few days of writing code for the application and incorporating the trained random forest model, I had enough to test out the basic functionality. I could finally run the application locally to call the Bird API and classify scooter’s into nests in real-time! There was one huge problem, though. It took more than seven minutes to generate the predictions and populate in the application. That just wasn’t going to cut it.

The question remained: will this model deliver in a production grade environment with the goal of making real-time classifications? This is a key trade-off in production grade machine learning applications where on one end of the spectrum we’re optimizing for model performance and on the other end we’re optimizing for low latency application performance.

As I continued to test out the application’s performance, I still faced the challenge of relying on so many APIs for real-time feature generation. Due to rate-limiting constraints and daily request limits across so many external APIs, the current machine learning classifier was not feasible to incorporate into the final application.

Run-Time Compliant Application Model

After going back to the drawing board, I trained a random forest model that relied primarily on scooter-specific features which were generated directly from the Bird API.

Through a process called vectorization, I was able to transform the geolocation distance calculations utilizing NumPy arrays which enabled batch operations on the data without writing any “for” loops. The distance calculations were applied simultaneously on the entire array of geolocations instead of looping through each individual element. The vectorization implementation optimized real-time feature engineering for distance related calculations which improved the application response time by a factor of ten.

Feature Importance for the Run-time Compliant Random Forest Classifier

Feature Importance for the Run-time Compliant Random Forest Classifier

This random forest model generalized well on test-data with an AUC score of 0.95 and an accuracy rate of 91%. The model retained its prediction accuracy compared to the former feature-rich model, but it gained 60x in application performance. This was a necessary trade-off for building a functional application with real-time prediction capabilities.

Geospatial Clustering

Now that I finally had a working machine learning model for classifying nests in a production grade environment, I could generate new nest locations for the non-nest scooters. The goal was to generate geospatial clusters based on the number of non-nest scooters in a given location.

The k-means algorithm is likely the most common clustering algorithm. However, k-means is not an optimal solution for widespread geolocation data because it minimizes variance, not geodetic distance. This can create suboptimal clustering from distortion in distance calculations at latitudes far from the equator. With this in mind, I initially set out to use the DBSCAN algorithm which clusters spatial data based on two parameters: a minimum cluster size and a physical distance from each point. There were a few issues that prevented me from moving forward with the DBSCAN algorithm.

  1. The DBSCAN algorithm does not allow for specifying the number of clusters, which was problematic as the goal was to generate a number of clusters as a function of non-nest scooters.
  2. I was unable to hone in on an optimal physical distance parameter that would dynamically change based on the Bird API data. This led to suboptimal nest locations due to a distortion in how the physical distance point was used in clustering. For example, Santa Monica, where there are ~15,000 scooters, has a higher concentration of scooters in a given area whereas Brookline, MA has a sparser set of scooter locations.

An example of how sparse scooter locations vs. highly concentrated scooter locations for a given Bird API call can create cluster distortion based on a static physical distance parameter in the DBSCAN algorithm. Left:Bird scooters in Brookline, MA. Right:Bird scooters in Santa Monica, CA.

An example of how sparse scooter locations vs. highly concentrated scooter locations for a given Bird API call can create cluster distortion based on a static physical distance parameter in the DBSCAN algorithm. Left:Bird scooters in Brookline, MA. Right:Bird scooters in Santa Monica, CA.

Given the granularity of geolocation scooter data I was working with, geospatial distortion was not an issue and the k-means algorithm would work well for generating clusters. Additionally, the k-means algorithm parameters allowed for dynamically customizing the number of clusters based on the number of non-nest scooters in a given location.

Once clusters were formed with the k-means algorithm, I derived a centroid from all of the observations within a given cluster. In this case, the centroids are the mean latitude and mean longitude for the scooters within a given cluster. The centroids coordinates are then projected as the new nest recommendations.

NestGenerator showcasing non-nest scooters and new nest recommendations utilizing the K-Means algorithm

NestGenerator showcasing non-nest scooters and new nest recommendations utilizing the K-Means algorithm.

NestGenerator Application

After wrapping up the machine learning components, I shifted to building out the remaining functionality of the application. The final iteration of the application is deployed to Heroku’s cloud platform.

In the NestGenerator app, a user specifies a location of their choosing. This will then call the Bird API for scooters within that given location and generate all of the model features for predicting nest classification using the trained random forest model. This forms the foundation for map filtering based on nest classification. In the app, a user has the ability to filter the map based on nest classification.

Drop-Down Map View filtering based on Nest Classification

Drop-Down Map View filtering based on Nest Classification

Nearest Generated Nest

To see the generated nest recommendations, a user selects the “Current Non-Nest Scooters & Predicted Nest Locations” filter which will then populate the application with these nest locations. Based on the user’s specified search location, a table is provided with the proximity of the five closest nests and an address of the Nest location to help inform a Bird charger in their decision-making.

NestGenerator web-layout with nest addresses and proximity to nearest generated nests

NestGenerator web-layout with nest addresses and proximity to nearest generated nests

Conclusion

By accurately predicting nest classification and clustering non-nest scooters, NestGenerator provides an automated recommendation engine for new nest locations. For Bird, this application can help power their nest location generation that runs within their Android and iOS applications. NestGenerator also provides real-time strategic insight for Bird chargers who are enticed to optimize their scooter collection and drop-off route based on scooters and nest locations in their area.

Code

The code for this project can be found on my GitHub

Comments or Questions? Please email me an E-Mail!