Coffee Shop Location Predictor

As part of this article, we will explore the main steps involved in predicting the best location for a coffee shop in Vancouver. We will also take into consideration that the coffee shop is near a transit station, and has no Starbucks near it. Well, while at it, let us also add an extra feature where we make sure the crime in the area is lower.

Introduction

In this article, we will highlight the main steps involved to predict a location for a coffee shop in Vancouver. We also want to make sure that the coffee shop is near a transit station, and has no Starbucks near it. As an added feature, we will make sure that the crime concentration in the area is low, and the entire program should be implemented in Python. So let’s walk through the steps.

Steps Required

  • Get crime history for the last two years
  • Get locations of all transit stations and Starbucks in Vancouver
  • Check all the transit stations that do not have any Starbucks near them
  • Get all the data regarding crimes near the filtered transit stations
  • Create a grid of all possible coordinates around the transit station
  • Check crime around each created coordinate and display the top 5 locations.

Gathering Data

This covers the first two steps required to get data from the internet, both manually and automatically.

Getting all Crime History

We can get crime history for the past 14 years in Vancouver from here. This data is in raw crime.csv format, so we have to process it and filter out useless data. We then write this processed information on the crime_processed.csv file.

Note: There are 530,653 records of crime in this file

In this program, we will just use the type and coordinate of the crime. There are many crime types, but we have classified them into three major categories namely;

Theft (red), Break and Enter (orange) and Mischief (green)

These all crimes can be plotted on Graph as displayed below.

This may seem very congested and full, so let’s see a closeup image for future references.

Getting Locations of all Rapid Transit Stations

We can get the coordinates of all Transit Stations in Vancouver from here. This dataset has all coordinates of rapid transit stations in three transit lines in Vancouver. There are a total of 23 of them in Vancouver, we can then use it for further processing.

Getting Locations of all Starbucks

The Starbucks data is present here, we can scrape it easily and get the locations of all the Starbucks in Vancouver. We just need the Starbucks that is near transit stations, so we’ll filter out the rest. There are a total 24 Starbucks in Vancouver, and 10 of them are near Transit Stations.

Note: Other than the coordinates of Transit Stations and Starbucks, we also need coordinates and type of the crime.

Transit Stations with no Starbucks

As we have all the data required, now moving to the next step. We need to get to the transit Station locations that have no Starbucks near them. For that we can create an area of particular radius around each Transit Station. Then check all Starbucks locations with respect to them, whether they are within that area or not.

If none of the Starbucks are within that particular Transit Station’s area, we can append it to a list. At the end, we have a list of all Transit locations with no Starbucks near them. There are a total of 6 Transit Stations with no Starbucks near them.

Crime near Transit Stations

Now lets filter out all crime records and get just what we are interested in, which means the crime near Transit stations. For that we will plot an area of specific radius around each of them to see the crimes. These are more than 110,000 crime records.

Crime near located Transit Stations

Now that we have all the Transit Stations that don’t have any Starbucks near them and also the crime near all Transit Stations. So, let’s use this information and get crime near the located Transit Stations. These are about 44,000 crime records.

This may seem correct at first glance, but the points are overlapping due to abundance, so we can create different lists of crimes based on their types.

Theft

Break and Enter

Mischief

Generating all possible coordinates

Now finally, we have all the prerequisites and let’s get to the main task at hand, predicting the best coordinate for the coffee shop.

There may be many approaches to solve this problem, but the one I used in this program is that I will create a grid of all possible locations (coordinates) in the area of 1 km radius around each located transit station.

Initially I generated 1 coordinate for every m, this resulted in 1000,000 coordinates in every km. This is a huge number, and for the 6 located Transit stations, it becomes 6 Million. It may not seem much at first glance because computers can handle such data in a few seconds.

But for location prediction we need to compare each coordinate with crime coordinates. As the algorithm has to check for ~7,000 Thefts, ~19,000 Break ins, and ~17,000 Mischiefs around each generated coordinate. Computing this would want the program to process an estimate of 432.4 Billion times. This sort of execution takes many hours on normal computers (sometimes days).

The solution to this is to create a coordinate for each 10 m area, this results about 10,000 coordinate per km. For the above mentioned number of crimes, the estimated processes will be several Billions. That would significantly reduce the time, but is still not less.

To control this, we can remove the duplicate values in crime coordinates and those which are too close to each other ~1m. Doing so, we are left with just 816 Thefts, 2,654 Break ins, and 8,234 Mischiefs around each generated coordinate.
The precision will not be affected much but the time and computational resources required will be reduced a lot.

 

Checking Crime near Generated coordinates

Now that we have all the locations, we will start some processing on it and check each coordinate against some constraints. That are respectively;

  1. Filter out Coordinates having Theft near 1 km
    We get 122,000 coordinates with no Thefts (Below merged 1000 to 1)
  2. Filter out Coordinates having Break Ins near 200m
    We get 8000 coordinates with no Thefts (Below merged 1000 to 1)
  3. Filter out Coordinates having Mischief near 200m
    We get 6000 coordinates with no Thefts (Below merged 1000 to 1)
    Now that we have 6 Coordinates of best locations that have passed through all the constraints, we will order them.To order them, we will check their distance from the nearest transit location. The nearest will be on top of the list as the best possible location, then the second and so on. The generated List is;

    1. -123.0419406741792, 49.24824259252004
    2. -123.05887151659479, 49.24327221040713
    3. -123.05287151659476, 49.24327221040713
    4. -123.04994067417924, 49.239242592520064
    5. -123.0419406741792, 49.239242592520064
    6. -123.0409406741792, 49.239242592520064

How can MindTrades help?

MindTrades Consulting Services, a leading marketing agency provides in-depth analysis and insights for the global IT sector including leading data integration brands such as Diyotta. From Cloud Migration, Big Data, Digital Transformation, Agile Deliver, Cyber Security, to Analytics- Mind trades provides published breakthrough ideas, and prompt content delivery. For more information, refer to mindtrades.com.

Code

https://github.com/Mindtrades-Consulting/Coffee-Shop-Location-Predictor

 

I'm Mrinalini, a content marketing manager with MindTrades, a digital transformation company based in the US. I enjoy writing about data science and have been published in several magazines!

0 replies

Leave a Reply

Want to join the discussion?
Feel free to contribute!

Leave a Reply

Your email address will not be published. Required fields are marked *

297 Views