Discovering Patterns in Convenience Store Locations with Geospatial Association Rule Mining | by Elliot Humphrey | Apr, 2023


Photo by Matt Liu on Unsplash

When walking around Tokyo you will often pass numerous convenience stores, locally known as “konbinis”, which makes sense since there are over 56,000 convenience stores in Japan. Often, different chains of convenience store are located very close to one another; it is not uncommon to see stores around the corner from each other or on opposite sides of the street. Given Tokyo’s population density, it is understandable for competing businesses to be forced closer to each other. However, could there be any relationships between which chains of convenience store are found near each other?

The goal is to collect location data for several convenience store chains in a Tokyo neighbourhood, and to understand whether there are any relationships between which chains are co-located. Doing this requires:

  • The ability to query the locations of different convenience stores in Tokyo, in order to retrieve each store’s name and location
  • Finding which convenience stores are co-located with each other within a pre-defined radius
  • Using the data on co-located stores to derive association rules
  • Plotting and visualising the results for inspection

Let’s start!

For our use case we want to find convenience stores in Tokyo, so first we need to do a little homework on which store chains are common. A quick Google search tells me that the main chains are FamilyMart, Lawson, 7-Eleven, Ministop, Daily Yamazaki and NewDays.

Now that we know what we are searching for, let’s turn to OSMnx, a great Python package for searching data in OpenStreetMap (OSM). According to OSM’s schema, we should be able to find the store name in either the ‘brand:en’ or ‘brand’ field.

We can start by importing some useful libraries for getting our data, and defining a function to return a table of locations for a given convenience store chain within a specified area:

import geopandas as gpd
from shapely.geometry import Point, Polygon
import osmnx
import shapely
import pandas as pd
import numpy as np
import networkx as nx

def point_finder(place, tags):
    '''
    Returns a dataframe of coordinates of an entity from OSM.

    Parameters:
        place (str): a location (i.e., 'Tokyo, Japan')
        tags (dict): key of the entity attribute in OSM (i.e., 'name') and its value (i.e., amenity name)
    Returns:
        results (DataFrame): table of latitude and longitude with entity value
    '''

    gdf = osmnx.geocode_to_gdf(place)
    # Getting the bounding box of the gdf (columns are minx, miny, maxx, maxy)
    bounding = gdf.bounds
    north, south, east, west = bounding.iloc[0,3], bounding.iloc[0,1], bounding.iloc[0,2], bounding.iloc[0,0]
    location = gdf.geometry.unary_union
    # Finding the points within the area polygon
    # (geometries_from_bbox is the OSMnx 1.x name; newer releases expose features_from_bbox instead)
    point = osmnx.geometries_from_bbox(north,
                                       south,
                                       east,
                                       west,
                                       tags=tags)
    # set_crs returns a new GeoDataFrame, so the result has to be assigned
    point = point.set_crs(crs=4326)
    point = point[point.geometry.within(location)].copy()
    # Making sure we are dealing with points
    point['geometry'] = point['geometry'].apply(lambda x : x.centroid if isinstance(x, Polygon) else x)
    point = point[point.geom_type != 'MultiPolygon']
    point = point[point.geom_type != 'Polygon']

    results = pd.DataFrame({'name' : list(point['name']),
                            'longitude' : list(point['geometry'].x),
                            'latitude' : list(point['geometry'].y)}
                           )

    # Label every row with the tag value that was searched for
    results['name'] = list(tags.values())[0]
    return results

# The tag value is left blank here; pass the chain name to search for
convenience_stores = point_finder(place = 'Shinjuku, Tokyo',
                                  tags={"brand:en" : " "})

We can pass each convenience store name and combine the results into a single table of store name, longitude and latitude. For our use case we can focus on the Shinjuku neighbourhood in Tokyo, and see what the abundance of each convenience store chain looks like:

Frequency count of convenience stores. Image by author.

Clearly FamilyMart and 7-Eleven dominate in the frequency of stores, but how does this look spatially? Plotting geospatial data is pretty straightforward when using Kepler.gl, which includes a nice interface for creating visualisations that can be saved as HTML objects or visualised directly in Jupyter notebooks:

Location map of Shinjuku convenience stores, colour coded by store name. Image by author.
Location map of Shinjuku convenience stores, colour coded by density in a two minute walking radius (168m). Image by author.

Now that we have our data, the next step is to find the nearest neighbours of each convenience store. To do this, we will use scikit-learn’s ‘BallTree’ class to find the names of the nearest convenience stores within a two minute walking radius. We are not interested in how many stores are considered nearest neighbours, so we will just look at which convenience store chains are within the defined radius.

from collections import Counter
from sklearn.neighbors import BallTree

# Convert location to radians
locations = convenience_stores[["latitude", "longitude"]].values
locations_radians = np.radians(locations)

# Create a balltree to search locations
tree = BallTree(locations_radians, leaf_size=15, metric='haversine')

# Find nearest neighbours in a 2 minute walking radius
# (haversine distances are in radians, so 168 m is divided by the Earth's radius in metres)
is_within, distances = tree.query_radius(locations_radians, r=168/6371000, count_only=False, return_distance=True)

# Replace the neighbour indices with store names
df = pd.DataFrame(is_within)
df.columns = ['indices']
# Drop each store's own index from its list of neighbours
df['indices'] = [[val for val in row if val != idx] for idx, row in enumerate(df['indices'])]

# BallTree returns positional row numbers, so make the index run 0..n-1
convenience_stores = convenience_stores.reset_index(drop=True)
# Create index-name mapping
index_name_mapping = convenience_stores['name'].to_dict()

# Replace index values with names and remove duplicates
df['indices'] = df['indices'].apply(lambda lst: list(set(map(index_name_mapping.get, set(lst)))))
# Append back to original df
convenience_stores['neighbours'] = df['indices']

# Identify when a store has no neighbours
convenience_stores['neighbours'] = [lst if lst else ['no-neighbours'] for lst in convenience_stores['neighbours']]

# Unique store names
unique_elements = set([item for sublist in convenience_stores['neighbours'] for item in sublist])
# Count each store's frequency in the set of neighbours per location
counts = [dict(Counter(row)) for row in convenience_stores['neighbours']]

# Create a new dataframe with the counts
output_df = pd.DataFrame(counts).fillna(0)[sorted(unique_elements)]

If we want to improve the accuracy of our work, we could swap the haversine distance measure for something more accurate (i.e., walking times calculated using networkx), but we will keep things simple.
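To make the distance measure concrete, here is a minimal pure-Python sketch of the haversine formula and a brute-force version of the radius search. It is an illustration only, not the BallTree implementation: the helper names `haversine_m` and `chains_within` and the toy coordinates are assumptions made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres, the same value used in the radius query


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def chains_within(stores, radius_m=168):
    """For each store, return the set of chain names of the other stores within radius_m."""
    result = []
    for i, (_, lat_i, lon_i) in enumerate(stores):
        nearby = {
            name
            for j, (name, lat_j, lon_j) in enumerate(stores)
            if j != i and haversine_m(lat_i, lon_i, lat_j, lon_j) <= radius_m
        }
        result.append(nearby)
    return result


# Hypothetical coordinates for illustration only (not real store locations)
toy_stores = [
    ("FamilyMart", 35.6900, 139.7000),
    ("7-Eleven", 35.6905, 139.7005),
    ("Lawson", 35.7000, 139.7100),
]
neighbours = chains_within(toy_stores)
```

The brute-force loop compares every pair of stores, which is fine for a toy list; a tree-based search such as BallTree avoids that cost on larger datasets.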

This gives us a DataFrame where each row corresponds to a location, with a binary count of which convenience store chains are nearby:

Sample DataFrame of convenience store nearest neighbours for each location. Image by author.
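As a rough illustration of the shape of this table, the sketch below builds the same kind of one-column-per-chain encoding in plain Python. The neighbour lists and the helper name `one_hot` are hypothetical and exist only for this example.

```python
# Hypothetical neighbour lists, one per store location (illustrative only)
neighbour_lists = [
    ["7-Eleven", "Lawson"],
    ["FamilyMart"],
    ["no-neighbours"],
]


def one_hot(rows):
    """Turn lists of chain names into sorted column names plus one dict of booleans per row."""
    columns = sorted({name for row in rows for name in row})
    table = [{col: col in row for col in columns} for row in rows]
    return columns, table


columns, table = one_hot(neighbour_lists)
print(columns)
for row in table:
    # Print each row as 0/1 flags in column order
    print([int(row[col]) for col in columns])
```

Each row has a flag per chain, so a location with no stores nearby is still represented through the 'no-neighbours' column.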

We now have a dataset ready for association rule mining. Using the mlxtend library we can derive association rules using the Apriori algorithm. We set a minimum support of 5%, so that we examine only the rules related to frequent occurrences in our dataset (i.e., co-located convenience store chains). We use the metric ‘lift’ when deriving rules; lift is the ratio of the proportion of locations that contain both the antecedent and consequent to the support expected under the assumption of independence.
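As a hedged illustration of these metrics, the sketch below computes support, confidence and lift by hand on a tiny made-up dataset. The function names and the toy rows are assumptions for this example and are independent of mlxtend.

```python
def support(rows, items):
    """Fraction of rows that contain every item in `items`."""
    items = set(items)
    return sum(items <= row for row in rows) / len(rows)


def confidence(rows, antecedent, consequent):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(rows, both) / support(rows, antecedent)


def lift(rows, antecedent, consequent):
    """Confidence divided by the support of the consequent (1.0 means independence)."""
    return confidence(rows, antecedent, consequent) / support(rows, consequent)


# Hypothetical neighbour sets, one per location (illustrative only)
toy_rows = [
    {"7-Eleven", "FamilyMart"},
    {"7-Eleven", "FamilyMart"},
    {"7-Eleven"},
    {"Lawson"},
]

pair_support = support(toy_rows, ["7-Eleven", "FamilyMart"])
pair_confidence = confidence(toy_rows, ["7-Eleven"], ["FamilyMart"])
pair_lift = lift(toy_rows, ["7-Eleven"], ["FamilyMart"])
print(pair_support, pair_confidence, pair_lift)
```

In this toy data the pair appears in half of the rows, and the lift is above 1 because FamilyMart shows up more often alongside 7-Eleven than its overall frequency would suggest. The mlxtend calls that follow compute the same quantities for every frequent itemset.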

from mlxtend.frequent_patterns import association_rules, apriori

# Calculate apriori
frequent_set = apriori(output_df, min_support = 0.05, use_colnames = True)
# Create rules
rules = association_rules(frequent_set, metric = 'lift')
# Sort rules by the support value
rules.sort_values(['support'], ascending=False)

This gives us the following results table:

Association rules for convenience store data. Image by author.

We will now interpret these association rules to draw some high level takeaways. To interpret this table it is best to read more about association rules first.

Okay, back to the table.

Support tells us how often different convenience store chains are actually found together. Therefore we can say that 7-Eleven and FamilyMart are found together in ~31% of the data. A lift over 1 indicates that the presence of the antecedent increases the likelihood of the consequent, suggesting that the locations of the two chains are partially dependent. On the other hand, the association between 7-Eleven and Lawson shows a higher lift but with a lower confidence.

Daily Yamazaki has a low support near our cutoff and shows a weak relationship with the location of FamilyMart, given by a lift slightly above 1.

Other rules refer to combinations of convenience stores. For example, when a 7-Eleven and FamilyMart are already co-located, there is a high lift value of 1.42, which suggests a strong association with Lawson.

If we had just stopped at finding the nearest neighbours for each store location, we would not have been able to determine anything about the relationships between these stores.

An example of why geospatial association rules can be insightful for businesses is in choosing new store locations. If a convenience store chain is opening a new location, association rules can help to identify which stores are likely to co-occur.

The value in this becomes clear when tailoring marketing campaigns and pricing strategies, as it provides quantitative relationships about which stores are likely to compete. Since we know that FamilyMart and 7-Eleven often co-occur, which we demonstrate with association rules, it would make sense for both of these chains to pay more attention to how their products compete relative to other chains such as Lawson and Daily Yamazaki.

In this article we have created geospatial association rules for convenience store chains in a Tokyo neighbourhood. This was achieved using data extraction from OpenStreetMap, finding nearest neighbour convenience store chains, visualising data on maps, and creating association rules using the Apriori algorithm.

Thanks for reading!
