Effective way to map 15k cities in Python


I have a data set of around 15k observations. These observations are city names from all over the world. The data set was populated by people from many different countries, which means that I have several duplicates of the same city in different languages. See the DF extract below:

city_name
bruselas
brussel
brussels
brussels
brussels auderghem
bruxelles
bruxelles belgium
munchen
munchenstein
munchwilen
munderkingen
mundolsheim
mungia
munguia
munich
munich
munich
munich germany
munich munchen
munich rupert mayer strasse

The task is to map every city in the DF to its English name. Because the cities are in different formats and different languages, I am finding it very difficult to come up with a solution other than doing this manually, which is not productive with 15,000+ observations to go through. The final data set should look something like this (using only a few of the observations above):

city_name            mapped_city
brussels auderghem   Brussels
bruxelles            Brussels
bruxelles belgium    Brussels
munchen              Munich
munich germany       Munich

Any help would be greatly appreciated


You could just use Google Maps or OpenStreetMap to search for those places and see what they return. Both seem to be capable of handling queries in different languages (e.g. München/Munich, Beijing/Peking), with or without the country, and some misspellings (e.g. "munchen" without the "ü").

AFAIK, the Google Maps API is not free to use, but the OSM API should be, and in any case, you can just issue a GET request to either one and parse the result. For example, for OpenStreetMap:

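import requests

lst = {'bruxelles', 'munguia', 'munich rupert mayer strasse', 'munchen', 
       'mundolsheim', 'munchenstein', 'munich', 'brussels', 'munich  germany', 
       'bruselas', 'brussels  auderghem ', 'munderkingen', 'mungia', 
       'munchwilen', 'bruxelles belgium', 'munich  munchen ', 'brussel'}

# Ask Nominatim for JSON rather than scraping the HTML results page
url = "https://nominatim.openstreetmap.org/search"
# Placeholder User-Agent; Nominatim's usage policy asks clients to identify themselves
headers = {"User-Agent": "city-name-cleanup-script"}
for x in lst:
    # params= URL-encodes the query (spaces, umlauts, ...)
    response = requests.get(url, params={"q": x, "format": "jsonv2", "limit": 1},
                            headers=headers)
    results = response.json()
    # "name" is assumed to hold the short place name; fall back to "display_name"
    place = (results[0].get("name") or results[0]["display_name"]) if results else None
    print(x, "-->", place)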

The result is not perfect, e.g. some results are a bit too specific, but there are other attributes you could use, e.g. the "type" (which should probably be "city"). With some cleanup and some more tinkering this should get you started.

munich --> München
munderkingen --> Munderkingen
munich  munchen  --> Johanniter-Unfall-Hilfe e.V., Regionalgeschäftsstelle
mungia --> Mungia
brussels  auderghem  --> Auderghem - Oudergem
munchwilen --> Münchwilen
bruselas --> Bruxelles / Brussel
bruxelles --> Bruxelles / Brussel
munguia --> Mungia
munich rupert mayer strasse --> Rupert-Mayer-Straße
mundolsheim --> Mundolsheim
bruxelles belgium --> Bruxelles / Brussel
munchen --> München
brussels --> Bruxelles / Brussel
munchenstein --> Münchenstein
munich  germany --> München
brussel --> Bruxelles / Brussel
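
The idea of filtering on other attributes such as "type" could look roughly like the sketch below. It works on an already parsed list of result dictionaries. The helper name `pick_city`, the accepted type values, and the sample data are all made up for illustration; which "type" values a real response contains should be checked against actual results.

```python
def pick_city(results, accepted=("city", "town", "village")):
    """Return the name of the first result whose "type" is accepted, else None."""
    for item in results:
        if item.get("type") in accepted:
            # Prefer the short name, fall back to the long display name
            return item.get("name") or item.get("display_name")
    return None

# Hypothetical, hand-written results in the shape of a parsed JSON response
sample = [
    {"name": "Johanniter-Unfall-Hilfe e.V.", "type": "social_facility"},
    {"name": "München", "type": "city"},
]
print(pick_city(sample))
```

With a filter like this, a query such as "munich  munchen " would skip the overly specific first hit and return the city instead, provided a city-like result is among the candidates (so the request would need a limit higher than 1).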

The same should work for Google Maps, too, with a similar request, but the results seem not to be as easy to parse as with OSM.

(Disclaimer: I am not sure they would be thrilled if you sent them 15k such requests, so you might want to spread those out a bit, or use a more official API than plain HTTP requests. You should definitely cache both the complete search results (so you can tweak which attributes to use without querying again) and the mapped cities (to handle duplicate user-specified cities), in order to minimize the number of requests and thus both their server load and your running time.)
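
A minimal sketch of that caching and throttling idea, assuming a `fetch` function that performs the actual request. The names `cached_lookup` and `fake_fetch` are hypothetical, and the one-second default pause is an assumption that should be checked against the service's usage policy.

```python
import time

def cached_lookup(name, fetch, cache, delay=1.0):
    """Return fetch(name), reusing the cached result for repeated names."""
    key = " ".join(name.lower().split())  # normalise case and whitespace
    if key not in cache:
        cache[key] = fetch(key)
        time.sleep(delay)  # pause only after a real request
    return cache[key]

# Stand-in for a real request, so the sketch runs offline
calls = []
def fake_fetch(query):
    calls.append(query)
    return [{"name": query.title()}]

cache = {}
for raw in ["munich", "Munich ", "munich  germany", "munich"]:
    cached_lookup(raw, fake_fetch, cache, delay=0)

print(len(calls))  # 2: only "munich" and "munich germany" were fetched
```

Storing the full result list rather than just the chosen name makes it possible to change the attribute selection later without repeating any request.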



You could use FuzzyWuzzy, which is based on the Levenshtein distance algorithm:

import pandas as pd
from fuzzywuzzy import process, fuzz

df = pd.read_clipboard(sep='\t')
print(df.head(5))
0     brussels
1    auderghem
2    bruxelles
3    bruxelles
4      belgium

We need a master list of cities to use as a lookup. I assume you know what the cities are; I'll use this one from GitHub.

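# Master list of city names to match against
cities = pd.read_csv('https://datahub.io/core/world-cities/r/0.csv')

# {row index: user-entered name}; the column name must match your own data
choices = df['City Names'].to_dict()
lookups = cities['name'].tolist()

# For every master city, keep its two best-scoring user-entered names.
# With a dict of choices, process.extract yields (value, score, key) tuples.
res = [(lookup,) + item for lookup in lookups for item in process.extract(lookup, choices, limit=2)]
df = pd.DataFrame(res, columns=["lookup", "matched", "score", "idx"])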

print(df.sort_values('score', ascending=False))
  lookup    matched  score  idx
9401     Munich     munich    100   13
12612    Mungia     mungia    100   10
9400     Munich     munich    100   12
1820   Brussels   brussels    100    0
12613    Mungia    munguia     92   11
...         ...        ...    ...  ...
27205    Желино  auderghem      0    1
27204    Желино   brussels      0    0
27487   Зуунмод  auderghem      0    1
27486   Зуунмод   brussels      0    0
27212    Теарце   brussels      0    0

Naturally, if you edit the lookup list beforehand to keep only the cities you know are in your data, the lookup will run faster and return only the results you need.

For example:

lookups = ['brussels','munich']

print(df.sort_values('score',ascending=False))
     lookup    matched  score  idx
0  brussels   brussels    100    0
2    munich     munich    100   12
3    munich     munich    100   13
1  brussels  bruxelles     71    2

You can then take the lookup with the highest score.

Hopefully this points you in the right direction. I'm no expert with this library, so it would be best to read the documentation and optimize the code for your use case. Best of luck.
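
Picking the highest-scoring lookup for each entered name could be sketched as follows. To keep the sketch free of extra installs it uses the standard library's `difflib.SequenceMatcher` in place of FuzzyWuzzy, so the scores are not guaranteed to match the tables above. The helper name `best_lookup` and the cutoff of 60 are assumptions for illustration.

```python
from difflib import SequenceMatcher

def best_lookup(name, lookups, min_score=60):
    """Return (lookup, score) for the best match, or (None, score) below the cutoff."""
    scored = [
        (round(100 * SequenceMatcher(None, name.lower(), lookup.lower()).ratio()), lookup)
        for lookup in lookups
    ]
    score, lookup = max(scored)  # highest score wins
    return (lookup, score) if score >= min_score else (None, score)

lookups = ["brussels", "munich"]
for name in ["bruxelles", "munchen", "auderghem"]:
    print(name, "->", best_lookup(name, lookups)[0])
```

Names that fall below the cutoff come back as `None`, which leaves a much smaller set to review by hand.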



Try grouping the names by the first letters of each city; that will reduce your workload.
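
One way to read this suggestion is to bucket the names by a short prefix and then review each bucket on its own. The sketch below assumes a prefix length of three characters, and the helper name `group_by_prefix` is made up for illustration.

```python
from collections import defaultdict

def group_by_prefix(names, length=3):
    """Group distinct, lower-cased names by their first `length` characters."""
    groups = defaultdict(list)
    for name in sorted({n.strip().lower() for n in names}):
        groups[name[:length]].append(name)
    return dict(groups)

names = ["bruselas", "brussel", "brussels", "bruxelles", "munchen", "munich", "mungia"]
for prefix, members in group_by_prefix(names).items():
    print(prefix, members)
```

Buckets are only a first pass: "mungia" lands next to "munich" here even though it is a different town, so each bucket still needs checking.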
