6  Data Visualization

6.1 Handling Spatial Data with GeoPandas and gmplot

The following section was written by Thomas Schittina, a senior majoring in statistics and minoring in mathematics at the University of Connecticut.

This section focuses on how to manipulate and visualize spatial data in Python, with a particular focus on the packages GeoPandas and gmplot. We’ll start with GeoPandas and do the following:

  • Cover the core concepts and functionalities
  • Walkthrough an example using NYC shape data

For gmplot we will:

  • Talk about why you’ll need a Google Maps API key
  • See some of the different plotting functionalities
  • Walkthrough an example using NYC shape data

6.1.1 GeoPandas

6.1.1.1 Introducing GeoPandas

Founded in 2013, GeoPandas is an open-source extension of Pandas that adds support for geospatial data. GeoPandas is built around the GeoSeries and GeoDataFrame objects. Both are subclasses of the corresponding Pandas objects, so they should feel familiar to those who have used Pandas before.

6.1.1.2 A Remark about Shapely

The package Shapely is a core dependency of GeoPandas that handles geometric operations. Each geometry (point, polygon, etc.) stored in a GeoDataFrame is a Shapely object, and GeoPandas internally calls Shapely methods to perform spatial analysis. You won’t often need to interact directly with Shapely when using GeoPandas. Still, you may want to familiarize yourself with its basic concepts.

Shapely Documentation can be found here.

6.1.1.3 GeoSeries and GeoDataFrame

GeoSeries:

  • Similar to Series, but should exclusively contain geometries
  • GeoSeries.crs stores the Coordinate Reference System information

GeoDataFrame:

  • May consist of both Series and GeoSeries
  • May contain several GeoSeries, but only one active geometry column
    • Geometric operations will only apply to the active column
    • Accessed and manipulated with GeoDataFrame.geometry
  • Otherwise similar to a normal DataFrame

6.1.2 Example with NYC MODZCTA Shapefile

Given a file containing geospatial data, geopandas.read_file() will detect the filetype and create a GeoDataFrame.

import geopandas as gpd
import os

# get .shp from MODZCTA_Shapefile folder
shapefile_path = None
for file in os.listdir('MODZCTA_Shapefile'):
    if file.endswith(".shp"):
        shapefile_path = os.path.join('MODZCTA_Shapefile', file)
        break  # Use the first .shp file found

# read in data
gdf = gpd.read_file(shapefile_path)

gdf.drop(columns=['label', 'zcta'], inplace=True)

gdf.head()
modzcta pop_est geometry
0 10001 23072.0 POLYGON ((-73.98774 40.74407, -73.98819 40.743...
1 10002 74993.0 POLYGON ((-73.9975 40.71407, -73.99709 40.7146...
2 10003 54682.0 POLYGON ((-73.98864 40.72293, -73.98876 40.722...
3 10026 39363.0 MULTIPOLYGON (((-73.96201 40.80551, -73.96007 ...
4 10004 3028.0 MULTIPOLYGON (((-74.00827 40.70772, -74.00937 ...

It’s very important to know which CRS your geospatial data is in. Operations involving distance or area require a projected CRS (using feet, meters, etc.). If a geographic CRS is used (degrees), the calculations will likely be wrong.

print(gdf.crs)

# convert to projected CRS
gdf = gdf.to_crs(epsg=3857)

print(gdf.crs)
EPSG:4326
EPSG:3857

Originally, the geometries were in EPSG 4326, which is measured by latitude and longitude. In order to work with the shape data, the CRS was converted to EPSG 3857, which uses meters.

Now we can start working with the spatial data. First, let’s compute the area of each zip code and store it as a new column.

# create column of areas
gdf['area'] = gdf.area
gdf.head(3)
modzcta pop_est geometry area
0 10001 23072.0 POLYGON ((-8236278.03 4974664.364, -8236327.85... 2.987592e+06
1 10002 74993.0 POLYGON ((-8237364.444 4970258.308, -8237318.6... 3.974361e+06
2 10003 54682.0 POLYGON ((-8236377.258 4971559.548, -8236390.9... 2.611531e+06

Our active geometry column is the shape data for each zip code, so gdf.area() only acts on that column and ignores the others.

Let’s also find the boundary of each zip code, as well as its geographic center.

# create columns for boundary and centorid info
gdf['boundary'] = gdf.boundary
gdf['centroid'] = gdf.centroid

gdf[['modzcta', 'boundary', 'centroid']].head(3)
modzcta boundary centroid
0 10001 LINESTRING (-8236278.03 4974664.364, -8236327.... POINT (-8237323.727 4975637.524)
1 10002 LINESTRING (-8237364.444 4970258.308, -8237318... POINT (-8236103.249 4970509.323)
2 10003 LINESTRING (-8236377.258 4971559.548, -8236390... POINT (-8236435.551 4972866.281)

Suppose we want to find the distance between two centroids. The current active geometry column is the shape data. Run gdf.geometry = gdf['centroid'] to switch the active geometry.

# switch active geometry to centroid info
gdf.geometry = gdf['centroid']

Then we can calculate the distance between the first two centroids with distance().

# find distance between first two centroids
gdf.geometry[0].distance(gdf.geometry[1])
5271.432980923517

6.1.2.1 Plotting with GeoPandas

GeoPandas also includes some basic plotting functionality. Similar to Pandas, plot() will generate visuals using matplotlib.

# plot NYC zip codes with color mapping by area
gdf.geometry = gdf['geometry'] # must switch active geometry back first
gdf.plot('area', legend=True)

Interactive maps can also be generated using explore, but you will need to install optional dependencies. An alternative approach is the package gmplot, which we’ll discuss next. First though, here is a list of common GeoPandas methods we’ve not yet covered.

  • to_file(): save GeoDataFrame to a geospatial file (.shp, .GEOjson, etc.)
  • length(): calculate the length of a geometry, useful for linestrings
  • instersects(): check if one geometry intersects with another
  • contains(): check if one geometry contains another
  • buffer(): create a buffer of specified size around a geometry
  • equals(): check if the CRS of two objects is the same
  • is_valid(): check for invalid geometries

6.1.3 gmplot

6.1.3.1 Google Maps API

An API key is not necessary to create visuals with gmplot, but it is highly recommended. Without a key, any generated output will be dimmed and have a watermark.

Example with no API key

The process to create an API key is very simple. Go here and click on Get Started. It requires some credit card information, but you start on a free trial with $300 of credit. You will not be charged unless you select activate full account.

There are some configuration options you can set for your key. Google has many different APIs, but gmplot only requires the Maps Javascript API.

6.1.3.2 Creating Plots with gmplot

gmplot is designed to mimic matplotlib, so the syntax should feel similar. The class GoogleMapPlotter provides the core functionality of the package.

import gmplot

apikey = open('gmapKey.txt').read().strip() # read in API key

# plot map centered at NYC with zoom = 11
gmap = gmplot.GoogleMapPlotter(40.5665, -74.1697, 11, apikey=apikey)

Note: To render the classnotes on your computer, you will need to create the text file gmapKey.txt and store your Google Maps API key there.

The arguments include:

  • The latitude and longitude of NYC
  • The level of zoom
  • API key (even if it’s not used directly)
  • more optional arguments for further customization

6.1.4 Making Maps with NYC Zip Code Data

Let’s display the largest zip code by area in NYC.

gdf = gdf.to_crs(epsg=4326) # convert CRS to plot by latitude and longitude
largest_zip = gdf['geometry'][gdf['area'].idxmax()] # returns Shapely POLYGON

coords = list(largest_zip.exterior.coords) # unpack boundary coordinates
lats = [lat for lon, lat in coords]
lons = [lon for lon, lat in coords]

# plot shape of zip code
gmap.polygon(lats, lons, face_color='green', edge_color='blue', edge_width=3)

# gmap.draw('largest_zip.html')

After creating the plot, gmap.draw('filename') saves it as an HTML file in the current working directory, unless another location is specified. In the classnotes, all outputs will be shown as a PNG image.

Largest NYC Zip Code by area

Let’s also plot the centriod of this zip code, and include a link to gmplot’s documentation (in the classnotes this link won’t work because the PNG is used).

gdf.geometry = gdf['centroid'] # now working with new geometry column
gdf = gdf.to_crs(epsg=4326) # convert CRS to plot by latitude and longitude

centroid = gdf['centroid'][gdf['area'].idxmax()] # returns Shapely POINT

# plot the point with info window
gmap.marker(centroid.y, centroid.x, title='Center of Zip Code',
            info_window="<a href='https://github.com/gmplot/gmplot/wiki'>gmplot docs</a>")

# plot the polygon
gmap.polygon(lats, lons, face_color='green', edge_color='blue', edge_width=3)

# gmap.draw('zip_w_marker.html')

Here’s the output:

Center of largest NYC Zip Code

6.1.4.1 Other Features of gmplot

  • directions(): draw directions from one point to another
  • scatter(): plot a collection of points
  • heatmap(): plot a heatmap
  • enable_marker_dropping(): click on map to create/remove markers
  • from_geocode(): use name of location instead of coordinates
  • see docs for more

You can also change the map type when you create an instance of GoogleMapPlotter.

# create hybrid type map
gmap = gmplot.GoogleMapPlotter(40.776676, -73.971321, 11.5, apikey=apikey,
                               map_type='hybrid')

# gmap.draw('nyc_hybrid.html')

Hybrid map of NYC

6.1.5 Summary

Geopandas is a powerful tool for handling spatial data and operations. It builds on regular Pandas by introducing two new data structures, the GeoSeries and GeoDataFrame. Under the hood, Shapely handles geometric operations.

The package gmplot is a simple yet dynamic tool that overlays spatial data onto interactive Google maps. It does so through the class GoogleMapPlotter, which offers an alternative to Geopandas’ built in graphing methods for simple plots.

6.2 Google Maps visualizations with Folium

This section was created by Vlad Lagutin. I am a sophomore majoring in Statistical Data Science at the University of Connecticut.

Here I introduce one more library for geospatial visualizations, in addition to GeoPandas and gmplot libraries described in the previous section.

6.2.1 Folium and its features

  • Folium is a Python library used to create interactive maps

  • It is built on top of Leaflet.js, an open-source JavaScript library for interactive maps

  • Manipulate your data in Python, visualize it in a Leaflet map with Folium

  • Easily compatible with Pandas and Geopandas in Python

  • Supports interactive features such as popups, zoom and tooltips

  • Able to export maps to HTML

6.2.2 Initializing Maps and Tile Layers

This is how simple map is created. It is often useful to provide arguments like location and zoom_start for convenience:

location - location where map is initialized

zoom_start - specifies starting zoom

import folium

m = folium.Map(location=[38.8, -106.54], 
               zoom_start=4)
m
Make this Notebook Trusted to load map: File -> Trust Notebook

We can add various Tile Layers to modify how our base map looks like: built-in folium styles, as well as many other tiles provided by Leaflet can be found here.

m = folium.Map(location=[50, -100], zoom_start=4)

# not built-in layer; add the link here
folium.TileLayer('https://{s}.tile.opentopomap.org/{z}/{x}/{y}.png', 
                 name='OpenTopoMap', attr='OpenTopoMap').add_to(m)

# built-in layers
folium.TileLayer('CartoDB Positron', name='Positron', 
attr='cartodb positron').add_to(m)

folium.TileLayer('CartoDB Voyager', name='Voyager', 
attr='Voyager').add_to(m)

# to be able to use them, add Layer Control
folium.LayerControl().add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

After adding LayerControl, we can switch these layers using control in the top right corner of the produced map.

6.2.2.1 Geojson files

With Geojson files, we can visualize the borders of counties or states inside of them. These GeoJson files can be found online.

m = folium.Map(location=[50, -100], zoom_start=4)

folium.GeoJson('data/us_states.json', name="USA").add_to(m)

folium.GeoJson('data/canada_provinces.json', name="Canada").add_to(m)

folium.LayerControl().add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.2.2 Styling

We can style these geojson objects. Useful parameters:

  • color - color of line stroke

  • weight - line stroke width

  • opacity - opacity of line strokes

  • fillcolor - filling color of regions

  • fillOpacity - opacity of regions

# initialize styling dictionary
style = {'color': 'black', 'weight': 1,
                'fillColor': 'purple'}  


m = folium.Map(location=[50, -100], zoom_start=4)

# pass styling dictinary to a special argument "style_function"
folium.GeoJson('data/us_states.json', name="USA",
               style_function= lambda x: style).add_to(m)

folium.GeoJson('data/canada_provinces.json', name="Canada",
               style_function= lambda x: style).add_to(m)


folium.LayerControl().add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.3 Markers

It is possible for user to label certain locations on the map with various types of markers. Folium provides several types of them.

6.2.3.1 Circle Markers

As one can understand from the title, these are just circles.

There are two types of circle markers:

folium.Circle - has radius in meters

folium.CircleMarker - has radius in pixels

m = folium.Map(location=[38.8974579,-77.0376094], 
               zoom_start=13.7)

# Radius in meters
folium.Circle(location=[38.89766472658641, -77.03654034831065],
radius=100).add_to(m)


# Circle marker has radius in pixels
folium.CircleMarker(location=[38.88946075081255, -77.03528690318743],
radius=50).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

As you can see, the marker around the Washington monument increases while zooming out, and the marker around White House remains the same.

6.2.3.2 Styling for Circles

We can style circles as well, here are some important parameters:

  • stroke - set to True to enable line stroke, default is True

  • weight - line stroke width in pixels, default is 5

  • color - line stroke color

  • opacity - line stroke opacity

  • fill - set to True to enable filling with color, default is False

  • fill_color - fill Color

  • fill_opacity - ranges between 0 to 1. 0 means transparent, 1 means opaque

Moreover, we can also add

  • tooltip - a label that appears when you put your cursor over an element

  • popup - a box with info that appears when you click on element

m = folium.Map(location=[38.8974579,-77.0376094], 
               zoom_start=13.7)



# Radius in meters
folium.Circle(radius=100, location=[38.89766472658641, -77.03654034831065],
              color='black', 
              fill=True,
              fill_opacity=0.7,
              tooltip="White House",
              # can also just write string popup; use html
              popup=folium.Popup("""<h2>The White House</h2><br/>  
              <img src="https://cdn.britannica.com/43/93843-050-A1F1B668/White-House-Washington-DC.jpg" 
                                 alt="Trulli" style="max-width:100%;max-height:100%">""", max_width=500)
              ).add_to(m)





# Circle marker has radius in pixels
folium.CircleMarker(radius=50, location=[38.88946075081255, -77.03528690318743],
              color='purple', 
              fill=True,
              tooltip="Washington monument",
              popup=folium.Popup("""<h2>The Washington monument</h2><br/>  
              <img src="https://www.trolleytours.com/wp-content/uploads/2016/06/washington-monument.jpg" 
                                 alt="Trulli" style="max-width:100%;max-height:100%">""", max_width=500)
                                ).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.3.3 Markers

In addition to circles, we can add just Markers:

m = folium.Map(location=[39.8584824090568, -99.63735509074904],
               zoom_start= 4)


folium.Marker(location=[43.88284841471961, -85.43121849839345]
              ).add_to(m)

folium.Marker(location=[42.97269745752499, -98.88739407603738]
              ).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.3.4 Styling for Markers

Here we can use icon parameter to change the icon of a marker.

Icon names for glyphicons by bootstrapcan be found here

Icon names by fontawesome can be found here, need to add prefix='fa'

m = folium.Map(location=[39.8584824090568, -99.63735509074904],
               zoom_start= 4)


folium.Marker(
    location=[43.88284841471961, -85.43121849839345],
    tooltip='See location',
    popup='*location*',
    icon=folium.Icon(icon='glyphicon-volume-off', color='red')
).add_to(m)


folium.Marker(
    location=[42.97269745752499, -98.88739407603738],
    tooltip="See location",
    popup="*location*",
    icon=folium.Icon(icon='fa-cube', prefix='fa', color='green')
).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.4 Grouping

We can create groups of Markers, choosing whether we want to show them or not

m = folium.Map(location=[39.8584824090568, -99.63735509074904],
               zoom_start= 4)


# adding group 1
group_1 = folium.FeatureGroup("first group").add_to(m)

folium.Marker(location=(37.17403654771468, -96.90854476924225), 
              icon=folium.Icon("red")
              ).add_to(group_1)


folium.Marker(location=[43.88284841471961, -85.43121849839345]
              ).add_to(m)



# adding group 2
group_2 = folium.FeatureGroup("second group").add_to(m)


folium.Marker(location=(42.53679960949629, -110.16683522968691), 
              icon=folium.Icon("green")
              ).add_to(group_2)

folium.Marker(location=[42.97269745752499, -98.88739407603738]
              ).add_to(m)



folium.LayerControl().add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

Using Layer Control on the top right, we can turn on and off these groups of Markers.
However, two blue markers were not added to any of the groups but to the map directly, so we cannot hide them.

6.2.5 Drawing different shapes on a map

We can draw different shapes like rectangles, lines, and polygons. Styling works the same as it does for circles

6.2.5.1 Rectangle

For a rectangle, we just need two diagonal points.
We can draw it, for example, around a Wyoming state, since it has a rectangular form:

m = folium.Map(location=[39.8584824090568, -99.63735509074904],
               zoom_start=4)


# for rectangle, we need only 2 diagonal points
folium.Rectangle([(45.0378, -111.0328), 
                  (41.0734, -104.0689)],
                  color='purple',
                  fill=True,
                  tooltip='see the name',
                  popup="Wyoming state",
                  fill_color='blue').add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.5.2 Polygon

However, for states like Nevada, which are not of a rectangular form, we can use Polygon:

polygon_coords = [(42.0247, -120.0016),
                (42.0106, -114.0776),
                (36.1581, -114.0157),
                (36.1220, -114.6994),
                (35.0721, -114.7066),
                (39.0379, -120.0695)
]


folium.Polygon(polygon_coords,
                color='purple',
                fill=True,
                tooltip='see the name',
                popup="Nevada state",
                fill_color='blue').add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.5.3 PolyLine

It is also possible to just create lines;
The only diffence between Polygon and PolyLine is that Polygon connects the first point to the last and PolyLine does not

polyline_coords = [(34.9614, -108.2743),
                 (38.5229, -112.7751),
                 (42.9696, -112.9947),
                 (45.9843, -118.5384)
]



folium.PolyLine(polyline_coords,
                 color='red',
                 ).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.5.4 Draw it yourself

Is it possible to draw these shapes ourselves, we just need to import Draw plugin:

from folium.plugins import Draw

# add export button, allowing to save as geojson file
Draw(export=True).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

To draw, use tools on a panel on the left.

6.2.6 Heatmap

We can also create simple HeatMaps:

Arguments:

  • data (list of points of the form [lat, lng] or [lat, lng, weight]) – The points you want to plot. You can also provide a numpy.array of shape (n,2) or (n,3). Ideally, the weight should be between 0 and 1.

  • name (default None) – The name of the Layer, as it will appear in LayerControls

  • min_opacity (default 1.) – The minimum opacity the heat will start at

  • radius (default 25) – Radius of each “point” of the heatmap

  • blur (default 15) – Amount of blur

import numpy as np
from folium.plugins import HeatMap


m = folium.Map(location=[40.71, -74],
               zoom_start=10)


data = (
    np.random.normal(size=(100, 2)) * np.array([[0.1, 0.1]]) + 
    np.array([[40.7128, -74.0060]])
    ).tolist()

HeatMap(data).add_to(m)

m
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.7 Demonstration

Here is the demonstration with flood data from Midterm project:

import pandas as pd

flood_df = pd.read_csv("data/nycflood2024.csv", 
                       parse_dates=["Created Date", "Closed Date"])

flood_df.columns = flood_df.columns.str.replace(" ", "_").str.lower()


# some manipulation/cleaning
flood_df = flood_df[~flood_df['location'].isna()]

flood_df["response_time"] = (flood_df["closed_date"] - 
        flood_df["created_date"]).dt.total_seconds() / 3600


flood_df = flood_df[~flood_df['response_time'].isna()]


mean = flood_df['response_time'].mean()
std = flood_df['response_time'].std()

# standardize response time
flood_df['z_score'] = (flood_df['response_time'] - mean) / std




m = folium.Map(location=[40.7128, -74], zoom_start=12,
               tiles='cartodb dark_matter',
               world_copy_jump=True)


# styling dictionary
style = {'color': 'orange', 'weight': 1,
                'fillColor': 'white'}  

folium.GeoJson("data/new-york-city-boroughs.json", name='Borough borders',
style_function=lambda x: style).add_to(m)



# create 4 groups: 2 for usual points, 2 for outliers
outlier_flood = folium.FeatureGroup('Flooding outliers').add_to(m)
normal_flood = folium.FeatureGroup('Flooding complaints').add_to(m)
outlier_CB = folium.FeatureGroup('Catch Basin outliers').add_to(m)
normal_CB = folium.FeatureGroup('Catch Basin complaints').add_to(m)


# reducing number of points, so the code cell can run
flood_df = flood_df[:3500]


for _, row in flood_df.iterrows():
    
    if row["descriptor"] == 'Street Flooding (SJ)':
        if row['z_score'] > 3 or row['z_score'] < -3:
            # visualize SF outliers
            folium.Marker(location=[row.latitude, row.longitude],
                          popup="Unsusually large response time", 
                          tooltip='SF outlier',
                          icon=folium.Icon(icon='glyphicon-minus-sign',
                                           color='red')).add_to(outlier_flood)
        else:
            # ordinary SF locations
            folium.Circle(radius=1, 
                location=[row.latitude, row.longitude],
                color="red",
                tooltip='SF complaint',
                popup="Normal SF complaint",
                ).add_to(normal_flood)
            
    else:
        # visualize CB outliers
        if row['z_score'] > 3 or row['z_score'] < -3:
            folium.Marker(location=[row.latitude, row.longitude],
                          tooltip='CB outlier',
                          popup="Unsusually large response time", 
                          icon=folium.Icon(icon='fa-exclamation-triangle',
                                           prefix='fa', 
                                           color='orange')).add_to(outlier_CB)
        else:
            # ordinary CB locations
            folium.Circle(radius=1,
                location=[row.latitude, row.longitude],
                color="blue",
                tooltip='CB complaint',
                popup="Normal CB complaint", 
                ).add_to(normal_CB)
        



folium.LayerControl().add_to(m)

m
/var/folders/cq/5ysgnwfn7c3g0h46xyzvpj800000gn/T/ipykernel_90339/2407777587.py:3: UserWarning:

Could not infer format, so each element will be parsed individually, falling back to `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
Make this Notebook Trusted to load map: File -> Trust Notebook

6.2.8 Further Reading