8.3. Cartopy

Cartopy is a python library that can be used in combination with matplotlib to create geographical maps.

Starting in 2016, The Cartopy Project began development as a replacement for Basemap.

Cartopy has 28 people listed as contributors.

If you use conda, cartopy can be installed with conda install -c conda-forge cartopy, which takes care of the dependencies such as proj, geos, pyshp, shapely, etc.

If you use pip, the system dependencies need to be installed first before running pip install cartopy. See https://scitools.org.uk/cartopy/docs/latest/installing.html for detailed instructions.

Pros of Cartopy:

  • Relatively easy to learn

  • Many different types of mapping projections

  • Great options to color in the map

Cons of Cartopy:

  • Slow loading times for complex maps

  • Alternate files often required for high precision mapping

Sources:

  • https://matplotlib.org/basemap/

  • https://scitools.org.uk/cartopy/docs/latest/

  • https://www.youtube.com/watch?v=4M2aiHvhr5Y

import cartopy
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shpreader
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import seaborn as sns

8.3.1. Basic Plot

To create a cartopy plot, simply set a variable (m for map) to plt.axes() with a projection argument. Then add coastlines so the graph has a visible element.

m = plt.axes(projection=ccrs.PlateCarree())
m.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f6f624c7d00>
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/110m/physical/ne_110m_coastline.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
../_images/cartopy_3_2.png

8.3.2. Advanced Graph

By using add_feature, we can add more geographic information to our graph. gridlines will add longitude and latitude lines to the graph.

plt.figure(figsize=(16,12))
m = plt.axes(projection=ccrs.PlateCarree())

m.add_feature(cfeature.LAND)
m.add_feature(cfeature.OCEAN)
m.add_feature(cfeature.COASTLINE)
m.add_feature(cfeature.BORDERS, color = "red", alpha = 0.5)
m.add_feature(cfeature.LAKES)
m.add_feature(cfeature.RIVERS, linestyle = ':')

m.gridlines(draw_labels=True, color = "gray")
<cartopy.mpl.gridliner.Gridliner at 0x7f6f1e841eb0>
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/mpl/gridliner.py:549: ShapelyDeprecationWarning: __len__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the `geoms` property instead to get the  number of parts of a multi-part geometry.
  elif len(intersection) > 4:
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/mpl/gridliner.py:556: ShapelyDeprecationWarning: __getitem__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the `geoms` property to access the constituent parts of a multi-part geometry.
  xy = np.append(intersection[0], intersection[-1],
<__array_function__ internals>:5: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/numpy/lib/function_base.py:4817: ShapelyDeprecationWarning: The array interface is deprecated and will no longer work in Shapely 2.0. Convert the '.coords' to a numpy array instead.
  return concatenate((arr, values), axis=axis)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/110m/physical/ne_110m_land.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/110m/physical/ne_110m_ocean.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/110m/cultural/ne_110m_admin_0_boundary_lines_land.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/mpl/style.py:76: UserWarning: facecolor will have no effect as it has been defined as "never".
  warnings.warn('facecolor will have no effect as it has been '
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/110m/physical/ne_110m_lakes.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/110m/physical/ne_110m_rivers_lake_centerlines.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
../_images/cartopy_5_6.png

8.3.3. Projection Comparison

Here are some examples of different projections.

fig = plt.figure(figsize=(20,20))
fig.suptitle('Projections', fontsize = 15, y = 0.93)

projections = {'PlateCarree': ccrs.PlateCarree(),
               'EquidistantConic': ccrs.EquidistantConic(),
               'Miller': ccrs.Miller(), 
               'Mollweide': ccrs.Mollweide(), 
               'Orthographic': ccrs.Orthographic()}

for index, projection in enumerate(projections.items()):
    ax = fig.add_subplot(7, 5, index+1, projection=projection[1])
    ax.coastlines()
    ax.set_title(projection[0])
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/crs.py:228: ShapelyDeprecationWarning: __len__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the `geoms` property instead to get the  number of parts of a multi-part geometry.
  if len(multi_line_string) > 1:
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/crs.py:280: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the `geoms` property to access the constituent parts of a multi-part geometry.
  for line in multi_line_string:
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/crs.py:347: ShapelyDeprecationWarning: __len__ for multi-part geometries is deprecated and will be removed in Shapely 2.0. Check the length of the `geoms` property instead to get the  number of parts of a multi-part geometry.
  if len(p_mline) > 0:
../_images/cartopy_7_1.png

8.3.4. Plotting Specific Locations

If you only want to plot a certain area of the map, you can specify set_extent. To do this, specify a list of points [lowest longitude, highest longitude, lowest latidude, highest latitude] as well as the projection (optionally).

fig = plt.figure(figsize=(14, 14))
m = plt.axes(projection=ccrs.PlateCarree())
# (x0, x1, y0, y1)
m.set_extent([-130, -60, 20, 55], ccrs.PlateCarree())         
m.add_feature(cfeature.STATES)
m.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f6f1e698ac0>
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/50m/cultural/ne_50m_admin_1_states_provinces_lakes.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/50m/physical/ne_50m_coastline.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
../_images/cartopy_9_3.png

8.3.5. Reading Shape Files

If you would like to plot one area without the adjacent landscape, you can use a shape file via Reader from cartopy.io.shapereader.

# get shape file
reader = shpreader.Reader('../data/state/tl_2021_us_state.dbf')

# convert shape file to format that cartopy can work with
states = list(reader.geometries())
STATES = cfeature.ShapelyFeature(states, ccrs.PlateCarree())

# create plot
plt.figure(figsize=(10, 6))
m = plt.axes(projection=ccrs.PlateCarree())
plt.title("United States")
m.set_extent([-130, -60, 20, 50])
m.add_feature(STATES, facecolor='none', edgecolor='black')
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f6f1cf1ea00>
../_images/cartopy_11_1.png

Here is a plot of New York City that we will use later.

# get shape file
reader = shpreader.Reader('../data/nyc/geo_export_a1f96cd2-7ce6-44bb-9568-f3ee8bcba44a.dbf')

# convert shape file to format that cartopy can work with
geom = list(reader.geometries())
GEOM = cfeature.ShapelyFeature(geom, ccrs.PlateCarree())

# create plot
plt.figure(figsize=(14, 14))
m = plt.axes(projection=ccrs.PlateCarree())
plt.title("NYC")
m.set_extent([-74.3, -73.6, 40.4, 41])
m.add_feature(GEOM, facecolor='none', edgecolor='black')
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f6f1e5644f0>
../_images/cartopy_13_1.png

8.3.6. Data Cleaning

Before we can work with the NYC Collisions Dataset, we need to do some data cleaning.

df = pd.read_csv("../data/nyc_mv_collisions_202201.csv")
print(df.shape)
df.head()
(7659, 29)
CRASH DATE CRASH TIME BOROUGH ZIP CODE LATITUDE LONGITUDE LOCATION ON STREET NAME CROSS STREET NAME OFF STREET NAME ... CONTRIBUTING FACTOR VEHICLE 2 CONTRIBUTING FACTOR VEHICLE 3 CONTRIBUTING FACTOR VEHICLE 4 CONTRIBUTING FACTOR VEHICLE 5 COLLISION_ID VEHICLE TYPE CODE 1 VEHICLE TYPE CODE 2 VEHICLE TYPE CODE 3 VEHICLE TYPE CODE 4 VEHICLE TYPE CODE 5
0 01/01/2022 7:05 NaN NaN NaN NaN NaN EAST 128 STREET 3 AVENUE BRIDGE NaN ... NaN NaN NaN NaN 4491172 Sedan NaN NaN NaN NaN
1 01/01/2022 14:43 NaN NaN 40.769993 -73.915825 (40.769993, -73.915825) GRAND CENTRAL PKWY NaN NaN ... NaN NaN NaN NaN 4491406 Sedan Sedan NaN NaN NaN
2 01/01/2022 21:20 QUEENS 11414.0 40.657230 -73.841380 (40.65723, -73.84138) 91 STREET 160 AVENUE NaN ... NaN NaN NaN NaN 4491466 Sedan NaN NaN NaN NaN
3 01/01/2022 4:30 NaN NaN NaN NaN NaN Southern parkway Jfk expressway NaN ... Unspecified NaN NaN NaN 4491626 Sedan Sedan NaN NaN NaN
4 01/01/2022 7:57 NaN NaN NaN NaN NaN WESTCHESTER AVENUE SHERIDAN EXPRESSWAY NaN ... NaN NaN NaN NaN 4491734 Sedan NaN NaN NaN NaN

5 rows × 29 columns

As we can see from the plot below, there are some issues with our dataset.

sns.scatterplot(x="LONGITUDE", y="LATITUDE", data=df, hue = "BOROUGH")
<AxesSubplot:xlabel='LONGITUDE', ylabel='LATITUDE'>
../_images/cartopy_17_1.png

Some points have the coordinates (0, 0) so we need to remove them from the dataset. We can also remove the rows in which the longitude/latitude data is missing.

# drop rows with longitude == 0
df[df.LONGITUDE == 0] = np.nan

# make sure bad latitude values are all removed now
print(df.LATITUDE.isna().sum())
print(df.loc[df.LATITUDE == 0].shape[0])

df.head()
582
0
CRASH DATE CRASH TIME BOROUGH ZIP CODE LATITUDE LONGITUDE LOCATION ON STREET NAME CROSS STREET NAME OFF STREET NAME ... CONTRIBUTING FACTOR VEHICLE 2 CONTRIBUTING FACTOR VEHICLE 3 CONTRIBUTING FACTOR VEHICLE 4 CONTRIBUTING FACTOR VEHICLE 5 COLLISION_ID VEHICLE TYPE CODE 1 VEHICLE TYPE CODE 2 VEHICLE TYPE CODE 3 VEHICLE TYPE CODE 4 VEHICLE TYPE CODE 5
0 01/01/2022 7:05 NaN NaN NaN NaN NaN EAST 128 STREET 3 AVENUE BRIDGE NaN ... NaN NaN NaN NaN 4491172.0 Sedan NaN NaN NaN NaN
1 01/01/2022 14:43 NaN NaN 40.769993 -73.915825 (40.769993, -73.915825) GRAND CENTRAL PKWY NaN NaN ... NaN NaN NaN NaN 4491406.0 Sedan Sedan NaN NaN NaN
2 01/01/2022 21:20 QUEENS 11414.0 40.657230 -73.841380 (40.65723, -73.84138) 91 STREET 160 AVENUE NaN ... NaN NaN NaN NaN 4491466.0 Sedan NaN NaN NaN NaN
3 01/01/2022 4:30 NaN NaN NaN NaN NaN Southern parkway Jfk expressway NaN ... Unspecified NaN NaN NaN 4491626.0 Sedan Sedan NaN NaN NaN
4 01/01/2022 7:57 NaN NaN NaN NaN NaN WESTCHESTER AVENUE SHERIDAN EXPRESSWAY NaN ... NaN NaN NaN NaN 4491734.0 Sedan NaN NaN NaN NaN

5 rows × 29 columns

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7659 entries, 0 to 7658
Data columns (total 29 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   CRASH DATE                     7639 non-null   object 
 1   CRASH TIME                     7639 non-null   object 
 2   BOROUGH                        5010 non-null   object 
 3   ZIP CODE                       5010 non-null   float64
 4   LATITUDE                       7077 non-null   float64
 5   LONGITUDE                      7077 non-null   float64
 6   LOCATION                       7077 non-null   object 
 7   ON STREET NAME                 5607 non-null   object 
 8   CROSS STREET NAME              3606 non-null   object 
 9   OFF STREET NAME                2032 non-null   object 
 10  NUMBER OF PERSONS INJURED      7639 non-null   float64
 11  NUMBER OF PERSONS KILLED       7639 non-null   float64
 12  NUMBER OF PEDESTRIANS INJURED  7639 non-null   float64
 13  NUMBER OF PEDESTRIANS KILLED   7639 non-null   float64
 14  NUMBER OF CYCLIST INJURED      7639 non-null   float64
 15  NUMBER OF CYCLIST KILLED       7639 non-null   float64
 16  NUMBER OF MOTORIST INJURED     7639 non-null   float64
 17  NUMBER OF MOTORIST KILLED      7639 non-null   float64
 18  CONTRIBUTING FACTOR VEHICLE 1  7595 non-null   object 
 19  CONTRIBUTING FACTOR VEHICLE 2  5611 non-null   object 
 20  CONTRIBUTING FACTOR VEHICLE 3  822 non-null    object 
 21  CONTRIBUTING FACTOR VEHICLE 4  224 non-null    object 
 22  CONTRIBUTING FACTOR VEHICLE 5  79 non-null     object 
 23  COLLISION_ID                   7639 non-null   float64
 24  VEHICLE TYPE CODE 1            7519 non-null   object 
 25  VEHICLE TYPE CODE 2            4738 non-null   object 
 26  VEHICLE TYPE CODE 3            750 non-null    object 
 27  VEHICLE TYPE CODE 4            206 non-null    object 
 28  VEHICLE TYPE CODE 5            77 non-null     object 
dtypes: float64(12), object(17)
memory usage: 1.7+ MB

We can create a plot to show that our data has been successfully cleaned.

# specify the size of the plot
fig = plt.figure(figsize=(14, 14))

# use seaborn to plot the points
# x and y are just the longitude and latitude from the plot
# data is the dataframe, which tells seaborn to refernce the dataframe for the rest of the arguments
# hue categorizes the colors and style changes the marker to an x if someone was killed
# alpha = 0.5 makes the markers 50% transparent
# size changes the size of the point based on the amount of injuries
# for sizes, I used list concatenation to make each point 25 times bigger
sns.scatterplot(x="LONGITUDE", y="LATITUDE", data=df,
                hue = 'BOROUGH', style = "NUMBER OF PERSONS KILLED", alpha = 0.5,
                size = "NUMBER OF PERSONS INJURED",
                sizes = [(i+1)*25 for i in range(int(max(df["NUMBER OF PERSONS INJURED"] + 1)))])
<AxesSubplot:xlabel='LONGITUDE', ylabel='LATITUDE'>
../_images/cartopy_22_1.png

8.3.7. Plotting On Cartopy Map

Using our dataset, we can plot the car crashes on their geographical location.

# specify the size of the plot
fig = plt.figure(figsize=(14, 14))

# plot on top of geographical coordinates
m = plt.axes(projection=ccrs.PlateCarree())

# set title
plt.title("Crashes in New York City (January 2022)")

# set geographical location of graph
m.set_extent([np.nanmin(df.LONGITUDE) - 0.01, np.nanmax(df.LONGITUDE) + 0.01,
              np.nanmin(df.LATITUDE) - 0.01, np.nanmax(df.LATITUDE) + 0.01], ccrs.PlateCarree())

# color in the land (the color is just a hex code for a yellow that I though looked good as land)
m.add_feature(cfeature.LAND, color = "#F3E5AB")

# color in the ocean (colors as blue by default)
m.add_feature(cfeature.OCEAN)

# use seaborn to plot the points
sns.scatterplot(x="LONGITUDE", y="LATITUDE", data=df,
                hue = 'BOROUGH', style = "NUMBER OF PERSONS KILLED", alpha = 0.5,
                size = "NUMBER OF PERSONS INJURED",
                sizes = [(i+1)*25 for i in range(int(max(df["NUMBER OF PERSONS INJURED"] + 1)))])

# if needed, you can add plt.legend(loc = "...") to change the location of the legend

# draw coastlines
m.coastlines()
<cartopy.mpl.feature_artist.FeatureArtist at 0x7f6f1e5be640>
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/10m/physical/ne_10m_land.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/10m/physical/ne_10m_ocean.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
/home/runner/work/ids-s22/ids-s22/env/lib/python3.9/site-packages/cartopy/io/__init__.py:241: DownloadWarning: Downloading: https://naciscdn.org/naturalearth/10m/physical/ne_10m_coastline.zip
  warnings.warn('Downloading: {}'.format(url), DownloadWarning)
../_images/cartopy_24_4.png

8.3.8. Plotting On Cartopy Map Using Shapefile

By making use of a shapefile, we can map with greater precision and omit the locations that are not included in our graph.

# get shape file
reader = shpreader.Reader('../data/nyc/geo_export_a1f96cd2-7ce6-44bb-9568-f3ee8bcba44a.dbf')

# convert shape file to format that cartopy can work with
geom = list(reader.geometries())
GEOM = cfeature.ShapelyFeature(geom, ccrs.PlateCarree())

# specify the size of the plot
plt.figure(figsize=(14, 14))

# plot on top of geographical coordinates
m = plt.axes(projection=ccrs.PlateCarree())

# make the default color of the plot gray (this ends up just being the background)
m.set_facecolor("gray")

# set title
plt.title("Crashes in New York City (January 2022)")

# set geographical location of graph
m.set_extent([np.nanmin(df.LONGITUDE) - 0.01, np.nanmax(df.LONGITUDE) + 0.01,
              np.nanmin(df.LATITUDE) - 0.01, np.nanmax(df.LATITUDE) + 0.01], ccrs.PlateCarree())

# add the in the New York City geography that was originally read in as a shape file
m.add_feature(GEOM, facecolor = "#F3E5AB", edgecolor='black')

# use seaborn to plot the points
# setting zorder to a high number makes it so the geography does not cover up the points
sns.scatterplot(x="LONGITUDE", y="LATITUDE", data=df,
                hue = 'BOROUGH', style = "NUMBER OF PERSONS KILLED", alpha = 0.5,
                size = "NUMBER OF PERSONS INJURED",
                sizes = [(i+1)*25 for i in range(int(max(df["NUMBER OF PERSONS INJURED"] + 1)))],
                zorder = 100)

# if needed, you can add plt.legend(loc = "...") to change the location of the legend
<GeoAxesSubplot:title={'center':'Crashes in New York City (January 2022)'}, xlabel='LONGITUDE', ylabel='LATITUDE'>
../_images/cartopy_26_1.png