from shapely.geometry import LineString, Point, Polygon
import geopandas as gpd
# point example
= Point(0.5, 0.5)
point = gpd.GeoDataFrame(geometry=[point])
gdf1
# line example
= LineString([(0, 0), (1, 1)])
line = gpd.GeoDataFrame(geometry=[line])
gdf2
# polygon example
= Polygon([(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)])
polygon = gpd.GeoDataFrame(geometry=[polygon]) gdf3
6 Geospatial Data
6.1 Handling Spatial Data with GeoPandas
(by Kaitlyn Bedard)
GeoPandas
is a python library created as an extension of pandas to offer support for geographic data. Like pandas, GeoPandas has a series type and a dataframe type: GeoSeries
and GeoDataFrame
. It allows users to do work that would otherwise need a GIS database. Note that since GeoPandas is an extension of Pandas, it inherits all its attributes and methods. Please review the pandas presentations for information on these tools, if needed.
6.1.1 Installation
You can install GeoPandas using the below commands in terminal. The documentation recommends the first method.
conda install -c conda-forge geopandas
conda install geopandas
pip install geopandas
6.1.2 Basic Concepts
The GeoPandas GeoDataFrame
is essentially a pandas dataframe that supports typical data, however, it also supports geometries. Though the dataframe can have multiple geometry columns, there is one “active” column on which all operations are applied to.
The types of geometries are:
- Points: coordinates
- Lines: set of two coordinates
- Polygons: list of coordinate tuples, first and last must be the same (closed shape)
These geometries are often represented by shapely.geometry
objects. Note, we can also have multi-points, multi-lines, and multi-polygons. Below are examples of creating these geometries using shapely. Each GeoSeries has a specified CRS (Coordinate Reference System) that stores information about the data.
The following are some examples of basic attributes of a GeoSeries:
length
: returns the length of a line
gdf2.length
0 1.414214
dtype: float64
area
: returns the area of the shape
gdf3.area
0 1.0
dtype: float64
bounds
: gives the bounds of each row in a geometry columntotal_bounds
: gives the total bounds of a geometry seriesgeom_type
: gives the geometry type
gdf3.geom_type
0 Polygon
dtype: object
is_valid
: returns True for valid geometries and False otherwise
Below are some examples of basic methods that can be applied to a GeoSeries:
distance()
: returns the (minimum) distance of each row of a geometry to a specified paramater- parameter
other
: can be a single geometry, or an entire geometry series - parameter
align
: True if you want to align GeoSeries by index, false otherwise
- parameter
1,0))) gdf2.distance(Point((
0 0.707107
dtype: float64
centroid
: returns a new GeoSeries with the centers of each row in the geometry
gdf3.centroid
0 POINT (0.50000 0.50000)
dtype: geometry
Below are examples of some relationship tests that can be applied to a GeoSeries:
contains()
: returns true if shape contains a specifiedother
- parameter
other
: can be a single geometry, or an entire geometry series - parameter
align
: True if you want to align GeoSeries by index, false otherwise
- parameter
gdf3.contains(gdf1)
0 True
dtype: bool
intersects()
: returns true if shape intersects a specifiedother
- parameter
other
: can be a single geometry, or an entire geometry series - parameter
align
: True if you want to align GeoSeries by index, false otherwise
- parameter
gdf2.intersects(gdf3)
0 True
dtype: bool
6.1.3 Reading Files
If you have a file that contains data and geometry information, you can read it directly with geopandas using the geopandas.read_file()
command. Examples of these files are GeoPackage, GeoJSON, Shapefile. However, we can convert other types of files to a GeoDataFrame. For example, we can transform the NYC crash data. The below code creates a point geometry. The points are the coordinates of the crashes.
# Reading csv file
import pandas as pd
import numpy as np
# Shapely for converting latitude/longtitude to geometry
from shapely.geometry import Point
# To create GeodataFrame
import geopandas as gpd
= pd.read_csv('data/nyc_crashes_202301_cleaned.csv')
jan23
# creating geometry using shapely (removing empty points)
= [Point(xy) for xy in zip(jan23["LONGITUDE"], \
geometry "LATITUDE"]) if not Point(xy).is_empty]
jan23[
# creating geometry column to be used by geopandas
= gpd.points_from_xy(jan23["LONGITUDE"], jan23["LATITUDE"])
geometry2
# coordinate reference system (epsg:4326 implies geographic coordinates)
= {'init': 'epsg:4326'}
crs
# create Geographic data frame (removing rows with missing coordinates)
= gpd.GeoDataFrame(jan23.loc[~pd.isna(\
jan23_gdf "LATITUDE"]) & ~pd.isna(\
jan23["LONGITUDE"])],crs=crs, geometry=geometry)
jan23[
jan23_gdf.head()
/usr/local/lib/python3.11/site-packages/pyproj/crs/crs.py:141: FutureWarning: '+init=<authority>:<code>' syntax is deprecated. '<authority>:<code>' is the preferred initialization method. When making the change, be mindful of axis order changes: https://pyproj4.github.io/pyproj/stable/gotchas.html#axis-order-changes-in-proj-6
in_crs_string = _prepare_from_proj_string(in_crs_string)
CRASH DATE | CRASH TIME | BOROUGH | ZIP CODE | LATITUDE | LONGITUDE | LOCATION | ON STREET NAME | CROSS STREET NAME | OFF STREET NAME | ... | Unnamed: 31 | Unnamed: 32 | Unnamed: 33 | Unnamed: 34 | Unnamed: 35 | Unnamed: 36 | Unnamed: 37 | Unnamed: 38 | Unnamed: 39 | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1/1/23 | 14:38 | BROOKLYN | 11211.0 | 40.719094 | -73.946108 | (40.7190938,-73.9461082) | BROOKLYN QUEENS EXPRESSWAY RAMP | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-73.94611 40.71909) |
1 | 1/1/23 | 8:04 | QUEENS | 11430.0 | 40.659508 | -73.773687 | (40.6595077,-73.7736867) | NASSAU EXPRESSWAY | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-73.77369 40.65951) |
2 | 1/1/23 | 18:05 | MANHATTAN | 10011.0 | 40.742454 | -74.008686 | (40.7424543,-74.008686) | 10 AVENUE | 11 AVENUE | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-74.00869 40.74245) |
3 | 1/1/23 | 23:45 | QUEENS | 11103.0 | 40.769737 | -73.912440 | (40.769737, -73.91244) | ASTORIA BOULEVARD | 37 STREET | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-73.91244 40.76974) |
4 | 1/1/23 | 4:50 | BRONX | 10462.0 | 40.830555 | -73.850720 | (40.830555, -73.85072) | CASTLE HILL AVENUE | EAST 177 STREET | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | POINT (-73.85072 40.83055) |
5 rows × 41 columns
6.1.4 Plotting
We can easily plot our data now that has been transformed to a geometric data frame.
# Basic Plot
jan23_gdf.plot()# Color the plot by borough
= 'BOROUGH',legend=True)
jan23_gdf.plot(column
# Color the plot by number persons injuried
= 'NUMBER OF PERSONS INJURED',legend=True, \
jan23_gdf.plot(column = "OrRd")
cmap
# Plotting missing information
='BOROUGH', missing_kwds={'color': 'lightgrey'}) jan23_gdf.plot(column
<AxesSubplot: >
6.1.5 Interactive Maps
We can also easily create an interactive plot, using the .explore()
method.
# interactive map of just the latitude and longitude points
jan23_gdf.explore()