This presentation was prepared by Joseph Landolphi.
It explains what the Grammar of Graphics is, why it matters in modern data science, and how it is applied in sports analytics, using baseball as the example.
7.2.1 Introduction
Data visualization is a fundamental part of modern data science. Analysts must interpret large datasets and communicate insights clearly. Instead of thinking about visualization as simply choosing a chart type, the Grammar of Graphics provides a structured framework for building visualizations from components.
7.2.2 What is the Grammar of Graphics?
The Grammar of Graphics was introduced by Leland Wilkinson and describes visualization as a layered system. Rather than asking “what chart should I use?”, analysts ask:
What data am I using?
What variables should be mapped to visual properties?
What geometric shapes should represent observations?
What scales and coordinate systems improve interpretation?
This approach makes visualization systematic and reproducible.
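One way to see how these questions fit together is to write the answers down as a small specification and draw the plot from it. The sketch below is only an illustration: the `spec` dictionary and the `render` function are hypothetical names invented here, not part of any library, and the data is synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic batted-ball data (illustrative values, not real measurements).
rng = np.random.default_rng(0)
balls = pd.DataFrame({
    "exit_velocity": rng.normal(90, 5, 50),
    "launch_angle": rng.normal(15, 10, 50),
})

# Hypothetical plot specification: one entry per question in the list above.
spec = {
    "data": balls,                                            # what data?
    "mapping": {"x": "exit_velocity", "y": "launch_angle"},   # variables -> visual properties
    "geom": "point",                                          # geometric shape
    "scales": {"x": "linear", "y": "linear"},                 # axis scales
}


def render(spec):
    """Draw a plot from the hypothetical spec dictionary."""
    df = spec["data"]
    x = df[spec["mapping"]["x"]].to_numpy()
    y = df[spec["mapping"]["y"]].to_numpy()
    fig, ax = plt.subplots()
    if spec["geom"] == "point":
        ax.scatter(x, y)
    elif spec["geom"] == "line":
        order = np.argsort(x)  # draw the line from left to right
        ax.plot(x[order], y[order])
    else:
        raise ValueError(f"unsupported geom: {spec['geom']}")
    ax.set_xscale(spec["scales"]["x"])
    ax.set_yscale(spec["scales"]["y"])
    ax.set_xlabel(spec["mapping"]["x"])
    ax.set_ylabel(spec["mapping"]["y"])
    return fig, ax


fig, ax = render(spec)
```

Changing `"geom"` to `"line"` or swapping the mapped columns changes the plot without touching the drawing code, which is the sense in which the approach is systematic. In a script, call `plt.show()` to display the figure.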
7.2.3 Why the Grammar of Graphics Matters in Modern Data Science
Key benefits include:
Structured thinking about visualization
Reproducibility
Ability to layer information
Scalability for large datasets
Integration into dashboards and automated workflows
Many modern visualization libraries are influenced by this framework.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Simulated batted-ball data; the seed makes the example reproducible.
np.random.seed(42)

data = pd.DataFrame({
    "exit_velocity": np.random.normal(90, 5, 300),
    "launch_angle": np.random.normal(15, 10, 300),
    "hit_distance": np.random.normal(380, 30, 300),
})
data.head()
   exit_velocity  launch_angle  hit_distance
0      92.483571      6.710050    402.709658
1      89.308678      9.398190    352.335040
2      93.238443     22.472936    406.088178
3      97.615149     21.103703    420.669136
4      88.829233     14.790984    392.403047
7.2.4 Component 1: Data
Every visualization begins with a dataset.
Modern baseball tracking systems collect variables such as:
Exit velocity
Launch angle
Hit distance
Pitch location
Player identity
These variables form the foundation of visual analysis.
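As a rough sketch of what such a dataset can look like in code, the table below stores one row per batted ball and one column per variable. All values are randomly generated, the player labels are placeholders, and the units in the comments are assumptions made for this illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 6  # a handful of rows is enough to show the layout

# One row per batted ball, one column per variable (synthetic values).
batted_balls = pd.DataFrame({
    "player": pd.Categorical(rng.choice(["Player A", "Player B"], size=n)),
    "exit_velocity": rng.normal(90, 5, n),    # assumed unit: mph
    "launch_angle": rng.normal(15, 10, n),    # assumed unit: degrees
    "hit_distance": rng.normal(380, 30, n),   # assumed unit: feet
})

print(batted_balls.dtypes)
```

Keeping categorical variables such as player identity in their own column makes them available later for mapping to color or for grouping.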
7.2.5 Component 2: Aesthetic Mappings
Aesthetic mappings connect data variables to visual properties such as:
Position
Color
Size
In this example:
Exit velocity is mapped to the x-axis
Launch angle is mapped to the y-axis
Each point represents a batted ball.
plt.figure()
plt.scatter(
    data["exit_velocity"],
    data["launch_angle"],
    c=data["hit_distance"],
)
plt.xlabel("Exit Velocity")
plt.ylabel("Launch Angle")
plt.title("Contact Profile Colored by Hit Distance")
plt.colorbar(label="Hit Distance (ft)")
plt.show()
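Size is the third visual property listed in this section. The sketch below maps hit distance to marker size as well as color, using the `s` argument of `scatter`. The data is synthetic, and the marker-area range of 10 to 120 is an arbitrary choice made for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic batted-ball data (illustrative values only).
rng = np.random.default_rng(2)
balls = pd.DataFrame({
    "exit_velocity": rng.normal(90, 5, 100),
    "launch_angle": rng.normal(15, 10, 100),
    "hit_distance": rng.normal(380, 30, 100),
})

# Rescale hit distance to marker areas between 10 and 120 (points squared).
d = balls["hit_distance"]
sizes = 10 + 110 * (d - d.min()) / (d.max() - d.min())

fig, ax = plt.subplots()
points = ax.scatter(
    balls["exit_velocity"],   # position: x
    balls["launch_angle"],    # position: y
    c=d,                      # color
    s=sizes,                  # size
    alpha=0.7,
)
ax.set_xlabel("Exit Velocity")
ax.set_ylabel("Launch Angle")
fig.colorbar(points, ax=ax, label="Hit Distance (ft)")
```

Encoding the same variable with both color and size is redundant, but it shows that each aesthetic is an independent channel. In practice, size could carry a fourth variable instead.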
7.2.6 Component 3: Scales
Scales determine how data values are translated into visual values.
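In the scatter plot above, hit distance is represented using a color gradient: a color scale translates each distance value into a position on the gradient. This allows a third variable to be displayed alongside the two position variables, increasing information density.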
plt.figure()
plt.scatter(data["launch_angle"], data["hit_distance"])
plt.xlabel("Launch Angle (degrees)")
plt.ylabel("Hit Distance (feet)")
plt.title("Relationship Between Launch Angle and Hit Distance")
plt.show()
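A scale can be thought of as a function from data values to visual values. The sketch below writes a linear scale by hand and compares it with matplotlib's `Normalize`, which performs the same kind of linear mapping to the interval from 0 to 1 before a colormap is applied. The `linear_scale` helper and the 300 to 460 foot domain are assumptions chosen for this illustration.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize


def linear_scale(value, domain, out_range=(0.0, 1.0)):
    """Linearly map a value from the data domain to the output range."""
    d_min, d_max = domain
    r_min, r_max = out_range
    fraction = (value - d_min) / (d_max - d_min)
    return r_min + fraction * (r_max - r_min)


# Hypothetical domain for hit distance, in feet.
distance_domain = (300.0, 460.0)

# Midpoint of the domain maps to the midpoint of each output range.
fraction = linear_scale(380.0, distance_domain)             # 0.5
pixel_x = linear_scale(380.0, distance_domain, (0, 800))    # 400.0

# The same idea with matplotlib: normalize, then look up a color.
norm = Normalize(vmin=300.0, vmax=460.0)
rgba = plt.get_cmap("viridis")(float(norm(380.0)))
print(fraction, pixel_x, rgba)
```

Position scales and color scales differ only in their output range: pixels along an axis in one case, positions along a colormap in the other.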
7.2.7 Component 4: More Plot Objects
Geometric objects define the shapes used to represent data.
Examples include:
Points for scatter plots
Bars for comparisons
Lines for trends
This example adds a trend line to the raw observations.
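x = data["exit_velocity"]
y = data["launch_angle"]

plt.figure()
plt.scatter(x, y)  # layer 1: raw observations

# Layer 2: least-squares linear trend, drawn over sorted x values.
z = np.polyfit(x, y, 1)
p = np.poly1d(z)
x_sorted = np.sort(x)
plt.plot(x_sorted, p(x_sorted), color="red")

plt.xlabel("Exit Velocity")
plt.ylabel("Launch Angle")
plt.title("Layered Visualization with Trend Line")
plt.show()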
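Bars, the second geometric object in the list above, suit comparisons between groups. As a minimal sketch, the code below bins launch angle and compares mean hit distance per bin. The data is synthetic and the bin edges are arbitrary choices for this illustration; because the two variables are generated independently, the bars come out at similar heights.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic batted-ball data (illustrative values only).
rng = np.random.default_rng(3)
balls = pd.DataFrame({
    "launch_angle": rng.normal(15, 10, 300),
    "hit_distance": rng.normal(380, 30, 300),
})

# Group launch angles into bins (edges chosen arbitrarily for this sketch).
bin_edges = [-30, 0, 10, 20, 30, 60]
balls["angle_bin"] = pd.cut(balls["launch_angle"], bins=bin_edges)

# One summary value per bin; empty bins are dropped.
mean_distance = balls.groupby("angle_bin", observed=True)["hit_distance"].mean()
labels = [str(interval) for interval in mean_distance.index]

fig, ax = plt.subplots()
ax.bar(labels, mean_distance.to_numpy())  # bar geometry: one bar per bin
ax.set_xlabel("Launch Angle Bin (degrees)")
ax.set_ylabel("Mean Hit Distance (feet)")
```

The data and the variables are the same kind as in the scatter plots; only the geometric object and the summary step changed.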
7.2.8 Component 5: Layering
One of the most powerful ideas in the Grammar of Graphics is layering.
Visualizations can combine:
Raw observations
Statistical summaries
This allows deeper insight into relationships between variables.
import plotly.express as px

fig = px.scatter_3d(
    data,
    x="exit_velocity",
    y="launch_angle",
    z="hit_distance",
    title="Interactive 3D Scatter Plot",
    opacity=0.8,
)
fig.show()
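The combination of raw observations and statistical summaries described in this section can also be sketched in two dimensions. The example below draws individual batted balls as one layer and bin means as a second layer on the same axes. The data is synthetic, and the linear relationship between exit velocity and hit distance is an assumption built in purely so the summary layer has something to show.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
exit_velocity = rng.normal(90, 5, 300)
# Illustrative assumption: distance rises with exit velocity, plus noise.
hit_distance = 380 + 4 * (exit_velocity - 90) + rng.normal(0, 20, 300)
balls = pd.DataFrame({"exit_velocity": exit_velocity, "hit_distance": hit_distance})

fig, ax = plt.subplots()

# Layer 1: raw observations, faded so the summary stays visible.
ax.scatter(balls["exit_velocity"], balls["hit_distance"], alpha=0.3, label="Batted balls")

# Layer 2: statistical summary, the mean of each of 8 equal-width bins.
balls["ev_bin"] = pd.cut(balls["exit_velocity"], bins=8)
summary = balls.groupby("ev_bin", observed=True).agg(
    ev=("exit_velocity", "mean"),
    dist=("hit_distance", "mean"),
)
ax.plot(summary["ev"], summary["dist"], marker="o", color="black", label="Bin mean")

ax.set_xlabel("Exit Velocity")
ax.set_ylabel("Hit Distance (ft)")
ax.legend()
```

Each layer uses the same data and the same position mappings, so the two can be read against each other directly.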