nycOpenData: a unified R interface for NYC Open Data APIs
I am pleased to announce the release of nycOpenData, an R package providing convenient, tidy access to dozens of datasets from New York City’s Open Data platform.
The package is designed as part of an open, reproducible scientific research effort, with the goal of reducing friction between public data and statistical analysis, particularly for teaching, exploratory research, and applied civic work.
Why nycOpenData?
NYC Open Data hosts hundreds of datasets covering topics such as public safety, housing, transportation, education, health, and municipal services. Although these datasets are publicly accessible through the Socrata API, working with them directly often requires:
- knowing the dataset identifiers,
- manually building API requests,
- managing pagination, timeouts, and rate limits,
- and performing repetitive data cleaning steps.
These obstacles can slow down exploratory analysis and make public data less accessible to students, researchers, and practitioners who work primarily in R.
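To make the manual steps listed above concrete, here is a minimal sketch of building one request URL by hand in base R. The helper `build_soda_url()` is hypothetical and is not part of nycOpenData. The dataset identifier `erm2-nwe9` is assumed to be the 311 Service Requests dataset and should be checked against the portal. The `$limit`, `$offset`, `$where`, and `$order` parameters are standard Socrata query parameters.

```r
# Hypothetical helper (not part of nycOpenData): assemble a Socrata query URL by hand.
build_soda_url <- function(dataset_id, limit = 1000, offset = 0,
                           where = NULL, order = NULL) {
  base <- sprintf("https://data.cityofnewyork.us/resource/%s.json", dataset_id)

  # c() silently drops NULL entries, so unused clauses are omitted.
  params <- c(
    "$limit"  = sprintf("%d", as.integer(limit)),
    "$offset" = sprintf("%d", as.integer(offset)),
    "$where"  = where,
    "$order"  = order
  )

  # Percent-encode each value so spaces and quotes survive the request.
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste0(names(encoded), "=", encoded, collapse = "&"))
}

# Assumption: "erm2-nwe9" identifies the 311 Service Requests dataset.
url <- build_soda_url(
  "erm2-nwe9",
  limit = 1000,
  where = "borough = 'BROOKLYN'",
  order = "created_date DESC"
)
print(url)
```

Fetching more than one page would mean repeating this with increasing `offset` values, then parsing and combining the JSON responses yourself.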
nycOpenData was designed to remove these barriers by providing a consistent, user-friendly interface that returns clean tibbles, ready for analysis, without requiring users to interact directly with the API.
What is the package for?
The package provides a growing collection of wrapper functions, each corresponding to a specific NYC Open Data dataset or dataset family. All functions follow a shared design pattern and support:
- row limits,
- optional filtering via named lists,
- sorting,
- and graceful handling of API errors and timeouts.
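As an illustration of the named-list filtering idea, the sketch below shows one way such a list could be translated into a query condition. It is not the package's actual implementation: `filters_to_where()` is a hypothetical helper, and it assumes equality filters with one value per column.

```r
# Hypothetical helper (not part of nycOpenData): turn a named list of
# equality filters into a SoQL-style WHERE clause.
filters_to_where <- function(filters) {
  if (length(filters) == 0) return(NULL)
  stopifnot(!is.null(names(filters)), all(nzchar(names(filters))))

  clauses <- vapply(names(filters), function(col) {
    # Assumes a single value per column; embedded single quotes are doubled.
    value <- gsub("'", "''", as.character(filters[[col]]), fixed = TRUE)
    sprintf("%s = '%s'", col, value)
  }, character(1))

  # Every filter must hold at once, so the clauses are joined with AND.
  paste(clauses, collapse = " AND ")
}

where <- filters_to_where(list(agency = "NYPD", borough = "BROOKLYN"))
print(where)
```

Joining the clauses with AND is what lets a single call narrow results on several columns at the same time.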
Examples of currently supported domains include:
- 311 service requests
- Transport and rental vehicles
- Motor vehicle collisions
- Department of Buildings Permits and Complaints
- Education and Schools Reports
- Juvenile justice and public safety
- Street trees and environmental data
- Authorized events (historical)
A typical call looks like this:
library(nycOpenData)

nyc_311(
  limit = 1000,
  filters = list(borough = "BROOKLYN")
)
## # A tibble: 1,000 × 40
##    unique_key created_date          agency agency_name complaint_type descriptor
##    <chr>      <chr>                 <chr>  <chr>       <chr>          <chr>
##  1 67613985   2026-01-26T02:06:05.… NYPD   New York C… Noise - Resid… Banging/P…
##  2 67609553   2026-01-26T02:02:09.… NYPD   New York C… Noise - Resid… Banging/P…
##  3 67610990   2026-01-26T01:58:58.… NYPD   New York C… Illegal Parki… Blocked H…
##  4 67615428   2026-01-26T01:56:49.… NYPD   New York C… Noise - Resid… Banging/P…
##  5 67609568   2026-01-26T01:48:16.… NYPD   New York C… Noise - Resid… Loud Musi…
##  6 67612476   2026-01-26T01:47:10.… NYPD   New York C… Noise - Resid… Loud Musi…
##  7 67614152   2026-01-26T01:46:26.… DSNY   Department… Snow or Ice    Snow Trac…
##  8 67614054   2026-01-26T01:44:50.… DSNY   Department… Dirty Conditi… Trash
##  9 67606570   2026-01-26T01:41:32.… NYPD   New York C… Noise - Resid… Banging/P…
## 10 67610091   2026-01-26T01:35:51.… NYPD   New York C… Noise - Vehic… Car/Truck…
## # ℹ 990 more rows
## # ℹ 34 more variables: location_type <chr>, incident_zip <chr>,
## #   incident_address <chr>, street_name <chr>, cross_street_1 <chr>,
## #   cross_street_2 <chr>, intersection_street_1 <chr>,
## #   intersection_street_2 <chr>, address_type <chr>, city <chr>,
## #   landmark <chr>, status <chr>, community_board <chr>,
## #   council_district <chr>, police_precinct <chr>, bbl <chr>, borough <chr>, …
The result is a tibble of the 1,000 most recent 311 service requests from Brooklyn, making it immediately compatible with the tidyverse ecosystem for visualization, modeling, and reporting.
Mini-analysis
One of the strongest qualities of this function is its ability to filter on multiple columns at once. Let’s put everything together and get a dataset of the 1,000 most recent 311 requests handled by the NYPD in Brooklyn.
# Creating the dataset
brooklyn_nypd <- nyc_311(limit = 1000, filters = list(agency = "NYPD", borough = "BROOKLYN"))

# Calling head of our new dataset
head(brooklyn_nypd)
## # A tibble: 6 × 39
##   unique_key created_date           agency agency_name complaint_type descriptor
##   <chr>      <chr>                  <chr>  <chr>       <chr>          <chr>
## 1 67613985   2026-01-26T02:06:05.0… NYPD   New York C… Noise - Resid… Banging/P…
## 2 67609553   2026-01-26T02:02:09.0… NYPD   New York C… Noise - Resid… Banging/P…
## 3 67610990   2026-01-26T01:58:58.0… NYPD   New York C… Illegal Parki… Blocked H…
## 4 67615428   2026-01-26T01:56:49.0… NYPD   New York C… Noise - Resid… Banging/P…
## 5 67609568   2026-01-26T01:48:16.0… NYPD   New York C… Noise - Resid… Loud Musi…
## 6 67612476   2026-01-26T01:47:10.0… NYPD   New York C… Noise - Resid… Loud Musi…
## # ℹ 33 more variables: location_type <chr>, incident_zip <chr>,
## #   incident_address <chr>, street_name <chr>, cross_street_1 <chr>,
## #   cross_street_2 <chr>, intersection_street_1 <chr>,
## #   intersection_street_2 <chr>, address_type <chr>, city <chr>,
## #   landmark <chr>, status <chr>, community_board <chr>,
## #   council_district <chr>, police_precinct <chr>, bbl <chr>, borough <chr>,
## #   x_coordinate_state_plane <chr>, y_coordinate_state_plane <chr>, …

# Quick check to make sure our filtering worked
nrow(brooklyn_nypd)
## [1] 1000
unique(brooklyn_nypd$agency)
## [1] "NYPD"
unique(brooklyn_nypd$borough)
## [1] "BROOKLYN"
We have successfully created our dataset containing the 1,000 most recent requests handled by the NYPD in the borough of Brooklyn.
Now that we’ve managed to extract the data and have it in R, let’s see what Brooklynites are complaining to the NYPD about.
To do this, we will create a bar chart of the complaint types.
# Visualizing the distribution, ordered by frequency
library(ggplot2)

ggplot(brooklyn_nypd, aes(y = reorder(complaint_type, complaint_type, length))) +
  geom_bar(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Most Recent NYPD 311 Complaints (Brooklyn)",
    subtitle = "Top 1,000 service requests",
    x = "Number of Complaints",
    y = "Type of Complaint"
  )
Figure 1: Bar graph showing the frequency of NYPD-related 311 complaint types in Brooklyn, based on the 1,000 most recent service requests.
This graph shows us not only which complaints were filed, but also how many of each type were filed.
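The same counts can also be read off as numbers rather than bar lengths. The sketch below uses a small made-up vector as a stand-in for `brooklyn_nypd$complaint_type`, so the labels and counts are illustrative only and are not results from the real data.

```r
# Made-up stand-in for brooklyn_nypd$complaint_type (illustrative values only).
complaint_type <- c("Noise", "Parking", "Noise", "Other", "Noise", "Parking")

# Count each complaint type and sort from most to least frequent.
counts <- sort(table(complaint_type), decreasing = TRUE)

# Express each count as a percentage of all requests.
shares <- round(100 * prop.table(counts), 1)

print(counts)
print(shares)
```

With the real tibble, replacing the toy vector with `brooklyn_nypd$complaint_type` should give the table that corresponds to the chart.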
Designed for repeatable workflows
A fundamental design principle of nycOpenData is reproducibility. Rather than relying on downloaded static CSV files that may change over time or be accidentally modified, analyses can explicitly document:
- what dataset was used,
- how many rows were requested,
- what filters were applied,
- and when the data was accessed.
This makes the package particularly useful for:
- reproducible research projects,
- classroom assignments,
- data journalism,
- and exploratory civic analysis.
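The provenance details listed earlier (dataset, row count, filters, and access time) could be captured next to the data with a few lines of base R. This is only a sketch: `record_query()` is a hypothetical helper and not a function exported by nycOpenData.

```r
# Hypothetical helper (not part of nycOpenData): describe a query so it can
# be stored alongside the results or quoted in a report.
record_query <- function(dataset, limit, filters = list()) {
  list(
    dataset     = dataset,
    limit       = limit,
    filters     = filters,
    # UTC timestamp of when the data was requested.
    accessed_at = format(Sys.time(), "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
  )
}

query_info <- record_query(
  dataset = "nyc_311",
  limit   = 1000,
  filters = list(agency = "NYPD", borough = "BROOKLYN")
)
str(query_info)

# The record could then be saved with the analysis, for example:
# saveRDS(query_info, "brooklyn_nypd_query.rds")
```

Because the underlying dataset is updated continuously, the access time is what lets a reader understand why a later rerun may return different rows.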
The package is also designed to be API-friendly, with configurable timeouts and protections that help avoid common failure modes when querying large public data sets.
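The package's own safeguards are not detailed here. One common protection when querying a public API is to retry a failed request a limited number of times, waiting longer between attempts. The sketch below is a generic illustration of that idea: `with_retries()` is a hypothetical helper, not the package's mechanism.

```r
# Hypothetical helper (not part of nycOpenData): retry a failing call with
# exponentially increasing waits between attempts.
with_retries <- function(fetch, max_tries = 3, wait_seconds = 1) {
  for (attempt in seq_len(max_tries)) {
    # Capture an error as a value instead of letting it stop the loop.
    result <- tryCatch(fetch(), error = function(e) e)
    if (!inherits(result, "error")) return(result)

    # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
    if (attempt < max_tries) Sys.sleep(wait_seconds * 2^(attempt - 1))
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(result),
       call. = FALSE)
}

# Usage sketch (requires network access, so it is left commented out):
# brooklyn <- with_retries(function() {
#   nycOpenData::nyc_311(limit = 1000, filters = list(borough = "BROOKLYN"))
# })
```

Capping the number of attempts keeps a script from hammering the API when the service is down, which is in the spirit of the API-friendly design described above.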