nycOpenData: a unified R interface for NYC Open Data APIs

I am pleased to announce the release of nycOpenData, an R package providing convenient, tidy access to dozens of datasets from New York City’s Open Data platform.

The package is designed as part of an open, reproducible scientific research effort, with the goal of reducing friction between public data and statistical analysis, particularly for teaching, exploratory research, and applied civic work.

Why nycOpenData?

NYC Open Data hosts hundreds of datasets covering topics such as public safety, housing, transportation, education, health, and municipal services. Although these datasets are publicly accessible through the Socrata API, working with them directly often requires users to:

  • know the dataset identifiers,
  • build API requests manually,
  • manage paging, timeouts, and rate limits,
  • and perform repetitive data cleaning steps.
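
To make the manual steps above concrete, here is a minimal base R sketch of building a single Socrata request URL by hand. The helper name `build_soda_url` is hypothetical and is not part of nycOpenData. The dataset identifier `erm2-nwe9` (311 Service Requests) and the `$limit`, `$offset`, and `$order` query parameters are assumptions based on the public Socrata API and should be checked against the NYC Open Data portal.

```r
# Hypothetical helper (not part of nycOpenData): builds a Socrata request URL.
build_soda_url <- function(dataset_id, limit = 1000, offset = 0,
                           filters = list(), order = NULL) {
  base <- sprintf("https://data.cityofnewyork.us/resource/%s.json", dataset_id)

  # Paging and sorting parameters first, then simple column = value filters.
  params <- c(
    list(`$limit` = sprintf("%d", as.integer(limit)),
         `$offset` = sprintf("%d", as.integer(offset))),
    if (!is.null(order)) list(`$order` = order),
    filters
  )

  # Percent-encode every name and value so spaces and "$" are URL-safe.
  keys <- vapply(names(params), utils::URLencode, character(1), reserved = TRUE)
  values <- vapply(
    params,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )

  paste0(base, "?", paste0(keys, "=", values, collapse = "&"))
}

# Assumed dataset identifier for 311 service requests.
url <- build_soda_url(
  "erm2-nwe9",
  limit = 5,
  filters = list(borough = "BROOKLYN"),
  order = "created_date DESC"
)
print(url)
```

Even this small sketch leaves out paging across multiple requests, timeouts, and cleaning the parsed response.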

These obstacles can slow down exploratory analysis and make public data less accessible to students, researchers, and practitioners who work primarily in R.

nycOpenData was designed to remove these barriers by providing a consistent, user-friendly interface that returns clean tibbles, ready for analysis, without requiring users to interact directly with the API.

What is the package for?

The package provides a growing collection of wrapper functions, each corresponding to a specific NYC Open Data dataset or dataset family. All functions follow a shared design pattern and support:

  • row limits,
  • optional filtering via named lists,
  • sorting,
  • and graceful handling of API errors and timeouts.
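
As an illustration of what graceful error handling can look like in R, the sketch below wraps an arbitrary fetching function in `tryCatch()` so that a failure produces a message and a `NULL` result instead of stopping the script. The names `safe_fetch` and `failing_fetch` are hypothetical; this is one common pattern, not a description of how nycOpenData is implemented internally.

```r
# Hypothetical wrapper: returns NULL with a message instead of stopping on error.
safe_fetch <- function(fetch_fn, ...) {
  tryCatch(
    fetch_fn(...),
    error = function(e) {
      message("Request failed: ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for a network call that times out (no real request is made).
failing_fetch <- function(limit) stop("timeout after 30 seconds")

result <- safe_fetch(failing_fetch, limit = 10)
if (is.null(result)) message("No data returned; continuing without it.")
```

Returning `NULL` lets a longer script or a classroom notebook keep running when a single request fails.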

Examples of currently supported domains include:

  • 311 service requests
  • Transport and rental vehicles
  • Motor vehicle collisions
  • Department of Buildings Permits and Complaints
  • Education and Schools Reports
  • Juvenile justice and public safety
  • Street trees and environmental data
  • Authorized events (historical)

A typical call looks like this:

library(nycOpenData)

nyc_311(
  limit = 1000,
  filters = list(borough = "BROOKLYN")
)
## # A tibble: 1,000 × 40
##    unique_key created_date          agency agency_name complaint_type descriptor
##    <chr>      <chr>                 <chr>  <chr>       <chr>          <chr>     
##  1 67613985   2026-01-26T02:06:05.… NYPD   New York C… Noise - Resid… Banging/P…
##  2 67609553   2026-01-26T02:02:09.… NYPD   New York C… Noise - Resid… Banging/P…
##  3 67610990   2026-01-26T01:58:58.… NYPD   New York C… Illegal Parki… Blocked H…
##  4 67615428   2026-01-26T01:56:49.… NYPD   New York C… Noise - Resid… Banging/P…
##  5 67609568   2026-01-26T01:48:16.… NYPD   New York C… Noise - Resid… Loud Musi…
##  6 67612476   2026-01-26T01:47:10.… NYPD   New York C… Noise - Resid… Loud Musi…
##  7 67614152   2026-01-26T01:46:26.… DSNY   Department… Snow or Ice    Snow Trac…
##  8 67614054   2026-01-26T01:44:50.… DSNY   Department… Dirty Conditi… Trash     
##  9 67606570   2026-01-26T01:41:32.… NYPD   New York C… Noise - Resid… Banging/P…
## 10 67610091   2026-01-26T01:35:51.… NYPD   New York C… Noise - Vehic… Car/Truck…
## # ℹ 990 more rows
## # ℹ 34 more variables: location_type <chr>, incident_zip <chr>,
## #   incident_address <chr>, street_name <chr>, cross_street_1 <chr>,
## #   cross_street_2 <chr>, intersection_street_1 <chr>,
## #   intersection_street_2 <chr>, address_type <chr>, city <chr>,
## #   landmark <chr>, status <chr>, community_board <chr>,
## #   council_district <chr>, police_precinct <chr>, bbl <chr>, borough <chr>, …

The result is a tibble of the 1,000 most recent 311 service requests from Brooklyn, making it immediately compatible with the tidyverse ecosystem for visualization, modeling, and reporting.
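
Because the printed output shows `created_date` as a character column, a date-based analysis would first need to parse it. The sketch below uses base R and two made-up timestamps; the ISO 8601 layout with fractional seconds and the `America/New_York` time zone are assumptions, not something the package documentation confirms here.

```r
# Illustrative values only; the layout is assumed to match the API's timestamps.
created_date <- c("2026-01-01T12:30:00.000", "2026-01-01T12:45:30.000")

# %OS reads seconds including the fractional part.
parsed <- as.POSIXct(
  created_date,
  format = "%Y-%m-%dT%H:%M:%OS",
  tz = "America/New_York"  # assumed time zone
)

# Elapsed time between the two illustrative requests, in minutes.
elapsed <- difftime(parsed[2], parsed[1], units = "mins")
print(elapsed)
```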

Mini-analysis

One of this function’s strengths is its ability to filter on multiple columns at once. Let’s put everything together and retrieve the 1,000 most recent 311 requests assigned to the NYPD in Brooklyn.

# Creating the dataset
brooklyn_nypd <- nyc_311(limit = 1000, filters = list(agency = "NYPD", borough = "BROOKLYN"))

# Calling head of our new dataset
head(brooklyn_nypd)
## # A tibble: 6 × 39
##   unique_key created_date           agency agency_name complaint_type descriptor
##   <chr>      <chr>                  <chr>  <chr>       <chr>          <chr>     
## 1 67613985   2026-01-26T02:06:05.0… NYPD   New York C… Noise - Resid… Banging/P…
## 2 67609553   2026-01-26T02:02:09.0… NYPD   New York C… Noise - Resid… Banging/P…
## 3 67610990   2026-01-26T01:58:58.0… NYPD   New York C… Illegal Parki… Blocked H…
## 4 67615428   2026-01-26T01:56:49.0… NYPD   New York C… Noise - Resid… Banging/P…
## 5 67609568   2026-01-26T01:48:16.0… NYPD   New York C… Noise - Resid… Loud Musi…
## 6 67612476   2026-01-26T01:47:10.0… NYPD   New York C… Noise - Resid… Loud Musi…
## # ℹ 33 more variables: location_type <chr>, incident_zip <chr>,
## #   incident_address <chr>, street_name <chr>, cross_street_1 <chr>,
## #   cross_street_2 <chr>, intersection_street_1 <chr>,
## #   intersection_street_2 <chr>, address_type <chr>, city <chr>,
## #   landmark <chr>, status <chr>, community_board <chr>,
## #   council_district <chr>, police_precinct <chr>, bbl <chr>, borough <chr>,
## #   x_coordinate_state_plane <chr>, y_coordinate_state_plane <chr>, …
# Quick check to make sure our filtering worked
nrow(brooklyn_nypd)
## [1] 1000
unique(brooklyn_nypd$agency)
## [1] "NYPD"
unique(brooklyn_nypd$borough)
## [1] "BROOKLYN"

We now have a dataset containing the 1,000 most recent 311 requests assigned to the NYPD in the borough of Brooklyn.

Now that we’ve managed to extract the data and have it in R, let’s see what Brooklynites are complaining to the NYPD about.

To do this, we will create a bar chart of the complaint types.

# Visualizing the distribution, ordered by frequency
library(ggplot2)

ggplot(brooklyn_nypd, aes(y = reorder(complaint_type, complaint_type, length))) +
  geom_bar(fill = "steelblue") +
  theme_minimal() +
  labs(
    title = "Most Recent NYPD 311 Complaints (Brooklyn)",
    subtitle = "Top 1,000 service requests",
    x = "Number of Complaints",
    y = "Type of Complaint"
  )

Figure 1: Bar chart showing the frequency of NYPD-related 311 complaint types in Brooklyn, based on the 1,000 most recent service requests.

This chart shows not only which complaints were filed, but also how many of each type were filed.
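
The same counts can be read off as a table rather than a chart. The sketch below uses a small made-up vector of complaint types so that it runs on its own; with the real data, `brooklyn_nypd$complaint_type` would take its place.

```r
# Illustrative complaint types, not real query results.
complaint_type <- c(
  "Noise - Residential", "Illegal Parking", "Noise - Residential",
  "Blocked Driveway", "Noise - Residential", "Illegal Parking"
)

# Count each type and order from most to least frequent.
counts <- sort(table(complaint_type), decreasing = TRUE)

# Tidy two-column summary of the counts.
summary_df <- data.frame(
  complaint_type = names(counts),
  n = as.integer(counts)
)
print(summary_df)
```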

Designed for repeatable workflows

A fundamental design principle of nycOpenData is reproducibility. Rather than relying on static CSV files that may change over time or be accidentally modified, analyses can explicitly document:

  • what dataset was used,
  • how many rows were requested,
  • what filters were applied,
  • and when the data was accessed.
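
One way to record the four items above is to store them next to the data. The sketch below attaches them as an attribute on the result; `with_provenance` is a hypothetical helper, not a function from nycOpenData, and the small data frame is a placeholder standing in for a real query result.

```r
# Hypothetical helper: attaches query metadata to a data frame as an attribute.
with_provenance <- function(data, dataset, limit, filters = list()) {
  attr(data, "provenance") <- list(
    dataset = dataset,
    limit = limit,
    filters = filters,
    accessed_at = format(Sys.time(), "%Y-%m-%d %H:%M:%S %Z")
  )
  data
}

# Placeholder data frame standing in for a query result.
result <- data.frame(agency = "NYPD", borough = "BROOKLYN")

result <- with_provenance(
  result,
  dataset = "nyc_311",
  limit = 1000,
  filters = list(agency = "NYPD", borough = "BROOKLYN")
)

str(attr(result, "provenance"))
```

Attributes can be dropped by some transformations, so writing the same list to a small file alongside the analysis may be more robust.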

This makes the package particularly useful for:

  • reproducible research projects,
  • classroom assignments,
  • data journalism,
  • and exploratory civic analysis.

The package is also designed to be API-friendly, with configurable timeouts and protections that help avoid common failure modes when querying large public data sets.
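
A common way to stay within API limits on large datasets is to split one big request into fixed-size pages. The sketch below only computes the offset and limit for each page and makes no network calls; `page_plan` is a hypothetical helper, and whether nycOpenData pages requests this way internally is not stated here.

```r
# Hypothetical helper: splits a total row count into page-sized requests.
# Assumes total_rows is at least 1.
page_plan <- function(total_rows, page_size = 1000) {
  offsets <- seq(0, total_rows - 1, by = page_size)
  data.frame(
    offset = offsets,
    # The last page may be smaller than page_size.
    limit = pmin(page_size, total_rows - offsets)
  )
}

plan <- page_plan(2500, page_size = 1000)
print(plan)
```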


