Introduction to APIs in R

Introduction to APIs in R workshop by JHU Data Services - October 22nd, 2025

Author

Peter Lawson, PhD

Published

October 22, 2025

Introduction

This document has the completed code for the Johns Hopkins Data Services October 22nd session Introduction to APIs in R.

Load Libraries

## Library for working with RESTful APIs 
library(httr2)

## Libraries for data processing
library(purrr)
library(dplyr)
library(tibble)

## Library for viewing dataframe as HTML table
library(DT)

Introduction to Requests

Use the PokeAPI to extract basic information about Pokémon using a series of API requests.

Set the base URL for the API and our endpoint (pokemon)

base_url <- 'https://pokeapi.co/api/v2/'
endpoint <- 'pokemon'

Use req_url_path_append() to construct the endpoint path

request <- request(base_url) |> 
    req_url_path_append(endpoint)

We can see the GET request formed by our request() call:

request
<httr2_request>
GET https://pokeapi.co/api/v2/pokemon
Body: empty
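As a quick check, the URL that httr2 has assembled can be read back from the request object before anything is sent. The sketch below assumes the request object stores its URL in a field called url (true of current httr2 releases, but an internal detail rather than a documented contract). The names list_request and bulbasaur_request are made up for this example.

```r
library(httr2)

base_url <- 'https://pokeapi.co/api/v2/'

## Build (but do not perform) a request for the pokemon endpoint
list_request <- request(base_url) |>
    req_url_path_append('pokemon')

## Several path segments can be appended in a single call
bulbasaur_request <- request(base_url) |>
    req_url_path_append('pokemon', 'bulbasaur')

## Assumption: the assembled URL is stored in the `url` field
list_request$url
bulbasaur_request$url
```

Nothing is sent to the API here; the requests are only constructed and inspected.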

Generate a response by performing our request using req_perform(request)

response <- req_perform(request)

Let’s take a look at the contents of our raw response body:

response$body
   [1] 7b 22 63 6f 75 6e 74 22 3a 31 33 32 38 2c 22 6e 65 78 74 22 3a 22 68 74
  [25] 74 70 73 3a 2f 2f 70 6f 6b 65 61 70 69 2e 63 6f 2f 61 70 69 2f 76 32 2f
  [49] 70 6f 6b 65 6d 6f 6e 3f 6f 66 66 73 65 74 3d 32 30 26 6c 69 6d 69 74 3d
  [73] 32 30 22 2c 22 70 72 65 76 69 6f 75 73 22 3a 6e 75 6c 6c 2c 22 72 65 73
  [97] 75 6c 74 73 22 3a 5b 7b 22 6e 61 6d 65 22 3a 22 62 75 6c 62 61 73 61 75
 [121] 72 22 2c 22 75 72 6c 22 3a 22 68 74 74 70 73 3a 2f 2f 70 6f 6b 65 61 70
 [145] 69 2e 63 6f 2f 61 70 69 2f 76 32 2f 70 6f 6b 65 6d 6f 6e 2f 31 2f 22 7d
 [169] 2c 7b 22 6e 61 6d 65 22 3a 22 69 76 79 73 61 75 72 22 2c 22 75 72 6c 22
 [193] 3a 22 68 74 74 70 73 3a 2f 2f 70 6f 6b 65 61 70 69 2e 63 6f 2f 61 70 69
 [217] 2f 76 32 2f 70 6f 6b 65 6d 6f 6e 2f 32 2f 22 7d 2c 7b 22 6e 61 6d 65 22
....

We see that our output is raw bytes: a series of two-character hexadecimal values. This is because we have not yet told httr2 to interpret the body as JSON. resp_body_json() parses the body, and glimpse() makes the resulting list more readable.
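To see that the hexadecimal values really are JSON text, the first few bytes of the output above can be decoded by hand with base R. This is only an illustration; in practice resp_body_string() or resp_body_json() does the decoding. The variable names are made up for the example.

```r
## First 14 bytes of the raw body shown above
first_bytes <- as.raw(c(
  0x7b, 0x22, 0x63, 0x6f, 0x75, 0x6e, 0x74,
  0x22, 0x3a, 0x31, 0x33, 0x32, 0x38, 0x2c
))

## Each byte is one character of the JSON text
decoded <- rawToChar(first_bytes)
decoded
## [1] "{\"count\":1328,"
```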

response |> 
    resp_body_json() |> 
    glimpse()
List of 4
 $ count   : int 1328
 $ next    : chr "https://pokeapi.co/api/v2/pokemon?offset=20&limit=20"
 $ previous: NULL
 $ results :List of 20
  ..$ :List of 2
  .. ..$ name: chr "bulbasaur"
  .. ..$ url : chr "https://pokeapi.co/api/v2/pokemon/1/"
  ..$ :List of 2
  .. ..$ name: chr "ivysaur"
  .. ..$ url : chr "https://pokeapi.co/api/v2/pokemon/2/"
  ..$ :List of 2
  .. ..$ name: chr "venusaur"
  .. ..$ url : chr "https://pokeapi.co/api/v2/pokemon/3/"
  ..$ :List of 2
....

Request a specific pokemon endpoint

  • Define an endpoint for bulbasaur
  • Perform the request
  • Display the response

response <- request(base_url) |>
    req_url_path_append(endpoint, 'bulbasaur') |> 
    req_perform() |> 
    resp_body_json()

response |> glimpse()
List of 20
 $ abilities               :List of 2
  ..$ :List of 3
  .. ..$ ability  :List of 2
  .. ..$ is_hidden: logi FALSE
  .. ..$ slot     : int 1
  ..$ :List of 3
  .. ..$ ability  :List of 2
  .. ..$ is_hidden: logi TRUE
  .. ..$ slot     : int 3
 $ base_experience         : int 64
 $ cries                   :List of 2
  ..$ latest: chr "https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/latest/1.ogg"
  ..$ legacy: chr "https://raw.githubusercontent.com/PokeAPI/cries/main/cries/pokemon/legacy/1.ogg"
 $ forms                   :List of 1
....

Manipulating JSON data

Extract data from a JSON into a tibble (dataframe)

JSON data is variably nested and can be difficult to work with. For example, what if we want to retrieve the stats for bulbasaur and format them like this (the numbers here are placeholders that only show the layout):

Stat Name    Stat
HP           30
Attack       25
Defense      35

We can see that the stats are nested in a hierarchy of lists, and are difficult to retrieve by name:

str(response$stats)
List of 6
 $ :List of 3
  ..$ base_stat: int 45
  ..$ effort   : int 0
  ..$ stat     :List of 2
  .. ..$ name: chr "hp"
  .. ..$ url : chr "https://pokeapi.co/api/v2/stat/1/"
 $ :List of 3
  ..$ base_stat: int 49
  ..$ effort   : int 0
  ..$ stat     :List of 2
  .. ..$ name: chr "attack"
  .. ..$ url : chr "https://pokeapi.co/api/v2/stat/2/"
 $ :List of 3
  ..$ base_stat: int 49
  ..$ effort   : int 0
  ..$ stat     :List of 2
  .. ..$ name: chr "defense"
  .. ..$ url : chr "https://pokeapi.co/api/v2/stat/3/"
 $ :List of 3
  ..$ base_stat: int 65
  ..$ effort   : int 1
  ..$ stat     :List of 2
  .. ..$ name: chr "special-attack"
  .. ..$ url : chr "https://pokeapi.co/api/v2/stat/4/"
 $ :List of 3
  ..$ base_stat: int 65
  ..$ effort   : int 0
  ..$ stat     :List of 2
  .. ..$ name: chr "special-defense"
  .. ..$ url : chr "https://pokeapi.co/api/v2/stat/5/"
 $ :List of 3
  ..$ base_stat: int 45
  ..$ effort   : int 0
  ..$ stat     :List of 2
  .. ..$ name: chr "speed"
  .. ..$ url : chr "https://pokeapi.co/api/v2/stat/6/"

One strategy is to iterate over all six outer lists, each of which corresponds to a single statistic, and then extract the relevant information from the inner lists.

We can do this using the map_df() function from the purrr library. map_df() iterates, or “maps”, over each element of a list, applies a function to it, and returns the combined results as a dataframe.

The call takes the form map_df(my_list, ~ some_function(.x)), where the ~ introduces an anonymous function and .x refers to the list element currently being processed.
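The three spellings below are equivalent ways of writing the function passed to the mapping step, shown on a small hand-built list that mimics the shape of response$stats. The name toy_stats is made up for this sketch, and only two of the six stats are included. The last form uses map() with list_rbind(), which purrr 1.0.0 and later recommend in place of map_df().

```r
library(purrr)
library(tibble)

## Hand-built stand-in for response$stats (two of the six entries)
toy_stats <- list(
  list(base_stat = 45L, effort = 0L, stat = list(name = "hp")),
  list(base_stat = 49L, effort = 0L, stat = list(name = "attack"))
)

## 1. Formula shorthand: .x is the current list element
##    (map_df() needs the dplyr package to be installed)
with_formula <- map_df(
  toy_stats,
  ~ tibble(stat_name = .x$stat$name, stat = .x$base_stat)
)

## 2. The same thing written as an explicit anonymous function
with_function <- map_df(
  toy_stats,
  function(x) tibble(stat_name = x$stat$name, stat = x$base_stat)
)

## 3. map() followed by list_rbind() to bind the rows
with_rbind <- map(
  toy_stats,
  \(x) tibble(stat_name = x$stat$name, stat = x$base_stat)
) |>
  list_rbind()

with_rbind
```

All three produce a two-row tibble with the columns stat_name and stat.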

We can extract all of the stats into a dataframe using:

stats <- map_df(
  response$stats,
  ~ tibble(stat_name = .x$stat$name, stat = .x$base_stat)
)

which gives us:

stats
# A tibble: 6 × 2
  stat_name        stat
  <chr>           <int>
1 hp                 45
2 attack             49
3 defense            49
4 special-attack     65
5 special-defense    65
6 speed              45

The rest of the data is easier to extract directly from response, so by combining the stats tibble we just created with the original response, we can build a one-row tibble of bulbasaur stats:

bulbasaur_stats <- tibble(
  sprite = response$sprites$front_default,
  species = response$species$name,
  height = response$height,
  weight = response$weight,
  hp = stats$stat[stats$stat_name == "hp"],
  defense = stats$stat[stats$stat_name == "defense"],
  attack = stats$stat[stats$stat_name == "attack"])

bulbasaur_stats
# A tibble: 1 × 7
  sprite                              species height weight    hp defense attack
  <chr>                               <chr>    <int>  <int> <int>   <int>  <int>
1 https://raw.githubusercontent.com/… bulbas…      7     69    45      49     49
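Repeating stats$stat[stats$stat_name == ...] for every column gets verbose. One alternative, sketched here on a hand-built copy of the stats tibble, is to turn the two columns into a named vector with tibble::deframe() and look values up by name. The name stat_lookup is hypothetical.

```r
library(tibble)

## Copy of the stats tibble printed earlier
stats <- tibble(
  stat_name = c("hp", "attack", "defense",
                "special-attack", "special-defense", "speed"),
  stat = c(45L, 49L, 49L, 65L, 65L, 45L)
)

## deframe() uses the first column as names, the second as values
stat_lookup <- deframe(stats)

stat_lookup[["hp"]]
## [1] 45
stat_lookup[["special-attack"]]
## [1] 65
```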

Pagination

Using pagination to request data for 100 pokemon

What if we want stats for more than one pokemon? We need some way of making multiple requests, one for each pokemon.

Let’s create a request for 20 pokemon. We can use req_url_query() to pass query parameters with our GET request; in this case we pass limit = 20 to request 20 records:

request <- request(base_url) |> 
    req_url_path_append('pokemon') |> 
    req_url_query(limit = 20)

If we perform a single request and examine the response, we see an interesting field, next (a reserved word in R, which is why it is wrapped in backticks below):

response <- req_perform(request) |> resp_body_json()
response$`next`
[1] "https://pokeapi.co/api/v2/pokemon?offset=20&limit=20"

Next is delivered as part of our JSON. It tells us, if we wanted the next batch of records, exactly what API call we would need to make. We can think of these as a chain of API calls that allow us to iterate through all records available to us:

...okemon?offset=20&limit=20" -> ...okemon?offset=40&limit=20" -> ...okemon?offset=60&limit=20"

and so on until we reach the end, which we can recognize because next is NULL: there are no more records past the last API call.
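The chain can be followed with a plain loop. The sketch below does not contact the API: fake_pages is a made-up, three-page stand-in keyed by a page label, and fetch_page() is a hypothetical helper that looks a page up in it, so that the stopping rule (a NULL next) can be seen in isolation.

```r
## Made-up pages: each has some results and a pointer to the next page.
## The last page has no `next` element, so page$`next` is NULL.
fake_pages <- list(
  "page1" = list(results = c("bulbasaur", "ivysaur"), `next` = "page2"),
  "page2" = list(results = c("venusaur", "charmander"), `next` = "page3"),
  "page3" = list(results = c("charmeleon"))
)

## Hypothetical stand-in for performing a request and parsing the JSON
fetch_page <- function(url) fake_pages[[url]]

all_results <- character()
url <- "page1"

## Keep following `next` until it is NULL
while (!is.null(url)) {
  page <- fetch_page(url)
  all_results <- c(all_results, page$results)
  url <- page$`next`
}

all_results
```

The loop visits all three pages and collects five names before next comes back as NULL.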

So how do we request multiple pokemon stats? There are different strategies, which you will find is often the case when working with APIs. We will:

  1. Use the req_perform_iterative() function to iterate over multiple batches of records until we have all 1,328 pokemon. Learn more in the httr2 documentation for req_perform_iterative().
  2. Create our own function, next_page_handler(), which tells each step of req_perform_iterative() how to request the next batch of records.
  3. Iterate over all of the responses, and extract the name and url endpoint for each pokemon.
  4. Create a function that generalizes the process of extracting pokemon statistics from a JSON, as we did with bulbasaur.
  5. Using the list of pokemon URLs, make 1,328 API requests, one for each pokemon, and extract the statistics for each pokemon.

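The call to extract all the pokemon has the following general shape, where SOME_HELPER_FUNCTION is a placeholder for a function we still need to supply: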

responses <- req_perform_iterative(
    request,
    next_req = SOME_HELPER_FUNCTION,
    on_error = "return" # If request fails, stop and return what you have
)

next_req takes a function with the arguments (resp, req).

Normally we could use one of the iteration helper functions that httr2 provides.

These functions are intended for use with the next_req argument to req_perform_iterative().
Each implements iteration for a common pagination pattern:

  • iterate_with_offset() — increments a query parameter, e.g. ?page=1, ?page=2, or ?offset=1, offset=21.
  • iterate_with_cursor() — updates a query parameter with the value of a cursor found somewhere in the response.
  • iterate_with_link_url() — follows the URL found in the Link header. See resp_link_url() for more details.

The problem is that our next URL is not included in the Link header, as is common, but is instead part of our JSON. This means we need to write a custom function and pass it to req_perform_iterative().

Let’s build a next page handler

Our next page handler takes two arguments, resp and req, which stand for response and request, respectively. We must use these exact argument names, because that is what the next_req argument of req_perform_iterative() expects.

Our next page handler does the following:

  1. Extracts the JSON body: resp_body_json(resp)
  2. Grabs the next URL: response_body$`next`
  3. Checks whether the next URL is NULL and, if so, returns NULL: if (is.null(next_url)) {return(NULL)}
  4. Otherwise, updates the request with the new URL: req |> req_url(next_url)

next_page_handler <- function(resp, req) {
  response_body <- resp_body_json(resp)
  next_url <- response_body$`next`
  ## Stopping condition - a next of NULL means there are no
  ## more pages, so we return NULL to end the iteration.
  ## Otherwise keep navigating through our chain of next URLs.
  if (is.null(next_url)) {
    return(NULL)
  } else {
    req |> req_url(next_url)
  }
}
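The handler can be tried out without touching the network by handing it mock responses. This sketch uses httr2's response() constructor to build fake JSON responses, and assumes the request URL can be read from its url field. The helper mock_page() and the names middle_page and last_page are invented for the example.

```r
library(httr2)

## Same handler as above, repeated so the sketch runs on its own
next_page_handler <- function(resp, req) {
  response_body <- resp_body_json(resp)
  next_url <- response_body$`next`
  if (is.null(next_url)) {
    return(NULL)
  }
  req |> req_url(next_url)
}

## Hypothetical helper: wrap a JSON string in a fake 200 response
mock_page <- function(json) {
  response(
    status_code = 200,
    headers = list("Content-Type" = "application/json"),
    body = charToRaw(json)
  )
}

req <- request("https://pokeapi.co/api/v2/pokemon")

## A page that points to another page
middle_page <- mock_page(
  '{"next": "https://pokeapi.co/api/v2/pokemon?offset=20&limit=20", "results": []}'
)

## A final page: JSON null is read back as NULL in R
last_page <- mock_page('{"next": null, "results": []}')

## Returns a request aimed at the next URL
next_page_handler(middle_page, req)$url

## Returns NULL, which tells req_perform_iterative() to stop
is.null(next_page_handler(last_page, req))
```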

We can avoid hitting API rate limits by adding req_throttle() to our request, which ensures we never exceed a specified rate.

Throttling is implemented using a “token bucket”, which fills up to a maximum of capacity tokens over fill_time_s (fill time in seconds). Each time you make a request, it takes a token out of the bucket, and if the bucket is empty, the request will wait until the bucket refills. This ensures that you never make more than capacity requests in fill_time_s.

request <- request("https://pokeapi.co/api/v2/pokemon") |> 
  req_url_query(limit = 20) |> 
  req_throttle(capacity = 10, fill_time_s = 60)
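To put numbers on the token bucket: with capacity = 10 and fill_time_s = 60, the bucket gains one token every 6 seconds. The back-of-the-envelope function below estimates the minimum time spent waiting for n requests. The name wait_for_n_requests is made up, and the function assumes the bucket starts full and that requests are sent as fast as allowed. It is a simplified model of the idea, not httr2's implementation.

```r
wait_for_n_requests <- function(n, capacity, fill_time_s) {
  seconds_per_token <- fill_time_s / capacity
  ## The first `capacity` requests use tokens already in the bucket;
  ## each later request waits for one more token to arrive
  max(0, n - capacity) * seconds_per_token
}

wait_for_n_requests(5, capacity = 10, fill_time_s = 60)
## [1] 0
wait_for_n_requests(10, capacity = 10, fill_time_s = 60)
## [1] 0
wait_for_n_requests(15, capacity = 10, fill_time_s = 60)
## [1] 30
```

Under these assumptions, the five requests made below would not wait at all, while a longer run would settle at roughly one request every 6 seconds.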

Now let’s get the first 100 pokemon (5 requests of 20 records each, by setting max_reqs = 5). We could get all the pokemon by setting max_reqs = Inf, but be cautious of API limits.

responses <- httr2::req_perform_iterative(
  request,
  next_req = next_page_handler,
  max_reqs = 5,
  on_error = "return"
)

Now we can use map_dfr() to iterate over each response and extract the results.

pokemon_names <-
  purrr::map_dfr(responses, function(response) {
    body <- httr2::resp_body_json(response)
    purrr::map_dfr(body$results, tibble::as_tibble)
  })

pokemon_names
# A tibble: 100 × 2
   name       url                                  
   <chr>      <chr>                                
 1 bulbasaur  https://pokeapi.co/api/v2/pokemon/1/ 
 2 ivysaur    https://pokeapi.co/api/v2/pokemon/2/ 
 3 venusaur   https://pokeapi.co/api/v2/pokemon/3/ 
 4 charmander https://pokeapi.co/api/v2/pokemon/4/ 
 5 charmeleon https://pokeapi.co/api/v2/pokemon/5/ 
 6 charizard  https://pokeapi.co/api/v2/pokemon/6/ 
 7 squirtle   https://pokeapi.co/api/v2/pokemon/7/ 
 8 wartortle  https://pokeapi.co/api/v2/pokemon/8/ 
 9 blastoise  https://pokeapi.co/api/v2/pokemon/9/ 
10 caterpie   https://pokeapi.co/api/v2/pokemon/10/
# ℹ 90 more rows

Retrieve statistics for each pokemon through multiple API calls

In order to retrieve statistics for each pokemon, we can iterate over each URL in pokemon_names and make a request to the specific pokemon endpoint represented by that URL. To do this, we can take our approach for extracting statistics from one pokemon and turn it into a function that works for any pokemon.

get_pokemon_details <- function(url) {

  ## Pause before each request as a simple way to avoid
  ## exceeding API limits. (req_throttle(), used above, is
  ## an alternative to a fixed pause.)
  Sys.sleep(.5)

  response <- request(url) |> req_perform() |> resp_body_json()

  stats <- purrr::map_df(
    response$stats,
    ~ tibble(stat_name = .x$stat$name, stat = .x$base_stat)
  )

  tibble(
    sprite = response$sprites$front_default,
    name = response$name,
    height = response$height,
    weight = response$weight,
    hp = stats$stat[stats$stat_name == "hp"],
    defense = stats$stat[stats$stat_name == "defense"],
    attack = stats$stat[stats$stat_name == "attack"]
  ) |>
    ## This mutate function wraps each sprite URL in
    ## <img src="SPRITE_URL" height="50"></img> so it can
    ## be displayed in an HTML table using the DT library
    mutate(sprite = paste0('<img src="', sprite, '" height="50"></img>'))
}

Now iterate over each pokemon URL and execute the get_pokemon_details function to retrieve each pokemon’s statistics:

pokemon_stats <- map_dfr(pokemon_names$url, get_pokemon_details)
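With this many requests, a single failure (a timeout, say) would abort the whole map_dfr() call. A common purrr pattern is to wrap the function with possibly() so that a failure returns a fallback value instead of an error. The sketch below uses a made-up stand-in, fake_details(), which fails on one input, rather than calling the API.

```r
library(purrr)
library(tibble)

## Hypothetical stand-in for get_pokemon_details(): fails for one URL
fake_details <- function(url) {
  if (url == "bad-url") stop("request failed")
  tibble(url = url, ok = TRUE)
}

## possibly() returns a wrapped function that yields `otherwise`
## instead of raising an error
safe_details <- possibly(fake_details, otherwise = NULL)

urls <- c("url-1", "bad-url", "url-3")

## NULL results are dropped when the rows are bound together
results <- map(urls, safe_details) |> list_rbind()
results
```

The same wrapping could be applied to get_pokemon_details, at the cost of silently skipping any pokemon whose request failed.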

Finally, display our resultant table:

datatable(data = pokemon_stats, escape = FALSE)

API key privacy

What if you are working with an API that requires an API key?

For example, let’s use the NASA API to retrieve weather data on Mars.

This API is a little different. We have to specify our data type (JSON) and our API version (1.0) as parameters in our GET request:

nasa_base_url <- 'https://api.nasa.gov/'

resp <- request(nasa_base_url) |>
  req_url_path_append('insight_weather/') |>
  req_url_query(feedtype = "json", ver = "1.0") |>
  req_perform()
Error in `req_perform()`:
! HTTP 403 Forbidden.
resp
Error: object 'resp' not found

We see that when we run it, we get a 403 HTTP error. We are not considered a valid user unless we provide an API key.
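In a script, an HTTP error like this would stop execution. One way to handle it is tryCatch(): httr2 signals HTTP errors with condition classes such as httr2_http_403. To stay offline, the sketch below fakes the 403 with with_mocked_responses() and response(), which assumes httr2 1.0.0 or later. The function name check_access is hypothetical.

```r
library(httr2)

weather_req <- request('https://api.nasa.gov/') |>
  req_url_path_append('insight_weather/') |>
  req_url_query(feedtype = "json", ver = "1.0")

## Return a short status message instead of stopping on HTTP 403
check_access <- function(req) {
  tryCatch(
    {
      req_perform(req)
      "ok"
    },
    httr2_http_403 = function(cnd) {
      "forbidden: an API key is probably required"
    }
  )
}

## Mock: every request receives a 403 response, no network involved
outcome <- with_mocked_responses(
  function(req) response(status_code = 403),
  check_access(weather_req)
)
outcome
```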

You can register for an API key with nothing but an email at https://api.nasa.gov/, or we can use the demo key that NASA provides for experimenting with their API (although it does have lower API limits than if you use your own key).

api_key <- 'DEMO_KEY'

resp <- request(nasa_base_url) |>
  req_url_path_append('insight_weather/') |>
  req_url_query(feedtype = "json", ver = "1.0", api_key = api_key) |>
  req_perform()

resp
<httr2_response>
GET https://api.nasa.gov/insight_weather/?feedtype=json&ver=1.0&api_key=DEMO_KEY
Status: 200 OK
Content-Type: application/json
Body: In memory (36505 bytes)

Now that we passed the api_key as a parameter, we received a valid response (HTTP 200).

We don’t want to leave that API key in plaintext in our code, though: anyone we share the code with could copy it! Let’s store it as an environment variable instead. To do that, you can add it to the .Renviron file in your home directory.

You can use an R package to edit that file directly in RStudio: usethis.

usethis is a workflow package: it automates repetitive tasks that arise during project setup and development, both for R packages and non-package projects.

library(usethis)
edit_r_environ()
☐ Edit '/Users/plawson/.Renviron'.
☐ Restart R for changes to take effect.

An editor window will open, and you can add the following line (replacing "DEMO_KEY" with your own API key, if you registered for one):

API_KEY = "DEMO_KEY"

Now you can load your API key from your .Renviron file by first restarting R, and then:

api_key <- Sys.getenv("API_KEY")
nasa_base_url <- 'https://api.nasa.gov/'

resp <- request(nasa_base_url) |>
  req_url_path_append('insight_weather/') |>
  req_url_query(feedtype = "json", ver = "1.0", api_key = api_key) |>
  req_perform()

resp
<httr2_response>
GET https://api.nasa.gov/insight_weather/?feedtype=json&ver=1.0&api_key=DEMO_KEY
Status: 200 OK
Content-Type: application/json
Body: In memory (36505 bytes)
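Sys.getenv() returns an empty string when a variable is not set, so a missing key would otherwise only surface later as an HTTP 403. A small guard makes the problem obvious sooner. get_api_key() is a hypothetical helper, and the example sets the variable with Sys.setenv() only so that the sketch runs on its own.

```r
## Hypothetical helper: fail early if the variable is missing
get_api_key <- function(name = "API_KEY") {
  key <- Sys.getenv(name)
  if (identical(key, "")) {
    stop(
      "Environment variable ", name,
      " is not set; add it to .Renviron and restart R."
    )
  }
  key
}

## For demonstration only: normally the value comes from .Renviron
Sys.setenv(API_KEY = "DEMO_KEY")

api_key <- get_api_key()
api_key
## [1] "DEMO_KEY"
```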