This document reproduces the rehabilitation center catchment basin example using services that require API keys. It is therefore a more advanced example that requires some setup by the user, specifically creating and installing access tokens and API keys, as discussed below.
The document has been created using the author’s keys and access tokens to produce screenshots and fetch some data and all code is included. However no keys are distributed with the document. Users will therefore need to obtain their own keys in order to experiment with the interactive options provided by mapdeck
and the various Google services.
The googleway
library provides access to Google Maps API, both for creating maps, and using the mapping services such as geocoding, direction and route finding, and searching for places.
The mapdeck
library allows you to plot large amounts of data on a Mapbox map through the Deck.gl
javascript library developed by Uber.
To use any of Google Map’s services you will need an API key, and to plot a Mapbox map you will need an access token
Replace “GOOGLE_MAP_KEY”, “GOOGLE_API_KEY”, “MAPBOX_TOKEN” with the respective tokens. The tokens are long collections of random characters and numbers and the should be surrounded by quotes.
set_key( "GOOGLE_MAP_KEY", api = "map") ## for Google Maps
set_key( "GOOGLE_API_KEY") ## for other Google services
set_token( "MAPBOX_TOKEN" )
In this section we are calculating the times and distances to the rehab centres from random addresses using Google’s API (using the googleway
library), and plotting the maps using the mapdeck
library.
The random address list, demographic data, and calculations used to determine case load per rehab centre are taken from section 11 of the Rehab Catchment Basement section.
The geocoding API will take as input an address, or search term, and return various pieces of information. For example, here we are querying the API for the three rehab centre addresses.
rehab_addresses <- c(
DandenongHospital = "Dandenong Hospital, Dandenong VIC 3175, Australia",
CaseyHospital = "62-70 Kangan Dr, Berwick VIC 3806, Australia",
KingstonHospital = "The Kingston Centre, Heatherton VIC 3202, Australia"
)
RehabLocations <- lapply(rehab_addresses, googleway::google_geocode)
From the results we can extract various pieces of information, such as the formatted address, the type of place, and the coordinates.
lapply( RehabLocations, googleway::geocode_address )
## $DandenongHospital
## [1] "135 David St, Dandenong VIC 3175, Australia"
##
## $CaseyHospital
## [1] "62-70 Kangan Dr, Berwick VIC 3806, Australia"
##
## $KingstonHospital
## [1] "400 Warrigal Rd, Cheltenham VIC 3192, Australia"
lapply( RehabLocations, googleway::geocode_address_components )
## $DandenongHospital
## long_name short_name
## 1 135 135
## 2 David Street David St
## 3 Dandenong Dandenong
## 4 Greater Dandenong City Greater Dandenong
## 5 Victoria VIC
## 6 Australia AU
## 7 3175 3175
## types
## 1 street_number
## 2 route
## 3 locality, political
## 4 administrative_area_level_2, political
## 5 administrative_area_level_1, political
## 6 country, political
## 7 postal_code
##
## $CaseyHospital
## long_name short_name types
## 1 62-70 62-70 street_number
## 2 Kangan Drive Kangan Dr route
## 3 Berwick Berwick locality, political
## 4 Casey City Casey administrative_area_level_2, political
## 5 Victoria VIC administrative_area_level_1, political
## 6 Australia AU country, political
## 7 3806 3806 postal_code
##
## $KingstonHospital
## long_name short_name types
## 1 400 400 street_number
## 2 Warrigal Road Warrigal Rd route
## 3 Cheltenham Cheltenham locality, political
## 4 Kingston City Kingston administrative_area_level_2, political
## 5 Victoria VIC administrative_area_level_1, political
## 6 Australia AU country, political
## 7 3192 3192 postal_code
lapply( RehabLocations, googleway::geocode_coordinates )
## $DandenongHospital
## lat lng
## 1 -37.97636 145.2186
##
## $CaseyHospital
## lat lng
## 1 -38.04526 145.3458
##
## $KingstonHospital
## lat lng
## 1 -37.95475 145.0767
We have 56,000 random addresses, and 3 rehab centres. This makes 168,000 direction queries.
Google charges 0.005 USD per query, but gives you $200 credit each month.
Therefore, we get 40,000 queries credited per month.
To stay within limits, and have enough spare, I’m going to sample 50 of those random addresses per postcode, and query the API for the directions from those addresses to each rehab centres.
The first step is to format the results of the geocoding into an easily usable structure
## formatting the geocoded rehab centres into a data.frame
lst <- lapply( RehabLocations, function(x) {
coords <- googleway::geocode_coordinates(x)
data.frame(
formatted_address = googleway::geocode_address( x ),
lat = coords[["lat"]],
lon = coords[["lng"]]
)
})
df_rehab <- do.call(rbind, lst)
rm(lst)
knitr::kable( df_rehab )
formatted_address | lat | lon | |
---|---|---|---|
DandenongHospital | 135 David St, Dandenong VIC 3175, Australia | -37.97636 | 145.2186 |
CaseyHospital | 62-70 Kangan Dr, Berwick VIC 3806, Australia | -38.04526 | 145.3458 |
KingstonHospital | 400 Warrigal Rd, Cheltenham VIC 3192, Australia | -37.95475 | 145.0767 |
To find the directions we need to query one route at a time. We can do this in an lapply
.
As this takes a while we are saving the results of each query to file at each iteration. Each result is in JSON, so we are saving the raw JSON.
# ## TODO - get randomaddresses from RehabCatchment
# ## need coordinates for googleway
df_random <- as.data.frame( randomaddresses )
df_random[, c("lon", "lat")] <- sf::st_coordinates( randomaddresses )
set.seed(12345)
df_sample <- df_random %>%
dplyr::group_by(POSTCODE) %>%
dplyr::sample_n(size = 50)
df_sample$sample_row <- 1:nrow(df_sample)
saveRDS(df_sample, here::here("data/googleway/directions_queries/df_sample2.rds"))
res1 <- lapply(1:nrow(df_sample), function(x) {
js <- googleway::google_directions(
origin = as.numeric( df_sample[x, c("lat", "lon")] )
, destination = as.numeric( df_rehab[1, c("lat","lon")] )
, simplify = F
)
js <- paste0(js, collapse = "")
js <- gsub(" ","",js)
fn <- here::here(paste0("data/googleway/directions_queries/rehab1_sample/",x,"_results.json"))
f <- file(fn)
writeLines(js, f)
close(f)
return( js )
})
res2 <- lapply(1:nrow(df_sample), function(x) {
js <- googleway::google_directions(
origin = as.numeric( df_sample[x, c("lat", "lon")] )
, destination = as.numeric( df_rehab[2, c("lat","lon")] )
, simplify = F
)
js <- paste0(js, collapse = "")
js <- gsub(" ","",js)
fn <- here::here(paste0("data/googleway/directions_queries/rehab2_sample/",x,"_results.json")
f <- file(fn)
writeLines(js, f)
close(f)
return( js )
})
res3 <- lapply(1:nrow(df_sample), function(x) {
js <- googleway::google_directions(
origin = as.numeric( df_sample[x, c("lat", "lon")] )
, destination = as.numeric( df_rehab[3, c("lat","lon")] )
, simplify = F
)
js <- paste0(js, collapse = "")
js <- gsub(" ","",js)
fn <- here::here(paste0("data/googleway/directions_queries/rehab3_sample/",x,"_results.json"))
f <- file(fn)
writeLines(js, f)
close(f)
return( js )
})
Now we have all the results of the directions query saved we can read them back into the R session.
The route geometry returned from the API comes in the form of an encoded polyline.
substr( googleway::direction_polyline( directions1[[1]][["result"]] ), 1, 100 )
## [1] "~fbfFuxftZnBq[~Bk`@l@kKkY_DmH{@wN{AeI}@WFg@Ky@O}Dg@]Eq@IgAEw@IgAGg@@aA@b@yBx@sDr@{Bb@iAd@y@r@[~@aBt@"
We can convert this back into coordinates using the googlePolylines
library, for example
pl <- googleway::direction_polyline( directions1[[1]][["result"]] )
head( googlePolylines::decode( pl )[[1]] )
## lat lon
## 1 -37.86368 145.0383
## 2 -37.86424 145.0429
## 3 -37.86488 145.0483
## 4 -37.86511 145.0502
## 5 -37.86089 145.0510
## 6 -37.85938 145.0513
However, googleway
and mapdeck
both support plotting these polylines directly, so we can avoid this step and just use the polylines. We also get the distance and time on the routes in the API query results.
is_valid <- function( direction ) {
direction[["result"]][["status"]] == "OK"
}
get_distance <- function(x) {
ifelse(
is_valid( x )
, googleway::direction_legs(x[["result"]])[["distance"]][["value"]]
, NA_real_
)
}
get_duration <- function(x) {
ifelse(
is_valid( x )
, googleway::direction_legs(x[["result"]])[["duration"]][["value"]]
, NA_real_
)
}
get_polyline <- function(x) {
ifelse(
is_valid( x )
, googleway::direction_polyline( x[["result"]] )
, NA_character_
)
}
format_directions <- function( d, df_sample ) {
secs <- sapply( d, get_duration )
dist <- sapply( d, get_distance )
sample_row <- sapply( d, function(x) x[["sample_row"]] )
street <- df_sample[ sample_row, ]$STREET_NAME
postcode <- df_sample[ sample_row, ]$POSTCODE
polylines <- sapply( d, get_polyline )
data.frame(
id = sample_row
, street = street
, POSTCODE = postcode ## capitalised because we join on it later
, polyline = polylines
, distance_m = dist
, duration_s = secs
, duration_m = round( secs / 60, 2 )
, stringsAsFactors = FALSE
)
}
df_directions1 <- format_directions( directions1, df_sample )
## Warning: Detecting old grouped_df format, replacing `vars` attribute by
## `groups`
df_directions2 <- format_directions( directions2, df_sample )
df_directions3 <- format_directions( directions3, df_sample )
The is_valid()
function is used to test the result of the API query. Sometimes the API will return an “ACCESS_DENIED” or “OVER_QUERY_LIMIT” response if it wasn’t able to return any data.
Now we have three data.frames, each containing the directions, distance, times and route for 2,800 random addresses to the rehab centres.
mapdeck(
style = mapdeck_style("light")
, location = c(145, -37.9)
, zoom = 10
) %>%
add_scatterplot(
data = df_rehab[1, ]
, lon = "lon", lat = "lat"
, radius = 500
, update_view = F
) %>%
add_path(
data = df_directions1[ !is.na( df_directions1$polyline ), ]
, polyline = "polyline"
, stroke_colour = "duration_m"
, stroke_opacity = 100
, stroke_width = 35
, legend = T
, legend_options = list( title = "Duration (minutes) ")
, update_view = F
, palette = "plasma"
)
mapdeck(
style = mapdeck_style("light")
, location = c(145, -37.9)
, zoom = 10
) %>%
add_scatterplot(
data = df_rehab[2, ]
, lon = "lon", lat = "lat"
, radius = 500
, update_view = F
) %>%
add_path(
data = df_directions2[ !is.na( df_directions2$polyline ), ]
, polyline = "polyline"
, stroke_colour = "duration_m"
, stroke_opacity = 100
, stroke_width = 35
, legend = T
, legend_options = list( title = "Duration (minutes) ")
, update_view = F
, palette = "plasma"
)
mapdeck(
style = mapdeck_style("light")
, location = c(145, -37.9)
, zoom = 10
) %>%
add_scatterplot(
data = df_rehab[3, ]
, lon = "lon", lat = "lat"
, radius = 500
, update_view = F
) %>%
add_path(
data = df_directions3[ !is.na( df_directions3$polyline ), ]
, polyline = "polyline"
, stroke_colour = "duration_m"
, stroke_opacity = 100
, stroke_width = 35
, legend = T
, legend_options = list( title = "Duration (minutes) ")
, update_view = F
, palette = "plasma"
)
mapdeck(
style = mapdeck_style("light")
, location = c(145, -37.9)
, zoom = 10
) %>%
add_scatterplot(
data = df_rehab
, lon = "lon", lat = "lat"
, radius = 500
, update_view = F
) %>%
add_path(
data = df_directions1[ !is.na( df_directions1$polyline ), ]
, polyline = "polyline"
, stroke_colour = "duration_m"
, stroke_opacity = 100
, stroke_width = 35
, legend = T
, legend_options = list( title = "Duration to Dandenong (mins) ")
, update_view = F
, palette = "plasma"
, layer_id = "rehab1"
) %>%
add_path(
data = df_directions2[ !is.na( df_directions2$polyline ), ]
, polyline = "polyline"
, stroke_colour = "duration_m"
, stroke_opacity = 100
, stroke_width = 35
, legend = T
, legend_options = list( title = "Duration to Casey (mins) ")
, update_view = F
, palette = "plasma"
, layer_id = "rehab2"
) %>%
add_path(
data = df_directions3[ !is.na( df_directions3$polyline ), ]
, polyline = "polyline"
, stroke_colour = "duration_m"
, stroke_opacity = 100
, stroke_width = 35
, legend = T
, legend_options = list( title = "Duration to Kingston (mins) ")
, update_view = F
, palette = "plasma"
, layer_id = "rehab3e"
)
We can extract the step-by-step guide, including times and locations, to get the ‘time to destination’ for each step.
ttd <- googleway::direction_steps( directions1[[1]][["result"]] )
total_time <- get_duration( directions1[[1]] )
## calculate time remaining at every step
ttd_duration <- ttd$duration
ttd_duration$total_time <- cumsum( ttd_duration$value )
ttd_duration$time_remaining <- total_time - ttd_duration$total_time
## get the coordinates for every step
ttd_duration <- cbind( ttd$start_location, ttd_duration )
ttd_duration
## lat lng text value total_time time_remaining
## 1 -37.86368 145.0383 2mins 118 118 1401
## 2 -37.86511 145.0502 4mins 211 329 1190
## 3 -37.85174 145.0526 13mins 801 1130 389
## 4 -37.95693 145.2215 1min 23 1153 366
## 5 -37.95783 145.2251 1min 15 1168 351
## 6 -37.95770 145.2258 3mins 171 1339 180
## 7 -37.97437 145.2229 1min 72 1411 108
## 8 -37.97366 145.2167 1min 27 1438 81
## 9 -37.97547 145.2163 1min 81 1519 0
Doing this for each of the 8,400 directions gives us the time to the rehab centres for many points on the route, not just the origin.
time_to_destination <- function( direction ) {
ttd <- googleway::direction_steps( direction[["result"]] )
total_time <- get_duration( direction )
## calculate time remaining at every step
ttd_duration <- ttd$duration
ttd_duration$total_time <- cumsum( ttd_duration$value )
ttd_duration$time_remaining <- total_time - ttd_duration$total_time
## get the coordinates for every step
ttd_duration <- cbind( ttd$start_location, ttd_duration )
ttd_duration$sample_row <- direction[["sample_row"]]
if( inherits( ttd_duration, "data.frame" ) > 0 ) {
ttd_duration$sample_row_pt <- 1:nrow( ttd_duration )
}
ttd_duration
}
lst_times <- lapply( directions1, time_to_destination )
df_times1 <- do.call(rbind, lst_times)
df_times1$rehab <- row.names(df_rehab[1, ])
lst_times <- lapply( directions2, time_to_destination )
df_times2 <- do.call(rbind, lst_times)
df_times2$rehab <- row.names(df_rehab[2, ])
lst_times <- lapply( directions3, time_to_destination )
df_times3 <- do.call(rbind, lst_times)
df_times3$rehab <- row.names(df_rehab[3, ])
This now gives us 77,337 random addresses to each of the three rehab centres.
This plot shows the average times to 135 David St, Dandenong VIC 3175, Australia for 25779 addresses. The height and colour of hexagons represent the average time to the rehab centre.
mapdeck() %>%
add_hexagon(
data = df_times1
, lon = "lng", lat = "lat"
, elevation = "time_remaining"
, elevation_function = "average"
, colour = "time_remaining"
, colour_function = "average"
, elevation_scale = 10
, radius = 250
, colour_range = colourvalues::colour_values(1:6, palette = "plasma")
, layer_id = "times1"
)
While the Directions API gives us a lot of information, it only handles one direction at a time. If you want more (quantity and frequency), but less detailed results, you can query the distance API.
In this example we are querying the distances for the first ten random addresses.
distance_matrix <- googleway::google_distance(
origin = df_sample[1:10, c("lat", "lon")]
, destination = googleway::geocode_address( RehabLocations[[1]] )
)
distance_matrix
## $destination_addresses
## [1] "135 David St, Dandenong VIC 3175, Australia"
##
## $origin_addresses
## [1] "4A Carnarvon Rd, Caulfield North VIC 3161, Australia"
## [2] "375 State Route 22, Caulfield VIC 3162, Australia"
## [3] "12 Oak Grove, Malvern East VIC 3145, Australia"
## [4] "47 Hawthorn Rd, Caulfield North VIC 3161, Australia"
## [5] "68 Balaclava Rd, Caulfield North VIC 3161, Australia"
## [6] "4 Oulton St, Caulfield North VIC 3161, Australia"
## [7] "93 Balaclava Rd, Caulfield North VIC 3161, Australia"
## [8] "55 Wanda Rd, Caulfield North VIC 3161, Australia"
## [9] "679 Inkerman Rd, Caulfield North VIC 3161, Australia"
## [10] "11 Bond St, Caulfield North VIC 3145, Australia"
##
## $rows
## elements
## 1 24.1 km, 24114, 27 mins, 1640, OK
## 2 24.7 km, 24686, 29 mins, 1764, OK
## 3 21.2 km, 21155, 22 mins, 1336, OK
## 4 24.6 km, 24635, 28 mins, 1702, OK
## 5 25.8 km, 25787, 31 mins, 1879, OK
## 6 24.6 km, 24555, 31 mins, 1844, OK
## 7 25.5 km, 25482, 31 mins, 1847, OK
## 8 25.0 km, 24963, 30 mins, 1771, OK
## 9 24.2 km, 24167, 28 mins, 1692, OK
## 10 22.9 km, 22922, 27 mins, 1639, OK
##
## $status
## [1] "OK"
Both the distance and directions API allow you to specify the departure_time
, so it’s possible to query the time & distance for a given time of day.
From Google’s API results we have the driving time to each of the rehab centres for the random addresses. So we can get the nearest rehab centre for each address and view them with mapdeck
df_directions1$rehab <- row.names(df_rehab)[1]
df_directions2$rehab <- row.names(df_rehab)[2]
df_directions3$rehab <- row.names(df_rehab)[3]
df_directions1$rehab_address <- df_rehab[1, "formatted_address"]
df_directions2$rehab_address <- df_rehab[2, "formatted_address"]
df_directions3$rehab_address <- df_rehab[3, "formatted_address"]
df_directions <- rbind(
df_directions1
, df_directions2
, df_directions3
)
df_nearest <- df_directions %>%
dplyr::group_by(id) %>%
dplyr::arrange(duration_m) %>%
dplyr::slice(1) %>%
dplyr::ungroup()
Here we see which rehab centre each address goes to
mapdeck(
style = mapdeck_style("light")
, location = c(145.1, -37.9)
, zoom = 10
) %>%
add_scatterplot(
data = df_rehab
, lon = "lon", lat = "lat"
, radius = 600
) %>%
add_path(
data = df_nearest
, polyline = "polyline"
, stroke_colour = "rehab"
, legend = T
, legend_options = list(css = "max-width: 300px")
, update_view = F
)
This map shows the travel times to the nearest rehab centre
mapdeck(
style = mapdeck_style("light")
, location = c(145.1, -37.9)
, zoom = 10
) %>%
add_scatterplot(
data = df_rehab
, lon = "lon", lat = "lat"
, radius = 600
) %>%
add_path(
data = df_nearest
, polyline = "polyline"
, stroke_colour = "duration_m"
, legend = T
, legend_options = list(css = "max-width: 300px")
, update_view = F
)
We can represent the times to the nearest rehab centre as hexagons, where the height and colour represents the travel time.
df_times <- rbind(
df_times1
, df_times2
, df_times3
)
df_nearest_times <- df_times %>%
dplyr::group_by(sample_row, sample_row_pt) %>%
dplyr::arrange(time_remaining) %>%
dplyr::slice(1) %>%
dplyr::ungroup()
head( df_nearest_times )
mapdeck() %>%
add_hexagon(
data = df_nearest_times
, lon = "lng", lat = "lat"
, elevation = "time_remaining"
, elevation_function = "average"
, colour = "time_remaining"
, colour_function = "average"
, elevation_scale = 10
, radius = 250
, colour_range = colourvalues::colour_values(1:6, palette = "plasma")
, layer_id = "times1"
)
postcodes <- df_nearest %>%
dplyr::group_by(POSTCODE, rehab) %>%
dplyr::summarise( n = n() )
knitr::kable( head( postcodes ) )
POSTCODE | rehab | n |
---|---|---|
3144 | KingstonHospital | 50 |
3145 | KingstonHospital | 50 |
3146 | KingstonHospital | 50 |
3147 | DandenongHospital | 8 |
3147 | KingstonHospital | 42 |
3148 | DandenongHospital | 9 |
postcodes %>%
dplyr::group_by(rehab) %>%
dplyr::summarise( total = sum(n)) %>%
dplyr::mutate( percent = 100 * total / sum( total )) %>%
knitr::kable( digits = 2)
rehab | total | percent |
---|---|---|
CaseyHospital | 528 | 18.86 |
DandenongHospital | 867 | 30.96 |
KingstonHospital | 1405 | 50.18 |
knitr::kable( df_incidence )
age | incidence |
---|---|
0-4 | 0 |
15-24 | 5 |
25-34 | 30 |
35-44 | 44 |
45-54 | 111 |
55-64 | 299 |
65-74 | 747 |
75-84 | 1928 |
85+ | 3976 |
s <- 1 / 100000 # rate per 100,000
basicDemographicsRehab <- basicDemographicsRehab %>%
mutate (stroke_cases = s * ((Age_15_19_yr_P + Age_20_24_yr_P) * 5 +
Age_25_34_yr_P * 30 +
Age_35_44_yr_P * 44 +
Age_45_54_yr_P * 111 +
Age_55_64_yr_P * 299 +
Age_65_74_yr_P * 747 +
Age_75_84_yr_P * 1928 +
Age_85ov_P * 3976)) %>%
select (-c (contains ("_yr_"), contains ("85ov")))
basicDemographicsRehab <- rename (basicDemographicsRehab, POSTCODE = Postcode)
postcodes <- left_join (postcodes, basicDemographicsRehab, by = "POSTCODE")
postcodesamples <- postcodes %>%
dplyr::group_by( POSTCODE ) %>%
summarise( totalsamples = sum( n ) )
postcodes <- left_join( postcodes, postcodesamples, by = "POSTCODE")
postcodes %>%
group_by (rehab) %>%
summarise (total = sum (stroke_cases * n/totalsamples, na.rm = TRUE )) %>%
mutate (percent = 100 * total / sum (total)) %>%
knitr::kable (digits = c (2, 0))
rehab | total | percent |
---|---|---|
CaseyHospital | 279 | 10.28 |
DandenongHospital | 975 | 35.99 |
KingstonHospital | 1456 | 53.73 |