Basic Example
basic-usage.Rmd
Intro
A good place to start is by browsing datasets on opendata.scot. You can filter and
search to find datasets of interests. The openscotland
R
package can then download and save the datasets you are interested in.
This helps to quickly start analysis without needing to work out how to
save, organise and import data in R.
Search
The opendatascotland
package has a basic search function
for finding available datasets and viewing their metadata. Use
ods_search()
function to view metadata for all datasets or
filter by matching search terms in the dataset’s title.
library(opendatascotland)
# View all available datasets and associated metadata
all_datasets <- ods_search()
# Search dataset titles containing matching terms (case insensitive)
single_query <- ods_search("Number of bikes")
# Search multiple terms
multi_query <- ods_search(c("Bins", "Number of bikes"))
#> The cached list of datasets from opendata.scot was last downloaded on 2023-10-09
head(multi_query)
#> # A tibble: 6 × 11
#> unique_id title organization notes category url resources licence
#> <chr> <chr> <chr> <chr> <list> <chr> <list> <chr>
#> 1 Litter_Bins_Aberdee… Litt… Aberdeen Ci… "<di… <chr> /dat… <df> UK Ope…
#> 2 Communal_Bins_City_… Comm… City of Edi… "<p>… <chr> /dat… <df> UK Ope…
#> 3 Grit_Bins_City_of_E… Grit… City of Edi… "<p>… <chr> /dat… <df> UK Ope…
#> 4 Salt_Bins_Dumfries_… Salt… Dumfries an… "<p>… <chr> /dat… <df> UK Ope…
#> 5 Roads_-_winter_main… Road… Stirling Co… "<p>… <chr> /dat… <df> UK Ope…
#> 6 Number_of_bikes_ava… Numb… Cycling Sco… "<p>… <chr> /dat… <df> UK Ope…
#> # ℹ 3 more variables: date_created <chr>, date_updated <chr>, org_type <chr>
Note, search term is case-insensitive but word order must be correct (there is no ‘fuzzy’ matching).
Download
Currently, only datasets available in .csv
,
json
or .geojson
can be downloaded. These
formats cover the majority of data available. You will be warned if a
dataset can’t be downloaded.
To download data, you can either download the metadata using
ods_search()
.
query <- ods_search("bins")
#> Note, the cached list of datasets from opendata.scot was last
#> downloaded on 2023-11-19
Then pass that data frame to ods_get()
.
data <- ods_get(query)
Or use the search
argument in
ods_get(search="my search term")
to search and download any
matching datasets in one step.
data <- ods_get(search = "bins")
By default, you will be asked if you want to save the data locally on
the first download. Optionally, you can refresh the data or avoid being
asked to save data by setting the refresh
or
ask
arguments.
data <- ods_get(search = "Number of bikes", refresh = TRUE, ask = FALSE)
Plot
Once you have downloaded the dataset(s), you may wish to plot or map the data. Here’s a short example of how plotting can be done.
data <- ods_get(search = "Recycling Point Locations")
The ods_get()
function returned a named list of data
frames - lets select the one we want by name:
recycling_points <- data$Recycling_Points_Aberdeen_City_Council
Or alternatively select the data frame using index/position of the
name in the list using [[]]
notation.
names(data)
#> [1] "Glass_and_textiles_recycling_points_Aberdeenshire_Council"
#> [2] "Recycling_Points_Aberdeen_City_Council"
#> [3] "Recycling_Points_Moray_Council"
recycling_points <- data[[2]]
Geojson datasets are automating converted to simple feature ‘sf’ data. As we can see in the example the data frame is classed as “sf” which means spatial / geometry attributes are baked in.
class(recycling_points)
#> [1] "sf" "data.frame"
You can see a geometry
variable which contains the
spatial co-ordinates.
head(recycling_points, 3)
#> Simple feature collection with 3 features and 7 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: -2.173564 ymin: 57.13422 xmax: -2.070197 ymax: 57.20478
#> Geodetic CRS: WGS 84
#> OBJECTID SITE
#> 1 47 Asda, Dyce
#> 2 48 Balnagask Circle
#> 3 49 Bridge of Don, Park & Ride AECC
#> ADDRESS MIXED_YN TEXTILE_YN BOOKS_YN
#> 1 Riverview Drive, Dyce. AB21 7NG Y Y N
#> 2 Grampian Court, AB11 8TY N Y N
#> 3 Exhibition Avenue, Bridge of Don. AB23 8BL Y Y N
#> UPRN geometry
#> 1 009051155534 POINT (-2.173564 57.20478)
#> 2 009051155535 POINT (-2.070197 57.13422)
#> 3 009051155536 POINT (-2.089188 57.18388)
This allows the plot()
function to automatically plot
the coordinates.
JSON
Some datasets are only available in JSON or CSV formats, unlike
formats such as GeoJSON, these do not automatically have spatial
geometry. For example, the Glasgow cycling counts data does have
latitude
and longitude
variables but is
downloaded as JSON and therefore does not have spatial geometry added by
default. This type of data is downloaded as a data frame without a
geometry
column.
Let’s download the Glasgow cycle counts data which is only available in JSON format.
cycle_count <- ods_get(search = "lasgow City Council - Daily cycling counts from automatic cycling counters")
cycle_count <- cycle_count[[1]]
head(cycle_count, 4)
#> # A tibble: 4 × 10
#> provider area siteID location latitude longitude startTime endTime count
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <int>
#> 1 Glasgow City… Glas… GLG01… CP St V… 55.9 -4.28 2016-08-… 2016-0… 0
#> 2 Glasgow City… Glas… GLG00… Archerh… 55.9 -4.35 2016-08-… 2016-0… 0
#> 3 Glasgow City… Glas… GLG00… Glasgow… 55.9 -4.26 2016-08-… 2016-0… 0
#> 4 Glasgow City… Glas… GLG00… Parnie … 55.9 -4.24 2016-08-… 2016-0… 0
#> # ℹ 1 more variable: usmart_id <chr>
We can see the cycle count JSON has been converted into a ‘flat’
tabulated data frame. This data frame can now be plotted as a chart, in
the following example, let’s use the ggplot2
plotting
library to create a graph. We’ll display counts over time for each
location.
library(ggplot2)
library(scales)
# Convert character to date (to display date time correctly)
cycle_count$Date <- as.Date(substr(cycle_count$startTime, 1, 10))
cycle_count <- cycle_count[cycle_count$Date > as.Date("2022-12-31"), ]
# Plot
ggplot(cycle_count, aes(x = Date, y = count, colour = location)) +
geom_line() +
theme_minimal() +
theme(legend.position = "none") +
scale_x_date(date_labels = "%b-%Y") +
scale_y_continuous(labels = comma)
This is a bit noisy so lets sum the counts for all locations to see total cycle counts.
library(ggplot2)
library(dplyr)
library(magrittr)
# Group by date and sum the count
sum_cycle_count <- cycle_count %>%
group_by(Date) %>%
summarise(`Total Count` = sum(count))
# Graph the new `sum_count` variable
ggplot(sum_cycle_count, aes(x = Date, y = `Total Count`)) +
geom_line(colour = "blue") +
theme_minimal() +
scale_x_date(date_labels = "%b-%Y") +
scale_y_continuous(labels = comma)
We can also use the latitude and longitude columns to plot the points on a map.
library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
# Convert data frame into 'sf' geospatial dataframe
cycle_counts_geo <- st_as_sf(cycle_count, coords = c("longitude", "latitude"))
# Sum counts by location
cycle_counts_geo_sum <- cycle_counts_geo %>%
group_by(location) %>%
summarise(`Total Cycle Count` = sum(count))
# plot
c <- ggplot(cycle_counts_geo_sum) +
geom_sf(aes(size = `Total Cycle Count`), colour = "blue", fill = NA) +
theme_minimal()
c
Can you guess which points in Glasgow are most popular for cyclists? We can add a base layer map to reveal the streets.
library(ggmap)
library(purrr)
# Create bounding box to download background street map layer
bbox_glasgow <- map_dbl(st_bbox(cycle_counts_geo_sum), 1)
names(bbox_glasgow) <- c("left", "bottom", "right", "top")
# Download background layer
glasgow <- suppressMessages(ggmap(get_stamenmap(bbox_glasgow,
zoom = 10,
https = TRUE,
messaging = TRUE
)))
# Plot background with cycle counts on top
cycle_map <- glasgow + geom_point(
data = cycle_counts_geo_sum,
aes(
x = unlist(map(geometry, 1)),
y = unlist(map(geometry, 2)),
size = `Total Cycle Count`
),
col = "green",
pch = 16,
alpha = 0.4
) +
theme_minimal() +
theme(
axis.title = element_blank(),
axis.text = element_blank()
)
cycle_map
#>
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#>
#> set_names
#> The following object is masked from 'package:scales':
#>
#> discard
Looks like Victoria Bridge area is a hot spot for cyclists.