Skip to contents

Intro

A good place to start is by browsing datasets on opendata.scot. You can filter and search to find datasets of interests. The openscotland R package can then download and save the datasets you are interested in. This helps to quickly start analysis without needing to work out how to save, organise and import data in R.

The opendatascotland package has a basic search function for finding available datasets and viewing their metadata. Use ods_search() function to view metadata for all datasets or filter by matching search terms in the dataset’s title.

library(opendatascotland)

# View all available datasets and associated metadata
all_datasets <- ods_search()

# Search dataset titles containing matching terms (case insensitive)
single_query <- ods_search("Number of bikes")

# Search multiple terms
multi_query <- ods_search(c("Bins", "Number of bikes"))
#> The cached list of datasets from opendata.scot was last downloaded on 2023-10-09
head(multi_query)
#> # A tibble: 6 × 11
#>   unique_id            title organization notes category url   resources licence
#>   <chr>                <chr> <chr>        <chr> <list>   <chr> <list>    <chr>  
#> 1 Litter_Bins_Aberdee… Litt… Aberdeen Ci… "<di… <chr>    /dat… <df>      UK Ope…
#> 2 Communal_Bins_City_… Comm… City of Edi… "<p>… <chr>    /dat… <df>      UK Ope…
#> 3 Grit_Bins_City_of_E… Grit… City of Edi… "<p>… <chr>    /dat… <df>      UK Ope…
#> 4 Salt_Bins_Dumfries_… Salt… Dumfries an… "<p>… <chr>    /dat… <df>      UK Ope…
#> 5 Roads_-_winter_main… Road… Stirling Co… "<p>… <chr>    /dat… <df>      UK Ope…
#> 6 Number_of_bikes_ava… Numb… Cycling Sco… "<p>… <chr>    /dat… <df>      UK Ope…
#> # ℹ 3 more variables: date_created <chr>, date_updated <chr>, org_type <chr>

Note, search term is case-insensitive but word order must be correct (there is no ‘fuzzy’ matching).

Download

Currently, only datasets available in .csv, json or .geojson can be downloaded. These formats cover the majority of data available. You will be warned if a dataset can’t be downloaded.

To download data, you can either download the metadata using ods_search().

query <- ods_search("bins")
#> Note, the cached list of datasets from opendata.scot was last
#> downloaded on 2023-11-19

Then pass that data frame to ods_get().

data <- ods_get(query)

Or use the search argument in ods_get(search="my search term") to search and download any matching datasets in one step.

data <- ods_get(search = "bins")

By default, you will be asked if you want to save the data locally on the first download. Optionally, you can refresh the data or avoid being asked to save data by setting the refresh or ask arguments.

data <- ods_get(search = "Number of bikes", refresh = TRUE, ask = FALSE)

Plot

Once you have downloaded the dataset(s), you may wish to plot or map the data. Here’s a short example of how plotting can be done.

data <- ods_get(search = "Recycling Point Locations")

The ods_get() function returned a named list of data frames - lets select the one we want by name:

recycling_points <- data$Recycling_Points_Aberdeen_City_Council

Or alternatively select the data frame using index/position of the name in the list using [[]] notation.

names(data)
#> [1] "Glass_and_textiles_recycling_points_Aberdeenshire_Council"
#> [2] "Recycling_Points_Aberdeen_City_Council"                   
#> [3] "Recycling_Points_Moray_Council"
recycling_points <- data[[2]]

Geojson datasets are automating converted to simple feature ‘sf’ data. As we can see in the example the data frame is classed as “sf” which means spatial / geometry attributes are baked in.

class(recycling_points)
#> [1] "sf"         "data.frame"

You can see a geometry variable which contains the spatial co-ordinates.

head(recycling_points, 3)
#> Simple feature collection with 3 features and 7 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: -2.173564 ymin: 57.13422 xmax: -2.070197 ymax: 57.20478
#> Geodetic CRS:  WGS 84
#>   OBJECTID                            SITE
#> 1       47                      Asda, Dyce
#> 2       48                Balnagask Circle
#> 3       49 Bridge of Don, Park & Ride AECC
#>                                      ADDRESS MIXED_YN TEXTILE_YN BOOKS_YN
#> 1            Riverview Drive, Dyce. AB21 7NG        Y          Y        N
#> 2                   Grampian Court, AB11 8TY        N          Y        N
#> 3 Exhibition Avenue, Bridge of Don. AB23 8BL        Y          Y        N
#>           UPRN                   geometry
#> 1 009051155534 POINT (-2.173564 57.20478)
#> 2 009051155535 POINT (-2.070197 57.13422)
#> 3 009051155536 POINT (-2.089188 57.18388)

This allows the plot() function to automatically plot the coordinates.

plot(recycling_points$geometry, col = as.factor(recycling_points$TEXTILE_YN))

JSON

Some datasets are only available in JSON or CSV formats, unlike formats such as GeoJSON, these do not automatically have spatial geometry. For example, the Glasgow cycling counts data does have latitude and longitude variables but is downloaded as JSON and therefore does not have spatial geometry added by default. This type of data is downloaded as a data frame without a geometry column.

Let’s download the Glasgow cycle counts data which is only available in JSON format.

cycle_count <- ods_get(search = "lasgow City Council - Daily cycling counts from automatic cycling counters")
cycle_count <- cycle_count[[1]]
head(cycle_count, 4)
#> # A tibble: 4 × 10
#>   provider      area  siteID location latitude longitude startTime endTime count
#>   <chr>         <chr> <chr>  <chr>       <dbl>     <dbl> <chr>     <chr>   <int>
#> 1 Glasgow City… Glas… GLG01… CP St V…     55.9     -4.28 2016-08-… 2016-0…     0
#> 2 Glasgow City… Glas… GLG00… Archerh…     55.9     -4.35 2016-08-… 2016-0…     0
#> 3 Glasgow City… Glas… GLG00… Glasgow…     55.9     -4.26 2016-08-… 2016-0…     0
#> 4 Glasgow City… Glas… GLG00… Parnie …     55.9     -4.24 2016-08-… 2016-0…     0
#> # ℹ 1 more variable: usmart_id <chr>

We can see the cycle count JSON has been converted into a ‘flat’ tabulated data frame. This data frame can now be plotted as a chart, in the following example, let’s use the ggplot2 plotting library to create a graph. We’ll display counts over time for each location.

library(ggplot2)
library(scales)
# Convert character to date (to display date time correctly)
cycle_count$Date <- as.Date(substr(cycle_count$startTime, 1, 10))
cycle_count <- cycle_count[cycle_count$Date > as.Date("2022-12-31"), ]

# Plot
ggplot(cycle_count, aes(x = Date, y = count, colour = location)) +
  geom_line() +
  theme_minimal() +
  theme(legend.position = "none") +
  scale_x_date(date_labels = "%b-%Y") +
  scale_y_continuous(labels = comma)

This is a bit noisy so lets sum the counts for all locations to see total cycle counts.

library(ggplot2)
library(dplyr)
library(magrittr)

# Group by date and sum the count
sum_cycle_count <- cycle_count %>%
  group_by(Date) %>%
  summarise(`Total Count` = sum(count))
#  Graph the new `sum_count` variable
ggplot(sum_cycle_count, aes(x = Date, y = `Total Count`)) +
  geom_line(colour = "blue") +
  theme_minimal() +
  scale_x_date(date_labels = "%b-%Y") +
  scale_y_continuous(labels = comma)

We can also use the latitude and longitude columns to plot the points on a map.

library(sf)
#> Linking to GEOS 3.10.2, GDAL 3.4.1, PROJ 8.2.1; sf_use_s2() is TRUE
# Convert data frame into 'sf' geospatial dataframe
cycle_counts_geo <- st_as_sf(cycle_count, coords = c("longitude", "latitude"))
# Sum counts by location
cycle_counts_geo_sum <- cycle_counts_geo %>%
  group_by(location) %>%
  summarise(`Total Cycle Count` = sum(count))
# plot
c <- ggplot(cycle_counts_geo_sum) +
  geom_sf(aes(size = `Total Cycle Count`), colour = "blue", fill = NA) +
  theme_minimal()
c

Can you guess which points in Glasgow are most popular for cyclists? We can add a base layer map to reveal the streets.

library(ggmap)
library(purrr)

# Create bounding box to download background street map layer
bbox_glasgow <- map_dbl(st_bbox(cycle_counts_geo_sum), 1)
names(bbox_glasgow) <- c("left", "bottom", "right", "top")

# Download background layer
glasgow <- suppressMessages(ggmap(get_stamenmap(bbox_glasgow,
  zoom = 10,
  https = TRUE,
  messaging = TRUE
)))

# Plot background with cycle counts on top
cycle_map <- glasgow + geom_point(
  data = cycle_counts_geo_sum,
  aes(
    x = unlist(map(geometry, 1)),
    y = unlist(map(geometry, 2)),
    size = `Total Cycle Count`
  ),
  col = "green",
  pch = 16,
  alpha = 0.4
) +
  theme_minimal() +
  theme(
    axis.title = element_blank(),
    axis.text = element_blank()
  )


cycle_map

Cycle Trips Map

#> 
#> Attaching package: 'purrr'
#> The following object is masked from 'package:magrittr':
#> 
#>     set_names
#> The following object is masked from 'package:scales':
#> 
#>     discard

Looks like Victoria Bridge area is a hot spot for cyclists.