class: center, middle, inverse, title-slide # Spatial data and the tidyverse ## 🌐
combining tidy tools for geocomputation with R ### Robin Lovelace, Jannes Menchow and Jakub Nowosad ### GeoStat 2018. Source code:
github.com/geocompr/geostats_18
--- <!-- msg: Looks like I'll have a second change to demonstrate this code: 55% of people in the poll wanted sea level rise (SLR) to be the example dataset for the dplyr/sf integration workshop tomorrow. Watch this space --> <!-- 14:15 - 15:00 - 45 minute talk --> <!-- Aims: show-off the book, provide overview, some useful things in it --> layout: true background-image: url(xaringan_stuff/img/r_geocomp_background.png) background-size: cover --- # 'Team geocompr' <figure> <img align="right" src="xaringan_stuff/img/geocompr_logo.png" width = "60%", height = "60%"/> </figure> - [Jakub Nowosad](https://nowosad.github.io/): developer of GeoPAT + more. Lecturing 09:00 on Wednesday. - [Jannes Muenchow](http://www.geographie.uni-jena.de/en/Muenchow.html), creator of **RQGIS**. Lecturing Weds 13:30 on GIS Bridges and Weds 15:30 on Spatial CV. -- - [Robin Lovelace](https://www.robinlovelace.net/), creator of **stplanr**, co-author of Efficent R Programming. Lecturing 11:00 Weds on spatial data and the **tidyverse**. -- - Front cover image competition! Submissions (code / ideas welcome) by Thursday evening. Prize: ~~$100~~ $150 CRC Press book voucher. --- # Aim This workshop will introduce you to working with spatial data 'in the tidyverse'. By this we mean handling spatial datasets using functions (such as ` %>% ` and `filter()`) and concepts (such as type stability) from R packages that are part of the metapackage **tidyverse**, which can now be installed from CRAN with the following command: ```r install.packages("tidyverse") ``` This functionality is possible thanks to **sf**, a recent package (first release in 2016) that implements the open standard data model *simple features*. Get **sf** with: ```r install.packages("sf") ``` The workshop also uses a dataset from the **spData** package, which can be installed with: ```r install.packages("spData") ``` For more on this see: [github.com/Robinlovelace/geocompr](https://github.com/Robinlovelace/geocompr). --- ## Context - Software for 'data science' is evolving - In R, packages **ggplot2** and **dplyr** have become immensely popular and now they are a part of the **tidyverse** - These packages use the 'tidy data' principles for consistency and speed of processing (from `vignette("tidy-data")`): > - Each variable forms a column. > - Each observation forms a row. > - Each type of observational unit forms a table - Historically spatial R packages have not been compatible with the **tidyverse** --- background-image: url("https://pbs.twimg.com/media/CvzEQcfWIAAIs-N.jpg") background-size: cover --- ## Enter sf - **sf** is a recently developed package for spatial (vector) data - Combines the functionality of three previous packages: **sp**, **rgeos** and **rgdal** - Has many advantages, including: - Faster data I/O - More geometry types supported - Compatibility with the *tidyverse* That's the topic of this workshop --- background-image: url("https://media1.giphy.com/media/Hw5LkPYy9yfVS/giphy.gif") ## Geocomputation with R - A book we are working on for CRC Press (to be published in 2018) - Chapters 3 ~~and 4~~ of this book form the basis of the worked examples presented here --- # Reproducibility + collaboration > Collaboration is most important aspect of science (and reprex the most important R package!) (Jakub Nowosad, 2018, GEOSTAT) <br> Slides: https://geocompr.github.io/presentations/ <br> Source code: https://github.com/geocompr/geostats_18 To install all packages used in the book: ```r devtools::install_github("geocompr/geocompkg") ``` ```r library(sf) ``` ``` ## Linking to GEOS 3.5.1, GDAL 2.2.2, proj.4 4.9.2 ``` ```r library(raster) ``` ``` ## Loading required package: sp ``` --- # Conflicting function names ```r library(tidyverse) ``` ``` ## ── Attaching packages ──────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ── ``` ``` ## ✔ ggplot2 3.0.0 ✔ purrr 0.2.5 ## ✔ tibble 1.4.2 ✔ dplyr 0.7.6 ## ✔ tidyr 0.8.1 ✔ stringr 1.3.1 ## ✔ readr 1.1.1 ✔ forcats 0.3.0 ``` ``` ## ── Conflicts ─────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ── ## ✖ tidyr::extract() masks raster::extract() ## ✖ dplyr::filter() masks stats::filter() ## ✖ dplyr::lag() masks stats::lag() ## ✖ dplyr::select() masks raster::select() ``` --- # Detour: System commands / console -- - Option 1: use `system()` ```r system("ls code/", intern = TRUE) ``` ``` ## [1] "geocompr" "r_gis_bridges" "spatial_cv" ``` -- - Option 2: use *sh commands directly, e.g.: ```zsh ls code/ ``` ``` ## geocompr ## r_gis_bridges ## spatial_cv ``` --- # System commands are handy - Important step on path to programming - Will help your R programming career - Trick: shorten github URLs from the command-line: ```zsh curl -i https://git.io -F "url=https://github.com/geocompr/geostats_18/releases/download/0.1/data.zip" \ -F "code=spatial-tidyvers" # Date: Wed, 22 Aug 2018 04:09:48 GMT # Status: 201 Created # Content-Type: text/html;charset=utf-8 # Location: https://git.io/spatial-tidyvers ``` --- # Get the data Data for the sea level rise code in this presentation: see the releases in [geostats_18](https://github.com/geocompr/geostats_18/releases): ```r download.file("https://git.io/spatial-tidyvers", "data.zip") unzip("data.zip") file.rename("pres/geocompr/data/", "data/") prague = raster::raster("data/prague_elev.tif") ``` --- # Check it works ```r prague = raster::raster("data/prague_elev.tif") plot(prague) p = stplanr::geo_code("Pruhonice") %>% st_point() %>% st_sfc() plot(p, add = TRUE, cex = 5, lwd = 3) ``` ![](spatial-tidyverse_files/figure-html/unnamed-chunk-11-1.png)<!-- --> --- # Ready to go? > Confusion is good (Roger Bivand 2018, GEOSTAT/OpenGeoHub) -- ![](https://media.giphy.com/media/OMeGDxdAsMPzW/giphy.gif) --- ## Reading and writing spatial data ```r library(sf) library(spData) prague_sf = read_sf("data/prague.geojson") # same as: st_read("data/prague.geojson") ``` `write_sf()/st_write()` writes data `st_write(prague_sf, 'data/prague_sf.gpkg')`. See supported formats with: `sf::st_drivers()`. Details: Chapter 6 of our book: [geocompr.robinlovelace.net/read-write.html](https://geocompr.robinlovelace.net/read-write.html) --- ## Structure of the sf objects ```r prague_sf ``` ```r class(prague_sf) ``` ``` ## [1] "sf" "data.frame" ``` --- ## sf vs sp in the tidyverse - **sp** precedes **sf** - Together with the **rgdal** and **rgeos** package, it creates a powerful tool to works with spatial data - Many spatial R packages still depends on the **sp** package, therefore it is important to know how to convert **sp** to and from **sf** objects ```r library(spData) world_sp = as(world, "Spatial") world_sf = st_as_sf(world_sp) ``` - The structures in the **sp** packages are more complicated - `str(world_sf)` vs `str(world_sp)` -- - **sp** doesn't play well with the **tidyverse**: ```r world_sp %>% filter(name_long == "England") ``` `Error in UseMethod("filter_") : no applicable method for 'filter_' applied to an object of class "c('SpatialPolygonsDataFrame', 'SpatialPolygons', 'Spatial')"` --- ## Attribute operations: filtering ```r world %>% filter(name_long == "United Kingdom") ``` ``` ## Simple feature collection with 1 feature and 10 fields ## geometry type: MULTIPOLYGON ## dimension: XY ## bbox: xmin: -7.572168 ymin: 49.96 xmax: 1.681531 ymax: 58.635 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## iso_a2 name_long continent region_un subregion type ## 1 GB United Kingdom Europe Europe Northern Europe Country ## area_km2 pop lifeExp gdpPercap geom ## 1 249986.4 64613160 81.30488 38251.79 MULTIPOLYGON (((-6.197885 5... ``` -- Base R equivalent: ```r world[world$name_long == "United Kingdom", ] ``` -- Question: ```r identical( world %>% filter(name_long == "United Kingdom"), world[world$name_long == "United Kingdom", ] ) # ? ``` ``` ## [1] FALSE ``` # Detour: Row names - Usually don't matter but they can bite ```r u1 = world %>% filter(name_long == "United Kingdom") u2 = world[world$name_long == "United Kingdom", ] row.names(u2) = 1 identical(u1, u2) ``` ``` ## [1] FALSE ``` -- - Advanced challenge: how to make u1 and u2 identical? --- # Regex - What does each of these produce? ```r world %>% filter(grepl(pattern = "United", x = name_long)) world[grepl(pattern = "United", x = world$name_long)] grepl(pattern = "United", x = world$name_long) world %>% filter(grepl(pattern = "^U", x = name_long)) world %>% filter(grepl(pattern = "m$", x = name_long)) world %>% filter(grepl(pattern = "Un|om", x = name_long)) ``` --- ## Aggregation ```r world_cont = world %>% group_by(continent) %>% summarize(pop_sum = sum(pop, na.rm = TRUE)) ``` ``` ## Simple feature collection with 8 features and 2 fields ## geometry type: GEOMETRY ## dimension: XY ## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513 ## epsg (SRID): 4326 ## proj4string: +proj=longlat +datum=WGS84 +no_defs ## # A tibble: 8 x 3 ## continent pop_sum geom ## <fct> <dbl> <MULTIPOLYGON [°]> ## 1 Africa 1.15e9 (((49.54352 -12.46983, 49.80898 -12.89528, 50.05651… ## # ... with 7 more rows ``` - The `st_set_geometry` function can be used to remove the geometry column: ```r world_df = st_set_geometry(world_cont, NULL) class(world_df) ``` ``` ## [1] "tbl_df" "tbl" "data.frame" ``` --- ## Spatial operations It's a big topic which includes: - Spatial subsetting - Spatial joining/aggregation - Topological relations - Distances - Spatial geometry modification - Raster operations (map algebra) See [Chapter 4](http://geocompr.robinlovelace.net/spatial-operations.html) of *Geocomputation with R* --- ## Spatial subsetting ```r lnd_buff = lnd[1, ] %>% st_transform(crs = 27700) %>% # uk CRS st_buffer(500000) %>% st_transform(crs = 4326) near_lnd = world[lnd_buff, ] plot(near_lnd$geom) ``` ![](spatial-tidyverse_files/figure-html/unnamed-chunk-27-1.png)<!-- --> - What is going with the country miles away? --- ## Multi-objects Some objects have multiple geometries: ```r st_geometry_type(near_lnd) ``` ``` ## [1] MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON MULTIPOLYGON ## [6] MULTIPOLYGON MULTIPOLYGON ## 18 Levels: GEOMETRY POINT LINESTRING POLYGON ... TRIANGLE ``` Which have more than 1? ```r data.frame(near_lnd$name_long, sapply(near_lnd$geom, length)) ``` ``` ## near_lnd.name_long sapply.near_lnd.geom..length. ## 1 France 3 ## 2 Germany 1 ## 3 Luxembourg 1 ## 4 Belgium 1 ## 5 Netherlands 1 ## 6 Ireland 1 ## 7 United Kingdom 2 ``` --- ## Subsetting contiguous polygons ```r near_lnd_new = world %>% st_cast(to = "POLYGON") %>% filter(st_intersects(., lnd_buff, sparse = FALSE)) plot(st_geometry(near_lnd_new)) ``` ![](spatial-tidyverse_files/figure-html/unnamed-chunk-30-1.png)<!-- --> --- # Tidyverse pitfall 1: row names You can also do tidy spatial subsetting: ```r near_lnd_tidy = world %>% filter(st_intersects(., lnd_buff, sparse = FALSE)) ``` But a) it's verbose and b) it boshes the row names! ```r row.names(near_lnd_tidy) ``` ``` ## [1] "1" "2" "3" "4" "5" "6" "7" ``` ```r row.names(near_lnd) ``` ``` ## [1] "44" "122" "129" "130" "131" "134" "144" ``` - Consequences for joining - rownames can be useful! Also affects non-spatial data - [tidyverse/dplyr#366](https://github.com/tidyverse/dplyr/issues/366) --- # Tidyverse pitfall 2: Duplicate column names See [edzer/sf#544](https://github.com/r-spatial/sf/issues/544) ```r names(world) ``` ``` ## [1] "iso_a2" "name_long" "continent" "region_un" "subregion" ## [6] "type" "area_km2" "pop" "lifeExp" "gdpPercap" ## [11] "geom" ``` ```r names(lnd_buff) ``` ``` ## [1] "NAME" "GSS_CODE" "HECTARES" "NONLD_AREA" "ONS_INNER" ## [6] "SUB_2009" "SUB_2006" "geometry" ``` ```r lnd_buff$iso_a2 = NA ``` ```r st_intersection(world, lnd_buff) # works world_tidy = st_as_sf(as_tibble(world)) st_intersection(world_tidy, lnd_buff) # fails ``` ``` Error: Column `iso_a2` must have a unique name ``` --- # Tidyverse pitfall 3: binding rows ```r rbind(near_lnd, near_lnd) # works bind_rows(near_lnd, near_lnd) ``` ``` Error in .subset2(x, i, exact = exact) : attempt to select less than one element in get1index ``` But this does: ```r near_lnd_data = st_set_geometry(near_lnd, NULL) d = bind_rows(near_lnd_data, near_lnd_data) d_sf = st_sf(d, geometry = c(near_lnd$geom, near_lnd$geom)) plot(d_sf) ``` ![](spatial-tidyverse_files/figure-html/unnamed-chunk-37-1.png)<!-- --> --- ## CRS ```r na_2163 = world %>% filter(continent == "North America") %>% st_transform(2163) #US National Atlas Equal Area st_crs(na_2163) ``` ``` ## Coordinate Reference System: ## EPSG: 2163 ## proj4string: "+proj=laea +lat_0=45 +lon_0=-100 +x_0=0 +y_0=0 +a=6370997 +b=6370997 +units=m +no_defs" ``` ![](figs/coord_compare.png) --- ## Basic maps - Basic maps of `sf` objects can be quickly created using the `plot()` function: ```r plot(world[0]) ``` ```r plot(world["pop"]) ``` ![](figs/plot_compare.png) --- ## tmap https://cran.r-project.org/web/packages/tmap/vignettes/tmap-nutshell.html ```r library(tmap) tmap_mode("plot") #check the "view" tm_shape(world, projection = "+proj=moll") + tm_polygons("lifeExp", title = "Life expactancy", style = "pretty", palette = "RdYlGn") + tm_style("grey") ``` <img src="spatial-tidyverse_files/figure-html/unnamed-chunk-43-1.png" style="display: block; margin: auto;" /> --- ## leaflet ```r library(leaflet) leaflet(world) %>% addTiles() %>% addPolygons(color = "#444444", weight = 1, fillOpacity = 0.5, fillColor = ~colorQuantile("YlOrRd", lifeExp)(lifeExp), popup = paste(round(world$lifeExp, 2))) ```
--- ## Raster data in the tidyverse Raster data is not yet closely connected to the **tidyverse**, however: - Some functions from the **raster** package works well in `pipes` - You can convert vector data to the `Spatial*` form using `as(my_vector, "Spatial")`before working on raster-vector interactions - There are some initial efforts to bring raster data closer to the **tidyverse**, such as [tabularaster](https://github.com/hypertidy/tabularaster), [sfraster](https://github.com/mdsumner/sfraster), or [fasterize](https://github.com/ecohealthalliance/fasterize) - The development of the [stars](https://github.com/r-spatial/stars), package focused on multidimensional, large datasets should start soon. It will allow pipe-based workflows --- # Practical exercises In groups of 2:4, -- A) Beginner/tidyverse consolidation: Tackle the exercises in [Chapter 3](http://geocompr.robinlovelace.net/attr.html) of Geocomputation with R -- B) Intermediate/advanced: Build on Edzer's [`breweries` analysis](https://edzer.github.io/UseR2017/) and answer the questions using tidyverse functions: 1. which was the earliest founded brewery? 2. which has the longest name? 3. group the breweries by the century they were founded: which has, on average, most beer types? 4. Join the breweries to a 5km buffer around the trails: which trail is the best for number/diversity of breweries? -- C) Advanced/raster: Build on the [SLR article in geocompr.github.io/geocompkg](https://geocompr.github.io/geocompkg/articles/sea-level-rise.html) and the Geocomputation with R [slides](https://geocompr.github.io/presentations/) to: 1. To find the % of Sczecin flooded under a 20m scenario of SLR? 1. What % of Prague area will be flooded by 200m of SLR?! 2. The % of another town that would be flooded by another SLR value. <!-- 3. Devise a research programme to do a 5 year study on geocomputation for sea level rise research. --> -- D) Solo working the geocompr chapter that's most interesting to you --- # What next - Who wants to do A, B and C? - Get into groups (move around!) - Ask at least 1 question or help at least 1 person -- - Bonus: create a reprex showing (part of) your analysis -- Check the SLR science in refs here: [geocompr.github.io/presentations](https://geocompr.github.io/presentations/geostat18-geocomputation.html#64) -- Have fun! ![](https://media.giphy.com/media/OMeGDxdAsMPzW/giphy.gif) --- ## Geocomputation with R The online version of the book is at http://geocompr.robinlovelace.net/ and its source code at https://github.com/robinlovelace/geocompr. We encourage contributions on any part of the book, including: - Improvements to the text, e.g. clarifying unclear sentences, fixing typos (see guidance from [Yihui Xie](https://yihui.name/en/2013/06/fix-typo-in-documentation/)) - Changes to the code, e.g. to do things in a more efficient way - Suggestions on content (see the project's [issue tracker](https://github.com/Robinlovelace/geocompr/issues) and the [work-in-progress](https://github.com/Robinlovelace/geocompr/tree/master/work-in-progress) folder for chapters in the pipeline) Please see [our-style.md](https://github.com/Robinlovelace/geocompr/blob/master/our-style.md) for the book's style.