Commit a343097

source commit: 5ccff3b
0 parents  commit a343097

127 files changed

+6735
-0
lines changed

01-intro-raster-data.md

Lines changed: 190 additions & 0 deletions
@@ -0,0 +1,190 @@
1+
---
2+
title: "Introduction to Raster Data"
3+
teaching: 15
4+
exercises: 5
5+
---
6+
7+
:::questions
8+
- What format should I use to represent my data?
9+
- What are the main data types used for representing geospatial data?
10+
- What are the main attributes of raster data?
11+
:::
12+
13+
:::objectives
14+
- Describe the difference between raster and vector data.
15+
- Describe the strengths and weaknesses of storing data in raster format.
16+
- Distinguish between continuous and categorical raster data and identify types of datasets that would be stored in each format.
17+
:::
18+
19+
## Introduction
20+
21+
This episode introduces the two primary data models used to digitally represent the Earth's surface: raster and vector. After briefly introducing both models, the episode focuses on the raster representation and describes the major features and types of raster data. This workshop covers how to work with both raster and vector datasets, so it is essential that we understand the basic structure of these types of data and the kinds of phenomena they can represent.
22+
23+
## Data Structures: Raster and Vector
24+
25+
The two primary data models used to represent the Earth's surface digitally are raster and vector. **Raster data** is stored as a grid of values that are rendered on a map as pixels (also known as cells), where each pixel represents a value for a location on the Earth's surface. Examples of raster data are satellite images and aerial photographs. Data stored according to the **vector data** model is represented by points, lines, or polygons. Examples of vector data are points of interest, buildings (often represented as building footprints), and roads.
26+
27+
Representing phenomena as vector data allows you to add attribute information to them. For instance, a polygon of a house can carry multiple attributes describing its address, such as the street name, house number, zip code, and city. Vector data is discussed in more detail in the [next episode](02-intro-vector-data.md).
28+
29+
When working with spatial information, you will find that many phenomena can be represented as either vector or raster data. A house, for instance, can be represented by a set of raster cells that all have the same value, or by a vector polygon that carries attribute information (figure 1). Which data model is used depends on the purpose for which the data was collected and is intended to be used. As a rule of thumb, discrete phenomena such as buildings, roads, trees, and signs are represented as vector data, whereas continuous phenomena such as temperature, wind speed, and elevation are represented as raster data. Still, a spatial data analyst often has to transform data from vector to raster or the other way around. Keep in mind that such conversions can reduce data quality.
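
To make the data-quality caveat concrete, here is a minimal sketch of a vector-to-raster conversion in plain Python. The footprint coordinates, the grid size, and the function name `rasterize_rectangle` are all made up for illustration, and the "cell center inside the shape" rule is only one of several possible rasterization rules.

```python
def rasterize_rectangle(xmin, ymin, xmax, ymax, cell_size, ncols, nrows):
    """Burn an axis-aligned rectangle into a grid whose lower-left corner is (0, 0).

    A cell gets the value 1 when its center falls inside the rectangle, else 0.
    Row 0 is the top (northern) row.
    """
    grid = []
    for row in range(nrows):
        # y coordinate of the cell center, counting rows from the top
        y = (nrows - row - 0.5) * cell_size
        grid_row = []
        for col in range(ncols):
            x = (col + 0.5) * cell_size
            inside = xmin <= x <= xmax and ymin <= y <= ymax
            grid_row.append(1 if inside else 0)
        grid.append(grid_row)
    return grid


# Hypothetical house footprint of 12 m x 8 m, given as (xmin, ymin, xmax, ymax)
house = (3.0, 4.0, 15.0, 12.0)
vector_area = (house[2] - house[0]) * (house[3] - house[1])

# Rasterize onto a coarse 4 x 4 grid with 5 m cells
cell_size = 5.0
grid = rasterize_rectangle(*house, cell_size=cell_size, ncols=4, nrows=4)
raster_area = sum(map(sum, grid)) * cell_size ** 2

print(vector_area, raster_area)
```

With these assumed numbers the polygon covers 96 square meters, but only two 5 m cells have their center inside it, so the raster version covers 50 square meters. A smaller cell size narrows the gap at the cost of more cells.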
30+
31+
### Raster Data
32+
33+
Raster data is any pixelated (or gridded) data where each pixel has a value and is associated with a specific geographic location. The value of a pixel can be continuous (e.g., elevation, temperature) or categorical (e.g., land-use type). If this sounds familiar, it is because this data structure is very common: it's how we represent any digital image. A geospatial raster is only different from a digital photo in that it is accompanied by spatial information that connects the data to a particular location. This includes the raster's extent and cell size, the number of rows and columns, and its Coordinate Reference System (CRS), which will be explained in [episode 3](03-crs.md) of this workshop.
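
As a rough sketch of this idea, the snippet below pairs a plain grid of values with the spatial information that turns it into a geospatial raster. The `Raster` class, the elevation values, the corner coordinates, and the CRS string are all illustrative assumptions; real raster libraries store the same information in their own structures.

```python
from dataclasses import dataclass


@dataclass
class Raster:
    values: list      # rows of cell values, row 0 is the top (northern) row
    xmin: float       # x coordinate of the left edge of the grid
    ymax: float       # y coordinate of the top edge of the grid
    cell_size: float  # width and height of a square cell, in CRS units
    crs: str          # identifier of the coordinate reference system

    @property
    def nrows(self):
        return len(self.values)

    @property
    def ncols(self):
        return len(self.values[0])

    def cell_center(self, row, col):
        """Return the (x, y) coordinate of the center of a cell."""
        x = self.xmin + (col + 0.5) * self.cell_size
        y = self.ymax - (row + 0.5) * self.cell_size
        return x, y


# Made-up elevation values on a 2 x 3 grid with 10 m cells
elevation = Raster(
    values=[[310.0, 315.5, 320.1],
            [305.2, 312.8, 318.4]],
    xmin=731000.0,
    ymax=4713000.0,
    cell_size=10.0,
    crs="EPSG:32618",  # assumed for illustration only
)

print(elevation.nrows, elevation.ncols)
print(elevation.cell_center(1, 2))
```

Without `xmin`, `ymax`, `cell_size`, and `crs`, the `values` grid would be no different from the pixels of an ordinary digital photo.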
34+
35+
![Raster Concept (Source: National Ecological Observatory Network (NEON))](fig/E01/raster_concept.png){alt="raster concept"}
36+
37+
Some examples of continuous rasters include:
38+
39+
1. Precipitation maps.
40+
2. Elevation maps.
41+
42+
A map of elevation for *Harvard Forest* derived from the [NEON AOP LiDAR sensor](https://www.neonscience.org/data-collection/airborne-remote-sensing)
43+
is below. Elevation is represented as a continuous numeric variable in this map. The legend
44+
shows the continuous range of values in the data from around 300 to 420 meters.
45+
46+
![Continuous Elevation Map: HARV Field Site](fig/E01/continuous-elevation-HARV-plot-01.png){alt="elevation Harvard forest"}
47+
48+
Some rasters contain categorical data where each pixel represents a discrete
49+
class such as a landcover type (e.g., "forest" or "grassland") rather than a
50+
continuous value such as elevation or temperature. Some examples of classified
51+
maps include:
52+
53+
1. Landcover / land-use maps.
54+
2. Elevation maps classified as low, medium, and high elevation.
55+
56+
![USA landcover classification](fig/E01/USA_landcover_classification.png){alt="USA landcover classification"}
57+
58+
The map above shows the contiguous United States with landcover as categorical
59+
data. Each color is a different landcover category. (Source: Homer, C.G., et
60+
al., 2015, Completion of the 2011 National Land Cover Database for the
61+
conterminous United States-Representing a decade of land cover change
62+
information. Photogrammetric Engineering and Remote Sensing, v. 81, no. 5, p.
63+
345-354)
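
The second example in the list above, an elevation map classified as low, medium, and high, can be sketched in a few lines of Python. The elevation values and the two class thresholds are hypothetical; in practice the thresholds depend on the study area.

```python
def classify_elevation(value, low_max=350.0, medium_max=400.0):
    """Map a continuous elevation value (meters) to a categorical class."""
    if value < low_max:
        return "low"
    if value < medium_max:
        return "medium"
    return "high"


# Made-up continuous raster: each cell holds an elevation in meters
elevation = [
    [305.0, 348.0, 362.5],
    [371.0, 399.9, 415.0],
]

# Categorical raster with the same shape: each cell holds a class label
classified = [[classify_elevation(v) for v in row] for row in elevation]

for row in classified:
    print(row)
```

The grid keeps its shape and location; only the meaning of the cell values changes from a measurement to a class.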
64+
65+
:::challenge
66+
## Advantages and Disadvantages
67+
68+
With your neighbor, brainstorm potential advantages and
69+
disadvantages of storing data in raster format. Add your
70+
ideas to the Etherpad. The Instructor will discuss and
71+
add any points that weren't brought up in the small group
72+
discussions.
73+
74+
::::solution
75+
## Solution
76+
77+
Raster data has some important advantages:
78+
79+
* representation of continuous surfaces
80+
* potentially very high levels of detail
81+
* data is 'unweighted' across its extent - the geometry doesn't
82+
implicitly highlight features
83+
* cell-by-cell calculations can be very fast and efficient
84+
85+
The downsides of raster data are:
86+
87+
* very large file sizes as cell size gets smaller
88+
* currently popular formats don't embed metadata well (more on this later!)
89+
* can be difficult to represent complex information
90+
::::
91+
:::
92+
93+
### Important Attributes of Raster Data
94+
95+
#### Extent
96+
97+
The spatial extent is the geographic area that the raster data covers.
98+
The spatial extent of an object represents the geographic edge or
99+
location that is the furthest north, south, east and west. In other words, extent
100+
represents the overall geographic coverage of the spatial object.
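
As a minimal sketch, the extent of any spatial object can be computed from its coordinates by taking the smallest and largest x and y values. The coordinates below are made up, and the `(xmin, ymin, xmax, ymax)` ordering is a common convention rather than a universal one.

```python
def extent(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) coordinates."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical objects, each described by its (x, y) vertices
points = [(2.0, 3.0), (5.0, 7.0), (9.0, 4.0)]
line = [(1.0, 1.0), (4.0, 8.0), (10.0, 6.0)]

print(extent(points))
print(extent(line))
```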
101+
102+
![Spatial extent image (Image Source: National Ecological Observatory Network (NEON))](fig/E01/spatial_extent.png){alt="spatial extent objects"}
103+
104+
:::challenge
105+
## Extent Challenge
106+
107+
In the image above, the dashed boxes around each set of objects
108+
seem to imply that the three objects have the same extent. Is this
109+
accurate? If not, which object(s) have a different extent?
110+
111+
::::solution
112+
## Solution
113+
114+
The lines and polygon objects have the same extent. The extent for
115+
the points object is smaller in the vertical direction than the
116+
other two because there are no points on the line at y = 8.
117+
::::
118+
:::
119+
120+
#### Resolution
121+
122+
The resolution of a raster represents the area on the ground that each
123+
pixel of the raster covers. The image below illustrates the effect
124+
of changes in resolution.
125+
126+
![Resolution image (Source: National Ecological Observatory Network (NEON))](fig/E01/raster_resolution.png){alt="resolution image"}
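
Extent and resolution together determine how many cells a raster has. The sketch below assumes square cells and a made-up extent of 1000 m by 500 m; `grid_shape` is a hypothetical helper name.

```python
import math


def grid_shape(xmin, ymin, xmax, ymax, resolution):
    """Return (nrows, ncols) needed to cover an extent with square cells."""
    ncols = math.ceil((xmax - xmin) / resolution)
    nrows = math.ceil((ymax - ymin) / resolution)
    return nrows, ncols


# Hypothetical extent in meters: (xmin, ymin, xmax, ymax)
bounds = (0.0, 0.0, 1000.0, 500.0)

for resolution in (100.0, 10.0, 1.0):
    nrows, ncols = grid_shape(*bounds, resolution)
    print(f"{resolution:>5} m cells: {nrows} rows x {ncols} columns = {nrows * ncols} cells")
```

Each tenfold refinement of the cell size multiplies the number of cells by one hundred, which is why high-resolution rasters quickly become large files.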
127+
128+
### Raster Data Format for this Workshop
129+
130+
Raster data can come in many different formats. For this workshop, we will use
131+
one of the most common formats for raster data, i.e. the GeoTIFF format, which has the extension `.tif`.
132+
A `.tif` file stores metadata or attributes about the file as embedded `tif tags`. For instance, your camera
133+
might store a tag that describes the make and model of the camera or the date
134+
the photo was taken when it saves a `.tif`. A GeoTIFF is a standard `.tif` image
135+
format with additional spatial (georeferencing) information embedded in the file
136+
as tags. These tags include the following raster metadata:
137+
138+
1. Extent
139+
2. Resolution
140+
3. Coordinate Reference System (CRS) - we will introduce this concept in [a later episode](03-crs.md)
141+
4. Values that represent missing data (`NoDataValue`) - we will introduce this
142+
concept in [a later episode](06-raster-intro.md).
143+
144+
We will discuss these attributes in more detail in [a later episode](06-raster-intro.md).
145+
In that episode, we will also learn how to use Python to extract raster attributes
146+
from a GeoTIFF file.
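
Until then, the following plain-Python sketch hints at why the `NoDataValue` matters. The value `-9999.0` and the small grid are assumptions made for illustration; the actual value is read from the file's metadata.

```python
NODATA = -9999.0  # hypothetical NoDataValue; real files declare their own


def mean_ignoring_nodata(grid, nodata):
    """Average all cells of a grid, skipping cells flagged as missing."""
    valid = [v for row in grid for v in row if v != nodata]
    if not valid:
        return None  # every cell is missing
    return sum(valid) / len(valid)


# Made-up elevation grid with one missing cell
grid = [
    [310.0, NODATA],
    [320.0, 330.0],
]

all_cells = [v for row in grid for v in row]
naive_mean = sum(all_cells) / len(all_cells)

print(naive_mean)
print(mean_ignoring_nodata(grid, NODATA))
```

Treating the placeholder as a real measurement drags the naive mean far below any actual elevation in the grid, whereas masking it gives a sensible result.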
147+
148+
:::callout
149+
## More Resources on the `.tif` format
150+
151+
* [GeoTIFF on Wikipedia](https://en.wikipedia.org/wiki/GeoTIFF)
152+
* [OSGEO TIFF documentation](https://trac.osgeo.org/geotiff/)
153+
:::
154+
155+
### Multi-band Raster Data
156+
157+
A raster can contain one or more bands. One type of multi-band raster
158+
dataset that is familiar to many of us is a color image. A basic color
159+
image often consists of three bands: red, green, and blue (RGB). Each
160+
band represents light reflected from the red, green or blue portions of
161+
the electromagnetic spectrum. The pixel brightness for each band, when
162+
composited, creates the colors that we see in an image.
163+
164+
![RGB multi-band raster image (Source: National Ecological Observatory Network (NEON).)](fig/E01/RGBSTack_1.jpg){alt="multi-band raster"}
165+
166+
We can plot each band of a multi-band image individually.
167+
168+
Or we can composite all three bands together to make a color image.
169+
170+
In a multi-band dataset, the rasters will always have the same extent,
171+
resolution, and CRS.
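
A minimal sketch of this idea, using three tiny made-up bands stored as nested lists: because the bands share the same grid, compositing amounts to stacking the values found at each cell position. The `composite` helper is hypothetical.

```python
# Three hypothetical 2 x 2 bands with brightness values from 0 to 255
red = [[255, 0], [0, 255]]
green = [[0, 255], [0, 255]]
blue = [[0, 0], [255, 255]]


def composite(*bands):
    """Stack bands of identical shape into one grid of per-cell tuples."""
    shapes = {(len(band), len(band[0])) for band in bands}
    if len(shapes) != 1:
        raise ValueError("all bands must have the same number of rows and columns")
    nrows, ncols = shapes.pop()
    return [
        [tuple(band[r][c] for band in bands) for c in range(ncols)]
        for r in range(nrows)
    ]


rgb = composite(red, green, blue)

print(red[0][0])   # a single band is an ordinary single-layer raster
print(rgb[1][1])   # the composite holds one (R, G, B) tuple per cell
```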
172+
173+
:::callout
174+
## Other Types of Multi-band Raster Data
175+
176+
Multi-band raster data might also contain:
177+
1. **Time series:** the same variable, over the same area, over time.
178+
2. **Multi- or hyperspectral imagery:** image rasters that have 4 or
179+
more (multi-spectral) or more than 10-15 (hyperspectral) bands. We
180+
won't be working with this type of data in this workshop, but you can
181+
check out the NEON Data Skills [Imaging Spectroscopy HDF5 in R](https://www.neonscience.org/hsi-hdf5-r)
182+
tutorial if you're interested in working with hyperspectral data cubes.
183+
:::
184+
185+
:::keypoints
186+
- Raster data is pixelated data where each pixel is associated with a specific location.
187+
- Raster data always has an extent and a resolution.
188+
- The extent is the geographical area covered by a raster.
189+
- The resolution is the area covered by each pixel of a raster.
190+
:::

02-intro-vector-data.md

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
1+
---
2+
title: "Introduction to Vector Data"
3+
teaching: 10
4+
exercises: 5
5+
---
6+
7+
:::questions
8+
- What are the main attributes of vector data?
9+
:::
10+
11+
:::objectives
12+
- Describe the strengths and weaknesses of storing data in vector format.
13+
- Describe the three types of vectors and identify types of data that would be stored in each.
14+
:::
15+
16+
## About Vector Data
17+
18+
Vector data structures represent specific features on the Earth's surface, and
19+
assign attributes to those features. Vectors are composed of discrete geometric
20+
locations (x, y values) known as vertices that define the shape of the spatial
21+
object. The organization of the vertices determines the type of vector that we
22+
are working with: point, line or polygon.
23+
24+
![Types of vector objects (Image Source: National Ecological Observatory Network (NEON))](fig/E02/pnt_line_poly.png){alt="vector data types"}
25+
26+
* **Points:** Each point is defined by a single x, y coordinate. There can be
27+
many points in a vector point file. Examples of point data include: sampling
28+
locations, the location of individual trees, or the location of survey plots.
29+
30+
* **Lines:** Lines are composed of many (at least 2) points that are connected.
31+
For instance, a road or a stream may be represented by a line. This line is
32+
composed of a series of segments; each "bend" in the road or stream represents a
33+
vertex that has a defined x, y location.
34+
35+
* **Polygons:** A polygon consists of 3 or more vertices that are connected and
36+
closed. The outlines of survey plot boundaries, lakes, oceans, and states or
37+
countries are often represented by polygons. Note that polygons can also contain one
38+
or multiple holes, for instance a plot boundary with a lake in it. These polygons are
39+
considered *complex* or *donut* polygons.
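
The three vector types can be sketched with plain Python tuples and lists. All coordinates below are made up, the polygon ring is assumed to close implicitly (the last vertex connects back to the first), and the area uses the shoelace formula, which applies to simple polygons without holes in a planar coordinate system.

```python
import math

# A point is a single (x, y) coordinate
point = (3.0, 4.0)

# A line is an ordered sequence of at least two connected vertices
line = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]

# A polygon is a closed ring of at least three vertices
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]


def line_length(vertices):
    """Sum the lengths of the straight segments between consecutive vertices."""
    return sum(math.dist(a, b) for a, b in zip(vertices, vertices[1:]))


def polygon_area(ring):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    # Pair every vertex with the next one, wrapping around to close the ring
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


print(line_length(line))
print(polygon_area(polygon))
```

Only the organization of the vertices differs between the three types: one vertex, an open chain of vertices, or a closed ring.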
40+
41+
:::callout
42+
## Data Tip
43+
44+
Sometimes boundary layers, such as states and countries, are stored as lines
45+
rather than polygons. However, these boundaries, when represented as a line,
46+
will not create a closed object with a defined area that can be filled.
47+
:::
48+
49+
:::challenge
50+
## Identify Vector Types
51+
52+
The plot below includes examples of two of the three types of vector
53+
objects. Use the definitions above to identify which features
54+
are represented by which vector type.
55+
56+
![Vector Type Examples](fig/E02/vector_types_examples.png){alt="vector type examples"}
57+
58+
::::solution
59+
## Solution
60+
61+
State boundaries are shown as polygons. The Fisher Tower location is
62+
represented by a purple point. There are no line features shown.
63+
Note that at a different scale the Fisher Tower could also have been represented as a polygon.
64+
Keep in mind that the purpose for which a dataset was created and is intended to be used determines
65+
which vector type it uses.
66+
::::
67+
:::
68+
69+
Vector data has some important advantages:
70+
71+
* The geometry itself contains information about what the dataset creator thought was important
72+
* The geometry structures hold information in themselves - why choose point over polygon, for instance?
73+
* Each geometry feature can carry multiple attributes instead of just one, e.g. a database of cities can have attributes for name, country, population, etc
74+
* Data storage can, depending on the scale, be very efficient compared to rasters
75+
* When working with network analysis, for instance to calculate the shortest route between A and B, topologically correct lines are essential. This is not possible with raster data.
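
To illustrate the point about attributes, the sketch below stores each feature as a geometry plus a dictionary of attributes, loosely following the layout used by GeoJSON. The city names, coordinates, and populations are invented placeholders.

```python
# Each feature pairs one geometry with any number of attributes
cities = [
    {
        "geometry": (4.0, 52.0),
        "properties": {"name": "City A", "country": "X", "population": 850_000},
    },
    {
        "geometry": (13.0, 52.5),
        "properties": {"name": "City B", "country": "Y", "population": 3_600_000},
    },
    {
        "geometry": (2.0, 48.5),
        "properties": {"name": "City C", "country": "Z", "population": 120_000},
    },
]

# Attributes can be queried without touching the geometry at all
large = [
    city["properties"]["name"]
    for city in cities
    if city["properties"]["population"] > 500_000
]

print(large)
```

A raster cell, by contrast, holds a single value per band, so the same information would need one raster layer per attribute.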
76+
77+
The downsides of vector data include:
78+
79+
* Potential bias in datasets - what didn't get recorded? Often vector data are interpreted datasets like topographical maps and have been collected by someone else, for another purpose.
80+
* Calculations involving multiple vector layers need to do math on the
81+
geometry as well as the attributes, which potentially can be slow compared to raster calculations.
82+
83+
Vector datasets are in use in many industries besides geospatial fields. For
84+
instance, computer graphics are largely vector-based, although the data
85+
structures in use tend to join points using arcs and complex curves rather than
86+
straight lines. Computer-aided design (CAD) is also vector-based. The
87+
difference is that geospatial datasets are accompanied by information tying
88+
their features to real-world locations.
89+
90+
## Vector Data Format for this Workshop
91+
92+
Like raster data, vector data can also come in many different formats. For this
93+
workshop, we will use the GeoPackage format. GeoPackage is developed by the [Open Geospatial Consortium](https://www.ogc.org/) and is *an open, standards-based, platform-independent, portable, self-describing, compact format for transferring geospatial information* (source: [https://www.geopackage.org/](https://www.geopackage.org/)). A GeoPackage file, with extension **.gpkg**, is a single file that contains the geometries of features, their attributes and information about the coordinate reference system (CRS) used.
94+
95+
Another vector format that you will probably come across quite often is the Shapefile. Although we will not be using this format in this lesson, we believe it is useful to understand how it works. Shapefile is a multi-file format: each shapefile consists of multiple files in the same directory, of which the `.shp`, `.shx`, and `.dbf` files are mandatory. Other non-mandatory but very important files are the `.prj` and `.shp.xml` files.
96+
97+
- The `.shp` file stores the feature geometry itself
98+
- `.shx` is a positional index of the feature geometry that allows seeking forwards and backwards quickly through the geometry records
99+
- `.dbf` contains the tabular attributes for each shape.
100+
- `.prj` file indicates the Coordinate reference system (CRS)
101+
- `.shp.xml` contains the Shapefile metadata.
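
Because a shapefile is really a set of sibling files, a quick check for missing parts can save some confusion. The sketch below uses only the standard library; the helper names and the path `data/streams.shp` are hypothetical, and the check assumes lowercase file extensions.

```python
from pathlib import Path

MANDATORY = (".shp", ".shx", ".dbf")
OPTIONAL = (".prj", ".shp.xml")


def shapefile_components(shp_path):
    """Report which component files exist next to a given .shp path."""
    base = Path(shp_path).with_suffix("")  # e.g. data/streams
    return {ext: Path(str(base) + ext).exists() for ext in MANDATORY + OPTIONAL}


def missing_mandatory(shp_path):
    """List the mandatory extensions that are absent for this shapefile."""
    components = shapefile_components(shp_path)
    return [ext for ext in MANDATORY if not components[ext]]


# Hypothetical path; adjust to a shapefile on your own machine
print(shapefile_components("data/streams.shp"))
print(missing_mandatory("data/streams.shp"))
```

An empty list from `missing_mandatory` means the three required files are present. A missing `.prj` does not make the shapefile unreadable, but it leaves the CRS undefined.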
102+
103+
Together, the Shapefile includes the following information:
104+
105+
* **Extent** - the spatial extent of the shapefile (i.e. geographic area that
106+
the shapefile covers). The spatial extent for a shapefile represents the
107+
combined extent for all spatial objects in the shapefile.
108+
* **Object type** - whether the shapefile includes points, lines, or polygons.
109+
* **Coordinate reference system (CRS)**
110+
* **Other attributes** - for example, a line shapefile that contains the
111+
locations of streams, might contain the name of each stream.
112+
113+
Because the structures of points, lines, and polygons are different, each
114+
individual shapefile can only contain one vector type (all points, all lines
115+
or all polygons). You will not find a mixture of point, line and polygon
116+
objects in a single shapefile.
117+
118+
:::callout
119+
## More Resources on Shapefiles
120+
121+
More about shapefiles can be found on
122+
[Wikipedia](https://en.wikipedia.org/wiki/Shapefile). Shapefiles are often publicly
123+
available from government services, such as [this page containing all administrative boundaries for countries in the world](https://gadm.org/download_country.html) or
124+
[topographical vector data from OpenStreetMap](https://download.geofabrik.de/).
125+
:::
126+
127+
:::callout
128+
## Why not both?
129+
130+
Very few formats can contain both raster and vector data - in fact, most are
131+
even more restrictive than that. Vector datasets are usually locked to one
132+
geometry type, e.g. points only. Raster datasets can usually only encode one
133+
data type; for example, you can't have a multiband GeoTIFF where one layer is
134+
integer data and another is floating-point. There are sound reasons for this -
135+
format standards are easier to define and maintain, and so is metadata. The
136+
effects of particular data manipulations are more predictable if you are
137+
confident that all of your input data has the same characteristics.
138+
:::
139+
140+
141+
:::keypoints
142+
- Vector data structures represent specific features on the Earth's surface along with attributes of those features.
143+
- Vector data is often interpreted data and collected for a different purpose than you would want to use it for.
144+
- Vector objects are either points, lines, or polygons.
145+
:::
