Perceptions of Probability

from https://github.com/zonination/perceptions

from https://github.com/zonination/perceptions

I'm currently working on a project where we are presenting information on the possible presence of particular species at sites across the state. We're considering the fairly ordinary words of 'high', 'medium', and 'low' to describe the probability, These words, of course, can connote a particular meaning to the users of this analysis. In order to make a decision on which words to use, we're beginning to look at the mean behind commonly used phrases.

This great work, by redditor zonation, provides of some great examples of what a surveys mean. This is based on Sherman Kent's work in the 1960's. Its definately provides a great framework to understand how people may understand our data presentation.

 

LiDAR and Forests: Part 1

I've been working with my friend Steve Mather for a while now on setting up an analysis of forest structure across Pennsylvania and Ohio using LiDAR data. Our work has been slow-going.

As a concurrent project, I've been exploring a similar analysis in R in order to think more about the question and come up with an analysis framework for undertaking the work. I recently came across the excellent 'lidr' package for R that allows one for analysis LiDAR data for forestry applications. Having a small amount of extra time available the other night, I decided to dive in and explore some of the data.

Photo by Tim Ebbs / Flickr, CC

Photo by Tim Ebbs / Flickr, CC

We're really interesting in comparing the forest structure as measured by LiDAR to field measurements by ecologists. Effectively, can we take a slice through the point cloud and extract a vertical column.  Our current plots record information across the forest strata: 0-0.5m, 0.5-1m. 1-2m, 2-5m, 5-10m, 10-15m, 15-20m, 20-35m, and >35m.

Each of the Pennsylvania LiDAR tiles covers is four kilometers to a side and thus the file size is usually about 350mb per tile. We have about 13,500 of them in the state, so we're only working with one for this test and we'll scale up later.  It was relatively straightforward to load a single LAS file into R using lidR. 

lidar = readLAS("50001990PAN.las")

Next, I generated a digital terrain model (DTM):

dtm = grid_terrain(lidar, res = 10, method = "knnidw", k = 6L)

This was a pretty quick proof-of-concept so the parameters for interpolating the grid are just placeholders. However, in later versions of this, we'll be able to use the DEM created by Pennsylvania TopoGeo as part of the original LiDAR collection. 

The next step was to create a normalized point cloud that removes the ground contours and leaves us with just the canopy heights.

lidar_norm = lidar - dtm

From here, the next step was to calculate point density in each of the strata layers we measured in the field. First, we pulled out a slice of the point cloud (here's the zero to half meter extract:

ht0_0.5m = lidar_norm %>% lasfilter(Z>0.33,Z<1.64)

Then we calculate the point density:

den0m_0.5m <- grid_density(ht0_0.5m, res=5)

Here's a plot of the density for eight strata l

The next step is now to figure out how to summarize the point could density in each slice of the layer cake.