Tuesday, September 10, 2024

GIS 5935 Module 1.3 - Data Quality Assessment

Module 1.3 of Special Topics in GIS continued the topic of data quality, this time focusing on the completeness of datasets, roadway networks in particular. Two datasets were provided for the completeness assessment: one was obtained from Jackson County, Oregon, and the other was downloaded from the United States Census Bureau TIGER shapefile repository. While both datasets contained roadway centerlines, their total lengths differed noticeably. The spatial analysis performed on these datasets was meant to ascertain which one was more complete, based on length alone. Initially, before any processing was performed, the TIGER shapefile consisted of 11,382.7 kilometers of roadway centerlines while the Jackson County dataset accounted for 10,873.3 kilometers, making the TIGER dataset more complete by this measure.

The next step of this lab was to analyze completeness following the method of [Haklay, 2010]. Essentially, this method consists of overlaying a grid index on the datasets and creating a thematic map of their percentage differences. For this lab, the grid consisted of 5-kilometer squares set within the confines of the county border. Next, all roadways that lay outside of the grid index were clipped away. After this, the roadways had to be split at the boundaries of each grid cell, and then the individual roadway sections within each cell had to be dissolved into one multi-part feature. Once these processes were completed for each dataset, a comparison between the two could be made on a cell-by-cell basis [see map below].
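To make the split-and-dissolve idea more concrete, here is a minimal pure-Python sketch of what those steps boil down to: cutting each road segment at the grid cell boundaries and adding up the length that lands in each cell. This is not the GIS tooling used in the lab. The function names, the grid origin, and the sample roads are all hypothetical, coordinates are assumed to be projected and in kilometers, and clipping to the county grid extent is left out.

```python
import math
from collections import defaultdict

CELL_KM = 5.0  # 5-kilometer grid cells, as in the lab


def clipped_length(p, q, xmin, ymin, xmax, ymax):
    """Length of segment p-q inside one cell (Liang-Barsky clipping)."""
    (x0, y0), (x1, y1) = p, q
    dx, dy = x1 - x0, y1 - y0
    t0, t1 = 0.0, 1.0
    for d, lo, hi, start in ((dx, xmin, xmax, x0), (dy, ymin, ymax, y0)):
        if d == 0:
            # Segment runs parallel to this pair of cell edges. A segment lying
            # exactly on a shared edge is given to the upper/right cell only.
            if start < lo or start >= hi:
                return 0.0
        else:
            ta, tb = (lo - start) / d, (hi - start) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 >= t1:
                return 0.0
    return (t1 - t0) * math.hypot(dx, dy)


def length_per_cell(polylines, origin=(0.0, 0.0), cell=CELL_KM):
    """Sum roadway length per (col, row) cell, like a split plus dissolve."""
    ox, oy = origin
    totals = defaultdict(float)
    for line in polylines:
        for p, q in zip(line, line[1:]):
            # Only visit cells touched by the segment's bounding box.
            c0 = math.floor((min(p[0], q[0]) - ox) / cell)
            c1 = math.floor((max(p[0], q[0]) - ox) / cell)
            r0 = math.floor((min(p[1], q[1]) - oy) / cell)
            r1 = math.floor((max(p[1], q[1]) - oy) / cell)
            for col in range(c0, c1 + 1):
                for row in range(r0, r1 + 1):
                    xmin, ymin = ox + col * cell, oy + row * cell
                    part = clipped_length(p, q, xmin, ymin, xmin + cell, ymin + cell)
                    if part > 0:
                        totals[(col, row)] += part
    return dict(totals)


# Hypothetical roads: one running east-west, one running diagonally.
sample_roads = [
    [(0.0, 2.0), (12.0, 2.0)],
    [(0.0, 0.0), (10.0, 10.0)],
]
sample_lengths = length_per_cell(sample_roads)
```

Running this once per dataset gives two dictionaries keyed by grid cell, which is the form needed for a cell-by-cell comparison.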

For this assignment, the Jackson County dataset was chosen as the baseline for all comparisons. [Haklay, 2010] suggests that the completeness of datasets obtained from local jurisdictions often surpasses that of datasets downloaded from national agencies or volunteered geographic information sources, such as OpenStreetMap. Once the baseline dataset was chosen, executing the following formula on each grid cell provided a percentage difference that was used to create the map above:

((Jackson County Length - TIGER Length) / Jackson County Length) * 100
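As a quick illustration, the formula can be written as a small Python function. This is only a sketch: the function name and the per-cell lengths are made up for the example, and returning None for a cell with no baseline roadway is my own assumption about how to handle division by zero.

```python
def percent_difference(baseline_km, other_km):
    """(baseline - other) / baseline * 100, or None if the baseline is empty."""
    if baseline_km == 0:
        return None  # undefined: no Jackson County roadway in this cell
    return (baseline_km - other_km) / baseline_km * 100.0


# Hypothetical per-cell lengths in kilometers, keyed by (col, row).
jackson_cells = {(0, 0): 10.0, (1, 0): 4.0, (2, 0): 0.0}
tiger_cells = {(0, 0): 8.0, (1, 0): 6.0, (2, 0): 1.5}

cell_differences = {
    cell: percent_difference(jackson_cells[cell], tiger_cells.get(cell, 0.0))
    for cell in jackson_cells
}

# The same formula applied to the county-wide totals quoted earlier in the post.
overall_difference = percent_difference(10873.3, 11382.7)
```

Applied to the county-wide totals from the start of the post, the result comes out to roughly -4.7 percent, meaning TIGER is the longer of the two overall.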

As displayed in the map's legend, positive percentages represent cells where the Jackson County dataset is more complete than the TIGER dataset. Conversely, negative percentages represent cells where the TIGER dataset is more complete than the Jackson County dataset. Lastly, it is worth mentioning that the distribution of the percentage differences was skewed to the left, with some drastic outliers in the negative percentage range; because of this skewed distribution, I decided to apply a manual interval classification rather than an equal interval or quantile classification. This decision kept the few extreme outliers from distorting the class breaks and misrepresenting the rest of the data.
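The effect of an outlier on the class breaks can be shown with a short Python sketch. The break values, labels, and sample percentages below are hypothetical, since the actual breaks from the map are not listed in this post.

```python
from bisect import bisect_right

# Hypothetical manual class breaks, in percent.
MANUAL_BREAKS = [-50.0, -10.0, 10.0, 50.0]
LABELS = ["< -50", "[-50, -10)", "[-10, 10)", "[10, 50)", ">= 50"]


def classify(value, breaks=MANUAL_BREAKS):
    """Index of the class a value falls in (lower bound inclusive)."""
    return bisect_right(breaks, value)


def equal_interval_breaks(values, k):
    """Interior breaks for k equal-width classes between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + width * i for i in range(1, k)]


# Hypothetical percentage differences with one extreme negative outlier.
sample = [-400.0, -8.0, -3.0, 0.0, 4.0, 12.0, 100.0]

manual_classes = [classify(v) for v in sample]
equal_breaks = equal_interval_breaks(sample, 5)
equal_classes = [classify(v, equal_breaks) for v in sample]
```

With these made-up values, the equal interval breaks are stretched by the single -400 outlier, so most cells crowd into the top two classes, while the manual breaks still separate the values near zero.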

Overall, this assignment was another exciting opportunity to explore the ongoing issue of spatial data quality in the realm of Geographic Information Sciences.


Source:

Haklay, M. (2010). How Good is Volunteered Geographical Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets. Environment and Planning B: Planning and Design, 37(4), 682-703.

