Wednesday, January 15, 2025

GIS 6005 Module 1 - Map Design and Typography

Module One of GIS 6005 - Communicating GIS revolved around the cartographic and typographical design principles that should be followed to create effective maps. The five design principles studied were as follows:

  • Visual contrast
  • Legibility
  • Figure-ground organization
  • Hierarchical organization
  • Balance
These principles were practiced in all five parts of the lab, while typographical design, or labelling, was introduced in the last three sections of the assignment.

For part one, datasets of the Austin, Texas metropolitan area were provided and were to be compiled into a cartographic reference map for tourists coming to the city. A simple layout was provided with all required map elements, but the final map [shown below] does not incorporate any of the original layout's elements. Here is a brief synopsis of the five design principles and how they were applied to the final map:

Visual contrast 'relates to how map features differ from each other and their background' [Kimerling, 2012, p. 133]. This was achieved by using bright colors for the symbology of the emphasized feature classes, set against the earthy tones of the background.

Legibility is 'the degree to which something can be read and deciphered' [Kimerling, 2012, p. 132]. Careful consideration was given to the font types, sizes, and colors throughout the map, ensuring they can be read at a reasonable distance.

Figure-ground organization is 'a perceptual phenomenon in which our mind and eye work together to spontaneously organize what we are viewing into two contrasting impressions - the figure, on which our eye settles, and the amorphous ground below or behind it' [Kimerling, 2012, p. 136]. To achieve this organization, a darker color was used for the focal point of the map to visually bring it to the forefront of the page.

Visual hierarchy is 'the graphic structuring of the features that make up a map' [Kimerling, 2012, p. 137]. This hierarchy was achieved by ensuring that the focal point of the map [Travis County] carries the most visual weight, that the title and subtitle come second to the map, and that the remaining map elements hold the least visual weight, so they do not steal unnecessary attention from the user's eyes.

Finally, balance 'involves the harmonious organization of the mapped area and any marginalia on the [map]...' [Kimerling, 2012, p. 140]. This design principle offers much more flexibility than the others, is more subjective in nature, and is achievable through many differing methods. This map has a more symmetrical, or formal, balance, with the map dead center and the marginalia organized on both sides at the bottom. This balance contrasts sharply with the balance achieved in part two of the lab [see next map below].

Part two of this lab involved creating a map for a lumber company that illustrates how much land can be harvested in two land leases, both located in the state of Alaska. All five design principles introduced in part one were applied to this map as well [see below], but the final product is quite different between the two maps.


Part three of this assignment still focused on the five cartographic design principles but introduced labelling practices into the design process. Typographical standards provide the cartographer with general guidelines, but sacrifices sometimes need to be made to achieve the desired result. The required deliverable was a map of San Francisco, California and some of its major landmarks. Here are a few of the design decisions that went into this map: first, since the map is of San Francisco, the city's name was given the most weight by using a bold, larger font than the rest of the labels. A sans serif font was used in the labelling of all cities and neighborhoods, following the cartographic convention that manmade landmarks are labelled with a sans serif font. The parks were labelled with a serif font in a green hue that contrasts with the shade of green used for the parks themselves; the spacing between the letters of these labels was also increased, spreading the labels over a larger area and visually conveying that areal features are being labelled. Lastly, a serif font was also used for natural features, in a brown hue that contrasts with the shade of brown used for the land mass. Since there are roadways all over the map, they were displayed in a light shade of grey, and a halo was applied to most labels overlaying the streets to increase legibility. These decisions led to a map that is visually balanced, aesthetically pleasing, and hierarchically organized to emphasize the most important landmarks.


Finally, for parts four and five, the deliverable was a map of Mexico and its significant landmarks. The same principles applied in part three were relevant to parts four and five as well, but more consideration was required because the amount of labelling on this map was somewhat daunting. In part four, the only required labelling was the rivers, which provided an opportunity to explore the various settings and achieve a desirable result. As shown in the map below, a blue, italicized serif font was used for these labels, and they were manipulated to curve along the linework of the feature class. Once the river labels were added and properly formatted, cities, states, and the capital city were added to the map. With all the labels in place, it was apparent that design choices, and sacrifices, would be required to create a legible end product. The first decision was to reduce the number of cities included on the map; to accomplish this, all cities with a population below 250,000 were eliminated. This greatly enhanced legibility, but the map was still highly congested around the capital city. Due to the small size of the Federal District, and because it is a federal district [not a state], it was eliminated from the map as well. These design choices led to a final product [see below] that was substantially more legible than one displaying a label for every feature.

Overall, this lab assignment was very fulfilling and provided numerous opportunities to get familiar with the cartographic and typographic design principles that are required to create an effective, high-quality map.

Sources:

Kimerling, A. J. (2012). Map Use: Reading, Analysis, Interpretation (7th ed.). Esri Press Academic.

Wednesday, October 9, 2024

GIS 5935 Module 3.1 - Scale Effect and Spatial Data Aggregation

The final module of Special Topics in GIS focused on scale / resolution and data aggregation. The first portion of the lab explored two vector-based datasets [consisting of points, lines, and polygons] and how they were affected as the scale of the datasets was changed. As a dataset is generalized to a smaller map scale, sample points are eliminated as the space between them is reduced. Lines and polygons are therefore left with fewer vertices, resulting in geometries that are over-generalized or eliminated from the map entirely. This is displayed in the map below, with the left-side map showing delineated watersheds and the right-side map highlighting water bodies for the same swath of land in North Carolina. The darkest shade of blue is the original dataset, the medium shade represents the same dataset at a 1:24,000 scale, and the light blue represents the dataset at a 1:100,000 scale. It is evident that as the map scale decreases [from 1:24,000 to 1:100,000], the number of line features decreases, eliminating line and polygon features from the dataset; hence, more line and polygonal features are captured by the original and larger-scale [1:24,000] datasets. Due to this generalization effect, careful consideration of the appropriate scale must take place to ensure an accurate representation of the data.
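As a rough illustration of this vertex-elimination effect, the sketch below runs Douglas-Peucker simplification on a hypothetical stream segment at two tolerances. This is my own minimal sketch using the shapely library [the lab itself used ArcGIS], and the tolerance values are illustrative stand-ins for different target scales, not actual 1:24,000 / 1:100,000 rules:

# Douglas-Peucker generalization via shapely; larger tolerance
# plays the role of a smaller (more generalized) map scale.
from shapely.geometry import LineString

# A hypothetical stream segment with closely spaced vertices.
stream = LineString([(0, 0), (1, 0.4), (2, -0.3), (3, 0.5),
                     (4, -0.2), (5, 0.1), (6, 0)])

for tolerance in (0.25, 0.6):
    simplified = stream.simplify(tolerance, preserve_topology=True)
    print(f"tolerance {tolerance}: {len(stream.coords)} vertices "
          f"-> {len(simplified.coords)} vertices")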


The second portion of the lab assignment focused on spatial data aggregation, or more specifically, the Modifiable Areal Unit Problem [MAUP]. Essentially, the MAUP arises when the same dataset produces different results depending on how the data is aggregated into zones or shown at different scales; these are called the Zone Effect and the Scale Effect, respectively. In the four maps below, the same dataset was used but aggregated into different zones, or districts [this is spatial data aggregation]: census blocks, congressional house districts, zip codes, and counties. A brief visual comparison of these four maps shows how dramatically the results can differ from one another based on the type of aggregation used. Like the scale / resolution issue discussed in the first portion of this assignment, it is very important to acknowledge the Modifiable Areal Unit Problem and carefully consider its potential effects on each spatial data analysis.
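A toy numeric illustration of the zone effect, with entirely invented values, shows how the same underlying data can tell two different stories depending on the zone boundaries:

# The same eight point values aggregated under two hypothetical
# zoning schemes produce different zone means (the zone effect).
import numpy as np

values = np.array([2, 4, 6, 8, 10, 12, 14, 16])  # same underlying data

# Scheme A: two zones split down the middle.
scheme_a = [values[:4], values[4:]]
# Scheme B: two zones interleaved (different district boundaries).
scheme_b = [values[::2], values[1::2]]

print("Scheme A zone means:", [float(z.mean()) for z in scheme_a])  # [5.0, 13.0]
print("Scheme B zone means:", [float(z.mean()) for z in scheme_b])  # [8.0, 10.0]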


The third and final portion of the lab demonstrated how spatial data can be used to manipulate results, especially in the realm of politics. Gerrymandering is the manipulation of political districting to create an 'intentional bias' favoring one political party over another. To discover and combat this practice, the Polsby-Popper score can be used to determine the compactness of a political district; the Polsby-Popper score is a ratio between the area of the district and its perimeter, and is computed with the following equation:
PP = 4πA / P²

where A is the area of the district and P is its perimeter; this score falls within a range of 0 to 1, with 1 being perfect compactness [Morgan & Evans, 2018]. After running the Polsby-Popper test on all congressional districts in the continental United States, it was determined that the district in the map below scored the lowest, with a Polsby-Popper score of 0.02948!
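A minimal Python sketch of this formula [my own illustration; the lab computed the score on district geometries in ArcGIS] shows why elongated districts score so poorly:

# Polsby-Popper compactness: 4*pi*A / P^2, ranging from 0 (least)
# to 1 (most compact). Area and perimeter must share planar units.
import math

def polsby_popper(area: float, perimeter: float) -> float:
    return 4 * math.pi * area / perimeter ** 2

# A circle scores 1.0 by construction; an elongated 1 x 100
# rectangle scores far lower, hinting at a gerrymandered shape.
print(round(polsby_popper(math.pi * 5**2, 2 * math.pi * 5), 4))  # 1.0
print(round(polsby_popper(1 * 100, 2 * (1 + 100)), 4))           # ~0.0308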


Informational Source:

Morgan, J. D. and Evans, J. (2018). Aggregation of Spatial Entities and Legislative Redistricting. The Geographic Information Science & Technology Body of Knowledge (3rd Quarter 2018 Edition), John P. Wilson (Ed.). DOI: 10.22224/gistbok/2018.3.6

Tuesday, September 24, 2024

GIS 5935 Module 2.2 - Surface Interpolation

Module 2.2 of Special Topics in GIS was an exploration of surface interpolation and some of the different methods that can be employed to produce a dataset of estimated values between known [sample, or testing] points. Bolstad and Manson [2022] define surface interpolation as the 'prediction of variables at unmeasured locations, based on a sampling of the same variables at known locations' [p. 510]. This was accomplished by using four different techniques and comparing the results to discern which model most accurately portrayed the data.

The first exercise was a comparison of Digital Elevation Models produced by Inverse Distance Weighted [IDW] and Spline interpolation. The IDW model estimates unknown values in inverse proportion to the distance from known values at sample, or testing, points; essentially, the greater the distance from a sample point, the less influence that point has in determining a cell's estimated value. The Spline model employs mathematical functions, or polynomials, to form a smooth curved surface between the known sample points [Bolstad & Manson, 2022].
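For clarity, here is a minimal numpy sketch of the IDW estimate at a single unknown location; the sample coordinates, elevations, and power parameter below are hypothetical illustrations rather than the lab's actual settings:

# IDW: weights fall off with distance raised to a power (2 is the
# common default), so distant samples contribute less.
import numpy as np

def idw(xy_samples, z_samples, xy_target, power=2.0):
    dists = np.linalg.norm(xy_samples - xy_target, axis=1)
    if np.any(dists == 0):                 # target sits on a sample point
        return z_samples[np.argmin(dists)]
    weights = 1.0 / dists ** power         # influence falls with distance
    return np.sum(weights * z_samples) / np.sum(weights)

samples = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
elevations = np.array([100.0, 140.0, 120.0])
print(idw(samples, elevations, np.array([2.0, 2.0])))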

After these models were constructed, the Raster Calculator geoprocessing tool was run to determine the mathematical difference between the two; the results are shown in the map below.


As illustrated in the legend, areas shaded brown represent raster cells where the Spline model produced higher elevation values, and areas shaded purple represent cells where the IDW model did. It is noteworthy that nearly all the white areas are places where sample points were collected. Mathematically, the split between purple and brown areas is 52% and 48%, respectively.

The remainder of the lab was an analysis of Biochemical Oxygen Demand [BOD] levels in Tampa Bay, Florida using the following models: Nearest Neighbor [Thiessen Polygons], Inverse Distance Weighted, Regularized Spline, and Tension Spline. While IDW and Spline were described above, spline interpolation was taken one step further to explore the difference between the Regularized and Tension types. The difference between the two is based on the weight parameter: the Regularized Spline produces a smooth, continuous surface, while the Tension Spline is coarser, with values constrained more closely to the range of the sample data [ESRI, 2024]. The other model used was Nearest Neighbor, or Thiessen Polygon, interpolation. Mathematically speaking, this is the least intensive method, as it 'assigns a value for any unsampled location that is equal to the value found at the nearest sample location' [Bolstad & Manson, 2022, p. 516]. This method creates polygons that extend outward [in all directions] from the known sample points until they are equidistant between neighboring sample points. The results for all four methods are shown in the maps below.
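As an aside, the nearest-neighbor assignment can be sketched in a few lines; the k-d tree approach, BOD values, and grid below are my own invented illustration, not the lab's data:

# Nearest-neighbor (Thiessen) assignment: every unsampled cell
# inherits the value of its closest sample point.
import numpy as np
from scipy.spatial import cKDTree

samples = np.array([[1.0, 1.0], [8.0, 2.0], [4.0, 9.0]])
bod = np.array([2.5, 4.1, 3.3])  # hypothetical BOD values at samples

# Build a grid of unsampled locations and query the nearest sample --
# the raster equivalent of Thiessen polygons.
xx, yy = np.meshgrid(np.linspace(0, 10, 50), np.linspace(0, 10, 50))
grid = np.column_stack([xx.ravel(), yy.ravel()])
_, nearest = cKDTree(samples).query(grid)
surface = bod[nearest].reshape(xx.shape)
print(surface.shape, surface.min(), surface.max())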







The interpolation method I would choose to best represent BOD concentrations in Tampa Bay would be the Spline technique, specifically the Regularized Spline type. The reasoning behind this decision is the nature of any substance being diluted in water: regardless of the substance, once introduced into a body of water, it disperses evenly and continuously throughout the adjacent areas. The IDW method tends to create 'hot spots' with peaks occurring at the testing points, while the Tension Spline type creates a continuous, but not smooth, surface. Nearest Neighbor will provide estimated BOD concentrations, but these are generalized over discrete regions of the bay and do not form a smooth, continuous interpolated model. The Regularized Spline type, however, 'creates a smooth, gradually changing surface with values that may lie outside the sample data range' [ESRI, 2024]. This, in my opinion, provides a much more accurate estimate of BOD concentrations in Tampa Bay than the other three methods discussed in this assignment.


Sources:

Bolstad, Paul & Manson, Steven. (2022). GIS Fundamentals: A First Text on Geographic Information Systems (7th Edition). Eider Press.

Environmental Systems Research Institute. (2024). How Spline Works. https://pro.arcgis.com/en/pro-app/latest/tool-reference/3d-analyst/how-spline-works.htm

Monday, September 16, 2024

GIS 5935 Module 2.1 - Surfaces [Triangulated Irregular Networks and Digital Elevation Models]

Module 2.1 of Special Topics in GIS was based on surfaces, particularly Triangulated Irregular Networks [TINs] and Digital Elevation Models [DEMs]. The first portion of the lab was an opportunity to import elevation data, set the ground source [giving it 3D visualization], and learn how to exaggerate the vertical distances to enhance the visual aesthetics of the landscape. Once these fundamental concepts were practiced, an analytical problem was presented.

The second portion of the lab was to create a Suitability Map for a study area, illustrating the best locations for a ski resort and its associated ski run. Suitability was determined based on the slope, elevation, and aspect [directional face] of the landscape. The dark green areas of the map below display the most suitable locations for the resort, and the red areas signify areas that are unsuitable for this tourist destination.
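A minimal sketch of this kind of reclassify-and-combine suitability logic is shown below; the thresholds and random rasters are invented placeholders, not the lab's actual criteria:

# Multi-criteria suitability: score each criterion 0/1, then sum.
import numpy as np

rng = np.random.default_rng(0)
slope = rng.uniform(0, 45, (5, 5))          # degrees, hypothetical raster
elevation = rng.uniform(500, 3000, (5, 5))  # meters
aspect = rng.uniform(0, 360, (5, 5))        # degrees from north

slope_ok = (slope >= 10) & (slope <= 30)     # a skiable grade
elev_ok = elevation >= 1500                  # higher ground, more snow
aspect_ok = (aspect >= 315) | (aspect <= 45) # north-facing slopes
suitability = slope_ok.astype(int) + elev_ok.astype(int) + aspect_ok.astype(int)
print(suitability)  # 3 = most suitable, 0 = unsuitable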


The final portion of this lab dealt with TINs and DEMs. Environmental Systems Research Institute [ESRI] defines a Triangulated Irregular Network as 'a form of vector-based digital geographic data... constructed by triangulating a set of vertices [points]. The vertices are connected with a series of edges to form a network of triangles' [ESRI, 2024]. The documentation for this model type is quite extensive [see the ESRI sources below]. Furthermore, ESRI defines a Digital Elevation Model as 'a raster representation of a continuous surface, usually referencing the surface of the earth' [ESRI, 2024]; the complete DEM documentation is likewise listed in the sources. Once these two models were created, contour lines were derived at 100-meter intervals [see map below]. Contours represent lines of equal elevation that are spaced at equal intervals [Bolstad & Manson, 2022].
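As a rough illustration of deriving contours from a raster surface, the sketch below contours a synthetic DEM with matplotlib; the surface is invented, and only the 100-meter interval mirrors the lab:

# Contours every 100 m from a fake raster DEM (a single smooth hill).
import numpy as np
import matplotlib.pyplot as plt

x, y = np.meshgrid(np.linspace(0, 10, 200), np.linspace(0, 10, 200))
dem = 800 + 400 * np.exp(-((x - 5) ** 2 + (y - 5) ** 2) / 8)

levels = np.arange(800, 1300, 100)  # contour lines every 100 meters
cs = plt.contour(x, y, dem, levels=levels)
plt.clabel(cs, fmt="%d m")
plt.title("Contours derived from a raster DEM")
plt.show()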


Due to the data types of the models used to create the contour lines, the output of each varies in appearance. Since TINs are built from points, edges, and triangular facets, their contour lines take on a very segmented appearance. Conversely, a raster-based DEM creates contour lines with a more transitional and organic aesthetic.


Sources:

Bolstad, Paul & Manson, Steven. (2022). GIS Fundamentals: A First Text on Geographic Information Systems (7th Edition). Eider Press.

Environmental Systems Research Institute. (2024). Exploring Digital Elevation Models

Environmental Systems Research Institute. (2024). What is a TIN Surface?  


Tuesday, September 10, 2024

GIS 5935 Module 1.3 - Data Quality Assessment

Module 1.3 of Special Topics in GIS was a continuation of data quality; this module focused on the completeness of datasets, particularly roadway networks. Two datasets were provided for the completeness assessment: one was obtained from Jackson County, Oregon, and the other was downloaded from the United States Census Bureau TIGER shapefile repository. While both datasets contain roadway centerlines, their overall lengths are significantly different. The spatial analysis performed on these datasets was to ascertain which one is more complete, based on length alone. Initially, before any processing was performed, the TIGER shapefile consisted of 11,382.7 kilometers of roadway centerlines while the Jackson County dataset accounted for 10,873.3 kilometers, making the TIGER dataset more complete by this measure.

The next step of the lab was to analyze completeness according to the method of Haklay [2010]. Essentially, this method consists of overlaying a grid index on top of the datasets and creating a thematic map of their percentage differences. For this lab, the grid consisted of 5-kilometer squares set within the confines of the county border. Next, all roadways that lay outside the grid index were clipped away. After this, the roadways were split at the boundary of each grid cell, and the individual roadway sections within each cell were dissolved into one multipart feature. Once these processes were completed for each dataset, a comparison between the two could be made on a cell-by-cell basis [see map below].

For this particular assignment, the Jackson County dataset was used as the baseline for all comparisons. Haklay [2010] suggests that the completeness of datasets obtained from local jurisdictions often surpasses that of datasets downloaded from national agencies or volunteered geographic information projects, such as OpenStreetMap. Once the baseline dataset was determined, executing the following formula on each grid cell provided the percentage difference used to create the map above:

[[Jackson County Length - TIGER Length] / Jackson County Length] * 100
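A minimal sketch of this per-cell calculation, with hypothetical lengths standing in for the dissolved roadway totals:

# Per-cell completeness: positive -> Jackson County more complete;
# negative -> TIGER more complete. All lengths are invented.
jackson_km = {"cell_01": 42.7, "cell_02": 18.3, "cell_03": 25.0}
tiger_km   = {"cell_01": 39.1, "cell_02": 21.6, "cell_03": 25.0}

for cell, jc_len in jackson_km.items():
    tg_len = tiger_km[cell]
    pct_diff = (jc_len - tg_len) / jc_len * 100
    print(f"{cell}: {pct_diff:+.1f}%")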

As displayed in the map's legend, positive percentages represent cells where the Jackson County dataset is more complete than TIGER; conversely, negative percentages represent cells where the TIGER dataset is more complete than the Jackson County shapefiles. Lastly, it is worth mentioning that the distribution of the percentage differences was skewed, with some drastic outliers in the negative percentage range; because of this skewed distribution, I decided to apply a manual interval classification rather than an equal interval or quantile scheme. This decision prevented the few extreme outliers from distorting the map's class breaks.

Overall, this assignment was another exciting opportunity to explore the ongoing issue of spatial data quality in the realm of Geographic Information Sciences.


Source:

Haklay, M. (2010). How Good is Volunteered Geographic Information? A Comparative Study of OpenStreetMap and Ordnance Survey Datasets. Environment and Planning B: Planning and Design, 37(4), 682-703.

Monday, September 2, 2024

GIS 5935 Module 1.2 - Spatial Data Quality

Lab assignment 1.2 of Special Topics in GIS involved performing an accuracy assessment according to the National Standard for Spatial Data Accuracy [NSSDA]. The Positional Accuracy Handbook states that 'the National Standard for Spatial Data Accuracy describes a way to measure and report positional accuracy of features found within a geographic dataset. Approved in 1998, the NSSDA recognizes the growing need for digital spatial data and provides a common language for reporting accuracy' [Planning, 1999]. For this assignment, two datasets were provided for a study area located in the City of Albuquerque, New Mexico. The first dataset was obtained from the City of Albuquerque, and the second was a StreetMap USA dataset, a product of TeleAtlas distributed by ESRI with the ArcGIS software package. Both datasets consist of roadway networks and can be seen in the map below; the green lines represent the City of Albuquerque [ABQ] dataset, and the red lines represent the StreetMap USA dataset.

The process for this accuracy assessment is clearly outlined in the Positional Accuracy Handbook [Planning, 1999] and is briefly summarized through the remainder of this post. The first step was to determine whether the test involves horizontal accuracy, vertical accuracy, or both; this assignment focused on horizontal accuracy only. Second, testing points were selected throughout the study area; these points are displayed as black X's in the map above, and the handbook provides specific guidelines for choosing them appropriately. Next, an independent dataset of higher accuracy had to be chosen to complete the assessment. To accomplish this, 2006 Digital Orthophoto Quadrangles [United States Geological Survey] were used to identify street intersections as reference points, and horizontal accuracy assessments were performed on the two provided datasets. It is noteworthy that a quick visual analysis shows the City of Albuquerque dataset aligning more consistently with the USGS DOQs than its StreetMap USA counterpart; this visual impression should be consistent with the results calculated at the end of the NSSDA assessment. Measurements were then taken from each dataset to the digitized reference points at the chosen street intersections on the USGS DOQs. Once these measurements were obtained, the NSSDA Horizontal Accuracy Statistic Worksheet could be completed for each dataset; the results for the City of Albuquerque are shown below:
Here are the results obtained from the StreetMap USA dataset:
As predicted, the value calculated for the City of Albuquerque dataset is much lower than the value calculated for the StreetMap USA dataset. The next, or sixth, step of the assessment is to construct an accuracy statement that clearly reports the dataset's accuracy at the 95% confidence level. The accuracy statements for the two datasets are written below:

ABQ Dataset:

Using the National Standard for Spatial Data Accuracy, the data set tested 14.27ft horizontal accuracy at 95% confidence level.

StreetMap Dataset:

Using the National Standard for Spatial Data Accuracy, the data set tested 379.66ft horizontal accuracy at 95% confidence level.

This provides the user with a clearly defined radius within which 95% of all features will fall. As determined above, a user can be confident that 95% of all features in the City of Albuquerque dataset fall within 14.27 feet of their true geographic location, and 95% of all features in the StreetMap USA dataset fall within 379.66 feet of theirs. In conclusion, the horizontal accuracy of the City of Albuquerque dataset is substantially higher than that of the StreetMap USA dataset. The final step of the NSSDA Horizontal Accuracy Assessment is to include the report in a comprehensive description of the dataset called metadata, or 'data about the data' [Planning, 1999]. An example of such a description is provided below:

Example of Detailed positional accuracy statements as reported in metadata:

Digitized features of the roadway infrastructure located within the study area of Albuquerque, New Mexico were obtained from the City of Albuquerque and from StreetMap USA, a product of TeleAtlas and distributed by ESRI with ArcGIS. Those obtained from the City of Albuquerque tested at 14.27ft horizontal accuracy at the 95% confidence level, and those obtained from StreetMap USA tested at 379.66ft horizontal accuracy at the 95% confidence level using modified NSSDA testing procedures. See Section 5 for entity information of digitized feature groups. See also Lineage portion of Section 2 for additional background. For a complete report of the testing procedures used, contact the University of West Florida GIS Department as noted in Section 6, Distribution Information.

Levels of vertical relief were not considered throughout the entire accuracy assessment of these two datasets.

All other features are generated by coordinate geometry and are based on a visually based framework of roadway intersections. Computed positions of roadway intersections, or testing points, are not based on individual field surveys. Although tests of randomly selected points for comparison may show varying degrees of accuracy between the provided datasets, overall visual analysis confirms higher levels of accuracy throughout the entire City of Albuquerque dataset. However, caution is necessary in use of roadway intersections as shown, due to the location process employed throughout this assessment. For more information, contact the GIS department at the University of West Florida.
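For reference, the core of the worksheet calculation can be sketched in a few lines of Python. This assumes the standard NSSDA multiplier of 1.7308 [applicable when the x and y error components are roughly equal]; the coordinate offsets below are hypothetical, not the lab's measurements:

# NSSDA horizontal accuracy statistic: RMSE of the radial errors
# between tested and reference points, times 1.7308.
import numpy as np

# Hypothetical coordinate offsets between tested and reference points (ft).
dx = np.array([2.1, -3.4, 1.8, -0.9, 4.2])
dy = np.array([-1.5, 2.7, -2.2, 3.1, -0.4])

rmse_r = np.sqrt(np.mean(dx ** 2 + dy ** 2))
nssda_95 = 1.7308 * rmse_r
print(f"Tested {nssda_95:.2f} ft horizontal accuracy at 95% confidence level")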


Source:

Minnesota Planning. (1999). Positional Accuracy Handbook: Using the National Standard for Spatial Data Accuracy to Measure and Report Geographic Data Quality. Minnesota Planning, St. Paul, MN.

Monday, August 26, 2024

GIS 5935 Module 1 - Data Precision and Accuracy

Module 1 of Special Topics in GIS dealt with the precision and accuracy of waypoints gathered by a GPS data collection unit. The International Organization for Standardization defines precision as 'the closeness of agreement between independent test results obtained under stipulated conditions' [ISO, 2006]. With regard to the lab assignment, precision was assessed by determining the proximity of fifty waypoints gathered at a single location using a Garmin GPSMAP 76 data collection unit. As shown in the map below, many of the waypoints are in close proximity to one another while others deviate from the majority. For this part of the lab, the mathematical mean was calculated for the x-, y-, and z-coordinates of all fifty waypoints; this 'average' location is denoted on the map as a red 'X'. Once this average location was calculated, the distance between each waypoint and the average location could be analyzed. This precision analysis concluded that 50% of all gathered waypoints fall within 3.1 meters of the average location, 68% fall within 4.5 meters, and 95% fall within 14.8 meters. Whether these precision results would suffice varies widely between applications; the same percentile distances may be perfectly acceptable in one scenario and wildly unacceptable in another. Precision, therefore, is relative and must be defined at the beginning of each project.
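A minimal sketch of this percentile analysis is shown below; the waypoint coordinates are randomly generated stand-ins for the Garmin data:

# Precision as percentile distances from the mean (average) location.
import numpy as np

rng = np.random.default_rng(1)
waypoints = rng.normal(loc=[500000.0, 3300000.0], scale=4.0, size=(50, 2))

mean_xy = waypoints.mean(axis=0)                    # the red 'X'
distances = np.linalg.norm(waypoints - mean_xy, axis=1)
for pct in (50, 68, 95):
    print(f"{pct}% of waypoints fall within "
          f"{np.percentile(distances, pct):.1f} m of the average location")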


The second part of the lab assignment dealt with accuracy and the different tools that can be employed to determine the extent of accuracy within a dataset. According to GIS Fundamentals: A First Text on Geographic Information Systems, an accurate observation 'reflects the true shape, locations, or characteristics of the phenomena represented in a GIS', meaning that accuracy is a 'measure of how often or by how much our data values are in error' [Bolstad & Manson, 2022, p. 609]. For this portion of the lab, a provided dataset was analyzed entirely in Microsoft Excel. This was very beneficial, as it provided an excellent opportunity to step away from the comfort of ArcGIS Pro and into a less familiar program. The first approach used to characterize the dataset's accuracy was a series of manual formulas, including minimum, maximum, mean, median, and Root Mean Square Error. The second was a Cumulative Distribution Function graph, which is displayed below.
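Those Excel computations translate directly into a few lines of Python; this is a minimal sketch with hypothetical error values, shown only to make the formulas concrete:

# Summary statistics plus an empirical CDF of the error values.
import numpy as np
import matplotlib.pyplot as plt

errors = np.array([1.2, 0.8, 2.5, 3.9, 1.1, 0.5, 2.2, 5.4, 1.7, 0.9])

print("min:", errors.min(), "max:", errors.max())
print("mean:", errors.mean(), "median:", np.median(errors))
print("RMSE:", np.sqrt(np.mean(errors ** 2)))

# Empirical CDF: sorted errors vs. cumulative proportion of points.
plt.step(np.sort(errors), np.arange(1, len(errors) + 1) / len(errors))
plt.xlabel("Error magnitude")
plt.ylabel("Cumulative proportion")
plt.show()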


For additional information regarding Root Mean Square Error, click here. For additional information regarding Cumulative Distribution Function, click here. Both sources are published by Science Direct and provide an extensive overview of their respective topics.


Sources:

Bolstad, Paul & Manson, Steven. (2022). GIS Fundamentals: A First Text on Geographic Information Systems (7th Edition). Eider Press.

International Organization for Standardization. (2006). Statistics - Vocabulary and Symbols. Part 1: General Statistical Terms and Terms Used in Probability. International Organization for Standardization.

Science Direct. (2024). Cumulative Distribution Function. https://www.sciencedirect.com/topics/mathematics/cumulative-distribution-function

Science Direct. (2024). Root Mean Square Error. https://www.sciencedirect.com/topics/engineering/root-mean-square-error
