This week's assignment was based on Data Classification Systems and how each system can convey information differently. We were tasked with creating two maps of the 2010 Census Tracts [United States Census Bureau] of Miami-Dade County, Florida, graphically portraying (1) the percentage of Senior Citizens residing in each tract and (2) the number of Senior Citizens per square mile in each tract. For each of these maps, we were to display the data using four different data classification methods, with the results shown in graduated color schemes. After the final data analysis, we had to decide which map portrayed the information most accurately. As shown above, I decided that the number of Senior Citizens per square mile was the most accurate depiction, and I will discuss the reasons why toward the end of this post.
The four classification systems that we focused on were Equal Interval, Natural Breaks, Quantile, and Standard Deviation. Below is a brief synopsis of each:
Equal Interval: this data classification system takes the range of values across the entire data set [from the minimum to the maximum] and divides it into classes of equal width. For example, if the values ranged from 0 to 100, four classes would each span 25 units, while five classes would each span 20.
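A minimal sketch of how equal-interval breaks could be computed, written in Python purely for illustration. The function names `equal_interval_breaks` and `classify` are hypothetical and not taken from any GIS package; GIS software may also treat a value that falls exactly on a break differently than this sketch does.

```python
def equal_interval_breaks(values, k):
    """Return the upper limit of each of k equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k  # every class spans the same width
    # Use the data maximum itself as the last limit to avoid float round-off.
    return [lo + width * i for i in range(1, k)] + [hi]


def classify(value, breaks):
    """Return the 0-based index of the first class whose upper limit covers value."""
    for index, upper in enumerate(breaks):
        if value <= upper:
            return index
    return len(breaks) - 1


print(equal_interval_breaks([0, 100], 4))  # [25.0, 50.0, 75.0, 100]
print(equal_interval_breaks([0, 100], 5))  # [20.0, 40.0, 60.0, 80.0, 100]
```

Only the minimum and maximum matter here, which is why a few extreme values can stretch the classes and crowd everything else into the lowest one.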
Natural Breaks: this data classification system examines the values of the entire data set and places class boundaries at gaps that occur naturally within the data. For example, if a cluster of observations ranges in value from 0 to 10 and the next observation has a value of 14, the software would typically end a class at 10. The next class would end at the next [natural] break in the data values, and this continues until the desired number of classes is created. In practice, the Jenks method behind Natural Breaks chooses the boundaries that make values within each class as similar as possible and values in different classes as different as possible, which is why the breaks tend to fall at large gaps.
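The gap-finding idea above can be made precise as an optimization: choose the breaks that minimize the total squared deviation of values from their own class means. The sketch below solves that exactly with dynamic programming (often called Fisher-Jenks). The function name `natural_breaks` is hypothetical, and GIS software may use a different or approximate implementation, so its breaks could differ slightly from a map's legend. It runs in O(k·n²) time, which is manageable for a few hundred census tracts.

```python
def natural_breaks(values, k):
    """Return k upper class limits that minimize within-class squared deviation."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")
    # Prefix sums give the squared deviation of any run data[i:j] in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        # Sum of squared deviations of data[i:j] from its own mean.
        total = s[j] - s[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total SSE when the first j values form c classes.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            for i in range(c - 1, j):  # last class is data[i:j]
                candidate = cost[c - 1][i] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    back[c][j] = i
    # Walk back through the table to recover the last value of each class.
    breaks, j = [], n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = back[c][j]
    return breaks[::-1]


print(natural_breaks([1, 2, 3, 10, 11, 12, 30, 31], 3))  # [3, 12, 31]
```

Because the objective is variance rather than gap size alone, a value sitting just before a gap can occasionally be grouped with the cluster on the other side.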
Quantile: this data classification system takes the total number of features [or observations] and creates classes that each contain an approximately equal number of observations. For example, Miami-Dade County included 521 census tracts [as of 2010]; dividing them into five classes yields four classes of 104 tracts and one class of 105 tracts. The class breaks are set by the 104th, 208th, 312th, and 416th values when the data are sorted in ascending order.
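A short sketch reproducing the 521-tract arithmetic above. The function name `quantile_breaks` is hypothetical; placing the one extra tract in the last class follows the example in the text, and real software may handle tied values at a break differently. The input values 1 through 521 are stand-ins, not the census data.

```python
def quantile_breaks(values, k):
    """Return the upper limit of each of k classes holding near-equal counts."""
    data = sorted(values)  # quantiles are taken over the ascending order
    base, extra = divmod(len(data), k)
    breaks, end = [], 0
    for c in range(k):
        # The last `extra` classes each take one additional value.
        end += base + (1 if c >= k - extra else 0)
        breaks.append(data[end - 1])
    return breaks


# Stand-in values 1..521, one per tract (hypothetical, not the census data).
print(quantile_breaks(range(1, 522), 5))  # [104, 208, 312, 416, 521]
```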
Standard Deviation: this data classification system calculates the mean and standard deviation of the entire data set and sets class breaks at fixed multiples of the standard deviation [for example, one or one-half standard deviation] above and below the mean. There are multiple classes on each side of the mean, and graduated colors visually express how far each value deviates from the calculated average.
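One way to express this in code: label each value by how many standard-deviation intervals it lies above or below the mean. This is a sketch under stated assumptions; the name `std_dev_classes` and its parameters are hypothetical, and tools differ on the interval size, on population versus sample standard deviation, and on whether a class is centered on the mean.

```python
import math
import statistics


def std_dev_classes(values, interval=1.0, max_steps=2):
    """Label each value by how many interval-sized standard deviations it lies from the mean.

    Label 0 covers [mean, mean + interval*sd), -1 covers [mean - interval*sd, mean),
    and so on; values beyond max_steps intervals are clamped into the outermost class.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population SD; statistics.stdev gives the sample SD
    if sd == 0:
        raise ValueError("all values are identical; standard deviation is zero")
    labels = []
    for v in values:
        z = (v - mean) / sd  # distance from the mean in standard deviations
        step = math.floor(z / interval)
        labels.append(max(-max_steps, min(max_steps - 1, step)))
    return mean, sd, labels


mean, sd, labels = std_dev_classes([2, 4, 4, 4, 5, 5, 7, 9])
print(labels)  # [-2, -1, -1, -1, 0, 0, 1, 1]
```

Negative labels would map to one color ramp and non-negative labels to another, giving the diverging look typical of standard deviation maps.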
After analyzing the data, it was apparent that the Standard Deviation and Natural Breaks methods depicted the values more accurately than the Equal Interval and Quantile methods. The biggest issue with Equal Interval was that the data set was highly skewed toward the lower end, so many values were grouped into a single class that should have been divided further, resulting in a major loss of detail across the map. The Quantile map was also misleading because of the skew: many features with very similar values were placed into different classes, and the fifth class had a range far greater than those of the preceding classes. This creates ambiguity within the map and can mislead the viewer, which is why Quantile classification is better suited to data sets with a roughly uniform [linear] distribution than to a highly skewed data set such as this one.
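To see both problems on a small scale, the sketch below classifies a made-up, right-skewed list of values (hypothetical numbers, not the Miami-Dade data) into four classes using equal intervals and a simple rank-based quantile split.

```python
from collections import Counter

# Hypothetical right-skewed values (NOT the census data): most are small, a few are very large.
values = [1, 1, 2, 2, 2, 3, 3, 4, 5, 6, 8, 12, 40, 95]
k = 4

# Equal interval: k equal-width classes between the minimum and maximum.
lo, hi = min(values), max(values)
width = (hi - lo) / k
equal_counts = Counter(min(int((v - lo) / width), k - 1) for v in values)

# Quantile: split the ranks of the sorted values into k near-equal groups.
ranked = sorted(values)
quantile_labels = [i * k // len(ranked) for i in range(len(ranked))]
quantile_ranges = {}
for label, v in zip(quantile_labels, ranked):
    low, high = quantile_ranges.get(label, (v, v))
    quantile_ranges[label] = (min(low, v), max(high, v))

print(sorted(equal_counts.items()))  # [(0, 12), (1, 1), (3, 1)]
print(quantile_ranges)  # {0: (1, 2), 1: (2, 3), 2: (4, 8), 3: (12, 95)}
```

Equal interval crowds 12 of the 14 values into the first class and leaves one class empty, while the quantile split keeps the counts balanced but puts the value 2 into two different classes and stretches its last class from 12 to 95.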
Finally, after careful consideration of the two maps, it was evident that the number of Senior Citizens per square mile provided a more accurate depiction of the census tracts with higher densities of people over the age of 65. To illustrate, a census tract with 95 people over the age of 65 out of a total population of 120 is 79.2% Senior Citizens. Conversely, a census tract with 2,372 people over the age of 65 out of a total population of 9,593 is only 24.7% Senior Citizens, even though it is home to roughly 25 times as many seniors. A percentage map would place the first tract in a much higher class than the second. Normalizing the counts by a standard unit of area [square miles] avoids this distortion, which is why the number of Senior Citizens per square mile depicts concentrations of Senior Citizens more accurately than percentages alone.
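The arithmetic behind this comparison can be checked with a few lines of Python. The tract counts come from the text above; the post does not give the tracts' land areas, so the 2.0 square miles used below is a hypothetical placeholder chosen only to show that, for equal areas, density ranks the larger tract far higher, the opposite of the percentage ranking. Actual densities depend on each tract's real area.

```python
def percent_seniors(seniors, total_population):
    """Share of a tract's population aged 65 and over, as a percentage."""
    return 100 * seniors / total_population


def seniors_per_sq_mile(seniors, area_sq_mi):
    """Senior Citizens per square mile of tract area."""
    return seniors / area_sq_mi


# Counts from the two example tracts in the text.
small_pct = percent_seniors(95, 120)
large_pct = percent_seniors(2372, 9593)
print(round(small_pct, 1), round(large_pct, 1))  # 79.2 24.7

# HYPOTHETICAL: pretend both tracts cover 2.0 square miles, purely for illustration.
AREA_SQ_MI = 2.0
print(seniors_per_sq_mile(95, AREA_SQ_MI), seniors_per_sq_mile(2372, AREA_SQ_MI))  # 47.5 1186.0
```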
Overall, this lab assignment was an excellent opportunity to dive into different data classification methods and closely analyze how they differ from one another; it also helped us start recognizing which classification method is appropriate for which kind of scenario. For comparison, I have attached the map showing the percentage of Senior Citizens residing in each census tract below.