Urban Heat (Temperature), NDVI, and Cluster Analysis

Introduction

Temperature and NDVI are the two major factors we consider in the Urban Heat Index (UHI) analysis.

Selection of Landsat 7 Data for Consistency with Census Intervals

The UHI analysis for Hong Kong relies on two primary factors: land surface temperature and the Normalized Difference Vegetation Index (NDVI). To align the satellite data with census intervals, we opted for data from Landsat 7 ETM+. We chose Landsat 7 despite the availability of Landsat 8 and 9 because the census data from the Hong Kong government and ESRI Hong Kong is available only every five years. The most recent census is from 2021, so the analysis also uses the 2011 and 2016 censuses. Landsat 8 began operating in 2013 and therefore cannot cover 2011, so we used Landsat 7 for all three years to keep the series consistent.

Methodology for Accurate Data Retrieval and Processing

To ensure accuracy, we created a mosaic composite raster image in Google Earth Engine from the scenes filtered to a full calendar year. We repeated this process for each year from 2011 to 2021 to extract the annual mean temperature and NDVI for each pixel. The most challenging part of this section was calibrating the data retrieved from Google Earth Engine, in particular the band data. For instance, NDVI sometimes fell within the expected range in Google Earth Engine but not after the data was loaded in Python, and the land surface temperature data showed similar problems. After many attempts, we found a calculation method in an academic paper that offered a more refined approach. This method handles the conversion between digital numbers, surface reflectance, top-of-atmosphere reflectance, top-of-atmosphere brightness temperature, and land surface temperature more accurately.

(GEE link: https://code.earthengine.google.com/a1c8085a46f0c89f1d82f0cfcfabacd8)

Reference: Ermida, S.L., Soares, P., Mantas, V., Göttsche, F.-M., Trigo, I.F., 2020. Google Earth Engine open-source code for Land Surface Temperature estimation from the Landsat series. Remote Sensing, 12 (9), 1471; https://doi.org/10.3390/rs12091471
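The per-pixel annual mean described above can be illustrated outside Google Earth Engine. This is a minimal sketch, assuming the year's scenes are already stacked into a (time, y, x) array with NaN marking masked pixels; the function name and the sample values are hypothetical and are not taken from the GEE script.

```python
import numpy as np

def annual_mean_composite(scenes):
    """Per-pixel mean over a stack of scenes, ignoring masked (NaN) pixels.

    scenes: array of shape (time, y, x); NaN marks masked or missing pixels.
    """
    stack = np.asarray(scenes, dtype="float64")
    valid = ~np.isnan(stack)
    counts = valid.sum(axis=0)                       # valid observations per pixel
    totals = np.where(valid, stack, 0.0).sum(axis=0)
    # Pixels with no valid observation during the year stay NaN
    return np.divide(totals, counts,
                     out=np.full(counts.shape, np.nan),
                     where=counts > 0)

# Three hypothetical 2x2 scenes (Kelvin); NaN = masked pixel
scenes = np.array([
    [[290.0, np.nan], [300.0, np.nan]],
    [[292.0, 295.0], [np.nan, np.nan]],
    [[294.0, 297.0], [302.0, np.nan]],
])
composite = annual_mean_composite(scenes)
print(composite)
```

Counting valid observations explicitly keeps a pixel that was masked all year as NaN instead of silently turning it into zero.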

Load the Landsat 7 2011 Mosaic Image

<xarray.Dataset>
Dimensions:      (band: 16, x: 2503, y: 1783)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  * x            (x) float64 113.8 113.8 113.8 113.8 ... 114.5 114.5 114.5 114.5
  * y            (y) float64 22.57 22.57 22.57 22.57 ... 22.09 22.09 22.09 22.09
    spatial_ref  int64 ...
Data variables:
    band_data    (band, y, x) float32 ...

Read the Hong Kong District Data from ArcGIS geojson

geopandas.geodataframe.GeoDataFrame

We use EPSG:4326 for the district layer because the Landsat 7 data is in EPSG:4326. The bounds of the district layer are:

(113.835066606914, 22.153344109236, 114.441993257571, 22.5619493489016)

Trim our precious Raster Data to the Hong Kong Boundary

<xarray.Dataset>
Dimensions:      (band: 16, x: 2252, y: 1516)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  * x            (x) float64 113.8 113.8 113.8 113.8 ... 114.4 114.4 114.4 114.4
  * y            (y) float64 22.56 22.56 22.56 22.56 ... 22.15 22.15 22.15 22.15
    spatial_ref  int64 0
Data variables:
    band_data    (band, y, x) float32 1.004e+04 9.852e+03 ... 0.9903 0.9901
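The trimming step is not shown in this report, so the following is only a sketch of the idea using a bounding-box crop on plain NumPy arrays. With rioxarray, a clip against the district geometries (or `rio.clip_box`) would be the more likely tool, but that is an assumption. The coarse grid and the helper name `crop_to_bounds` are hypothetical; the bounds are the ones printed for the district layer.

```python
import numpy as np

def crop_to_bounds(data, x, y, bounds):
    """Crop a (band, y, x) array to a (minx, miny, maxx, maxy) box."""
    minx, miny, maxx, maxy = bounds
    x_keep = (x >= minx) & (x <= maxx)
    y_keep = (y >= miny) & (y <= maxy)
    cropped = data[:, y_keep, :][:, :, x_keep]
    return cropped, x[x_keep], y[y_keep]

# Hypothetical coarse grid covering roughly the extent of the full mosaic
x = np.linspace(113.8, 114.5, 8)
y = np.linspace(22.57, 22.09, 7)          # y decreases from north to south
data = np.arange(2 * y.size * x.size, dtype="float32").reshape(2, y.size, x.size)

# Bounds of the Hong Kong district layer in EPSG:4326
hk_bounds = (113.835066606914, 22.153344109236,
             114.441993257571, 22.5619493489016)

cropped, cx, cy = crop_to_bounds(data, x, y, hk_bounds)
print(cropped.shape)
```

A box crop keeps sea pixels inside the rectangle; clipping against the district polygons would also mask those out.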

Raster image combined with City limits

Calculate the NDVI using Near Infrared Band and Red Band.

<xarray.Dataset>
Dimensions:      (y: 1516, x: 2252)
Coordinates:
  * x            (x) float64 113.8 113.8 113.8 113.8 ... 114.4 114.4 114.4 114.4
  * y            (y) float64 22.56 22.56 22.56 22.56 ... 22.15 22.15 22.15 22.15
    spatial_ref  int64 0
Data variables:
    band_data    (y, x) float32 -0.0636 -0.06632 -0.06632 ... -0.03526 -0.02509
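NDVI is (NIR - Red) / (NIR + Red); on Landsat 7 ETM+ the near-infrared band is Band 4 and the red band is Band 3. A minimal sketch of the calculation follows. It assumes both bands are already calibrated reflectance arrays; where those bands sit inside the 16-band mosaic is not stated here, so the inputs are hypothetical values.

```python
import numpy as np

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    total = nir + red
    # Leave NaN where NIR + Red == 0 to avoid dividing by zero
    return np.divide(nir - red, total,
                     out=np.full(total.shape, np.nan),
                     where=total != 0)

# Hypothetical reflectance values
nir = np.array([[0.40, 0.10], [0.05, 0.0]])
red = np.array([[0.10, 0.10], [0.15, 0.0]])
result = ndvi(nir, red)
print(result)
```

Dense vegetation reflects strongly in the near infrared, so it pushes NDVI toward 1, while water and built-up surfaces sit near or below 0.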

Plot the 2011 mean NDVI with city limits

Band 15 is the calibrated land surface temperature band in Kelvin, so we subtract 273.15 from the band data to convert it to Celsius.

<xarray.Dataset>
Dimensions:      (x: 5, y: 5)
Coordinates:
    band         int64 15
  * x            (x) float64 113.8 113.8 113.8 113.8 113.8
  * y            (y) float64 22.56 22.56 22.56 22.56 22.56
    spatial_ref  int64 0
Data variables:
    band_data    (y, x) float32 19.86 19.86 19.86 20.4 ... 17.04 17.04 17.04
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    band         int64 15
    spatial_ref  int64 0
Data variables:
    band_data    float32 20.17
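The conversion and the overall mean can be sketched as follows. This assumes a (band, y, x) array in which the 1-based band number 15 holds land surface temperature in Kelvin, as described above; the helper name and the sample values are hypothetical.

```python
import numpy as np

KELVIN_OFFSET = 273.15

def band_to_celsius(band_data, band_number=15):
    """Select a 1-based band from a (band, y, x) array and convert K to deg C."""
    kelvin = np.asarray(band_data, dtype="float64")[band_number - 1]
    return kelvin - KELVIN_OFFSET

# Hypothetical 16-band stack; only the temperature band is filled in
stack = np.full((16, 2, 2), np.nan)
stack[14] = [[293.15, 294.15], [np.nan, 292.15]]

celsius = band_to_celsius(stack)
mean_c = np.nanmean(celsius)   # NaN pixels (outside the boundary) are ignored
print(round(float(mean_c), 2))
```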

Using Zonal Statistics to calculate the mean temperature and NDVI for each Hong Kong District

OBJECTID ID CNAME CNAME_S ENAME Shape__Area Shape__Length geometry mean_ndvi_2011 max_temp_2016 mean_ndvi_2016 mean_temp_2011
0 1 1 黃大仙區 黄大仙区 WONG TAI SIN 1.092784e+07 17995.640782 POLYGON ((114.17942 22.34905, 114.17946 22.349... 0.211102 24.699992 0.234139 20.704102
1 2 6 九龍城區 九龙城区 KOWLOON CITY 1.184286e+07 31834.409404 MULTIPOLYGON (((114.17700 22.34904, 114.17702 ... 0.090806 26.748404 0.097568 23.277648
2 3 7 觀塘區 观塘区 KWUN TONG 1.322124e+07 25496.700164 POLYGON ((114.24371 22.28620, 114.24370 22.286... 0.121019 26.426119 0.136822 22.402509
3 4 8 西貢區 西贡区 SAI KUNG 1.602944e+08 365545.476363 MULTIPOLYGON (((114.22112 22.35318, 114.22114 ... 0.262977 24.956479 0.282872 21.255087
4 5 11 北區 北区 NORTH 1.619184e+08 192815.532996 MULTIPOLYGON (((114.33576 22.51003, 114.33576 ... 0.251814 22.677352 0.271825 19.671759
5 6 13 中西區 中西区 CENTRAL & WESTERN 1.460458e+07 26733.700864 MULTIPOLYGON (((114.14562 22.29045, 114.14990 ... 0.194695 23.794897 0.199246 18.948752
6 7 14 灣仔區 湾仔区 WAN CHAI 1.161490e+07 19548.075298 MULTIPOLYGON (((114.20012 22.27387, 114.20047 ... 0.191913 24.292477 0.201838 19.608638
7 8 15 東區 东区 EASTERN 2.184211e+07 36188.456666 POLYGON ((114.24738 22.25339, 114.24733 22.253... 0.213661 23.827175 0.236692 19.821950
8 9 17 屯門區 屯门区 TUEN MUN 9.921091e+07 99074.932441 MULTIPOLYGON (((113.93745 22.42638, 113.93765 ... 0.188711 25.061902 0.221932 20.773037
9 10 18 元朗區 元朗区 YUEN LONG 1.648768e+08 91328.856191 MULTIPOLYGON (((113.93832 22.42696, 113.93843 ... 0.196683 23.764657 0.213754 20.540728
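The zonal statistics call itself is not shown in this report (a library such as rasterstats is a common choice, but that is an assumption). The idea behind a zonal mean can be sketched with a label raster in which every pixel carries the integer id of the district it falls in. The function name, zone ids, and values below are hypothetical.

```python
import numpy as np

def zonal_mean(values, zones, n_zones):
    """Mean of `values` per zone id in 0..n_zones-1.

    Pixels with zone id -1 (outside every zone) or NaN values are skipped.
    """
    values = np.asarray(values, dtype="float64").ravel()
    zones = np.asarray(zones, dtype="int64").ravel()
    keep = (zones >= 0) & ~np.isnan(values)
    sums = np.bincount(zones[keep], weights=values[keep], minlength=n_zones)
    counts = np.bincount(zones[keep], minlength=n_zones)
    # Zones without any valid pixel get NaN
    return np.divide(sums, counts,
                     out=np.full(n_zones, np.nan),
                     where=counts > 0)

# Hypothetical temperature raster (deg C) and matching district-id raster
temperature = np.array([[20.0, 22.0, 30.0],
                        [24.0, np.nan, 32.0]])
district_id = np.array([[0, 0, 1],
                        [0, 1, -1]])

means = zonal_mean(temperature, district_id, n_zones=3)
print(means)
```

Each entry of `means` would then become one row of a column such as mean_temp_2011 in the district table.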

Mean 2011 annual temperature for each district

Mean annual NDVI for each district

2016 Version

<xarray.Dataset>
Dimensions:      (band: 16, x: 2503, y: 1783)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  * x            (x) float64 113.8 113.8 113.8 113.8 ... 114.5 114.5 114.5 114.5
  * y            (y) float64 22.57 22.57 22.57 22.57 ... 22.09 22.09 22.09 22.09
    spatial_ref  int64 ...
Data variables:
    band_data    (band, y, x) float32 ...
<xarray.Dataset>
Dimensions:      (band: 16, x: 2252, y: 1516)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  * x            (x) float64 113.8 113.8 113.8 113.8 ... 114.4 114.4 114.4 114.4
  * y            (y) float64 22.56 22.56 22.56 22.56 ... 22.15 22.15 22.15 22.15
    spatial_ref  int64 0
Data variables:
    band_data    (band, y, x) float32 1.061e+04 1.032e+04 ... 0.9894 0.9897
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    spatial_ref  int64 0
Data variables:
    band_data    float32 0.09237
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    band         int64 15
    spatial_ref  int64 0
Data variables:
    band_data    float32 23.15
OBJECTID ID CNAME CNAME_S ENAME Shape__Area Shape__Length geometry mean_ndvi_2011 mean_ndvi_2016 mean_temp_2011 mean_temp_2016
0 1 1 黃大仙區 黄大仙区 WONG TAI SIN 1.092784e+07 17995.640782 POLYGON ((114.17942 22.34905, 114.17946 22.349... 0.211102 0.234139 20.704102 24.699992
1 2 6 九龍城區 九龙城区 KOWLOON CITY 1.184286e+07 31834.409404 MULTIPOLYGON (((114.17700 22.34904, 114.17702 ... 0.090806 0.097568 23.277648 26.748404
2 3 7 觀塘區 观塘区 KWUN TONG 1.322124e+07 25496.700164 POLYGON ((114.24371 22.28620, 114.24370 22.286... 0.121019 0.136822 22.402509 26.426119
3 4 8 西貢區 西贡区 SAI KUNG 1.602944e+08 365545.476363 MULTIPOLYGON (((114.22112 22.35318, 114.22114 ... 0.262977 0.282872 21.255087 24.956479
4 5 11 北區 北区 NORTH 1.619184e+08 192815.532996 MULTIPOLYGON (((114.33576 22.51003, 114.33576 ... 0.251814 0.271825 19.671759 22.677352
5 6 13 中西區 中西区 CENTRAL & WESTERN 1.460458e+07 26733.700864 MULTIPOLYGON (((114.14562 22.29045, 114.14990 ... 0.194695 0.199246 18.948752 23.794897
6 7 14 灣仔區 湾仔区 WAN CHAI 1.161490e+07 19548.075298 MULTIPOLYGON (((114.20012 22.27387, 114.20047 ... 0.191913 0.201838 19.608638 24.292477
7 8 15 東區 东区 EASTERN 2.184211e+07 36188.456666 POLYGON ((114.24738 22.25339, 114.24733 22.253... 0.213661 0.236692 19.821950 23.827175
8 9 17 屯門區 屯门区 TUEN MUN 9.921091e+07 99074.932441 MULTIPOLYGON (((113.93745 22.42638, 113.93765 ... 0.188711 0.221932 20.773037 25.061902
9 10 18 元朗區 元朗区 YUEN LONG 1.648768e+08 91328.856191 MULTIPOLYGON (((113.93832 22.42696, 113.93843 ... 0.196683 0.213754 20.540728 23.764657

2021 Version

<xarray.Dataset>
Dimensions:      (band: 16, x: 2503, y: 1783)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  * x            (x) float64 113.8 113.8 113.8 113.8 ... 114.5 114.5 114.5 114.5
  * y            (y) float64 22.57 22.57 22.57 22.57 ... 22.09 22.09 22.09 22.09
    spatial_ref  int64 ...
Data variables:
    band_data    (band, y, x) float32 ...
<xarray.Dataset>
Dimensions:      (band: 16, x: 2252, y: 1516)
Coordinates:
  * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
  * x            (x) float64 113.8 113.8 113.8 113.8 ... 114.4 114.4 114.4 114.4
  * y            (y) float64 22.56 22.56 22.56 22.56 ... 22.15 22.15 22.15 22.15
    spatial_ref  int64 0
Data variables:
    band_data    (band, y, x) float32 9.098e+03 9.027e+03 ... 0.9903 0.9903
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    spatial_ref  int64 0
Data variables:
    band_data    float32 0.5408
<xarray.Dataset>
Dimensions:      ()
Coordinates:
    band         int64 15
    spatial_ref  int64 0
Data variables:
    band_data    float32 20.91
OBJECTID ID CNAME CNAME_S ENAME Shape__Area Shape__Length geometry mean_ndvi_2011 mean_ndvi_2016 mean_temp_2011 mean_temp_2016 mean_temp_2021 mean_ndvi_2021
0 1 1 黃大仙區 黄大仙区 WONG TAI SIN 1.092784e+07 17995.640782 POLYGON ((114.17942 22.34905, 114.17946 22.349... 0.211102 0.234139 20.704102 24.699992 21.227092 0.215096
1 2 6 九龍城區 九龙城区 KOWLOON CITY 1.184286e+07 31834.409404 MULTIPOLYGON (((114.17700 22.34904, 114.17702 ... 0.090806 0.097568 23.277648 26.748404 22.868930 0.081260
2 3 7 觀塘區 观塘区 KWUN TONG 1.322124e+07 25496.700164 POLYGON ((114.24371 22.28620, 114.24370 22.286... 0.121019 0.136822 22.402509 26.426119 22.577353 0.117286
3 4 8 西貢區 西贡区 SAI KUNG 1.602944e+08 365545.476363 MULTIPOLYGON (((114.22112 22.35318, 114.22114 ... 0.262977 0.282872 21.255087 24.956479 21.121994 0.278582
4 5 11 北區 北区 NORTH 1.619184e+08 192815.532996 MULTIPOLYGON (((114.33576 22.51003, 114.33576 ... 0.251814 0.271825 19.671759 22.677352 20.621217 0.256020
5 6 13 中西區 中西区 CENTRAL & WESTERN 1.460458e+07 26733.700864 MULTIPOLYGON (((114.14562 22.29045, 114.14990 ... 0.194695 0.199246 18.948752 23.794897 20.243429 0.200184
6 7 14 灣仔區 湾仔区 WAN CHAI 1.161490e+07 19548.075298 MULTIPOLYGON (((114.20012 22.27387, 114.20047 ... 0.191913 0.201838 19.608638 24.292477 21.345188 0.192107
7 8 15 東區 东区 EASTERN 2.184211e+07 36188.456666 POLYGON ((114.24738 22.25339, 114.24733 22.253... 0.213661 0.236692 19.821950 23.827175 21.877428 0.212786
8 9 17 屯門區 屯门区 TUEN MUN 9.921091e+07 99074.932441 MULTIPOLYGON (((113.93745 22.42638, 113.93765 ... 0.188711 0.221932 20.773037 25.061902 20.751241 0.195801
9 10 18 元朗區 元朗区 YUEN LONG 1.648768e+08 91328.856191 MULTIPOLYGON (((113.93832 22.42696, 113.93843 ... 0.196683 0.213754 20.540728 23.764657 20.284002 0.203589

Cluster Analysis

We will perform K-means Cluster Analysis for our final analytical output.

Index(['NAME_EN', 'NAME_TC', 'total_pop_2016', 'total_pop_2021',
       'total_pop_2011', 'median_age_2011', 'median_age_2016',
       'median_age_2021', 'OBJECTID', 'ID', 'CNAME', 'CNAME_S', 'ENAME',
       'Shape__Area', 'Shape__Length', 'road_length', 'area', 'road_density',
       'open_space_area', 'mean_temp_2011', 'mean_temp_2016', 'mean_temp_2021',
       'mean_ndvi_2011', 'mean_ndvi_2016', 'mean_ndvi_2021',
       'building_density', 'geometry'],
      dtype='object')

We first import the libraries we need and then choose the key variables that determine the clustering pattern in our merged dataset: total population in 2011, 2016, and 2021; median age of the population in each of those years; road density; open space area; mean temperature and mean NDVI (Normalized Difference Vegetation Index) in each of the three years; and building density, all by district.

We first tested a range of cluster counts as a preliminary check of the clustering model. We set random_state to 42 so that the random initialization is reproducible.
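A scan over cluster counts can be sketched with scikit-learn as below. The feature matrix is synthetic (18 rows standing in for the 18 districts), and the standardization step is an assumption: this report does not say whether the variables were scaled before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the district feature table: 18 rows, 4 columns
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [5, 5, 0, 0], [0, 5, 5, 0]], dtype=float)
features = np.vstack([c + rng.normal(scale=0.3, size=(6, 4)) for c in centers])

# Assumption: put variables with very different units on a common scale
scaled = StandardScaler().fit_transform(features)

k_values = list(range(2, 10))
inertias = []
for k in k_values:
    # random_state=42 makes the centroid initialization repeatable
    model = KMeans(n_clusters=k, n_init=10, random_state=42)
    model.fit(scaled)
    inertias.append(model.inertia_)   # within-cluster sum of squared distances

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 3))
```

Plotting `inertias` against `k_values` gives the curve from which the number of clusters is read.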

The ‘Kneed’ package helps us determine the ‘knee’ point quantitatively. It returns 4 as the number of clusters, which agrees with our reading of the scree plot.

A scree plot is one way to decide how many clusters to use in our k-means model. From the shape of the curve, 3 or 4 clusters looks appropriate.

4 [2, 3, 4, 5, 6, 7, 8, 9]
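To show what locating a knee means, here is a simplified stand-in. It is not the Kneed implementation: it normalizes both axes and picks the point lying farthest below the straight line joining the first and last points. It assumes the curve decreases from its first value to its last, and the inertia values are hypothetical.

```python
import numpy as np

def find_knee(k_values, inertias):
    """Return the k farthest below the chord joining the curve's end points.

    Assumes inertias decrease from the first value (maximum) to the last
    (minimum), as an inertia curve for k-means normally does.
    """
    k = np.asarray(k_values, dtype="float64")
    v = np.asarray(inertias, dtype="float64")
    k_norm = (k - k.min()) / (k.max() - k.min())
    v_norm = (v - v.min()) / (v.max() - v.min())
    # On the normalized axes the chord runs from (0, 1) to (1, 0): y = 1 - x
    gap = (1.0 - k_norm) - v_norm
    return int(k_values[int(np.argmax(gap))])

k_values = [2, 3, 4, 5, 6, 7, 8, 9]
inertias = [100.0, 60.0, 30.0, 25.0, 21.0, 18.0, 16.0, 15.0]  # hypothetical
knee = find_knee(k_values, inertias)
print(knee, k_values)
```

With these made-up values the largest gap occurs at k = 4, the point after which adding clusters reduces inertia only slowly.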

After running the k-means analysis, we give each district a ‘label’ so that we can visualize the clusters on the map.

label size
0 0 1
1 1 5
2 2 7
3 3 5
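A size table like the one above can be derived from the label array by counting each label. In this sketch the order of the labels is hypothetical; only the counts are arranged to match the table.

```python
import numpy as np

# Hypothetical label assignment for the 18 districts
labels = np.array([2, 1, 1, 2, 2, 3, 3, 3, 2, 2, 0, 1, 2, 1, 3, 1, 2, 3])

unique, counts = np.unique(labels, return_counts=True)
sizes = dict(zip(unique.tolist(), counts.tolist()))
print(sizes)

# A cluster holding a single district is worth checking by hand
singletons = [label for label, size in sizes.items() if size == 1]
print(singletons)
```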

Finally, we plot the result as a map. Three of the four clusters are spread fairly evenly across Hong Kong. The exception is Cluster 0 (dark red), which contains a single district: the Islands District, home to the airport, with few buildings, low population density, mostly mountainous terrain, and a high volume of vegetation.

The pale blue regions (Cluster 2) are remote suburban areas with relatively low population and building density and large areas of parks and natural landscape. The orange regions (Cluster 1) represent the main residential areas, where population density is high and buildings are dense. The dark blue regions (Cluster 3) are the main business, entertainment, and travel areas, which pack many buildings and people into a small area, so density is high.

If we were to name the clusters, dark red (Cluster 0) would be the Emergent Mixed-Use Area, orange (Cluster 1) the Urban Residential Zone, pale blue (Cluster 2) the Suburb Oasis, and dark blue (Cluster 3) the Commercial and Entertainment Hub, which we also identify as highly vulnerable to future urban heat.

Appendix

(https://code.earthengine.google.com/531a9a4d8b3a6c1a3fafab218f7d7159).

Navigating Data Source Challenges in Urban Heat Index Analysis

Acquiring and using data for our Urban Heat Index analysis involved several steps and challenges, particularly in data calibration and export. Our primary data source was a Landsat 7 image collection from Google Earth Engine (GEE). We started with the raw scene version of the Landsat 7 data, in which no calibration has been applied to the digital numbers, so the data is in its most unprocessed form.

Subsequently, we found the “Top of Atmosphere” (TOA) version of the Landsat 7 data, which had been converted by the US Geological Survey (USGS). This TOA version looked promising and worked well inside Google Earth Engine. However, when we exported the data from GEE and loaded it in Python, the Band 3 and Band 4 values had changed. As a result, NDVI values fell outside the expected range of -1 to 1, a puzzling issue that we struggled to resolve.
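One property helps when diagnosing this: if both bands are non-negative, (NIR - Red) / (NIR + Red) always lies between -1 and 1, so an out-of-range NDVI means at least one input value is negative. The sketch below is a hypothetical diagnostic, not something that was run on the exported data; it counts such pixels.

```python
import numpy as np

def ndvi_range_report(nir, red):
    """Count NDVI values outside [-1, 1] and negative band inputs."""
    nir = np.asarray(nir, dtype="float64")
    red = np.asarray(red, dtype="float64")
    total = nir + red
    ndvi = np.divide(nir - red, total,
                     out=np.full(total.shape, np.nan),
                     where=total != 0)
    out_of_range = (ndvi < -1) | (ndvi > 1)      # NaN compares as False
    negative_input = (nir < 0) | (red < 0)
    return {
        "out_of_range": int(out_of_range.sum()),
        "negative_input": int(negative_input.sum()),
        "out_of_range_with_negative_input": int((out_of_range & negative_input).sum()),
    }

# Hypothetical band values; two pixels contain a negative input
nir = np.array([0.40, 0.30, -0.05, 0.0])
red = np.array([0.10, -0.10, 0.20, 0.0])
report = ndvi_range_report(nir, red)
print(report)
```

If every out-of-range pixel also has a negative input, the place to look is how negative or fill values enter the bands, for example through an offset or a no-data value, rather than the NDVI formula itself.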

To address this, we applied cloud masks to remove the effect of cloud cover, hoping this would correct the NDVI discrepancies. Unfortunately, it did not. The problem may lie in the export step, in particular the transfer of the data to Google Drive: the downloaded data is zipped and then unzipped on our laptops. We hypothesized that this process could alter the values of Bands 3, 4, and 6 (Band 6 is the thermal band, comparable to Bands 10 and 11 on Landsat 8), producing the discrepancies we observed.
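The zipping hypothesis can be tested directly. ZIP compression is lossless, so a file that survives the round trip intact has the same checksum before and after. The sketch below demonstrates the check in memory with the standard library; the file name and payload are hypothetical stand-ins for an exported GeoTIFF.

```python
import hashlib
import io
import zipfile

def sha256(data: bytes) -> str:
    """Hex digest used to compare two byte strings."""
    return hashlib.sha256(data).hexdigest()

def zip_round_trip(name: str, payload: bytes) -> bytes:
    """Compress `payload` into an in-memory ZIP archive and read it back."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        return archive.read(name)

payload = bytes(range(256)) * 64              # hypothetical file contents
restored = zip_round_trip("mosaic_2011.tif", payload)
unchanged = sha256(payload) == sha256(restored)
print(unchanged)
```

Applying the same comparison to the file in Google Drive and the unzipped copy on the laptop would show whether the transfer changed any bytes. If the checksums match, the cause of the altered band values lies elsewhere, for example in the export settings or in how the file is read.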