Predicting Seagrass Habitats in Thailand with Machine Learning
Palika Wannawilai
Department of Geography (GEOG 350)
American River College, California, USA
Abstract
Seagrasses play an important role in providing shelter and food for marine organisms and in combating global warming; they can store up to 100 times more carbon dioxide than tropical forests. Monitoring and preserving this resource are therefore critical challenges for ocean conservation. Owing to its tropical climate, Thailand is among the countries with the most seagrass in the world, and more than 70% of its seagrass is located on the west coast along the Andaman Sea. To predict suitable seagrass habitats, this study applies machine learning in ArcGIS Pro 2.6.3 to Ecological Marine Unit (EMU) point data provided by USGS and Esri. The results show that suitable seagrass habitats occur along the coastlines of Africa, Asia, Australia, and North and South America, with especially high density around Florida and the Gulf of Mexico. Although the model's test accuracy of about 95% is high, the predicted map may not be highly accurate, because the seagrass data came from a different dataset than the one used in Esri's lesson.
1. Introduction
In the Andaman Sea of Thailand, seagrasses are a key indicator of the overall health of the ecosystem (Rattanachot et al. 2). Seagrass beds help meet the dietary needs of marine organisms and coastal communities and offer shelter and nursery areas for fishery species. However, in recent years seagrasses have been strongly affected by local stressors (overfishing, pollution, and habitat destruction) and global stressors (ocean acidification and climate change) (Praisankul and Nabangchang-Srisawalak 140).
To counter the loss of seagrass habitat, Ecological Marine Units (EMUs) could help improve seagrass conservation: by combining known seagrass habitats around the United States with machine learning, it is possible to predict where seagrass grows worldwide (Orhun). Moreover, EMUs provide a baseline 3D map of ocean ecosystems that supports ocean sustainability by offering a framework for detecting changes in the ocean (Esri, “Ecological Marine Units”).
This study aims to create a map of predicted seagrass habitat around Thailand in order to raise awareness of seagrass protection. To do so, the model is trained on the distribution of seagrasses around the U.S. rather than Thailand: machine learning relates that distribution to seven ocean measurement variables from EMUs. The objectives are: (i) to predict suitable seagrass habitats around the world, especially around Thailand; (ii) to learn how to use machine learning in ArcGIS Pro; and (iii) to compare the results from two seagrass datasets.
2. Background
The Esri Insider article “New Map Sets Framework for Describing Ocean Ecology in Unprecedented Detail” introduces the global Ecological Marine Units (EMU), developed by Esri in collaboration with USGS. The EMU map portrays a systematic division and classification of physiographic and ecological information about features in the ocean. It was built from 3D data and uses statistical clustering to identify physiographic structure, such as the water column's temperature, salinity, and other factors likely to drive ecosystem responses. Users can navigate this 3D marine ecology map across a wide range of ocean parameters and can also observe other environments, such as mangroves and coral reefs. Esri also offers a lesson, “Predict seagrass habitats with machine learning,” that teaches how to predict seagrass habitats around the world (Esri, “Predict seagrass habitats with machine learning”). Since the purpose of this study is to map suitable seagrass habitat from the EMU dataset, it follows the machine learning method from that lesson, which is expected to make the results more precise. The lesson uses two Python libraries, scikit-learn (a popular machine learning library) and seaborn (a statistical data visualization library), and its model reached about 95% accuracy on the test data.
3. Methods
3.1 Create a dataset and metadata
The Global Distribution of Seagrasses dataset was obtained from the UNEP World Conservation Monitoring Centre at https://data.unep-wcmc.org/datasets/7 (Figure 1); it is the most recent seagrass distribution data I found. This study deliberately uses a different seagrass dataset from the one in the Esri lesson in order to compare the two. The EMU dataset and the other layers, however, were downloaded from the Esri lesson.
Figure 1: Download page for the Global Distribution of Seagrasses dataset.
To prepare the input data, the EMU attribute table is repaired with the Fill Missing Values tool, because some values needed for the analysis are missing. Seven variables that may influence seagrass phenology are used: dissolved oxygen, nitrate, phosphate, salinity, silicate, stream, and temperature. Figure 2 shows the attribute table after the missing values were filled.
Figure 2: The seven variables after repair with the Fill Missing Values tool.
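The gap-filling idea can be sketched in plain Python as a k-nearest-neighbor mean: each missing value is replaced by the average of the closest points that do have a value. This is a simplified stand-in, assumed to resemble one of the spatial fill options of the Fill Missing Values tool rather than Esri's actual implementation; the function name, coordinates, and salinity values are hypothetical.

```python
import math


def fill_missing_knn(points, values, k=3):
    """Replace each None in `values` with the mean of the k nearest points
    (Euclidean distance) whose original value is known.

    Hypothetical helper for illustration; not the ArcGIS Fill Missing Values tool.
    """
    known = [(p, v) for p, v in zip(points, values) if v is not None]
    if not known:
        raise ValueError("no known values to fill from")
    filled = []
    for p, v in zip(points, values):
        if v is not None:
            filled.append(v)
            continue
        # Only originally known values are used as neighbours, so the
        # result does not depend on the order in which gaps are filled.
        nearest = sorted(known, key=lambda pv: math.dist(p, pv[0]))[:k]
        filled.append(sum(val for _, val in nearest) / len(nearest))
    return filled


# Hypothetical salinity readings at four EMU points; the third is missing.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
salinity = [34.0, 35.0, None, 30.0]
print(fill_missing_knn(pts, salinity, k=2))  # [34.0, 35.0, 34.5, 30.0]
```

With k=2, the gap at (0, 1) takes the mean of its two nearest known neighbours (34.0 and 35.0), while the distant point at (5, 5) is ignored.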
In this study, seagrass sample data from the United States are used to relate seagrass occurrence to ocean conditions, using the Create Random Points (Data Management) tool (Figure 3) and the Empirical Bayesian Kriging tool (Figure 4), respectively. To identify where seagrass grows, the points in the US Coastlines Shallow layer that overlap the World Seagrass layer are flagged with a categorical variable: 1 marks known seagrass growth at that location, and 0 marks locations treated as unsuitable seagrass habitat.
Figure 3: Relating seagrass occurrence to ocean conditions using the Create Random Points (Data Management) tool.
Figure 4: Results for all seven ocean measurements from the Empirical Bayesian Kriging tool.
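The 1/0 labeling step described above can be illustrated with a plain ray-casting point-in-polygon test: a point gets 1 if it falls inside any seagrass polygon and 0 otherwise. This is a sketch assuming simple polygons without holes and planar coordinates; in ArcGIS Pro the overlay is done with geoprocessing tools, and the function names and coordinates below are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside `polygon`, a list of
    (x, y) vertices. Assumes a simple polygon without holes."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count crossings to the right of the point.
            if x < x_cross:
                inside = not inside
    return inside


def label_points(points, seagrass_polygons):
    """Return 1 for points inside any seagrass polygon, else 0."""
    return [
        1 if any(point_in_polygon(x, y, poly) for poly in seagrass_polygons) else 0
        for x, y in points
    ]


# Hypothetical square seagrass bed and four random points (planar units).
seagrass = [[(0, 0), (4, 0), (4, 4), (0, 4)]]
random_points = [(1, 1), (5, 5), (3.5, 0.5), (-1, 2)]
print(label_points(random_points, seagrass))  # [1, 0, 1, 0]
```

An odd number of edge crossings to the right of a point means the point is inside; an even number means it is outside.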
3.2 Perform random forest classification
After all datasets are prepared, the machine learning libraries are used to build a prediction model. First, the correlations among the seven variables are checked to confirm that random forest classification is a suitable choice. Then, the scripts train the model, use the test data to predict seagrass occurrence, and report the accuracy of the predictor. The Python scripts for each step are shown in Figures 5, 6, and 7.
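The correlation check mentioned above can be sketched with the standard library alone, assuming Pearson's coefficient is the measure shown in the chart (the paper does not state which coefficient its script uses). The variable names and values below are hypothetical and cover only three of the seven variables.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.
    Assumes neither sequence is constant (otherwise the denominator is 0)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict mapping name -> values."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical values of three of the seven variables at five points.
data = {
    "temperature": [26.0, 27.5, 28.0, 29.0, 30.5],
    "salinity": [33.0, 33.5, 34.0, 34.2, 35.0],
    "dissolved_oxygen": [5.2, 5.0, 4.8, 4.7, 4.4],
}
corr = correlation_matrix(data)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: {r:+.2f}")
```

Values near +1 or -1 flag strongly related variables; the diagonal is always 1.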
Figure 5: The script for loading the data into Python.
Figure 6: The script that creates a correlation chart for the seven variables.
Figure 7: Training the random forest classifier on the previously created training data.
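A minimal scikit-learn sketch of the train, test, and accuracy steps described above. Synthetic random data stand in for the real EMU-derived table, so the printed accuracy has no bearing on the roughly 95% reported in this study; the variable names, the labeling rule, and the settings (100 trees, a 30% test split) are assumptions, not values taken from the Esri lesson or from Figures 5 to 7.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

VARIABLES = ["dissolved_oxygen", "nitrate", "phosphate", "salinity",
             "silicate", "stream", "temperature"]

# Synthetic stand-in for the training table: 200 points with seven scaled
# variables in [0, 1]. The label follows an arbitrary rule (warm and not too
# saline -> 1), chosen only so the model has a signal; not an ecological claim.
rng = random.Random(0)
X, y = [], []
for _ in range(200):
    row = [rng.random() for _ in VARIABLES]
    X.append(row)
    temp = row[VARIABLES.index("temperature")]
    sal = row[VARIABLES.index("salinity")]
    y.append(1 if temp > 0.4 and sal < 0.6 else 0)

# Hold out 30% of the points as test data, keeping the 1/0 ratio similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.1%}")

# The three variables the forest relied on most.
ranked = sorted(zip(VARIABLES, model.feature_importances_),
                key=lambda t: t[1], reverse=True)
for name, importance in ranked[:3]:
    print(f"{name}: {importance:.2f}")
```

Because the synthetic label depends only on temperature and salinity, those two should dominate the feature importances; with real data the ranking would reflect the actual EMU measurements.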
3.3 Evaluate the prediction result
Kernel density calculates the density of point features in a neighborhood around each output raster cell (Esri, “How Kernel Density Works”). Therefore, a kernel density surface created from the predicted seagrass points with the Kernel Density tool shows where predicted seagrass habitats are most concentrated around the world.
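The kernel density idea can be sketched with the quartic (biweight) kernel that Esri's “How Kernel Density Works” page describes: each point spreads its weight over a circle of the search radius, highest at the point and falling to zero at the edge. This plain-Python version is a sketch under assumptions, not the Kernel Density tool itself; it omits the tool's default search-radius rule and output area units, and the function names and point coordinates are hypothetical.

```python
import math


def quartic_kernel_density(points, x, y, radius, weights=None):
    """Density at (x, y): (1 / r**2) * sum of 3/pi * w_i * (1 - (d_i / r)**2)**2
    over all points whose distance d_i to (x, y) is less than the radius r."""
    if weights is None:
        weights = [1.0] * len(points)
    total = 0.0
    for (px, py), w in zip(points, weights):
        d = math.hypot(x - px, y - py)
        if d < radius:
            total += 3.0 / math.pi * w * (1.0 - (d / radius) ** 2) ** 2
    return total / radius ** 2


def density_grid(points, xmin, ymin, cell, ncols, nrows, radius):
    """Evaluate the density at each cell centre of a regular grid.
    Row 0 has the lowest y (unlike the top-down row order of rasters)."""
    return [
        [
            quartic_kernel_density(
                points, xmin + (col + 0.5) * cell, ymin + (row + 0.5) * cell, radius
            )
            for col in range(ncols)
        ]
        for row in range(nrows)
    ]


# Hypothetical points predicted as seagrass (label 1), in planar units.
predicted = [(1.0, 1.0), (1.2, 0.9), (4.0, 4.0)]
grid = density_grid(predicted, xmin=0.0, ymin=0.0, cell=1.0, ncols=5, nrows=5, radius=1.5)
for row in reversed(grid):  # print highest y at the top, like a map
    print(" ".join(f"{v:.3f}" for v in row))
```

Cells near the two clustered points receive the highest values, and cells farther than the search radius from every point stay at zero, which is how the surface highlights concentrations of predicted seagrass.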
4. Results
Figure 8 shows the correlation coefficients among the seven variables, computed with the Python libraries. The model's test accuracy is about 95% (Figure 9). The kernel density surface computed from the prediction results shows the areas of high seagrass habitat density around the world.
Figure 8: Correlation coefficients among the seven variables.
Figure 9: The accuracy of the model is about 95%.
The kernel density values of suitable seagrass habitat worldwide range from 0 to 0.627. Suitable seagrass areas are found around the world, especially in the North Atlantic Ocean. In Thailand, suitable habitats were found over wide areas along the coastlines and in the Gulf of Thailand, but at low density (0.003). In contrast, suitable seagrass habitats in North America were found at higher density (0.208-0.549) along the coastlines of Florida and the Gulf of Mexico. The predicted distribution of seagrass habitats around the world and around Thailand is shown in Figures 10 and 11.
Figure 10: Suitable seagrass habitats around the world.
Figure 11: Predicted suitable seagrass habitats around Thailand.
5. Analysis
Figure 12: Comparison of different results from using the dataset from Esri (left) and the
UNEP World Conservation Monitoring Centre (right).
The two maps in Figure 12 clearly show the differences between the results from the Esri lesson dataset and the UNEP World Conservation Monitoring Centre dataset, even though both used only U.S. seagrass data to relate seagrass occurrence to the EMU data. The map from this study shows a higher density of seagrass habitat around the North Atlantic Ocean, especially along the coastlines of Florida and the Gulf of Mexico. It also shows wide areas of suitable seagrass habitat, but at lower density, around Thailand and other Southeast Asian countries. The wide areas of suitable habitat predicted around the Gulf of Thailand and the Andaman Sea are plausible, because seagrasses grow in the shallow coastal waters of most continents (UNEP World Conservation Monitoring 2). However, seagrasses in Thailand and other Southeast Asian countries may be more strongly affected by other factors, such as overfishing and pollution resulting from inadequate management of ocean conservation.
6. Conclusion
Suitable seagrass habitats are found along the coastlines of Africa, Asia, Australia, and North and South America, with especially high density around Florida and the Gulf of Mexico. Although the model's accuracy of about 95% is high, the resulting map does not appear to be highly accurate, because it places possible seagrass habitat in Arctic Circle areas such as northern Russia, Norway, and Alaska. Thus, the sample size of the dataset might affect the final map. In future research, I would like to use a larger sample (more than 10%) and include areas beyond the U.S. when computing the correlations, in order to reduce the bias of the study.
7. References
Esri. “Ecological Marine Units.” www.esri.com/en-us/about/science/ecological-marine-units/overview. Accessed 15 Dec. 2020.
Esri. “How Kernel Density Works.” ArcGIS Pro, pro.arcgis.com/en/pro-app/latest/tool-reference/spatial-analyst/how-kernel-density-works.htm. Accessed 20 Dec. 2020.
Esri. “Predict seagrass habitats with machine learning.” Learn ArcGIS, learn.arcgis.com/en/projects/predict-seagrass-habitats-with-machine-learning. Accessed 15 Dec. 2020.
Esri Insider. “New Map Sets Framework for Describing Ocean Ecology in Unprecedented Detail.” Esri, 14 Mar. 2016, www.esri.com/about/newsroom/insider/new-map-sets-framework-for-describing-ocean-ecology-in-unprecedented-detail. Accessed 10 Nov. 2020.
Orhun, Aydin. “The Science of Where Seagrasses Grow: ArcGIS and Machine Learning.” Esri, 18 Sep. 2017, esri.com/arcgis-blog/products/analytics/analytics/the-science-of-where-seagrasses-grow-arcgis-and-machine-learning. Accessed 20 Dec. 2020.
Praisankul, Suhatai, and Orapan Nabangchang-Srisawalak. “The Economic Value of Seagrass Ecosystem in Trang.” Kasetsart University Fisheries Research Bulletin, 2016, pp. 138-55.
Rattanachot, Ekkalak, et al. “Monitoring of Seagrass along Southern Andaman Coast of Thailand.” Ecological Research, vol. 35, no. 5, 2020, pp. 773-79, doi:10.1111/1440-1703.12123.
UNEP World Conservation Monitoring. “Seagrass.” Marine Biodiversity Features, Terminology and Areas, 4 Dec. 2014, biodiversitya-z.org/content/seagrass.pdf. Accessed 20 Dec. 2020.