Harmful algal blooms (HAB) open issues: A review of ecological data challenges, factor analysis and prediction approaches using data-driven method

  • Nur Aqilah Paskhal Rostam, School of Computer Sciences, Universiti Sains Malaysia
  • Nurul Hashimah Ahamed Hassain Malim, School of Computer Sciences, Universiti Sains Malaysia
  • Nur Afzalina Azmee, Faculty of Science and Mathematics, Universiti Pendidikan Sultan Idris
  • Renato J. Figueiredo, Department of Electrical and Computer Engineering, University of Florida
  • Mohd Azam Osman, School of Computer Sciences, Universiti Sains Malaysia
  • Rosni Abdullah, School of Computer Sciences, Universiti Sains Malaysia
Article ID: 100
Keywords: data-driven prediction method, harmful algal bloom, time series forecasting, machine learning, deep learning

Abstract

Ongoing research on the temporal and spatial distribution of algal ecological data has run into recurring difficulties: data that are hard to interpret, model overfitting, and inaccurate algal bloom prediction. Building on the success of data-driven techniques, researchers have combined historical data with machine learning (ML) and deep learning (DL) approaches to forecast the onset of harmful algal blooms (HABs). Because potential HAB outbreaks can be predicted through time-series forecasting (TSF), this review examines the data challenges of the field, the factors that influence algal growth, and the trends in algal growth prediction, with and without a time-series approach. Examining algal growth factors is pivotal for understanding how blooms develop. The review highlights multiple open issues concerning the types and number of indicators, feature selection (FS) methods, the forms of ML and DL used, and the integration of time series with DL. It surveys past studies in chronological order and is intended to serve as a reference directory for the algal ecology domain. As a resource for beginners learning the research patterns of algal ecological informatics and for scholars seeking to improve current prediction techniques, this study outlines (i) the open issues above, with an end-to-end (E2E) evaluation process ranging from FS to predictive model performance, and (ii) potential alternatives to bridge the gaps in the literature.

Published
2023-11-17
How to Cite
Rostam, N. A. P., Ahamed Hassain Malim, N. H., Azmee, N. A., Figueiredo, R. J., Osman, M. A., & Abdullah, R. (2023). Harmful algal blooms (HAB) open issues: A review of ecological data challenges, factor analysis and prediction approaches using data-driven method. Computing and Artificial Intelligence, 1(1), 100. https://doi.org/10.59400/cai.v1i1.100
Section
Review