Harmful algal blooms (HAB) open issues: A review of ecological data challenges, factor analysis and prediction approaches using data-driven method
Abstract
Research on the temporal and spatial distribution of algal ecological data continues to face complications, including data that are difficult to interpret, model overfitting, and inaccurate algal bloom prediction. Following the success of data-driven techniques, scholars have combined historical data with machine learning (ML) and deep learning (DL) approaches to forecast the onset of harmful algal blooms (HABs). Because potential HAB outbreaks can be predicted through time-series forecasting (TSF), this study holistically reviews field-based complexities, influencing factors, and trends and analyses in algal growth prediction, with or without the time-series approach. Examining algal growth factors is pivotal to understanding how algal blooms develop. The review highlights multiple open issues concerning the types and number of indicators, feature selection (FS) methods, forms of ML and DL, and the integration of time series with DL. It draws on chronologically sequenced past studies and establishes them as a reference directory for the algal ecology domain. Intended as a resource for beginners learning the research patterns of algal ecological informatics and for scholars seeking to optimize current prediction techniques, this study outlines (i) the aforementioned open issues through an end-to-end (E2E) evaluation process ranging from FS to predictive model performance and (ii) potential alternatives to bridge the literature gaps.
Copyright (c) 2023 Nur Aqilah Paskhal Rostam, Nurul Hashimah Ahamed Hassain Malim, Nur Afzalina Azmee, Renato J. Figueiredo, Mohd Azam Osman, Rosni Abdullah
This work is licensed under a Creative Commons Attribution 4.0 International License.