Abstract
Nitrate is one of the most dangerous contaminants of water sources, so accurate methods for monitoring its concentration are in constant demand. The goal of this study is to develop a hybrid machine learning model (HMLM) for modeling nitrate concentration in water resources. For this purpose, 1453 samples were collected over a 20-year period (from 2000 to 2020). To develop the HMLM, the nitrate concentration data were first clustered using the Jenks natural breaks algorithm (JNBA). A support vector machine (SVM) model was then developed for each cluster, with the parameters of the kernel functions initially determined by the trial-and-error (TaE) method. The sequential forward floating selection (SFFS) technique was employed to select the optimal input parameters for simulating the nitrate content in each cluster. In the last step, to improve the efficiency of the SVM model, the parameters of the kernel functions were determined using the Harris hawks optimization (HHO) algorithm, which was chosen for its ability to shift intelligently between the exploration and exploitation phases during optimization. For all models, 80% of the data was used for training and the remaining 20% for testing. Finally, the root mean square error (RMSE), mean absolute error (MAE), Willmott’s index of agreement (WI), explained variance score (EVS), Kling-Gupta efficiency (KGE), and coefficient of determination (\({R}^{2}\)) were used to evaluate the models. The results indicate that the HMLM can model nitrate concentration accurately. Furthermore, using the HHO algorithm instead of the TaE method considerably improves the SVM model’s performance.
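The six evaluation indices named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so the definitions below are assumptions: WI follows Willmott's original index of agreement, KGE follows the common three-component form (correlation, variability ratio, bias ratio), EVS and \({R}^{2}\) use the usual variance-based definitions, and all variances are population variances. The function name `evaluate` and the sample nitrate values are hypothetical and are not taken from the study.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _pop_var(values):
    """Population variance (divides by n)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def evaluate(obs, sim):
    """Compute six goodness-of-fit indices for observed vs. simulated values.

    Assumes obs and sim are non-constant and that the mean of obs is nonzero;
    otherwise some of the ratios below are undefined.
    """
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("obs and sim need the same length (at least 2)")

    err = [s - o for o, s in zip(obs, sim)]
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    ss_res = sum(e * e for e in err)                 # sum of squared errors
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)   # total sum of squares

    # Pearson correlation between obs and sim, used by KGE.
    sd_obs, sd_sim = math.sqrt(_pop_var(obs)), math.sqrt(_pop_var(sim))
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim)) / n
    r = cov / (sd_obs * sd_sim)

    # "Potential error" in the denominator of Willmott's original index.
    potential = sum(
        (abs(s - mean_obs) + abs(o - mean_obs)) ** 2 for o, s in zip(obs, sim)
    )

    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in err) / n,
        "WI": 1.0 - ss_res / potential,
        "EVS": 1.0 - _pop_var(err) / _pop_var(obs),
        "KGE": 1.0 - math.sqrt(
            (r - 1.0) ** 2
            + (sd_sim / sd_obs - 1.0) ** 2
            + (mean_sim / mean_obs - 1.0) ** 2
        ),
        "R2": 1.0 - ss_res / ss_tot,
    }


if __name__ == "__main__":
    # Hypothetical nitrate concentrations (mg/L), for illustration only.
    observed = [12.4, 30.1, 8.7, 45.2, 22.0]
    simulated = [14.0, 27.5, 9.9, 41.8, 23.3]
    for name, value in evaluate(observed, simulated).items():
        print(f"{name}: {value:.3f}")
```

Under these definitions, a simulation with a constant offset from the observations still scores EVS = 1 while RMSE, MAE, KGE and \({R}^{2}\) are penalized, which is one reason to report several indices together rather than a single score.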
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
- ANFIS: Adaptive neuro-fuzzy inference system
- AUC: Area under the receiver operating characteristic curve
- ANN: Artificial neural network
- AT: Air temperature
- ARIMA: Autoregressive integrated moving average
- BRT: Boosted regression trees
- \({\mathrm{R}}^{2}\): Coefficient of determination
- DE: Differential evolution algorithm
- EVS: Explained variance score
- FLS: Fuzzy logic supervised
- GA: Genetic algorithm
- GVF: Goodness of variance fit
- GAPM: Groundwater aquifer potential mapping
- HM-SVMS: Hard-margin support vector machines
- HHO: Harris hawks optimization algorithm
- HMLM: Hybrid machine learning model
- JNBA: Jenks natural breaks algorithm
- KGE: Kling-Gupta efficiency
- LK: Linear kernel
- LR: Logistic regression
- MAE: Mean absolute error
- MCM: Million cubic meters
- MQ: Multiquadric
- MABLR: Multi-adaptive boosting logistic regression
- MPL: Multiple-layer perceptron
- MDA: Multivariate discriminant analysis
- OOB: Out-of-bag error
- PSO: Particle swarm optimization
- PK: Polynomial kernel
- RBF: Radial basis function
- ROC: Receiver operating characteristic curve
- SFFS: Sequential forward floating selection technique
- SK: Sigmoid kernel
- SM-SVMS: Soft-margin support vector machines
- SVM: Support vector machines
- TH: Total hardness
- TaE: Trial-and-error method
- WT: Water temperature
- WI: Willmott’s index of agreement
Author information
Contributions
All authors (Adnan Mazraeh, Meysam Bagherifar, Saeid Shabanlou, Reza Ekhlasmand) contributed equally to writing all parts of the article.
Ethics declarations
Ethics Approval and Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Conflict of Interest
The authors declare no competing interests.
About this article
Cite this article
Mazraeh, A., Bagherifar, M., Shabanlou, S. et al. A Hybrid Machine Learning Model for Modeling Nitrate Concentration in Water Sources. Water Air Soil Pollut 234, 721 (2023). https://doi.org/10.1007/s11270-023-06745-3
DOI: https://doi.org/10.1007/s11270-023-06745-3