A Hybrid Machine Learning Model for Modeling Nitrate Concentration in Water Sources

Water, Air, & Soil Pollution

Abstract

Nitrate is one of the most hazardous contaminants of water sources, so accurate methods are needed to monitor its concentration. The goal of this study is to develop a hybrid machine learning model (HMLM) for modeling nitrate concentration in water resources. For this purpose, 1453 samples were collected over a 20-year period (2000 to 2020). To develop the HMLM, the nitrate concentration data were first clustered using the Jenks natural breaks (JNB) algorithm. A support vector machine (SVM) model was then developed for each cluster. At this stage, the trial-and-error (TaE) method was used to determine the parameters of the kernel functions, and the sequential forward floating selection (SFFS) technique was employed to select the optimal input parameters for simulating the nitrate content of each cluster. In the last step, to improve the efficiency of the SVM model, the kernel function parameters were determined using the Harris hawks optimization (HHO) algorithm, which was chosen for its ability to switch adaptively between the exploration and exploitation phases during optimization. For all models, 80% of the data were used for training and the remaining 20% for testing. Finally, the root mean square error (RMSE), mean absolute error (MAE), Willmott’s index of agreement (WI), explained variance score (EVS), Kling-Gupta efficiency (KGE), and coefficient of determination (\(R^{2}\)) were used to evaluate the models. The results show that the HMLM can model nitrate concentration accurately and that using the HHO algorithm instead of the TaE method considerably improves the SVM model’s performance.
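The clustering step can be illustrated with a minimal Python sketch of the classical Fisher-Jenks dynamic program, which places class breaks so that the total within-class sum of squared deviations (SDCM) is minimized, and which also reports the goodness of variance fit (GVF) listed under Abbreviations. This is not the authors' code: the function names `jenks_breaks` and `assign_cluster`, the choice of three classes, and the toy concentrations are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def jenks_breaks(values: Sequence[float], n_classes: int) -> Tuple[List[float], float]:
    """Optimal 1-D class breaks via the Fisher-Jenks dynamic program.

    Returns the upper bound of each class (ascending) and the goodness of
    variance fit, GVF = (SDAM - SDCM) / SDAM.
    """
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and len(values)")

    # Prefix sums give any class's sum of squared deviations in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i: int, j: int) -> float:
        """Sum of squared deviations from the mean of data[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: smallest SDCM when the first j sorted values form k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    start = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):  # class k would cover data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], start[k][j] = c, i

    # Walk back through the stored class starts to recover the breaks.
    bounds: List[float] = []
    j = n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = start[k][j]
    bounds.reverse()

    sdam = ssd(0, n)
    gvf = (sdam - cost[n_classes][n]) / sdam if sdam > 0 else 1.0
    return bounds, gvf


def assign_cluster(value: float, bounds: List[float]) -> int:
    """Index of the first class whose upper bound is >= value."""
    return min(bisect_left(bounds, value), len(bounds) - 1)


# Hypothetical nitrate concentrations, for illustration only.
nitrate = [1, 2, 3, 10, 11, 12, 20, 21, 22]
breaks, gvf = jenks_breaks(nitrate, 3)
print(breaks, round(gvf, 4))  # [3, 12, 22] 0.9891
print([assign_cluster(v, breaks) for v in nitrate])  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The exact dynamic program costs O(k·n²) operations, which remains practical for 1453 samples. One common way to choose the number of classes is to check how much GVF improves as classes are added; each resulting cluster would then receive its own SVM model.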
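The input-selection step can be sketched generically. The Python function below implements sequential forward floating selection for any subset-scoring criterion (higher is better): each forward step adds the most useful input, and the floating step removes an input only when the resulting smaller subset beats the best subset of that size found so far. In the study the criterion would presumably be the SVM's performance on each cluster; here `toy_score`, its relevance values, and the feature indices are hypothetical stand-ins.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Subset = FrozenSet[int]


def sffs(n_features: int, k: int,
         score: Callable[[Subset], float]) -> Tuple[Subset, float]:
    """Sequential forward floating selection; a higher score is better.

    Returns the best visited subset of exactly k features and its score.
    """
    if not 1 <= k <= n_features:
        raise ValueError("k must be between 1 and n_features")
    selected: Subset = frozenset()
    best: Dict[int, Tuple[float, Subset]] = {}  # subset size -> (score, subset)

    while len(selected) < k:
        # Forward (inclusion) step: add the feature that helps most.
        candidates = [selected | {f} for f in range(n_features) if f not in selected]
        selected = max(candidates, key=score)
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, selected)

        # Floating (conditional exclusion) step: drop a feature only while the
        # smaller subset beats the best subset of that size seen so far.
        while len(selected) > 2:
            smaller = max((selected - {f} for f in selected), key=score)
            s_small = score(smaller)
            if s_small > best[len(smaller)][0]:
                selected = smaller
                best[len(smaller)] = (s_small, smaller)
            else:
                break

    return best[k][1], best[k][0]


# Hypothetical criterion: per-input relevance minus a redundancy penalty when
# input 0 is combined with the correlated inputs 1 or 2.
relevance: List[float] = [0.5, 0.45, 0.45, 0.02]


def toy_score(subset: Subset) -> float:
    penalty = sum(0.4 for f in (1, 2) if 0 in subset and f in subset)
    return sum(relevance[f] for f in subset) - penalty


subset, value = sffs(4, 3, toy_score)
print(sorted(subset), round(value, 2))  # [1, 2, 3] 0.92
```

With this criterion, plain forward selection would keep input 0 because it is the strongest single input, whereas the floating step later discards it once inputs 1 and 2 together prove more informative.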
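The HHO tuning step can be sketched as a box-bounded minimizer in Python with NumPy. The sketch follows the standard HHO update rules: the prey's escaping energy \(E = 2E_0(1 - t/T)\) decides between exploration (\(|E| \ge 1\)) and four besiege strategies, two of which add Lévy-flight dives that are accepted only if they improve on the hawk's current position. The objective `toy_error` is a hypothetical stand-in for the SVM's validation error over (log10 C, log10 γ); the bounds, population size, iteration count, and seed are assumptions, not the study's settings.

```python
import math
from typing import Callable, Sequence, Tuple

import numpy as np


def levy_step(dim: int, rng: np.random.Generator, beta: float = 1.5) -> np.ndarray:
    """Levy-flight step (Mantegna's algorithm) used in the rapid dives."""
    sigma = (math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
             / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.normal(0.0, sigma, dim)
    v = rng.normal(0.0, 1.0, dim)
    return 0.01 * u / np.abs(v) ** (1 / beta)


def hho_minimize(f: Callable[[np.ndarray], float], lb: Sequence[float],
                 ub: Sequence[float], n_hawks: int = 20, n_iter: int = 100,
                 seed: int = 0) -> Tuple[np.ndarray, float]:
    """Minimise f inside the box [lb, ub] with Harris hawks optimization."""
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(lb, dtype=float), np.asarray(ub, dtype=float)
    dim = lo.size
    X = lo + rng.random((n_hawks, dim)) * (hi - lo)
    rabbit, rabbit_fit = X[0].copy(), math.inf

    def update_rabbit() -> None:
        """Clip hawks to the box; keep the best position seen as the rabbit."""
        nonlocal rabbit, rabbit_fit
        np.clip(X, lo, hi, out=X)
        for x in X:
            fit = f(x)
            if fit < rabbit_fit:
                rabbit, rabbit_fit = x.copy(), float(fit)

    for t in range(n_iter):
        update_rabbit()
        for i in range(n_hawks):
            E = 2 * (2 * rng.random() - 1) * (1 - t / n_iter)  # E = 2*E0*(1 - t/T)
            J = 2 * (1 - rng.random())  # random jump strength of the prey
            if abs(E) >= 1:
                # Exploration: perch relative to a random hawk or the flock mean.
                if rng.random() >= 0.5:
                    Xr = X[rng.integers(n_hawks)].copy()
                    X[i] = Xr - rng.random() * np.abs(Xr - 2 * rng.random() * X[i])
                else:
                    X[i] = (rabbit - X.mean(axis=0)) - rng.random() * (lo + rng.random() * (hi - lo))
            elif rng.random() >= 0.5:
                if abs(E) >= 0.5:  # soft besiege
                    X[i] = (rabbit - X[i]) - E * np.abs(J * rabbit - X[i])
                else:  # hard besiege
                    X[i] = rabbit - E * np.abs(rabbit - X[i])
            else:
                # Soft or hard besiege with progressive rapid dives (Levy flights).
                ref = X[i] if abs(E) >= 0.5 else X.mean(axis=0)
                Y = np.clip(rabbit - E * np.abs(J * rabbit - ref), lo, hi)
                Z = np.clip(Y + rng.random(dim) * levy_step(dim, rng), lo, hi)
                current = f(X[i])
                if f(Y) < current:
                    X[i] = Y
                elif f(Z) < current:
                    X[i] = Z
    update_rabbit()  # score the positions produced by the last iteration
    return rabbit, rabbit_fit


# Hypothetical stand-in for an SVM validation-error surface over
# (log10 C, log10 gamma); the minimum at (1, -2) is invented for illustration.
def toy_error(p: np.ndarray) -> float:
    return float((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)


best, err = hho_minimize(toy_error, lb=[-2.0, -5.0], ub=[3.0, 1.0], seed=42)
print(best, err)
```

In a real run each call to the objective would train an SVM with the candidate kernel parameters, so the number of hawks times the number of iterations (plus the extra evaluations made during the dives) largely sets the computational budget.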

Figs. 1–19 (images not included)

Data Availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Abbreviations

ANFIS:

Adaptive neuro-fuzzy inference system

AUC:

Area under the receiver operating characteristic curve

ANN:

Artificial neural network

AT:

Air temperature

ARIMA:

Autoregressive integrated moving average

BRT:

Boosted regression trees

\(R^{2}\):

Coefficient of determination

DE:

Differential evolution algorithm

EVS:

Explained variance score

FLS:

Fuzzy logic supervised

GA:

Genetic algorithm

GVF:

Goodness of variance fit

GAPM:

Groundwater aquifer potential mapping

HM-SVMS:

Hard-margin support vector machines

HHO:

Harris hawks optimization algorithm

HMLM:

Hybrid machine learning model

JNB:

Jenks natural breaks algorithm

KGE:

Kling-Gupta efficiency

LK:

Linear kernel

LR:

Logistic regression

MAE:

Mean absolute error

MCM:

Million cubic meters

MQ:

Multiquadric

MABLR:

Multi-adaptive boosting logistic regression

MLP:

Multilayer perceptron

MDA:

Multivariate discriminant analysis

OOB:

Out-of-bag error

PSO:

Particle swarm optimization

PK:

Polynomial kernel

RBF:

Radial basis function

ROC:

Receiver operating characteristic curve

SFFS:

Sequential forward floating selection technique

SK:

Sigmoid kernel

SM-SVMS:

Soft-margin support vector machines

SVM:

Support vector machines

TH:

Total hardness

TaE:

Trial-and-error method

WT:

Water temperature

WI:

Willmott’s index of agreement

MA:

Mean absolute error
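The kernel abbreviations above (LK, PK, RBF, SK, MQ) correspond to standard kernel formulas, and their parameters are what the abstract describes tuning first by trial and error and then with HHO. A minimal pure-Python sketch follows; the parameter names `gamma`, `coef0`, `degree`, and `c`, and their default values, are assumptions borrowed from common SVM library conventions, not values reported in the study.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def _dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def _sq_dist(x: Vector, y: Vector) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def linear_kernel(x: Vector, y: Vector) -> float:  # LK: x . y
    return _dot(x, y)


def polynomial_kernel(x: Vector, y: Vector, gamma: float = 1.0,
                      coef0: float = 1.0, degree: int = 3) -> float:  # PK
    return (gamma * _dot(x, y) + coef0) ** degree


def rbf_kernel(x: Vector, y: Vector, gamma: float = 0.5) -> float:  # RBF
    return math.exp(-gamma * _sq_dist(x, y))


def sigmoid_kernel(x: Vector, y: Vector, gamma: float = 0.01,
                   coef0: float = 0.0) -> float:  # SK
    return math.tanh(gamma * _dot(x, y) + coef0)


def multiquadric_kernel(x: Vector, y: Vector, c: float = 1.0) -> float:  # MQ
    return math.sqrt(_sq_dist(x, y) + c * c)


def gram_matrix(rows: Sequence[Vector], kernel: Callable[..., float],
                **params: float) -> List[List[float]]:
    """Kernel matrix K[i][j] = kernel(rows[i], rows[j], **params)."""
    return [[kernel(a, b, **params) for b in rows] for a in rows]


x, y = [1.0, 2.0], [2.0, 0.0]
print(linear_kernel(x, y), polynomial_kernel(x, y))  # 2.0 27.0
print(gram_matrix([x, y], rbf_kernel, gamma=0.1))
```

The sigmoid kernel is positive semidefinite only for some parameter choices, and the multiquadric kernel is not positive semidefinite in general, so neither is guaranteed to define a valid Mercer kernel.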
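The evaluation indices listed above can be computed from paired observed and simulated concentrations as in the Python sketch below. It uses the classic form of Willmott's index of agreement, \(R^{2} = 1 - SS_{res}/SS_{tot}\), and the Kling-Gupta efficiency built from the correlation \(r\), the variability ratio \(\alpha = \sigma_s/\sigma_o\), and the bias ratio \(\beta = \mu_s/\mu_o\). The study may define WI (for example, the refined 2012 index) or \(R^{2}\) (for example, as the squared Pearson correlation) differently; the function name `evaluate` and the toy data are hypothetical.

```python
import math
from statistics import fmean, pstdev, pvariance
from typing import Dict, Sequence


def evaluate(obs: Sequence[float], sim: Sequence[float]) -> Dict[str, float]:
    """Goodness-of-fit indices for observed vs. simulated values."""
    if len(obs) != len(sim) or len(obs) < 2:
        raise ValueError("need two equally long series with at least 2 values")
    n = len(obs)
    mo, ms = fmean(obs), fmean(sim)
    err = [s - o for o, s in zip(obs, sim)]
    sse = sum(e * e for e in err)

    rmse = math.sqrt(sse / n)
    mae = sum(abs(e) for e in err) / n
    # Willmott's original index of agreement d (0 = no agreement, 1 = perfect).
    wi = 1 - sse / sum((abs(s - mo) + abs(o - mo)) ** 2 for o, s in zip(obs, sim))
    # Explained variance ignores any constant bias in the simulation.
    evs = 1 - pvariance(err) / pvariance(obs)
    # R^2 as 1 - SS_res / SS_tot.
    r2 = 1 - sse / sum((o - mo) ** 2 for o in obs)
    # Kling-Gupta efficiency from correlation r, variability ratio, and bias ratio.
    so, ss = pstdev(obs), pstdev(sim)
    r = sum((o - mo) * (s - ms) for o, s in zip(obs, sim)) / (n * so * ss)
    kge = 1 - math.sqrt((r - 1) ** 2 + (ss / so - 1) ** 2 + (ms / mo - 1) ** 2)
    return {"RMSE": rmse, "MAE": mae, "WI": wi, "EVS": evs, "KGE": kge, "R2": r2}


observed = [1.0, 2.0, 3.0, 4.0]   # hypothetical values
simulated = [2.0, 3.0, 4.0, 5.0]  # same pattern with a constant +1 offset
print({k: round(v, 3) for k, v in evaluate(observed, simulated).items()})
# {'RMSE': 1.0, 'MAE': 1.0, 'WI': 0.84, 'EVS': 1.0, 'KGE': 0.6, 'R2': 0.2}
```

Because the toy simulation differs from the observations only by a constant offset, EVS stays at 1, whereas \(R^{2}\), WI, and KGE all penalize the bias; this is one reason to report several indices side by side.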

Author information

Contributions

All authors (Adnan Mazraeh, Meysam Bagherifar, Saeid Shabanlou, and Reza Ekhlasmand) contributed equally to writing all parts of the article.

Corresponding author

Correspondence to Saeid Shabanlou.

Ethics declarations

Ethics Approval and Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mazraeh, A., Bagherifar, M., Shabanlou, S. et al. A Hybrid Machine Learning Model for Modeling Nitrate Concentration in Water Sources. Water Air Soil Pollut 234, 721 (2023). https://doi.org/10.1007/s11270-023-06745-3

  • DOI: https://doi.org/10.1007/s11270-023-06745-3
