Novel approach for burst detection in water distribution systems based on graph neural networks

https://doi.org/10.1016/j.scs.2022.104090

Highlights

  • Novel burst detection approach based on graph convolutional neural networks.

  • Proposal of two innovative models based on graph neural networks.

  • Synthetic generation of multiple realistic case studies.

  • Results highlight that the graph-based approach outperforms state-of-the-art methods.

Abstract

Sustainable management of water resources is a key challenge for the well-being and security of current and future society worldwide. In this regard, water utilities have to ensure fresh water for all users in a demand scenario stressed by climate change and by the growth of cities. Dealing with anomalies, such as leakages and pipe bursts, represents one of the major issues for efficient water distribution system (WDS) operation and management. To this end, it is crucial to rely on suitable methods and technologies that provide quick, reliable, and accurate detection of such anomalies and supply disruption events. Therefore, this work proposes a novel WDS management framework based on the development of graph convolutional neural network (GCN) models for burst detection in WDSs. These methods rely on a WDS graph representation built from a set of pressure and flow rate measurements. Such a graph is used to design two GCN-based models to identify bursts. In addition, two conventional multi-layer perceptron models are used as benchmarks against which the graph-based methodologies are compared. Finally, the proposed methodology is tested on a water utility network, showing the high potential of graph convolutional networks for anomaly detection in WDSs.

Introduction

Sustainable water management is a fundamental concern for the future, since the constant increase in water demand due to socio-economic factors and climate change puts WDSs under increasing strain. Hence, water waste is no longer acceptable for today’s utilities. To limit losses and waste, optimizing water pressure in WDSs and using smart meters can provide essential savings (Spedaletti et al., 2022). In this context, water management needs proper care to undertake a transition towards a smart and sustainable paradigm (Oberascher et al., 2022, Puchol-Salort et al., 2021). The new era of big data (Shafiee, Barker, & Rasekh, 2018) and artificial intelligence enables innovative ways to make water management more efficient (Savic, 2019) and to advance towards a sustainable management paradigm (Ávila, Sánchez-Romero, López-Jiménez, & Pérez-Sánchez, 2022).

Research aimed at developing new data-driven strategies for improving water management has blossomed in the last few years. Among a myriad of applications, notable examples include water demand modeling (e.g., House-Peters & Chang, 2011), data analysis techniques for smart water metering systems (e.g., Nguyen et al., 2018, Rahim et al., 2020), intrusion detection methods (e.g., Mboweni, Abu-Mahfouz, & Ramotsoela, 2021), and water demand forecasting (e.g., Brentan et al., 2017, Herrera et al., 2010). All these research approaches share the aim of helping water utilities manage WDSs efficiently. For example, Bakker et al. (2013) showed how a system to forecast urban water demand in the Netherlands made it possible to reduce both the energy consumption and the energy cost of a WDS. These achievements encouraged the scientific community to develop innovative and powerful methodologies and to adapt novel techniques emerging across different research fields (Bronstein et al., 2017, Schmidhuber, 2015).

One crucial research task for water distribution systems lies in leak detection and water waste reduction (Cavazzini, Pavesi, & Ardizzon, 2020). Leaks can be divided into background losses and pipe bursts (Puust et al., 2010, Zaman et al., 2020). Background losses are distributed along the network and are mainly caused by the deterioration over time of water network assets (e.g., pipes, valves, fire hydrants), whereas pipe bursts are characterized by a significant crack in a pipe with a consequent high water outflow. Identifying the latter type of leak is fundamental to avoid water waste and service interruptions (Misiunas, Lambert, Simpson, & Olsson, 2005). Consequently, efficient leak detection algorithms are essential for the correct management of WDSs. In recent years, the increasing use of cyber–physical systems by water utilities, and the resulting availability of data, has made it possible to develop data-driven leak detection methods to address this challenge.

The literature review of Wu and Liu (2017) identifies three categories of approaches to address the anomaly detection problem: classification, prediction–classification, and statistical methods.

  • Classification methods distinguish data affected by bursts from normal data. They usually require labeled data, that is, data with full knowledge of whether an anomaly has occurred. Still, a range of data-driven and machine learning algorithms have been developed for these methods. For instance, Aksela, Aksela, and Vahala (2009) proposed an approach based on a self-organizing map (SOM) artificial neural network (ANN) for leak detection, whose output was converted into an alarm system using a threshold value. Mounce and Machell (2006) proposed two conceptually different ANNs to detect bursts, a static ANN and a time-delay ANN, and showed the crucial role played by the temporal dimension.

  • Prediction–classification methods rely on the idea that, when an anomaly (e.g., a burst) occurs, the predicted output consistently differs from the measured data. These methods have the advantage of requiring only normal hydraulic data to develop the prediction model. However, obtaining such clean data requires removing outliers and abnormal records, so some authors adopt data pre-processing and statistical methods (e.g., Romano, Kapelan, & Savić, 2014). A range of data-driven methodologies has been developed for this category, such as support vector regression (e.g., Mounce et al., 2011, Zhang et al., 2016) and ANNs (e.g., Arsene et al., 2012, Fang et al., 2019). Note, however, that prediction–classification methods involve two stages: a prediction phase followed by a classification phase.

  • Statistical methods for burst identification do not require any prediction or classification model. Instead, in most cases detection relies on statistical theory, so this class of methods usually falls into the category of statistical process control (SPC), which monitors process variation caused by anomalous events through control charts and analytic tools (e.g., Jung et al., 2015, Loureiro et al., 2016). Although these methods do not require the complex data-driven models usually developed in the classification and prediction–classification approaches described above, their results can be affected by high uncertainty (Wu & Liu, 2017).
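To make the statistical (SPC) category more concrete, the following is a minimal sketch of a basic Shewhart-style control chart: control limits are set at the mean plus or minus three standard deviations of a burst-free calibration period, and any observation outside that band raises an alarm. The pressure level, the 0.2 m noise, the 2 m burst-induced drop, and the function names `shewhart_limits` and `control_chart_alarms` are hypothetical illustrations, not taken from the cited studies.

```python
import numpy as np


def shewhart_limits(calibration: np.ndarray, k: float = 3.0) -> tuple:
    """Lower/upper control limits (mean -/+ k standard deviations) from burst-free data."""
    mu = float(np.mean(calibration))
    sigma = float(np.std(calibration, ddof=1))
    return mu - k * sigma, mu + k * sigma


def control_chart_alarms(series: np.ndarray, lower: float, upper: float) -> np.ndarray:
    """Boolean alarm per time step: True where an observation falls outside the control band."""
    return (series < lower) | (series > upper)


# Hypothetical data: one node pressure (m) with a burst-induced drop from step 30 onward.
rng = np.random.default_rng(0)
calibration = 40.0 + rng.normal(0.0, 0.2, size=200)  # burst-free period used to set the limits
monitored = 40.0 + rng.normal(0.0, 0.2, size=60)
monitored[30:] -= 2.0  # synthetic pressure drop caused by a burst

lower, upper = shewhart_limits(calibration)
alarms = control_chart_alarms(monitored, lower, upper)
print(f"limits: [{lower:.2f}, {upper:.2f}] m, alarm steps: {np.flatnonzero(alarms)}")
```

Applying the same band to the residuals between measured and model-predicted values, instead of to the raw signal, gives the classification stage of a simple prediction–classification scheme. The multiplier `k` trades sensitivity against false alarms: a smaller value reacts to smaller deviations but flags more normal fluctuations.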

All these methodologies depend on the quality and quantity of the available data, which are fundamental for the proper development of anomaly detection approaches (Chan et al., 2018, Menapace et al., 2020).

Today, many data-driven problems in science and engineering have seen the rise of graph-structured approaches that provide closer representations of problems in non-Euclidean spaces (Bronstein et al., 2017). Graphs are structures that model a set of objects (vertices) along with their relationships or connections (edges). Such structures have previously been adopted to address engineering problems, such as anomaly detection in internet traffic networks (Herrera, Proselkov, Pérez-Hernández, & Parlikad, 2021). Other researchers apply machine learning to graphs, using graph neural network (GNN) models to represent and analyze graph-structured data (Wu et al., 2021). In machine learning, the non-Euclidean structure of graphs supports many different tasks, such as node-level classification, link prediction, and clustering (Zhou et al., 2020). Graph convolutional neural networks (GCNs) are one of the many variants of GNNs. Owing to their expressive power in learning graph representations, GCNs have demonstrated superior performance in many deep learning problems (Zhang, Tong, Xu, & Maciejewski, 2019).
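As an illustration of what a single graph convolution computes, the sketch below implements one widely used formulation, the propagation rule of Kipf and Welling (2017): H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, D the degree matrix of A + I, H the node features, and W a learned weight matrix. The text does not state which GCN variant the proposed models use, so this choice is an assumption; the 4-node chain graph, the feature sizes, and the random (untrained) weights are hypothetical.

```python
import numpy as np


def normalized_adjacency(adj: np.ndarray) -> np.ndarray:
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    a_hat = adj + np.eye(adj.shape[0])  # self-loops so each node keeps its own features
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(a_norm: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One graph convolution: aggregate normalized neighbour features, project with W, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Hypothetical graph of 4 sensors connected in a chain: 0-1, 1-2, 2-3.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
features = np.random.default_rng(1).normal(size=(4, 2))  # 2 hypothetical features per sensor
weights = np.random.default_rng(2).normal(size=(2, 3))   # untrained weights, 3 output channels

a_norm = normalized_adjacency(adj)
out = gcn_layer(a_norm, features, weights)
print(out.shape)  # (4, 3)
```

Each output row mixes a node's own features with those of its direct neighbours; stacking several such layers lets information propagate over longer paths in the graph.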

GCNs have recently been adopted in different applications for anomaly detection purposes. For instance, Zhou et al. (2021) proposed a cross-network contamination source identification method in which a GCN captures spatial information about the network topology. The authors showed that the model reliably identifies contamination source nodes and can transfer knowledge from one WDS to another. Furthermore, Wang, Luo, and Zhou (2020) adopted a GCN to build an anomaly detection model for blockchain-based healthcare systems, showing that their approach could handle the associated security requirements. In a different field, John, Thomas, and Emmanuel (2020) proposed a GCN-based anomaly detection method for Android malware, which showed promising performance compared with other machine learning techniques. Another important application is the one introduced by Arifoglu, Charif, and Bouchachia (2020), who addressed the problem of detecting abnormal activities of people affected by dementia. In this context, the authors explored the use of GCNs to detect anomalies from activation data, compared the results with state-of-the-art methods in their field, and highlighted the ability of the GCN model to recognize abnormal activity related to dementia.

This paper proposes a novel approach for burst detection in WDSs. The approach consists of developing a classification model based on a graph convolutional neural network (GCN) to identify abnormal data in a dataset of pressure and flow rate measurements. The model relies on a graph structure that represents the input and forms the basis of a graph neural network classifier. Because labeling data for classification models is complex, the WDS data generator developed by Menapace et al. (2020) is adopted to obtain demand data affected by bursts. For clarity, labeled data here means data with full knowledge of burst occurrences. The generator makes it possible to create suitable water demand time series affected by realistic burst events, following the formulations of van Zyl and Cassa (2014), and to simulate them with a distributed pressure-driven hydraulic solver (Menapace & Avesani, 2019), which produces time series of both flow rates in the pipes and pressures at the nodes. This study tests the classification methodology on the well-known WDS of Modena (Italy) using four datasets of synthetic water demand. These data are then used to build a graph in which the nodes are the meters involved and the links represent their correlation. The graph structure is used to build GCN-based models that classify whether an anomaly is present. In particular, two GCN-based models are proposed: (1) a GCN model that does not use past observations as input, but only data from the same time frame as the event to be classified; and (2) a graph convolutional recurrent neural network (GCRNN) that is also fed with past observations and uses recurrent layers to extract temporal features from the data.
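The graph described above, with meters as nodes and links reflecting correlation, can be sketched as follows. This is a hypothetical illustration: the text does not state the exact linking rule, so the sketch assumes that two meters are linked when the absolute Pearson correlation of their time series reaches a cut-off, and the 0.5 threshold, the synthetic signals, and the name `correlation_graph` are assumptions.

```python
import numpy as np


def correlation_graph(measurements: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Link meters whose time series have |Pearson r| >= threshold.

    measurements: array of shape (T, N), one column per meter (pressure or flow rate).
    threshold: hypothetical cut-off; the paper's actual linking rule may differ.
    """
    corr = np.corrcoef(measurements, rowvar=False)  # (N, N) Pearson correlation matrix
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-links here; GCN layers typically add them later
    return adj


# Hypothetical example: meters 0 and 1 follow the same periodic pattern, meter 2 does not.
t = np.linspace(0.0, 4.0 * np.pi, 200)
meters = np.column_stack([
    np.sin(t),              # meter 0
    2.0 * np.sin(t) + 1.0,  # meter 1: rescaled copy of meter 0
    np.cos(3.0 * t),        # meter 2: unrelated pattern
])
adj = correlation_graph(meters)
print(adj)
```

A binary adjacency keeps the example simple; an alternative design would keep the correlation values themselves as edge weights, so that strongly related meters influence each other more in the graph convolution.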
Furthermore, to evaluate the performance of the two GCN-based models, two classification models based on a multi-layer perceptron (MLP) architecture are developed. These are employed to emphasize the differences between the novel graph-based approaches (i.e., the GCN and GCRNN models) and conventional data-driven ones (i.e., the MLP-based models). As with the two graph-based approaches, one MLP model uses past observations, while the other uses only data from the same time step as the event to be classified. The results highlight the high potential of the proposed graph-based approaches to detect bursts, showing reliable, highly accurate detection in all the tested case studies. A further novelty of this paper lies in highlighting the significant advantages and superior performance of anomaly detection models that can learn from a graph structure.
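The two kinds of input described above (data from the same time step only, versus data that also include past observations) can be illustrated with a simple windowing routine. This is a hypothetical sketch: the excerpt does not state how many past steps the GCRNN or the history-based MLP receive, so the window length, the array layout, and the name `windowed_samples` are assumptions.

```python
import numpy as np


def windowed_samples(x: np.ndarray, y: np.ndarray, window: int):
    """Stack the current step and window-1 past steps into one sample.

    x: (T, N) sensor readings; y: (T,) burst labels.
    Returns X of shape (T - window + 1, window, N) and the label of each window's last step.
    """
    X = np.stack([x[t - window + 1 : t + 1] for t in range(window - 1, len(x))])
    return X, y[window - 1 :]


# Hypothetical data: 10 time steps from 3 sensors, burst label switching on at step 6.
x = np.arange(30, dtype=float).reshape(10, 3)
y = (np.arange(10) >= 6).astype(int)

X_static, y_static = x, y  # same-time-step models: one sample per time step
X_hist, y_hist = windowed_samples(x, y, window=4)
X_hist_flat = X_hist.reshape(len(X_hist), -1)  # flattened history, as a dense MLP input would need
print(X_static.shape, X_hist.shape, X_hist_flat.shape)  # (10, 3) (7, 4, 3) (7, 12)
```

A recurrent model can consume the `(window, N)` sequence directly, whereas an MLP needs the flattened vector, which discards the explicit ordering of the time steps.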

Methodology

This section describes the proposed graph-based classification methods for burst detection. Fig. 1 summarizes the overall research framework.

The proposed methodology relies on a classification model to estimate whether a burst occurs. Given a WDS, the data generation method of Menapace et al. (2020) is adopted to generate realistic time series of water demand affected by bursts, together with the pressures at the network nodes. A description of the generator is provided in

Benchmark models

To compare the results of the two proposed graph-based models, the well-established multi-layer perceptron architecture is used as a benchmark, since it is still adopted in burst detection methodologies (Fallahi, Jalili Ghazizadeh, Aminnejad, & Yazdi, 2021). The structure of the benchmark models is depicted in Fig. 4.
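A minimal NumPy sketch of a dense-layer benchmark of this kind follows. It shows only the forward pass with untrained, randomly initialized weights; the layer sizes, the activations (ReLU hidden layers, sigmoid output), and the function name `mlp_burst_probability` are assumptions for illustration, and in practice such a model would be trained with a framework such as Keras (Chollet, 2015).

```python
import numpy as np


def mlp_burst_probability(x, weights, biases):
    """Dense-layer classifier: ReLU hidden layers followed by a sigmoid output unit."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ w + b, 0.0)  # hidden dense layer + ReLU
    logits = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-logits))  # probability that a burst is occurring


# Hypothetical sizes: 8 meter readings per sample, hidden layers of 16 and 8 units.
rng = np.random.default_rng(0)
sizes = [8, 16, 8, 1]
weights = [rng.normal(0.0, 0.3, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

batch = rng.normal(size=(5, 8))  # 5 samples of standardized pressure/flow readings
probs = mlp_burst_probability(batch, weights, biases)
print(probs.shape)  # (5, 1)
```

Unlike the GCN layer, this model treats the meter readings as a flat vector: it receives no information about which meters are related, which is exactly the difference the benchmark is meant to expose.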

The proposed benchmark models are composed of a sequence of dense layers, starting from an input layer that is fed with the pressure and flow rate data of the

Case studies

To test the proposed burst detection methodologies, the well-known Modena network (Bragalli, D’Ambrosio, Lee, Lodi, & Toth, 2008), located in Italy, is used. The sensor configuration of the network is selected by means of the procedure described in Section 2.2, which is also presented in Zanfei, Menapace, Santopietro, and Righetti (2020). The network is depicted in Fig. 5.

The Modena network model consists of 4 reservoirs, 267 nodes, and 317 pipes, with a total length of 71.8 km.

Results and discussion

A reliable burst detection algorithm should provide correct detection, with TPR values close to 1 and short detection times. Furthermore, a small number of false alarms with short persistence indicates a robust detection algorithm. The results of the proposed anomaly detection models are shown in Table 2, which reports detection times ranging from instantaneous to an 8 h delay, together with metrics including false negatives (FN), false positives (FP), and the true positive rate (TPR) for the four considered models.
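The metrics reported in Table 2 can be illustrated with a short sketch. The excerpt does not define exactly how FN, FP, TPR, or the detection time are computed, so the sketch assumes per-time-step binary labels and predictions and a single burst event per series; the hourly time step, the example labels, and the function names `detection_metrics` and `detection_delay` are hypothetical.

```python
import numpy as np


def detection_metrics(y_true, y_pred):
    """Per-time-step counts and true positive rate, TPR = TP / (TP + FN)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = int(np.sum(y_true & y_pred))
    fn = int(np.sum(y_true & ~y_pred))
    fp = int(np.sum(~y_true & y_pred))
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    return {"TP": tp, "FN": fn, "FP": fp, "TPR": tpr}


def detection_delay(y_true, y_pred):
    """Steps from burst onset to the first alarm raised during the burst (single event assumed)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    hits = np.flatnonzero(y_true & y_pred)
    if not y_true.any() or hits.size == 0:
        return None  # no burst in the series, or burst missed entirely
    onset = int(np.argmax(y_true))  # index of the first labeled burst step
    return int(hits[0]) - onset


# Hypothetical hourly labels: a burst from hour 3 to hour 6, one false alarm at hour 1.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 0, 1, 1, 1, 0]
print(detection_metrics(y_true, y_pred))  # {'TP': 3, 'FN': 1, 'FP': 1, 'TPR': 0.75}
print(detection_delay(y_true, y_pred))    # 1
```

Under these assumptions, a one-step delay means the burst is missed at its first hour (one FN) but caught from the second hour onward, while the isolated alarm at hour 1 counts as an FP.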

Table 2 highlights the

Conclusions

Accurate leak detection improves the sustainable management of water resources by decreasing water waste and increasing the resilience of such systems. This study proposes a novel approach for detecting bursts in WDSs based on graph convolutional neural networks. The proposed GCN-based models are developed to detect abnormal events, using as input the generated pressure and flow rate data at a set of metering locations to build a graph structure. The position of the sensors that

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Funding

This study has been partially funded by the project “TESES-Urb - Techno-economic methodologies to investigate sustainable energy scenarios at urban level” of the Free University of Bozen-Bolzano, Italy, and by the project DIADEM “Data driven anomaly detection for sustainable water and energy smart grids management” of the Free University of Bozen-Bolzano, Italy.

References (55)

  • Shafiee, M. E., et al. (2018). Enhancing water system models by integrating big data. Sustainable Cities and Society.
  • Spedaletti, S., et al. (2022). Improvement of the energy efficiency in water systems through water losses reduction using the district metered area (DMA) approach. Sustainable Cities and Society.
  • Wang, Z., et al. (2020). Guardhealth: Blockchain empowered secure data management and graph convolutional network enabled anomaly detection in smart healthcare. Journal of Parallel and Distributed Computing.
  • Zhou, J., et al. (2020). Graph neural networks: A review of methods and applications. AI Open.
  • Zhou, Y., et al. (2021). Graph convolutional networks based contamination source identification across water distribution networks. Process Safety and Environmental Protection.
  • Abadi, M., Barham, P., Chen, J., Chen, Z., Davis, A., Dean, J., et al. (2016). Tensorflow: A system for large-scale...
  • Aksela, K., et al. (2009). Leakage detection in a real distribution network using a SOM. Urban Water Journal.
  • Bakker, M., et al. (2013). Better water quality and higher energy efficiency by using model predictive flow control at water supply systems. Journal of Water Supply: Research and Technology-AQUA.
  • Bengio, Y., et al. (1994). Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks.
  • Bragalli, C., et al. (2008). Water network design by MINLP. Rep. No. RC24495.
  • Bronstein, M. M., et al. (2017). Geometric deep learning: Going beyond Euclidean data. IEEE Signal Processing Magazine.
  • Chan, T. K., et al. (2018). Review of current technologies and proposed intelligent methodologies for water distributed network leakage detection. IEEE Access.
  • Chollet, F. (2015). Keras.
  • Fallahi, H., et al. (2021). Leakage detection in water distribution networks using hybrid feedforward artificial neural networks. Journal of Water Supply: Research and Technology-AQUA.
  • Fang, Q., et al. (2019). Detection of multiple leakage points in water distribution networks based on convolutional neural networks. Water Science and Technology: Water Supply.
  • Grattarola, D., et al. (2020). Graph neural networks in TensorFlow and Keras with Spektral.
  • Herrera, M., et al. (2021). Mining graph-Fourier transform time series for anomaly detection of internet traffic at core and metro networks. IEEE Access.