ABSTRACT
A climate network represents the global climate system as the interactions among a set of anomaly time-series. Network science has been applied to climate data to study the dynamics of such networks. The core task that enables network dynamics analysis on climate data is the efficient computation and update of the correlation matrix for user-defined time windows on historical and real-time data. We present TSUBASA, an algorithm for efficiently computing exact pairwise time-series correlations based on Pearson's correlation. By pre-computing simple, low-overhead sketches, TSUBASA can efficiently compute exact pairwise correlations on arbitrary time windows at query time. For real-time data, TSUBASA provides a fast, incremental way of updating the correlation matrix. We provide a detailed time and space complexity analysis of TSUBASA. Our experiments show that, with the same space overhead as a DFT-based approximate solution, TSUBASA has a lower sketching time and is on par with the approximate solution with respect to query time. TSUBASA is at least one order of magnitude faster than a baseline for both historical and real-time data.
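To make the idea of sketch-based exact correlation concrete, the following is a minimal illustration and not the authors' implementation. It assumes each series is cut into fixed-size "basic windows" and that the sketch for a pair of series stores per-window sufficient statistics (counts, sums, sums of squares, and the cross-product sum). The names `PairSketch`, `build_sketches`, `correlation`, and `append_point` are hypothetical, and the sketch layout TSUBASA actually uses may differ. Under these assumptions, Pearson's correlation over any query window aligned to basic-window boundaries can be recovered exactly by adding up the stored statistics, and a newly arrived point only touches the most recent sketch.

```python
import math
from dataclasses import dataclass


@dataclass
class PairSketch:
    """Sufficient statistics of one basic window for a pair of series (hypothetical layout)."""
    n: int = 0
    sx: float = 0.0   # sum of x
    sy: float = 0.0   # sum of y
    sxx: float = 0.0  # sum of x^2
    syy: float = 0.0  # sum of y^2
    sxy: float = 0.0  # sum of x*y


def append_point(sketches, b, xv, yv):
    """Real-time update: fold one new (x, y) observation into the newest basic window.

    A new basic window is opened once the current one holds b points.
    """
    if not sketches or sketches[-1].n == b:
        sketches.append(PairSketch())
    s = sketches[-1]
    s.n += 1
    s.sx += xv
    s.sy += yv
    s.sxx += xv * xv
    s.syy += yv * yv
    s.sxy += xv * yv


def build_sketches(x, y, b):
    """Historical data: sketch two equal-length series with basic windows of b points."""
    sketches = []
    for xv, yv in zip(x, y):
        append_point(sketches, b, xv, yv)
    return sketches


def correlation(sketches, first, last):
    """Exact Pearson correlation over basic windows first..last (inclusive).

    Only the stored statistics are read; the raw series is not touched.
    Raises ZeroDivisionError if either series is constant over the query window.
    """
    n = sx = sy = sxx = syy = sxy = 0.0
    for s in sketches[first:last + 1]:
        n += s.n
        sx += s.sx
        sy += s.sy
        sxx += s.sxx
        syy += s.syy
        sxy += s.sxy
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    return cov / math.sqrt(var_x * var_y)
```

In this illustration a query costs time proportional to the number of basic windows it spans rather than the number of raw points. Query windows that do not align with basic-window boundaries would additionally need the raw points of the partial windows, which this sketch does not handle. Summing raw squares can also lose precision on long series with large means, so a production version would likely store centered statistics instead.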