DOI: 10.1145/3514221.3526177
research-article

TSUBASA: Climate Network Construction on Historical and Real-Time Data

Published: 11 June 2022

ABSTRACT

A climate network represents the global climate system by the interactions of a set of anomaly time-series. Network science has been applied to climate data to study the dynamics of a climate network. The core task to enable network dynamics analysis on climate data is the efficient computation and update of the correlation matrix for user-defined time-windows on historical and real-time data. We present TSUBASA, an algorithm for efficiently computing the exact pair-wise time-series correlation based on Pearson's correlation. By pre-computing simple and low-overhead sketches, TSUBASA can efficiently compute exact pairwise correlations on arbitrary time windows at query time. For real-time data, TSUBASA proposes a fast and incremental way of updating the correlation matrix. We provide a detailed time and space complexity analysis of TSUBASA. Our experiments show that with the same space overhead as a DFT-based approximate solution, TSUBASA has a lower sketching time and is on par with the approximate solution with respect to query time. TSUBASA is at least one order of magnitude faster than a baseline for both historical and real-time data.
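The idea of answering exact correlation queries from pre-computed per-window summaries can be illustrated with a minimal sketch. It assumes each pair of series is cut into aligned, fixed-size basic windows and that a query window is a union of whole basic windows. The summary used here (count and five running sums) and the names `WindowStats`, `sketch`, `query`, and `slide` are illustrative assumptions, not the sketch definition or API from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowStats:
    """Additive summary of one basic window for a pair of series (x, y)."""
    n: int
    sx: float
    sy: float
    sxx: float
    syy: float
    sxy: float

    def __add__(self, o):
        return WindowStats(self.n + o.n, self.sx + o.sx, self.sy + o.sy,
                           self.sxx + o.sxx, self.syy + o.syy, self.sxy + o.sxy)

    def __sub__(self, o):
        return WindowStats(self.n - o.n, self.sx - o.sx, self.sy - o.sy,
                           self.sxx - o.sxx, self.syy - o.syy, self.sxy - o.sxy)


def sketch(x, y, basic):
    """Pre-compute one summary per complete basic window of length `basic`."""
    out = []
    for s in range(0, len(x) - basic + 1, basic):
        xs, ys = x[s:s + basic], y[s:s + basic]
        out.append(WindowStats(
            basic, sum(xs), sum(ys),
            sum(a * a for a in xs), sum(b * b for b in ys),
            sum(a * b for a, b in zip(xs, ys))))
    return out


def pearson(st):
    """Exact Pearson correlation recovered from the combined sums."""
    cov = st.n * st.sxy - st.sx * st.sy
    var_x = st.n * st.sxx - st.sx ** 2
    var_y = st.n * st.syy - st.sy ** 2
    return cov / math.sqrt(var_x * var_y)


def query(sketches, first, last):
    """Correlation over basic windows first..last (inclusive)."""
    total = sketches[first]
    for s in sketches[first + 1:last + 1]:
        total = total + s
    return pearson(total)


def slide(total, oldest, newest):
    """Real-time update: drop the oldest basic window and add the newest."""
    return total - oldest + newest
```

Because the sums are additive, a query touches one summary per basic window rather than every raw data point, and a sliding real-time window is updated with one subtraction and one addition. Repeated subtraction of floating-point sums can accumulate rounding error, so a production version would need a more careful update rule.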


    • Published in

      SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
      June 2022
      2597 pages
      ISBN: 9781450392495
      DOI: 10.1145/3514221

      Copyright © 2022 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      Overall Acceptance Rate: 785 of 4,003 submissions, 20%
