ICDE 2024 Tutorials

Accepted Tutorials

Tuesday May 14th, 2024 @ 16:21

Tutorial on Large Language Models: Principles and Practice [ In Theater 1 ]

Abstract:
The last few years have been marked by several breakthroughs in the domain of generative AI. Large language models such as GPT-4 are able to solve a plethora of tasks, ranging from text and code generation to multimodal data analysis, without task-specific training data.
This tutorial, targeted at database researchers without prior background in language models, introduces language models as well as relevant use cases in the context of data management. The tutorial covers the fundamental principles enabling language models, including the Transformer architecture, pre-training, and alignment. Furthermore, the tutorial will show how to use language models in practice, leveraging OpenAI's GPT model to build a natural language query interface as a demonstration. Finally, the tutorial will discuss recent research exploiting language models in the context of data management.

Presenters:

Immanuel Trummer is an assistant professor at Cornell University and heads the Cornell Database Group. His papers were selected for “Best of VLDB”, “Best of SIGMOD”, for the ACM SIGMOD Research Highlight Award, and for publication in CACM as CACM Research Highlight. His online lecture introducing students to database topics collected over a million views. He received the NSF CAREER Award and multiple Google Faculty Research Awards.

Tuesday May 14th, 2024 @ 10:15

Tutorial on Bipartite Graph Analytics: Current Techniques and Future Trends [ In Theater 1 ]

Abstract:
As the field of data science continues to evolve, bipartite graphs have emerged as a fundamental structure in numerous applications, drawing significant interest from both academic and industrial communities. Bipartite graphs are a specific type of graph consisting of two distinct sets of vertices, where connections only occur between vertices of different sets. Examples include e-commerce networks and biological networks. Analytics of bipartite graphs has become an important research topic in the era of big data. This tutorial aims to shed light on analysis methods for bipartite graphs, categorizing them into three areas: classical models, learning-based models, and application-driven models. We start by outlining the importance of bipartite graph analytics, and the unique challenges that need to be addressed. Then, we conduct a thorough review of existing works on bipartite graph analytics. We also compare and analyze the models and solutions in these works. Finally, we point out new research directions.

Presenters:

Ying Zhang is a Professor and ARC Future Fellow (2017-2021) at the Australia Artificial Intelligence Institute (AAII), the University of Technology Sydney (UTS). He received his BSc and MSc degrees in Computer Science from Peking University, and his PhD in Computer Science from the University of New South Wales. His research interests include query processing and analytics on large-scale data with a focus on graphs and high-dimensional data.
Kai Wang is an assistant professor at Antai College of Economics and Management, Shanghai Jiao Tong University. He received the BEng degree in Computer Science from Zhejiang University in 2016, and the PhD degree in Computer Science from the University of New South Wales in 2020. His research interests lie in big data analytics, especially for graph/network and spatial data.
Wenjie Zhang is a full Professor and ARC Future Fellow in the School of Computer Science and Engineering, the University of New South Wales, Australia. She has published over 200 research papers in top venues in the database area such as SIGMOD, VLDB, ICDE, PODS, TODS, VLDBJ, and TKDE. Her papers were nominated as Best of SIGMOD and ICDE, and received Best Paper Awards from DASFAA, WISE, APWeb and ADC. She received the Chris Wallace Award from Australasian Computing Research and Education (CORE) in 2019. She serves as an Associate Editor for TKDE and VLDB Journal, and area chair for ICDE/VLDB/ICDM.
Hanchen Wang is a postdoctoral research associate at AAII, FEIT, University of Technology Sydney, Australia. He received a BSc degree from Zhejiang University, China in 2016, and a PhD degree from University of Technology Sydney, Australia in 2021. His research interests include graph analytics and machine learning for databases.

Wednesday May 15th, 2024 @ 10:00 and 17:00

Tutorial on Privacy-Aware Analysis based on Data Series [ In Theater 1 ]

Abstract:
Data that is recorded about the operations of an organization constitutes a valuable source of information for monitoring and improvement. Specific use cases include the assessment of compliance with legal regulations, the analysis of performance bottlenecks, or the optimization of resource utilization. In recent years, a plethora of algorithms for operational analysis using data series, summarized as process mining, have been developed to support these use cases. Data series often contain sensitive information, though, about the individuals that act as service consumers or service providers. Personal information is only partially hidden by obfuscation and pseudonymization, and potential privacy breaches need to be prevented for ethical, legal, and economic reasons.
This tutorial is devoted to methods for privacy-aware analysis using data series. It covers essential notions, reviews privacy-disclosure attacks, and outlines techniques to give formal privacy guarantees while largely maintaining the data’s utility for operational analysis. The discussion is structured by the adopted perspective on the privacy of individuals, and the degree to which a data series contains contextual information.

Presenters:

Stephan Fahrenkrog-Petersen is a research group lead at the Weizenbaum Institute, Germany. He holds a PhD from Humboldt-Universität zu Berlin. His research was published in the proceedings of the premier conferences in the field and in international journals, such as ACM TMIS, DKE, and KAIS. His work received the Distinguished Paper Award at CAiSE 2021 and the Best Student Paper Award at ICPM 2021.
Han van der Aa is a junior professor in the Data and Web Science Group at the University of Mannheim, Germany. He obtained a PhD from the Vrije Universiteit Amsterdam in 2018. His research interests include process modelling, process mining, natural language processing, and complex event processing. His work has been published in journals including IEEE TKDE, Information Systems, and Decision Support Systems and at the BPM, CAiSE, ICPM, ICDE, and SIGMOD conferences.
Matthias Weidlich is a professor and Chair of Databases and Information Systems at Humboldt-Universität zu Berlin, Germany. Matthias’ research focuses on process-oriented and event-based information systems. His results appear regularly in premier conferences (SIGMOD, VLDB, ICDE, IJCAI, AAAI, BPM, CAiSE) and journals (TKDE, Information Systems, VLDB Journal) in the field. He serves as Co-Editor-in-Chief for the Information Systems journal and is a member of the steering committee of the ACM DEBS conference series.

Thursday May 16th, 2024 @ 13:15

Tutorial on Robust Query Optimization in the Era of Machine Learning: State-of-the-Art and Future Directions [ In Theater 1 ]

Abstract:
Query optimizers are an essential component of relational database management systems (RDBMSs) as they search for an execution plan that is expected to be optimal for a given query. However, they commonly use parameter estimates that are often inaccurate and make assumptions that may not hold in practice. Consequently, the optimizer may select suboptimal execution plans at runtime, when these estimates and assumptions are not valid, which may result in poor query performance. Therefore, query optimizers do not sufficiently support the robustness of the database system. In this tutorial we aim to explore the notion of robustness of a query execution plan, as well as how robustness is evaluated or even further supported. Firstly, we provide a comprehensive definition of robustness in this context. Next, we review the approaches proposed in the literature to address the issue of robustness, including techniques that rely on query re-optimization, discovering parameters, quantifying robustness, as well as recent techniques that employ machine learning. We emphasize the comparison of traditional cost-model-based and recent ML-based techniques concerning their capacity to address the issue of robustness in query optimization. Finally, we discuss the limitations and gaps in the current literature and provide some recommendations for future research directions.

Presenters:

Amin Kamali is a Ph.D. Candidate in Digital Transformation and Innovation at the University of Ottawa. Over the past decade, he has occupied diverse roles at IBM, all centered around Data and AI. These roles have spanned a spectrum from Business Intelligence to Digital Transformation, Data Science and Machine Learning. Amin holds a B.Sc. in Industrial Engineering from Sharif University of Technology and an M.Sc. in Systems Science from the University of Ottawa. His primary research interest revolves around delving into the latest AI breakthroughs and their potential to transform data systems. Amin has been privileged to organize multiple workshops for various audiences and to present at several academic conferences.
Verena Kantere has held academic positions as: Professor at the School of EECS of the University of Ottawa, Assistant Professor at the School of ECE of the National Technical University of Athens (NTUA), Maître d’Enseignement et de Recherche at the Centre Universitaire d’Informatique of the University of Geneva and Junior Assistant Professor at the Department of Electrical Engineering and Information Technology at the Cyprus University of Technology. She has conducted research for many years in the domain of data management, showing results in Peer-to-Peer systems, scientific data management, cloud data management, Big Data management, and analysis. She has received an Engineering Diploma and a Ph.D. from the NTUA and an M.Sc. from the University of Toronto.
Calisto Zuzarte is a Senior Technical Staff Member (STSM) in the Db2 development organization at IBM. His expertise is in database query optimization, and he has 60+ patents and 60+ research publications related to this area. His current interest is in the application of machine learning in query optimization and optimization in the Lakehouse environment.

Thursday May 16th, 2024 @ 16:30

Tutorial on Quantum data management: from theory to opportunities [ In Theater 1 ]

Abstract:
Quantum computing has emerged as a transformative tool for future data management. Classical problems in database domains, including query optimization, data integration, and transaction management, have recently been addressed using techniques from quantum computing. This tutorial aims to establish the theoretical foundation essential for enhancing methodologies and practical implementations in this line of research.
Moreover, this tutorial takes a forward-looking approach by delving into recent strides in quantum internet technologies and the nonlocality theory. We aim to shed light on the uncharted territory of future data systems tailored for the quantum internet.

Presenters:

Rihan Hai is an assistant professor at TU Delft, Netherlands. Her research focuses on data management for machine learning, federated learning, and quantum data management. She has served as a PC member of VLDB and ICDE, and as a journal reviewer for TKDE, VLDBJ, SIGMOD Record, JMLR and TPDS.
Shih-Han Hung is a postdoc at Academia Sinica. His research aims to better understand the power and the limit of quantum computers. Previously, he was a postdoc at the University of Texas at Austin. He received his Ph.D. from the University of Maryland.
Sebastian Feld is an assistant professor in the Quantum & Computer Engineering department of TU Delft, Netherlands. He and his group are working on Quantum Machine Learning. Previously, he was head of the Quantum Applications and Research Laboratory (QAR-Lab) at LMU Munich.

Friday May 17th, 2024 @ 8:30

Tutorial on An Interactive Dive into Time-Series Anomaly Detection [ In Theater 1 ]

Abstract:
Anomaly detection is an important problem in data analytics with applications in many domains. In recent years, there has been an increasing interest in anomaly detection tasks applied to time series. In this tutorial, we take a holistic view of anomaly detection in time series, starting from the core definitions and taxonomies related to time series and anomaly types, to an extensive description of the anomaly detection methods proposed by different communities in the literature. We explore the literature and the proposed methods by demonstrating systems that help users understand the core computational steps of some methods and navigate benchmark results. Finally, we describe the problem of model selection for anomaly detection and discuss recent experimental results.

Presenters:

Paul Boniol is a researcher at Inria, member of the VALDA project-team. Previously, he worked at ENS Paris-Saclay (Centre Borelli), Université Paris Cité, EDF Research lab, and Ecole Polytechnique (LIX). His research interests lie between data analytics, machine learning, and time-series analysis. His Ph.D. dissertation focused on subsequence anomaly detection and time-series classification. His work has been published in the top data management and analytics venues.
John Paparrizos is an assistant professor at The Ohio State University, leading The DATUM Lab. His research focuses on adaptive solutions for data-intensive and machine-learning applications. His doctoral work was recognized at the 2019 ACM SIGKDD Doctoral Dissertation Award competition. He has also received the inaugural ACM SIGMOD Research Highlight Award, a NetApp Faculty Award, and the 2023 IEEE TCDE Rising Star Award. His ideas have been widely adopted across scientific areas, Fortune 100-500 companies (e.g., Exelon and Nokia), and organizations such as ESA.
Themis Palpanas is an elected Senior Member of the French University Institute (IUF), and Distinguished Professor of computer science at the University of Paris (France). He is the author of 14 patents, has received 3 best paper awards and the IBM SUR award, has been Program Chair for VLDB 2025 and IEEE BigData 2023, General Chair for VLDB 2013, and has served as Editor in Chief for BDR. He has been working in the fields of Data Series Management and Analytics for more than 15 years, and has developed several of the state-of-the-art techniques. He has delivered 19 tutorials in top conferences.

Friday May 17th, 2024 @ 14:00 and 16:20

A Comprehensive Tutorial on the over 100 years of Diagrammatic Representations of Logical Statements and Relational Queries [ In Theater 1 ]

Abstract:
Query formulation is increasingly performed by systems that need to guess a user's intent (e.g. via spoken word interfaces). But how can a user know that the computational agent is returning answers to the "right" query? More generally, given that relational queries can become pretty complicated, how can we help users understand existing relational queries, whether human-generated or automatically generated? Now seems the right moment to revisit a topic that predates the birth of the relational model: developing visual metaphors that help users understand relational queries.
This lecture-style tutorial surveys the key visual metaphors developed for diagrammatic representations of logical statements and relational expressions across both the relational database community and the much older diagrammatic reasoning community. We will survey the history and state of the art of relationally-complete diagrammatic representations of relational queries, discuss the key visual metaphors developed in over a century of investigating diagrammatic languages, and organize the landscape by mapping the visual alphabets they use to the syntax and semantics of Relational Algebra (RA) and Relational Calculus (RC). Tutorial website: https://northeastern-datalab.github.io/diagrammatic-representation-tutorial/

Presenters:

Wolfgang Gatterbauer is an Associate Professor at the Khoury College of Computer Sciences at Northeastern University. His research interests lie in the intersection of theory and practice of data management. He received an NSF Career award and -- with his students and collaborators -- a best paper award at EDBT 2021, best-of-conference mentions for PODS 2021, SIGMOD 2017, WALCOM 2017, and VLDB 2015, and two out of three reproducibility awards for papers published at SIGMOD 2020.