Tutorials

Tutorial 1: Complex Event Recognition Languages.

Abstract

In a broad range of domains, contemporary applications require processing of continuously flowing data from geographically distributed sources at extremely high and unpredictable rates to obtain timely responses to complex queries. Complex event recognition (CER), also known as event pattern matching, refers to the detection of events in Big Data streams, thereby providing the opportunity to implement reactive and proactive measures. Changes in time, location, status, or business conditions are sensed, reported, and filtered, which yields a stream of events in which complex events are to be identified. Example applications include the recognition of attacks in computer network nodes, human activities in video content, emerging stories and trends on the Social Web, traffic and transport incidents in smart cities, violations of maritime regulations, cardiac arrhythmias, epidemic spread, and trading opportunities in the stock market. In each scenario, CER helps make sense of streaming data, react accordingly, and potentially prepare counter-measures.
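To make the idea of event pattern matching concrete, the following is a minimal Python sketch of one common pattern: an event of one type followed by an event of another type within a time window. It assumes a single stream whose events arrive in timestamp order, and it reports every qualifying pair. The `Event` type, the `detect_seq` function, and the event names are hypothetical illustrations, not constructs of any particular CER language or system discussed in this tutorial.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple, Tuple


class Event(NamedTuple):
    kind: str    # event type (hypothetical names, e.g. "login_fail")
    time: float  # timestamp in seconds; the stream is assumed time-ordered


def detect_seq(stream: Iterable[Event], first: str, second: str,
               window: float) -> Iterator[Tuple[Event, Event]]:
    """Yield every pair (a, b) where a.kind == first, b.kind == second,
    a precedes b, and b.time - a.time <= window."""
    pending: deque = deque()  # unexpired `first` events, oldest on the left
    for e in stream:
        # Evict `first` events that are too old to match e or any later event.
        while pending and e.time - pending[0].time > window:
            pending.popleft()
        if e.kind == second:
            for a in pending:
                yield (a, e)
        if e.kind == first:
            pending.append(e)


if __name__ == "__main__":
    stream = [Event("login_fail", 0.0), Event("transfer", 3.0),
              Event("login_fail", 10.0), Event("transfer", 30.0)]
    for a, b in detect_seq(stream, "login_fail", "transfer", window=5.0):
        print(f"{a.kind}@{a.time} -> {b.kind}@{b.time}")
    # prints: login_fail@0.0 -> transfer@3.0
```

CER languages typically offer far more than this (for example, selection policies, negation, and aggregation over matches); the sketch fixes a single policy, reporting all qualifying pairs, to keep the mechanics visible.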

The challenges imposed by CER relate to the often-quoted four ‘V’s as well as to distribution. Velocity (number of events per time unit), volume (overall amount of events), variety (differently structured events), lack of veracity (uncertainty of event occurrence), and the distribution of event sources all complicate CER. Consider, for example, credit card fraud management, where fraud should be detected, or even forecast, within 25 milliseconds in order to prevent financial loss. Event patterns expressing fraudulent activity are highly complex and diverse, involving hundreds of rules and performance indicators that depend heavily on the country, merchant, amount, and customer. Fraud detection is a needle-in-a-haystack problem, as fraudulent transactions constitute at most 0.2% of all transactions. Yet recent work has shown how credit card fraud management can be tackled using event processing technology. For instance, the SPEEDD project recognises fraud over up to 10,000 transactions/sec streaming from points of sale distributed all over the world, while exploiting around 700 million historical events.

Presenters

Alexander Artikis is an Assistant Professor at the University of Piraeus, and a Research Associate in the Institute of Informatics & Telecommunications at NCSR “Demokritos”, in Athens, Greece, where he leads the Complex Event Recognition lab (http://cer.iit.demokritos.gr/). He holds a PhD from Imperial College London on the topic of multi-agent systems, and his research interests lie in the areas of artificial intelligence and distributed systems. He has published over 70 papers in related journals, such as Artificial Intelligence, Machine Learning, the ACM Transactions on Autonomous and Adaptive Systems, the ACM Transactions on Computational Logic, the ACM Transactions on Intelligent Systems and Technology and the IEEE Transactions on Knowledge and Data Engineering, as well as in highly competitive conferences, including DEBS, ECML, AAMAS and ECAI. He has worked on several EU-funded projects on event processing, such as datACRON, PRONTO and INSIGHT, and he is the scientific coordinator of the SPEEDD project. Alexander has served as a member of the programme committees of several international conferences, including IJCAI, DEBS, CIKM, ECAI, AAMAS and AAAI.

Matthias Weidlich is a Junior Professor at Humboldt-Universität zu Berlin (HU), heading the Process-Driven Architectures group, which is funded by a prestigious Emmy Noether grant from the German Research Foundation (DFG). He is a Visiting Researcher in the Department of Computing at Imperial College London, UK, where he was a research associate before joining HU in April 2015. Earlier, he held positions as a research fellow and adjunct lecturer at the Technion – The Israel Institute of Technology, Israel. He received his PhD in Computer Science from the Hasso Plattner Institute (HPI), University of Potsdam, Germany, in 2011. He has published more than 80 articles in journals, such as IEEE Transactions on Software Engineering, IEEE Transactions on Knowledge and Data Engineering, Information Systems, and The Computer Journal, and in highly competitive conferences, including BPM, CAiSE, SIGMOD, ICDE, and ICSOC. He received awards in the ACM DEBS Grand Challenge in 2013 and 2014, and the Best Paper Award at ICSOC 2010. He served as PC co-Chair of the BPM 2015 conference and is the General co-Chair of the ACM DEBS 2016 conference. He is also an Area Editor of Elsevier’s Information Systems and a co-organiser of a Dagstuhl Seminar on Integrating Process-oriented and Event-based Systems in 2016.

Alessandro Margara is an assistant professor at Politecnico di Milano. His research interests focus on middleware solutions to ease the design and development of complex distributed systems, with a special emphasis on event stream processing and reactive systems. His work includes the definition of languages for event and stream processing and the implementation of parallel and distributed algorithms to efficiently support such languages. Alessandro completed his PhD at the Dipartimento di Elettronica e Informazione at Politecnico di Milano. He has been a postdoctoral researcher at Vrije Universiteit Amsterdam and at Università della Svizzera italiana, Lugano. Alessandro has published dozens of papers in international journals and conferences in the areas of software engineering, distributed systems, and event-based middleware. Some of Alessandro’s recent publications appear in ACM Computing Surveys, IEEE Transactions on Parallel and Distributed Systems, IEEE Transactions on Software Engineering, DEBS’14, ICDCS’14, DEBS’15, and ICSE’15. Alessandro is a member of the organizing committees of DEBS’17 and Middleware’17, and has served on the program committees of several international conferences, including DEBS, RuleML, CCGrid, and IoTDI.

Martin Ugarte is a postdoctoral researcher at the Université Libre de Bruxelles in Brussels, Belgium, where he has been working for the past year in the Laboratory for Web & Information Systems. He holds a PhD from the Pontifical Catholic University of Chile on the topic of logic and query languages for the Semantic Web. His current research interests include query languages and query answering over dynamic datasets, information extraction, and other aspects of data management. Martin has recently published papers in top international conferences, including the 25th World Wide Web Conference (WWW2016), the 35th Principles of Database Systems symposium series (PODS 2016) and the 18th International Conference on Database Theory (ICDT 2015). During his PhD he worked for the Center for Semantic Web Research (www.ciws.cl), and he is currently part of SPICES, a Brussels-funded project on the topic of complex event processing. He has participated in the organization of conferences such as ICDT 2015 and EDBT 2015, and recently co-chaired the Second KEYSTONE Conference (IKC 2016).

Stijn Vansummeren is Assistant Professor at the Université Libre de Bruxelles in Brussels, Belgium where he is co-director of the Laboratory for Web & Information Systems. He holds a joint PhD from Hasselt University and the Transnational University of Limburg, Belgium. His research interests lie in the wide area of data and information management. He has published over 50 papers in top-ranked journals such as the Journal of the ACM, ACM Transactions on Database Systems, ACM Transactions on Computational Logic, ACM Transactions on the Web as well as highly competitive conferences such as SIGMOD, VLDB, WWW, PODS, and ICDT. He is currently scientific coordinator of SPICES, a Brussels-funded project concerning event processing for computer security. Stijn has been serving as a member of the programme committees of several international conferences including WWW, EDBT, PODS, and ICDT.

Tutorial 2: Sliding-Window Aggregation Algorithms.

Abstract

Aggregation is a common and important operation in streaming applications. Such applications often need an aggregated summary of the most recent data in a stream, because the most recent data is also the most relevant. While a stream is conceptually infinite, the most recent data is captured by a sliding window, which has an intuitive meaning to the user and a clear specification for the platform developer. Unfortunately, performing sliding-window aggregation efficiently is nontrivial. Naïve approaches waste time and/or resources: a poorly chosen algorithm can cause high latencies and bloated memory consumption, leading to losses, missed opportunities, and quality-of-service violations. Furthermore, in practice, users may find that their streaming platform of choice does not support a particular aggregation operation or window kind, or, even if it does, that it does not use the most efficient algorithm.

This tutorial aims to provide an in-depth exploration of sliding-window aggregation algorithms, from theoretical and practical perspectives. The goals of the tutorial are:

  • To enable practitioners to hand-implement cases not handled by their streaming platform of choice.
  • To enable streaming platform engineers to extend the set of supported cases and use the best algorithms.
  • To enable researchers to notice literature gaps and to stand on the shoulders of giants when filling them.
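As an illustration of why the choice of algorithm matters, here is a minimal Python sketch of one well-known technique for first-in-first-out sliding windows, the two-stacks algorithm. Recomputing the aggregate from scratch costs time proportional to the window size on every query; keeping precomputed partial aggregates on two stacks brings insert, evict, and query down to amortized constant time for any associative operator, even one without an inverse such as max. The class name `TwoStacksWindow` is hypothetical, and this is an illustrative sketch, not necessarily one of the algorithms presented in the tutorial.

```python
from typing import Callable, Generic, List, TypeVar

T = TypeVar("T")


class TwoStacksWindow(Generic[T]):
    """FIFO sliding window over an associative `combine` with identity `identity`.
    No inverse is needed; insert, evict, and query take amortized O(1) time."""

    def __init__(self, combine: Callable[[T, T], T], identity: T) -> None:
        self.combine = combine
        self.identity = identity
        self.back: List[T] = []      # newest values, in arrival order
        self.back_agg: T = identity  # combine of all values in self.back, in order
        self.front: List[T] = []     # suffix aggregates; top covers the whole front

    def __len__(self) -> int:
        return len(self.front) + len(self.back)

    def insert(self, value: T) -> None:
        self.back.append(value)
        self.back_agg = self.combine(self.back_agg, value)

    def evict(self) -> None:
        """Drop the oldest value; raises IndexError if the window is empty."""
        if not self.front:
            # Flip: drain the back stack newest-first, pushing suffix aggregates,
            # so the front's top summarizes every value from oldest to newest.
            agg = self.identity
            while self.back:
                agg = self.combine(self.back.pop(), agg)
                self.front.append(agg)
            self.back_agg = self.identity
        self.front.pop()

    def query(self) -> T:
        # Front values are all older than back values, so combine front first.
        front_agg = self.front[-1] if self.front else self.identity
        return self.combine(front_agg, self.back_agg)


if __name__ == "__main__":
    # Maximum over a count-based window of the 3 most recent values.
    w = TwoStacksWindow(max, float("-inf"))
    for x in [5, 1, 4, 2, 3]:
        w.insert(x)
        if len(w) > 3:
            w.evict()
        print(w.query())  # prints 5, 5, 5, 4, 4 (one per line)
```

Each value moves from the back stack to the front stack at most once, which is where the amortized bound comes from; an individual flip still costs time proportional to the window size, so worst-case latency spikes remain, which is one reason more refined algorithms exist.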

Presenters

Martin Hirzel is a research staff member and the manager of the Advanced Cognitive Engineering research group at IBM Research. Martin worked on SPL, the programming language for IBM Streams. He co-taught a tutorial on stream processing optimizations at DEBS 2013, which formed the basis for a widely-cited survey paper on the same topic. Martin is an ACM Distinguished Scientist.

Scott Schneider is a research staff member at IBM Research. Scott does research and development to improve the programmability and performance of distributed streaming systems, with a particular focus on enabling parallelism. Scott regularly makes core contributions to the IBM Streams product. Scott also co-taught and co-authored the aforementioned DEBS tutorial and survey paper.

Kanat Tangwongsan is a computer science faculty member at Mahidol University International College. His research is mainly about efficient computation with large data. More broadly, Kanat is interested in theoretical and practical aspects of algorithms, especially parallel algorithms and algorithms engineering. Kanat was lead author of a paper about sliding-window aggregation at VLDB 2015.

Tutorial 3: Data Streaming and its Application to Stream Processing.

Abstract

Nowadays, stream analysis is used in many contexts where the amount of data and/or the rate at which it is generated rules out other approaches (e.g., batch processing). The data streaming model provides randomized and/or approximate solutions to compute specific functions over (distributed) streams of data items in worst-case scenarios, while striving for small resource usage. Stream processing, which is somewhat complementary and more practical, provides efficient and highly scalable frameworks to perform soft real-time generic computation on streams, relying on cloud computing platforms. This duality opens the opportunity to make stream processing systems more efficient and performant through data streaming solutions. This tutorial targets a broad audience, from researchers potentially interested in understanding how current limitations of stream processing systems may be overcome, or at least mitigated, through data streaming solutions, to practitioners and developers, who will see how data streaming algorithms provide an efficient approach to handling data contained in large data streams.
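As a small example of the data streaming model, the sketch below implements a Count-Min sketch, a classic randomized summary that estimates per-key frequencies in a stream using memory that depends on the error parameters rather than on the number of distinct keys. It is chosen here purely for illustration; the class and method names are hypothetical, and salting BLAKE2b with the row number is an assumed, convenient way to obtain several hash functions, not the pairwise-independent hash families used in the original analysis.

```python
import hashlib
import math
from typing import List


class CountMinSketch:
    """Approximate frequency counts in width * depth counters.
    Estimates never undercount (for non-negative increments). With
    width = ceil(e / eps), depth = ceil(ln(1 / delta)), and row hashes that behave
    like independent random functions, an estimate exceeds the true count by at
    most eps * N with probability at least 1 - delta (N = sum of all increments)."""

    def __init__(self, eps: float, delta: float) -> None:
        self.width = math.ceil(math.e / eps)
        self.depth = math.ceil(math.log(1 / delta))
        self.table: List[List[int]] = [[0] * self.width for _ in range(self.depth)]

    def _bucket(self, key: str, row: int) -> int:
        # One hash function per row (assumption): BLAKE2b salted with the row number.
        h = hashlib.blake2b(key.encode(), digest_size=8,
                            salt=row.to_bytes(16, "little"))
        return int.from_bytes(h.digest(), "little") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.table[row][self._bucket(key, row)] += count

    def estimate(self, key: str) -> int:
        # Collisions can only inflate a counter, so the minimum over rows is tightest.
        return min(self.table[row][self._bucket(key, row)]
                   for row in range(self.depth))


if __name__ == "__main__":
    cms = CountMinSketch(eps=0.01, delta=0.01)  # 5 rows x 272 counters
    stream = ["a"] * 100 + ["b"] * 3 + [f"k{i}" for i in range(1000)]
    for key in stream:
        cms.add(key)
    # Each estimate is >= the true count (100 and 3); the overcount is likely
    # at most eps * N, roughly 11 here.
    print(cms.estimate("a"), cms.estimate("b"))
```

The memory footprint (1,360 counters here) stays fixed no matter how many distinct keys the stream contains, which is the kind of trade-off between accuracy and resources that the data streaming model makes explicit.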

Presenters

Leonardo Querzoni is currently an assistant professor at Sapienza University of Rome. He obtained a PhD in computer engineering from the same institution with a thesis on efficient data dissemination through the publish/subscribe communication paradigm. His research activities focus on several computer science fields, including distributed stream processing, dependability and security in distributed systems, and large-scale and dynamic distributed systems.

Nicolo Rivetti is currently a postdoctoral research fellow at the Technion – Israel Institute of Technology. He recently received his PhD in computer engineering from Sapienza University of Rome and in computer science from the Université de Nantes. His PhD topic was the application of data streaming techniques to big-data-related fields, including stream processing. His research also spans other areas related to big data and event-based systems, such as network security, business processes, and uncertain data streams.

Tutorial 4: Human-body Related Event Processing.

Abstract

The healthcare revolution is enabled by current technology: the ability to detect events in the body, even at the cell level, and to provide personalized treatment. The cost of healthcare for an aging population and the ability to eliminate diseases through early detection are the economic triggers of research and development into body-related sensors and actuators, but there are also other applications in the areas of entertainment, emotion detection, nonmedical emergencies, and more. The physical infrastructure of body area networks serves as the infrastructure for such a revolution. The tutorial will provide a glimpse into this fascinating world, some of it rooted in the present and some still futuristic and somewhat speculative, and will provide insights into how it is changing the healthcare industry and our lives.

Presenter

Prof. Opher Etzion serves as Professor of IS, head of the Information Systems department, and head of the Technological Empowerment Institute at Yezreel Valley Academic College. From 1997 to 2014 he served in various roles at IBM, most recently as IBM Senior Technical Staff Member and chief scientist of event processing at the IBM Haifa Research Lab. He has also been the chair of EPTS (Event Processing Technical Society). In parallel, he serves as an adjunct professor at the Technion – Israel Institute of Technology, where over the years he has supervised 6 PhD and 23 MSc theses. He has authored or co-authored more than 80 papers in refereed journals and conferences on topics related to active databases, temporal databases, rule-based systems, event processing, and autonomic computing, and has given several keynote addresses and tutorials. He is the co-author of Event Processing in Action (with Peter Niblett), a comprehensive technical book about event processing. Prior to joining IBM in 1997, he was a faculty member and Founding Head of the Information Systems Engineering department at the Technion, and held professional and managerial positions in industry and in the Israel Air-Force. He is a senior member of ACM, and has been general chair and program chair of various conferences such as COOPIS 2000 and ACM DEBS 2011. He has won several prestigious awards over the years, such as the Israel Air-Force commander award, the highest air-force award (1983), the IBM Outstanding Innovation Award (twice, in 2002 and 2013), and the IBM Corporate Award (the highest IBM award, in 2010) for pioneering work on event processing. He has been recognized as an ACM Distinguished Speaker.

Tutorial 5: Reflections on Almost Two Decades of Research into Stream Processing.

Abstract

Ever since the need for new approaches and systems to handle data streams was identified in the early 2000s, stream processing has been an active area of research, resulting in a large body of work with significant impact. This tutorial reflects on this research history by highlighting a number of trends and best practices that can be identified in hindsight. It also enumerates a list of directions for future research in stream processing.

Presenter

Kyumars Sheykh Esmaili joined Nokia Bell Labs at the end of 2014 as a Researcher. Since then, he and his colleagues have been actively involved in building a stream processing platform for IoT applications. As part of this project, he conducted a thorough study of the stream processing literature. This tutorial is, in essence, a summary report of that study. Before joining Bell Labs, Kyumars was a Research Engineer at Technicolor’s R&I Lab in Paris, France, where he was part of a team that built a large-scale, data-intensive monitoring and troubleshooting system for home networks. Prior to that, he was a Research Fellow at Nanyang Technological University (NTU), Singapore. At NTU, he worked on improving the reliability and storage efficiency of HDFS, Apache Hadoop’s distributed file system. Kyumars obtained his PhD degree in computer science from ETH Zurich, Switzerland, in 2011. His PhD thesis dealt with a range of stream processing challenges in the context of complex applications. Among his contributions were (i) establishing a schema for data streams and exploiting it for efficient processing, (ii) defining lifecycle semantics for continuous queries, in particular for safe modification of continuous queries on the fly, and (iii) addressing the problem of provenance management on data streams. His contributions on stream processing and Big Data management have appeared in the top publication venues in the field, e.g., ACM ToIT’14, DEBS’13, IEEE Big Data Conference’13, SIGMOD’11 and EDBT’10. Kyumars holds Master’s and Bachelor’s degrees from Sharif University of Technology, Tehran, Iran, and has had short-term affiliations with the University of Kurdistan as Lecturer and Adjunct Professor.