WSDM26

Rigorizing Retrieval-augmented Generation with Structured Knowledge

Tutorial at ACM International Conference on Web Search and Data Mining

Abstract

Retrieval-augmented generation (RAG), which retrieves external knowledge to augment the generation of downstream task solutions, has become standard practice for powering knowledge-intensive applications. However, real-world knowledge often manifests in heterogeneous yet distinctive structures (e.g., tabular schemas, social networks, and document trees), and modeling these structures effectively demands specialized techniques, practical engineering, and domain expertise. Meanwhile, adopting RAGs in high-stakes scenarios calls for rigorous safety and privacy considerations. Despite the importance of this structural perspective, the current landscape remains fragmented across different knowledge structures. Moreover, few approaches adequately consider how structured knowledge shapes RAG’s safety. Against this backdrop, our tutorial offers a structural perspective on RAGs. We begin by overviewing structured RAGs across their full lifecycle, highlighting their canonical designs. We then examine how these design principles can be specialized for different knowledge structures, showcasing their unique applications and security attack/defense strategies. Specifically, our objectives are to:

Motivate RAG, structured knowledge, and the associated safety/privacy concerns, along with their applications
Present a unified perspective on RAGs and its specialization to documents, tables, social networks, and scientific literature (citation) networks
Discuss key security risks in structured RAGs, including knowledge poisoning and extraction attacks

Time and Location

Time: February 22
Location: Boise Centre Room 110AB, Boise, Idaho, USA

Tutorial Outline

Background and Overview (9:00-9:30 am) - Yu Wang
- Retrieval-augmented Generation
- Structured Knowledge and Applications: Document, Table, Personalization, Social Network, Scientific Literature, Knowledge Graph
- Security Issues of RAGs and their Structured Variants
Document/Table Structured RAGs (9:30-10:00 am) - Haoyu Han
- Document/Table Tasks and Structured Knowledge
- Document Structured RAGs
- Future Directions and Q&A
Personalization & Social Structured RAGs (10:00-10:30 am) - Utkarsh Sahu
- Personalization and Social Network Tasks and Structured Knowledge
- Personalization/Social Structured RAGs
- Future Directions and Q&A
Coffee Break (10:30-11:00 am)
Scientific Literature RAGs (11:00-11:30 am) - Yu Zhang
- Scientific Literature Tasks and Structured Knowledge
- Scientific Literature RAGs
- Future Directions and Q&A
Knowledge Graph RAGs (11:30-12:00 pm) - Harry Shomer
- Knowledge Graph Tasks and Structured Knowledge
- Knowledge Graph Structured RAGs
- Future Directions and Q&A
Security and Privacy of Structured RAGs (12:00-12:30 pm) - Zhisheng Qi
- Knowledge Poisoning Attacks
- Knowledge Extraction Attacks
Conclusion and Future Work (12:30-12:40 pm) - Yu Wang

Slides

You can download a PDF version of our slides or view them embedded below.

Speaker Bios

Yu Wang is an Assistant Professor in the School of Computer and Data Sciences at the University of Oregon. His research interests include Graph Machine Learning, LLMs, Information Retrieval, and Data-centric AI for social good. He received Best Paper Awards at the 2020 Smoky Mountains Data Challenge Competition by ORNL and the GLFrontiers Workshop at NeurIPS’23, and was a Best Doctoral Forum Poster Runner-up at SDM’24. He actively contributes to the community through publishing and by serving as a PC member, reviewer, or organizer for venues such as ICLR, NeurIPS, AAAI, KDD, WWW, CIKM, WSDM, TKDD, and TIST. He has contributed to organizing workshops at WSDM’22/24 and KDD’25 and served as the student travel award chair at CIKM’24.

Zhisheng Qi is a Ph.D. student in the School of Computer and Data Sciences at the University of Oregon. His research interests include Retrieval-augmented Generation, Agentic Reasoning and Planning, and Security of AI Agents.

Haoyu Han is a Ph.D. candidate at Michigan State University. His research interests include Machine Learning on Graphs and LLMs with Graphs. Before joining MSU, he completed his M.S. (2021) and B.S. (2018) at USTC. He has published several works in top conferences (e.g., KDD, ICDM, NeurIPS, ICML, and ICLR). He was the recipient of the KDD’22 and NeurIPS’24 Student Travel Awards. He has contributed to organizing workshops for KDD’23, AAAI’24, and SDM’24.

Utkarsh Sahu is a Ph.D. student in the School of Computer and Data Sciences at the University of Oregon. His research interests include Multi-Modal Learning, Social Networks, and Personalization.

Harry Shomer is an Assistant Professor at the University of Texas at Arlington. He received his Ph.D. from Michigan State University. His research interests include Machine Learning on Graphs, Trustworthy AI, and AI in education. Before joining MSU, he received his B.S. in Computer Science from CUNY Brooklyn College (2019). His work has been published at top conferences including NeurIPS, ICLR, KDD, EMNLP, TheWebConf, ACL, and CIKM. He is the recipient of the MSU Engineering Distinguished Fellowship, the NRT-IMPACTS Fellowship, and the KDD’24 Student Travel Award.

Kaize Ding is an Assistant Professor in the Department of Statistics and Data Science at Northwestern University. Before joining Northwestern, he obtained his Ph.D. degree in Computer Science at Arizona State University in 2023 under the supervision of Prof. Huan Liu. His research interests are generally in data mining, machine learning, and natural language processing, with a particular focus on Graph Machine Learning, data-efficient learning, and reliable AI. His work has been published in top-tier conferences and journals (e.g., AAAI, EMNLP, IJCAI, KDD, NeurIPS, TheWebConf, and TNNLS), and has been recognized with several prestigious awards, including the AAAI New Faculty Highlights, SDM Best Posters Award, and Best Paper Award at the Trustworthy Learning on Graphs workshop.

Yu Zhang is an Assistant Professor at Texas A&M University. His research focuses on structure-enhanced text mining for science, including citation network analysis, scientific document understanding, and knowledge discovery. He received his Ph.D. from the University of Illinois at Urbana-Champaign. His work has been published in top-tier conferences and journals including KDD, WWW, and EMNLP.

Ryan Rossi is a Senior Research Scientist at Adobe Research. His research lies in machine learning and spans theory, algorithms, and applications of large, complex graph data. He has authored over 100 papers published in top-tier conferences and journals such as NeurIPS, ICML, AAAI, KDD, IJCAI, ICLR, COLT, WWW, WSDM, and JMLR. Before joining Adobe Research, he worked at many industrial, government, and academic research labs, including Palo Alto Research Center (Xerox PARC), Lawrence Livermore National Laboratory, and the University of Massachusetts Amherst. He earned his Ph.D. and M.S. in Computer Science at Purdue University. He brings substantial experience in leveraging social network-structured signals for personalization and applying knowledge graphs to automate document understanding.

Hui Liu is an Assistant Professor at Michigan State University. Her research interests include trustworthy AI, knowledge graph learning, and information retrieval. She has contributed extensive expertise in knowledge intelligence and has a strong publication record in top-tier conferences and journals.