SynthAI@SIGMOD 2026
Workshop on Synthetic Data Generation and Management for Building AI Systems

Location: Scarlet 1
Date: June 5, 2026

Schedule:
Time | Session
8:30am–9:00am | Welcome Address (Prasad Deshpande & Shiv Kumar Saini)
9:00am–9:45am | Keynote Talk 1: Aditya Parameswaran (UC Berkeley, USA)
9:45am–10:30am | Keynote Talk 2: Shivakumar Vaithyanathan (Vice President, Platform Engineering & Architecture, Adobe)
Title: Beyond Prompt Benchmarks: Evaluating Enterprise Workflows with Synthetic Data
10:30am–11:00am | Coffee Break
11:00am–11:45am | Panel Discussion
Topic: Synthetic Data for Agentic AI: Can We Generate, Trust, and Scale Synthetic Reality?

Themes:
• Theme 1: Building Trustworthy Synthetic Data Systems
• Theme 2: Evaluating Synthetic Data
• Theme 3: Synthetic Reality for AI
• Theme 4: Managing Synthetic Data at Scale

Panelists:
• Prof. Sharad Mehrotra (Univ. of California, Irvine, USA)
• Prof. Srinivasan Parthasarathy (The Ohio State University, USA)
• Dr. Prasad Deshpande (Databricks)
• Dr. Shiv Kumar Saini (Principal Scientist, Adobe Research)
11:45am–12:30pm | Paper Presentation Session 1
12:30pm–1:30pm | Lunch Break
1:30pm–2:15pm | Keynote Talk 3: Srinivasan Parthasarathy (The Ohio State University, USA)
2:15pm–3:00pm | Paper Presentation Session 2
3:00pm–3:30pm | Coffee Break
3:30pm–4:30pm | Paper Presentation Session 3
4:30pm–5:00pm | Poster Session

Paper Presentation Session 1: Database-Oriented Synthetic Data
# | Title | Authors
1 | Synthetic Data Generation for Schema-Aware Query Interfaces: Benchmarking NL2GraphQL Systems | Keya Battu, Parth Mehta, Manish Kesarwani
2 | Reliability-Aware Structured Synthetic Data Generation via Schema Enforcement and Layered Repair | Sachin Mishra
3 | Generating Databases from Natural Language Specification | Aviroop Mitra, Anupam Sanghi

Paper Presentation Session 2: Structured Synthetic Data
# | Title | Authors
1 | WorkLoadWeaver: Realistic Synthetic Query Workloads for Adaptive Database Systems | Keming Li, Nada Lahjouji, Ashwin Gerard Colaco, Shiyuan Zhou, Pratyoy Das, Guangxue Zhang, Sahil Makhijani, Vishal Chakraborty, Sarvesh Pandey, Sharad Mehrotra
2 | Efficient Context-Aware Corpus-Grounded Query and Prompt Suggestion constrained by Creative Knowledge Graph | Harshit Jain, Twisha Naik, Vipul Jain, Chirag Arora, Sudhakar Pandey, Vedanth Subramaniam, Eshan Trivedi
3 | Synthetic Augmentation of Structured Templates for Geo-Localization | Amulya Sri Pulijala, Vikram Pudi

Paper Presentation Session 3: Text, LLM & Agent-Driven Synthetic Data
# | Title | Authors
1 | A Comparative Study of LLM-Generated Doctor–Patient Dialogues via Biomedical Information Extraction Frameworks | Harshal Patil, Eshal Shaikh, Akshay Dhere, Kunal Korgaonkar, Ashwini Shinde
2 | KnowGen: Modeling Synthetic Dataset Construction as Inference over Latent Acceptability | Moumita Chanda, Alexander Maas, Hasan M Jamil
3 | An Agentic System for Context-Aware Synthetic Data Generation | Sajratul Rubaiat, Syed Sakib, Hasan Jamil
4 | Auditing Differentially Private Text Generation from LLMs | Dhruv Shah, Vishnu Vinod, Krishna Pillutla

Poster Session
# | Title | Authors
1 | Deep Learning Based Image Inpainting and Restoration System Using Partial Convolutional Networks and Autoencoders | Balaji G, Akash AK, Mohana M
2 | DNA-DPRL: Dynamic Noise Adaptation for Differentially Private Reinforcement Learning | Soudeep Tikadar, Debanjan Panda
3 | Cosine-Based Exploration of Word Relationships | Pragya Apoorva
4 | Evaluating Enterprise Semantic Search with LLM-Assisted Synthetic Data | Shreya Mahapatra, Diksha Bhardwaj, Tracy Holloway King, Anandita Chopra
5 | Skin Color Agnostic Classification of Skin Lesions Using Synthetic Image Generation | Akshay Bhavani Kumar Kulkarni, Vikram Pudi
6 | LLM-Driven Digital Twins for Behavioral Marketing: Enhancing Conversion Rates in Non-Converting Customers via K-Means Persona Mapping and Channel Consistency | Keertana S, Vijaylaxmi Ummarji, Swapna Bynobonia, Rupesh Prasad, Atul Singh, Arvind Maurya, Vivek Mishra
7 | CHART: Curriculum-guided Hierarchy-Aware Relation Classification via Prompt Tuning | Soumya Bharadwaj, Anushka Gupta, Mokshith Posa, Ashish Anand
8 | NVC-HH: Structured Intent Decomposition for Preference-Tuned Language Model Alignment | Venkata Satya Satvik Viriyala, Vinu E. Venugopal
9 | Synthetic Data Generation for AI-enabled Smart Cyber-Physical Infrastructures | Ryan Hildebrant, Andrew Chio, Nalini Venkatasubramanian
10 | A Framework for Simulated Environment Construction for Goal-Oriented AI Agents: An Order-to-Cash Case Study | Sanjeet Patil, Swati Tata, Ashwin Ramachandran, Namita Namita, Krishna Kummamuru
11 | An Empirical Comparison of Measures of Memorization in Language Models | Ishita Khatri, Krishna Pillutla
12 | FISCAL: Fidelity-preserving Synthetic Document Generation via Constrained Arithmetic, Semantic and Layout Preservation | Rohan Dalmia, Ritwik Mishra, Krishna Kummamuru, Sangita Agarwal, Rishikesh Sapar, Prakash Ghatage