Call for Papers - Special Track on LLMs for Data Science

PAKDD 2026 will be held exclusively in person. All accepted presentations must be delivered on-site.

Rapid advances in Large Language Models (LLMs) have opened new avenues for research and innovation across many domains, particularly Data Science. As LLMs continue to evolve, their applications in data analysis, machine learning, natural language processing, and decision-making are becoming increasingly significant. The PAKDD 2026 Special Track on Large Language Models for Data Science aims to explore the transformative potential of LLMs for Data Science, bringing together researchers, practitioners, and industry experts to discuss the latest developments, challenges, and opportunities in this rapidly growing area.

We solicit novel, high-quality, original research papers that provide innovative insights into all facets of large language models and their applications in data science, including but not limited to the science and algorithms of LLMs, enlarged language models, retrieval-augmented text generation, vision-language pretraining, vision transformers, trustworthiness and societal implications of LLMs, and LLMs in diverse applications.

Papers accepted to the LLM Special Track will be published in the PAKDD proceedings by Springer. At least one author of each accepted paper must register for the conference and present the work.

For up-to-date information on PAKDD 2026, please visit its homepage: https://www.pakdd2026.org.

Important Dates

Paper Submission Deadline: November 15, 2025
Paper Acceptance Notification: February 8, 2026
Camera Ready Papers Due: March 1, 2026

*All deadlines are at 23:59 Pacific Standard Time (PST).

Topics

LLM-enhanced Data Analytics: Methods for leveraging LLMs to automate or augment data analysis workflows.
LLM-based Data Preprocessing and Cleaning: Using LLMs for data wrangling, missing value imputation, or schema alignment.
LLMs for Data Integration and Knowledge Graph Construction: Techniques for using LLMs to merge heterogeneous data sources or build structured knowledge graphs.
LLMs for Data Labeling and Annotation: Semi-automatic or interactive labeling pipelines powered by LLMs.
Prompt Engineering for Data Science Tasks: Systematic methods for prompt design to solve data mining tasks.
LLMs for Feature Engineering and Selection: Automating feature extraction, selection, or transformation using LLMs.
LLM-powered Conversational Interfaces for Data Analysis: Chatbots or agents that assist with exploratory data analysis via natural language.
LLMs for Explainable Data Science: Using LLMs to generate natural language explanations of models or complex datasets.
Combining LLMs with Classical Machine Learning Models: Hybrid frameworks where LLMs support or supervise classical models.
Evaluation and Benchmarking of LLMs for Data Science: Novel benchmarks or empirical studies on how well LLMs perform real-world data science tasks.
Domain-specific LLMs for Data-intensive Applications: Fine-tuning or adapting LLMs for domains like finance, healthcare, or scientific research.
Human-in-the-loop Systems with LLMs for Data Science: Designing interactive systems where LLMs assist analysts or domain experts.
Trust, Reliability, and Bias Mitigation in LLM-assisted Data Science: Addressing risks and ethical concerns when using LLMs in data-driven workflows.
Applications and Case Studies: Practical reports showing how LLMs are deployed to solve data science problems in industry or research.

Submission

Submission Details: https://www.pakdd2026.org/authors-kit

If you have any questions, please feel free to contact us at pakdd2026.llm@gmail.com.

PAKDD 2026 LLM Track Chairs

Carl Yang, Emory University, USA
Xiang Li, East China Normal University, China