Dataset Competition
Datasets have catalyzed every leap in AI for Science, from the Protein Data Bank to ImageNet and the Sloan Digital Sky Survey. To empower trustworthy AI scientists, we now need corpora that capture hypotheses, instrument telemetry, lab notebooks, and closed-loop build–test cycles. The Dataset Competition celebrates proposals for such resources and forms half of the AI Scientist Competition.
What to Include
- Scientific bottleneck: Clearly describe the question, domain, or workflow your dataset unlocks and why it matters now.
- Acquisition roadmap: Outline experimental or computational pipelines, instrumentation, sensing, or simulation steps, plus safety/privacy plans.
- Metadata & governance: Specify schema, labeling standards, licensing, and mechanisms for responsible sharing (including accommodations for sensitive data).
- Acceleration potential: Link the dataset to concrete AI milestones—agents that can reason, design, or execute more reliably because of your resource.
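As an illustration of the "Metadata & governance" item, a proposal might summarize its schema, labeling standard, license, and access policy in a small machine-readable record. The sketch below is only one possible shape: the `DatasetCard` class, its field names, and the access-policy values are hypothetical and are not a format required by the competition.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class DatasetCard:
    """Hypothetical metadata record for a proposed dataset (illustrative fields only)."""

    title: str
    domain: str                      # e.g. "materials", "biology"
    license: str                     # e.g. an SPDX-style identifier such as "CC-BY-4.0"
    schema_fields: dict[str, str]    # field name -> type or unit description
    labeling_standard: str           # how labels are assigned and audited
    contains_sensitive_data: bool = False
    access_policy: str = "open"      # assumed values: "open", "gated", "restricted"

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the card looks complete."""
        problems = []
        for name in ("title", "domain", "license", "labeling_standard"):
            if not getattr(self, name).strip():
                problems.append(f"missing {name}")
        if not self.schema_fields:
            problems.append("schema_fields is empty")
        if self.contains_sensitive_data and self.access_policy == "open":
            problems.append("sensitive data should not use an open access policy")
        return problems


# Example values below are invented for illustration.
card = DatasetCard(
    title="Example closed-loop synthesis logs",
    domain="chemistry",
    license="CC-BY-4.0",
    schema_fields={"temperature": "float, kelvin", "yield": "float, percent"},
    labeling_standard="Two independent annotators per record",
)

if __name__ == "__main__":
    print(json.dumps(asdict(card), indent=2))
    print(card.validate())
```

A record like this can be serialized to JSON for an appendix, and the `validate` check shows how a governance rule (sensitive data must not be fully open) could be enforced automatically.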
Submission Requirements
- Length: 2 pages of main text (PDF) with unlimited references/appendices.
- Format: ICML 2026 style file (double-blind option). Update the template footnote to “Submitted to the AI for Science workshop (ICML 2026).”
- Anonymity: Fully double blind. Remove institutional identifiers from data sources when possible, or provide anonymized descriptions.
- Submission portal: OpenReview → Dataset Competition.
Evaluation Criteria
- Scientific value & novelty across disciplines (physics, chemistry, biology, climate, materials, etc.).
- Feasibility & openness, including ethical, privacy, or biosafety safeguards.
- Breadth of impact on underserved communities or multi-domain AI models.
- Execution plan with realistic milestones, costs, and sustainability.
Timeline (AoE)
- Abstract deadline: April 21, 2026
- Submission deadline: April 24, 2026
- Notification: May 15, 2026
- Camera-ready / spotlight material: May 29, 2026
Awards & Support
- The best proposals across this track and the AI Scientist Track share $10K, sponsored by Xaira Therapeutics.
- Winners present during the workshop and can request introductions to lab or industry partners for dataset collection support.
Looking for details on the AI Scientist Track? Visit the AI Scientist Competition page for descriptions of both tracks, FAQs, and the latest announcements. Questions? Email ai4sciencecommunity@gmail.com.