Event

Data Provenance @ Mozilla Data Futures Lab


Projects

Data Provenance for AI

Monday, January 22, 2024, 11:00am to 12:00pm ET

Recent breakthroughs in language modeling are powered by large collections of natural language datasets. This has triggered an arms race to train models on disparate collections of data that is incorrectly, ambiguously, or insufficiently documented, leaving practitioners unsure of the ethical and legal risks.

To address this, the Data Provenance Initiative has mapped more than 2,000 popular text-to-text finetuning datasets from origin to creation, cataloging their data sources, licenses, creators, and other metadata so that researchers can explore them with an accompanying tool. The purpose of this work is to improve transparency, documentation, and informed use of datasets in AI.

Read more at Mozilla

Access the recording and slides
