Hugging Face

Data Provenance Initiative


AI & ML interests

None defined yet.

Members: Enrico Shippole, Shayne Longpre, Kartik Perisetla, Damien Sileo, Robert Mahari, Niklas Muennighoff, David Mataciunas, Ahmad Mustafa Anis, Minnie Liang

Models (0)

None public yet.

Datasets (25)

DataProvenanceInitiative/common_pile_set

Updated Mar 27, 2025 • 4.79M rows • 110 downloads • 1 like

DataProvenanceInitiative/Megawika_corrected

Updated Dec 15, 2024 • 556k rows • 502 downloads

DataProvenanceInitiative/stack-exchange-instruction-2split

Updated Dec 8, 2024 • 10.8M rows • 167 downloads

DataProvenanceInitiative/Megawika_subset

Updated Nov 19, 2024 • 322 downloads

DataProvenanceInitiative/common_pile_ultra_permissive

Updated Sep 9, 2024 • 7.05M rows • 75 downloads

DataProvenanceInitiative/Commercial_or_unspecified_licenses_and_terms

Updated Sep 9, 2024 • 61M rows • 338 downloads

DataProvenanceInitiative/commercial_or_unspecified_licenses

Updated Sep 9, 2024 • 74.6M rows • 352 downloads

DataProvenanceInitiative/commercial_licenses_and_terms

Updated Sep 9, 2024 • 25.2M rows • 571 downloads • 1 like

DataProvenanceInitiative/commercial_licenses

Updated Sep 9, 2024 • 35M rows • 765 downloads • 3 likes

DataProvenanceInitiative/Everything

Updated Sep 9, 2024 • 44.5M rows • 335 downloads • 1 like