Dataset Overview
The UberJugaad Enhanced SALT Dataset extends SAP's transactional SALT dataset into an enterprise simulation by layering synthetic business communications, documents, and embedded patterns on top of the ERP records. It gives researchers a linked business ecosystem for developing and testing AI/ML applications.
Dataset Statistics
3.5K Supporting Documents
About UberJugaad GmbH
UberJugaad GmbH is a fictional EUR 14.8B German industrial supplier created for this dataset. The company represents a realistic enterprise environment with complex business operations, customer relationships, and communication patterns.
Key Features
🔗 Linked Data
All files connected via SALESDOCUMENT and customer_id for comprehensive analysis
📧 Realistic Communications
151K synthetic emails with realistic subjects, bodies, and business context
⏰ Time-Synchronized
Chronologically consistent data spanning 2019-2020 business operations
🔍 Discovery-Oriented
Patterns embedded in content rather than explicit labels for natural learning
💼 Complete Business Ecosystem
Email threads, documents, and transactions forming realistic business scenarios
🗃️ Multiple Formats
Parquet files for analysis, SQLite database for queries, comprehensive documentation
Dataset Structure
The dataset is organized into logical groups for easy access and analysis:
📊 Core Data Files (83.5 MB)
- all_communications.parquet - 151,673 business emails (6.9 MB)
- erp_transactions.parquet - 1.9M ERP transactions (40.8 MB)
- sales_documents.parquet - 243K sales headers (7.1 MB)
- sales_items.parquet - 1.9M line items (28.4 MB)
- supporting_documents.parquet - 3,467 supporting documents (0.1 MB)
- business_documents.parquet - 32 business reports (22 KB)
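Because the files share the SALESDOCUMENT key, a join is a natural first step. The following is a minimal sketch rather than an official loader: `link_on_key` is a hypothetical helper, and the toy frames stand in for the real Parquet files, whose exact columns should be checked against the column descriptions. The `item_no` column and the customer IDs are invented for illustration.

```python
import pandas as pd


def link_on_key(left: pd.DataFrame, right: pd.DataFrame,
                key: str = "SALESDOCUMENT") -> pd.DataFrame:
    """Left-join two tables on a shared key and flag unmatched rows."""
    for name, frame in (("left", left), ("right", right)):
        if key not in frame.columns:
            raise KeyError(f"{name} table has no '{key}' column")
    # Cast both key columns to str so a dtype mismatch does not break the merge.
    # This does not restore leading zeros that were already lost.
    left = left.assign(**{key: left[key].astype(str)})
    right = right.assign(**{key: right[key].astype(str)})
    # indicator=True adds a "_merge" column: "both" or "left_only"
    return left.merge(right, on=key, how="left", indicator=True)


# Toy data (assumed shapes, not the real schema)
headers = pd.DataFrame({
    "SALESDOCUMENT": ["0002315309", "0002315310"],
    "customer_id": ["C001", "C002"],
})
items = pd.DataFrame({
    "SALESDOCUMENT": ["0002315309", "0002315309", "0002315311"],
    "item_no": [10, 20, 10],
})

linked = link_on_key(items, headers)
print(linked)
```

With the real files, the same call would take frames loaded via `pd.read_parquet`. Rows marked `left_only` in `_merge` are line items with no matching header, which is worth checking before any analysis.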
📧 Email Database (76 MB)
- uberjugaad_email.db - SQLite database with 151K emails + 19K contacts
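The table layout of the SQLite file is not listed here, so a reasonable first step is to ask the database itself. The sketch below uses only the standard library; `table_row_counts` is a hypothetical helper, and the `emails` and `contacts` tables in the demo are assumed names created in memory, not the confirmed schema.

```python
import sqlite3


def table_row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Return {table_name: row_count} for every user table in the database."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    )]
    counts = {}
    for name in names:
        # Names come from sqlite_master; quote them as identifiers before use
        quoted = '"' + name.replace('"', '""') + '"'
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()[0]
    return counts


# Demo on an in-memory database with hypothetical table names.
# For the real file, use sqlite3.connect("uberjugaad_email.db") instead.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE emails (id INTEGER PRIMARY KEY, subject TEXT)")
demo.execute("CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT)")
demo.executemany(
    "INSERT INTO emails (subject) VALUES (?)",
    [("Order update",), ("Invoice query",)],
)
demo.execute("INSERT INTO contacts (name) VALUES ('Example Contact')")

print(table_row_counts(demo))
```

Once the actual table and column names are known, `pd.read_sql_query` can pull query results straight into a DataFrame.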
📖 Documentation & Code
- Comprehensive guides, column descriptions, and getting started materials
- Sample Jupyter notebook for data exploration
- Email examples and business document conversion guides
Use Cases
Email Classification & Routing
Sentiment Analysis
Customer Behavior Analysis
Document Information Extraction
Business Process Mining
Anomaly Detection
Natural Language Understanding
Multi-modal Analysis
Quick Start
Get started with the dataset using this simple Python code:
import pandas as pd

# Load emails
emails = pd.read_parquet('all_communications.parquet')
print(f"Total emails: {len(emails):,}")

# Load ERP transactions
transactions = pd.read_parquet('erp_transactions.parquet')
print(f"Total transactions: {len(transactions):,}")

# Find emails mentioning a specific order (literal match, not a regex)
order = "0002315309"
order_emails = emails[emails['body'].str.contains(order, regex=False, na=False)]
print(f"Emails about order {order}: {len(order_emails)}")
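Searching for one order at a time does not scale to linking every email. A possible alternative, sketched below, pulls all order-like numbers out of a body in one pass. It assumes order numbers are always 10-digit strings such as "0002315309"; that format is inferred from the single example above and is not confirmed for the whole dataset. `ORDER_RE` and `extract_order_numbers` are hypothetical names.

```python
import re

# Assumption: an order number is exactly 10 digits, not part of a longer digit run
ORDER_RE = re.compile(r"(?<!\d)\d{10}(?!\d)")


def extract_order_numbers(text):
    """Return the distinct 10-digit numbers in text, in order of first appearance."""
    if not isinstance(text, str):
        # Missing bodies (None or NaN) yield no matches
        return []
    # dict.fromkeys removes duplicates while keeping insertion order
    return list(dict.fromkeys(ORDER_RE.findall(text)))


sample = (
    "Re: order 0002315309 and 0002315310; "
    "ref 0002315309. Phone 123456789012."
)
print(extract_order_numbers(sample))
```

Applied to the loaded emails, `emails['body'].map(extract_order_numbers)` would give one list of candidate order numbers per email, which can then be matched against SALESDOCUMENT values to discard false positives.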
🚀 Enhanced for AI Research
This dataset builds on SAP's SALT dataset with synthetic but realistic business communications modeled on real-world enterprise patterns. It is intended for developing and testing AI applications in realistic business contexts.
Citation
If you use this dataset in your research, please cite:
@dataset{rutledge_patrick_2025_17125322,
  author    = {Rutledge, Patrick},
  title     = {{UberJugaad GmbH Enhanced SALT Dataset}},
  month     = sep,
  year      = 2025,
  publisher = {Zenodo},
  version   = {v1.0.0},
  doi       = {10.5281/zenodo.17125322},
  url       = {https://doi.org/10.5281/zenodo.17125322}
}
APA Format:
Rutledge, P. (2025). UberJugaad GmbH Enhanced SALT Dataset (Version v1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17125322
License & Acknowledgments
License: MIT License for enhanced content / SAP Sample Code License for original SALT data
Acknowledgments: Built on SAP's SALT dataset, enhanced with synthetic business communications for AI/ML applications. The original SALT authors are not affiliated with these enhancements.