UberJugaad Enhanced SALT Dataset

1.9M ERP transactions + 151K business emails + 3.5K documents — Enterprise AI/ML Research

Dataset Overview

The UberJugaad Enhanced SALT Dataset extends SAP's transactional SALT dataset into a fuller enterprise simulation by layering synthetic business communications, documents, and embedded patterns on top of the ERP records. It gives researchers a linked business ecosystem for developing and testing AI/ML applications.

Dataset Statistics

ERP Transactions: 1.9M
Business Emails: 151K
Supporting Documents: 3.5K
Total Dataset Size: 159 MB

About UberJugaad GmbH

UberJugaad GmbH is a fictional EUR 14.8B German industrial supplier created for this dataset. The company represents a realistic enterprise environment with complex business operations, customer relationships, and communication patterns.

Annual Revenue: €14.8B
Employees: 1,467
Customers: 15,606
Data Period: 2019-2020

Key Features

🔗 Linked Data

All files connected via SALESDOCUMENT and customer_id for comprehensive analysis

📧 Realistic Communications

151K synthetic emails with realistic subjects, bodies, and business context

⏰ Time-Synchronized

Chronologically consistent data spanning 2019-2020 business operations

🔍 Discovery-Oriented

Patterns embedded in content rather than explicit labels for natural learning

💼 Complete Business Ecosystem

Email threads, documents, and transactions forming realistic business scenarios

🗃️ Multiple Formats

Parquet files for analysis, SQLite database for queries, comprehensive documentation
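
Because the files share SALESDOCUMENT as a key, emails can be tied to their transactions with an ordinary SQL join. The sketch below illustrates the idea on an in-memory SQLite database with toy rows. The table names (transactions, emails), every column other than SALESDOCUMENT and customer_id, and the sample values are assumptions for illustration, not the dataset's documented schema.

```python
import sqlite3


def emails_per_order(conn):
    """Count emails linked to each order via the shared SALESDOCUMENT key."""
    query = """
        SELECT t.SALESDOCUMENT, COUNT(e.email_id) AS n_emails
        FROM transactions AS t
        LEFT JOIN emails AS e ON e.SALESDOCUMENT = t.SALESDOCUMENT
        GROUP BY t.SALESDOCUMENT
        ORDER BY t.SALESDOCUMENT
    """
    return conn.execute(query).fetchall()


# Toy stand-in data; the real tables and columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (SALESDOCUMENT TEXT PRIMARY KEY, customer_id TEXT)"
)
conn.execute(
    "CREATE TABLE emails (email_id INTEGER PRIMARY KEY, SALESDOCUMENT TEXT, subject TEXT)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)",
    [("0002315309", "C001"), ("0002315310", "C002")],
)
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?)",
    [(1, "0002315309", "Order confirmation"), (2, "0002315309", "Delivery update")],
)

result = emails_per_order(conn)
print(result)
```

The LEFT JOIN keeps orders that have no emails, so they appear with a count of zero rather than dropping out of the result.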

Dataset Structure

The dataset is organized into logical groups for easy access and analysis:

📊 Core Data Files (83.5 MB)

📧 Email Database (76 MB)

📖 Documentation & Code

Use Cases

Email Classification & Routing
Sentiment Analysis
Customer Behavior Analysis
Document Information Extraction
Business Process Mining
Anomaly Detection
Natural Language Understanding
Multi-modal Analysis

Quick Start

Get started with the dataset using this simple Python code:

import pandas as pd

# Load emails
emails = pd.read_parquet('all_communications.parquet')
print(f"Total emails: {len(emails):,}")

# Load transactions
transactions = pd.read_parquet('erp_transactions.parquet')
print(f"Total orders: {len(transactions):,}")

# Find emails mentioning specific orders
order = "0002315309"
order_emails = emails[emails['body'].str.contains(order, na=False)]
print(f"Emails about order {order}: {len(order_emails)}")
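
The single-order lookup above can be generalized by pulling every order number out of an email body in one pass. This is a minimal sketch that assumes order numbers are exactly 10 digits, like the example 0002315309. That format, the helper name extract_order_ids, and the sample text are assumptions, not documented properties of the dataset.

```python
import re

# Assumption: order numbers are exactly 10 digits, as in "0002315309".
# The lookarounds reject 10-digit slices of longer digit runs.
ORDER_ID = re.compile(r"(?<!\d)\d{10}(?!\d)")


def extract_order_ids(body):
    """Return distinct 10-digit order numbers in order of first appearance."""
    if not isinstance(body, str):
        # Missing bodies (None, NaN) yield no matches instead of raising.
        return []
    seen = []
    for match in ORDER_ID.findall(body):
        if match not in seen:
            seen.append(match)
    return seen


sample = "Re: orders 0002315309 and 0002315310. Please confirm 0002315309 ships first."
print(extract_order_ids(sample))
```

With pandas, the same helper can be applied to the whole column as emails['body'].apply(extract_order_ids), after which the extracted numbers can be matched against the transaction table.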

🚀 Enhanced for AI Research

This dataset builds on SAP's SALT dataset with synthetic but realistic business communications modeled on real-world enterprise patterns. It is intended for developing and testing AI applications in realistic business contexts.

Citation

If you use this dataset in your research, please cite:

@dataset{rutledge_patrick_2025_17125322,
  author    = {Rutledge, Patrick},
  title     = {{UberJugaad GmbH Enhanced SALT Dataset}},
  month     = sep,
  year      = 2025,
  publisher = {Zenodo},
  version   = {v1.0.0},
  doi       = {10.5281/zenodo.17125322},
  url       = {https://doi.org/10.5281/zenodo.17125322}
}

APA Format:
Rutledge, P. (2025). UberJugaad GmbH Enhanced SALT Dataset (Version v1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.17125322

License & Acknowledgments

License: MIT License for enhanced content / SAP Sample Code License for original SALT data

Acknowledgments: Built on SAP's SALT dataset, enhanced with synthetic business communications for AI/ML applications. The original SALT authors are not affiliated with these enhancements.