Synthetic Data Vault (SDV)

Synthetic data generation for tabular, relational and time series data.

Overview

The Synthetic Data Vault (SDV) is a Python library designed to be a one-stop shop for creating tabular synthetic data. It uses a variety of machine learning algorithms to learn patterns from real data and emulate them in synthetic data. The SDV supports single tables, multiple connected tables, and sequential tables, and includes tools for evaluating and visualizing the quality of the synthetic data.
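The workflow described above (learn patterns from a real table, then sample new rows and check their quality) might look roughly like the following sketch. It assumes an SDV 1.x installation and uses `SingleTableMetadata`, `GaussianCopulaSynthesizer`, and `evaluate_quality`; newer releases also offer a unified `Metadata` class, so exact imports may differ by version. The helper name `synthesize_table` and the sample columns are hypothetical, chosen only for illustration.

```python
def synthesize_table(real_data, num_rows):
    """Fit a Gaussian copula model to a pandas DataFrame and sample new rows.

    Hypothetical helper for illustration; assumes SDV 1.x is installed.
    """
    # Imported lazily so this sketch can be loaded without SDV installed.
    from sdv.metadata import SingleTableMetadata
    from sdv.single_table import GaussianCopulaSynthesizer

    # Infer column types (numerical, categorical, ...) from the DataFrame.
    metadata = SingleTableMetadata()
    metadata.detect_from_dataframe(real_data)

    # Learn the patterns in the real data, then sample synthetic rows.
    synthesizer = GaussianCopulaSynthesizer(metadata)
    synthesizer.fit(real_data)
    return synthesizer.sample(num_rows=num_rows), metadata


if __name__ == "__main__":
    import pandas as pd
    from sdv.evaluation.single_table import evaluate_quality

    # Made-up example data, not taken from any real dataset.
    real = pd.DataFrame({
        "age": [23, 35, 41, 29, 52, 38],
        "plan": ["basic", "pro", "pro", "basic", "pro", "basic"],
    })
    synthetic, metadata = synthesize_table(real, num_rows=10)

    # Compare real and synthetic data; the report exposes an overall score.
    report = evaluate_quality(real, synthetic, metadata)
    print(report.get_score())
```

Swapping in a different single-table model should only require changing the synthesizer class, since the fit and sample steps stay the same.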

✨ Key Features

  • Multiple machine learning models for synthetic data generation (e.g., GaussianCopula, CTGAN)
  • Support for single-table, multi-table, and time-series data
  • Data evaluation and visualization tools
  • Preprocessing, anonymization, and constraint definition
  • Hierarchical generative modeling and recursive sampling
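To make the idea of hierarchical modeling and recursive sampling more concrete, here is a toy sketch in plain Python. It is not SDV's algorithm: it only learns how many child rows each parent has and then samples parents first, descending into child tables and assigning foreign keys as it goes. All names (`learn_child_counts`, `sample_table`, the `users` and `orders` tables) are hypothetical. In SDV itself, multi-table modeling is handled by a synthesizer such as `HMASynthesizer`, which also models column values.

```python
import random


def learn_child_counts(parent_rows, child_rows, parent_key, foreign_key):
    """Return the observed number of child rows for each parent row."""
    counts = {row[parent_key]: 0 for row in parent_rows}
    for child in child_rows:
        counts[child[foreign_key]] += 1
    return list(counts.values())


def sample_table(table, model, rng, parent_id=None, out=None):
    """Sample rows for `table`, then recurse into the tables that reference it."""
    out = {} if out is None else out
    rows = out.setdefault(table, [])
    spec = model[table]
    # Root tables use a fixed size; child tables draw a count seen in real data.
    n = spec["num_rows"] if parent_id is None else rng.choice(spec["child_counts"])
    for _ in range(n):
        row = {"id": len(rows), "parent_id": parent_id}
        rows.append(row)
        for child in spec["children"]:
            sample_table(child, model, rng, parent_id=row["id"], out=out)
    return out


# Made-up "real" data: user 0 has two orders, user 1 none, user 2 one.
real_users = [{"id": 0}, {"id": 1}, {"id": 2}]
real_orders = [
    {"id": 0, "user_id": 0},
    {"id": 1, "user_id": 0},
    {"id": 2, "user_id": 2},
]
counts = learn_child_counts(real_users, real_orders, "id", "user_id")

model = {
    "users": {"num_rows": 5, "children": ["orders"]},
    "orders": {"child_counts": counts, "children": []},
}
synthetic = sample_table("users", model, random.Random(0))
```

Every synthetic order carries the `id` of a synthetic user in `parent_id`, so referential integrity holds by construction, which is the main property a multi-table generator has to preserve.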

🎯 Key Differentiators

  • Open-source and highly customizable
  • Strong academic and research community
  • Support for complex relational and time-series data structures

Unique Value: Provides a flexible and powerful open-source ecosystem for generating synthetic data for a variety of data structures.

🎯 Use Cases (4)

  • Supplementing or augmenting real data for machine learning
  • Testing machine learning or other data-dependent software systems
  • Academic research
  • Data sharing with reduced privacy risk

🏆 Alternatives

  • Gretel.ai
  • Tonic.ai
  • Mostly AI

As an open-source library, SDV offers greater flexibility and control than these commercial platforms, but it requires more technical expertise to set up and use.

💻 Platforms

Python library (API)

✅ Offline Mode Available

💰 Pricing

Contact for pricing
Free tier available: open-source and free to use