The Open-Source Engine
High-performance synthetic data generation. LLM-powered schema design. Vectorized NumPy execution. MIT licensed.
Three lines to realistic data
From zero to production-grade synthetic data in under 60 seconds.
Install
One dependency. No Docker, no config files.
Generate
Describe your schema in plain English.
Use
Relational CSVs with referential integrity.
Not just random values.
Intelligent generation.
LLM Schema Generation
Describe your data in plain English. A Groq-hosted LLM designs the entire relational schema: tables, columns, types, and foreign keys.
Vectorized NumPy Engine
385,000 rows/second. No row-by-row Python loops. Entire columns generated at once using NumPy vectorization.
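To make "entire columns at once" concrete, here is a minimal sketch of column-wise generation with NumPy. It is an illustration only, not Misata's internal code: the function name `generate_orders`, the column names, and the distributions are assumptions chosen for the example, and it makes no claim about throughput.

```python
import numpy as np

def generate_orders(n_rows: int, seed: int = 0) -> dict:
    """Build each column as one NumPy array; there is no per-row Python loop."""
    rng = np.random.default_rng(seed)
    return {
        "order_id": np.arange(1, n_rows + 1),
        # Log-normal draws are right-skewed and always positive, like many prices.
        "price": np.round(rng.lognormal(mean=3.0, sigma=0.5, size=n_rows), 2),
        # integers() excludes the upper bound, so this yields 1..5.
        "quantity": rng.integers(1, 6, size=n_rows),
        "status": rng.choice(
            ["paid", "pending", "refunded"], size=n_rows, p=[0.8, 0.15, 0.05]
        ),
    }

columns = generate_orders(100_000)
```

Each call to the random generator fills a whole column in compiled code, which is where a vectorized approach gets its speed compared with building rows one at a time.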
SQLAlchemy Seeding
Point at your existing database. Misata introspects models, respects foreign keys, and seeds production-grade data.
Business Constraints
Define rules like max_daily_hours=8 or cost < price. Misata enforces them mathematically, so values satisfy the rule when they are generated instead of being rejected and retried.
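One way to read "mathematically, not by retry" is that values are constructed so the rule cannot fail. The sketch below shows that idea for the two rules named above. The helpers `price_and_cost` and `cap_daily_hours` are hypothetical names for this example, not Misata's API, and the proportional rescale is just one simple way to honor a sum limit.

```python
import random

def price_and_cost(rng: random.Random) -> tuple:
    """Draw a price, then derive cost as a fraction of it, so cost < price always holds."""
    price = round(rng.uniform(5.0, 500.0), 2)
    cost = round(price * rng.uniform(0.3, 0.9), 2)  # assumed 30-90% cost ratio
    return price, cost

def cap_daily_hours(hours: list, max_daily_hours: float = 8.0) -> list:
    """Rescale one day's entries proportionally if their sum exceeds the limit."""
    total = sum(hours)
    if total <= max_daily_hours:
        return list(hours)
    scale = max_daily_hours / total
    return [h * scale for h in hours]

rng = random.Random(42)
pairs = [price_and_cost(rng) for _ in range(1000)]
capped = cap_daily_hours([5.0, 4.0, 3.0])  # 12 hours logged, limit is 8
```

No value is ever generated and then thrown away, so the cost of enforcing a rule stays constant no matter how tight the rule is.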
Smart Row Proportions
15 categories, 100 users, 250 orders, 500 line items. Misata analyzes your FK graph to size tables realistically.
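The numbers above can be reproduced by a simple fan-out rule: each child table gets its parent's row count times an average number of children per parent. This sketch assumes that rule and those ratios purely for illustration. Misata's actual sizing heuristics are not described here, and `size_tables` is a hypothetical helper.

```python
def size_tables(root_rows: dict, fan_out: list) -> dict:
    """Derive child table sizes from parent sizes.

    fan_out is a list of (child, parent, avg_children_per_parent) tuples,
    listed parent-first so every parent is sized before its children.
    """
    sizes = dict(root_rows)
    for child, parent, ratio in fan_out:
        rows = round(sizes[parent] * ratio)
        # A child with several parents takes the largest size any parent implies.
        sizes[child] = max(sizes.get(child, 0), rows)
    return sizes

sizes = size_tables(
    root_rows={"categories": 15, "users": 100},
    fan_out=[
        ("orders", "users", 2.5),       # assumed: 2.5 orders per user
        ("line_items", "orders", 2.0),  # assumed: 2 line items per order
    ],
)
```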
Realism Engine v0.5
Every column is aware of every other column. Emails match names. Total = subtotal + tax + shipping. Always.
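A small sketch of what column awareness means in practice: dependent columns are derived from the columns they depend on instead of being drawn independently. The name lists, the flat 8% tax rate, the shipping tiers, and the helper `make_order_row` are all assumptions made for this example.

```python
import random

FIRST_NAMES = ["Ada", "Grace", "Linus", "Margaret"]
LAST_NAMES = ["Lovelace", "Hopper", "Torvalds", "Hamilton"]

def make_order_row(rng: random.Random) -> dict:
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    # The email is built from the name, so the two can never disagree.
    email = f"{first}.{last}@example.com".lower()
    # Integer cents keep the sum exact, with no floating-point drift.
    subtotal = rng.randint(500, 50_000)
    tax = round(subtotal * 0.08)          # assumed flat 8% tax rate
    shipping = rng.choice([0, 499, 999])  # assumed shipping tiers
    return {
        "name": f"{first} {last}",
        "email": email,
        "subtotal_cents": subtotal,
        "tax_cents": tax,
        "shipping_cents": shipping,
        "total_cents": subtotal + tax + shipping,
    }

rng = random.Random(42)
rows = [make_order_row(rng) for _ in range(5)]
```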
Faker generates random noise.
Misata generates truth.
First-class Python integration
Drop into any test suite, CI/CD pipeline, or data workflow.
LLM-Powered Generation
SQLAlchemy Seeding
Built in public, shipped fast
7 releases in 6 months. From genesis to realism engine.
Business Constraints & Performance
- Custom constraint engine (sum_limit, redistribute)
- Performance optimizations across all generators
- Improved CLI output formatting
SQLAlchemy Seeding & Smart Mode
- Direct SQLAlchemy model introspection
- Smart row proportions from FK graph analysis
- Database truncate + seed in one command
The Realism Engine
- Column-aware generation (every column knows about every other)
- Mathematical consistency: total = subtotal + tax + shipping
- Email-from-name, cost < price, temporal ordering
- J-curve rating distributions
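Two of the items above lend themselves to a short sketch: a J-shaped rating distribution (most ratings are 5 stars, with a smaller bump at 1 star and few in between) and timestamps that always occur in order. The weights, time ranges, and function names are assumptions for illustration, not values taken from Misata.

```python
import random
from collections import Counter
from datetime import datetime, timedelta

# Assumed J-curve weights: heavy at 5 stars, a smaller bump at 1 star.
RATING_WEIGHTS = {1: 0.10, 2: 0.04, 3: 0.08, 4: 0.20, 5: 0.58}

def sample_ratings(n: int, rng: random.Random) -> list:
    """Draw n star ratings according to the J-shaped weights."""
    return rng.choices(
        list(RATING_WEIGHTS), weights=list(RATING_WEIGHTS.values()), k=n
    )

def order_timestamps(rng: random.Random, start: datetime) -> tuple:
    """Each timestamp is the previous one plus a positive delay, so order is guaranteed."""
    created = start + timedelta(minutes=rng.randint(0, 60 * 24 * 30))
    shipped = created + timedelta(hours=rng.randint(1, 72))
    delivered = shipped + timedelta(hours=rng.randint(12, 120))
    return created, shipped, delivered

rng = random.Random(7)
rating_counts = Counter(sample_ratings(10_000, rng))
created, shipped, delivered = order_timestamps(rng, datetime(2024, 1, 1))
```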
Stability & Bug Fixes
- Fixed edge cases in FK resolution
- Improved error messages for invalid schemas
- Test coverage improvements
Foreign Key Graph
- Topological sort for table generation order
- FK-aware value referencing
- Multi-level relationship support
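For readers unfamiliar with the idea, generation order can be derived from the foreign key graph with a topological sort, so every parent table exists before the tables that reference it. This sketch uses Python's standard `graphlib` module (Python 3.9+). The table names are illustrative, and it does not show how Misata implements the sort internally.

```python
import random
from graphlib import TopologicalSorter

# Map each table to the set of tables its foreign keys point at.
fk_parents = {
    "categories": set(),
    "users": set(),
    "products": {"categories"},
    "orders": {"users"},
    "line_items": {"orders", "products"},  # multi-level: depends on two chains
}

# static_order() yields every parent before any table that depends on it.
generation_order = list(TopologicalSorter(fk_parents).static_order())

# FK-aware referencing: child values are sampled from the parent's existing keys.
rng = random.Random(0)
user_ids = list(range(1, 101))
order_user_ids = [rng.choice(user_ids) for _ in range(250)]
```

Because child keys are sampled from keys that already exist, referential integrity holds without a validation pass afterward.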
LLM Schema Generation
- Groq integration for natural language → schema
- Multiple LLM provider support (Groq, OpenAI)
- Story-based generation from CLI
Genesis — Core Engine
- NumPy-vectorized column generation
- CLI with --story flag
- CSV export
- Basic column type inference
Where we are vs. where we're going
We believe in building in public. Here's an honest look at the gap between our current state and our vision. This is why we need collaborators.
Start generating in 60 seconds
MIT licensed. No signups. No credit card. Just pip install and go.