Why Proportional Row Counts Matter for Realistic Seed Data
The Flat-Count Problem
Most database seeders do the same thing: generate a fixed number of rows, say 100, for every table. But reality doesn’t work that way.
An e-commerce store might have 15 categories, not 100, and 500 line items, not 100. It should also have fewer reviews than orders, not the same number.
How Misata Solves This
Misata v0.5.2 analyzes your database’s foreign-key (FK) graph and classifies each table into one of five tiers (the example counts assume the default base count of 100):
| Tier | Type | Multiplier | Example |
|------|------|------------|---------|
| T0 | Reference | 0.15× | categories (15) |
| T1 | Entity | 1.0× | users (100), products (100) |
| T2 | Transaction | 2.5× | orders (250), addresses (250) |
| T3 | Line-item | 5.0× | order_items (500) |
| T4 | Activity | 1.5× | reviews (150) |
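To make the arithmetic concrete, here is a minimal sketch of how a tier multiplier turns a base count into a row count. The dictionary keys and the `rows_for` helper are illustrative names, not Misata's API, and rounding to the nearest integer is an assumption.

```python
# Multipliers copied from the tier table; the key names are illustrative,
# not identifiers taken from Misata itself.
TIER_MULTIPLIERS = {
    "reference": 0.15,
    "entity": 1.0,
    "transaction": 2.5,
    "line_item": 5.0,
    "activity": 1.5,
}


def rows_for(tier: str, base: int = 100) -> int:
    """Scale the base row count by the tier's multiplier.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(base * TIER_MULTIPLIERS[tier])


for tier in TIER_MULTIPLIERS:
    print(f"{tier}: {rows_for(tier)} rows")
```

Changing `base` scales every tier together, so the ratios between tables stay the same whether you seed a small dev database or a larger staging one.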
The Algorithm
- Build the FK graph: Which tables reference which?
- Classify by name + position: Tables with names like `categories`, `types`, `statuses` are Reference. Tables referenced by many others are Entity. Tables that reference 2+ parents are Line-item.
- Apply multipliers: Starting from a base count (default: 100), each tier gets its proportional count.
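The classification step can be sketched roughly as follows. This is not Misata's implementation: the `classify` function, the `MANY_CHILDREN` threshold, and the order in which the rules are checked are all assumptions, and only the three rules described above are covered.

```python
from collections import Counter
from typing import Optional

# Hypothetical name hints and threshold; Misata's real values may differ.
REFERENCE_NAMES = {"categories", "types", "statuses"}
MANY_CHILDREN = 2  # assumed cutoff for "referenced by many others"


def classify(fk_graph: dict[str, set[str]]) -> dict[str, Optional[str]]:
    """Assign a tier to each table using the three rules described above.

    fk_graph maps a table name to the set of tables it references.
    Tables that match none of the rules map to None.
    """
    # Count how many tables reference each parent table.
    referenced_by = Counter(
        parent for parents in fk_graph.values() for parent in parents
    )
    tiers: dict[str, Optional[str]] = {}
    for table, parents in fk_graph.items():
        if table in REFERENCE_NAMES:
            tiers[table] = "reference"
        elif len(parents) >= 2:
            tiers[table] = "line_item"
        elif referenced_by[table] >= MANY_CHILDREN:
            tiers[table] = "entity"
        else:
            tiers[table] = None  # rule not covered by this sketch
    return tiers


# A small illustrative schema: table -> tables it references.
fk_graph = {
    "categories": set(),
    "users": set(),
    "products": {"categories"},
    "orders": {"users"},
    "addresses": {"users"},
    "order_items": {"orders", "products"},
}
tiers = classify(fk_graph)
for table, tier in tiers.items():
    print(f"{table}: {tier}")
```

In this small schema `categories` matches by name, `users` is referenced by two tables, and `order_items` has two parents. The remaining tables come back as `None` because the rules for the Transaction and Activity tiers are not spelled out here.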
Result
Instead of 7 tables × 100 rows = 700 rows of flat data, you get 1,515 rows distributed in proportions that more closely resemble a real production database.
```python
from misata import schema_from_db, seed_database

schema = schema_from_db('postgresql://localhost/mydb')
# categories: 15, users: 100, orders: 250, order_items: 500...
seed_database(schema, 'postgresql://localhost/mydb')
```