Why Proportional Row Counts Matter for Realistic Seed Data

2026-03-07 · By Misata Team

The Flat-Count Problem

Most database seeders do the same thing: generate a fixed number of rows, say 100, for every table. But real data doesn’t work that way.

An e-commerce store with 100 users might have 15 categories, not 100, and 500 order line items, not 100. Reviews should be fewer than orders, not equal in number.

How Misata Solves This

Misata v0.5.2 analyzes your database’s foreign-key (FK) graph and classifies each table into one of five tiers. The example counts assume the default base count of 100:

| Tier | Type | Multiplier | Example |
|------|------|------------|---------|
| T0 | Reference | 0.15× | categories (15) |
| T1 | Entity | 1.0× | users (100), products (100) |
| T2 | Transaction | 2.5× | orders (250), addresses (250) |
| T3 | Line-item | 5.0× | order_items (500) |
| T4 | Activity | 1.5× | reviews (150) |
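
To make the arithmetic concrete, here is a minimal sketch that reproduces the example counts from the multipliers. It is not Misata's code: the names `TIER_MULTIPLIERS`, `EXAMPLE_TIERS`, and `row_counts` are hypothetical, the tier assignments are copied by hand from the table rather than inferred, and rounding to the nearest integer is an assumption.

```python
# Multipliers copied from the tier table.
TIER_MULTIPLIERS = {
    "reference": 0.15,
    "entity": 1.0,
    "transaction": 2.5,
    "line_item": 5.0,
    "activity": 1.5,
}

# Tier assignments for the example tables, copied by hand from the table.
EXAMPLE_TIERS = {
    "categories": "reference",
    "users": "entity",
    "products": "entity",
    "orders": "transaction",
    "addresses": "transaction",
    "order_items": "line_item",
    "reviews": "activity",
}


def row_counts(tiers, base=100):
    """Scale the base count by each table's tier multiplier.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return {
        table: round(base * TIER_MULTIPLIERS[tier])
        for table, tier in tiers.items()
    }


counts = row_counts(EXAMPLE_TIERS)
for table, n in counts.items():
    print(f"{table}: {n}")
```

Changing `base` scales every table together, so the ratios between tiers stay fixed while the overall dataset grows or shrinks.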

The Algorithm

  1. Build the FK graph — Which tables reference which?
  2. Classify by name + position — Tables with names like categories, types, statuses are Reference. Tables referenced by many others are Entity. Tables that reference 2+ parents are Line-item.
  3. Apply multipliers — Starting from a base count (default: 100), each tier gets its proportional count.
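
The first two steps rely on a few simple signals per table: how many parents it references, how many tables reference it, and whether its name looks like a lookup table. The sketch below collects those signals from a list of FK edges. It is an illustration only, not Misata's implementation: `FK_EDGES`, `REFERENCE_NAMES`, and `fk_signals` are hypothetical names, the edge list is a made-up e-commerce schema, and exact name matching is an assumption.

```python
from collections import defaultdict

# Hypothetical FK edges as (child_table, parent_table) pairs.
FK_EDGES = [
    ("products", "categories"),
    ("orders", "users"),
    ("addresses", "users"),
    ("order_items", "orders"),
    ("order_items", "products"),
    ("reviews", "users"),
    ("reviews", "products"),
]

# Names treated as lookup tables; exact matching is an assumption.
REFERENCE_NAMES = {"categories", "types", "statuses"}


def fk_signals(edges):
    """Return per-table signals derived from the FK graph."""
    parents = defaultdict(set)   # table -> tables it references
    children = defaultdict(set)  # table -> tables that reference it
    for child, parent in edges:
        parents[child].add(parent)
        children[parent].add(child)

    tables = sorted(set(parents) | set(children))
    return {
        t: {
            "parents": len(parents[t]),
            "referenced_by": len(children[t]),
            "reference_name": t in REFERENCE_NAMES,
        }
        for t in tables
    }


signals = fk_signals(FK_EDGES)
for table, info in signals.items():
    print(table, info)
```

Here `users` is referenced by three tables, the kind of signal the Entity rule relies on, while `order_items` has two parents, which matches the Line-item rule. The sketch stops at collecting the signals and does not attempt to reproduce how Misata weighs them against each other.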

Result

Instead of 7 tables × 100 rows = 700 rows of flat data, the seven example tables above come to 1,365 rows (15 + 100 + 100 + 250 + 250 + 500 + 150), distributed in proportions much closer to those of a real production database.

```python
from misata import schema_from_db, seed_database

schema = schema_from_db('postgresql://localhost/mydb')
# categories: 15, users: 100, orders: 250, order_items: 500...
seed_database(schema, 'postgresql://localhost/mydb')
```