{"id":39631748,"url":"https://github.com/mostly-ai/mostlyai-mock","last_synced_at":"2026-01-18T08:45:58.449Z","repository":{"id":291802442,"uuid":"976635400","full_name":"mostly-ai/mostlyai-mock","owner":"mostly-ai","description":"Synthetic Mock Data 🔮","archived":false,"fork":false,"pushed_at":"2026-01-09T15:46:41.000Z","size":1527,"stargazers_count":13,"open_issues_count":4,"forks_count":2,"subscribers_count":5,"default_branch":"main","last_synced_at":"2026-01-15T19:43:11.683Z","etag":null,"topics":["mock-data","synthetic-data","test-data"],"latest_commit_sha":null,"homepage":"https://mostly-ai.github.io/mostlyai-mock/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mostly-ai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":".github/CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-05-02T13:17:38.000Z","updated_at":"2025-12-09T11:37:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"6e79a834-8dd8-4264-a688-09f66f4324e7","html_url":"https://github.com/mostly-ai/mostlyai-mock","commit_stats":null,"previous_names":["mostly-ai/mostlyai-mock"],"tags_count":31,"template":false,"template_full_name":null,"purl":"pkg:github/mostly-ai/mostlyai-mock","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostly-ai%2Fmostlyai-mock","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostly-ai%2Fmostlyai-mock/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostly-ai%2Fmostl
yai-mock/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostly-ai%2Fmostlyai-mock/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mostly-ai","download_url":"https://codeload.github.com/mostly-ai/mostlyai-mock/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mostly-ai%2Fmostlyai-mock/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28534148,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-18T00:39:45.795Z","status":"online","status_checked_at":"2026-01-18T02:00:07.578Z","response_time":98,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["mock-data","synthetic-data","test-data"],"created_at":"2026-01-18T08:45:57.899Z","updated_at":"2026-01-18T08:45:58.433Z","avatar_url":"https://github.com/mostly-ai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Synthetic Mock Data 🔮\n\n[![Documentation](https://img.shields.io/badge/docs-latest-green)](https://mostly-ai.github.io/mostlyai-mock/) [![stats](https://pepy.tech/badge/mostlyai-mock)](https://pypi.org/project/mostlyai-mock/) ![license](https://img.shields.io/github/license/mostly-ai/mostlyai-mock) ![GitHub Release](https://img.shields.io/github/v/release/mostly-ai/mostlyai-mock)\n\nUse LLMs to generate any Tabular Data towards your needs. 
Create from scratch, expand existing datasets, or enrich tables with new columns. Your prompts, your rules, your data.\n\n## Key Features\n\n* A lightweight Python client for prompting LLMs for mixed-type tabular data.\n* Select from a wide range of LLM endpoints and LLM models.\n* Supports single-table as well as multi-table scenarios.\n* Supports a variety of data types: `string`, `integer`, `float`, `category`, `boolean`, `date`, and `datetime`.\n* Specify context, distributions and rules via dataset-, table- or column-level prompts.\n* Create from scratch or enrich existing datasets with new columns and/or rows.\n* Tailor the diversity and realism of your generated data via `temperature` and `top_p`.\n\n## Getting Started\n\n1. Install the latest version of the `mostlyai-mock` Python package.\n\n```bash\npip install -U mostlyai-mock\n```\n\n2. Set the API key of your LLM endpoint (if not done yet)\n\n```python\nimport os\nos.environ[\"OPENAI_API_KEY\"] = \"your-api-key\"\n# os.environ[\"GEMINI_API_KEY\"] = \"your-api-key\"\n# os.environ[\"GROQ_API_KEY\"] = \"your-api-key\"\n```\n\nNote: You will need to obtain your API key directly from the LLM service provider (e.g. for OpenAI, from [here](https://platform.openai.com/api-keys)). The LLM endpoint is determined by the `model` you pass to `mock.sample`.\n\n3. 
Create your first basic mock table from scratch\n\n```python\nfrom mostlyai import mock\n\ntables = {\n    \"guests\": {\n        \"prompt\": \"Guests of an Alpine ski hotel in Austria\",\n        \"columns\": {\n            \"nationality\": {\"prompt\": \"2-letter code for the nationality\", \"dtype\": \"string\"},\n            \"name\": {\"prompt\": \"first name and last name of the guest\", \"dtype\": \"string\"},\n            \"gender\": {\"dtype\": \"category\", \"values\": [\"male\", \"female\"]},\n            \"age\": {\"prompt\": \"age in years; min: 18, max: 80; avg: 25\", \"dtype\": \"integer\"},\n            \"date_of_birth\": {\"prompt\": \"date of birth\", \"dtype\": \"date\"},\n            \"checkin_time\": {\"prompt\": \"the check in timestamp of the guest; may 2025\", \"dtype\": \"datetime\"},\n            \"is_vip\": {\"prompt\": \"is the guest a VIP\", \"dtype\": \"boolean\"},\n            \"price_per_night\": {\"prompt\": \"price paid per night, in EUR\", \"dtype\": \"float\"},\n            \"room_number\": {\"prompt\": \"room number\", \"dtype\": \"integer\", \"values\": [101, 102, 103, 201, 202, 203, 204]}\n        },\n    }\n}\ndf = mock.sample(\n    tables=tables,   # provide table and column definitions\n    sample_size=10,  # generate 10 records\n    model=\"openai/gpt-5-nano\",  # select the LLM model (optional)\n)\nprint(df)\n#   nationality                 name  gender  age date_of_birth        checkin_time is_vip  price_per_night  room_number\n# 0          FR          Jean Dupont    male   29    1994-03-15 2025-01-10 14:30:00  False            150.0          101\n# 1          DE         Anna Schmidt  female   34    1989-07-22 2025-01-11 16:45:00   True            200.0          201\n# 2          IT          Marco Rossi    male   45    1979-11-05 2025-01-09 10:15:00  False            180.0          102\n# 3          AT         Laura Gruber  female   28    1996-02-19 2025-01-12 09:00:00  False            165.0          202\n# 4          
CH         David Müller    male   37    1987-08-30 2025-01-08 17:20:00   True            210.0          203\n# 5          NL  Sophie van den Berg  female   22    2002-04-12 2025-01-10 12:00:00  False            140.0          103\n# 6          GB         James Carter    male   31    1992-09-10 2025-01-11 11:30:00  False            155.0          204\n# 7          BE        Lotte Peeters  female   26    1998-05-25 2025-01-09 15:45:00  False            160.0          201\n# 8          DK        Anders Jensen    male   33    1990-12-03 2025-01-12 08:15:00   True            220.0          202\n# 9          ES         Carlos Lopez    male   38    1985-06-14 2025-01-10 18:00:00  False            170.0          203\n```\n\n4. Create your first multi-table mock dataset\n\n```python\nfrom mostlyai import mock\n\ntables = {\n    \"customers\": {\n        \"prompt\": \"Customers of a hardware store\",\n        \"columns\": {\n            \"customer_id\": {\"prompt\": \"the unique id of the customer\", \"dtype\": \"string\"},\n            \"name\": {\"prompt\": \"first name and last name of the customer\", \"dtype\": \"string\"},\n        },\n        \"primary_key\": \"customer_id\",\n    },\n    \"warehouses\": {\n        \"prompt\": \"Warehouses of a hardware store\",\n        \"columns\": {\n            \"warehouse_id\": {\"prompt\": \"the unique id of the warehouse\", \"dtype\": \"string\"},\n            \"name\": {\"prompt\": \"the name of the warehouse\", \"dtype\": \"string\"},\n        },\n        \"primary_key\": \"warehouse_id\",\n    },\n    \"orders\": {\n        \"prompt\": \"Orders of a Customer\",\n        \"columns\": {\n            \"customer_id\": {\"prompt\": \"the customer id for that order\", \"dtype\": \"string\"},\n            \"warehouse_id\": {\"prompt\": \"the warehouse id for that order\", \"dtype\": \"string\"},\n            \"order_id\": {\"prompt\": \"the unique id of the order\", \"dtype\": \"string\"},\n            \"text\": {\"prompt\": \"order 
text description\", \"dtype\": \"string\"},\n            \"amount\": {\"prompt\": \"order amount in USD\", \"dtype\": \"float\"},\n        },\n        \"primary_key\": \"order_id\",\n        \"foreign_keys\": [\n            {\n                \"column\": \"customer_id\",\n                \"referenced_table\": \"customers\",\n                \"prompt\": \"each customer has anywhere between 2 and 3 orders\",\n            },\n            {\n                \"column\": \"warehouse_id\",\n                \"referenced_table\": \"warehouses\",\n            },\n        ],\n    },\n    \"items\": {\n        \"prompt\": \"Items in an Order\",\n        \"columns\": {\n            \"item_id\": {\"prompt\": \"the unique id of the item\", \"dtype\": \"string\"},\n            \"order_id\": {\"prompt\": \"the order id for that item\", \"dtype\": \"string\"},\n            \"name\": {\"prompt\": \"the name of the item\", \"dtype\": \"string\"},\n            \"price\": {\"prompt\": \"the price of the item in USD\", \"dtype\": \"float\"},\n        },\n        \"foreign_keys\": [\n            {\n                \"column\": \"order_id\",\n                \"referenced_table\": \"orders\",\n                \"prompt\": \"each order has between 1 and 2 items\",\n            }\n        ],\n        \"primary_key\": \"item_id\",\n    },\n}\ndata = mock.sample(\n    tables=tables,\n    sample_size=2,\n    model=\"openai/gpt-5\",\n    n_workers=1,\n)\nprint(data[\"customers\"])\n#   customer_id             name\n# 0   B0-100235  Danielle Rogers\n# 1   B0-100236       Edward Kim\nprint(data[\"warehouses\"])\n#   warehouse_id                          name\n# 0       B0-001  Downtown Distribution Center\n# 1       B0-002     Westside Storage Facility\nprint(data[\"orders\"])\n#   customer_id warehouse_id    order_id                                               text   amount\n# 0   B0-100235       B0-002  B0-3010021  Office furniture replenishment - desks, chairs...  
1268.35\n# 1   B0-100235       B0-001  B0-3010022  Bulk stationery order: printer paper, notebook...    449.9\n# 2   B0-100235       B0-001  B0-3010023  Electronics restock: monitors and wireless key...    877.6\n# 3   B0-100236       B0-001  B1-3010021  Monthly cleaning supplies: disinfectant, trash...   314.75\n# 4   B0-100236       B0-002  B1-3010022  Breakroom essentials restock: coffee, tea, and...   182.45\nprint(data[\"items\"])\n#      item_id    order_id                                   name   price\n# 0  B0-200501  B0-3010021                  Ergonomic Office Desk  545.99\n# 1  B0-200502  B0-3010021              Mesh Back Executive Chair   399.5\n# 2  B1-200503  B0-3010022   Multipack Printer Paper (500 sheets)  129.95\n# 3  B1-200504  B0-3010022             Spiral Notebooks - 12 Pack   59.99\n# 4  B2-200505  B0-3010023               27\" LED Computer Monitor  489.95\n# 5  B2-200506  B0-3010023            Wireless Ergonomic Keyboard  387.65\n# 6  B3-200507  B1-3010021  Industrial Disinfectant Solution (5L)  148.95\n# 7  B3-200508  B1-3010021  Commercial Trash Liners - Case of 100    84.5\n# 8  B4-200509  B1-3010022        Premium Ground Coffee (2lb Bag)   74.99\n# 9  B4-200510  B1-3010022         Bottled Spring Water (24 Pack)   34.95\n```\n\n5. 
Create your first self-referencing mock table with auto-increment integer primary keys\n\n```python\nfrom mostlyai import mock\n\ntables = {\n    \"employees\": {\n        \"prompt\": \"Employees of a company\",\n        \"columns\": {\n            \"employee_id\": {\"dtype\": \"integer\"},\n            \"name\": {\"prompt\": \"first name and last name of the employee\", \"dtype\": \"string\"},\n            \"boss_id\": {\"dtype\": \"integer\"},\n            \"role\": {\"prompt\": \"the role of the employee\", \"dtype\": \"string\"},\n        },\n        \"primary_key\": \"employee_id\",\n        \"foreign_keys\": [\n            {\n                \"column\": \"boss_id\",\n                \"referenced_table\": \"employees\",\n                \"prompt\": \"each boss has at most 3 employees\",\n            },\n        ],\n    }\n}\ndf = mock.sample(tables=tables, sample_size=10, model=\"openai/gpt-5\", n_workers=1)\nprint(df)\n#   employee_id              name  boss_id                   role\n# 0            1      Patricia Lee     \u003cNA\u003e              President\n# 1            2  Edward Rodriguez        1       VP of Operations\n# 2            3      Maria Cortez        1          VP of Finance\n# 3            4     Thomas Nguyen        1       VP of Technology\n# 4            5        Rachel Kim        2     Operations Manager\n# 5            6     Jeffrey Patel        2      Supply Chain Lead\n# 6            7      Olivia Smith        2  Facilities Supervisor\n# 7            8      Brian Carter        3     Accounting Manager\n# 8            9   Lauren Anderson        3      Financial Analyst\n# 9           10   Santiago Romero        3     Payroll Specialist\n```\n\n6. 
Enrich existing data with additional columns\n\n```python\nfrom mostlyai import mock\nimport pandas as pd\n\ntables = {\n    \"guests\": {\n        \"prompt\": \"Guests of an Alpine ski hotel in Austria\",\n        \"columns\": {\n            \"gender\": {\"dtype\": \"category\", \"values\": [\"male\", \"female\"]},\n            \"age\": {\"prompt\": \"age in years; min: 18, max: 80; avg: 25\", \"dtype\": \"integer\"},\n            \"room_number\": {\"prompt\": \"room number\", \"dtype\": \"integer\"},\n            \"is_vip\": {\"prompt\": \"is the guest a VIP\", \"dtype\": \"boolean\"},\n        },\n        \"primary_key\": \"guest_id\",\n    }\n}\nexisting_guests = pd.DataFrame({\n    \"guest_id\": [1, 2, 3],\n    \"name\": [\"Anna Schmidt\", \"Marco Rossi\", \"Sophie Dupont\"],\n    \"nationality\": [\"DE\", \"IT\", \"FR\"],\n})\ndf = mock.sample(\n    tables=tables,\n    existing_data={\"guests\": existing_guests},\n    model=\"openai/gpt-5-nano\"\n)\nprint(df)\n#   guest_id           name nationality  gender  age  room_number is_vip\n# 0        1   Anna Schmidt          DE  female   30          102  False\n# 1        2    Marco Rossi          IT    male   27          215   True\n# 2        3  Sophie Dupont          FR  female   22          108  False\n```\n\n## MCP Server\n\nThis repo comes with an MCP Server. 
Any MCP Client can use it with the following configuration:\n\n```json\n{\n    \"mcpServers\": {\n        \"mostlyai-mock-mcp\": {\n            \"command\": \"uvx\",\n            \"args\": [\"--from\", \"mostlyai-mock[mcp]\", \"mcp-server\"],\n            \"env\": {\n                \"OPENAI_API_KEY\": \"PROVIDE YOUR KEY\",\n                \"GEMINI_API_KEY\": \"PROVIDE YOUR KEY\",\n                \"GROQ_API_KEY\": \"PROVIDE YOUR KEY\",\n                \"ANTHROPIC_API_KEY\": \"PROVIDE YOUR KEY\"\n            }\n        }\n    }\n}\n```\n\nFor example:\n- in Claude Desktop, go to \"Settings\" \u003e \"Developer\" \u003e \"Edit Config\" and paste the above into `claude_desktop_config.json`\n- in Cursor, go to \"Settings\" \u003e \"Cursor Settings\" \u003e \"MCP\" \u003e \"Add new global MCP server\" and paste the above into `mcp.json`\n\nTroubleshooting:\n1. If the MCP Client fails to detect the MCP Server, provide the absolute path in the `command` field, for example: `/Users/johnsmith/.local/bin/uvx`\n2. To debug MCP Server issues, you can use MCP Inspector by running: `npx @modelcontextprotocol/inspector -- uvx --from mostlyai-mock[mcp] mcp-server`\n3. To develop locally, modify the configuration by setting `\"command\": \"uv\"` (or the full path to `uv` if needed) and `\"args\": [\"--directory\", \"/Users/johnsmith/mostlyai-mock\", \"run\", \"--extra\", \"mcp\", \"mcp-server\"]`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmostly-ai%2Fmostlyai-mock","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmostly-ai%2Fmostlyai-mock","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmostly-ai%2Fmostlyai-mock/lists"}