{"id":25405287,"url":"https://github.com/dacort/faker-cli","last_synced_at":"2025-10-31T01:31:33.974Z","repository":{"id":169532824,"uuid":"645525334","full_name":"dacort/faker-cli","owner":"dacort","description":"Command-line interface to quickly generate fake CSV and JSON data","archived":false,"fork":false,"pushed_at":"2024-07-11T22:57:57.000Z","size":83,"stargazers_count":72,"open_issues_count":0,"forks_count":6,"subscribers_count":3,"default_branch":"main","last_synced_at":"2025-04-07T01:23:43.638Z","etag":null,"topics":["aws","csv","deltalake","faker-provider","json","parquet","pyarrow"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dacort.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-25T21:17:39.000Z","updated_at":"2024-10-20T21:29:27.000Z","dependencies_parsed_at":"2024-01-17T22:04:01.673Z","dependency_job_id":"6f51c046-416c-4849-ba47-3455512b8f2e","html_url":"https://github.com/dacort/faker-cli","commit_stats":null,"previous_names":["dacort/faker-cli"],"tags_count":13,"template":false,"template_full_name":null,"purl":"pkg:github/dacort/faker-cli","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Ffaker-cli","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Ffaker-cli/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Ffaker-cli/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Ffaker-cli/manifests","owner_url"
:"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/dacort","download_url":"https://codeload.github.com/dacort/faker-cli/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dacort%2Ffaker-cli/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":281914522,"owners_count":26583082,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-30T02:00:06.501Z","response_time":61,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["aws","csv","deltalake","faker-provider","json","parquet","pyarrow"],"created_at":"2025-02-16T04:33:33.449Z","updated_at":"2025-10-31T01:31:33.580Z","avatar_url":"https://github.com/dacort.png","language":"Python","readme":"# Faker CLI\n\n[Faker](https://faker.readthedocs.io/en/master/) is an awesome Python library, but I often just want a simple command I can run to generate data in a variety of formats.\n\nWith Faker CLI, you can easily generate CSV, JSON, or Parquet data with fields of your choosing.\n\nYou can also utilize pre-built templates for common data formats!\n\n## Installation\n\n```bash\npip install faker-cli\n```\n\n\u003e [!TIP]\n\u003e To use Parquet or Delta Lake, use `pip install faker-cli[parquet]` or `pip install faker-cli[delta]`\n\n## Usage\n\nOnce installed you should have the `fake` command in your path. 
Run the following to see usage / help:\n\n```bash\nfake --help\n```\n\nBy default, `fake` will generate a CSV output for you. You just specify the number of rows you want and the column types.\n\n```bash\nfake -n 10 pyint,user_name,date_this_year\n```\n\nBAM! You've got a CSV file with your data.\n\n```\npyint,user_name,date_this_year\n8649,fward,2023-03-08\n3933,zharris,2023-03-20\n1469,jasonellis,2023-05-16\n3660,heather91,2023-02-10\n9160,cameronlopez,2023-05-05\n2735,candacemoore,2023-05-12\n7240,zachary06,2023-01-23\n9778,thomasstacey,2023-05-23\n5820,kenneth36,2023-04-26\n2856,michael23,2023-01-16\n```\n\n### JSON\n\nWant a JSON file? Sweet, use `-f json`.\n\n```bash\nfake -n 10 pyint,user_name,date_this_year -f json\n```\n\n```json\n{\"pyint\": 3854, \"user_name\": \"cchavez\", \"date_this_year\": \"2023-01-20\"}\n{\"pyint\": 2008, \"user_name\": \"vnguyen\", \"date_this_year\": \"2023-04-03\"}\n{\"pyint\": 1434, \"user_name\": \"karen38\", \"date_this_year\": \"2023-03-02\"}\n{\"pyint\": 4922, \"user_name\": \"duncanellen\", \"date_this_year\": \"2023-04-22\"}\n{\"pyint\": 230, \"user_name\": \"tiffany72\", \"date_this_year\": \"2023-02-25\"}\n{\"pyint\": 7252, \"user_name\": \"maydouglas\", \"date_this_year\": \"2023-04-01\"}\n{\"pyint\": 2716, \"user_name\": \"sheilaflores\", \"date_this_year\": \"2023-03-20\"}\n{\"pyint\": 2827, \"user_name\": \"parksandra\", \"date_this_year\": \"2023-04-01\"}\n{\"pyint\": 3353, \"user_name\": \"melissaatkinson\", \"date_this_year\": \"2023-02-10\"}\n{\"pyint\": 5306, \"user_name\": \"mark12\", \"date_this_year\": \"2023-04-16\"}\n```\n\n### Column Names\n\nDefault column names aren't good enough for you? 
Fine, use your own.\n\n```bash\nfake -n 10 pyint,user_name,date_this_year -f json -c id,awesome_name,last_attention_at\n```\n\n```\n{\"id\": 6048, \"awesome_name\": \"jtran\", \"last_attention_at\": \"2023-04-24\"}\n{\"id\": 4310, \"awesome_name\": \"stacey99\", \"last_attention_at\": \"2023-04-27\"}\n{\"id\": 1839, \"awesome_name\": \"jho\", \"last_attention_at\": \"2023-03-07\"}\n{\"id\": 236, \"awesome_name\": \"melissamassey\", \"last_attention_at\": \"2023-04-17\"}\n{\"id\": 6599, \"awesome_name\": \"mwells\", \"last_attention_at\": \"2023-04-25\"}\n{\"id\": 6071, \"awesome_name\": \"wilcoxrick\", \"last_attention_at\": \"2023-01-17\"}\n{\"id\": 9646, \"awesome_name\": \"michael92\", \"last_attention_at\": \"2023-04-22\"}\n{\"id\": 6986, \"awesome_name\": \"ballen\", \"last_attention_at\": \"2023-01-08\"}\n{\"id\": 6892, \"awesome_name\": \"jennifer61\", \"last_attention_at\": \"2023-01-03\"}\n{\"id\": 1967, \"awesome_name\": \"jmendoza\", \"last_attention_at\": \"2023-01-23\"}\n```\n\n### Providers (beta)\n\nWhile [Faker](https://faker.readthedocs.io) is a sweet library, we all like options, don't we? [Mimesis](https://mimesis.name/en/master/) is _also_ awesome and can be quite a bit faster than Faker. 🤫 You can use a different provider by using `-p mimesis`.\n\n\u003e [!NOTE]  \n\u003e Providers use their own syntax for data types, so you must change out your column types as necessary.\n\nTo generate the same dataset above with Mimesis, for example:\n\n```bash\nfake -p mimesis -n 10 \"numeric.integer_number(0),person.username,datetime.date(2024)\" -f json -c id,awesome_name,last_attention_at\n```\n\n### Provider Arguments\n\nSome [Faker providers](https://faker.readthedocs.io/en/master/providers/baseprovider.html) (like `pyint`) take arguments. 
You can also specify those if you like, separated by semi-colons (_because some arguments take a comma-separated string :)_).\n\n```bash\nfake -n 10 \"pyint(1;100),credit_card_number(amex),pystr_format(?#-####)\" -f json -c id,credit_card_number,license_plate\n```\n\n\u003e [!IMPORTANT]\n\u003e When using arguments with output formats like JSON, it's best to provide column headers as well with `-c`.\n\nAnd unique values are supported as well.\n\n```bash\nfake -n 10 \"unique.pyint(1;10),unique.name\"\n```\n\n### Parquet\n\nOK, it had to happen: you can even write Parquet.\n\nInstall with the `parquet` module: `pip install faker-cli[parquet]`\n\n```bash\nfake -n 10 pyint,user_name,date_this_year -f parquet -o sample.parquet\n```\n\n_youcanevenwritestraighttos3_ 🤭\n\n```bash\nfake -n 10 pyint,user_name,date_this_year -f parquet -o s3://YOUR_BUCKET/data/sample.parquet\n```\n\n### Delta Lake\n\nData can be exported as a Delta Lake table.\n\nInstall with the `delta` module: `pip install faker-cli[delta]`\n\n```bash\nfake -n 10 pyint,user_name,date_this_year -f deltalake -o sample_data\n```\n\n### Iceberg\n\nAnd, of course, Iceberg tables!\n\nWriting to a Glue or generic SQL catalog is currently supported.\n\nInstall with the `iceberg` module: `pip install faker-cli[iceberg]`\n\n```bash\nfake -n 10 pyint,user_name,date_this_year -f iceberg -C glue://default.iceberg_sample -o s3://YOUR_BUCKET/iceberg-data/\n```\n\n## Templates\n\nThe library includes a couple of templates that make it easier to generate certain types of fake data.\n\nToday, the only templates that exist are for S3 Access and CloudFront logs.\n\nWant to generate 1 MILLION S3 Access logs in ~2 minutes? Now you can. (But I only show 10 below so as not to crash your terminal.)\n\n```bash\nfake -t s3access -n 10\n```\n\nHow about CloudFront? 
Go ahead.\n\n```bash\nfake -t cloudfront -n 10\n```\n\n\u003e **Warning**: Both of these templates are still being validated - please be cautious!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdacort%2Ffaker-cli","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdacort%2Ffaker-cli","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdacort%2Ffaker-cli/lists"}