{"id":22825885,"url":"https://github.com/tylerburdsall/combigen","last_synced_at":"2025-03-31T00:26:37.937Z","repository":{"id":95611098,"uuid":"132060460","full_name":"tylerburdsall/combigen","owner":"tylerburdsall","description":"An efficient CLI tool to generate possible combinations written in C++","archived":false,"fork":false,"pushed_at":"2019-01-01T20:24:10.000Z","size":4334,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-06T05:42:31.998Z","etag":null,"topics":["c-plus-plus","cartesian-product","cartesian-products","cli","combigen","combination","cpp"],"latest_commit_sha":null,"homepage":null,"language":"C++","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/tylerburdsall.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-05-03T23:37:00.000Z","updated_at":"2024-02-23T12:44:46.000Z","dependencies_parsed_at":"2023-05-21T02:15:16.815Z","dependency_job_id":null,"html_url":"https://github.com/tylerburdsall/combigen","commit_stats":null,"previous_names":[],"tags_count":8,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tylerburdsall%2Fcombigen","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tylerburdsall%2Fcombigen/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tylerburdsall%2Fcombigen/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/tylerburdsall%2Fcombigen/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/tylerburdsall","down
load_url":"https://codeload.github.com/tylerburdsall/combigen/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246399818,"owners_count":20770905,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["c-plus-plus","cartesian-product","cartesian-products","cli","combigen","combination","cpp"],"created_at":"2024-12-12T17:12:23.717Z","updated_at":"2025-03-31T00:26:37.911Z","avatar_url":"https://github.com/tylerburdsall.png","language":"C++","readme":"# combigen\nAn efficient CLI tool to generate possible combinations written in C++\n\n## Introduction\nCombigen aims to assist with data generation and exploration. Given a `.json` input where each key contains an array of string values (or simply an array of string arrays), combigen can either generate every possible combination or a random, evenly-distributed subset of the possible combinations. It aims to be memory-efficient while maintaining high-performance. This can be especially useful when large amounts of data are needed for statistical analysis or mock data in an application.\n\nIt supports output as `.csv` and `.json`.\n\n## Usage\n\nBasic commands are listed below:\n\n\n```\nUsage: combigen [options]\n   -h             Displays this help message\n\n   -a             Generates every possible combination, restricted to memory mode.\n                  (Note: this should be used with caution when storing to disk)\n\n   -n \u003cindex\u003e     Generate combination at nth index\n\n   -i \u003cinput\u003e     Take the given .json file as input. 
Otherwise, input will come\n                  from stdin.\n                  Example: \"{ \"foo\": [ \"a\", \"b\", \"c\" ], \"bar\": [ \"1\", \"2\" ] }\"\n\n   -t \u003ctype\u003e      Output type (csv or json). Defaults to csv\n\n   -r \u003csize\u003e      Generate a random sample of size r from\n                  the possible set of combinations\n\n   -d \u003cdelimiter\u003e Set the delimiter when displaying combinations (default is ',')\n\n   -k             Display the keys on the first line of output (for .csv)\n\n   -p             Use performance mode to generate combinations faster at the\n                  expense of higher RAM usage.\n                  (Note: this is only recommended for computers with large amounts\n                  of RAM when generating a large number of random combinations)\n\n   -v             Display version number\n```\n\n## Prerequisites\n### Linux/UNIX/Cygwin\n**Required:**\n* make\n* g++ (capable of compiling to the C++14 standard or higher)\n\n**Optional:**\n* [Boost](https://www.boost.org), in case you are working with large sets of data\n\nIf you need to install Boost, I recommend utilizing your distro's package manager:\n\n#### Debian/Ubuntu\n`$ sudo apt install libboost-all-dev`\n\n#### Fedora\n`$ sudo dnf install boost`\n\n#### Arch/Manjaro/Antergos\n`$ sudo pacman -Sy boost`\n\n\n### Windows\n**Required:**\n* Visual Studio 2015 or higher\n\n**Optional:**\n* [Boost](https://www.boost.org), in case you are working with large sets of data. \nI recommend downloading the precompiled libraries and placing them somewhere easy to remember on your machine.\n\n\n## Building From Source and Installing\nNote: on Windows, if you do not want to (or are unable to) compile from the source files, you can go to the [Release](https://github.com/iamtheburd/combigen/releases) page and directly download the `combigen.exe` binary from there. 
This also has the added benefit of being compiled with the Boost libraries already.\n\n### Linux/UNIX\n\n1. Clone the repository and `cd` into it:\n\n```\n$ git clone --recurse-submodules -j8 https://github.com/iamtheburd/combigen.git \u0026\u0026 cd combigen\n```\n\n2. Build with `make`:\n\n```\n$ make\n```\n\nIf you need support for larger sets of data (and have Boost installed), instead build with `make perf`:\n\n```\n$ make perf\n```\n\n3. Install:\n\n```\n$ sudo make install\n```\n\n### Windows\n\n1. Download Visual Studio 2015+ and install.\n\n2. Clone the repository to some directory using the `git clone` command above\n\n3. Open up the Developer Command Prompt (can usually be found by searching in the Start menu)\n\n4. `cd` to where your cloned repository is\n\n5. Build the file:\n\n```\n\u003e cl /EHsc /O2 src\\cli_functions.cpp src\\combigen.cpp src\\main.cpp /Fe\".\\combigen.exe\" \n```\n\nAlternatively, if you need support for larger sets of data (and have Boost installed somewhere on your machine), run this command instead. Ensure you fill in the proper path to your Boost directory (this example assumes Boost 1.68.0 installed):\n\n```\n\u003e cl /EHsc /DUSE_BOOST /O2 /I C:\\path\\to\\boost_1_68_0 src\\cli_functions.cpp src\\boost_functions.cpp src\\main.cpp /Fe\".\\combigen.exe\" /link /LIBPATH:C:\\path\\to\\boost_1_68_0\\lib64-msvc-14.1\n```\n\n6. Place the resulting `combigen.exe` wherever you desire\n\n7. **Note:** Do not use PowerShell to execute `combigen.exe`. For whatever reason, PowerShell completely bogs down execution time. 
It is better to use `cmd` instead.\n\n\n## Usage\n\nUsing the example `combinations.json` data provided, here are some examples showcasing some features:\n\n### Input\n\nYou can either use the `-i` flag and specify an input `.json` file:\n\n\n```\n$ combigen -i example_data/combinations.json -n 100  # Find the combination at index 100\n```\n\nOr you can feed in an input from `stdin`:\n\n\n```\n$ cat example_data/combinations.json | combigen -n 100  # Find the combination at index 100\n```\n\nAlternatively, if you want to manually type in your string, the program will await user input until EOF. For Windows, this is `CTRL+Z`. For Linux/UNIX, this is `CTRL+D`.\n\n### Output\n\nIt's recommended to use your OS's built-in output redirection to write out to a file for ease-of-use and performance:\n\n```\n$ combigen -i example_data/combinations.json -r 50000 \u003e output.txt  # Generate 50,000 random combinations\n                                                                    # and store them in output.txt\n```\n\n### Large Sets of Data\n\nTo demonstrate how `combigen` can even work with large sets of data (when compiled with the Boost library) we can use the example `large_bits.json` file. Unlike the above example data, this file only contains an array of string arrays. In this set of data, the maximum size is equivalent to 3 ^ 256. 
We can still find the last entry (max size - 1):\n\n```\n$ combigen -i example_data/large_bits.json -n 139008452377144732764939786789661303114218850808529137991604824430036072629766435941001769154109609521811665540548899435520\n2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2\n$\n```\n\n\n### Types\n\nYou can export in either `.csv` or `.json`. Use the `-t` flag to explicitly set the output:\n\n#### CSV\n\nIf you need the first row to contain column headers, you can also use the `-k` flag to display the keys as column headers:\n\n\n```\n$ combigen -i example_data/combinations.json -r 5 -k  # Generate 5 random combinations and display the keys\nAge,First Name,Last Name,Number of Children,Number of Pets,Primary Desktop OS,Primary Mobile Phone OS,Residence,State/Territory\n35,Kevin,Long,0,4,macOS,iOS,House,MD\n45,Samantha,Thomas,3,1,Linux,Windows,Other,IL\n50,Katherine,Williams,4,1,macOS,Windows,Other,TN\n60,Kevin,Johnson,5+,1,Windows,Other,Other,NE\n90,Sally,Wilson,5+,0,BSD,Windows,RV,PR\n$\n```\n\nYou can also change the delimiter with the `-d` flag:\n\n```\n$ combigen -i example_data/combinations.json -r 3 -k -d \"||\"  # Generate 3 random combinations, display the keys,\n                                                              # and set the delimiter to ||\nAge||First Name||Last Name||Number of Children||Number of Pets||Primary Desktop OS||Primary Mobile Phone OS||Residence||State/Territory\n20||Samantha||Harris||3||4||Windows||Other||RV||GA\n25||Matthew||Thomas||2||0||Windows||Other||Town 
Home||IL\n80||James||Jones||1||3||macOS||iOS||House||FM\n$                                                               \n```\n\n#### JSON\n\nUse the `-t` flag to specify a JSON output:\n\n\n```\n$ combigen -i example_data/combinations.json -r 5 -t json  # Generate 5 random combinations in .json format\n[\n{\n    \"Age\": \"40\",\n    \"First Name\": \"Matthew\",\n    \"Last Name\": \"Harris\",\n    \"Number of Children\": \"5+\",\n    \"Number of Pets\": \"3\",\n    \"Primary Desktop OS\": \"Other\",\n    \"Primary Mobile Phone OS\": \"iOS\",\n    \"Residence\": \"RV\",\n    \"State/Territory\": \"UT\"\n},{\n    \"Age\": \"50\",\n    \"First Name\": \"Kimberly\",\n    \"Last Name\": \"Anderson\",\n    \"Number of Children\": \"3\",\n    \"Number of Pets\": \"3\",\n    \"Primary Desktop OS\": \"Linux\",\n    \"Primary Mobile Phone OS\": \"Windows\",\n    \"Residence\": \"Apartment\",\n    \"State/Territory\": \"RI\"\n},{\n    \"Age\": \"70\",\n    \"First Name\": \"Kevin\",\n    \"Last Name\": \"Torres\",\n    \"Number of Children\": \"5+\",\n    \"Number of Pets\": \"0\",\n    \"Primary Desktop OS\": \"Other\",\n    \"Primary Mobile Phone OS\": \"iOS\",\n    \"Residence\": \"Apartment\",\n    \"State/Territory\": \"LA\"\n},{\n    \"Age\": \"80\",\n    \"First Name\": \"Sally\",\n    \"Last Name\": \"Gonzales\",\n    \"Number of Children\": \"1\",\n    \"Number of Pets\": \"1\",\n    \"Primary Desktop OS\": \"BSD\",\n    \"Primary Mobile Phone OS\": \"Windows\",\n    \"Residence\": \"Condo\",\n    \"State/Territory\": \"TN\"\n},{\n    \"Age\": \"90\",\n    \"First Name\": \"Brooke\",\n    \"Last Name\": \"Wilson\",\n    \"Number of Children\": \"1\",\n    \"Number of Pets\": \"4\",\n    \"Primary Desktop OS\": \"Other\",\n    \"Primary Mobile Phone OS\": \"Android\",\n    \"Residence\": \"House\",\n    \"State/Territory\": \"CO\"\n}]\n$\n```\n\n## Using Performance Mode\n\nWhen generating a large number of combinations, there comes a desire to speed up the process. 
In this case, use the `-p` flag to switch combigen to Performance Mode. This will generate all of the combinations at once before outputting them to `stdout`. **Note: this is only recommended for systems with a large amount of RAM when generating incredibly large sets of data**.\n\nCompared with the default Memory Mode, this begins to make a difference once the generated sets of data become quite large. See the results of the tests below for more information.\n\nFor now, generating every possible combination is always performed in Memory Mode to save RAM.\n\n### Performance Tests\n\nTo visualize the performance differences between Memory Mode and Performance Mode, a small test was performed to illustrate where Performance Mode begins to offer a significant advantage.\n\n#### Testing Parameters\n\nEach test measured the time taken to generate *n* random combinations and write them to disk, and was run 5 times. Then, for each value of *n*, the average of these 5 runs was recorded and graphed.\n\nThe following tests were performed on a Lenovo ThinkPad T460 with the following specs:\n\n* Windows 10 Enterprise\n* 256GB SSD w/full disk encryption\n* 8GB RAM\n* Intel Core i5-6300U @ 2.40GHz\n\nThe environment was tested with the following:\n\n* Compiled with Visual Studio Developer Tools 2017 x64 with the compile flags listed above\n* Git Bash as a shell to utilize the UNIX `time` function\n* Each iteration was generated using the command `time ./combigen.exe -i example_data/combinations.json -r \"$n\" \u003e output.txt`, where `$n` is the number of random combinations\n\nThe source code for these shell scripts can be found in the [performance_tests](performance_tests/) folder.\n\n#### Testing Results\n\nThe results from the test were graphed:\n\n![Testing Results](performance_tests/performance-mode-vs-memory-mode-test-results.png)\n\n#### Conclusion\n\nBased on the results above, Performance Mode will only 
start to offer real benefits when the number of combinations is quite large, but the net difference still ends up being negligible. Performance Mode should only be used when the computer can truly handle storing all of these combinations in RAM. Ultimately, it boils down to two cases:\n\n* If you can spare time and don't want to bog down your machine (or the number of generated combinations is small), stick with the default Memory Mode.\n* If you have a well-spec'd machine and can sacrifice the RAM when generating a large number of combinations, choose Performance Mode.\n\nRegardless, a large number of combinations requires a large amount of disk space, so take this into account when generating data.\n\n## Third-Party Libraries\n\nCombigen uses the following open-source libraries:\n\n* [nlohmann/json](https://github.com/nlohmann/json) - An excellent C++ library for parsing JSON\n\n* [lazy-cartesian-product](https://github.com/iamtheburd/lazy-cartesian-product) - Small C++ library I developed to generate the combinations\n\n* [skandhurkat/Getopt-for-Visual-Studio](https://github.com/skandhurkat/Getopt-for-Visual-Studio) - Port of the MinGW version of `getopt.h` so that the CLI works on Windows\n\n* [Boost](https://www.boost.org) - For operating with incredibly large sets of data that push the limits of an `unsigned long long`.\n\n\n## Contributing\nPull requests are always welcome\n\n## License\nLicensed under GPLv3, see [LICENSE](https://github.com/iamtheburd/combigen/blob/master/LICENSE)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftylerburdsall%2Fcombigen","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftylerburdsall%2Fcombigen","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftylerburdsall%2Fcombigen/lists"}