{"id":23859131,"url":"https://github.com/angelospanag/sort_nums","last_synced_at":"2026-05-30T23:30:17.186Z","repository":{"id":80607913,"uuid":"94688841","full_name":"angelospanag/sort_nums","owner":"angelospanag","description":"A console application written in Go that performs optimised sorting for a large number of integers in a 'Comma Separated Values' file.","archived":false,"fork":false,"pushed_at":"2017-06-22T02:11:08.000Z","size":50,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-03T03:34:37.423Z","etag":null,"topics":["csv","go","golang","sorting-algorithms"],"latest_commit_sha":null,"homepage":"","language":"Go","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/angelospanag.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-06-18T13:25:59.000Z","updated_at":"2017-06-22T02:00:31.000Z","dependencies_parsed_at":null,"dependency_job_id":"3e023ac8-888e-4e09-a88c-f880a5a6fee0","html_url":"https://github.com/angelospanag/sort_nums","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fsort_nums","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fsort_nums/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fsort_nums/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/angelospanag%2Fsort_nums/manifests","own
er_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/angelospanag","download_url":"https://codeload.github.com/angelospanag/sort_nums/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240163542,"owners_count":19758027,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","go","golang","sorting-algorithms"],"created_at":"2025-01-03T03:32:05.984Z","updated_at":"2026-05-30T23:30:15.092Z","avatar_url":"https://github.com/angelospanag.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sort_nums\n\nA console application written in Go that performs optimised sorting for a large number of integers in a 'Comma Separated Values' file.\n\n## Methodology used\n\nExternal merge sorting with Direct K-way merge. A simplified description of it:\n\n* **Splitting input file into sorted chunks in slow memory (hard disk)**\n\n  Given that we want to be efficient and not keep a very large amount of integers in memory, we split the input file into chunks.\n\n  Each chunk will be sequentially read in fast memory (RAM) and will be sorted on the spot. After that it will be written in the slow memory (hard disk).\n\n* **Merging the sorted chunks back into an output file**\n\n  A Direct K-Way merge is performed where k is the number of chunks we created from the previous step. The first entry of each chunk is read, stored in fast memory and evaluated so we can get the minimum of those entries. 
After that, the minimum entry is written to the output file, and the next entry from the chunk it came from takes its place.\n\n  This process continues until every chunk stored in slow memory has been read to its end.\n\n* **Finding a minimum entry using a Priority Queue**\n\n  For better performance, a Priority Queue is used to find the minimum of the current entries, as described in the previous step.\n\n  A Priority Queue associates a priority with each of its elements. Depending on our needs, removing an element from the queue returns the one with the highest or the lowest priority. It is usually implemented with a heap, and the implementation in this project follows that approach.\n\n  The implementation is based on the reference example in the official Go documentation:\n\n  https://golang.org/pkg/container/heap/#example__priorityQueue\n\n## Requirements\nThe following software must be installed in your environment to run this project.\n\n### Go\nYour environment must be configured with Go version 1.8.*\n\nhttps://golang.org/\n\nNo third-party Go libraries were used. 
The implementation uses only the facilities provided by the Go standard library.\n\n#### macOS (with `brew`)\n\n`$ brew install go`\n\n#### Apt-based GNU/Linux distributions (Debian/Ubuntu)\n\n`$ sudo apt-get install golang-go`\n\n#### Windows (with `chocolatey`)\n\n`C:\\\u003e choco install golang`\n\n## Get it!\n\n`go get github.com/angelospanag/sort_nums`\n\n## Usage\n\nA Makefile is provided at the root of the project with some convenient shortcut commands.\n\n### Build\nCompiles the project and builds a binary\n\n`make build`\n\n### Documentation (godoc)\nAfter running the command below, visit http://localhost:6060/pkg/github.com/angelospanag/sort_nums/ in your browser\n\n`make doc`\n\n### Unit testing\n`make test`\n\n### Clean generated files\n`make clean`\n\n## Running the project\n\nThe fastest way to run the project without a separate build step is to issue a `go run` command from the root of the project. For example:\n\n`go run main.go -file=random_10000.txt -memory=20000`\n\n* Parameters required\n\n  `-file`: the CSV file that will serve as input\n\n  `-memory`: the limit (in bytes) on how much data is held in RAM at a time\n\n* Sample output\n```\n  2017/06/22 02:34:20 File random_10000.txt is 48853 bytes, will be split to 4 chunks\n  2017/06/22 02:34:20 Trying to merge runs of 4 chunks\n  2017/06/22 02:34:20 Sorting took 63.144511ms\n```\n\n* Bonus bash command\n\n  A nice one-liner I discovered counts the number of elements in a CSV file. 
For example for a file called `sorted_output.txt` do:\n\n  `sed 's/[^,]//g' sorted_output.txt | wc -c`\n\n## Resources Used\n\nThis section describes the resources used in the development of this project.\n\n* [Go 1.8.3](https://golang.org/)\n\n* [Atom 1.18](https://atom.io/) with a [series of preferred packages](https://github.com/angelospanag/atom_packages)\n\n* [go-plus for Atom](https://atom.io/packages/go-plus)\n\n* My trusty MacBook Pro 2015\n  * OS: 64bit Mac OS X 10.12.5 16F73\n  * CPU: Intel Core i5-5257U @ 2.70GHz\n  * GPU: Intel Iris Graphics 6100\n  * RAM: 10328MiB / 16384MiB\n\n\n* Lots and lots of coffee\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fangelospanag%2Fsort_nums","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fangelospanag%2Fsort_nums","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fangelospanag%2Fsort_nums/lists"}