{"id":20989905,"url":"https://github.com/iddm/prisma-test","last_synced_at":"2025-03-13T11:46:06.201Z","repository":{"id":259307403,"uuid":"877541640","full_name":"iddm/prisma-test","owner":"iddm","description":null,"archived":false,"fork":false,"pushed_at":"2024-10-23T20:48:50.000Z","size":16,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-03-10T12:55:42.946Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Rust","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/iddm.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-10-23T20:37:41.000Z","updated_at":"2024-10-24T18:41:00.000Z","dependencies_parsed_at":"2024-10-24T08:36:35.179Z","dependency_job_id":"d92b55f0-8f61-457d-a32f-24e65d4e716f","html_url":"https://github.com/iddm/prisma-test","commit_stats":null,"previous_names":["iddm/prisma-test"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iddm%2Fprisma-test","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iddm%2Fprisma-test/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iddm%2Fprisma-test/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/iddm%2Fprisma-test/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/iddm","download_url":"https://codeload.github.com/iddm/prisma-test/tar.gz/refs/heads/master","host":{"name":"GitHub","url
":"https://github.com","kind":"github","repositories_count":243401492,"owners_count":20285052,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-19T06:26:40.674Z","updated_at":"2025-03-13T11:46:06.167Z","avatar_url":"https://github.com/iddm.png","language":"Rust","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Simple CSV parser and query tool\n\n## Running\n\nThe program loads the `data.csv` in the current working directory and\nprovides a tiny and simple REPL-like interface to perform queries.\n\nThe query language is dead simple and always requires two things:\n`PROJECT`and `FILTER` to be specified within the query.\n\n`PROJECT` lists the column names to output, and the `FILTER` lists the filter conditions for the data.\n\nRunning:\n\n```sh\ncargo run\n```\n\nQuerying:\n\n```sh\nWelcome to the CSV data query tool!\n\n (CTRL-C for exit) REPL \u003e PROJECT col1 FILTER col2=\"bar\"\n\n {\"col1\": Integer(IntegerColumnType(2))}\n\n (CTRL-C for exit) REPL \u003e\n```\n\n## Questions\n\n### What were some of the tradeoffs you made when building this and why were these acceptable tradeoffs?\n\nI simplified the way I work with the table and I don't require any order.\nI also only work with `i64` integers and strings.\nThe filter iterator returns a hashmap, but there are other options, I\ndid not evaluate each one of them.\n\n### Given more time, what improvements or optimizations would you want to add?\n\nI'd first look at the filter iterator performance.\nBetter output. 
I'd at least provide a TUI.\n\n### When would you add them?\n\nDidn't understand the question.\n\n### What changes are needed to accommodate changes to support other data types, multiple filters, or ordering of results?\n\nIt depends. There is not just one way to do all of this, and no single\none is always better. We could store the index to the data entries and\nthe data itself separately instead of a hashmap. We could use vectors,\nfor example; this would be much more efficient for the CPU and better\nwith regard to memory accesses: less latency and fewer jumps.\n\n### What changes are needed to process extremely large datasets?\n\nThere are millions of ways to achieve that, depending on the desired\noutcome and the actual problem at that time.\n\nPerhaps by not even loading the data set into RAM in the first place, but\nrather locking it on the filesystem (if it is a file), or locking\nparts of it, if possible, and going through it in the filter iterator.\n\nAvoiding copies and using some compression would help as well.\n\nThere are many books on how to write a database out there; I am too lazy\nto repeat everything written there and all the old and new, bad and good\npractices for doing things. Only reinvent the wheel if you know all the\nother wheels.\n\n### What do you still need to do to make this code production ready?\n\nI doubt there are just CSV parsers like that :-D This question is too\nbroad and there are millions of choices to make depending on the exact\n\"production\" requirements. It is impossible to answer unless we discuss\nthe exact requirements first. 
There can't be one single solution that fits\nall, or it would take an unreasonably long time to implement while the\nbusiness doesn't actually need it.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiddm%2Fprisma-test","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiddm%2Fprisma-test","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiddm%2Fprisma-test/lists"}