{"id":22886601,"url":"https://github.com/zenodeapp/protein-crud","last_synced_at":"2025-07-06T21:02:51.884Z","repository":{"id":50706977,"uuid":"519905824","full_name":"zenodeapp/protein-crud","owner":"zenodeapp","description":"A basic CRUD for Proteins with string query functionality.","archived":false,"fork":false,"pushed_at":"2023-12-07T20:04:18.000Z","size":75534,"stargazers_count":0,"open_issues_count":4,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-06T23:19:14.384Z","etag":null,"topics":["awk","contracts","crud","proteins","query-parser","solidity"],"latest_commit_sha":null,"homepage":"","language":"Solidity","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/zenodeapp.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2022-07-31T23:05:39.000Z","updated_at":"2022-09-08T16:04:10.000Z","dependencies_parsed_at":"2023-01-21T03:16:31.093Z","dependency_job_id":null,"html_url":"https://github.com/zenodeapp/protein-crud","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenodeapp%2Fprotein-crud","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenodeapp%2Fprotein-crud/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenodeapp%2Fprotein-crud/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/zenodeapp%2Fprotein-crud/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/zenodeapp","download_url":"https://codeload.github.com/zenodeapp/protein-crud/tar.gz/r
efs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246523090,"owners_count":20791431,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["awk","contracts","crud","proteins","query-parser","solidity"],"created_at":"2024-12-13T20:19:27.415Z","updated_at":"2025-03-31T18:45:23.625Z","avatar_url":"https://github.com/zenodeapp.png","language":"Solidity","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Protein CRUD (This readme is outdated!)\nA basic CRUD for Proteins with string query functionality.\n\nThis has been built within the Hardhat environment and is merely a basis for querying protein sequences and PDBID/ACCESSIONs.\n\n## Getting Started\n### 1. Installation\nTo get started, install all dependencies using a package manager of your choosing. For instance: \u003ccode\u003eyarn install\u003c/code\u003e or \u003ccode\u003enpm install\u003c/code\u003e.\n\n### 2. Run Test Node\nAfter having installed all dependencies use \u003ccode\u003enpx hardhat node\u003c/code\u003e to locally run a test environment where we could deploy our \u003cb\u003eProteinQuery\u003c/b\u003e contract to.\nYou could also use a different network of your choosing, which you could configure in the \u003cb\u003ehardhat.config.js\u003c/b\u003e-file (for more info on this: https://hardhat.org/hardhat-runner/docs/config).\n\n### 3. Deployment\nNow after having a node up and running we'll have to deploy our contract using \u003ccode\u003enpx hardhat run scripts/deploy.js\u003c/code\u003e. 
This prints the contract address in your terminal.\n\n#### 3.1 Contract Address\nAfter deploying, add the contract address to the \u003cb\u003eproteins.config.js\u003c/b\u003e in the root folder by assigning it to the key named \u003cb\u003econtractAddress\u003c/b\u003e.\n\n### 4. Adding the Proteins\nWith the contract deployed, we just have to add our proteins to the contract's storage. The script \u003cb\u003einsertProteins.js\u003c/b\u003e will help with this. It's a simple JavaScript script that reads data from a \u003cb\u003e.txt file\u003c/b\u003e located in the \u003cb\u003edatasets\u003c/b\u003e-folder. These files contain the first n proteins from the Genesis L1 dataset (https://datasetnft.org/), where 104059 is the file containing all proteins. Choose which dataset size to use by editing it in the \u003cb\u003eproteins.config.js\u003c/b\u003e file.\n\u003c/br\u003e\n\u003c/br\u003e\nUse: \u003ccode\u003enpx hardhat run scripts/insertProteins.js\u003c/code\u003e to run the script.\n\n#### 4.1 Adding the Seeds (Optional, but recommended)\nSince version 1.1.0 a new way to query was added (semi-blast), which depends on the insertion of short seeds. Basically, these seeds are created by cutting all sequences into tiny n-sized words, where n can be adjusted by interacting with the contract. Each seed holds information on where its segment can be found.\n\u003c/br\u003e\n\u003c/br\u003e\nUse: \u003ccode\u003enpx hardhat run scripts/insertSeeds.js\u003c/code\u003e to run the script. Note that this is costly and takes longer than the proteins script (optimization is needed).\n\n### 5. Querying the Proteins\nWe can finally query our dataset of proteins! 
I've written two tasks in the \u003cb\u003ehardhat.config.js\u003c/b\u003e file: one calls the function \u003ci\u003enaiveQuery\u003c/i\u003e and the other calls \u003ci\u003esemiBlastQuery\u003c/i\u003e (which only works once seeds have been added). Both functions are present in \u003cb\u003econtracts/ProteinQuery.sol\u003c/b\u003e. It enables us to query by \u003ci\u003esequence OR ID OR both (an exclusive query)\u003c/i\u003e. \n\u003c/br\u003e\n\u003c/br\u003e\nTo run a task, use:\n\u003c/br\u003e\n\u003ccode\u003enpx hardhat naiveQuery --id \"your_id_query\" --sequence \"your_sequence_query\" --exclusive \"true/false\"\u003c/code\u003e\n\u003c/br\u003e\n\u003c/br\u003e\nOR\n\u003c/br\u003e\n\u003c/br\u003e\n\u003ccode\u003enpx hardhat semiBlastQuery --sequence \"your_sequence_query\" --casesensitive \"true/false\"\u003c/code\u003e \u003ci\u003ea lot faster!\u003c/i\u003e\n\n#### 5.1 Flags\nAll flags are optional. So if you want to, let's say, only search for IDs containing \"1A\", you'd only set the flag \u003ccode\u003e--id\u003c/code\u003e to \u003ccode\u003e\"1A\"\u003c/code\u003e. If you wanted to search for sequences that contain \"AAA\" and whose ID also contains \"1A\", you'd have to set both flags to the corresponding values AND set \u003ccode\u003e--exclusive\u003c/code\u003e to \u003ccode\u003e\"true\"\u003c/code\u003e. 
This is because a value of \u003ccode\u003e\"false\"\u003c/code\u003e would return all sequences that match \"AAA\" AND all sequences that have an ID containing \"1A\", while in this particular case we only want the proteins for which both conditions hold.\n\u003c/br\u003e\n\u003c/br\u003e\nThe default value for each flag, if omitted, is:\n\n`--id`: \"\"\n\n`--sequence`: \"\"\n\n`--exclusive`: \"false\"\n\n#### 5.2 Returned Value\nThe query returns an object containing an array with all found proteins and an integer stating the number of results found: \u003ccode\u003e{proteins: Array of ProteinStruct, proteinsFound: uint}\u003c/code\u003e where \u003ci\u003eProteinStruct\u003c/i\u003e is an object of the format \u003ccode\u003e{nftId: uint, id: string, sequence: string}\u003c/code\u003e.\n\u003c/br\u003e\n\u003c/br\u003e\nSo, for instance, getting the sequence of the third protein in the returned value, in JavaScript, would look like this: \u003ccode\u003eresult.proteins[2].sequence\u003c/code\u003e. See the \u003ci\u003enaiveQuery\u003c/i\u003e-task in \u003cb\u003ehardhat.config.js\u003c/b\u003e for a working example of how to loop through all the query results.\n\n## Remarks\n- ~~Solidity is not the most optimal when it comes to handling strings, especially larger strings. Therefore ideas like pre-processing the database and storing smaller segments are possible routes to explore to get this working faster (which I'm currently working on).~~ - included since version 1.1.0.\n- ~~The searches are \u003ci\u003ecase-sensitive\u003c/i\u003e at the moment. This could be solved upon insertion of the proteins (but would discard whether the letters were lower or uppercase) or solved by adding an extra toLowerCase function in Solidity. But then again, Solidity is not optimal for string manipulation and this would degrade performance.~~ - included since version 1.2.2.\n- There's a limitation in Solidity where `memory arrays` can't be resized once they are created. 
And since we cannot know beforehand how many results a query will have, we temporarily store the results in an array of size n, with n = the total number of proteins. To prevent returning an array with a bunch of empty values, we copy the query results from the temporary array to a smaller array in the last line of the \u003ci\u003enaiveQuery\u003c/i\u003e function. But of course this is an extra step that slows down our queries. More info about this issue can be found in the contract.\n\n## Credits and sources of inspiration\nI've tried to credit everyone else's code by commenting in code whenever this was the case!\n- Hardhat's infrastructure! (https://hardhat.org/)\n- Rob Hitchens' User CRUD (https://bitbucket.org/rhitchens2/soliditycrud/src/master/)\n- Hermes Ateneo's \"contains\" function (https://github.com/HermesAteneo/solidity-repeated-word-in-string/blob/main/RepeatedWords.sol)\n- Ottodevs' toLowerCase function (https://gist.github.com/ottodevs/c43d0a8b4b891ac2da675f825b1d1dbf) \n- Comparing strings (https://ethereum.stackexchange.com/questions/30912/how-to-compare-strings-in-solidity)\n- Quick-sort from Subhodi, altered in code for our needs (https://gist.github.com/subhodi/b3b86cc13ad2636420963e692a4d896f)\n- Semi-blast is inspired by the first steps of the BLAST algorithm (by reading research papers, lectures, pseudo-code and implementations in other languages by others)\n\u003c/br\u003e\n\u003c/br\u003e\n\nTousuke (ZEN - https://twitter.com/KeymasterZen)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenodeapp%2Fprotein-crud","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fzenodeapp%2Fprotein-crud","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fzenodeapp%2Fprotein-crud/lists"}