{"id":20040764,"url":"https://github.com/simonblanke/search-data-collector","last_synced_at":"2025-05-05T08:32:05.308Z","repository":{"id":57467378,"uuid":"384162091","full_name":"SimonBlanke/search-data-collector","owner":"SimonBlanke","description":"Thread safe and atomic data collection into csv-files","archived":false,"fork":false,"pushed_at":"2024-08-15T12:20:26.000Z","size":75,"stargazers_count":2,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-04-08T19:47:24.700Z","etag":null,"topics":["csv","data-collection","hyperactive","pandas","python"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SimonBlanke.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-07-08T15:02:14.000Z","updated_at":"2024-09-05T12:38:20.000Z","dependencies_parsed_at":"2024-01-07T14:44:25.220Z","dependency_job_id":"b0ed7014-a553-4424-a38a-7c18d46ff6de","html_url":"https://github.com/SimonBlanke/search-data-collector","commit_stats":{"total_commits":76,"total_committers":2,"mean_commits":38.0,"dds":"0.13157894736842102","last_synced_commit":"24bd06417ac7810af49f26732fb7f80f0e2a6d3d"},"previous_names":["simonblanke/data-collector"],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SimonBlanke%2Fsearch-data-collector","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SimonBlanke%2Fsearch-data-collector/tags","releases_url":"https://repos.e
cosyste.ms/api/v1/hosts/GitHub/repositories/SimonBlanke%2Fsearch-data-collector/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SimonBlanke%2Fsearch-data-collector/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SimonBlanke","download_url":"https://codeload.github.com/SimonBlanke/search-data-collector/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252466724,"owners_count":21752422,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["csv","data-collection","hyperactive","pandas","python"],"created_at":"2024-11-13T10:43:42.557Z","updated_at":"2025-05-05T08:32:05.020Z","avatar_url":"https://github.com/SimonBlanke.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cH1 align=\"center\"\u003e\n    Search Data Collector\n\u003c/H1\u003e\n\n\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/SimonBlanke/search-data-collector/actions\"\u003e\n    \u003cimg src=\"https://github.com/SimonBlanke/search-data-collector/actions/workflows/tests.yml/badge.svg?branch=main\" alt=\"img not loaded: try F5 :)\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://app.codecov.io/gh/SimonBlanke/search-data-collector\"\u003e\n    \u003cimg src=\"https://img.shields.io/codecov/c/github/SimonBlanke/search-data-collector/main\u0026logo=codecov\" alt=\"img not loaded: try F5 :)\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n\n\u003cbr\u003e\n\n\u003cH2 align=\"center\"\u003e\n    Thread-safe and atomic collection of 
tabular data into csv-files.\n\u003c/H2\u003e\n\n\u003cbr\u003e\n\n## Introduction\n\nThe search-data-collector provides a single class with the following methods to manage data:\n - save\n - append\n - load\n - remove\n\nThe Search-Data-Collector was created as a utility for the [Gradient-Free-Optimizers](https://github.com/SimonBlanke/Gradient-Free-Optimizers) and [Hyperactive](https://github.com/SimonBlanke/Hyperactive) packages. It is intended as a tool to collect search-data from an optimization run. The search-data can be collected during the optimization run as a dictionary via `append`, or after the run as a dataframe with the `save`-method. \u003cbr\u003e\nThe `append`-method is thread-safe, so it works with Hyperactive's multiprocessing. The `save`-method is atomic, which avoids accidental data loss if the save process is interrupted. \u003cbr\u003e\nFor the Hyperactive-package the search-data-collector handles functions in the data by converting them to strings. When loading the data, you can pass the search-space to convert the strings back to functions.\n\n\n\n\u003cbr\u003e\n\n## Disclaimer\n\nThis project is in an early development stage and is sparsely tested. 
If you encounter bugs or have suggestions for improvements, then please open an issue.\n\n\n\u003cbr\u003e\n\n## Installation\n\n```console\npip install search-data-collector \n```\n\n\n\u003cbr\u003e\n\n## Examples\n\n\n\u003cbr\u003e\n\n### Append search-data\n\n```python\nimport numpy as np\nfrom hyperactive import Hyperactive\nfrom search_data_collector import CsvSearchData\n\ncollector = CsvSearchData(\"./search_data.csv\")  # the csv is created automatically\n\n\ndef parabola_function(para):\n    loss = para[\"x\"] * para[\"x\"] + para[\"y\"] * para[\"y\"]\n\n    data_dict = dict(para)  # copy the parameter dictionary\n    data_dict[\"score\"] = -loss  # add the score to the dictionary\n    collector.append(data_dict)  # you can append a dictionary to the csv\n\n    return -loss\n\n\nsearch_space = {\n    \"x\": list(np.arange(-10, 10, 0.1)),\n    \"y\": list(np.arange(-10, 10, 0.1)),\n}\n\n\nhyper = Hyperactive()\nhyper.add_search(parabola_function, search_space, n_iter=1000)\nhyper.run()\nsearch_data = hyper.search_data(parabola_function)\n\nsearch_data = collector.load(search_space)  # load data\n\nprint(\"\\n search_data \\n\", search_data)\n```\n\n\n\u003cbr\u003e\n\n### Save search-data\n\n```python\nimport numpy as np\nfrom hyperactive import Hyperactive\nfrom search_data_collector import CsvSearchData\n\ncollector = CsvSearchData(\"./search_data.csv\")  # the csv is created automatically\n\n\ndef parabola_function(para):\n    loss = para[\"x\"] * para[\"x\"] + para[\"y\"] * para[\"y\"]\n\n    return -loss\n\n\nsearch_space = {\n    \"x\": list(np.arange(-10, 10, 0.1)),\n    \"y\": list(np.arange(-10, 10, 0.1)),\n}\n\n\nhyper = Hyperactive()\nhyper.add_search(parabola_function, search_space, n_iter=1000)\nhyper.run()\nsearch_data = hyper.search_data(parabola_function)\n\ncollector.save(search_data)  # save a dataframe instead\n\nsearch_data = collector.load(search_space)  # load data\n\nprint(\"\\n search_data \\n\", 
search_data)\n```\n\n\n\n\u003cbr\u003e\n\n### Functions in the search-space/search-data\n\n```python\nimport numpy as np\nfrom hyperactive import Hyperactive\nfrom search_data_collector import CsvSearchData\n\ncollector = CsvSearchData(\"./search_data.csv\")  # the csv is created automatically\n\n\ndef parabola_function(para):\n    loss = para[\"x\"] * para[\"x\"] + para[\"y\"] * para[\"y\"]\n\n    return -loss\n\n\n# just some dummy functions to show how this works\n\n\ndef function1():\n    print(\"this is function1\")\n\n\ndef function2():\n    print(\"this is function2\")\n\n\ndef function3():\n    print(\"this is function3\")\n\n\nsearch_space = {\n    \"x\": list(np.arange(-10, 10, 0.1)),\n    \"y\": list(np.arange(-10, 10, 0.1)),\n    \"string.example\": [\"string1\", \"string2\", \"string3\"],\n    \"function.example\": [function1, function2, function3],\n}\n\n\nhyper = Hyperactive()\nhyper.add_search(parabola_function, search_space, n_iter=30)\nhyper.run()\nsearch_data = hyper.search_data(parabola_function)\n\ncollector.save(search_data)  # save a dataframe instead of appending a dictionary\n\nsearch_data = collector.load()  # load data\n\nprint(\n    \"\\n In this dataframe the 'function.example'-column contains strings, which are the '__name__' of the functions. \\n search_data \\n \",\n    search_data,\n    \"\\n\",\n)\n\nsearch_data = collector.load(search_space)  # load data with search-space\n\nprint(\n    \"\\n In this dataframe the 'function.example'-column contains the functions again. \\n search_data \\n \",\n    search_data,\n    \"\\n\",\n)\n```\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonblanke%2Fsearch-data-collector","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsimonblanke%2Fsearch-data-collector","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsimonblanke%2Fsearch-data-collector/lists"}