{"id":22493211,"url":"https://github.com/negrinho/research_toolbox","last_synced_at":"2025-09-06T01:33:04.421Z","repository":{"id":57461407,"uuid":"121714230","full_name":"negrinho/research_toolbox","owner":"negrinho","description":"Utilities to help manage a machine learning experimental workflow","archived":false,"fork":false,"pushed_at":"2021-07-31T11:41:09.000Z","size":268,"stargazers_count":19,"open_issues_count":0,"forks_count":5,"subscribers_count":8,"default_branch":"master","last_synced_at":"2025-08-03T01:32:10.551Z","etag":null,"topics":["machine-learning","research-data-management","utilities","utility-library","workflow-management"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/negrinho.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2018-02-16T04:20:34.000Z","updated_at":"2024-02-29T04:50:34.000Z","dependencies_parsed_at":"2022-09-26T17:40:43.615Z","dependency_job_id":null,"html_url":"https://github.com/negrinho/research_toolbox","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/negrinho/research_toolbox","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/negrinho%2Fresearch_toolbox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/negrinho%2Fresearch_toolbox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/negrinho%2Fresearch_toolbox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/negrinho%2Fresearch_toolbox/manifests","owner_url":"https://repos.ecosyste.
ms/api/v1/hosts/GitHub/owners/negrinho","download_url":"https://codeload.github.com/negrinho/research_toolbox/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/negrinho%2Fresearch_toolbox/sbom","scorecard":{"id":678695,"data":{"date":"2025-08-11","repo":{"name":"github.com/negrinho/research_toolbox","commit":"c99aac302ba427269c07ccf25369eb4d552cac95"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.5,"checks":[{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Code-Review","score":0,"reason":"Found 0/30 approved changesets -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"SAST","score":0,"reason":"no SAST tool 
detected","details":["Warn: no pull requests merged into dev branch"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security 
policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":10,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Info: FSF or OSI recognized license: MIT License: LICENSE:0"],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection 
settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}}]},"last_synced_at":"2025-08-21T22:24:04.265Z","repository_id":57461407,"created_at":"2025-08-21T22:24:04.265Z","updated_at":"2025-08-21T22:24:04.265Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273846959,"owners_count":25178627,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-05T02:00:09.113Z","response_time":402,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","research-data-management","utilities","utility-library","workflow-management"],"created_at":"2024-12-06T18:34:20.150Z","updated_at":"2025-09-06T01:33:04.397Z","avatar_url":"https://github.com/negrinho.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"## Installation\n\n`pip install research_toolbox`\n\n## Motivation for a research toolbox\n\nThis repo contains a number of Python tools that I developed for doing experimental research in machine learning.\nIt includes a broad set of functionality:\n* file system manipulation, e.g., creating and deleting a file or a folder, checking if a given file or folder exists, and listing files or folders in a directory.\n* interacting with a remote server, e.g., synchronizing files and folders to and from the server, and running commands on the server from the local 
machine.\n* writing and reading simple file types, such as JSON and pickle files.\n* creating folders with experiment configurations that can then be easily run locally or on the server.\n* logging functionality for keeping track of important information when running code, e.g., memory usage or time since start.\n\nWhile Python has a broad set of functionality, using this functionality directly carries an unnecessarily high cognitive load because the functions needed to implement the desired functionality are spread across multiple libraries and use different APIs and design principles.\nExisting APIs are often unnecessarily flexible for the most common use-cases needed by a particular user.\nDeveloping your own wrapper APIs reduces cognitive load by making common use-cases more explicit.\nThese APIs are easy to use because they are high-level, coherent, and adjusted to the needs of that particular user.\nThese wrapper APIs can include high-level error-checking for each use-case, which would require considerably higher cognitive load to implement from scratch using existing APIs.\n\nI am not claiming that this library solves all problems that you may have.\nI am suggesting that creating and maintaining your own research toolbox is convenient and should let you get things done faster and make for a more pleasant experience overall.\nI recommend extending this toolbox or developing your own to suit your needs.\nThis library is a work in progress.\nThe ultimate goal is to go from research idea to results as fast as possible.\n\n## File description\n* [tb_augmentation.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_augmentation.py):\n    simple data augmentation.\n* [tb_data.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_data.py):\n    data loaders and data-related functionality.\n* [tb_debugging.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_debugging.py):\n    
error checking and debugging functionality.\n* [tb_experiments.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_experiments.py):\n    writing experiment folders with configurations for running different experiments.\n* [tb_filesystem.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_filesystem.py):\n    creating, copying, and testing for the existence of files and folders.\n* [tb_interact.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_interact.py):\n    interactive commands for running jobs on the server or locally.\n* [tb_io.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_io.py):\n    reading and writing simple file types.\n* [tb_logging.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_logging.py):\n    common logging functionality.\n* [tb_plotting.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_plotting.py):\n    wrappers around plotting libraries such as matplotlib to make simple plot generation easier.\n* [tb_preprocessing.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_preprocessing.py):\n    simple preprocessing functionality for going from raw data to data that is more amenable to the application of machine learning.\n* [tb_project.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_project.py):\n    creation of the typical project structure for a machine learning project.\n* [tb_random.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_random.py):\n    simple random functionality for shuffling, sorting, and sampling.\n* [tb_remote.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_remote.py):\n    interaction with remote servers, such as syncing folders to and from the local machine, and submitting jobs to the server.\n* 
[tb_resource.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_resource.py):\n    getting information about available resources on a machine, such as the number of CPUs or GPUs.\n* [tb_training.py](https://github.com/negrinho/research_toolbox/blob/master/research_toolbox/tb_training.py):\n    learning rate schedules and additional logic that is often employed in training machine learning models, such as saving and loading the best model found during training.\n\n## Example code\n```python\n### standard-library imports used by the excerpts below\nimport json\nimport os\nimport pickle\nimport shutil\n\n### retrieving certain keys from a dictionary (example from tb_utils.py)\ndef subset_dict_via_selection(d, ks):\n    return {k: d[k] for k in ks}\n\n### sorting and randomness tools (examples from tb_random.py)\ndef argsort(xs, fns, increasing=True):\n    \"\"\"Each function in fns computes one component of a tuple that is used as\n    the sort key. Earlier components take precedence over later ones.\n    \"\"\"\n    def key_fn(x):\n        return tuple([f(x) for f in fns])\n\n    idxs, _ = tb_ut.zip_toggle(\n        sorted(enumerate(xs),\n            key=lambda x: key_fn(x[1]),\n            reverse=not increasing))\n    return idxs\n\ndef sort(xs, fns, increasing=True):\n    idxs = argsort(xs, fns, increasing)\n    return apply_permutation(xs, idxs)\n\ndef apply_permutation(xs, idxs):\n    assert len(set(idxs).intersection(range(len(xs)))) == len(xs)\n    return [xs[i] for i in idxs]\n\ndef apply_inverse_permutation(xs, idxs):\n    assert len(set(idxs).intersection(range(len(xs)))) == len(xs)\n\n    out_xs = [None] * len(xs)\n    for i_from, i_to in enumerate(idxs):\n        out_xs[i_to] = xs[i_from]\n    return out_xs\n\ndef shuffle_tied(xs_lst):\n    # all lists must be non-empty in number and share the same length.\n    assert len(xs_lst) \u003e 0 and len(set(map(len, xs_lst))) == 1\n\n    n = len(xs_lst[0])\n    idxs = random_permutation(n)\n    ys_lst = [apply_permutation(xs, idxs) for xs in xs_lst]\n    return ys_lst\n\n### io tools (examples from tb_io.py)\ndef 
read_textfile(filepath, strip=True):\n    with open(filepath, 'r') as f:\n        lines = f.readlines()\n        if strip:\n            lines = [line.strip() for line in lines]\n        return lines\n\ndef write_textfile(filepath, lines, append=False, with_newline=True):\n    mode = 'a' if append else 'w'\n\n    with open(filepath, mode) as f:\n        for line in lines:\n            f.write(line)\n            if with_newline:\n                f.write(\"\\n\")\n\ndef read_jsonfile(filepath):\n    with open(filepath, 'r') as f:\n        d = json.load(f)\n        return d\n\ndef write_jsonfile(d, filepath, sort_keys=False):\n    with open(filepath, 'w') as f:\n        json.dump(d, f, indent=4, sort_keys=sort_keys)\n\ndef read_picklefile(filepath):\n    with open(filepath, 'rb') as f:\n        return pickle.load(f)\n\ndef write_picklefile(x, filepath):\n    with open(filepath, 'wb') as f:\n        pickle.dump(x, f)\n\n### path tools (examples from tb_filesystem.py)\ndef path_prefix(path):\n    return os.path.split(path)[0]\n\ndef path_last_element(path):\n    return os.path.split(path)[1]\n\ndef path_relative_to_absolute(path):\n    return os.path.abspath(path)\n\ndef path_exists(path):\n    return os.path.exists(path)\n\ndef file_exists(path):\n    return os.path.isfile(path)\n\ndef folder_exists(path):\n    return os.path.isdir(path)\n\ndef create_file(filepath,\n        abort_if_exists=True, create_parent_folders=False):\n    assert create_parent_folders or folder_exists(path_prefix(filepath))\n    assert not (abort_if_exists and file_exists(filepath))\n\n    if create_parent_folders:\n        create_folder(path_prefix(filepath),\n            abort_if_exists=False, create_parent_folders=True)\n\n    with open(filepath, 'w'):\n        pass\n\ndef create_folder(folderpath,\n        abort_if_exists=True, create_parent_folders=False):\n    assert not file_exists(folderpath)\n    assert create_parent_folders or folder_exists(path_prefix(folderpath))\n    assert not 
(abort_if_exists and folder_exists(folderpath))\n\n    if not folder_exists(folderpath):\n        os.makedirs(folderpath)\n\ndef copy_file(src_filepath, dst_filepath,\n        abort_if_dst_exists=True, create_parent_folders=False):\n    assert file_exists(src_filepath)\n    assert src_filepath != dst_filepath\n    assert not (abort_if_dst_exists and file_exists(dst_filepath))\n\n    src_filename = path_last_element(src_filepath)\n    dst_folderpath = path_prefix(dst_filepath)\n    dst_filename = path_last_element(dst_filepath)\n\n    assert create_parent_folders or folder_exists(dst_folderpath)\n    if not folder_exists(dst_folderpath):\n        create_folder(dst_folderpath, create_parent_folders=True)\n\n    shutil.copyfile(src_filepath, dst_filepath)\n\ndef copy_folder(src_folderpath, dst_folderpath,\n        ignore_hidden_files=False, ignore_hidden_folders=False, ignore_file_exts=None,\n        abort_if_dst_exists=True, create_parent_folders=False):\n    assert folder_exists(src_folderpath)\n    assert src_folderpath != dst_folderpath\n    assert not (abort_if_dst_exists and folder_exists(dst_folderpath))\n\n    if (not abort_if_dst_exists) and folder_exists(dst_folderpath):\n        delete_folder(dst_folderpath, abort_if_nonempty=False)\n\n    pref_dst_fo = path_prefix(dst_folderpath)\n    assert create_parent_folders or folder_exists(pref_dst_fo)\n    create_folder(dst_folderpath, create_parent_folders=create_parent_folders)\n\n    # create all folders in the destination.\n    args = subset_dict_via_selection(locals(),\n        ['ignore_hidden_folders', 'ignore_hidden_files'])\n    fos = list_folders(src_folderpath, use_relative_paths=True, recursive=True, **args)\n\n    for fo in fos:\n        fo_path = join_paths([dst_folderpath, fo])\n        create_folder(fo_path, create_parent_folders=True)\n\n    # copy all files to the destination.\n    args = subset_dict_via_selection(locals(),\n        ['ignore_hidden_folders', 'ignore_hidden_files', 
'ignore_file_exts'])\n    fis = list_files(src_folderpath, use_relative_paths=True, recursive=True, **args)\n\n    for fi in fis:\n        src_fip = join_paths([src_folderpath, fi])\n        dst_fip = join_paths([dst_folderpath, fi])\n        copy_file(src_fip, dst_fip)\n\ndef delete_file(filepath, abort_if_notexists=True):\n    assert file_exists(filepath) or (not abort_if_notexists)\n\n    if file_exists(filepath):\n        os.remove(filepath)\n\ndef delete_folder(folderpath, abort_if_nonempty=True, abort_if_notexists=True):\n    assert folder_exists(folderpath) or (not abort_if_notexists)\n\n    if folder_exists(folderpath):\n        assert len(os.listdir(folderpath)) == 0 or (not abort_if_nonempty)\n        shutil.rmtree(folderpath)\n    else:\n        assert not abort_if_notexists\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnegrinho%2Fresearch_toolbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fnegrinho%2Fresearch_toolbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fnegrinho%2Fresearch_toolbox/lists"}