<p align="left">
    English ｜ <a href="https://github.com/OpenCSGs/csghub-sdk/blob/main/README_cn.md">中文</a>
</p>

# CSGHub SDK
## Introduction

The CSGHub SDK is a Python client designed to interact with a CSGHub server. It gives Python developers an efficient, straightforward way to operate and manage remote CSGHub instances. Whether you want to automate tasks, manage data, or integrate CSGHub functionality into your Python applications, the CSGHub SDK provides the features you need.

## Key Features

- **Faster model downloads:** with a few lines of code, you can switch the model download URL to [OpenCSG](https://opencsg.com/) to [speed up model downloads](#quickly-switch-download-urls).
- **Easy connectivity:** connect to and interact with CSGHub server instances from your Python code.
- **API coverage:** access the functionality provided by the CSGHub server for a broad range of operations.
- **User-friendly:** designed for simplicity, so it is accessible to beginners yet capable enough for advanced users.
- **Efficient data management:** streamline managing and manipulating data on your CSGHub server.
- **Automation ready:** automate repetitive tasks and processes, saving time and reducing the potential for human error.
- **Open source:** read the source code, contribute, and customize the SDK to fit your needs.

The main functions are:

1. Repo downloading (model/dataset)
2. Repo information query (compatible with Hugging Face)

## Get My Token

Visit [OpenCSG](https://opencsg.com/) and click **Sign Up** in the top-right corner to register. Log in to [OpenCSG](https://opencsg.com/) with your new username and password, then open [Access Token](https://opencsg.com/settings/access-token) under Account Settings to get your token.

## Getting Started

To get started with the CSGHub SDK, make sure Python is installed on your system, then install the SDK with pip:

```shell
pip install csghub-sdk
```

After installation, you can start using the SDK by importing it into your Python script:

```python
import os

# Set your OpenCSG access token before importing the SDK classes
os.environ['CSG_TOKEN'] = 'your_access_token'

from pycsghub.repo_reader import AutoModelForCausalLM, AutoTokenizer

mid = 'OpenCSG/csg-wukong-1B'
model = AutoModelForCausalLM.from_pretrained(mid)
tokenizer = AutoTokenizer.from_pretrained(mid)

inputs = tokenizer.encode("Write a short story", return_tensors="pt")
outputs = model.generate(inputs)
print('result: ', tokenizer.batch_decode(outputs))
```

### Quickly switch download URLs

By changing the import from `transformers` to `pycsghub.repo_reader` and setting your access token, you can switch the model download URL to OpenCSG:

```python
import os

os.environ['CSG_TOKEN'] = 'your_access_token'
from pycsghub.repo_reader import AutoModelForCausalLM, AutoTokenizer
```

### Install from source code

```shell
git clone https://github.com/OpenCSGs/csghub-sdk.git
cd csghub-sdk
pip install .
```

To also install the optional dependencies for working with models and datasets, use the `train` extra:

```shell
pip install '.[train]'
```

## Command-line use cases

```shell
export CSG_TOKEN=your_access_token

# download a model
csghub-cli download wanghh2000/myprivate1

# download only files matching '*.json', skipping files matching '*_config.json'
csghub-cli download wanghh2000/myprivate1 --allow-patterns "*.json" --ignore-patterns "*_config.json"

# download a dataset
csghub-cli download wanghh2000/myds1 -t dataset

# upload the local file abc/3.txt to folder1 in the repo
csghub-cli upload wanghh2000/myprivate1 abc/3.txt folder1

# upload local folder '/Users/hhwang/temp/jsonl' to the root of repo 'wanghh2000/m01' on the default branch
csghub-cli upload wanghh2000/m01 /Users/hhwang/temp/jsonl

# upload local folder '/Users/hhwang/temp/jsonl' to the root of repo 'wanghh2000/m04' on branch v2, using token 'xxxxxx'
csghub-cli upload wanghh2000/m04 /Users/hhwang/temp/jsonl -k xxxxxx --revision v2

# upload local folder '/Users/hhwang/temp/jsonl' to path 'test/files' of repo 'wanghh2000/m01' on branch v1
csghub-cli upload wanghh2000/m01 /Users/hhwang/temp/jsonl test/files --revision v1

# upload local folder '/Users/hhwang/temp/jsonl' to path 'test/files' of repo 'wanghh2000/m01', using token 'xxxxxx'
csghub-cli upload wanghh2000/m01 /Users/hhwang/temp/jsonl test/files -k xxxxxx
```

Note: `csghub-cli upload` creates the repo and branch if they do not exist. The default branch is `main`. To upload to a specific branch, use the `--revision` option. The branch is created if it does not exist, and if it already exists the files are uploaded to it.

Downloaded files are stored in `~/.cache/csg/` by default.

## SDK use cases

The following use cases show how to perform common tasks with the Python SDK.

### Download model

```python
from pycsghub.snapshot_download import snapshot_download

token = "your_access_token"
endpoint = "https://hub.opencsg.com"
repo_id = 'OpenCSG/csg-wukong-1B'
cache_dir = '/Users/hhwang/temp/'
result = snapshot_download(repo_id, cache_dir=cache_dir, endpoint=endpoint, token=token)
```

### Download model files with allow and ignore patterns

Download only files matching `*.json`, skipping files matching `*_config.json`:

```python
from pycsghub.snapshot_download import snapshot_download

token = "your_access_token"
endpoint = "https://hub.opencsg.com"
repo_id = 'OpenCSG/csg-wukong-1B'
cache_dir = '/Users/hhwang/temp/'
allow_patterns = ["*.json"]
ignore_patterns = ["*_config.json"]
result = snapshot_download(repo_id, cache_dir=cache_dir, endpoint=endpoint, token=token, allow_patterns=allow_patterns, ignore_patterns=ignore_patterns)
```

### Download dataset

```python
from pycsghub.snapshot_download import snapshot_download

token = "your_access_token"
endpoint = "https://hub.opencsg.com"
repo_id = 'AIWizards/tmmluplus'
repo_type = "dataset"
cache_dir = '/Users/xiangzhen/Downloads/'
result = snapshot_download(repo_id, repo_type=repo_type, cache_dir=cache_dir, endpoint=endpoint, token=token)
```

### Download single file

Use the `http_get` function to download a single file from a URL:

```python
from pycsghub.file_download import http_get

token = "your_access_token"
url = "https://hub.opencsg.com/api/v1/models/OpenCSG/csg-wukong-1B/resolve/tokenizer.model"
local_dir = '/home/test/'
file_name = 'tokenizer.model'
headers = None
cookies = None
http_get(url=url, token=token, local_dir=local_dir, file_name=file_name, headers=headers, cookies=cookies)
```

Use the `file_download` function to download a single file from a repository:

```python
from pycsghub.file_download import file_download

token = "your_access_token"
endpoint = "https://hub.opencsg.com"
repo_id = 'OpenCSG/csg-wukong-1B'
cache_dir = '/home/test/'
result = file_download(repo_id, file_name='README.md', cache_dir=cache_dir, endpoint=endpoint, token=token)
```

### Upload file

```python
from pycsghub.file_upload import http_upload_file

token = "your_access_token"
endpoint = "https://hub.opencsg.com"
repo_type = "model"
repo_id = 'wanghh2000/myprivate1'
result = http_upload_file(repo_id, endpoint=endpoint, token=token, repo_type=repo_type, file_path='test1.txt')
```

### Upload multiple files

```python
from pycsghub.file_upload import http_upload_file

token = "your_access_token"
endpoint = "https://hub.opencsg.com"
repo_type = "model"
repo_id = 'wanghh2000/myprivate1'

repo_files = ["1.txt", "2.txt"]
for item in repo_files:
    http_upload_file(repo_id=repo_id, repo_type=repo_type, file_path=item, endpoint=endpoint, token=token)
```

### Upload a local folder to a repo

Before starting, make sure Git LFS is installed (see the [Git LFS website](https://git-lfs.github.com/) for installation instructions).

```python
from pycsghub.repository import Repository

token = "your_access_token"

r = Repository(
    repo_id="wanghh2003/ds15",
    upload_path="/Users/hhwang/temp/bbb/jsonl",
    user_name="wanghh2003",
    token=token,
    repo_type="dataset",
)

r.upload()
```

### Upload a local folder to a specific path in a repo

Before starting, make sure Git LFS is installed (see the [Git LFS website](https://git-lfs.github.com/) for installation instructions).

```python
from pycsghub.repository import Repository

token = "your_access_token"

r = Repository(
    repo_id="wanghh2000/model01",
    upload_path="/Users/hhwang/temp/jsonl",
    path_in_repo="test/abc",
    user_name="wanghh2000",
    token=token,
    repo_type="model",
    branch_name="v1",
)

r.upload()
```

### Model loading compatible with Hugging Face

The transformers library can download and load a model directly from its Hugging Face repo ID:

```python
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('model/repoid')
```

Here, transformers first downloads the model to a local cache folder, then reads its configuration, and finally loads the model by dynamically selecting the matching class to instantiate.

For compatibility with Hugging Face, version 0.2 of the CSGHub SDK includes the most commonly used features: downloading and loading models. Models can be downloaded and loaded as follows:

```python
# import os
# os.environ['CSG_TOKEN'] = 'your_access_token'
from pycsghub.repo_reader import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained('model/repoid')
```

This code:

1. Uses `snapshot_download` from the CSGHub SDK to download the related files.
2. Dynamically generates, in batches and via class-name reflection, classes with the same names as those that transformers loads automatically.
3. Gives these classes a `from_pretrained` method, so the loaded model is a standard Hugging Face transformers model.

## Roadmap

1. Download repo files via the CLI
2. Interact with CSGHub via command-line tools
3. Management operations, such as creating and modifying CSGHub repositories
4. Model deployment, locally or online
5. Model fine-tuning, locally or online
6. Quick upload of large folders to CSGHub
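## Sketch: reading the access token from an environment variable

Several examples in this README hard-code the access token in source code. A common alternative is to read it from the `CSG_TOKEN` environment variable, which the command-line and `repo_reader` examples already use, and to fail early with a clear message when it is missing. This is a minimal sketch under that assumption. `require_token` is a hypothetical helper and is not part of `pycsghub`. This README does not say whether the SDK functions read `CSG_TOKEN` by themselves, so the sketch passes the token explicitly.

```python
import os


def require_token(var_name: str = "CSG_TOKEN") -> str:
    """Hypothetical helper: return the access token stored in an environment
    variable, raising a clear error if it is missing or blank."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {var_name} environment variable to your OpenCSG access token"
        )
    return token


# Usage, assuming `export CSG_TOKEN=your_access_token` was run in the shell:
#   from pycsghub.snapshot_download import snapshot_download
#   token = require_token()
#   result = snapshot_download('OpenCSG/csg-wukong-1B',
#                              endpoint="https://hub.opencsg.com", token=token)
```

Keeping the token out of source files also means it is not accidentally committed when scripts are shared.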
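## Sketch: selecting files with allow and ignore patterns

The `allow_patterns` and `ignore_patterns` arguments of `snapshot_download`, like the CLI's `--allow-patterns` and `--ignore-patterns` options, control which repository files are downloaded. The sketch below shows one plausible reading of those semantics. It assumes shell-style glob matching as in Python's `fnmatch`: a file is kept if it matches at least one allow pattern (or no allow patterns are given) and matches no ignore pattern. `select_files` is a hypothetical helper and the file list is made up for illustration. The SDK's actual matching rules may differ.

```python
from fnmatch import fnmatch
from typing import Iterable, List, Optional


def select_files(
    files: Iterable[str],
    allow_patterns: Optional[List[str]] = None,
    ignore_patterns: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: keep files that match any allow pattern (all files
    if no allow patterns are given) and none of the ignore patterns."""
    selected = []
    for name in files:
        if allow_patterns and not any(fnmatch(name, p) for p in allow_patterns):
            continue  # not covered by any allow pattern
        if ignore_patterns and any(fnmatch(name, p) for p in ignore_patterns):
            continue  # explicitly ignored
        selected.append(name)
    return selected


repo_files = ["config.json", "generation_config.json", "tokenizer.json", "model.safetensors"]
print(select_files(repo_files, allow_patterns=["*.json"], ignore_patterns=["*_config.json"]))
# ['config.json', 'tokenizer.json']
```

With the patterns used in the download examples, `generation_config.json` is excluded by `*_config.json`. `config.json` is kept because that pattern requires an underscore before `config.json`.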
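## Sketch: building a model file URL

The `http_get` example passes a complete file URL. Judging only from that one URL, a model file appears to be served at `{endpoint}/api/v1/models/{repo_id}/resolve/{file_name}`. The hypothetical helper below builds URLs in that shape. This README does not show the URL form for datasets or for a specific revision, so the sketch covers models on the default revision only.

```python
from urllib.parse import quote


def model_file_url(endpoint: str, repo_id: str, file_name: str) -> str:
    """Hypothetical helper: build a model file URL following the pattern of
    the `http_get` example (default revision, models only)."""
    base = endpoint.rstrip("/")
    # Keep "/" unescaped so "owner/name" and nested file paths stay path segments
    repo = quote(repo_id, safe="/")
    path = quote(file_name, safe="/")
    return f"{base}/api/v1/models/{repo}/resolve/{path}"


url = model_file_url("https://hub.opencsg.com", "OpenCSG/csg-wukong-1B", "tokenizer.model")
print(url)
# https://hub.opencsg.com/api/v1/models/OpenCSG/csg-wukong-1B/resolve/tokenizer.model
```

For files that live in a repository, the `file_download` function shown earlier avoids building URLs by hand.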
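## Sketch: same-named wrapper classes

The three steps listed for Hugging Face-compatible model loading (download with `snapshot_download`, generate classes that share the transformers class names, give them a `from_pretrained` method) can be illustrated in plain Python. This is a self-contained sketch of the technique, not `pycsghub`'s actual implementation. `fake_snapshot_download`, `_FakeAutoClass`, and `fake_transformers` are hypothetical stand-ins for `snapshot_download` and the transformers auto classes, so the example runs without network access or transformers installed.

```python
import types
from typing import Dict


def fake_snapshot_download(repo_id: str) -> str:
    """Hypothetical stand-in for snapshot_download: pretend the repo was
    downloaded and return its local directory."""
    return f"/tmp/csg-cache/{repo_id}"


class _FakeAutoClass:
    """Hypothetical stand-in for a transformers auto class."""

    @classmethod
    def from_pretrained(cls, path: str, *args, **kwargs) -> str:
        return f"{cls.__name__} loaded from {path}"


# Hypothetical stand-in for the `transformers` module.
fake_transformers = types.SimpleNamespace(
    AutoModelForCausalLM=type("AutoModelForCausalLM", (_FakeAutoClass,), {}),
    AutoTokenizer=type("AutoTokenizer", (_FakeAutoClass,), {}),
)


def make_wrapper(base: type) -> type:
    """Create a subclass with the same name as `base` whose from_pretrained
    downloads the repo first and then delegates to `base`."""

    def from_pretrained(cls, repo_id: str, *args, **kwargs):
        local_path = fake_snapshot_download(repo_id)  # step 1: fetch the files
        # step 3: let the original class load from the local directory
        return base.from_pretrained(local_path, *args, **kwargs)

    # step 2: create the class dynamically, reusing the original class name
    return type(base.__name__, (base,), {"from_pretrained": classmethod(from_pretrained)})


# Look up each class by name (reflection) and wrap them all in one batch.
CLASS_NAMES = ["AutoModelForCausalLM", "AutoTokenizer"]
wrappers: Dict[str, type] = {
    name: make_wrapper(getattr(fake_transformers, name)) for name in CLASS_NAMES
}

AutoModelForCausalLM = wrappers["AutoModelForCausalLM"]
print(AutoModelForCausalLM.from_pretrained("OpenCSG/csg-wukong-1B"))
# AutoModelForCausalLM loaded from /tmp/csg-cache/OpenCSG/csg-wukong-1B
```

Each wrapper subclasses the original class and reuses its name. As a result, code written for transformers only needs to change its import line, and `from_pretrained` still returns whatever the original class returns.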