{"id":23906400,"url":"https://github.com/bgorlick/getai","last_synced_at":"2025-09-10T13:32:05.424Z","repository":{"id":235778682,"uuid":"782016568","full_name":"bgorlick/getai","owner":"bgorlick","description":"GetAI - The Easiest to Use AI Model Search \u0026 Download API and Interactive Console Tool","archived":false,"fork":false,"pushed_at":"2024-06-19T00:40:36.000Z","size":67555,"stargazers_count":8,"open_issues_count":0,"forks_count":2,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-12T18:44:03.416Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/bgorlick.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-04-04T13:31:20.000Z","updated_at":"2024-11-16T05:24:54.000Z","dependencies_parsed_at":"2024-04-24T15:22:14.890Z","dependency_job_id":"eabdb03c-d4b5-442d-8e0e-11019f280758","html_url":"https://github.com/bgorlick/getai","commit_stats":null,"previous_names":["bgorlick/getai"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgorlick%2Fgetai","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgorlick%2Fgetai/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgorlick%2Fgetai/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/bgorlick%2Fgetai/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/bgorlick","download_url":"https:
//codeload.github.com/bgorlick/getai/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":232538657,"owners_count":18538735,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-01-05T02:01:06.216Z","updated_at":"2025-01-05T02:02:19.736Z","avatar_url":"https://github.com/bgorlick.png","language":"Python","funding_links":[],"categories":["Python"],"sub_categories":[],"readme":"\n# GetAI - The Easiest to Use AI Model Search \u0026 Download API and Interactive Console Tool\n\n![PyPI](https://img.shields.io/pypi/v/getai) ![Pylint](https://github.com/bgorlick/getai/actions/workflows/pylint.yml/badge.svg)\n\nGetAI is a powerful API library and command-line tool for AI Models and Datasets. It simplifies the process of searching, downloading, and exploring AI models and datasets from various sources like Hugging Face and other platforms. With GetAI, you can easily find and download the models and datasets you need with a simple import statement and minimal lines of code, without the hassle of navigating through multiple websites and repositories.\n\n- **Easy to Download and Use**\n  - Many tools force you into their controlled ecosystem. 
GetAI liberates you and your AI agents.\n  - Install: `pip install getai`\n  - Search Models: `getai search model \u003cquery\u003e`\n  - Search Datasets: `getai search dataset \u003cquery\u003e`\n  - Download a Model: `getai model author/model_name` (working example: `getai model meta-llama/Llama-2-7b-hf`)\n \n- **Two Lines of Code to Add AI Model Search (or Download) to Your Project**\n  ```python\n  from getai import search_datasets, download_dataset; import asyncio\n  asyncio.run(search_datasets(\"sentiment analysis\", hf_token=None, max_connections=5, output_dir=\"datasets\"))\n  ```\n- **Interactive Search for Models and Datasets**\n  - Fully interactive console UX for search (showing sizes, branches, last-updated dates, and more)\n  - Accelerates navigation and model discovery\n\n- **Detailed Information**\n  - Displays sizes and last modified dates\n  - Enables sorting results in various ways\n\n- **Future-Ready Design**\n  - Quickly find the most current models and datasets\n  - Designed to make it easy for your AI agents to search for and download datasets and models in fully autonomous projects\n\n## Table of Contents\n- [Why GetAI?](#why-getai)\n- [Features of GetAI](#features-of-getai)\n- [Installation](#installation)\n- [Usage](#usage)\n  - [Search for AI Models](#search-for-ai-models)\n  - [Search for Datasets](#search-for-datasets)\n  - [Download a Model](#download-a-model)\n  - [Download a Dataset](#download-a-dataset)\n- [Configuration](#configuration)\n- [Using GetAI as a Library](#using-getai-as-a-library)\n- [Contributing](#contributing)\n- [License](#license)\n- [Support and Feedback](#support-and-feedback)\n- [Sources of Inspiration](#sources-of-inspiration)\n\n## Why GetAI?\n\nThe advent of large language models has led many companies to release open-source versions of their pre-trained foundation models. This has enabled anyone to download and run AI models locally, rather than relying on third-party API services. 
However, the process of finding, downloading, and setting up these models can be cumbersome and not always straightforward, especially for new users.\n\nGetAI aims to simplify this process, not only for human developers but also for AI agents that need a simple tool to search and download models and datasets easily. With GetAI, you can quickly find the models and datasets you need, download them asynchronously, and start exploring and using them in your projects.\n\n## Features of GetAI\n\n- **Asynchronous Downloads**: GetAI allows you to download models asynchronously, making efficient use of network resources and saving you time.\n- **Searching for AI Models**: You can easily search for AI models using various filters such as name, last updated date, and other attributes. GetAI provides a user-friendly interface to find the models that best suit your needs.\n- **Searching for Datasets**: GetAI also enables you to search for datasets by name and download them easily. You can quickly find and access the datasets you require for training or evaluating your AI models.\n- **Multiple Sources**: GetAI supports downloading models and datasets from multiple sources, including Hugging Face, TensorFlow Hub, and other platforms. You can access a wide range of resources from different providers through a single tool.\n- **Flexible Configuration**: GetAI allows you to configure sources and authentication through a *`config.yaml`* file (default location: *`/home/.getai/config.yaml`*). You can easily set up your credentials and preferences to streamline your workflow.\n- **Interactive CLI**: GetAI provides an easy-to-use command-line interface with interactive features such as branch selection and progress display. 
You can navigate through the available options and monitor the download progress seamlessly.\n- **API Access**: Easily integrate GetAI functionalities into your own applications using the provided API.\n\n## Installation\n\nYou can install *`getai`* using pip:\n\n```bash\npip install getai\n```\n\n## Usage\n\nGetAI provides a simple and intuitive command-line interface. Here are some examples of how you can use GetAI:\n\n### Search for AI Models\n\n```bash\ngetai search model \u003cquery\u003e [--author \u003cauthor\u003e] [--filter \u003cfilter\u003e] [--sort \u003csort\u003e] [--direction \u003cdirection\u003e] [--limit \u003climit\u003e] [--full]\n```\n\nThis command allows you to search for AI models based on the provided query. You can use various options to refine your search results:\n- *`--author`*: Filter models by author or organization.\n- *`--filter`*: Filter models based on tags.\n- *`--sort`*: Property to use when sorting models.\n- *`--direction`*: Direction in which to sort models.\n- *`--limit`*: Limit the number of models fetched.\n- *`--full`*: Fetch full model information.\n\nExample:\n\n```bash\ngetai search model \"text-generation\" --sort downloads --direction -1 --limit 10\n```\n\nSample output:\n```\nSearch results for 'text-generation' (Page 1 of 1, Total: 10):\n1. gpt2 by OpenAI (openai/gpt2) (Size: 548.09 MB)\n2. distilgpt2 by HuggingFace (distilgpt2) (Size: 353.75 MB)\n3. gpt2-large by OpenAI (openai/gpt2-large) (Size: 1.50 GB)\n...\nEnter 'n' for the next page, 'p' for the previous page, 'f' to filter, 's' to sort, 'r' to return to previous search results, or the model number to download.\n```\n\n### Search for Datasets\n\n```bash\ngetai search dataset \u003cquery\u003e [--author \u003cauthor\u003e] [--filter \u003cfilter\u003e] [--sort \u003csort\u003e] [--direction \u003cdirection\u003e] [--limit \u003climit\u003e] [--full]\n```\n\nThis command enables you to search for datasets based on the provided query. 
You can use various options to refine your search results:\n- *`--author`*: Filter datasets by author or organization.\n- *`--filter`*: Filter datasets based on tags.\n- *`--sort`*: Property to use when sorting datasets.\n- *`--direction`*: Direction in which to sort datasets.\n- *`--limit`*: Limit the number of datasets fetched.\n- *`--full`*: Fetch full dataset information.\n\nExample:\n\n```bash\ngetai search dataset \"sentiment analysis\" --filter language:en --sort downloads --direction -1 --limit 5\n```\n\nSample output:\n```\nSearch results for 'sentiment analysis' (Page 1 of 1, Total: 5):\n1. imdb by andrew-maas (andrew-maas/imdb) (Size: 80.23 MB)\n2. twitter_sentiment by nlp-with-deeplearning (nlp-with-deeplearning/twitter_sentiment) (Size: 63.15 MB)\n3. sst2 by glue (glue/sst2) (Size: 7.09 MB)\n...\nEnter 'n' for the next page, 'p' for the previous page, 'f' to filter, 's' to sort, 'r' to return to previous search results, or the dataset number to download.\n```\n\n### Download a Model\n\n```bash\ngetai model \u003cidentifier\u003e [--branch \u003cbranch\u003e] [--output-dir \u003coutput-dir\u003e] [--max-retries \u003cmax-retries\u003e] [--max-connections \u003cmax-connections\u003e] [--clean] [--check]\n```\n\nThis command allows you to download a specific model by providing its identifier. 
You can use various options to customize the download process:\n- *`--branch`*: Specify a branch name or enable branch selection.\n- *`--output-dir`*: Directory to save the model.\n- *`--max-retries`*: Max retries for downloads.\n- *`--max-connections`*: Max simultaneous connections for downloads.\n- *`--clean`*: Start download from scratch.\n- *`--check`*: Validate the checksums of files after download.\n\nExample:\n\n```bash\ngetai model meta-llama/Llama-2-7b-hf --branch main --output-dir models/Llama-2-7b-hf --max-retries 3 --max-connections 5\n```\n\n### Download a Dataset\n\n```bash\ngetai dataset \u003cidentifier\u003e [--revision \u003crevision\u003e] [--output-dir \u003coutput-dir\u003e] [--max-retries \u003cmax-retries\u003e] [--max-connections \u003cmax-connections\u003e] [--full]\n```\n\nThis command enables you to download a specific dataset by providing its identifier. You can use various options to customize the download process:\n- *`--revision`*: Revision of the dataset.\n- *`--output-dir`*: Directory to save the dataset.\n- *`--max-retries`*: Max retries for downloads.\n- *`--max-connections`*: Max simultaneous connections for downloads.\n- *`--full`*: Fetch full dataset information.\n\nExample:\n\n```bash\ngetai dataset glue/sst2 --revision main --output-dir datasets/sst2 --max-retries 3 --max-connections 5\n```\n\nFor more detailed usage instructions and additional options, please refer to the GetAI documentation.\n\n## Configuration\n\nGetAI uses a *`config.yaml`* file to store configuration settings such as API tokens and other preferences. By default, the configuration file is located at *`/home/.getai/config.yaml`*. 
However, we recommend using the *`huggingface-cli login`* command to securely set up your Hugging Face token or setting the *`HF_TOKEN`* environment variable.\n\nExample of setting the environment variable:\n\n```bash\nexport HF_TOKEN=your_huggingface_token_here\n```\n\nHere's an example of a *`config.yaml`* file:\n\n```yaml\nhf_token: your_huggingface_token_here\n```\n\nReplace *`your_huggingface_token_here`* with your actual Hugging Face token.\n\n## Using GetAI as a Library\n\nGetAI isn't just a command-line tool; it's also a powerful Python library for searching and downloading datasets and models from Hugging Face. This guide shows you how to leverage GetAI's capabilities programmatically in your Python applications.\n\n### Installation\n\nFirst, install the GetAI package:\n\n```bash\npip install getai\n```\n\n### Usage Examples\n\nBelow are examples of how to use the main functions provided by GetAI. These examples demonstrate how to search for and download datasets and models programmatically.\n\n#### Searching for Datasets\n\nLet's say you're working on a project related to sentiment analysis and you want to find relevant datasets. Here's how you can do it:\n\n```python\nfrom getai import search_datasets\nimport asyncio\n\nasync def search_datasets_example():\n    await search_datasets(\n        query=\"sentiment analysis\",\n        hf_token=\"your_huggingface_token\",\n        max_connections=5,\n        output_dir=\"datasets\"\n    )\n\nif __name__ == \"__main__\":\n    asyncio.run(search_datasets_example())\n```\n\n#### Downloading a Dataset\n\nOnce you've found the dataset you need, downloading it is simple. 
For example, to download the IMDB movie-review dataset (`stanfordnlp/imdb`):\n\n```python\nfrom getai import download_dataset\nimport asyncio\n\nasync def download_dataset_example():\n    await download_dataset(\n        identifier=\"stanfordnlp/imdb\",\n        hf_token=None,  # public dataset, so no token is needed\n        max_connections=5,\n        output_dir=\"datasets/stanfordnlp/imdb\"\n    )\n\nif __name__ == \"__main__\":\n    asyncio.run(download_dataset_example())\n```\n\n#### Searching for Models\n\nImagine you need a model for text generation. Here's how you can search for it:\n\n```python\nfrom getai import search_models\nimport asyncio\n\nasync def search_models_example():\n    await search_models(\n        query=\"text-generation\",\n        hf_token=\"your_huggingface_token\",\n        max_connections=5\n    )\n\nif __name__ == \"__main__\":\n    asyncio.run(search_models_example())\n```\n\n#### Downloading a Model\n\nAfter finding the model, downloading it is straightforward. For instance, to download the GPT-2 model:\n\n```python\nfrom getai import download_model\nimport asyncio\n\nasync def download_model_example():\n    await download_model(\n        identifier=\"openai-community/gpt2\",\n        branch=\"main\",\n        hf_token=\"your_huggingface_token\",\n        max_connections=5,\n        output_dir=\"models/gpt2\"\n    )\n\nif __name__ == \"__main__\":\n    asyncio.run(download_model_example())\n```\n\nReplace *`your_huggingface_token`* with your actual Hugging Face token.\n\n### Detailed Function Usage\n\nTo give you a deeper understanding, here are the detailed descriptions and usages of the core functions provided by GetAI.\n\n#### search_datasets\n\n```python\nasync def search_datasets(\n    query, hf_token=None, max_connections=5, output_dir=None, **kwargs\n):\n    \"\"\"\n    Search datasets on Hugging Face based on a query.\n    \n    Args:\n        query (str): The search query.\n        hf_token (str): Hugging Face token.\n        max_connections (int): Maximum number of concurrent 
connections.\n        output_dir (Path): Directory to save search results.\n        **kwargs: Additional keyword arguments for filtering search results.\n    \"\"\"\n```\n\n#### download_dataset\n\n```python\nasync def download_dataset(\n    identifier, hf_token=None, max_connections=5, output_dir=None, **kwargs\n):\n    \"\"\"\n    Download a dataset from Hugging Face by its identifier.\n    \n    Args:\n        identifier (str): The dataset identifier.\n        hf_token (str): Hugging Face token.\n        max_connections (int): Maximum number of concurrent connections.\n        output_dir (Path): Directory to save the dataset.\n        **kwargs: Additional keyword arguments for dataset download.\n    \"\"\"\n```\n\n#### search_models\n\n```python\nasync def search_models(\n    query, hf_token=None, max_connections=5, **kwargs\n):\n    \"\"\"\n    Search models on Hugging Face based on a query.\n    \n    Args:\n        query (str): The search query.\n        hf_token (str): Hugging Face token.\n        max_connections (int): Maximum number of concurrent connections.\n        **kwargs: Additional keyword arguments for filtering search results.\n    \"\"\"\n```\n\n#### download_model\n\n```python\nasync def download_model(\n    identifier, branch=\"main\", hf_token=None, max_connections=5, output_dir=None, **kwargs\n):\n    \"\"\"\n    Download a model from Hugging Face by its identifier and branch.\n    \n    Args:\n        identifier (str): The model identifier.\n        branch (str): The branch name.\n        hf_token (str): Hugging Face token.\n        max_connections (int): Maximum number of concurrent connections.\n        output_dir (Path): Directory to save the model.\n        **kwargs: Additional keyword arguments for model download.\n    \"\"\"\n```\n\n## Contributing\n\nContributions to GetAI are welcome! If you would like to contribute to the project, please follow the guidelines outlined in the *`CONTRIBUTING.md`* file. 
You can help improve GetAI by reporting issues, suggesting new features, or submitting pull requests.\n\n## License\n\nGetAI is released under the MIT License with attribution to the author, Ben Gorlick (github.com/bgorlick). Please see the *`LICENSE`* file for more details.\n\n(c) 2023-2024 Ben Gorlick github.com/bgorlick\n\n## Support and Feedback\n\nIf you encounter any issues, have questions, or would like to provide feedback, please open an issue on the GetAI GitHub repository. We appreciate your input and will do our best to assist you.\n\nThank you for using GetAI! We hope it simplifies your workflow and enhances your experience with AI models and datasets.\n\n## Sources of Inspiration\n\nThis project started as an attempt to create a completely asynchronous port of oobabooga's [text-generation-webui](https://github.com/oobabooga/text-generation-webui) model downloading script. His script at the time used a multithreaded design, and I wanted to explore building an asynchronous version. Credit goes entirely to him for the initial approach, the pagination methods, and the parsing logic for a variety of file types.\n\n## Thank you to everyone who continues to advance AI and ML development!\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgorlick%2Fgetai","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbgorlick%2Fgetai","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbgorlick%2Fgetai/lists"}