{"id":24668776,"url":"https://github.com/in-c0/updAPI","last_synced_at":"2025-10-08T05:31:51.012Z","repository":{"id":270826040,"uuid":"911569775","full_name":"in-c0/updAPI","owner":"in-c0","description":"Free, open-source collection of latest public API documentations - Update LLM's knowledge base with the latest API doc, policies, and community resources for enhanced context awareness (Note: The website is just a demonstration of what is possible with this open-source project. yet)","archived":false,"fork":false,"pushed_at":"2025-08-10T14:58:41.000Z","size":2144,"stargazers_count":15,"open_issues_count":1,"forks_count":12,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-09-02T23:47:10.086Z","etag":null,"topics":["api","api-docs","api-resources","beginner-friendly","first-contributions","first-project","first-timers-friendly","help-wanted","llm","mit-license","opensource","public-api-documentation","public-apis","scrapers","vector-database","vscode-extension"],"latest_commit_sha":null,"homepage":"https://updapi.com","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/in-c0.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-01-03T10:25:50.000Z","updated_at":"2025-08-27T01:23:58.000Z","dependencies_parsed_at":null,"dependency_job_id":"2bf927dd-2489-4e67-b3f7-d5d51fafe6a5","html_url":"https://github.com/in-c0/updAPI","commit_stats":null,"previous_names":["in-c0/updapi","updapi/updapi"],"
tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/in-c0/updAPI","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/in-c0%2FupdAPI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/in-c0%2FupdAPI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/in-c0%2FupdAPI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/in-c0%2FupdAPI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/in-c0","download_url":"https://codeload.github.com/in-c0/updAPI/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/in-c0%2FupdAPI/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":278892178,"owners_count":26063943,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-08T02:00:06.501Z","response_time":56,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api","api-docs","api-resources","beginner-friendly","first-contributions","first-project","first-timers-friendly","help-wanted","llm","mit-license","opensource","public-api-documentation","public-apis","scrapers","vector-database","vscode-extension"],"created_at":"2025-01-26T09:17:26.082Z","updated_at":"2025-10-08T05:31:51.006Z","avatar_url":"https://github.com/in-c0.png","language":"JavaScript","readme":"\n# UpdAPI 
🔧\n\n### \"Update your knowledge base with the latest **API** resources\"\n### A free, lightweight tool to streamline the discovery of API documentation, policies, and community resources, and to enhance LLMs with accurate, relevant context\n\n ![updAPI](https://github.com/user-attachments/assets/7a67269e-5ce0-480d-95d8-cd7dfe79ca87)\n\n[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)  \n[![Contributions Welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](#contributing)  \n[![Build Status](https://img.shields.io/badge/status-under_construction-orange.svg)](#status)  \n[![Open Issues](https://img.shields.io/github/issues/in-c0/updapi)](https://github.com/in-c0/updapi/issues)  \n\n---\n\u003e Like the project? Please give it a Star so it can reach more people \u003e\u003e\u003e\u003e\u003e [![Star on GitHub](https://img.shields.io/badge/⭐-Star_on_GitHub-blue?style=flat)](https://github.com/UpdAPI/updAPI/stargazers)\n\n\n\u003e ⚠️ **Under Construction**  \n\u003e This project is in the early stages of development and may not function as intended yet. Contributions, feedback, and ideas are highly welcome!\n\n## 📋 Links to Public API DOCS\n\n`api-docs-urls.csv` contains a **centralized collection of popular APIs** with **links to their official documentation and associated policies**. 
It includes tools to scrape, preprocess, and update the dataset for better usability and retrieval.\n\napi-docs-urls.csv:\n\n| API Name              | Official Documentation URL                           | Privacy Policy URL                 | Terms of Service URL         | Rate Limiting Policy URL               | Changelog/Release Notes URL             | Security Policy URL               | Developer Community/Forum URL           |\n|-----------------------|-----------------------------------------------------|------------------------------------|-------------------------------|----------------------------------------|------------------------------------------|-----------------------------------|------------------------------------------|\n| OpenAI API           | [Documentation](https://platform.openai.com/docs)   | [Privacy](https://openai.com/privacy) | [Terms](https://openai.com/terms) | [Rate Limits](https://platform.openai.com/docs/guides/rate-limits) | [Changelog](https://platform.openai.com/docs/release-notes) | [Security](https://openai.com/security) | [Community](https://community.openai.com/) |\n...\n\n\u003e ⚠️ **The URLs are auto-generated and require manual verification**  \n\u003e We aim to keep these URLs pointing to the **current** version of each document (TODO: Set up cron jobs/GitHub Actions to periodically re-run the scrapers and keep the dataset up-to-date)\n\n\n## 🛠 Adding More APIs to the Dataset\n\n### **Option 1: Manually Add to `api-docs-urls.csv`**\nYou can manually add new entries to `api-docs-urls.csv` with the following format:\n```csv\nAPI_Name,Official_Documentation_URL,Privacy_Policy_URL,Terms_of_Service_URL,Rate_Limiting_Policy_URL,Changelog_Release_Notes_URL,Security_Policy_URL,Developer_Community_Forum_URL\nExample API,https://example.com/docs,https://example.com/privacy,https://example.com/tos,https://example.com/rate-limits,https://example.com/changelog,https://example.com/security,https://example.com/community\n```\n\n### **Option 
2: Combine Multiple CSV Files**\nIf you have additional entries in separate CSV files, use the provided Python utility script to merge them into the main dataset.\n\n#### Combine CSV Files\n1. Ensure you have Python installed.\n2. Run the script:\n   ```bash\n   python utils/combine_csv.py new_entries.csv api-docs-urls.csv combined_dataset.csv\n   ```\n3. Replace the existing `api-docs-urls.csv` with the new `combined_dataset.csv`.\n\n---\n\n## What can we do with the API URLs?\n\n**Use Case 1:**\nYou can use the scrapers (`fast-scraper.js` or `accurate-scraper.js`) to extract content from API docs and feed it to your LLM as context, so it can give specific, accurate answers about APIs.\n\n**Workflow Example:**\n1. Retrieve relevant snippets: query the vector database (or use a custom script) for the user's question\n2. Generate answers with an LLM: pass the retrieved snippets as context to the LLM (e.g., GPT-4 or LLaMA-2)\n\n   ```python\n   import numpy as np\n   from sentence_transformers import SentenceTransformer\n   from transformers import AutoModelForCausalLM, AutoTokenizer\n   from faiss import read_index\n\n   # Load the vector index and the same embedding model used to build it\n   index = read_index('vector_index.faiss')\n   embedder = SentenceTransformer('all-MiniLM-L6-v2')\n\n   # Embed the user's question\n   user_query = \"What are the rate limits for the OpenAI API?\"\n   query_embedding = embedder.encode(user_query)\n   _, indices = index.search(np.array([query_embedding], dtype='float32'), k=5)\n\n   # Retrieve relevant chunks (`documents` holds the scraped text chunks, in index order)\n   context = \" \".join([documents[i] for i in indices[0]])\n\n   # Use an open LLM to answer (GPT-4 is API-only; a local model such as LLaMA-2 works here)\n   model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf')\n   tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')\n\n   prompt = f\"Context: {context}\\nQuestion: {user_query}\\nAnswer:\"\n   inputs = tokenizer(prompt, return_tensors='pt')\n   outputs = model.generate(**inputs, max_new_tokens=200)\n   print(tokenizer.decode(outputs[0], skip_special_tokens=True))\n   ```\n\n**Use Case 2:**\nMaintain offline copies of API documentation for scenarios where internet access is unavailable or restricted. 
Offline access ensures reliability and speed when querying API documentation.\n\n**How?**\n- Use the scrapers to generate offline copies of the documentation in JSON, HTML, or Markdown formats.\n- Serve these copies locally or integrate them into a lightweight desktop or web application.\n\n\n**Use Case 3:**\nAPI documentation changes frequently, and outdated information can lead to bugs or misconfigurations. Automating change detection ensures your knowledge base remains up-to-date.\n\n**How?**\n- Compare the current version of a page with its previously saved version.\n- Use hashing (e.g., MD5) or diff-checking tools to detect changes in content.\n\n---\n\n## 🚀 How to Use the Scrapers\n\n\n\n### Check Python Version\n**Recommended Python Versions**: Python \u003e=3.7 and \u003c3.10\n\n  1. Check your Python version:\n     ```bash\n     python --version\n     ```\n  2. If your Python version is incompatible, you can:\n     - Install a compatible version (e.g., Python 3.9).\n     - Use a virtual environment:\n       ```bash\n       python3.9 -m venv venv\n       source venv/bin/activate  # Or venv\\Scripts\\activate on Windows\n       pip install -r requirements.txt\n       ```\n  3. Alternatively, use Conda to install PyTorch and its dependencies:\n     ```bash\n     conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia\n     ```\n\n\n\nWe provide two scraping tools to suit different needs:\n- **`fast-scraper.js`**: A lightweight Cheerio-based scraper for fast retrieval of static content.\n- **`accurate-scraper.js`**: A Playwright-based scraper for handling JavaScript-loaded pages and more dynamic content.\n\n\n### **1. `fast-scraper.js` (Cheerio-Based)**\n- **Purpose**: For quickly scraping static API documentation pages.\n- **Strengths**:\n  - Lightweight and fast.\n  - Suitable for pages without JavaScript content.\n- **Limitations**:\n  - Does not handle JavaScript-loaded content.\n\n#### Run the Script\n1. 
Install dependencies:\n   ```bash\n   npm install\n   ```\n2. Run the script:\n   ```bash\n   node fast-scraper.js\n   ```\n3. Results will be saved in `scraped_data_fast.json`.\n\n---\n\n### **2. `accurate-scraper.js` (Playwright-Based)**\n- **Purpose**: For scraping API documentation pages that rely on JavaScript for rendering.\n- **Strengths**:\n  - Handles dynamic content and JavaScript-loaded pages.\n  - More accurate for modern, interactive documentation sites.\n- **Limitations**:\n  - Slower compared to `fast-scraper.js`.\n\n#### Run the Script\n1. Install Playwright:\n   ```bash\n   npm install playwright\n   ```\n2. Run the script:\n   ```bash\n   node accurate-scraper.js\n   ```\n3. Results will be saved in `scraped_data_accurate.json`.\n\n---\n\n\n\n## 💡 How to Contribute\n\n\u003e For first-time contributors, I recommend checking out https://github.com/firstcontributions/first-contributions and https://www.youtube.com/watch?v=YaToH3s_-nQ\n\nContributions are welcome! Here's how you can contribute:\n\n\n1. **Add API Entries**:\n   - Add new API entries directly to `api-docs-urls.csv` or via pull request.\n   - Ensure URLs point to the **current version** of the documentation and policies.\n     \n2. **Verify API Entries**:\n   - Is the URL up-to-date?\n   - Is the URL root-level for the relevant page? (`api.com/docs/`, not `api.com/docs/nested`)\n   - Is the API doc public, and does scraping it comply with the site's `robots.txt`?\n   - Does the URL provide all the expected information (changelogs, rate limits, etc.)?\n   - Does the page load content dynamically, and if so, can the scraper still extract it?\n     \n3. **Improve Scrapers**:\n   - Enhance `fast-scraper.js` or `accurate-scraper.js` for better performance and compatibility.\n   - Add features like advanced error handling or field-specific scraping.\n     \n4. 
**Submit Pull Requests**:\n   - Fork the repository.\n   - Create a new branch for your changes.\n   - Submit a pull request for review.\n\nIf you're using the scripts, first install dependencies:\n```bash\nnpm install\npip install -r requirements.txt\n```\nThis installs everything listed in package.json and requirements.txt\n\n\n### 🚀 Roadmap Features\n- 🔍 **Search \u0026 Browse:** Easily find APIs by keyword or category  (e.g., \"Machine Learning APIs,\" \"Finance APIs\")  \n- 📄 **Latest API Metadata Retrieval:** Retrieve up-to-date API endpoints and parameters, directly from official documentation.\n- 🛠 **VS Code Integration:** Use the lightweight UpdAPI extension to search and retrieve APIs directly from your terminal.  \n\n---\n\n## 📜 License\n\nThis repository is licensed under the [MIT License](LICENSE).\n\n---\n\n## Status  \n\n### 🛠 Current Phase:  \n- **Under Construction:** We’re building the core MVP features and testing functionality.  \n\n### 📌 Known Issues:  \n- Limited API support.  \n- Some features may not work as expected.  \n\nCheck the [Open Issues](https://github.com/in-c0/updapi/issues) for more details.\n\n---\n\n## Roadmap  \n\n### ✅ MVP Goals  \n- Basic search and browse functionality.  \n- JSON exports for select APIs.  \n- Direct links to official API documentation.  \n\n### 🔜 Future Enhancements  \n- IDE integrations (e.g., VS Code plugin).  \n- API update notifications via email/webhooks.  \n- Support for more APIs.  \n\n---\n\n## Community  \n\n- [Discussions](https://github.com/in-c0/updapi/discussions)  \n- [Bug Reports](https://github.com/in-c0/updapi/issues)  \n\n---\n\n## ❤️ Acknowledgments\n\nWe thank all API providers for publishing robust documentation and fostering developer-friendly ecosystems. 
Your contributions make projects like this possible!\nSpecial thanks to:\n\n- [Crawlee](https://github.com/apify/crawlee): A powerful web scraping and crawling library that simplifies the extraction of structured data from websites.\n- [OpenAPI](https://github.com/APIs-guru/openapi-directory): For setting the standard in API specifications and enabling better interoperability and accessibility.\n\n\n## Questions?\n\nSend an email to support@updapi.com\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fin-c0%2FupdAPI","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fin-c0%2FupdAPI","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fin-c0%2FupdAPI/lists"}