{"id":27172305,"url":"https://github.com/happyhackingspace/ai-research-writers-tool","last_synced_at":"2025-04-09T09:36:08.954Z","repository":{"id":285762783,"uuid":"959256638","full_name":"HappyHackingSpace/AI-Research-Writers-Tool","owner":"HappyHackingSpace","description":"AI-powered tool for generating technical and academic research articles using arXiv papers and GPT models.","archived":false,"fork":false,"pushed_at":"2025-04-02T14:41:54.000Z","size":31029,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-02T15:24:17.909Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HappyHackingSpace.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-04-02T13:53:26.000Z","updated_at":"2025-04-02T14:41:59.000Z","dependencies_parsed_at":"2025-04-02T15:34:31.149Z","dependency_job_id":null,"html_url":"https://github.com/HappyHackingSpace/AI-Research-Writers-Tool","commit_stats":null,"previous_names":["happyhackingspace/ai-research-writers-tool"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HappyHackingSpace%2FAI-Research-Writers-Tool","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HappyHackingSpace%2FAI-Research-Writers-Tool/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HappyHackingSpace%2FAI-Research-Writers-Tool/releases","manife
sts_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HappyHackingSpace%2FAI-Research-Writers-Tool/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HappyHackingSpace","download_url":"https://codeload.github.com/HappyHackingSpace/AI-Research-Writers-Tool/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248013605,"owners_count":21033383,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-04-09T09:36:08.309Z","updated_at":"2025-04-09T09:36:08.942Z","avatar_url":"https://github.com/HappyHackingSpace.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AI Research Writers Tool\n\nThis project is an AI-powered technical article generator that uses ArXiv papers as its knowledge base. The system consists of two main components:\n\n1. **arxivdatabase.py**: A script to fetch and store ArXiv papers in a vector database\n2. **arxivapp.py**: A Streamlit web application that generates technical articles based on user queries\n\n## Features\n\n- Fetches thousands of ArXiv papers across multiple computer science categories\n- Stores papers in a Chroma vector database with embeddings\n- Provides a user-friendly web interface to generate technical articles\n- Retrieves relevant research papers based on the user's topic\n- Generates structured academic papers with proper citations\n\n## Requirements\n\n- Python 3.8+\n- OpenAI API key\n- Internet connection (for fetching ArXiv data)\n\n## Installation\n\n1. 
Clone the repository:\n```bash\ngit clone https://github.com/HappyHackingSpace/AI-Research-Writers-Tool.git\ncd AI-Research-Writers-Tool\n```\n\n2. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n3. Create a `.env` file in the project root and add your OpenAI API key:\n```\nOPENAI_API_KEY=your_api_key_here\n```\n\n## Usage\n\n### Step 1: Build the Vector Database\n\nRun the database builder script to fetch ArXiv papers and create the vector database:\n\n```bash\npython arxivdatabase.py\n```\n\nThis process may take several hours depending on the number of papers you're fetching (default is 5000 per category).\n\n\u003e **Note**: You can adjust the `num_papers` variable in the script to change the number of papers fetched per category.\n\n### Step 2: Launch the Web Application\n\nAfter building the database, run the Streamlit application:\n\n```bash\nstreamlit run arxivapp.py\n```\n\nThe web interface will open in your browser. Enter a topic in the text field and click \"Generate Article\" to create a technical article based on relevant ArXiv papers.\n\n## How It Works\n\n### arxivdatabase.py\n\n1. Fetches papers from ArXiv across multiple computer science categories\n2. Extracts title, authors, summary, and other metadata\n3. Splits the text into chunks suitable for embedding\n4. Creates embeddings using OpenAI's text-embedding-3-small model\n5. 
Displays the generated article with proper formatting\n\n## Configuration\n\nYou can modify the following parameters in the scripts:\n\n- `num_papers`: Number of papers to fetch per category (default: 5000)\n- `results_article`: Maximum number of results per ArXiv API request (default: 200)\n- `batch_size`: Number of documents to add to the vector database in each batch (default: 5000)\n- `chunk_size`: Size of text chunks for embedding (default: 2000)\n- `chunk_overlap`: Overlap between consecutive chunks (default: 100)\n- `model`: LLM model used for generation (default: \"gpt-4o\")\n- `temperature`: Creativity parameter for the LLM (default: 0.4)\n\n## Limitations\n\n- The ArXiv API has rate limits, so the fetching process includes sleep intervals\n- Generating articles for very niche topics may result in less relevant content\n- The quality of generated articles depends on the availability of relevant papers in the database\n\n## License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhappyhackingspace%2Fai-research-writers-tool","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhappyhackingspace%2Fai-research-writers-tool","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhappyhackingspace%2Fai-research-writers-tool/lists"}