{"id":29873125,"url":"https://github.com/manmeetkaurbaxi/inboxpilot","last_synced_at":"2026-05-06T00:36:57.599Z","repository":{"id":303701473,"uuid":"1016391778","full_name":"manmeetkaurbaxi/InboxPilot","owner":"manmeetkaurbaxi","description":"Fast, smart, automated cold-outreach to companies/hiring managers","archived":false,"fork":false,"pushed_at":"2025-07-09T01:27:08.000Z","size":275,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-07-09T01:35:00.861Z","etag":null,"topics":["agentic-ai","beautifulsoup","chromadb","llama3","pydantic-ai","streamlit"],"latest_commit_sha":null,"homepage":"https://inboxpilot.streamlit.app/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/manmeetkaurbaxi.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-07-09T00:27:36.000Z","updated_at":"2025-07-09T01:27:11.000Z","dependencies_parsed_at":"2025-07-09T01:37:49.549Z","dependency_job_id":"a3c3f950-a4d5-402e-828b-2bdfe7bcf738","html_url":"https://github.com/manmeetkaurbaxi/InboxPilot","commit_stats":null,"previous_names":["manmeetkaurbaxi/inboxpilot"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/manmeetkaurbaxi/InboxPilot","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manmeetkaurbaxi%2FInboxPilot","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manmeetkaurbaxi%2FInboxPilot/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manmeetkau
rbaxi%2FInboxPilot/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manmeetkaurbaxi%2FInboxPilot/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/manmeetkaurbaxi","download_url":"https://codeload.github.com/manmeetkaurbaxi/InboxPilot/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/manmeetkaurbaxi%2FInboxPilot/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267951736,"owners_count":24171092,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agentic-ai","beautifulsoup","chromadb","llama3","pydantic-ai","streamlit"],"created_at":"2025-07-30T22:16:01.636Z","updated_at":"2025-10-17T17:24:18.537Z","avatar_url":"https://github.com/manmeetkaurbaxi.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InboxPilot - CV Extractor \u0026 Email Generator\n\nA comprehensive, production-ready system for extracting CV data, parsing job descriptions, and generating personalized emails using AI agents built with PydanticAI and Groq LLM.\n\n## 📁 Project Structure\n\n```\n├── 📄 Core Application Files\n│   ├── main.py                    # Main Streamlit application entry point\n│   ├── config.py                  # Configuration and model 
settings\n│   ├── requirements.txt           # Python dependencies\n│   └── README.md                  # This documentation file\n│\n├── 🤖 AI Agent Modules\n│   ├── cv_extractor.py            # CV/Resume extraction agent and UI\n│   ├── job_parser.py              # Job description parser with web scraping\n│   ├── email_generator.py         # Email generator agent\n│   └── email_tracker.py           # Email tracking and analytics\n│\n├── 🗄️ Data Storage \u0026 Management\n│   ├── vector_store.py            # ChromaDB vector database operations\n│   ├── chroma_db/                 # ChromaDB database files\n│   │   ├── chroma.sqlite3         # Main database file\n│   │   └── [collection_ids]/      # Vector collections\n│   └── email_records.json         # Email tracking records\n│\n├── 🛠️ Utility \u0026 Support Files\n│   ├── error_handler.py           # Centralized error handling\n│   ├── debug_scraper.py           # Debug script for web scraping\n│   ├── check_email_config.py      # Email configuration validator\n│   └── run_tests.py               # Test runner for all components\n│\n├── 🧪 Test Scripts\n│   ├── test_scripts/\n│   │   ├── test_cv_extractor.py   # CV extraction tests\n│   │   ├── test_scraper.py        # Web scraping tests\n│   │   ├── test_link_extraction.py # Link extraction tests\n│   │   ├── test_setup.py          # Environment setup tests\n│   │   └── test_email_functionality.py # Email functionality tests\n│\n├── ⚙️ Setup \u0026 Configuration\n│   ├── setup_scripts/\n│   │   └── setup_env.py           # Environment setup automation\n│   └── TROUBLESHOOTING.md         # Detailed troubleshooting guide\n│\n└── 📊 Generated Files\n    ├── __pycache__/               # Python cache files\n    └── email_records.json         # Email tracking data\n```\n\n## 🚀 Features\n\n### 📄 CV/Resume Data Extraction\n\n- **AI-Powered Parsing**: Extract structured data from PDF resumes using Groq LLM\n- **Comprehensive Data**: Personal info, education, experience, 
volunteer work, skills, projects, awards, publications\n- **Manual Link Management**: Add social media profiles and GitHub repositories with username-based input\n- **Data Validation**: Detect placeholder/fake data and ensure quality\n- **Chronological Sorting**: All time-based data sorted from latest to oldest\n\n### 🔍 Job Description Parser\n\n- **Dual Input Methods**: Manual text input or automatic web scraping from job URLs\n- **Web Scraping**: Extract job information from major job sites using BeautifulSoup\n- **Structured Extraction**: Parse job descriptions into organized data using AI\n- **Comprehensive Job Data**: Title, company, location, skills, responsibilities, qualifications, benefits\n- **Duplicate Prevention**: Check if emails were already sent to prevent spam\n- **Email Tracking**: Monitor sent emails with statistics and success rates\n\n### 📧 Email Generator\n\n- **Personalized Content**: Generate emails using CV and job data for perfect matching\n- **Multiple Email Types**: Industry positions, academic research, freelance, networking\n- **Tone Customization**: Professional, friendly, confident, enthusiastic\n- **Smart Personalization**: Match skills, experience, and projects to job requirements\n- **Email Tracking**: Record sent emails with detailed metadata\n\n### 📊 Email Tracking System\n\n- **Duplicate Prevention**: Prevent sending multiple emails to same company/job\n- **Statistics Dashboard**: Track total emails, companies contacted, success rates\n- **Email History**: View recent emails with status and metadata\n- **Data Persistence**: Store email records in JSON format\n\n## 🏗️ Architecture\n\n### Technology Stack\n\n- **AI Framework**: PydanticAI for structured agents\n- **LLM**: Groq (Llama 3.3 70B Versatile)\n- **UI Framework**: Streamlit\n- **Data Models**: Pydantic for type safety\n- **PDF Processing**: PyPDF2\n- **Web Scraping**: BeautifulSoup4, Requests\n- **Vector Database**: ChromaDB for semantic search\n- **Data Storage**: JSON 
files for persistence\n\n### Core Components\n\n#### 1. **Main Application (`main.py`)**\n\n- Streamlit app entry point\n- Navigation and session management\n- Data flow coordination between modules\n\n#### 2. **CV Extractor (`cv_extractor.py`)**\n\n- PDF resume parsing with AI\n- Structured data extraction\n- Manual link management\n- Data validation and quality checks\n\n#### 3. **Job Parser (`job_parser.py`)**\n\n- Web scraping from job sites\n- Manual job description input\n- AI-powered job data extraction\n- URL validation and error handling\n\n#### 4. **Email Generator (`email_generator.py`)**\n\n- Personalized email generation\n- Multiple email types and tones\n- SMTP email sending integration\n- Email preview and management\n\n#### 5. **Vector Store (`vector_store.py`)**\n\n- ChromaDB integration for semantic search\n- CV and job data storage\n- Data retrieval and management\n- Collection management\n\n#### 6. **Email Tracker (`email_tracker.py`)**\n\n- Email history tracking\n- Duplicate prevention\n- Statistics and analytics\n- Data persistence\n\n## 🛠️ Installation\n\n1. **Clone the repository**:\n\n   ```bash\n   git clone \u003crepository-url\u003e\n   cd InboxPilot\n   ```\n\n2. **Install dependencies**:\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. **Set up environment variables**:\n\n   ```bash\n   # Option 1: Create .env file (recommended for local development)\n   cp env_example.txt .env\n   # Edit .env with your Groq API key and email settings\n\n   # Option 2: Use Streamlit secrets (see SECRETS_MANAGEMENT.md for details)\n   ```\n\n4. **Run the application**:\n   ```bash\n   streamlit run main.py\n   ```\n\n## 📋 Usage Workflow\n\n### Step 1: CV Extractor\n\n1. **Upload CV/Resume**: Upload your PDF resume\n2. **Extract Data**: AI extracts structured information\n3. **Add Social Links**: Enter usernames for social platforms\n4. **Review \u0026 Download**: Verify extracted data and download\n\n### Step 2: Job Parser\n\n1. 
**Choose Input Method**:\n   - **Manual Text Input**: Paste job description text\n   - **Job URL Scraping**: Enter job posting URL for automatic extraction\n2. **Parse Job Data**: AI extracts structured job information\n3. **Check Duplicates**: System warns if email already sent\n4. **Review Details**: Verify extracted job requirements\n\n### Step 3: Email Generator\n\n1. **Configure Settings**: Choose email type and tone\n2. **Add Recipient Info**: Enter hiring manager details (optional)\n3. **Generate Email**: AI creates personalized email\n4. **Review \u0026 Send**: Preview, download, or mark as sent\n\n## 🌐 Supported Job Sites\n\nThe web scraper supports the following job posting sites:\n\n- **LinkedIn Jobs**: `linkedin.com/jobs`\n- **Indeed**: `indeed.com`\n- **Glassdoor**: `glassdoor.com`\n- **Monster**: `monster.com`\n- **CareerBuilder**: `careerbuilder.com`\n- **ZipRecruiter**: `ziprecruiter.com`\n- **Dice**: `dice.com`\n- **Angel.co**: `angel.co`\n- **Stack Overflow Jobs**: `stackoverflow.com/jobs`\n- **GitHub Jobs**: `github.com/jobs`\n- **Remote.co**: `remote.co`\n- **WeWorkRemotely**: `weworkremotely.com`\n- **FlexJobs**: `flexjobs.com`\n\n### Web Scraping Features\n\n- **Automatic Detection**: Validates URLs against supported job sites\n- **Smart Extraction**: Uses multiple selectors to find job information\n- **Fallback Methods**: Extracts from page title and URL if specific elements not found\n- **Error Handling**: Graceful handling of network issues and parsing errors\n- **User-Agent Spoofing**: Uses realistic browser headers to avoid blocking\n\n## 📊 Data Models\n\n### CV Data Structure\n\n```json\n{\n  \"name\": \"Full Name\",\n  \"email\": \"email@example.com\",\n  \"phone\": \"Phone Number\",\n  \"education\": [\"Degree, Institution, Year, GPA\"],\n  \"experience\": [\"Job Title, Company, Year\"],\n  \"volunteer\": [\"Volunteer Role, Organization, Year\"],\n  \"skills\": [\"Skill 1\", \"Skill 2\", \"Skill 3\"],\n  \"projects\": [\"Project Name, 
Year\"],\n  \"awards\": [\"Award Name, Year\"],\n  \"publications\": [\"Publication Title, Year\"],\n  \"summary\": \"Professional summary\",\n  \"manual_links\": {\n    \"LinkedIn\": \"https://linkedin.com/in/username\",\n    \"GitHub\": \"https://github.com/username\",\n    \"Portfolio\": \"https://portfolio.com\"\n  }\n}\n```\n\n### Job Data Structure\n\n```json\n{\n  \"job_title\": \"Software Engineer\",\n  \"company_name\": \"Tech Corp\",\n  \"location\": \"San Francisco, CA\",\n  \"job_type\": \"Full-time\",\n  \"experience_level\": \"Senior\",\n  \"required_skills\": [\"Python\", \"React\", \"AWS\"],\n  \"preferred_skills\": [\"Docker\", \"Kubernetes\"],\n  \"responsibilities\": [\"Develop features\", \"Code review\"],\n  \"qualifications\": [\"Bachelor's degree\", \"5+ years experience\"],\n  \"benefits\": [\"Health insurance\", \"401k\"],\n  \"salary_range\": \"$120k-$150k\",\n  \"industry\": \"Technology\",\n  \"department\": \"Engineering\",\n  \"remote_policy\": \"Hybrid\",\n  \"visa_sponsorship\": true,\n  \"summary\": \"Brief job summary\"\n}\n```\n\n### Email Record Structure\n\n```json\n{\n  \"id\": \"unique-identifier\",\n  \"job_title\": \"Software Engineer\",\n  \"company_name\": \"Tech Corp\",\n  \"recipient_email\": \"hiring@techcorp.com\",\n  \"recipient_name\": \"John Smith\",\n  \"sent_date\": \"2024-01-15T10:30:00\",\n  \"email_type\": \"Industry Position\",\n  \"status\": \"sent\",\n  \"cv_data_used\": {...},\n  \"job_data_used\": {...},\n  \"email_content\": \"Full email content\",\n  \"notes\": \"Additional notes\"\n}\n```\n\n## ⚙️ Configuration\n\n### Environment Variables\n\nCreate a `.env` file with the following variables:\n\n```bash\n# Required: Groq API Key\nGROQ_API_KEY=your_groq_api_key_here\n\n# Optional: Email Configuration\nEMAIL_ADDRESS=your.email@gmail.com\nEMAIL_PASSWORD=your_app_password\nSENDER_NAME=Your Name\n\n# Optional: Model Configuration\nGROQ_MODEL=llama-3.3-70b-versatile\n```\n\n### Model Configuration\n\nThe 
system uses Groq's Llama 3.3 70B model for accurate data extraction and email generation. You can modify the model in `config.py`:\n\n```python\nGROQ_MODEL = \"llama-3.3-70b-versatile\"\n```\n\n## 🧪 Testing\n\n### Test Runner\n\nUse the comprehensive test runner for all components:\n\n```bash\n# Run all tests\npython run_tests.py\n\n# Run specific test categories\npython run_tests.py cv          # CV extraction tests\npython run_tests.py scraper     # Web scraping tests\npython run_tests.py email       # Email functionality tests\npython run_tests.py setup       # Environment setup tests\npython run_tests.py email_config # Email configuration tests\n```\n\n### Individual Test Scripts\n\n```bash\n# CV extraction tests\npython test_scripts/test_cv_extractor.py\n\n# Web scraping tests\npython test_scripts/test_scraper.py\n\n# Link extraction tests\npython test_scripts/test_link_extraction.py\n\n# Setup tests\npython test_scripts/test_setup.py\n\n# Email functionality tests\npython test_scripts/test_email_functionality.py\n```\n\n### Debug Tools\n\n```bash\n# Debug web scraping\npython debug_scraper.py\n\n# Check email configuration\npython check_email_config.py\n```\n\n## 🔧 Key Features\n\n### Smart Personalization\n\n- **Skill Matching**: Automatically match CV skills to job requirements\n- **Experience Alignment**: Connect relevant experience to job responsibilities\n- **Project Highlighting**: Reference specific projects that align with job needs\n- **Social Integration**: Include relevant social links and portfolio\n\n### Web Scraping Capabilities\n\n- **Multi-Site Support**: Works with major job posting platforms\n- **Intelligent Parsing**: Uses multiple strategies to extract job information\n- **Robust Error Handling**: Graceful degradation when scraping fails\n- **Rate Limiting**: Respectful scraping with delays and user-agent headers\n\n### Email Tracking \u0026 Analytics\n\n- **Duplicate Prevention**: 30-day cooldown for same company/job combinations\n- 
**Success Metrics**: Track delivery, open, and reply rates\n- **Company Analytics**: Monitor engagement across different companies\n- **Email History**: Complete audit trail of all sent emails\n\n### Production-Ready Features\n\n- **Error Handling**: Comprehensive error handling and user feedback\n- **Data Validation**: Input validation and quality checks\n- **Session Management**: Persistent data across application sessions\n- **Modular Architecture**: Easy to extend and maintain\n\n## 🎯 Use Cases\n\n### Job Seekers\n\n- **Mass Applications**: Efficiently apply to multiple positions\n- **Personalized Outreach**: Stand out with tailored emails\n- **Application Tracking**: Monitor all applications in one place\n- **Performance Analytics**: Track which approaches work best\n\n### Recruiters\n\n- **Candidate Assessment**: Quickly parse and evaluate CVs\n- **Job Description Analysis**: Extract key requirements and benefits\n- **Communication Tracking**: Monitor outreach effectiveness\n\n### Career Coaches\n\n- **Client Support**: Help clients create personalized applications\n- **Strategy Development**: Analyze successful application patterns\n- **Performance Tracking**: Monitor client application success rates\n\n## 🚀 Advanced Features\n\n### Email Templates\n\n- **Industry-Specific**: Templates for different job types\n- **Tone Variations**: Professional, friendly, confident, enthusiastic\n- **Customization**: Easy to modify and extend templates\n\n### Analytics Dashboard\n\n- **Success Metrics**: Track email performance\n- **Company Insights**: Analyze which companies respond best\n- **Timing Analysis**: Optimize send times for better response rates\n\n### Integration Ready\n\n- **API Endpoints**: RESTful API for external integrations\n- **Webhook Support**: Real-time notifications for email events\n- **Export Options**: Multiple format support for data export\n\n## 📧 Email Sending Functionality\n\nThe application includes direct email sending capabilities through 
SMTP. You can send generated emails directly from the application.\n\n### Email Setup\n\n1. **Enable 2-Factor Authentication** on your email account\n2. **Generate an App Password**:\n   - **Gmail**: Google Account → Security → App Passwords\n   - **Outlook**: Account Settings → Security → App Passwords\n   - **Yahoo**: Account Security → App Passwords\n3. **Configure Environment Variables**: Add your email credentials to the `.env` file:\n   ```\n   EMAIL_ADDRESS=your.email@gmail.com\n   EMAIL_PASSWORD=your_app_password\n   SENDER_NAME=Your Name\n   ```\n4. **Restart Application**: The app will automatically use your configured email settings\n\n### Supported Email Providers\n\n- **Gmail**: `smtp.gmail.com:587`\n- **Outlook/Office365**: `smtp.outlook.com:587`\n- **Yahoo**: `smtp.yahoo.com:587`\n\n### Testing Email Functionality\n\n```bash\n# Test email configuration\npython run_tests.py email_config\n\n# Test email functionality\npython run_tests.py email\n```\n\n## 🆘 Troubleshooting\n\n### Common Issues\n\n1. **API Key Issues**\n\n   - Ensure `GROQ_API_KEY` is set in `.env` file\n   - Verify the API key is valid and has sufficient credits\n   - Check network connectivity\n\n2. **Email Configuration Problems**\n\n   - Run `python check_email_config.py` to diagnose issues\n   - Ensure 2FA is enabled and app password is generated\n   - Verify SMTP settings for your email provider\n\n3. **Web Scraping Issues**\n\n   - Some job sites may block automated requests\n   - Try different job URLs or use manual input\n   - Check the debug output for specific error messages\n\n4. **Data Persistence Issues**\n   - Ensure write permissions in the project directory\n   - Check if ChromaDB files are corrupted\n   - Verify JSON file permissions\n\n### Debug Steps\n\n1. **Enable Debug Mode**: Check \"Show Debug Options\" in the app\n2. **Run Test Suite**: Use `python run_tests.py` to identify issues\n3. **Check Logs**: Review error messages and debug output\n4. 
**Verify Configuration**: Ensure all environment variables are set correctly\n\nFor detailed troubleshooting, see `TROUBLESHOOTING.md`.\n\n## 🤝 Contributing\n\n1. Fork the repository\n2. Create a feature branch\n3. Make your changes\n4. Add tests for new functionality\n5. Submit a pull request\n\n## 📄 License\n\nThis project is licensed under the MIT License - see the LICENSE file for details.\n\n## 🔮 Future Enhancements\n\n- [x] Email sending integration (SMTP, Gmail API) ✅\n- [x] Web scraping for job descriptions ✅\n- [x] Vector database integration ✅\n- [x] Email tracking and analytics ✅\n- [ ] AI-powered follow-up email generation\n- [ ] User session management\n- [ ] Email tracking management\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanmeetkaurbaxi%2Finboxpilot","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmanmeetkaurbaxi%2Finboxpilot","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmanmeetkaurbaxi%2Finboxpilot/lists"}