{"id":49073403,"url":"https://github.com/hansel-7/linkedin_stalker","last_synced_at":"2026-04-20T08:31:06.747Z","repository":{"id":327710441,"uuid":"1110454325","full_name":"hansel-7/linkedin_stalker","owner":"hansel-7","description":"Python tool to scrape LinkedIn company posts using Playwright","archived":false,"fork":false,"pushed_at":"2025-12-05T08:13:56.000Z","size":17,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-12-25T13:38:02.438Z","etag":null,"topics":["automation","linkedin-scraper","playwright","python","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/hansel-7.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2025-12-05T08:08:08.000Z","updated_at":"2025-12-24T10:20:05.000Z","dependencies_parsed_at":null,"dependency_job_id":null,"html_url":"https://github.com/hansel-7/linkedin_stalker","commit_stats":null,"previous_names":["purpleairfryer/linkedin_stalker"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/hansel-7/linkedin_stalker","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hansel-7%2Flinkedin_stalker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hansel-7%2Flinkedin_stalker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hansel-7%2
Flinkedin_stalker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hansel-7%2Flinkedin_stalker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/hansel-7","download_url":"https://codeload.github.com/hansel-7/linkedin_stalker/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/hansel-7%2Flinkedin_stalker/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32039888,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-20T00:18:06.643Z","status":"online","status_checked_at":"2026-04-20T02:00:06.527Z","response_time":94,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["automation","linkedin-scraper","playwright","python","web-scraping"],"created_at":"2026-04-20T08:31:05.857Z","updated_at":"2026-04-20T08:31:06.692Z","avatar_url":"https://github.com/hansel-7.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# LinkedIn Stalker - LinkedIn Post Scraper\n\nA Python tool to scrape LinkedIn company page posts and user activity feeds using Playwright. Extract and organize posts from multiple companies for competitive analysis, market research, or content monitoring.\n\n\u003e ⚠️ **Disclaimer**: This tool is for educational and research purposes only. 
Users must comply with LinkedIn's Terms of Service and applicable laws.\n\n## Features\n\n- ✅ Extracts top N posts from LinkedIn feeds (default: 10)\n- ✅ Cookie-based authentication\n- ✅ Duplicate detection\n- ✅ Repost filtering\n- ✅ Multiple URL support\n- ✅ Text output format\n\n## Installation\n\n1. Install dependencies:\n```bash\npip install -r requirements.txt\n```\n\n2. Install Playwright browsers:\n```bash\nplaywright install chromium\n```\n\n## Setup\n\n### Step 1: Extract LinkedIn Cookies\n\nRun the cookie extractor to save your LinkedIn session:\n\n```bash\npython get_linkedin_cookies.py\n```\n\nThis will:\n1. Open a browser window\n2. Navigate to LinkedIn\n3. Wait for you to log in manually\n4. Save your cookies to `linkedin_cookies.json`\n\n### Step 2: Configure Companies and URLs\n\nEdit `linkedin_urls.json` to add company names and their LinkedIn URLs:\n\n```json\n[\n  [\"OpenAI\", \"https://www.linkedin.com/company/openai/posts/\"],\n  [\"Anthropic\", \"https://www.linkedin.com/company/anthropic/posts/\"],\n  [\"Google DeepMind\", \"https://www.linkedin.com/company/google-deepmind/posts/\"]\n]\n```\n\n**Format:** Each entry is an array with two elements:\n1. Company/Person name (string)\n2. 
LinkedIn URL (string)\n\n## Usage\n\n### Scrape Multiple URLs\n\nRun the main scraper to process all URLs from `linkedin_urls.json`:\n\n```bash\npython scrape_linkedin.py\n```\n\nResults will be saved to `linkedin_output.txt`, organized by company with clear section separators.\n\n### Scrape Single URL (Manual)\n\nUse the agent module directly in Python:\n\n```python\nfrom linkedin_agent import get_linkedin_updates\n\n# Get top 10 posts\nposts = get_linkedin_updates(\"https://www.linkedin.com/company/openai/posts/\", max_posts=10)\n\nfor post in posts:\n    print(f\"Position: {post['position']}\")\n    print(f\"Content: {post['text']}\")\n    print(f\"URL: {post['url']}\")\n    print()\n```\n\n## Configuration\n\n### Number of Posts\n\nChange the number of posts to extract by editing `scrape_linkedin.py`:\n\n```python\nposts = get_linkedin_updates(url, max_posts=20)  # Get 20 posts instead of 10\n```\n\n### Date Filtering (Optional)\n\nIf you want to filter by date (experimental, disabled by default):\n\n```python\nposts = get_linkedin_updates(url, max_posts=10, max_days=30)\n```\n\n## Output Format\n\nThe output file (`linkedin_output.txt`) is organized by company:\n\n```\n================================================================================\nLINKEDIN SCRAPING RESULTS\nGenerated: 2024-12-05 10:30:00\nTotal Companies: 2\n================================================================================\n\n████████████████████████████████████████████████████████████████████████████████\nCOMPANY: COMPANY NAME 1\n████████████████████████████████████████████████████████████████████████████████\nURL: https://www.linkedin.com/company/...\nScraped: 2024-12-05 10:30:15\n────────────────────────────────────────────────────────────────────────────────\n\n📊 Total Posts Found: 10\n\n┌─ POST #1 ──────────────────────────────────────────────────────────────────\n│ Position in feed: 1\n│\n│ Content:\n│ [Post content 
here...]\n└───────────────────────────────────────────────────────────────────────────\n\n[Additional posts...]\n\n████████████████████████████████████████████████████████████████████████████████\nCOMPANY: COMPANY NAME 2\n████████████████████████████████████████████████████████████████████████████████\n[...]\n```\n\n## Project Structure\n\n```\nlinkedin_stalker/\n├── linkedin_agent.py              # Core scraping module\n├── scrape_linkedin.py             # Batch scraper for multiple companies\n├── get_linkedin_cookies.py        # Cookie extraction helper\n├── debug_linkedin.py              # Debug mode scraper\n├── linkedin_urls.json             # List of companies and URLs to scrape\n├── linkedin_cookies.json.example  # Example cookie file structure\n├── requirements.txt               # Python dependencies\n├── LINKEDIN_STRUCTURE.md          # LinkedIn HTML structure documentation\n├── README.md                      # This file\n├── LICENSE                        # MIT License\n└── .gitignore                     # Git ignore rules\n\n# Generated files (gitignored):\n├── linkedin_cookies.json          # Your session cookies\n├── linkedin_output.txt            # Scraping results\n└── debug.html                     # Debug output\n```\n\n## Troubleshooting\n\n### Browser doesn't open\n\nMake sure Playwright browsers are installed:\n```bash\nplaywright install chromium\n```\n\n### No posts found\n\n1. Check that `linkedin_cookies.json` exists and has valid cookies\n2. Try running `get_linkedin_cookies.py` again to refresh cookies\n3. 
Make sure you're logged in to LinkedIn in the browser when extracting cookies\n\n### Duplicates or missing content\n\nThe script includes deduplication logic and filters out:\n- Reposts\n- Image placeholders\n- Action buttons (Like, Comment, Share)\n- Metadata text\n\nIf posts are still being missed, try increasing the `max_posts` parameter.\n\n## Notes\n\n- **Legal \u0026 Ethical Use**: This tool is for educational purposes and personal research. LinkedIn may rate-limit or block automated access. Always respect LinkedIn's Terms of Service and robots.txt.\n- **Cookie Expiration**: Session cookies expire after some time. Re-run `get_linkedin_cookies.py` if scraping fails.\n- **Detection Avoidance**: The script uses non-headless mode by default to reduce detection risk.\n- **Rate Limiting**: Add delays between requests when scraping multiple companies to avoid rate limits.\n\n## Disclaimer\n\nThis tool is for educational and research purposes only. Users are responsible for ensuring their use complies with LinkedIn's Terms of Service and applicable laws. The authors are not responsible for any misuse of this software.\n\n## Contributing\n\nContributions are welcome! Please feel free to submit a Pull Request.\n\n## License\n\nMIT - see the [LICENSE](LICENSE) file for details\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhansel-7%2Flinkedin_stalker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhansel-7%2Flinkedin_stalker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhansel-7%2Flinkedin_stalker/lists"}