https://github.com/apex-woot/mr-scraper
LinkedIn Profile Scraper & Data Exporter - TS/Playwright tool for extracting profiles, companies, jobs, and posts powered by Bun runtime
- Host: GitHub
- URL: https://github.com/apex-woot/mr-scraper
- Owner: apex-woot
- License: gpl-3.0
- Created: 2026-02-04T07:04:41.000Z (4 months ago)
- Default Branch: main
- Last Pushed: 2026-02-04T16:17:21.000Z (4 months ago)
- Last Synced: 2026-02-04T20:20:09.071Z (4 months ago)
- Topics: automation, browser-automation, bun, company-data, data-exporter, data-extraction, job-scraper, linkedin, linkedin-api, linkedin-data, linkedin-jobs, linkedin-posts, linkedin-scraper, playwright, profile-extraction, profile-scraper, scraper, typescript, web-scraping, zod
- Language: TypeScript
- Homepage:
- Size: 114 KB
- Stars: 1
- Watchers: 0
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# @apexwoot/mr-scraper
[npm package](https://www.npmjs.com/package/@apexwoot/mr-scraper)
[License: GPL-3.0](LICENSE)
[Bun](https://bun.sh)
A high-performance LinkedIn scraper for **Bun + Node.js**. Built with **Playwright** and **Zod** for robust automation and type-safe data extraction.
## Features
- **Dual Runtime Support:** Separate optimized builds for **Bun** and **Node.js**.
- **Data Extraction:** Profiles, Companies, Job Postings, and Company Posts.
- **Type Safety:** Full TypeScript support with Zod-validated schemas.
- **Session Management:** Persist authentication via `storageState` so later runs can skip the login step.
- **Extensible:** Custom callbacks for real-time progress tracking (JSON, Multi, Console).
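The progress callbacks mentioned in the list above could look roughly like the sketch below. The `ScrapeProgress` shape and the `consoleCallback`, `jsonCallback`, and `multiCallback` helpers are hypothetical names chosen for illustration, not the package's actual exports; check the package's type definitions for the real interface.

```typescript
// Hypothetical event shape; the real package may use different fields.
type ScrapeProgress = { stage: string; done: number; total: number };
type ProgressCallback = (event: ScrapeProgress) => void;

// Console-style callback: formats each event as a human-readable line.
function consoleCallback(write: (line: string) => void = console.log): ProgressCallback {
  return (e) => write(`[${e.stage}] ${e.done}/${e.total}`);
}

// JSON-style callback: emits one JSON document per event.
function jsonCallback(write: (line: string) => void = console.log): ProgressCallback {
  return (e) => write(JSON.stringify(e));
}

// Multi-style callback: forwards every event to each wrapped callback in order.
function multiCallback(...callbacks: ProgressCallback[]): ProgressCallback {
  return (e) => callbacks.forEach((cb) => cb(e));
}

// Usage: collect output in an array instead of printing it.
const lines: string[] = [];
const sink = (line: string) => { lines.push(line); };
const onProgress = multiCallback(consoleCallback(sink), jsonCallback(sink));
onProgress({ stage: "experience", done: 1, total: 3 });
// lines is now ['[experience] 1/3', '{"stage":"experience","done":1,"total":3}']
```

Injecting the `write` function keeps each callback easy to test and lets the same formatter target a file or a logger instead of the console.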
### 🚀 Improved Robustness
| Feature | Python Version | This Version |
| :--- | :---: | :---: |
| **Experience** | Basic | **Robust & Detailed** |
| **Patents** | Limited | **Full Extraction** |
| **Data Validation** | Pydantic | **Strict Zod Schemas** |
| **Concurrency** | Threading | **Modern Async/Await** |
## Session Persistence
To avoid repeated logins and reduce the risk of bot detection, save your session state once and reuse it on later runs:
```typescript
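// First run: log in once, then save the session.
// Assumes `browser` is a started BrowserManager and `page` is a page it opened.
await loginWithCredentials(page, { email, password });
await browser.context.storageState({ path: 'state.json' });

// Later runs: reuse the saved session instead of logging in again.
const browser = new BrowserManager({ storageState: 'state.json' });
await browser.start();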
```
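Before reusing a saved session, it can help to check that the file still holds an unexpired login cookie. The sketch below assumes `state.json` follows Playwright's storage-state layout (a `cookies` array whose `expires` field is a Unix timestamp in seconds, or `-1` for session cookies). The `hasLiveCookie` helper is hypothetical and not part of this package, and `li_at` as the name of LinkedIn's login cookie is an assumption.

```typescript
// Subset of Playwright's storage-state layout that this check needs.
type StoredCookie = { name: string; value: string; expires: number };
type StorageState = { cookies: StoredCookie[] };

// Hypothetical helper: true if a cookie called `name` exists and has not expired.
// `expires` is a Unix timestamp in seconds; -1 marks a session cookie with no expiry date.
function hasLiveCookie(
  state: StorageState,
  name: string,
  nowSeconds: number = Date.now() / 1000,
): boolean {
  return state.cookies.some(
    (c) => c.name === name && c.value !== "" && (c.expires === -1 || c.expires > nowSeconds),
  );
}

// Usage with an in-memory example; in practice, parse the contents of state.json.
const saved: StorageState = {
  cookies: [{ name: "li_at", value: "token", expires: 2000000000 }],
};
const reusable = hasLiveCookie(saved, "li_at", 1800000000); // true
const stale = hasLiveCookie(saved, "li_at", 2100000000); // false
```

If the check fails, log in again and overwrite `state.json` rather than starting the browser with a stale session.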
## Development
```bash
bun install # Setup
bun test # Run tests
bun run build # Build dist
```
## Roadmap / TODO
- [x] High-performance Bun + Playwright core
- [x] Robust Experience & Patent extraction
- [ ] Robust extraction of other sections (Education, Publications, Skills, Interests, etc.)
- [ ] Proxy support integration
- [ ] LinkedIn Messaging scraping support
- [ ] Recruiter-specific data points
- [ ] Automated CAPTCHA solving hooks
---
*Disclaimer: This tool is for educational purposes only. Users are responsible for complying with LinkedIn's Terms of Service.*
This is a TypeScript port of [linkedin_scraper](https://github.com/joeyism/linkedin_scraper) by [joeyism](https://github.com/joeyism), written mostly by AI.