https://github.com/apex-woot/mr-scraper

LinkedIn Profile Scraper & Data Exporter - TS/Playwright tool for extracting profiles, companies, jobs, and posts powered by Bun runtime

automation browser-automation bun company-data data-exporter data-extraction job-scraper linkedin linkedin-api linkedin-data linkedin-jobs linkedin-posts linkedin-scraper playwright profile-extraction profile-scraper scraper typescript web-scraping zod

# @apexwoot/mr-scraper

[![npm version](https://img.shields.io/npm/v/@apexwoot/mr-scraper.svg)](https://www.npmjs.com/package/@apexwoot/mr-scraper)
[![License: GPL-3.0](https://img.shields.io/badge/License-GPL--3.0-blue.svg)](LICENSE)
[![Bun](https://img.shields.io/badge/Bun-%23000000.svg?style=flat&logo=bun&logoColor=white)](https://bun.sh)

A high-performance LinkedIn scraper for **Bun + Node.js**. Built with **Playwright** and **Zod** for robust automation and type-safe data extraction.

## Features

- **Dual Runtime Support:** Native, optimized builds for both **Bun** and **Node.js**.
- **Data Extraction:** Profiles, Companies, Job Postings, and Company Posts.
- **Type Safety:** Full TypeScript support with Zod-validated schemas.
- **Session Management:** Persist authentication via `storageState` to skip repeated logins.
- **Extensible:** Custom callbacks for real-time progress tracking (JSON, Multi, Console).
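
The callback variants named above (JSON, Multi, Console) suggest a fan-out pattern, where one callback forwards progress events to several others. The sketch below illustrates that pattern in plain TypeScript. `ProgressCallback`, `ConsoleCallback`, `JsonCallback`, `MultiCallback`, and the `onProgress` signature are hypothetical names chosen for illustration; they are not the package's confirmed API.

```typescript
// Hypothetical progress-callback shape; the package's real interface may differ.
interface ProgressCallback {
  onProgress(message: string, percent: number): void;
}

// Prints each progress event to the console.
class ConsoleCallback implements ProgressCallback {
  onProgress(message: string, percent: number): void {
    console.log(`[${percent}%] ${message}`);
  }
}

// Collects progress events as JSON lines, e.g. for writing to a log file later.
class JsonCallback implements ProgressCallback {
  readonly lines: string[] = [];

  onProgress(message: string, percent: number): void {
    this.lines.push(JSON.stringify({ message, percent }));
  }
}

// Forwards every event to all wrapped callbacks, in order.
class MultiCallback implements ProgressCallback {
  private readonly callbacks: ProgressCallback[];

  constructor(callbacks: ProgressCallback[]) {
    this.callbacks = callbacks;
  }

  onProgress(message: string, percent: number): void {
    for (const cb of this.callbacks) cb.onProgress(message, percent);
  }
}

// Usage: log to the console and keep a JSON record at the same time.
const json = new JsonCallback();
const progress = new MultiCallback([new ConsoleCallback(), json]);
progress.onProgress('Scraping experience section', 40);
```

Keeping each sink behind the same small interface means a scraper only needs to accept one callback, and callers decide how many outputs to attach.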

### 🚀 Improved Robustness

| Feature | Python Version (`linkedin_scraper`) | This Version |
| :--- | :---: | :---: |
| **Experience** | Basic | **Robust & Detailed** |
| **Patents** | Limited | **Full Extraction** |
| **Data Validation** | Pydantic | **Strict Zod Schemas** |
| **Concurrency** | Threading | **Modern Async/Await** |

## Session Persistence

To avoid repeated logins and reduce the risk of bot detection, save your session state once and reuse it on later runs:

```typescript
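// Save session (first run): `browser` is an already started BrowserManager and
// `page` is its active Playwright page; `email` and `password` are your credentials.
await loginWithCredentials(page, { email, password });
await browser.context.storageState({ path: 'state.json' });

// Reuse session (later runs): start from the saved state instead of logging in again.
const browser = new BrowserManager({ storageState: 'state.json' });
await browser.start();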
```
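
Before reusing a saved session, it can help to check that the state file is still plausible. The following is a minimal sketch, assuming `state.json` uses Playwright's storage state layout (a top-level `cookies` array whose entries carry `expires` as Unix seconds, with `-1` for session cookies). `hasLiveCookie` and `canReuseSession` are hypothetical helpers, not part of this package.

```typescript
import { existsSync, readFileSync } from 'node:fs';

// Subset of Playwright's storage state file that this check relies on.
interface StoredCookie {
  name: string;
  expires: number; // Unix time in seconds; -1 marks a session cookie
}

// Returns true when the parsed state holds at least one cookie that has not expired.
function hasLiveCookie(state: { cookies?: StoredCookie[] }, nowSeconds: number): boolean {
  return (state.cookies ?? []).some((c) => c.expires === -1 || c.expires > nowSeconds);
}

// Returns true when `path` exists, parses as JSON, and contains a live cookie.
function canReuseSession(path: string): boolean {
  if (!existsSync(path)) return false;
  try {
    const state = JSON.parse(readFileSync(path, 'utf8'));
    return hasLiveCookie(state, Date.now() / 1000);
  } catch {
    return false; // unreadable or corrupt file: fall back to a fresh login
  }
}
```

If `canReuseSession('state.json')` returns `false`, log in again and overwrite the file. A live cookie does not guarantee that LinkedIn still accepts the session, so treat this only as a cheap pre-check.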

## Development

```bash
bun install # Setup
bun test # Run tests
bun run build # Build dist
```

## Roadmap / TODO

- [x] High-performance Bun + Playwright core
- [x] Robust Experience & Patent extraction
- [ ] Robust extraction of other sections (Education, Publications, Skills, Interests, etc.)
- [ ] Proxy support integration
- [ ] LinkedIn Messaging scraping support
- [ ] Recruiter-specific data points
- [ ] Automated CAPTCHA solving hooks

---

*Disclaimer: This tool is for educational purposes only. Users are responsible for complying with LinkedIn's Terms of Service.*

A TypeScript port of [linkedin_scraper](https://github.com/joeyism/linkedin_scraper) by [joeyism](https://github.com/joeyism), done mostly by AI.