Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/ausboss/4chan-scraper-playwright
A node js scraper for 4chan threads. Finds threads by text in the subject line.
https://github.com/ausboss/4chan-scraper-playwright
4chan 4chan-scraper playwright playwright-javascript scraper
Last synced: 6 days ago
JSON representation
A node js scraper for 4chan threads. Finds threads by text in the subject line.
- Host: GitHub
- URL: https://github.com/ausboss/4chan-scraper-playwright
- Owner: ausboss
- Created: 2024-07-25T16:28:31.000Z (3 months ago)
- Default Branch: main
- Last Pushed: 2024-07-25T22:09:46.000Z (3 months ago)
- Last Synced: 2024-10-10T11:21:09.601Z (27 days ago)
- Topics: 4chan, 4chan-scraper, playwright, playwright-javascript, scraper
- Language: JavaScript
- Homepage:
- Size: 31.3 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# 4chan Thread Extractor
This project is a Node.js application that extracts posts from the most popular thread matching specific subjects on any 4chan board.
## Features
- Flexibly search for 'general' threads or any thread by subject on any 4chan board
- Specify multiple subject texts to match threads subject line
- Automatically finds the most popular matching thread
- Extracts all posts from the thread, including text and image information
- Returns the extracted data as a formatted string## Todo
- Improve the formatting of the extracted text
## Prerequisites
- Node.js (v14 or later recommended)
- npm (comes with Node.js)## Installation
1. Clone this repository:
```
git clone https://github.com/yourusername/4chan-thread-extractor.git
```
2. Navigate to the project directory:
```
cd 4chan-thread-extractor
```
3. Install the dependencies:
```
npm install
```## Project Structure
```
4chan-thread-extractor/
├── src/
│ ├── index.js
│ └── utils/
│ └── threadExtractor.js
├── package.json
└── README.md
```- `src/index.js`: The entry point of the application
- `src/utils/threadExtractor.js`: Contains the main function for extracting the thread data
- `package.json`: Project configuration and dependencies## Usage
The main function `extract4chanThread` in `src/utils/threadExtractor.js` takes two parameters:
1. `board`: The 4chan board to search (e.g., "g" for /g/, "v" for /v/, etc.)
2. `subjectTexts`: An array of strings to match in thread subjectsTo run the script, you can modify `src/index.js` to search for specific threads. For example, to search for "/lmg/" threads on /g/:
```javascript
import { extract4chanThread } from "./utils/threadExtractor.js";async function main() {
console.log("Extracting posts from the most popular /lmg/ thread...");
const extractedText = await extract4chanThread("g", [
"/lmg/",
"Local Models General",
]);
console.log(extractedText);
}main().catch(console.error);
```Then run the script using:
```
npm start
```This will execute the script and print the extracted posts to the console.
### Examples
1. To scrape the local models general thread on /g/:
```javascript
const extractedText = await extract4chanThread("g", [
"/lmg/",
"Local Models General",
]);
```2. To scrape the Fortnite General on /vg/:
```javascript
const extractedText = await extract4chanThread("vg", [
"/fng/",
"Fortnite General",
]);
```