https://github.com/hoan02/novel-crawler
A tool for scraping novel data for use by doctruyen.io.vn
- Host: GitHub
- URL: https://github.com/hoan02/novel-crawler
- Owner: hoan02
- Created: 2024-05-30T18:18:41.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-05-31T17:10:15.000Z (about 2 years ago)
- Last Synced: 2025-01-20T06:35:29.407Z (over 1 year ago)
- Topics: crawler, python
- Language: Python
- Homepage: https://doctruyen.io.vn/
- Size: 8.79 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Novel Crawler
This project is a web crawler designed to collect novels and their chapters from the website truyenfull.vn. The collected data is then stored in a MongoDB database for use in the development of the online novel reading platform doctruyen.io.vn.
## Prerequisites
- Python 3.7+
- MongoDB
- Required Python packages (see below)
## Installation
1. Clone the repository:
```bash
git clone https://github.com/hoan02/novel-crawler.git
cd novel-crawler
```
2. Create a virtual environment and activate it:
```bash
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
```
3. Install the required packages:
```bash
pip install -r requirements.txt
```
4. Create a .env file in the project root directory and add your MongoDB URI:
```bash
# The variable name below is an assumption; use the name the crawler scripts actually read.
MONGODB_URI=<your MongoDB connection string>
```
5. Create a novel_data.txt file in the project root directory and list the novels to crawl, one per line, each as a URL followed by its total chapter count:
```bash
url_1 total_chapters_1
url_2 total_chapters_2
```
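The two input files from steps 4 and 5 are plain text, so reading them is straightforward. The following is a minimal sketch, not the repository's actual code: the function names `parse_env`, `parse_novel_list`, and `load_inputs` are hypothetical, and so is the `MONGODB_URI` key mentioned in the usage note. It assumes `KEY=VALUE` lines in `.env` and whitespace-separated `url total_chapters` lines in `novel_data.txt`.

```python
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines into a dict, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, since connection strings may contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def parse_novel_list(text):
    """Parse 'url total_chapters' lines into a list of (url, int) pairs."""
    novels = []
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # ignore empty lines
        if len(parts) != 2:
            raise ValueError(f"expected 'url total_chapters', got: {raw!r}")
        url, total = parts
        novels.append((url, int(total)))
    return novels


def load_inputs(root="."):
    """Read both files from the project root (hypothetical helper)."""
    root = Path(root)
    env = parse_env((root / ".env").read_text(encoding="utf-8"))
    novels = parse_novel_list((root / "novel_data.txt").read_text(encoding="utf-8"))
    return env, novels
```

Calling `load_inputs()` from the project root would return the parsed settings and the novel list; a key such as `env["MONGODB_URI"]` is only available if the `.env` file defines a variable with that name.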
## Usage
Run the crawler:
```bash
python crawl_novel_multi_threaded.py # Or crawl_novel_single_threaded.py
```
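The two scripts are not reproduced here, so the following is only a sketch of how a multi-threaded crawl might differ from a single-threaded one, using the standard library's `ThreadPoolExecutor`. The names `chapter_url` and `crawl_novel`, the `chuong-<n>` URL pattern, and the caller-supplied `fetch` function are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chapter_url(novel_url, number):
    # Assumed URL pattern; adjust it to match the pages actually being crawled.
    return f"{novel_url.rstrip('/')}/chuong-{number}/"


def crawl_novel(novel_url, total_chapters, fetch, max_workers=8):
    """Fetch chapters 1..total_chapters concurrently.

    `fetch` is any callable that takes a URL and returns the chapter data.
    Results are returned in chapter order.
    """
    urls = [chapter_url(novel_url, n) for n in range(1, total_chapters + 1)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order even if downloads finish out of order.
        return list(pool.map(fetch, urls))
```

A single-threaded version would replace the pool with `[fetch(u) for u in urls]`. Threads help here because the work is dominated by waiting on network responses, but a modest `max_workers` value keeps the load on the target site reasonable.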
## License
This project is licensed under the MIT License. See the LICENSE file for more details.
## Copyright
Copyright © 2024 Hoan Cu Te
## Contact
Facebook: [Lê Công Hoan](https://www.facebook.com/hoanit02/)