https://github.com/siddhant-vij/concurrent-web-crawler

Efficient crawling & data extraction from web pages using concurrency in multiple programming languages.

concurrency cpp fetch-data go java multithreading performance-analysis python web-crawler


# Concurrent Web Crawler

This project showcases the development and performance of web crawlers built with concurrent programming techniques in Java, C++, Go, and Python. It aims to demonstrate how concurrency can improve web crawling efficiency, allowing faster data retrieval and processing.




By implementing the crawler in multiple programming languages, the project provides insights into the concurrency models of each language and their practical application in web scraping tasks.


## Table of Contents

1. [Features](#features)
1. [Installation and Usage](#installation-and-usage)
1. [Contributing](#contributing)
1. [License](#license)


## Features

- Concurrent fetching of web pages to maximize data retrieval speed.
- Configurable depth and domain restrictions for targeted crawling.
- Efficient URL management to avoid processing duplicates.
- Performance analysis comparing concurrent crawlers against sequential ones.
- Implementations in Java, C++, Go, and Python to highlight language-specific concurrency strategies.
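
The features above can be illustrated with a minimal Python sketch. It is not the repository's implementation: `crawl`, `extract_links`, `LinkParser`, the injectable `fetch` hook, and the in-memory `example.test` site are hypothetical names chosen for illustration. It assumes a level-by-level (breadth-first) strategy, where all pages at one depth are fetched concurrently with a thread pool and duplicate filtering happens in the main thread, so no lock is needed.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free URLs linked from the given HTML."""
    parser = LinkParser()
    parser.feed(html)
    return [urldefrag(urljoin(base_url, href)).url for href in parser.links]


def crawl(start_url, fetch, max_depth=2, max_workers=8):
    """Breadth-first crawl restricted to the domain of start_url.

    fetch is any callable mapping a URL to HTML text (a hypothetical hook,
    so the sketch can run against an in-memory site or a real HTTP client).
    Returns a dict mapping each fetched URL to the depth it was fetched at.
    """
    domain = urlparse(start_url).netloc
    seen = {start_url}   # every URL ever queued, used to skip duplicates
    fetched = {}         # URL -> depth
    frontier = [start_url]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for depth in range(max_depth + 1):
            if not frontier:
                break
            # Fetch all pages of the current depth concurrently.
            pages = list(pool.map(fetch, frontier))
            next_frontier = []
            for url, html in zip(frontier, pages):
                fetched[url] = depth
                for link in extract_links(url, html):
                    # Apply the domain restriction and the duplicate check.
                    if urlparse(link).netloc != domain or link in seen:
                        continue
                    seen.add(link)
                    next_frontier.append(link)
            frontier = next_frontier
    return fetched


# Hypothetical in-memory site standing in for real HTTP responses.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a> '
                            '<a href="http://other.test/">X</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c#top">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": '<a href="/d">D</a>',
    "http://example.test/d": "",
}


def fake_fetch(url):
    """Return the stored HTML for a URL, or an empty page if unknown."""
    return SITE.get(url, "")


result = crawl("http://example.test/", fake_fetch, max_depth=2)
print(result)
```

With `max_depth=2`, the page `/d` is discovered but not fetched, and the link to `other.test` is dropped by the domain restriction. Calling `crawl` with `max_workers=1` gives a sequential baseline, which is one simple way to compare concurrent and sequential runs.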


## Installation and Usage

1. **Clone the Repository**:
```bash
git clone https://github.com/siddhant-vij/Concurrent-Web-Crawler.git
```
2. **Navigate to the Language Directory** (replace `[language]` with the directory of the implementation you want to run):
```bash
cd Concurrent-Web-Crawler/[language]
```
3. **Install Dependencies**: If the chosen implementation has external dependencies, install them using the standard tooling for that language.
4. **Build and Run the Application**: Follow the standard build and run steps for the chosen language.


## Contributing

Contributions are what make the open-source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.

1. **Fork the Project**
2. **Create your Feature Branch**:
```bash
git checkout -b feature/AmazingFeature
```
3. **Commit your Changes**:
```bash
git commit -m 'Add some AmazingFeature'
```
4. **Push to the Branch**:
```bash
git push origin feature/AmazingFeature
```
5. **Open a Pull Request**


## License

Distributed under the MIT License. See [`LICENSE`](https://github.com/siddhant-vij/Concurrent-Web-Crawler/blob/main/LICENSE) for more information.