https://github.com/scriptogre/crawler-fastapi
FastAPI-based web application for crawling and analyzing webpages with real-time feedback.
- Host: GitHub
- URL: https://github.com/scriptogre/crawler-fastapi
- Owner: scriptogre
- Created: 2023-10-09T10:14:59.000Z (over 2 years ago)
- Default Branch: master
- Last Pushed: 2023-10-09T10:15:50.000Z (over 2 years ago)
- Last Synced: 2025-09-29T00:29:35.364Z (9 months ago)
- Size: 104 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Intelligent Web Crawler
A simple web application to crawl webpages and check for suspicious words. Developed for fun and as a hobby project to experiment with FastAPI.

## Features
- **Upload**: Upload a CSV containing URLs to start the crawling process.
- **Suspicious Word Detection**: The crawler checks webpages for any suspicious words specified by the user.
- **Real-time Feedback**: The application provides real-time feedback on the webpages being scanned and the detection of suspicious words.
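
The upload and detection features above can be illustrated with a minimal, standard-library sketch. It is not the project's actual code: the function names `read_urls` and `find_suspicious_words` are hypothetical, and it assumes the CSV keeps URLs in its first column and that matching is whole-word and case-insensitive.

```python
import csv
import io
import re


def read_urls(csv_text: str) -> list[str]:
    """Collect http(s) URLs from the first column, skipping headers and blank rows."""
    urls = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # blank line in the CSV
        cell = row[0].strip()
        # Assumption: anything that is not an http(s) URL (e.g. a header) is ignored.
        if cell.lower().startswith(("http://", "https://")):
            urls.append(cell)
    return urls


def find_suspicious_words(page_text: str, words: list[str]) -> list[str]:
    """Return the words that occur in page_text as whole words, ignoring case."""
    found = []
    for word in words:
        pattern = r"\b" + re.escape(word) + r"\b"
        if re.search(pattern, page_text, flags=re.IGNORECASE):
            found.append(word)
    return found
```

For example, `find_suspicious_words("Free PRIZE inside", ["prize", "password"])` reports only `prize`, while a page containing just `surprized` would not match because of the word-boundary check.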
## Tech Stack
- **Backend**: FastAPI
- **Web Templating**: Jinja2
- **Styling**: TailwindCSS with DaisyUI
- **Frontend Enhancements**: HTMX and Hyperscript
- **Deployment**: Docker and docker-compose
- **Crawler**: Scrapy
## How to Run
Ensure you have Docker and docker-compose installed.
1. Clone the repository:
```bash
git clone https://github.com/scriptogre/crawler-fastapi
```
2. Navigate to the project directory:
```bash
cd crawler-fastapi
```
3. Use docker-compose to build and run the application:
```bash
docker-compose up --build
```
4. The application should now be running at http://localhost:8000.
## Usage
1. Go to the homepage at http://localhost:8000.
2. Upload a .csv file containing the URLs you want to crawl.
3. Input suspicious words for the crawler to detect.
4. Start the crawl and monitor the results in real time. Note that integration with the crawler is currently pending.
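
Since the crawler integration is still pending, the README does not say how results reach the browser. One possible approach is sketched below, assuming results are pushed as Server-Sent Events. The names `scan_events` and `fetch_text` are hypothetical, and the page fetcher is passed in as a parameter so the example runs without network access.

```python
import json
from typing import Callable, Iterable, Iterator


def scan_events(
    urls: Iterable[str],
    words: list[str],
    fetch_text: Callable[[str], str],
) -> Iterator[str]:
    """Yield one Server-Sent Events message per URL as soon as it is scanned."""
    for url in urls:
        text = fetch_text(url).lower()
        # Simple substring matching; a real crawler may want stricter rules.
        matches = [w for w in words if w.lower() in text]
        payload = json.dumps({"url": url, "matches": matches})
        # An SSE message is a "data:" line terminated by a blank line.
        yield f"data: {payload}\n\n"


# Usage with an in-memory stand-in for real pages (assumed sample data).
pages = {"https://example.com": "Claim your free prize"}
events = list(scan_events(pages, ["prize"], pages.__getitem__))
```

Because the function is a generator, each result becomes available as soon as its page has been checked. In FastAPI, a generator like this could be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.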
## Feedback & Contributions
Feedback is welcome! Feel free to open an issue or submit a pull request.
## License
This project is open-source and available under the MIT License.