https://github.com/vipulbunny/web-tech-scanner
A Python-based web scraping tool that detects technologies used on a website by analyzing its scripts, meta tags, and HTML content.
- Host: GitHub
- URL: https://github.com/vipulbunny/web-tech-scanner
- Owner: VIPULbunny
- Created: 2025-02-25T18:19:06.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-27T11:10:38.000Z (over 1 year ago)
- Last Synced: 2025-02-27T15:29:56.983Z (over 1 year ago)
- Topics: beautifulsoup, beautifulsoup4, data-analysis, data-science, python, requests, technology-detection, web-scraping
- Language: Jupyter Notebook
- Homepage:
- Size: 234 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# 🔍 Technology Detector
A Python-based tool that fetches a given website and detects the technologies used on it. It combines web scraping with pattern matching to identify the technologies, frameworks, and libraries referenced in the site's HTML, scripts, and metadata.
## 🚀 Features
- Scrapes a website and analyzes its **HTML, meta tags, and scripts**
- Matches technologies against a predefined dataset
- Provides a **clean and accurate** list of detected technologies
- Uses **requests** to fetch pages and **BeautifulSoup** to parse them
## 📌 Tags
`Python` `Web Scraping` `Technology Detector` `BeautifulSoup` `Requests` `Automation`
---
## 📥 Installation
### Prerequisites
Ensure you have **Python 3.x** installed along with the required libraries.
```sh
pip install pandas requests beautifulsoup4
```
### Clone the Repository
```sh
git clone https://github.com/vipulbunny/web-tech-scanner.git
cd web-tech-scanner
```
---
## ⚡ Usage
Run the script and enter a website URL to analyze.
```sh
python technolog.py
```
### Example Output:
```
Enter the website URL: https://example.com
Formatted URL: example.com
Technologies used in this website: jQuery, Bootstrap, Google Analytics
```
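The `Formatted URL` line in the sample output suggests that the script reduces the input to its host name before analysis. A minimal sketch of such a step is shown here, using only the standard library. The helper name `format_url` is hypothetical and not taken from the project, and the real script may normalize URLs differently.

```python
from urllib.parse import urlsplit


def format_url(raw_url: str) -> str:
    """Return only the host part of a URL (hypothetical helper)."""
    raw_url = raw_url.strip()
    # urlsplit only recognizes the host when a scheme or a leading "//"
    # is present, so add "//" for bare inputs such as "example.com/path".
    if "://" not in raw_url:
        raw_url = "//" + raw_url
    # hostname is lowercased and excludes any port; it is None if missing.
    return urlsplit(raw_url).hostname or ""


print(format_url("https://example.com"))  # prints: example.com
```

Inputs without a scheme, such as `example.com/about`, also resolve to `example.com` in this sketch.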
---
## 🛠 How It Works
1. **Loads Technology Data** 📂
   - Fetches a dataset of web technologies from a JSON file.
   - Converts the dataset into a structured **pandas DataFrame**.
2. **Scrapes the Website** 🌐
   - Uses `requests` to fetch the page source.
   - Parses the HTML using `BeautifulSoup`.
3. **Matches Technologies** 🔍
   - Extracts **scripts, meta tags, and headers** from the website.
   - Checks for predefined technology patterns.
   - Returns a list of matched technologies.
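The loading and matching steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the dataset layout, the names `DATASET_JSON`, `load_patterns`, and `match_technologies`, and the regular expressions are all hypothetical. To stay dependency-free, the sketch keeps the dataset as a plain dict instead of a pandas DataFrame and takes already-extracted script URLs and meta tags as input.

```python
import json
import re

# Hypothetical dataset: technology name -> regex patterns per signal type.
# The real project loads its dataset from a JSON file; the layout is assumed.
DATASET_JSON = r"""
{
  "jQuery":    {"scripts": ["jquery[^/]*\\.js"]},
  "Bootstrap": {"scripts": ["bootstrap[^/]*\\.js"]},
  "WordPress": {"meta": {"generator": "wordpress"}}
}
"""


def load_patterns(json_text: str) -> dict:
    """Step 1: parse the technology dataset from JSON text."""
    return json.loads(json_text)


def match_technologies(script_srcs, meta_tags, patterns) -> list:
    """Step 3: return names whose patterns match any extracted signal.

    script_srcs: list of <script src="..."> URLs found on the page.
    meta_tags:   dict of lowercase meta name -> content value.
    """
    detected = []
    for name, rules in patterns.items():
        # A technology matches if any script pattern hits any script URL...
        script_hit = any(
            re.search(rx, src, re.IGNORECASE)
            for rx in rules.get("scripts", [])
            for src in script_srcs
        )
        # ...or if any meta pattern hits the corresponding meta tag content.
        meta_hit = any(
            re.search(rx, meta_tags.get(key, ""), re.IGNORECASE)
            for key, rx in rules.get("meta", {}).items()
        )
        if script_hit or meta_hit:
            detected.append(name)
    return detected


patterns = load_patterns(DATASET_JSON)
scripts = ["https://code.jquery.com/jquery-3.7.1.min.js", "/static/app.js"]
meta = {"generator": "WordPress 6.4"}
print(", ".join(match_technologies(scripts, meta, patterns)))
# prints: jQuery, WordPress
```

In a full run, the script URLs and meta tags would come from the scraping step, for example by fetching the page with `requests.get(url)` and collecting `soup.find_all("script", src=True)` from the parsed HTML.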
---
## 🌟 Future Enhancements
- ✅ Add support for more technology datasets 🔧
- ✅ Improve accuracy with **machine learning-based detection** 🤖
- ✅ Build a **GUI or Web Interface** for ease of use 🖥️
---
## 🤝 Contributing
Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to modify.
---
## 📜 License
This project is open-source and available under the **MIT License**.
---
## 📧 Contact
Have questions or suggestions? Feel free to reach out!
📩 Email: vipulsolanki339@gmail.com
🔗 GitHub: [VIPULbunny](https://github.com/VIPULbunny)