https://github.com/rosriv30/safe-url
Classifying URLs as safe, unsafe, or invalid based on 30 features
security url webscraping
- Host: GitHub
- URL: https://github.com/rosriv30/safe-url
- Owner: RoSriv30
- Created: 2021-07-26T22:23:56.000Z (almost 4 years ago)
- Default Branch: main
- Last Pushed: 2025-01-06T17:41:21.000Z (5 months ago)
- Last Synced: 2025-01-20T16:34:14.452Z (5 months ago)
- Topics: security, url, webscraping
- Language: Python
- Homepage:
- Size: 25.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Welcome to Safe URL


Safe URL is a Flask web app that uses machine learning to check whether a URL is safe, unsafe, or non-existent. It uses a Random Forest classifier trained on the UCI Phishing Dataset (https://archive.ics.uci.edu/ml/datasets/phishing+websites) to predict the status of a URL. The app includes a feature extractor that breaks an input URL into 30 distinct features for the model to analyze.
The general feature categories include:
- Address Bar Features
- HTML/JS Features
- Domain Features
- Abnormalities
Each feature maps to 1 for safe, 0 for suspicious, or -1 for unsafe. The model combines these numeric values across all 30 features to determine the overall URL status.
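The 1 / 0 / -1 encoding can be illustrated with a minimal sketch covering four address-bar features. This is not the code in featureExtraction.py: the function names are hypothetical, and the thresholds (for example, URL length under 54 characters counting as safe) are assumed from the rules commonly documented alongside the UCI dataset.

```python
import ipaddress
from urllib.parse import urlparse

# Encoding described above: 1 = safe, 0 = suspicious, -1 = unsafe.
SAFE, SUSPICIOUS, UNSAFE = 1, 0, -1


def has_ip_address(url: str) -> int:
    """Flag URLs whose host is a raw IP address instead of a domain name."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        return UNSAFE
    except ValueError:
        return SAFE


def url_length(url: str) -> int:
    """Assumed thresholds: < 54 safe, 54 to 75 suspicious, > 75 unsafe."""
    n = len(url)
    if n < 54:
        return SAFE
    if n <= 75:
        return SUSPICIOUS
    return UNSAFE


def has_at_symbol(url: str) -> int:
    """An '@' makes browsers ignore everything before it in the authority."""
    return UNSAFE if "@" in url else SAFE


def has_dash_in_domain(url: str) -> int:
    """Dashes in the host name are treated as a phishing signal."""
    host = urlparse(url).hostname or ""
    return UNSAFE if "-" in host else SAFE


def extract_features(url: str) -> list[int]:
    """Return the numeric value of each feature, in a fixed order."""
    return [
        has_ip_address(url),
        url_length(url),
        has_at_symbol(url),
        has_dash_in_domain(url),
    ]
```

A full extractor would return 30 values in the same column order as the training data, so that the list can be passed to the model as a single row.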
## Files
- **app.py**: Default page of the app; handles routing
- **featureExtraction.py**: Handles logic to extract all features and return a list containing the numeric equivalent of each feature
- **prediction.py**: Includes model training using Random Forest Classifier
- **phishing.csv**: UCI Phishing Dataset
- **templates/index.html**: HTML page template
- **static/style.css**: Styling of HTML page
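As a rough illustration of the training step that prediction.py is described as handling, the sketch below fits a scikit-learn `RandomForestClassifier` on 30 features valued -1, 0, or 1. The column layout of phishing.csv is not shown here, so the example uses synthetic data as a stand-in; the helper names, the 1 / -1 label convention, and the hyperparameters are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

N_FEATURES = 30  # one column per extracted URL feature


def make_synthetic_dataset(n_rows: int = 400, seed: int = 0):
    """Stand-in for phishing.csv: random -1/0/1 features with a toy label."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1, 0, 1], size=(n_rows, N_FEATURES))
    # Toy rule (assumption): label 1 when the features sum to >= 0, else -1.
    y = np.where(X.sum(axis=1) >= 0, 1, -1)
    return X, y


def train_model(X, y, seed: int = 0):
    """Fit a Random Forest and return it with its held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


X, y = make_synthetic_dataset()
model, accuracy = train_model(X, y)

# Classify one feature row, e.g. the output of a 30-feature extractor.
prediction = model.predict(np.ones((1, N_FEATURES), dtype=int))[0]
```

With the real dataset, `X` and `y` would instead come from the feature and label columns of phishing.csv, loaded for instance with pandas.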
## Key Libraries
- pandas
- BeautifulSoup4
- scikit-learn
- googlesearch-python
- urllib3
- regex
- python-whois
## Getting Started
Install all of the dependencies listed in requirements.txt. If any library is still missing afterwards, install it manually.
```sh
pip install -r requirements.txt
```
Run app.py to bring up the webpage on localhost.
```sh
python app.py
```
## Demo
Before Check
After Check
