{"id":41324423,"url":"https://github.com/rohith-2/url_classification_dl","last_synced_at":"2026-01-23T06:00:16.742Z","repository":{"id":51209609,"uuid":"365747079","full_name":"Rohith-2/url_classification_dl","owner":"Rohith-2","description":"URL Feature extraction and Engineering aided with Classification via Neural Networks","archived":false,"fork":false,"pushed_at":"2021-12-11T13:35:33.000Z","size":32397,"stargazers_count":11,"open_issues_count":1,"forks_count":11,"subscribers_count":1,"default_branch":"main","last_synced_at":"2026-01-22T14:11:13.038Z","etag":null,"topics":["classification","deep-learning","pyquery","tinyml","url","whois"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"cc0-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Rohith-2.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-05-09T12:26:45.000Z","updated_at":"2025-06-07T15:06:54.000Z","dependencies_parsed_at":"2022-09-05T09:41:47.154Z","dependency_job_id":null,"html_url":"https://github.com/Rohith-2/url_classification_dl","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"purl":"pkg:github/Rohith-2/url_classification_dl","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rohith-2%2Furl_classification_dl","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rohith-2%2Furl_classification_dl/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rohith-2%2Furl_classification_dl/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rohith-2%2Furl_classification_dl/manifests","owner_url":"http
s://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Rohith-2","download_url":"https://codeload.github.com/Rohith-2/url_classification_dl/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Rohith-2%2Furl_classification_dl/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28681690,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-23T05:48:07.525Z","status":"ssl_error","status_checked_at":"2026-01-23T05:48:07.129Z","response_time":59,"last_error":"SSL_read: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["classification","deep-learning","pyquery","tinyml","url","whois"],"created_at":"2026-01-23T06:00:09.971Z","updated_at":"2026-01-23T06:00:16.688Z","avatar_url":"https://github.com/Rohith-2.png","language":"Python","readme":"# URL Feature Extraction \u0026 Classification  \n### Using Neural Networks to classify various URLs  \n  \nAuthors:  \n\u003e [Aaditya Jain](https://github.com/aadityajain1)    \n\u003e [Anirudh Bhaskar](https://github.com/AnirudhBhaskar21)    \n\u003e [Srikanth](https://github.com/Srikanth-AIE)    \n\u003e [Rohith Ramakrishnan](https://github.com/Rohith-2)\n\u003chr style=\\\"border:0.5px solid gray\\\"\u003e \u003c/hr\u003e\n\n## Medium Post:\nhttps://medium.com/@rrohith2001/url-feature-engineering-and-classification-66c0512fb34d  \n\u003chr style=\\\"border:0.5px solid gray\\\"\u003e \u003c/hr\u003e  
\n\n## Acknowledgment:  \n__We would like to thank our professor [Premjith B](https://github.com/premjithb) for his assistance and guidance.__  \n \u003chr style=\\\"border:0.5px solid gray\\\"\u003e \u003c/hr\u003e \n \n## Set-Up:  \n__Prerequisites:__ [conda](https://repo.anaconda.com/) and [git](https://git-scm.com/)     \n*Please note: all system paths in the scripts are written in UNIX format; convert '/' to \"\\\\\\ \" on Windows.*\n```\ngit clone https://github.com/Rohith-2/url_classification_dl.git\ncd url_classification_dl\nconda create -n pyenv python=3.8.5\nconda activate pyenv\npip install -r requirements.txt\n```\nFeature Extraction:    \n```\ncd scripts/\npython extract_Features.py\n```\nThe extracted features are explained and visualised in this [Notebook](https://github.com/Rohith-2/url_classification_dl/blob/main/Notebook/DataProcessing.ipynb). The training data produced by feature extraction is saved as [features.csv](https://github.com/Rohith-2/url_classification_dl/blob/main/FinalDataset/feature.csv) under FinalDataset. Feature extraction took 18-26 hours on average for each category of URLs, for a total of about 95 hours.  \n  \nTraining:\n```\ncd scripts/\npython nn_Training.py\n```\nThe trained model is exported to the [models](https://github.com/Rohith-2/url_classification_dl/blob/main/models) folder.  
\n  \nTesting:\n```\ncd scripts/\npython predict_args.py -i \u003curl\u003e\n``` \nIf you only wish to use the pre-trained model, please check the [releases](https://github.com/Rohith-2/url_classification_dl/releases).    \n\nRunning the GUI locally:\n```\ncd GUI/\nstreamlit run predict.py\n```\n*All of the above commands are run from the repository root (url_classification_dl) folder.*  \n\u003chr style=\\\"border:0.5px solid gray\\\"\u003e \u003c/hr\u003e   \n  \n## GUI:  \n[![Streamlit App](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://share.streamlit.io/rohith-2/url_classification_dl/main/GUI/gui.py)  \n\n![Screenshot 2021-05-21 at 12 18 06 PM](https://user-images.githubusercontent.com/55501708/119094445-a8e87280-ba2e-11eb-8241-56c580f073cb.png)  \n\n## Data Description via Extracted Features:\n| Feature Name | Feature Group | Feature Description |\n| --- | --- | --- |\n| `URL Entropy` | URL String Characteristics | Entropy of URL |\n| `numDigits` | URL String Characteristics | Total number of digits in URL string |\n| `URL Lenght` | URL String Characteristics | Total number of characters in URL string |\n| `numParameters` | URL String Characteristics | Total number of query parameters in URL |\n| `numFragments` | URL String Characteristics | Total number of fragments in URL |\n| `domainExtension` | URL String Characteristics | Domain extension |\n| `num_%20` | URL String Characteristics | Number of '%20' in URL |\n| `num_@` | URL String Characteristics | Number of '@' in URL |\n| `has_ip` | URL String Characteristics | Occurrence of IP in URL |\n| `hasHTTP` | URL domain features | Website domain has http protocol |\n| `hasHTTPS` | URL domain features | Website domain has https protocol |\n| `urllsLive` | URL domain features | The page is online |\n| `daysSinceRegistration` | URL domain features | Number of days from today since domain was registered |\n| `daysSinceExpired` | URL domain features | Number of days from today since domain expired |\n| 
`bodyLength` | URL page features | Total number of characters in URL's HTML page |\n| `numTitles` | URL page features | Total number of H1-H6 titles in URL's HTML page |\n| `numlmages` | URL page features | Total number of images embedded in URL's HTML page |\n| `numLinks` | URL page features | Total number of links embedded in URL's HTML page |\n| `scriptLength` | URL page features | Total number of characters in embedded scripts in URL's HTML page |\n| `specialCharacters` | URL page features | Total number of special characters in URL's HTML page |\n| `scriptToSpecialCharacterRatio` | URL page features | The ratio of total length of embedded scripts to special characters in HTML page |\n| `scriptToBodyRatio` | URL page features | The ratio of total length of embedded scripts to total number of characters in HTML page |  \n\n\n  \n#### Plot depicting numerous normalised features (ranging from 0 to 1) and their means for all the classes. \n![download](https://user-images.githubusercontent.com/55501708/119180825-6b1b3680-ba8e-11eb-83a1-e68dc29251d6.png)\n\n## Performance metric:  \n![Screenshot 2021-05-20 at 6 32 01 PM](https://user-images.githubusercontent.com/55501708/118983160-c1f31400-b999-11eb-8fd9-dd54a204f6d0.png)  \n\n\u003chr style=\\\"border:0.5px solid gray\\\"\u003e \u003c/hr\u003e   \n\n## License\nThe feature_data.csv file is licensed under a Creative Commons Attribution 4.0 International License.\n\n\n\n\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohith-2%2Furl_classification_dl","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Frohith-2%2Furl_classification_dl","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Frohith-2%2Furl_classification_dl/lists"}