{"id":20637986,"url":"https://github.com/gregavrbancic/phishing-dataset","last_synced_at":"2025-07-09T13:10:19.910Z","repository":{"id":37955196,"uuid":"188196654","full_name":"GregaVrbancic/Phishing-Dataset","owner":"GregaVrbancic","description":"Phishing dataset with more than 88,000 instances and 111 features. Web application available at. https://gregavrbancic.github.io/Phishing-Dataset/","archived":false,"fork":false,"pushed_at":"2023-03-20T01:00:47.000Z","size":20284,"stargazers_count":62,"open_issues_count":4,"forks_count":20,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-04-15T21:51:40.232Z","etag":null,"topics":["dataset","machine-learning","phishing","phishing-websites-detection"],"latest_commit_sha":null,"homepage":"https://gregavrbancic.github.io/Phishing-Dataset/","language":"Svelte","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GregaVrbancic.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":"CITATION.cff","codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2019-05-23T08:50:57.000Z","updated_at":"2025-03-23T21:02:56.000Z","dependencies_parsed_at":"2025-04-11T14:55:55.648Z","dependency_job_id":"65b420ac-45ef-41d5-944c-402bcfe852a9","html_url":"https://github.com/GregaVrbancic/Phishing-Dataset","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/GregaVrbancic/Phishing-Dataset","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GregaVrbancic%2FPhishing-Dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GregaVrbancic%2FPhishing-D
ataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GregaVrbancic%2FPhishing-Dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GregaVrbancic%2FPhishing-Dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GregaVrbancic","download_url":"https://codeload.github.com/GregaVrbancic/Phishing-Dataset/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GregaVrbancic%2FPhishing-Dataset/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261853254,"owners_count":23219828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dataset","machine-learning","phishing","phishing-websites-detection"],"created_at":"2024-11-16T15:16:31.467Z","updated_at":"2025-06-25T10:37:28.976Z","avatar_url":"https://github.com/GregaVrbancic.png","language":"Svelte","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Datasets for Phishing Websites Detection\n\nIn this repository the two variants of the phishing dataset are presented.\n\n## Web application\n\nTo preview the dataset interactively and/or tailor it to your needs, please visit a dedicated [web application](https://gregavrbancic.github.io/Phishing-Dataset/).\n\n## dataset_full.csv\n\n**Short description of the full variant dataset:**\n- Total number of instances: 88,647\n    - Number of legitimate website instances (labeled as 0): 58,000\n    - Number of phishing website instances (labeled as 1): 30,647\n- Total number of features: 111 
(without target)\n\n## dataset_small.csv\n\n**Short description of the small variant dataset:**\n- Total number of instances: 58,645\n    - Number of legitimate website instances (labeled as 0): 27,998\n    - Number of phishing website instances (labeled as 1): 30,647\n- Total number of features: 111 (without target)\n\n## Extracted Features\n\n|          Feature           |                   Description                      |\n|----------------------------|----------------------------------------------------|\n| qty_dot_url                | count (.) in URL                                   |\n| qty_hyphen_url             | count (-) in URL                                   |\n| qty_underline_url          | count (_) in URL                                   |\n| qty_slash_url              | count (/) in URL                                   |\n| qty_questionmark_url       | count (?) in URL                                   |\n| qty_equal_url              | count (=) in URL                                   |\n| qty_at_url                 | count (@) in URL                                   |\n| qty_and_url                | count (\u0026) in URL                                   |\n| qty_exclamation_url        | count (!) 
in URL                                   |\n| qty_space_url              | count ( ) in URL                                   |\n| qty_tilde_url              | count (~) in URL                                   |\n| qty_comma_url              | count (,) in URL                                   |\n| qty_plus_url               | count (+) in URL                                   |\n| qty_asterisk_url           | count (*) in URL                                   |\n| qty_hashtag_url            | count (#) in URL                                   |\n| qty_dollar_url             | count ($) in URL                                   |\n| qty_percent_url            | count (%) in URL                                   |\n| qty_tld_url                | top-level-domain length                            |\n| length_url                 | URL length                                         |\n| qty_dot_domain             | count (.) in domain                                |\n| qty_hyphen_domain          | count (-) in domain                                |\n| qty_underline_domain       | count (_) in domain                                |\n| qty_slash_domain           | count (/) in domain                                |\n| qty_questionmark_domain    | count (?) in domain                                |\n| qty_equal_domain           | count (=) in domain                                |\n| qty_at_domain              | count (@) in domain                                |\n| qty_and_domain             | count (\u0026) in domain                                |\n| qty_exclamation_domain     | count (!) 
in domain                                |\n| qty_space_domain           | count ( ) in domain                                |\n| qty_tilde_domain           | count (~) in domain                                |\n| qty_comma_domain           | count (,) in domain                                |\n| qty_plus_domain            | count (+) in domain                                |\n| qty_asterisk_domain        | count (*) in domain                                |\n| qty_hashtag_domain         | count (#) in domain                                |\n| qty_dollar_domain          | count ($) in domain                                |\n| qty_percent_domain         | count (%) in domain                                |\n| qty_vowels_domain          | count vowels in domain                             |\n| domain_length              | domain length                                      |\n| domain_in_ip               | URL domain in IP address format                    |\n| server_client_domain       | domain contains the keywords \"server\" or \"client\"  |\n| qty_dot_directory          | count (.) in directory                             |\n| qty_hyphen_directory       | count (-) in directory                             |\n| qty_underline_directory    | count (_) in directory                             |\n| qty_slash_directory        | count (/) in directory                             |\n| qty_questionmark_directory | count (?) in directory                             |\n| qty_equal_directory        | count (=) in directory                             |\n| qty_at_directory           | count (@) in directory                             |\n| qty_and_directory          | count (\u0026) in directory                             |\n| qty_exclamation_directory  | count (!) 
in directory                             |\n| qty_space_directory        | count ( ) in directory                             |\n| qty_tilde_directory        | count (~) in directory                             |\n| qty_comma_directory        | count (,) in directory                             |\n| qty_plus_directory         | count (+) in directory                             |\n| qty_asterisk_directory     | count (*) in directory                             |\n| qty_hashtag_directory      | count (#) in directory                             |\n| qty_dollar_directory       | count ($) in directory                             |\n| qty_percent_directory      | count (%) in directory                             |\n| directory_length           | directory length                                   |\n| qty_dot_file               | count (.) in file                                  |\n| qty_hyphen_file            | count (-) in file                                  |\n| qty_underline_file         | count (_) in file                                  |\n| qty_slash_file             | count (/) in file                                  |\n| qty_questionmark_file      | count (?) in file                                  |\n| qty_equal_file             | count (=) in file                                  |\n| qty_at_file                | count (@) in file                                  |\n| qty_and_file               | count (\u0026) in file                                  |\n| qty_exclamation_file       | count (!) 
in file                                  |\n| qty_space_file             | count ( ) in file                                  |\n| qty_tilde_file             | count (~) in file                                  |\n| qty_comma_file             | count (,) in file                                  |\n| qty_plus_file              | count (+) in file                                  |\n| qty_asterisk_file          | count (*) in file                                  |\n| qty_hashtag_file           | count (#) in file                                  |\n| qty_dollar_file            | count ($) in file                                  |\n| qty_percent_file           | count (%) in file                                  |\n| file_length                | file length                                        |\n| qty_dot_params             | count (.) in parameters                            |\n| qty_hyphen_params          | count (-) in parameters                            |\n| qty_underline_params       | count (_) in parameters                            |\n| qty_slash_params           | count (/) in parameters                            |\n| qty_questionmark_params    | count (?) in parameters                            |\n| qty_equal_params           | count (=) in parameters                            |\n| qty_at_params              | count (@) in parameters                            |\n| qty_and_params             | count (\u0026) in parameters                            |\n| qty_exclamation_params     | count (!) 
in parameters                            |\n| qty_space_params           | count ( ) in parameters                            |\n| qty_tilde_params           | count (~) in parameters                            |\n| qty_comma_params           | count (,) in parameters                            |\n| qty_plus_params            | count (+) in parameters                            |\n| qty_asterisk_params        | count (*) in parameters                            |\n| qty_hashtag_params         | count (#) in parameters                            |\n| qty_dollar_params          | count ($) in parameters                            |\n| qty_percent_params         | count (%) in parameters                            |\n| params_length              | parameters length                                  |\n| tld_present_params         | TLD presence in arguments                          |\n| qty_params                 | number of parameters                               |\n| email_in_url               | email present in URL                               |\n| time_response              | search time (response) domain (lookup)             |\n| domain_spf                 | domain has SPF                                     |\n| asn_ip                     | AS Number (or ASN)                                 |\n| time_domain_activation     | time (in days) of domain activation                |\n| time_domain_expiration     | time (in days) of domain expiration                |\n| qty_ip_resolved            | number of resolved IPs                             |\n| qty_nameservers            | number of resolved name servers (NameServers - NS) |\n| qty_mx_servers             | number of MX Servers                               |\n| ttl_hostname               | time-to-live (TTL) value associated with hostname  |\n| tls_ssl_certificate        | valid TLS / SSL Certificate                        |\n| qty_redirects              | number of redirects                                
|\n| url_google_index           | check if URL is indexed on Google                  |\n| domain_google_index        | check if domain is indexed on Google               |\n| url_shortened              | check if URL is shortened                          |\n| phishing                   | is phishing website                                |\n\n## Cite this dataset\n\nG. Vrbančič, I. Fister Jr., V. Podgorelec. Datasets for Phishing Websites Detection. Data in Brief, Vol. 33, 2020, DOI: [10.1016/j.dib.2020.106438](http://dx.doi.org/10.1016/j.dib.2020.106438)\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgregavrbancic%2Fphishing-dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgregavrbancic%2Fphishing-dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgregavrbancic%2Fphishing-dataset/lists"}