{"id":23705262,"url":"https://github.com/javi-aranda/pelusa-server","last_synced_at":"2026-04-12T05:32:28.939Z","repository":{"id":197085312,"uuid":"697966155","full_name":"javi-aranda/pelusa-server","owner":"javi-aranda","description":"Backend and ML configuration of PELUSA, the ML engine to detect malicious URLs","archived":false,"fork":false,"pushed_at":"2024-03-19T12:42:23.000Z","size":13702,"stargazers_count":1,"open_issues_count":15,"forks_count":1,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-05-30T01:19:29.758Z","etag":null,"topics":["docker-compose","fastapi","hacktoberfest","machine-learning","pandas","python","sklearn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/javi-aranda.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null},"funding":{"github":["javi-aranda"]}},"created_at":"2023-09-28T21:06:00.000Z","updated_at":"2023-10-26T12:00:13.000Z","dependencies_parsed_at":"2023-10-20T16:45:50.808Z","dependency_job_id":"7ff92e85-906d-4487-a091-88801441f74e","html_url":"https://github.com/javi-aranda/pelusa-server","commit_stats":null,"previous_names":["javi-aranda/pelusa-server"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/javi-aranda/pelusa-server","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javi-aranda%2Fpelusa-server","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javi-aranda%2Fpelusa-server/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/
GitHub/repositories/javi-aranda%2Fpelusa-server/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javi-aranda%2Fpelusa-server/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/javi-aranda","download_url":"https://codeload.github.com/javi-aranda/pelusa-server/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/javi-aranda%2Fpelusa-server/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":259668105,"owners_count":22893120,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["docker-compose","fastapi","hacktoberfest","machine-learning","pandas","python","sklearn"],"created_at":"2024-12-30T14:34:19.513Z","updated_at":"2025-12-30T22:35:44.052Z","avatar_url":"https://github.com/javi-aranda.png","language":"Python","funding_links":["https://github.com/sponsors/javi-aranda"],"categories":[],"sub_categories":[],"readme":"# Pelusa Server\n\n![GitHub last commit (branch)](https://img.shields.io/github/last-commit/javi-aranda/pelusa-server/master)\n[![Build and test](https://github.com/javi-aranda/pelusa-server/actions/workflows/test.yaml/badge.svg)](https://github.com/javi-aranda/pelusa-server/actions/workflows/test.yaml)\n\n\n## Description\nPelusa (Predictive Engine for Legitimate \u0026 Unverified Site Assessment) is a machine learning\nbased application that predicts the legitimacy of a website based on the URL provided. 
It is\nbuilt using FastAPI and PostgreSQL, deployed with Docker Compose.\n\n## Installation\nTo get started, clone the repository and run it with Docker Compose.\n\n```bash\ngit clone https://github.com/javi-aranda/pelusa-server\ncd pelusa-server\ndocker-compose up  # add flag -d to run detached\ndocker-compose exec -T backend alembic upgrade head  # run SQLAlchemy migrations\n```\n\nThe application should then be available at [http://localhost:8000](http://localhost:8000).\n\n## Usage\nFor a detailed reference of the API, visit [http://localhost:8000/docs](http://localhost:8000/docs).\nThe main endpoint is `api/v1/analysis`, which accepts a JSON body of the form `{\"input\": \"\u003cURL_TO_CHECK\u003e\"}`\nand returns the legitimacy of the website (1 means potentially bad, 0 means potentially safe).\n\nThese results are stored in a PostgreSQL database, which could be useful to train the model in the future\nor as a persistence mechanism in case a URL is submitted multiple times in a short period of time.\n\n### Exploring the database\nThere is a [pgAdmin](https://www.pgadmin.org/) instance running on [http://localhost:5050](http://localhost:5050) with credentials\ndefined in the `.env` file. 
After connecting to the PostgreSQL server, you can explore the database and run any query you want.\n\n## Dataset\nThe dataset used for training the model is handmade; it consists of 30000 URLs, 50% legitimate and 50% malicious.\n\nMalicious URLs were randomly sampled from [PhishTank active threats](http://data.phishtank.com/data/online-valid.csv)\nand legitimate URLs were sampled from multiple [Kaggle datasets](https://www.kaggle.com/search?q=urls+in%3Adatasets).\nAfter extracting features for both types, the resulting dataset is [phishing_dataset.csv](https://github.com/javi-aranda/pelusa-server/blob/master/backend/app/ml/data/phishing_dataset.csv).\n\n## Training\nThe model is a Random Forest Classifier that reaches an accuracy of 94% over the training dataset.\nThe training code is available as a Jupyter Notebook in [train.ipynb](https://github.com/javi-aranda/pelusa-server/blob/master/backend/app/ml/notebooks/train.ipynb).\n\n## Credits\n\nThis project was built with [FastAPI Starter](https://github.com/gaganpreet/fastapi-starter) as a reference,\nbut with the frontend kept in a separate repository, which is available in [Pelusa React](https://github.com/javi-aranda/pelusa-react).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjavi-aranda%2Fpelusa-server","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjavi-aranda%2Fpelusa-server","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjavi-aranda%2Fpelusa-server/lists"}