{"id":20827388,"url":"https://github.com/cvcio/rtaa-classifier","last_synced_at":"2025-10-11T20:52:16.014Z","repository":{"id":128326510,"uuid":"416290705","full_name":"cvcio/rtaa-classifier","owner":"cvcio","description":"Comments \u0026 Twitter accounts gRPC classification service.","archived":false,"fork":false,"pushed_at":"2022-09-04T09:05:25.000Z","size":1687,"stargazers_count":5,"open_issues_count":0,"forks_count":3,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-05-07T21:04:54.696Z","etag":null,"topics":["bot-classification","catboost","classification","detoxify","grpc","huggingface","python","toxic-comment-classification","transformers","twitter"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/cvcio.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-10-12T10:31:08.000Z","updated_at":"2023-08-15T06:07:47.000Z","dependencies_parsed_at":null,"dependency_job_id":"67b3af9c-5742-4c60-8646-8f3ad95262f5","html_url":"https://github.com/cvcio/rtaa-classifier","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvcio%2Frtaa-classifier","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvcio%2Frtaa-classifier/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvcio%2Frtaa-classifier/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/cvcio%
2Frtaa-classifier/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/cvcio","download_url":"https://codeload.github.com/cvcio/rtaa-classifier/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252954432,"owners_count":21830903,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot-classification","catboost","classification","detoxify","grpc","huggingface","python","toxic-comment-classification","transformers","twitter"],"created_at":"2024-11-17T23:11:58.627Z","updated_at":"2025-10-11T20:52:10.981Z","avatar_url":"https://github.com/cvcio.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# RTAA \u0026mdash; Classifier\n\nComments \u0026 Twitter accounts gRPC classification service.\n\nThere are two distinct classification services: a) account classification and b) comment classification.\n\nAccountClassifier uses a pre-trained model created by the [Civic Information Office](https://cvcio.org) from 2016 until 17/07/2021, with almost 12,000 handpicked accounts in the open-source version and more than 25,000 users in the production version, of which 7,000 (fake/genuine) accounts come from [Botometer](https://botometer.osome.iu.edu/bot-repository/datasets.html).\n\nIn our approach, we investigated the behavioural characteristics that differentiate normal accounts from \"amplifiers\". Instead of applying the binary bot-or-not logic, we classify accounts as Influencers, Active, Amplifier, and Unknown. 
We use the [CatBoost](https://catboost.ai/) Classifier to predict user classes. Refer to the [Twitter Accounts - CatBoost Classifier](notebooks/twitter-accounts-catboost-classifier.ipynb) notebook for more details.\n\nCommentsClassifier uses multiple pre-trained models and can be extended with any fine-tuned model supported by [Hugging Face](https://huggingface.co/)'s [transformers](https://huggingface.co/docs/transformers/index) module. By default, we use [detoxify-original](https://github.com/unitaryai/detoxify) for English comments, [detoxify-multilingual](https://github.com/unitaryai/detoxify) for Italian, French, Russian, Portuguese, Spanish and Turkish, and our own [comments-el-toxic](https://huggingface.co/cvcio/comments-el-toxic) for Greek comments.\n\n## How to use\n\n```bash\ndocker run --rm -it --name rtaa-classifier -p 50052:50052 cvcio/rtaa-72-rtaa-classifier:latest\n```\n\n## Development\n\n```bash\n# clone the repo\ngit clone git@github.com:cvcio/rtaa-classifier.git\ncd rtaa-classifier\n\n# create the virtual environment (e.g. with conda)\nconda create --name rtaa-classifier python=3.9\n# activate the environment so the next steps install into it\nconda activate rtaa-classifier\n\n# install poetry package manager\npip install poetry\n# install dependencies\npoetry install\n\n# run the service\nmake serve\n```\n\n### Protos\n\nIt is recommended not to re-generate the stubs if you want to have the latest version. Alternatively, you must clone the [rtaa-72](https://github.com/cvcio/rtaa-72) repo and set the `PROTO_PATH` environment variable to that path. To build the protos, we use [buf](https://buf.build). 
Please refer to buf's documentation on how to install it; alternatively, you can run:\n\n```bash\n# install buf\nmake buf-install\n\n# build protocol buffers\n# the command will download the latest version of github.com/cvcio/proto,\n# generate the stubs, and finally clean up the leftovers.\nmake proto\n```\n\n## Contribution\n\nIf you're new to contributing to Open Source on GitHub, [this guide](https://opensource.guide/how-to-contribute/) can help you get started. Please check out the contribution guide for more details on how issues and pull requests work. Before contributing, be sure to review the [code of conduct](https://github.com/cvcio/rtaa-classifier/blob/main/CODE_OF_CONDUCT.md).\n\n\u003ca href=\"https://github.com/cvcio/rtaa-classifier/graphs/contributors\"\u003e\n  \u003cimg src=\"https://contrib.rocks/image?repo=cvcio/rtaa-classifier\" /\u003e\n\u003c/a\u003e\n\n## License and Attribution\n\nIn general, we are making this software publicly available for broad, noncommercial public use, including by academics, journalists, policymakers, researchers and the public in general.\n\nIf you use this service, please let us know at [info@cvcio.org](mailto:info@cvcio.org).\n\nSee our [LICENSE](https://github.com/cvcio/covid-19-api/blob/main/LICENSE.md) for the full terms of use for this software.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvcio%2Frtaa-classifier","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fcvcio%2Frtaa-classifier","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fcvcio%2Frtaa-classifier/lists"}