https://github.com/tripplyons/find-twitter-accounts
Find and classify Twitter accounts using text embeddings
- Host: GitHub
- URL: https://github.com/tripplyons/find-twitter-accounts
- Owner: tripplyons
- License: mit
- Created: 2022-09-15T16:06:56.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-30T19:33:06.000Z (about 3 years ago)
- Last Synced: 2024-12-29T05:42:10.319Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 23.4 KB
- Stars: 2
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Find Twitter Accounts
Find and classify Twitter accounts using text embeddings
## Use cases
- Finding bot accounts
- Finding accounts of cryptocurrency projects ([example output and dataset](https://gist.github.com/tripplyons/eb5977dcf788ca408f4fe542daeb914e))
- Finding any other kind of account you can make a dataset for
## Installation
```bash
conda env create -f environment.yml
conda activate twitter
```
## Usage
### Labeling data
The labeling script creates `dataset.json`, or adds examples to it if the file already exists.
```bash
python labeling.py
```
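The create-or-append behavior can be sketched as follows. This is a minimal illustration, not the code in `labeling.py`: the helper name `add_example` and the record schema (a list of `{"text": ..., "label": ...}` objects) are assumptions, since the actual layout of `dataset.json` is not documented here.

```python
import json
from pathlib import Path


def add_example(path, text, label):
    """Append one labeled example to a JSON dataset, creating the file if needed."""
    path = Path(path)
    # Load the existing examples, or start a new dataset on first use.
    examples = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    # Hypothetical schema: a list of {"text": ..., "label": ...} records.
    examples.append({"text": text, "label": label})
    path.write_text(json.dumps(examples, indent=2, ensure_ascii=False), encoding="utf-8")
    # Return the dataset size so a caller can report labeling progress.
    return len(examples)
```

Rewriting the whole file on every call keeps the sketch short; for a small, hand-labeled dataset that is usually acceptable.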
### Training a classifier
This trains a linear classifier on the embeddings of the examples in `dataset.json`.
The trained model is saved to `classifier.pkl`:
```bash
python classifier.py
```
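To illustrate what "a linear classifier on embeddings" means, here is a dependency-free sketch. It is not the code in `classifier.py`, which may well use a library: the sketch swaps in a small binary logistic regression trained with stochastic gradient descent, and the function names, the model dictionary, and the hyperparameters are all assumptions chosen for illustration.

```python
import math
import pickle


def train_linear_classifier(embeddings, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression (a linear classifier) with per-example SGD."""
    dim = len(embeddings[0])
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, y in zip(embeddings, labels):
            # Linear score; clamped so math.exp cannot overflow.
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            score = max(-30.0, min(30.0, score))
            prob = 1.0 / (1.0 + math.exp(-score))
            # Gradient step on the log loss for this one example.
            error = prob - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return {"weights": weights, "bias": bias}


def predict(model, x):
    """Return 1 if the linear score is positive, else 0."""
    score = sum(w * xi for w, xi in zip(model["weights"], x)) + model["bias"]
    return int(score > 0)


def save_model(model, path="classifier.pkl"):
    """Pickle the trained model so a later step can load it."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

Because the model is just one weight per embedding dimension plus a bias, it trains quickly and works with small labeled datasets, which is one reason linear models are a common choice on top of pretrained embeddings.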
### Finding accounts
This computes embeddings for scraped accounts and classifies them with the trained classifier.
It outputs links to the accounts that receive a specified label.
```bash
python main.py
```
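The embed, classify, and filter flow can be sketched like this. Everything here is hypothetical and stands in for the real `main.py`: `find_accounts`, the account dictionaries, and the model dictionary are illustrative names, and `toy_embed` is a one-feature placeholder for a call to a real embedding model.

```python
def find_accounts(accounts, embed, model, target_label=1):
    """Return profile links for accounts the linear model assigns to target_label."""
    links = []
    for account in accounts:
        vector = embed(account["text"])
        # Linear decision rule: positive score means label 1, otherwise label 0.
        score = sum(w * v for w, v in zip(model["weights"], vector)) + model["bias"]
        label = int(score > 0)
        if label == target_label:
            links.append(f"https://twitter.com/{account['username']}")
    return links


def toy_embed(text):
    # Stand-in for a real embedding model: a single keyword-count feature.
    return [float(text.lower().count("token"))]


# Made-up accounts and a hand-set model, for illustration only.
accounts = [
    {"username": "example_project", "text": "Example, example_project, A new token launch"},
    {"username": "example_person", "text": "Pat, example_person, Gardening and books"},
]
model = {"weights": [1.0], "bias": -0.5}
print(find_accounts(accounts, toy_embed, model))
```

Passing the embedding function in as an argument keeps the filtering logic testable without network access or an API key.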
## Details
### Input format for the model
```
Display Name, Username, Profile description
```
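A small helper that builds this string might look like the sketch below. The function name `format_account` is hypothetical, and two details are assumptions rather than documented behavior: that the fields are joined with a literal `", "`, and that whitespace inside each field is collapsed so every account becomes a single line.

```python
def format_account(display_name, username, description):
    """Join the three profile fields in the order the README lists them."""
    fields = (display_name, username, description)
    # Assumption: collapse newlines and repeated spaces inside each field.
    return ", ".join(" ".join(field.split()) for field in fields)
```

For example, `format_account("Ada L.", "ada", "Math\nand engines")` returns `"Ada L., ada, Math and engines"`.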
## Credits
- https://platform.openai.com/docs/guides/embeddings (Embeddings)
- https://github.com/JustAnotherArchivist/snscrape (Twitter Scraper)