Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/fcakyon/instafake-dataset
Dataset for Intagram Fake and Automated Account Detection
https://github.com/fcakyon/instafake-dataset
bot classification data-science dataset fake instafake instagram machine-learning research
Last synced: 20 days ago
JSON representation
Dataset for Intagram Fake and Automated Account Detection
- Host: GitHub
- URL: https://github.com/fcakyon/instafake-dataset
- Owner: fcakyon
- Created: 2019-09-12T17:19:54.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2019-11-01T11:00:58.000Z (about 5 years ago)
- Last Synced: 2024-12-26T16:05:53.478Z (about 1 month ago)
- Topics: bot, classification, data-science, dataset, fake, instafake, instagram, machine-learning, research
- Language: Python
- Size: 1.9 MB
- Stars: 51
- Watchers: 4
- Forks: 27
- Open Issues: 2
-
Metadata Files:
- Readme: README.md
Awesome Lists containing this project
README
# InstaFake Dataset
Dataset of the [Intagram Fake and Automated Account Detection](https://arxiv.org/pdf/1910.03090.pdf) paper### Installation
##### Install miniconda
https://conda.io/en/latest/miniconda.html##### Setup a CONDA environment
We create a new vitrual environment named `instafake`.
```
conda create --name instafake python=3.6
```##### Activating the virtual environment
If you are inside the virtual environment, your shell prompt should look like: `(instafake) user@computer:~$`
If that is not the case, you can enable the virtual environment using:
```
conda activate instafake
```
To deactivate the virtual environment, use:
```
conda deactivate
```##### Install required packages
To install the required packages, run the following command in your `instafake` virtual environment:
```
pip install -r requirements.txt
```### Import Datasets as Dataframes
To import the fake and automated datasets as pandas dataframes, simply define the dataset folder path `data`, and dataset version `dataset_version` and call `import_data` from `utils`:~~~py
from utils import import_datadataset_path = "data"
dataset_version = "fake-v1.0"fake_dataset = import_data(dataset_path, dataset_version)
dataset_path = "data"
dataset_version = "automated-v1.0"automated_dataset = import_data(dataset_path, dataset_version)
~~~### Dataset Structures
The dataset contains of 2 set of json files with given features:
##### Fake Account Detection
1. `user_media_count` - Total number of posts, an account has.
2. `user_follower_count` - Total number of followers, an account has.
3. `user_following_count` - Total number of followings, an account has.
4. `user_has_profil_pic` - Whether an account has a profil picture, or not.
5. `user_is_private` - Whether an account is a private profile, or not.
6. `user_biography_length` - Number of characters present in account biography.
7. `username_length` - Number of characters present in account username.
8. `username_digit_count` - Number of digits present in account username.
9. `is_fake` - True, if account is a spam/fake account, False otherwise##### Automated Account Detection
1. `user_media_count` - Total number of posts, an account has.
2. `user_follower_count` - Total number of followers, an account has.
3. `user_following_count` - Total number of followings, an account has.
4. `user_has_highlight_reels` - Whether an account has at least one highlight reel present, or not.
5. `user_has_url` - Whether an account has an url present in biography, or not.
6. `user_biography_length` - Number of characters present in account biography.
7. `username_length` - Number of characters present in account username.
8. `username_digit_count` - Number of digits present in account username.
9. `media_comment_numbers` - Total number of comments for a given media.
10. `media_comments_are_disabled` - Whether given media is closed for comments, or not.
11. `media_has_location_info` - Whether given media includes location, or not.
12. `media_hashtag_numbers` - Total number of hashtags, given media has.
13. `media_upload_times` - Media upload timastamps.
14. `automated_behaviour` - True, if account is an automated account, False otherwise### Dataset Metadata
The following table is necessary for this dataset to be indexed by search
engines such as Google Dataset Search.
property
value
name
InstaFake Dataset: An Instagram fake and automated account detection dataset
alternateName
InstaFake Dataset
url
https://github.com/fcakyon/instafake-dataset
sameAs
https://github.com/fcakyon/instafake-dataset
sameAs
https://github.com/fcakyon/instafake-dataset
description
The InstaFake Dataset is comprised of anonymized Instagram user data collected by Fatih Cagatay Akyon and Esat Kalfaoglu over the second half of 2018. We’re releasing this dataset publicly to aid the research community in making advancements in machine learning based social media analysis.
provider
property
value
name
Fatih C. Akyon and Esat Kalfaoglu
sameAs
https://scholar.google.com.tr/citations?user=RHGyDE0AAAAJ&hl=en
license
property
value
name
Attribution-NonCommercial
url
https://creativecommons.org/licenses/by-nc/4.0//