{"id":15647286,"url":"https://github.com/fcakyon/instafake-dataset","last_synced_at":"2025-04-30T12:41:30.146Z","repository":{"id":51799672,"uuid":"208105301","full_name":"fcakyon/instafake-dataset","owner":"fcakyon","description":"Dataset for Intagram Fake and Automated Account Detection","archived":false,"fork":false,"pushed_at":"2019-11-01T11:00:58.000Z","size":1997,"stargazers_count":54,"open_issues_count":2,"forks_count":27,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-03-30T16:12:02.379Z","etag":null,"topics":["bot","classification","data-science","dataset","fake","instafake","instagram","machine-learning","research"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/fcakyon.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-09-12T17:19:54.000Z","updated_at":"2025-03-21T15:31:58.000Z","dependencies_parsed_at":"2022-08-20T04:00:22.206Z","dependency_job_id":null,"html_url":"https://github.com/fcakyon/instafake-dataset","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcakyon%2Finstafake-dataset","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcakyon%2Finstafake-dataset/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcakyon%2Finstafake-dataset/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/fcakyon%2Finstafake-dataset/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/fcakyon","download_url":"http
s://codeload.github.com/fcakyon/instafake-dataset/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":251701967,"owners_count":21629925,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bot","classification","data-science","dataset","fake","instafake","instagram","machine-learning","research"],"created_at":"2024-10-03T12:18:11.083Z","updated_at":"2025-04-30T12:41:30.117Z","avatar_url":"https://github.com/fcakyon.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# InstaFake Dataset\nDataset of the [Instagram Fake and Automated Account Detection](https://arxiv.org/pdf/1910.03090.pdf) paper\n\n### Installation\n\n##### Install Miniconda\nhttps://conda.io/en/latest/miniconda.html\n\n##### Set up a conda environment\nWe create a new virtual environment named `instafake`.\n```\nconda create --name instafake python=3.6\n```\n\n##### Activating the virtual environment\nIf you are inside the virtual environment, your shell prompt should look like: `(instafake) user@computer:~$`\nIf that is not the case, you can enable the virtual environment using:\n```\nconda activate instafake\n```\nTo deactivate the virtual environment, use:\n```\nconda deactivate\n```\n\n##### Install required packages\n\nTo install the required packages, run the following command in your `instafake` virtual environment:\n```\npip install -r requirements.txt\n```\n\n### Import Datasets as Dataframes\nTo import the fake and automated datasets as pandas dataframes, simply define the dataset folder path `data`, and 
dataset version `dataset_version` and call `import_data` from `utils`:\n\n~~~py\nfrom utils import import_data\n\n# Fake account detection dataset\ndataset_path = \"data\"\ndataset_version = \"fake-v1.0\"\n\nfake_dataset = import_data(dataset_path, dataset_version)\n\n# Automated account detection dataset\ndataset_path = \"data\"\ndataset_version = \"automated-v1.0\"\n\nautomated_dataset = import_data(dataset_path, dataset_version)\n~~~\n\n### Dataset Structures\n\nThe dataset consists of 2 sets of JSON files with the following features:\n\n##### Fake Account Detection\n1. `user_media_count` - Total number of posts the account has.\n2. `user_follower_count` - Total number of followers the account has.\n3. `user_following_count` - Total number of accounts the account follows.\n4. `user_has_profil_pic` - Whether the account has a profile picture.\n5. `user_is_private` - Whether the account is private.\n6. `user_biography_length` - Number of characters in the account biography.\n7. `username_length` - Number of characters in the account username.\n8. `username_digit_count` - Number of digits in the account username.\n9. `is_fake` - `True` if the account is a spam/fake account, `False` otherwise.\n\n##### Automated Account Detection\n1. `user_media_count` - Total number of posts the account has.\n2. `user_follower_count` - Total number of followers the account has.\n3. `user_following_count` - Total number of accounts the account follows.\n4. `user_has_highlight_reels` - Whether the account has at least one highlight reel.\n5. `user_has_url` - Whether the account has a URL in its biography.\n6. `user_biography_length` - Number of characters in the account biography.\n7. `username_length` - Number of characters in the account username.\n8. `username_digit_count` - Number of digits in the account username.\n9. `media_comment_numbers` - Total number of comments for a given media.\n10. `media_comments_are_disabled` - Whether the given media is closed for comments.\n11. 
`media_has_location_info` - Whether the given media includes location info.\n12. `media_hashtag_numbers` - Total number of hashtags the given media has.\n13. `media_upload_times` - Media upload timestamps.\n14. `automated_behaviour` - `True` if the account is an automated account, `False` otherwise.\n\n### Dataset Metadata\nThe following table is necessary for this dataset to be indexed by search\nengines such as \u003ca href=\"https://g.co/datasetsearch\"\u003eGoogle Dataset Search\u003c/a\u003e.\n\u003cdiv itemscope itemtype=\"http://schema.org/Dataset\"\u003e\n\u003ctable\u003e\n  \u003ctr\u003e\n    \u003cth\u003eproperty\u003c/th\u003e\n    \u003cth\u003evalue\u003c/th\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ename\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eInstaFake Dataset: An Instagram fake and automated account detection dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003ealternateName\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"alternateName\"\u003eInstaFake Dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eurl\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"url\"\u003ehttps://github.com/fcakyon/instafake-dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003esameAs\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"sameAs\"\u003ehttps://github.com/fcakyon/instafake-dataset\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003edescription\u003c/td\u003e\n    \u003ctd\u003e\u003ccode itemprop=\"description\"\u003eThe InstaFake Dataset is comprised of anonymized Instagram user data collected by Fatih Cagatay Akyon and Esat 
Kalfaoglu over the second half of 2018. We’re releasing this dataset publicly to aid the research community in making advancements in machine learning based social media analysis.\u003c/code\u003e\u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003eprovider\u003c/td\u003e\n    \u003ctd\u003e\n      \u003cdiv itemscope itemtype=\"http://schema.org/Organization\" itemprop=\"provider\"\u003e\n        \u003ctable\u003e\n          \u003ctr\u003e\n            \u003cth\u003eproperty\u003c/th\u003e\n            \u003cth\u003evalue\u003c/th\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003ename\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eFatih C. Akyon and Esat Kalfaoglu\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003esameAs\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"sameAs\"\u003ehttps://scholar.google.com.tr/citations?user=RHGyDE0AAAAJ\u0026hl=en\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n        \u003c/table\u003e\n      \u003c/div\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n  \u003ctr\u003e\n    \u003ctd\u003elicense\u003c/td\u003e\n    \u003ctd\u003e\n      \u003cdiv itemscope itemtype=\"http://schema.org/CreativeWork\" itemprop=\"license\"\u003e\n        \u003ctable\u003e\n          \u003ctr\u003e\n            \u003cth\u003eproperty\u003c/th\u003e\n            \u003cth\u003evalue\u003c/th\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003ename\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"name\"\u003eAttribution-NonCommercial\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n          \u003ctr\u003e\n            \u003ctd\u003eurl\u003c/td\u003e\n            \u003ctd\u003e\u003ccode itemprop=\"url\"\u003ehttps://creativecommons.org/licenses/by-nc/4.0//\u003c/code\u003e\u003c/td\u003e\n          \u003c/tr\u003e\n        
\u003c/table\u003e\n      \u003c/div\u003e\n    \u003c/td\u003e\n  \u003c/tr\u003e\n\u003c/table\u003e\n\u003c/div\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffcakyon%2Finstafake-dataset","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ffcakyon%2Finstafake-dataset","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ffcakyon%2Finstafake-dataset/lists"}