{"id":19693161,"url":"https://github.com/breadrock1/socialnetworkscraper","last_synced_at":"2025-07-30T14:37:16.225Z","repository":{"id":171596016,"uuid":"422655658","full_name":"breadrock1/SocialNetworkScraper","owner":"breadrock1","description":"Web scraping is simply the process of using a social media web scraper to gather data automatically. It saves users time, effort and sometimes money since it’s an automatic process performed by bots. You could take the time to search the web for all mentions of a certain word or find all prices for a certain product, but that would take a lot of time.","archived":false,"fork":false,"pushed_at":"2021-11-06T22:29:15.000Z","size":67,"stargazers_count":16,"open_issues_count":0,"forks_count":3,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-06-15T18:57:28.952Z","etag":null,"topics":["facebook","facebook-scraping","flake8","mailru","osint","osint-python","python","python3","scraper","scraping","site-scraper","social-network","social-network-analysis","twitter","vk-api","vkontakte","web-scraper","web-scraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/breadrock1.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2021-10-29T17:11:40.000Z","updated_at":"2025-01-24T02:11:11.000Z","dependencies_parsed_at":null,"dependency_job_id":"b6029cb9-d62c-4c03-8b6d-24f5f97c22ed","html_url":"https://github.com/breadrock1/SocialNetworkScraper","commit_stats":null,"previous_names":["breadrock1/socialnetworkscraper"],"tag
s_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/breadrock1/SocialNetworkScraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2FSocialNetworkScraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2FSocialNetworkScraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2FSocialNetworkScraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2FSocialNetworkScraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/breadrock1","download_url":"https://codeload.github.com/breadrock1/SocialNetworkScraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/breadrock1%2FSocialNetworkScraper/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267883235,"owners_count":24160227,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-30T02:00:09.044Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["facebook","facebook-scraping","flake8","mailru","osint","osint-python","python","python3","scraper","scraping","site-scraper","social-network","social-network-analysis","twitter","vk-api","vkontakte","web-scraper","web-scraping"],"created_at":"2024-11-11T19:15:54.346Z","updated_at":"2025-07-30T14:37:16
.211Z","avatar_url":"https://github.com/breadrock1.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Social Network Scraper\n\n![GitHub](https://badgen.net/badge/icon/github?icon=github\u0026label)\n![version](https://img.shields.io/badge/version-1.1-blue)\n[![Awesome](https://awesome.re/badge.svg)](https://awesome.re)\n\n![Python](https://img.shields.io/badge/Python-FFD43B?style=for-the-badge\u0026logo=python\u0026logoColor=darkgreen)\n\n## What is scraping?\n\nWeb scraping is simply the process of using a social media web scraper to gather data automatically. It saves users time, effort and sometimes money since it’s an automatic process performed by bots. You could take the time to search the web for all mentions of a certain word or find all prices for a certain product, but that would take a lot of time.\n\nIn certain cases, it might even be impossible for one person to do it on their own. I mean, think about it. Can you imagine scrolling through page after page of a website, writing down every single mention of a word, analyzing the context of that word and putting it all in an easy-to-read format that other people can understand? I can’t imagine doing that, and I don’t think I’m the only one.\n\nThat’s why we have social media web scrapers to do that work for us. Web scraping these sites is easy as pie, especially if you have the right tools to help you. (More on that later.) All you have to do is tell the scraper what information you want to collect and it will parse through the HTML (hypertext markup language) of the social media platform. This information can range from usernames and follower lists to comments and conversations that include your keywords. Of course, it’s possible to use this information to gather sensitive data and manipulate others, and some people do it. But that’s not (and shouldn’t be) the goal. 
Done correctly, web scraping can really help individuals accomplish their personal and professional goals by helping them collect valuable data and give that data meaning.\n\n***\n\n## Installation\n\nFirst, install the requirements:\n```shell\n$ pip3 install -r requirements.txt\n```\n\n## Setting up config file\nEdit the `config.py` file with your private application's information.\n\n\u003cdetails\u003e\u003csummary\u003eConfig data details\u003c/summary\u003e\n\n1. Flask backend config data (host address and port):\n```python\nHOST = '127.0.0.1'  # host of flask backend\nPORT = 7654         # port\n```\n\n2. Vkontakte config data:\n```python\nVK_APP_VERSION=''       # version of vk user application\nVK_APP_ID=''            # vk application id\nVK_APP_SECRET_KEY=''    # secret key of vk application (see preferences...)\nVK_APP_SERVICE_KEY=''   # service key of vk application (see preferences...)\nVK_APP_ACCESS_TOKEN=''  # access token of vk application (see preferences...)\n```\n\n3. Facebook config data:\n```python\nFB_APP_VERSION=''       # version of facebook application\nFB_APP_ID=''            # facebook application id\nFB_CLIENT_MARKER=''     # client marker-token to get access to user info\nFB_APP_SECRET_KEY=''    # secret key of facebook application (see preferences...)\nFB_APP_ACCESS_KEY=''    # access key of facebook application (see preferences...)\n```\n\n4. Twitter config data:\n```python\nTW_CONSUMER_KEY=''          # consumer key (see preferences...)\nTW_CONSUMER_SECRET=''       # consumer secret token (see preferences...)\nTW_ACCESS_TOKEN_KEY=''      # access token of twitter application (see preferences...)\nTW_ACCESS_TOKEN_SECRET=''   # access secret token of twitter application (see preferences...)\n```\n\n5. LinkedIn config data:\n```python\nLI_USERNAME=''  # username of the LinkedIn account\nLI_PASSWORD=''  # password for the specified username\n```\n\n6. 
MyMailRu config data:\n```python\nMM_APP_ID=0             # MyMailRu application id\nMM_USERNAME=''          # username of the MyMailRu account\nMM_PASSWORD=''          # password for the specified username\nMM_APP_SECRET_KEY=''    # secret key of MyMailRu application (see preferences...)\nMM_APP_PRIVATE_KEY=''   # private key of MyMailRu application (see preferences...)\n```\n\n7. OSINT sites config data:\n```python\nEMAILREP_API_KEY=''     # API token for https://emailrep.io\nDEHASHED_API_KEY=''     # API token for https://dehashed.com\n```\n\n\u003c/details\u003e\n\n***\n\n## Launching\n\nThe project can be launched in two modes:\n\nThe first mode runs a Python script.\n\n```shell\n   Usage: simple_run.py {path to user json-file}\n```\n\nThe second mode is a backend based on the Flask microframework. You can set the backend host and port in the `config.py` file. To launch the backend, run:\n```shell\n   python3 flask_backed_run.py\n```\n\nor\n\n```shell\n   ./flask_backed_run.py\n```\n\nThe following REST API endpoints are available:\n   - `/osint_scraping` scrapes all data from OSINT sites (currently only two resources are supported: https://emailrep.io and https://dehashed.com);\n   - `/social_scraping` scrapes all data from social network sites and applications (Vk, Facebook, Twitter, LinkedIn ...);\n   - `/full_scraping` scrapes all data from all resources.\n\n### Input JSON file/data\nThe user JSON file contains the contact information that the user specified\nwhen signing up to `cvcode`. 
Such information includes `Vkontakte ID`, `Facebook ID`,\n`UserAccessMarker`, etc.\n\n\u003cdetails\u003e\u003csummary\u003e\u003cb\u003eExample of this JSON\u003c/b\u003e\u003c/summary\u003e\n\n```json\n{\n    \"Vkontakte\": {\n        \"id\": \"123456789\"\n    },\n    \"LinkedIn\": {\n        \"id\": \"ivan-ivanov-123456789\"\n    },\n    \"Twitter\": {\n        \"id\": \"Ivan123456789\"\n    },\n    \"Facebook\": {\n        \"id\": \"101313123456789\",\n        \"user_access_token\": \"EAAMTR2pPmqUBACIvzm...\"\n    },\n    \"MyMailRu\": {\n        \"id\": \"ivan.ivanov@bk.ru\",\n        \"session_key\": \"dec21acb9b62bdaabe6ef89965d58e56\"\n    },\n    \"OSINT\": {\n        \"email\": \"ivan.ivanov@bk.ru\"\n    }\n}\n```\n\n\u003c/details\u003e\n\n### Output JSON file/data\nThe result is a JSON file or a JSON response from the backend. Example results can be found in the `Tests/Reulst/` directory ([see the input JSON file](#Tests/Users/yuliya_chesnokova.json)).\n\n***\n\n## Contacts\n\ntelegram: @sudo_udo\nemail: breadrock1@gmail.com\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbreadrock1%2Fsocialnetworkscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fbreadrock1%2Fsocialnetworkscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fbreadrock1%2Fsocialnetworkscraper/lists"}