https://github.com/soxoj/socid-extractor
⛏️ Extract accounts info from personal pages on various sites for OSINT purpose
- Host: GitHub
- URL: https://github.com/soxoj/socid-extractor
- Owner: soxoj
- License: gpl-3.0
- Created: 2019-11-17T18:30:06.000Z (almost 5 years ago)
- Default Branch: master
- Last Pushed: 2024-04-10T10:36:21.000Z (7 months ago)
- Last Synced: 2024-10-17T15:38:03.887Z (22 days ago)
- Topics: identifiers, osint, parsing, privacy, socid-extractor, socmint, uid
- Language: Python
- Homepage:
- Size: 338 KB
- Stars: 708
- Watchers: 22
- Forks: 74
- Open Issues: 15
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project
- project-awesome - soxoj/socid-extractor - ⛏️ Extract accounts info from personal pages on various sites for OSINT purpose (Python)
README
# socid_extractor
Extract information about a user from profile webpages / API responses and save it in machine-readable format.
## Usage
As a command-line tool:
```
$ socid_extractor --url https://www.deviantart.com/muse1908
country: France
created_at: 2005-06-16 18:17:41
gender: female
username: Muse1908
website: www.patreon.com/musemercier
links: ['https://www.facebook.com/musemercier', 'https://www.instagram.com/muse.mercier/', 'https://www.patreon.com/musemercier']
tagline: Nothing worth having is easy...
```

Without installing:
```
$ ./run.py --url https://www.deviantart.com/muse1908
```

As a Python library:
```
>>> import socid_extractor, requests
>>> r = requests.get('https://www.patreon.com/annetlovart')
>>> socid_extractor.extract(r.text)
{'patreon_id': '33913189', 'patreon_username': 'annetlovart', 'fullname': 'Annet Lovart', 'links': "['https://www.facebook.com/322598031832479', 'https://www.instagram.com/annet_lovart', 'https://twitter.com/annet_lovart', 'https://youtube.com/channel/UClDg4ntlOW_1j73zqSJxHHQ']"}
```

## Installation

```sh
$ pip3 install socid-extractor
```

The latest development version can be installed directly from GitHub:

```sh
$ pip3 install -U git+https://github.com/soxoj/socid-extractor.git
```
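In the Python library example in the Usage section, the `links` value comes back as a string that merely looks like a Python list (note the surrounding double quotes). The following is a minimal sketch, not part of socid_extractor itself: the helper name `normalize_fields` is hypothetical, and it assumes that such list-like strings are valid Python literals, decoding them with `ast.literal_eval` before writing the whole result out as JSON. The sample dictionary is copied from that example, so the snippet runs without network access or the package installed.

```python
import ast
import json


def normalize_fields(info: dict) -> dict:
    """Return a copy of an extraction result with list-like strings decoded.

    Hypothetical helper: assumes values such as "['https://...']" are valid
    Python literals; anything that fails to parse is kept unchanged.
    """
    result = {}
    for key, value in info.items():
        if isinstance(value, str) and value.startswith("[") and value.endswith("]"):
            try:
                value = ast.literal_eval(value)
            except (ValueError, SyntaxError):
                pass  # not a literal after all; keep the raw string
        result[key] = value
    return result


# Sample output copied from the Python library example in the Usage section.
sample = {
    "patreon_id": "33913189",
    "patreon_username": "annetlovart",
    "fullname": "Annet Lovart",
    "links": "['https://www.facebook.com/322598031832479', "
             "'https://www.instagram.com/annet_lovart', "
             "'https://twitter.com/annet_lovart', "
             "'https://youtube.com/channel/UClDg4ntlOW_1j73zqSJxHHQ']",
}

normalized = normalize_fields(sample)
print(json.dumps(normalized, indent=2))  # machine-readable JSON with a real list
```

`ast.literal_eval` only evaluates literals, so unlike `eval` it will not execute code that happens to be embedded in a scraped page; values that fail to parse are left as raw strings.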
## Sites and methods
[More than 100 methods](https://github.com/soxoj/socid-extractor/blob/master/METHODS.md) for different sites and platforms are supported!
- Google (all documents pages, maps contributions), cookies required
- Yandex (disk, albums, znatoki, music, realty, collections), cookies required to prevent captcha blocks
- Mail.ru (my.mail.ru user mainpage, photo, video, games, communities)
- Facebook (user & group pages)
- VK.com (user page)
- OK.ru (user page)
- Medium
- Flickr
- Tumblr
- TikTok
- GitHub

...and many others.
You can also check the [tests file](https://github.com/soxoj/socid-extractor/blob/master/tests/test_e2e.py) for data examples and the [schemes file](https://github.com/soxoj/socid-extractor/blob/master/socid_extractor/schemes.py) to explore all the methods.
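Some methods (Google, Yandex) require cookies. How socid_extractor itself accepts cookies is not covered here, so the following is only a sketch under stated assumptions: it loads cookies exported from a browser in the Netscape `cookies.txt` format with the standard library's `http.cookiejar.MozillaCookieJar` and turns them into a plain `{name: value}` dict. The helper name `load_cookies`, the file, the domain, and the cookie name and value are all placeholders, not values any method is known to need.

```python
import http.cookiejar
import os
import tempfile


def load_cookies(path: str) -> dict:
    """Read a Netscape-format cookies.txt file into a {name: value} dict.

    ignore_discard/ignore_expires keep session cookies and cookies whose
    expiry date has passed, which browser exports often contain.
    """
    jar = http.cookiejar.MozillaCookieJar(path)
    jar.load(ignore_discard=True, ignore_expires=True)
    return {cookie.name: cookie.value for cookie in jar}


# Placeholder cookie file; real values would come from a browser export.
# Fields are tab-separated: domain, include-subdomains flag (must be TRUE
# when the domain starts with a dot), path, secure flag, expiry (Unix time),
# name, value.
COOKIES_TXT = (
    "# Netscape HTTP Cookie File\n"
    ".example.com\tTRUE\t/\tTRUE\t2147483647\tSID\tplaceholder-value\n"
)

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write(COOKIES_TXT)
    cookies_path = f.name

cookies = load_cookies(cookies_path)
os.remove(cookies_path)
print(cookies)
```

With the `requests` package installed, `r = requests.get(url, cookies=cookies)` sends these cookies, and the page text can then be passed to `socid_extractor.extract(r.text)` as in the Usage section.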
## When it may be useful
- Getting all available info by username and/or account UID. Examples: [Week in OSINT](https://medium.com/week-in-osint/getting-a-grasp-on-googleids-77a8ab707e43), [OSINTCurious](https://osintcurio.us/2019/10/01/searching-instagram-part-2/)
- User tracking: checking whether an account was previously known (by ID) even if all of its public info has changed. Examples: [Aware Online](https://www.aware-online.com/en/importance-of-user-ids-in-social-media-investigations/)
- Searching by commonly used cross-service UIDs (GAIA ID, Facebook UID, Yandex Public ID, etc.)
  - DB leaks of forums and platforms in SQL format
  - Indexed links that contain the target profile ID
- Searching for tracking data by comparison with other IDs - [how it works](https://www.eff.org/wp/behind-the-one-way-mirror), [how it can be used](https://www.nytimes.com/interactive/2019/12/19/opinion/location-tracking-cell-phone.html).
- Law enforcement investigations

## SOWEL classification
This tool uses the following OSINT techniques:
- [SOTL-1.4. Analyze Internal Identifiers](https://sowel.soxoj.com/internal-identifiers)
- [SOTL-11.1. Check Outdated And Unused Functionality](https://sowel.soxoj.com/outdated-unused-functionality)

## Tools using socid_extractor
- [Maigret](https://github.com/soxoj/maigret) - powerful namechecker; generates a report with all available info from the accounts found.
- [TheScrapper](https://github.com/champmq/TheScrapper) - scrape emails, phone numbers and social media accounts from a website.
- [InfoHunter](https://github.com/sweetnight19/InfoHunter) - An open source OSINT tool that allows you to search, collect and analyze information online to get a complete picture of the person or company you are interested in.
- [YaSeeker](https://github.com/HowToFind-bot/YaSeeker) - tool to gather all available information about a Yandex account by login/email.
- [Marple](https://github.com/soxoj/marple) - scrape search engines results for a given username.
## Testing
```sh
python3 -m pytest tests/test_e2e.py -n 10 -k 'not cookies' -m 'not github_failed and not rate_limited'
```

## Contributing
Check the [separate page](https://github.com/soxoj/socid-extractor/blob/master/CONTRIBUTING.md) if you want to add new methods or fix anything.