Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/soxoj/marple
📖 Collect links to profiles by username through search engines and analyze with various plugins
osint scraper search-engine username-checker username-search
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/soxoj/marple
- Owner: soxoj
- License: mit
- Created: 2021-11-16T20:48:54.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2023-08-02T13:02:27.000Z (over 1 year ago)
- Last Synced: 2024-10-17T15:39:10.203Z (27 days ago)
- Topics: osint, scraper, search-engine, username-checker, username-search
- Language: Python
- Homepage:
- Size: 123 KB
- Stars: 215
- Watchers: 5
- Forks: 21
- Open Issues: 12
- Metadata Files:
  - Readme: README.md
  - License: LICENSE
Awesome Lists containing this project
README
# Marple
## Summary
Collect links to profiles by username through 10+ search engines ([see the full list below](#supported-sources)).
Features:
- multiple engines
- proxy support
- CSV file export
- plugins
- pdf metadata extraction
- social media info [extraction](socid_extractor)

## Quick Start
```
./marple.py soxoj
```

**Results**:
```
https://t.me/soxoj
Contact @soxoj - Telegram
https://github.com/soxoj
soxoj - GitHub
https://coder.social/soxoj
soxoj - Coder Social
https://gitmemory.com/soxoj
soxoj...
PDF files
https://codeby.net/attachments/v-0-0-1-social-osint-fundamentals-pdf.45770
Social OSINT fundamentals - Codeby.net
/Creator: Google...
Links: total collected 111 / unique with username in URL 97 / reliable 38 / documents 3
```

Advanced usage:
```
./marple.py soxoj --plugins metadata
./marple.py smirnov --engines google baidu -v
```

## Installation
All you need is Python 3. And pip. And the requirements, of course.
```
pip3 install -r requirements.txt
```

You need API keys for some search engines (see the requirements in [Supported sources](#supported-sources)). Export the keys as environment variables like this:
```
export YANDEX_KEY=key
```
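
If you plan to use several of the key-based engines at once, you can export all the relevant variables before running Marple; the variable names come from [Supported sources](#supported-sources), and the values below are placeholders:

```
export YANDEX_USER=your_user
export YANDEX_KEY=your_key
export SERPAPI_KEY=your_key
```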

## Options

You can set the 'junk threshold' with the option `-t` or `--threshold` (default 300) to get either more results or more reliable ones. A link's junk score is the sum of its URL length and the number of extra symbols adjacent to the username within the URL.

You can also increase the number of results requested from each search engine with the option `--results-count` (default 1000); currently this limit only applies to Google.
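
For example, a stricter threshold combined with verbose output makes it easy to see why each result was kept or discarded, and a larger `--results-count` requests more Google results (the username here is just the one from the Quick Start example):

```
./marple.py soxoj -t 150 -v
./marple.py soxoj --results-count 2000
```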
Other options:
```
  -h, --help            show this help message and exit
  -t THRESHOLD, --threshold THRESHOLD
                        Threshold to discard junk search results
  --results-count RESULTS_COUNT
                        Count of results parsed from each search engine
  --no-url-filter       Disable filtering results by usernames in URLs
  --engines {baidu,dogpile,google,bing,ask,aol,torch,yandex,naver,paginated,yahoo,startpage,duckduckgo,qwant}
                        Engines to run (you can choose more than one)
  --plugins {socid_extractor,metadata,maigret} [{socid_extractor,metadata,maigret} ...]
                        Additional plugins to analyze links
  -v, --verbose         Display junk score for each result
  -d, --debug           Display all the results from sources and debug messages
  -l, --list            Display only list of all the URLs
  --proxy PROXY         Proxy string (e.g. https://user:[email protected]:8080)
  --csv CSV             Save results to the CSV file
```
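
Options can be combined freely; for example, to route requests through a proxy and save the results to a CSV file (the proxy address matches the help text above, and the file name is just an illustration):

```
./marple.py soxoj --proxy https://user:[email protected]:8080 --csv results.csv
```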

## Supported sources

| Name | Method | Requirements |
| ------------------- | --------------------------------------| ----------------- |
| [Google](http://google.com/) | scraping | None, works out of the box; frequent captcha |
| [DuckDuckGo](https://duckduckgo.com/) | scraping | None, works out of the box |
| [Yandex](https://yandex.ru/) | XML API | [Register and get YANDEX_USER/YANDEX_KEY tokens](https://github.com/fluquid/yandex-search) |
| [Naver](https://www.naver.com/) | SerpApi | [Register and get SERPAPI_KEY token](https://serpapi.com/) |
| [Baidu](https://www.baidu.com/) | SerpApi | [Register and get SERPAPI_KEY token](https://serpapi.com/) |
| [Aol](https://search.aol.com/) | scraping | None, scrapes with pagination |
| [Ask](https://www.ask.com/) | scraping | None, scrapes with pagination |
| [Bing](https://www.bing.com/) | scraping | None, scrapes with pagination |
| [Startpage](https://www.startpage.com/) | scraping | None, scrapes with pagination |
| [Yahoo](https://yahoo.com/) | scraping | None, scrapes with pagination |
| [Mojeek](https://www.mojeek.com) | scraping | None, scrapes with pagination |
| [Dogpile](https://www.dogpile.com/) | scraping | None, scrapes with pagination |
| [Torch](http://torchdeedp3i2jigzjdmfpn5ttjhthh5wbmda2rr3jvqjg5p77c54dqd.onion) | scraping | Tor proxies (socks5://localhost:9050 by default), scrapes with pagination |
| [Qwant](https://www.qwant.com/) | Qwant API | Check [whether search is available](https://www.qwant.com/) in your exit IP country; scrapes with pagination |
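
Engines that need credentials or a proxy can be selected explicitly with `--engines`; for example (assuming the required keys are already exported as described in [Installation](#installation)):

```
# Naver and Baidu require a SERPAPI_KEY (see the table above)
./marple.py soxoj --engines naver baidu

# Torch is queried through Tor proxies (socks5://localhost:9050 by default)
./marple.py soxoj --engines torch
```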

## Development & testing

```sh
$ python3 -m pytest tests
```

## TODO
- [x] Proxy support
- [ ] Engine selection through arguments
- [ ] Exact search filter
- [ ] Engine-specific filters
- [ ] 'Username in title' check

## Mentions and articles
[Sector035 - Week in OSINT #2021-50](https://sector035.nl/articles/2021-50)
[OS2INT - MARPLE: IDENTIFYING AND EXTRACTING SOCIAL MEDIA USER LINKS](https://os2int.com/toolbox/identifying-and-extracting-social-media-user-links-with-marple/)