https://github.com/Josue87/MetaFinder

Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata

Host: GitHub
URL: https://github.com/Josue87/MetaFinder
Owner: Josue87
License: gpl-3.0
Created: 2020-12-09T12:38:58.000Z (almost 5 years ago)
Default Branch: main
Last Pushed: 2024-01-19T23:22:13.000Z (almost 2 years ago)
Last Synced: 2024-11-07T04:19:41.325Z (about 1 year ago)
Topics: crawler, metadata, osint
Language: Python
Homepage:
Size: 53.7 KB
Stars: 195
Watchers: 8
Forks: 32
Open Issues: 4
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

osint_stuff_tool_collection - MetaFinder
awesome-hacking-lists - Josue87/MetaFinder - Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata (Python)

README

# MetaFinder
Search for documents in a domain through Search Engines. The objective is to extract metadata. 






## Installation:

```
> pip3 install metafinder
```

To upgrade an existing installation:

```
> pip3 install metafinder --upgrade
```

## Usage 

MetaFinder can be used in two ways: from the command line or from Python code.

### CLI

```
metafinder -d domain.com -l 20 -o folder [-t 10] -go -bi -ba
```

Parameters:

* d: Specifies the target domain.
* l: Specifies the maximum number of results to retrieve from the search engines.
* o: Specifies the path where the report is saved.
* t: Optional. Sets the number of threads (4 by default).
* v: Shows the MetaFinder version.
* Search engines to select (Google by default):
  * go: Optional. Search in Google.
  * bi: Optional. Search in Bing.
  * ba: Optional. Search in Baidu. (Experimental)
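When MetaFinder is driven from a script, it can help to build the command line programmatically. The following is a minimal sketch, not part of MetaFinder itself: the helper `build_metafinder_command` and the `ENGINE_FLAGS` mapping are hypothetical names introduced here, and only the flags listed above (`-d`, `-l`, `-o`, `-t`, `-go`, `-bi`, `-ba`) are used.

```python
from typing import List, Optional, Sequence

# Engine flags taken from the parameter list above.
# The dictionary keys are hypothetical names chosen for this sketch.
ENGINE_FLAGS = {"google": "-go", "bing": "-bi", "baidu": "-ba"}


def build_metafinder_command(
    domain: str,
    limit: int,
    output_dir: str,
    threads: Optional[int] = None,
    engines: Sequence[str] = ("google",),
) -> List[str]:
    """Hypothetical helper: assemble the argument list for the metafinder CLI."""
    cmd = ["metafinder", "-d", domain, "-l", str(limit), "-o", output_dir]
    if threads is not None:
        # -t is optional; the tool defaults to 4 threads when it is omitted.
        cmd += ["-t", str(threads)]
    for engine in engines:
        # Raises KeyError for an engine name this sketch does not know about.
        cmd.append(ENGINE_FLAGS[engine])
    return cmd


if __name__ == "__main__":
    cmd = build_metafinder_command(
        "domain.com", 20, "folder", threads=10, engines=("google", "bing", "baidu")
    )
    print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where `metafinder` is installed. Passing a list rather than a single string avoids shell quoting issues with the domain or output path.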

### In Code

```
import metafinder.extractor as metadata_extractor

documents_limit = 5
domain = "target_domain"

# Search one engine for documents on the domain and extract their metadata.
result = metadata_extractor.extract_metadata_from_google_search(domain, documents_limit)
# result = metadata_extractor.extract_metadata_from_bing_search(domain, documents_limit)
# result = metadata_extractor.extract_metadata_from_baidu_search(domain, documents_limit)

# Authors and software found in the documents.
authors = result.get_authors()
software = result.get_software()

# Print the URL and metadata of each document.
for k, v in result.get_metadata().items():
    print(f"{k}:")
    print(f"|_ URL: {v['url']}")
    for metadata, value in v['metadata'].items():
        print(f"|__ {metadata}: {value}")

# Extract metadata from a single local file.
document_name = "test.pdf"
try:
    metadata_file = metadata_extractor.extract_metadata_from_document(document_name)
    for k, v in metadata_file.items():
        print(f"{k}: {v}")
except FileNotFoundError:
    print("File not found")
```
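The loop over `result.get_metadata()` above implies a dictionary that maps each document to a `url` and a nested `metadata` dictionary. As a rough sketch of how such a structure could be summarized, the snippet below counts the values of one metadata field across documents. It works on plain dictionaries and does not call MetaFinder; the helper `count_field`, the sample data, and the field names `Author` and `Producer` are assumptions made for illustration, since the actual keys depend on the documents found.

```python
from collections import Counter
from typing import Any, Dict


def count_field(metadata_by_document: Dict[str, Dict[str, Any]], field: str) -> Counter:
    """Hypothetical helper: count how often each value of `field` appears."""
    counts: Counter = Counter()
    for info in metadata_by_document.values():
        # Each entry is assumed to look like {"url": ..., "metadata": {...}}.
        value = info.get("metadata", {}).get(field)
        if value:
            counts[str(value)] += 1
    return counts


# Made-up sample data in the assumed shape; not real MetaFinder output.
sample = {
    "report.pdf": {
        "url": "https://example.com/report.pdf",
        "metadata": {"Author": "alice", "Producer": "LibreOffice"},
    },
    "slides.pptx": {
        "url": "https://example.com/slides.pptx",
        "metadata": {"Author": "bob"},
    },
    "notes.docx": {
        "url": "https://example.com/notes.docx",
        "metadata": {"Author": "alice"},
    },
}

if __name__ == "__main__":
    for author, n in count_field(sample, "Author").most_common():
        print(f"{author}: {n} document(s)")
```

With real results, `result.get_metadata()` would take the place of `sample`, provided it has the shape assumed here.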

## Example

![image](https://user-images.githubusercontent.com/16885065/118243158-69ee7600-b49e-11eb-9562-2dc1fab59d67.png)

# Author

This project has been developed by:

* **Josué Encinar García** -- [@JosueEncinar](https://twitter.com/JosueEncinar)

# Contributors

* **Félix Brezo Fernández** -- [@febrezo](https://twitter.com/febrezo)

# Disclaimer!

The software is designed to leave no trace in the documents we upload to a domain. The author is not responsible for any illegitimate use.
