https://github.com/jplusplus/janus
A basic tool to retrieve document metadata from a domain name
- Host: GitHub
- URL: https://github.com/jplusplus/janus
- Owner: jplusplus
- Created: 2013-06-13T13:00:57.000Z (about 13 years ago)
- Default Branch: master
- Last Pushed: 2013-07-09T17:21:05.000Z (almost 13 years ago)
- Last Synced: 2024-04-14T04:55:28.363Z (about 2 years ago)
- Language: JavaScript
- Homepage:
- Size: 775 KB
- Stars: 3
- Watchers: 5
- Forks: 3
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Janus
## Extract metadata from PDFs, fast
Janus is a simple tool to extract all metadata from all PDF files on a single domain. Type in a domain name, for instance "gov.uk", and get a list of all PDFs with their metadata (e.g. author, creation date and modification date). Metadata analysis is a great source of information for investigative journalists.
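Creation and modification dates are stored in a PDF's document information dictionary as strings such as `D:20130613130057+02'00'` (the PDF date format `D:YYYYMMDDHHmmSSOHH'mm'`, where every field after the year is optional). The sketch below is a hypothetical helper, `parsePdfDate`, not taken from the Janus source; it assumes the raw string has already been pulled out of the PDF and shows one way to turn it into a JavaScript `Date` so documents can be sorted or compared. A missing time-zone designator is treated as UTC, which is an assumption.

```javascript
// Hypothetical helper (not Janus code): convert a PDF date string into a JS Date.
// Format: D:YYYYMMDDHHmmSSOHH'mm', every field after the year optional,
// O being '+', '-' or 'Z'. Returns null when the string does not match.
function parsePdfDate(value) {
  const match =
    /^(?:D:)?(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?(?:([+\-Z])(?:(\d{2})'?(\d{2})?'?)?)?$/.exec(
      value.trim()
    );
  if (!match) return null;

  // Missing fields fall back to the earliest possible value.
  const [, year, month = "01", day = "01", hour = "00", minute = "00", second = "00",
    sign, offH = "00", offM = "00"] = match;

  // Read the local timestamp as if it were UTC...
  let ms = Date.UTC(+year, +month - 1, +day, +hour, +minute, +second);

  // ...then remove the declared offset (local = UTC + offset). 'Z' or no sign: keep as UTC.
  if (sign === "+" || sign === "-") {
    const offsetMs = (+offH * 60 + +offM) * 60 * 1000;
    ms += sign === "+" ? -offsetMs : offsetMs;
  }
  return new Date(ms);
}
```

For example, `parsePdfDate("D:20130613130057+02'00'").toISOString()` returns `"2013-06-13T11:00:57.000Z"`.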
In the future, Janus will include other data types and go further in the analysis, for instance by clustering related metadata together (such as the individuals who appear in it).
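Clustering is only planned, so there is no Janus code to point to. As an illustration of the simplest possible version, the hypothetical `groupByAuthor` below groups documents whose Author field is identical once case and whitespace are normalised. The `{ url, author }` record shape is an assumption, not the project's data model. Clustering real people would also need fuzzy matching (initials, diacritics, "Last, First" ordering), which this sketch does not attempt.

```javascript
// Hypothetical sketch (not Janus code): cluster documents by the Author metadata field.
// Each document is assumed to look like { url: string, author: string }.
function groupByAuthor(documents) {
  const clusters = new Map();
  for (const doc of documents) {
    // Normalise case and whitespace so "J. Smith" and "j.  smith" fall together.
    const key = (doc.author || "").trim().replace(/\s+/g, " ").toLowerCase();
    if (!key) continue; // skip PDFs with an empty or missing Author field
    if (!clusters.has(key)) clusters.set(key, []);
    clusters.get(key).push(doc.url);
  }
  return clusters; // Map of normalised author name -> list of document URLs
}
```

Given two PDFs authored by "J. Smith" and "j.  smith" plus one with no author, the result is a single cluster keyed `"j. smith"` holding the first two URLs.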
It was developed by Journalism++'s [Pierre Bellon](http://twitter.com/toutenrab) and [Leo Wallentin](http://twitter.com/leo_wallentin), who was an embedded news nerd there in June 2013.
## How to install it
- make sure Node.js is installed on your computer
- get the sources
```
git clone https://github.com/jplusplus/janus.git
```
- install the dependencies
```
cd janus
npm install
```
- copy the configuration file template
```
cp config.template.json config.json
```
- then enter your Bing account key in `config.json`
## Launch the application
You can launch it by running `coffee app.coffee`, but I recommend using nodemon, which restarts the application whenever a source file changes:
```
npm install -g nodemon
nodemon app.coffee
```
## Troubleshooting
- I get an error when I run `npm install`

  You may have an older version of Node.js; make sure Node.js >= 9.4.1 is installed on your system.
## TODO
- handle image search
- handle doc & docx search
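The README does not say how Janus finds files on a domain. Assuming it relies on the web search behind the Bing account key, using the `site:` and `filetype:` operators that Bing understands, extending the search to `doc` and `docx` could start from a query builder like the hypothetical `buildDocumentQuery` below. It is a sketch, not the project's code; whether a given search API honours parentheses and `OR` should be checked against its documentation, and sending one query per file type is the conservative alternative.

```javascript
// Hypothetical sketch (not Janus code): build a search query restricted to one
// domain and one or more file types, e.g. "site:gov.uk filetype:pdf".
function buildDocumentQuery(domain, fileTypes = ["pdf"]) {
  if (fileTypes.length === 0) throw new Error("at least one file type is required");

  // Accept "gov.uk" as well as "https://gov.uk/some/path": keep only the host.
  const host = domain.trim().replace(/^https?:\/\//, "").replace(/\/.*$/, "");

  const types = fileTypes.map((type) => `filetype:${type.toLowerCase()}`);
  // Several file types are OR-ed together inside parentheses.
  const typeClause = types.length > 1 ? `(${types.join(" OR ")})` : types[0];
  return `site:${host} ${typeClause}`;
}
```

For example, `buildDocumentQuery("gov.uk", ["pdf", "doc", "docx"])` returns `"site:gov.uk (filetype:pdf OR filetype:doc OR filetype:docx)"`.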