https://github.com/zbetcheckin/pdf_analysis
Several PDF analysis reassembled with additional tips and tools
- Host: GitHub
- URL: https://github.com/zbetcheckin/pdf_analysis
- Owner: zbetcheckin
- Created: 2016-08-03T04:30:35.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2023-05-16T11:56:09.000Z (over 1 year ago)
- Last Synced: 2024-11-05T15:15:59.218Z (2 months ago)
- Size: 235 KB
- Stars: 321
- Watchers: 9
- Forks: 57
- Open Issues: 2
# PDF Analysis
> Several PDF analyses have already been published; I have gathered many of them here, along with additional tips and tools.
- [PDF format](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#pdf-format-page_facing_up)
- [Tools list](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#tools-list-wrench)
- [Quick Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#quick-analysis-rocket)
- [Complete Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#complete-analysis-mag_right)
- [Basic information](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#basic-informations-1)
- [Metadata](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#metadata)
- [Search for older versions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#search-for-older-versions)
- [Online Analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#online-analysis-1)
- [Statistics](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#statistics)
- [Visual analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#visual-analysis)
- [Go deeper in the analysis](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#go-deeper-in-the-analysis)
- [Displaying objects and actions structure](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#displaying-objects-and-actions-structure-1)
- [Map of the objects flows](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#map-of-the-objects-flows)
- [Actions](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#actions)
- [Compression](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#compression)
- [Embedded files](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#embeded-files)
- [Extract files / scripts / objects](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#extract-files--scripts--objects-1)
- [Conversion](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#conversion)
- [Encryption](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#encryption)
- [Javascript](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#javascript)
- [Flash](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#flash)
- [Sources](https://github.com/zbetcheckin/PDF_analysis/blob/master/README.md#sources-information_source)
## PDF Format :page_facing_up:
https://www.adobe.com/devnet/pdf/pdf_reference.html
https://blog.didierstevens.com/2008/04/09/quickpost-about-the-physical-and-logical-structure-of-pdf-files/
https://web.archive.org/web/20141010035745/http://gnupdf.org/Introduction_to_PDF
## Tools list :wrench:
Tool | URL
------------------------------------ | ---------------------------------------------
AnalyzePDF.py | https://github.com/hiddenillusion/AnalyzePDF
ByteForce | https://github.com/weaknetlabs/ByteForce
Caradoc | https://github.com/ANSSI-FR/caradoc
Didier Stevens suite | https://github.com/DidierStevens/DidierStevensSuite
dumppdf | https://packages.debian.org/jessie/python-pdfminer
forensics-all | https://packages.debian.org/jessie-backports/forensics-all
Origami | https://code.google.com/archive/p/origami-pdf/
ParanoiDF | https://github.com/patrickdw123/ParanoiDF
peepdf | https://github.com/jesparza/peepdf
PDF Xray | https://github.com/9b/pdfxray_public
pdf-parser | http://didierstevens.com/files/software/pdf-parser_V0_6_4.zip
pdf2john.py | https://github.com/magnumripper/JohnTheRipper/blob/unstable-jumbo/run/pdf2john.py
pdfcrack | https://packages.debian.org/jessie/pdfcrack
pdfextract | https://github.com/CrossRef/pdfextract
pdfobjflow.py | https://bitbucket.org/sebastiendamaye/pdfobjflow
pdfresurrect | https://packages.debian.org/jessie/pdfresurrect
PdfStreamDumper.exe | http://sandsprite.com/CodeStuff/PDFStreamDumper_Setup.exe
pdftk | https://packages.debian.org/en/jessie/pdftk
pdfxray_lite.py | https://github.com/9b/pdfxray_lite
poppler-utils | https://packages.debian.org/en/jessie/poppler-utils (pdftotext, pdfimages, pdftohtml, pdftops, pdfinfo, pdffonts, pdfdetach, pdfseparate, pdfsig, pdftocairo, pdftoppm, pdfunite)
pyew | https://packages.debian.org/en/jessie/pyew
qpdf | https://packages.debian.org/jessie/qpdf
swf_mastah.py | https://github.com/9b/pdfxray_public/blob/master/builder/swf_mastah.py
#### Existing list
http://blog.didierstevens.com/programs/pdf-tools/
https://github.com/sans-dfir/sift-files/tree/master/pdf-tools
## Quick Analysis :rocket:
#### Basic information
```
$ file file.pdf
$ pdfinfo -box -meta -js -rawdates file.pdf
```
#### Displaying objects and actions structure
```
$ python pdfid.py -aefv file.pdf
```
#### Search for /OpenAction /AA /Launch /GoTo /GoToR /SubmitForm /RichMedia (for Flash) /JS /JavaScript /URI - Encode - Cipher - Shell code - Obfuscation...
Automatically with ParanoiDF
```
$ python paranoiDF.py -fl file.pdf
```
Or with pdf-parser
```
$ python pdf-parser.py -v file.pdf
```
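For a first pass without any external tool, the names listed in the heading above can also be counted directly in the raw bytes. The following is a minimal Python sketch, not part of any tool in this list: the function name `count_suspicious_names` and the name list are illustrative assumptions, and the scan will miss names hidden in compressed streams or written with hex escapes.

```python
import re
from collections import Counter

# Names taken from the heading above; illustrative, not exhaustive.
SUSPICIOUS_NAMES = [
    b"/OpenAction", b"/AA", b"/Launch", b"/GoTo", b"/GoToR",
    b"/SubmitForm", b"/RichMedia", b"/JS", b"/JavaScript", b"/URI",
]


def count_suspicious_names(data: bytes) -> Counter:
    """Count whole-name occurrences of each suspicious name in raw PDF bytes."""
    counts = Counter()
    for name in SUSPICIOUS_NAMES:
        # Require that the next byte is not a regular name character,
        # so that /GoTo does not also match /GoToR.
        pattern = re.escape(name) + rb"(?![A-Za-z0-9_#.\-])"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts


if __name__ == "__main__":
    # Tiny hand-written fragment standing in for the bytes of file.pdf.
    sample = b"1 0 obj << /OpenAction 2 0 R >> endobj 2 0 obj << /S /JavaScript /JS (app.alert(1)) >>"
    for name, hits in count_suspicious_names(sample).items():
        if hits:
            print(name, hits)
```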
With a hexadecimal editor
```
$ bless file.pdf
```
#### Extract files / scripts / objects
Use pdf-parser to extract a JavaScript object, for example
```
$ pdf-parser --object 32 --raw file.pdf > extractedObject.js
```
pdfextract from Origami
```
$ pdfextract file.pdf
```
#### Online analysis
*Be careful not to leak any important, professional, or personal data, and not to expose your research.*
https://www.hybrid-analysis.com/
## Complete Analysis :mag_right:
### Basic information
```
$ file file.pdf
$ pdfinfo file.pdf
$ pdfinfo -box -meta -js -rawdates file.pdf
```
### Powerful Python tool to analyze PDFs and exploits
```
$ pyew file.pdf
```
### Another Python tool to explore PDFs
```
$ peepdf -fl file.pdf
$ peepdf --interactive file.pdf
```
#### Analysis under Windows
PDF Stream Dumper
https://github.com/dzzie/pdfstreamdumper
### Metadata
Get metadata
```
$ exiftool -a -u -g2 file.pdf
```
Get metadata recursively from the current directory
```
$ exiftool -r -ext pdf .
```
Change an element
```
$ exiftool -Title="New title" file.pdf
```
Remove metadata
```
$ exiftool -all= file.pdf && exiftool -all:all= file.pdf && qpdf --linearize file.pdf filewithoutmeta.pdf
$ mat file.pdf # latest version of mat doesn't support pdf format anymore...
```
Remove metadata recursively from the current directory:
*Quick and dirty, but it works well.*
*Filenames must not contain spaces for now; the command will be improved.*
```
$ find . -name "*.pdf" -print0 | while read -d $'\0' file; do echo ${file:2} && mv ${file:2} ${file:2}.pdf && exiftool -all= ${file:2}.pdf && exiftool -all:all= ${file:2}.pdf && qpdf --linearize ${file:2}.pdf ${file:2} && rm ${file:2}.pdf && rm ${file:2}.pdf_original; done
```
### Search for older versions
Search for older "hidden" versions
```
$ pdfresurrect file.pdf -i
$ exiftool -pdf-update:all= file.pdf
```
### Online Analysis
Name | URL
------------------------------------ | ---------------------------------------------
Malwr | https://malwr.com/submission/
Hybrid analysis | https://www.hybrid-analysis.com/
Malware Tracker | https://www.malwaretracker.com/pdf.php
VirusTotal | http://www.virustotal.com/
PDF examiner | http://www.pdfexaminer.com/
Document Analyzer | http://www.document-analyzer.net/
Jotti | https://virusscan.jotti.org/
PDF X-ray | http://www.pdfxray.com/
PDF Online | https://www.pdf-online.com/
Extract PDF | http://www.extractpdf.com
Char conversion | https://kt.pe/tools.html#conv/
### Statistics
Compute byte statistics, minimum and maximum entropy, ASCII count, and so on for a PDF
```
$ python byte-stats.py file.pdf
```
### Visual analysis
Visual analysis of a PDF or a binary file
http://binvis.io
## Go deeper in the analysis
### Displaying objects and actions structure
```
$ python pdfid.py --all --extra --force --verbose file.pdf
```
### Map of the objects flows
```
$ pdf-parser file.pdf | ./pdfobjflow
$ eog pdfobjflow.png
```
### Actions
Search for:
- /OpenAction and /AA specify the script or action to run automatically.
- /Names, /AcroForm and /Action can also specify and launch scripts or actions.
- /JavaScript specifies JavaScript to run.
- /GoTo changes the view to a specified destination within the PDF or in another PDF file.
- /Launch launches a program or opens a document.
- /URI accesses a resource by its URL.
- /SubmitForm and /GoToR can send data to a URL.
- /RichMedia can be used to embed Flash in a PDF.
- /ObjStm can hide objects inside an object stream.
- /JavaScript can be written as /J#61vaScript: beware of obfuscation with hex codes.

With ParanoiDF
```
$ python paranoiDF.py -fl file.pdf
```
With pdf-parser
```
$ python pdf-parser.py -v file.pdf
```
With a hexadecimal editor
```
$ bless file.pdf
```
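When reading raw bytes by hand, names obfuscated with `#xx` hex codes (such as /J#61vaScript above) are easy to miss. A minimal Python sketch of undoing that escaping is shown here; `normalize_pdf_names` is a hypothetical helper, and because it rewrites anything that looks like a name (including bytes inside binary streams or strings), its output is meant for searching, not for producing a valid PDF.

```python
import re

# A name starts with "/" and runs until whitespace or a PDF delimiter.
NAME_RE = re.compile(rb"/[^\s/<>\[\](){}%]+")
HEX_ESCAPE_RE = re.compile(rb"#([0-9A-Fa-f]{2})")


def normalize_pdf_names(data: bytes) -> bytes:
    """Replace #xx escapes inside PDF names with the byte they encode."""

    def decode_name(match):
        # Decode every #xx pair found inside this single name.
        return HEX_ESCAPE_RE.sub(
            lambda m: bytes([int(m.group(1), 16)]),
            match.group(0),
        )

    return NAME_RE.sub(decode_name, data)


if __name__ == "__main__":
    obfuscated = b"<< /S /J#61vaScript /JS 5 0 R >>"
    print(normalize_pdf_names(obfuscated))
```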
With dumppdf
```
$ dumppdf -a file.pdf
```
### Compression
Search for compression
```
$ strings file.pdf | grep --color "/Filter"
```
Two ways to decompress a PDF
```
$ pdftk compressed.pdf output uncompressed.pdf uncompress
$ qpdf --stream-data=uncompress compressed.pdf uncompressed.pdf
```
### Embedded files
Four ways to search for embedded files/scripts inside a PDF
```
$ binwalk file.pdf
$ foremost -a -v file.pdf
$ hachoir-subfile file.pdf
$ scalpel file.pdf
```
### Extract files / scripts / objects
Extract the file corresponding to an object ID, a JPG for example
```
$ dumppdf.py -i 32 -r file.pdf > image.jpg
```
Extract JavaScript from an object, for example
```
$ pdf-parser --object 32 --raw file.pdf > extractedObject.js
```
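If the object dumped with --raw is still compressed, the output will not be readable. As a rough illustration of what decompressing a FlateDecode stream involves, here is a Python sketch using only the standard `zlib` module. `inflate_streams` is a hypothetical helper: it locates streams by searching for the `stream` and `endstream` keywords instead of honoring /Length, and it silently skips anything zlib cannot inflate.

```python
import re
import zlib

# Naive stream locator: everything between "stream<EOL>" and "endstream".
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)


def inflate_streams(data: bytes) -> list:
    """Return the decompressed content of every stream that zlib can inflate."""
    results = []
    for match in STREAM_RE.finditer(data):
        # decompressobj tolerates the end-of-line bytes left after the data.
        inflater = zlib.decompressobj()
        try:
            results.append(inflater.decompress(match.group(1)))
        except zlib.error:
            # Not FlateDecode (or damaged): ignored in this simple sketch.
            continue
    return results


if __name__ == "__main__":
    payload = b"app.alert('hello');"
    pdf_fragment = (
        b"5 0 obj\n<< /Filter /FlateDecode >>\nstream\n"
        + zlib.compress(payload)
        + b"\nendstream\nendobj\n"
    )
    print(inflate_streams(pdf_fragment))
```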
pdfextract from Origami
```
$ pdfextract file.pdf
```
### Conversion
PDF to Postscript
```
$ pdftops file.pdf
```
PDF to TXT
```
$ pdftotext file.pdf
```
PDF to JPG
```
$ convert file.pdf image.jpg
```
Non-exhaustive list of possible conversions
### LZWDecode filter
Convert a PDF to Postscript without the LZWDecode filter
```
$ qpdf --stream-data=uncompress original.pdf decoded.pdf # Decompress it
$ pdftops decoded.pdf decoded.ps # Convert it
```
### Encryption
PDF supports RC4 encryption (40- to 128-bit keys) and AES (128-bit keys, and 256-bit keys with Extension Level 3).
Beware of empty passwords.
#### Password recovery
Run a dictionary attack on a PDF with pdfcrack
```
$ pdfcrack -w yourDictionnary.txt file.pdf
```
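Before running a dictionary attack, it can be worth confirming that the file is encrypted at all. An encrypted PDF references an encryption dictionary through an /Encrypt entry in its trailer, so a crude check is to look for that name in the raw bytes. The Python sketch below is only a heuristic under that assumption: `looks_encrypted` is a hypothetical name, and the same bytes could in principle also appear inside unrelated data.

```python
import re

# "/Encrypt" as a whole name, so that /EncryptMetadata alone does not count.
ENCRYPT_RE = re.compile(rb"/Encrypt\b")


def looks_encrypted(data: bytes) -> bool:
    """Heuristic: True if the raw bytes contain an /Encrypt entry."""
    return ENCRYPT_RE.search(data) is not None


if __name__ == "__main__":
    # Hand-written trailers standing in for the bytes of real files.
    print(looks_encrypted(b"trailer << /Root 1 0 R /Encrypt 12 0 R >>"))
    print(looks_encrypted(b"trailer << /Root 1 0 R >>"))
```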
With john
```
$ pdf2john.py file.pdf > x.hash
$ john --wordlist=yourDictionnary.txt x.hash
```
### Javascript
Two ways to search for JavaScript
```
$ pdf-parser --search=JavaScript file.pdf
$ pdfinfo -js file.pdf
```
Extract an object
With jsunpack
```
$ jsunpack-extractjs file.pdf
```
With pdf-parser
```
$ pdf-parser --object 32 --raw file.pdf > file.js
```
With pdfextract from Origami
```
$ pdfextract --js file.pdf
```
#### De-obfuscate
https://github.com/urule99/jsunpack-n
Online:
http://jsunpack.jeek.org/java/
Malzilla and SpiderMonkey can also help deobfuscate JavaScript.
Malzilla :
http://www.malzilla.org/downloads.html
SpiderMonkey :
http://www.didierstevens.com/files/software/js-1.7.0-mod.tar.gz
More details coming soon.
#### Add JavaScript to PDF
https://didierstevens.com/files/software/make-pdf_V0_1_6.zip
https://neonprimetime.blogspot.fr/2015/03/how-to-add-javascript-to-pdf.html
#### Disarming a PDF
```
$ python pdfid.py --disarm file.pdf
```
### Flash
Search for flash
```
$ python pdf-parser.py --search flash file.pdf
```
Extract Flash with swf_mastah
```
$ python swf_mastah.py -f file.pdf -o ./
$ file *.swf
```
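What `file` reports for an extracted SWF comes down to the first bytes of the file: a three-byte signature followed by a one-byte version. The Python sketch below checks the same thing by hand; `describe_swf` and the description strings are illustrative assumptions, not the output format of any tool.

```python
# SWF signatures: the first letter tells how the body is stored.
SWF_SIGNATURES = {
    b"FWS": "uncompressed SWF",
    b"CWS": "zlib-compressed SWF",
    b"ZWS": "LZMA-compressed SWF",
}


def describe_swf(header: bytes):
    """Return a short description if the bytes start like a SWF file, else None."""
    if len(header) < 4:
        return None
    kind = SWF_SIGNATURES.get(bytes(header[:3]))
    if kind is None:
        return None
    # The fourth byte of a SWF header is the format version.
    return f"{kind}, version {header[3]}"


if __name__ == "__main__":
    # In practice the header would come from open("flashFile.swf", "rb").read(4).
    print(describe_swf(b"CWS\x0a"))
    print(describe_swf(b"%PDF-1.4"))
```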
With pdf-parser
```
$ pdf-parser.py --object 32 --filter --raw file.pdf > flashFile.swf
$ file flashFile.swf
```
Analyze the Flash program
```
$ swfdump -Ddu flashFile.swf > flashFile.txt
```
More details coming soon.
## Sources :information_source:
https://blog.didierstevens.com/category/pdf/
http://www.decalage.info/file_formats_security/pdf
https://zeltser.com/analyzing-malicious-documents/
https://code.google.com/archive/p/corkami/wikis/PDFTricks.wiki
https://www.sans.org/reading-room/whitepapers/malicious/owned-malicious-pdf-analysis-33443
https://digital-forensics.sans.org/blog/2009/12/14/pdf-malware-analysis/
http://fileformats.archiveteam.org/wiki/PDF