https://github.com/dongju93/extract-ti-from-reports

Convert PDFs to text, then transform that text into structured JSON objects for Threat Intelligence.

json jupyter-notebook pdf pdf-to-text python regex text-to-json threat-intelligence

Host: GitHub
URL: https://github.com/dongju93/extract-ti-from-reports
Owner: dongju93
Created: 2023-08-22T02:11:02.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-03-24T09:18:08.000Z (about 1 year ago)
Last Synced: 2025-01-07T12:44:38.450Z (5 months ago)
Topics: json, jupyter-notebook, pdf, pdf-to-text, python, regex, text-to-json, threat-intelligence
Language: Jupyter Notebook
Homepage:
Size: 134 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

pdf_to_text
-
Converts .PDF files to .TXT using the [pdfminer.six](https://github.com/pdfminer/pdfminer.six) library.
```
pip install pdfminer.six
```
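
As a rough illustration of this step, the sketch below converts every PDF in a folder to a text file with `pdfminer.high_level.extract_text`. The directory names `reports` and `text` and the helper names are assumptions made for the example; they are not taken from the notebooks in this repository.

```python
from pathlib import Path


def output_path_for(pdf_path: Path, out_dir: Path) -> Path:
    """Map reports/foo.pdf to <out_dir>/foo.txt (hypothetical naming scheme)."""
    return out_dir / (pdf_path.stem + ".txt")


def pdf_to_text(pdf_path: Path, out_dir: Path) -> Path:
    """Extract the text of one PDF and write it to a .txt file."""
    # Imported lazily so the path helper works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    out_dir.mkdir(parents=True, exist_ok=True)
    txt_path = output_path_for(pdf_path, out_dir)
    txt_path.write_text(extract_text(str(pdf_path)), encoding="utf-8")
    return txt_path


if __name__ == "__main__":
    # "reports" and "text" are placeholder directory names.
    for pdf in sorted(Path("reports").glob("*.pdf")):
        print(pdf_to_text(pdf, Path("text")))
```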

text_to_json_ti
-
Converts .TXT to .JSON, using regular expressions to split the text into predetermined JSON fields.
URL and Filename extraction also picks up incorrect matches (items that are not malicious), so these are collected into a whitelist array that is used to filter them out.
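
A minimal sketch of the idea, assuming two fields named `url` and `filename`: the regular expressions, the field names, and the whitelist entries below are illustrative placeholders, not the patterns the notebook actually uses.

```python
import json
import re

# Illustrative patterns only; real reports usually need more careful rules.
URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+", re.IGNORECASE)
FILENAME_RE = re.compile(r"\b[\w\-]+\.(?:exe|dll|docx?|xlsx?|ps1|js|zip)\b", re.IGNORECASE)

# Hypothetical whitelist of known-benign matches, stored in lowercase.
WHITELIST = {"https://www.example.com/about", "readme.docx"}


def text_to_ti(text: str) -> dict:
    """Extract de-duplicated URL and filename items, minus whitelisted ones."""

    def unique(pattern: re.Pattern) -> list:
        # dict.fromkeys keeps first-seen order while dropping duplicates.
        seen = dict.fromkeys(m.group(0) for m in pattern.finditer(text))
        return [value for value in seen if value.lower() not in WHITELIST]

    return {"url": unique(URL_RE), "filename": unique(FILENAME_RE)}


if __name__ == "__main__":
    sample = (
        "Dropper invoice.exe contacted http://bad.example.net/gate twice. "
        "See https://www.example.com/about and readme.docx for details."
    )
    print(json.dumps(text_to_ti(sample), indent=2))
```

Keeping the whitelist as a separate set means benign matches can be added over time without touching the patterns themselves.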

field_to_excel
-
Reads the data for a specific field from all of the .JSON files, loads it into a DataFrame, and saves it to an .XLSX file, with statistics as needed.
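
The sketch below shows one way this could look with pandas. It assumes each .JSON file holds an object whose fields map to lists of values, and that `openpyxl` is installed for .XLSX output; the function names, sheet names, and the per-value count used as the "statistics" are all hypothetical choices for the example.

```python
import json
from pathlib import Path


def collect_field(json_dir: Path, field: str) -> list:
    """Gather every value of `field` from the *.json files in json_dir."""
    rows = []
    for path in sorted(json_dir.glob("*.json")):
        data = json.loads(path.read_text(encoding="utf-8"))
        # Assumes each file is an object like {"url": [...], "filename": [...]}.
        for value in data.get(field, []):
            rows.append({"report": path.stem, field: value})
    return rows


def field_to_excel(json_dir: Path, field: str, xlsx_path: Path) -> None:
    """Write the collected values and a simple frequency table to one workbook."""
    # Imported lazily; writing .xlsx also requires openpyxl.
    import pandas as pd

    df = pd.DataFrame(collect_field(json_dir, field), columns=["report", field])
    counts = df[field].value_counts().rename_axis(field).reset_index(name="count")
    with pd.ExcelWriter(xlsx_path) as writer:
        df.to_excel(writer, sheet_name="values", index=False)
        counts.to_excel(writer, sheet_name="counts", index=False)


if __name__ == "__main__":
    # "json" and "url_stats.xlsx" are placeholder paths.
    field_to_excel(Path("json"), "url", Path("url_stats.xlsx"))
```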
