https://github.com/brandonrobertz/foia-pdf-processing-system
A Django app that provides a workflow for managing FOIA response records, converting PDF to CSV and data cleaning
- Host: GitHub
- URL: https://github.com/brandonrobertz/foia-pdf-processing-system
- Owner: brandonrobertz
- License: MIT
- Created: 2020-10-08T21:00:41.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-04-06T04:33:39.000Z (about 3 years ago)
- Last Synced: 2025-01-19T15:38:43.216Z (5 months ago)
- Topics: django, foia-pdf-processing, investigative-journalism, workflow-management
- Language: Python
- Homepage:
- Size: 281 KB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# FOIA Records Processing System
A Django app that provides a workflow for managing FOIA response records,
converting PDF to CSV, and cleaning data.

This assumes you'll be using [FOIAmail](https://github.com/bettergov/foiamail)
to manage and gather responsive records (not necessary, though). Data from
FOIAmail comes across in this directory structure:

    data/
      agency_attachments/
        Agency Name One/
          request-file-1.pdf
        Another Agency Name/
          another-record-here.pdf

This project also assumes that you'll be extracting structured data (CSV
format) from PDFs, Word documents, Excel spreadsheets, and email archives using
the methods found in `extractor.template.sh`. This script has commands for
turning records from PDF, DOC/X, XLSX, EML, and scanned images into CSVs.

Eventually, I'd like to pull this functionality into this application so users
can directly rotate, OCR, split, extract and clean data from PDFs/documents in
one place.
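
As a rough illustration of the FOIAmail directory layout described above, the following sketch walks `data/agency_attachments/` and groups record file names by agency folder. It is not code from this project: the function name `list_agency_records` and its return shape are assumptions made for the example, and it relies only on the Python standard library.

```python
from pathlib import Path


def list_agency_records(data_dir):
    """Map each agency folder name to the sorted file names inside it.

    Hypothetical helper: assumes the layout
    <data_dir>/agency_attachments/<Agency Name>/<record file>.
    """
    attachments = Path(data_dir) / "agency_attachments"
    records = {}
    for agency_dir in sorted(attachments.iterdir()):
        if not agency_dir.is_dir():
            continue  # ignore stray files sitting next to the agency folders
        records[agency_dir.name] = sorted(
            p.name for p in agency_dir.iterdir() if p.is_file()
        )
    return records
```

Calling `list_agency_records("data")` on the example tree would give one entry per agency folder, each holding that agency's record file names, which could then be fed into whatever import step the application uses.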