https://github.com/dataforgoodfr/offseason_missiontransition_datasource
- Host: GitHub
- URL: https://github.com/dataforgoodfr/offseason_missiontransition_datasource
- Owner: dataforgoodfr
- License: mit
- Created: 2021-10-07T13:18:46.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2022-07-24T11:24:35.000Z (almost 4 years ago)
- Last Synced: 2024-04-24T10:06:58.270Z (about 2 years ago)
- Language: Jupyter Notebook
- Size: 9.52 MB
- Stars: 1
- Watchers: 5
- Forks: 3
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# offseason_missiontransition
Scrapes all the aids from the API: https://aides-territoires.beta.gouv.fr/api/aids/
For each aid, the following information is collected:
- name
- url
- whether the origin URL returns an error, and the error type
- whether the application URL returns an error, and the error type
- the list of PDFs found at the URL, if any
- whether a PDF contains a word from the list below:
`CRITERIA_WORDS = ["conditions", "critères", "éligible", "éligibilité"]`
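The keyword check can be sketched as follows. This is a minimal illustration, not the code used in `scripts/scrap_pdf_files.py`: the function names `find_criteria_words` and `contains_criteria_word` are hypothetical, and the matching rule (case-insensitive, whole words only, Unicode-normalized so that accents compare reliably) is an assumption about what a sensible check might look like.

```python
import re
import unicodedata

CRITERIA_WORDS = ["conditions", "critères", "éligible", "éligibilité"]


def _normalize(text):
    # NFC-normalize so that "é" typed as one code point and "e" + combining
    # accent compare equal, then casefold for case-insensitive matching.
    return unicodedata.normalize("NFC", text).casefold()


def find_criteria_words(text, words=CRITERIA_WORDS):
    """Return the words from `words` that appear as whole words in `text`.

    Hypothetical helper: the real script may match differently.
    """
    normalized = _normalize(text)
    found = []
    for word in words:
        pattern = r"\b" + re.escape(_normalize(word)) + r"\b"
        if re.search(pattern, normalized):
            found.append(word)
    return found


def contains_criteria_word(text, words=CRITERIA_WORDS):
    """Return True if at least one criteria word appears in `text`."""
    return bool(find_criteria_words(text, words))
```

Whole-word matching avoids false positives on longer words that merely contain a keyword, at the cost of missing inflected forms that are not in the list.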
## Install
```
pip install .
python scripts/scrap_pdf_files.py
```
## How to read a PDF from a URL
The following script shows how to read the content of a sample PDF, using the CSV file generated
by scrap_pdf_files.py. It also creates a .json file with the associated PDF URLs and their contents.
```
python scripts/read_pdf_content_tutorial.py
```