https://github.com/bact/policy-topic-model

Topic model of policy papers on artificial intelligence

policy-analysis topic-modeling

Host: GitHub
URL: https://github.com/bact/policy-topic-model
Owner: bact
License: cc0-1.0
Created: 2022-03-30T16:28:28.000Z
Default Branch: main
Last Pushed: 2024-11-10T06:05:48.000Z
Last Synced: 2025-02-24T12:46:38.860Z
Topics: policy-analysis, topic-modeling
Language: Jupyter Notebook
Homepage:
Size: 960 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# AI Policy Topic Model

My work during an internship at [UCD Centre for Digital Policy](https://digitalpolicy.ie/) in 2022.

## Pre-processing

- Download the policy documents from the shared Google Drive and put them in `data/nat-ai/orig`.
  - The list of documents is in this spreadsheet: https://docs.google.com/spreadsheets/d/1e6nCWAKRSAo3cq4O-3WUKtFp5AR7up_cr8hY2jI12Zg/edit?usp=sharing
- Run `./pdf-to-txt.sh`.
  - The text files should then appear in `data/nat-ai/text`.
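
For readers who want to see what the conversion step roughly involves, here is a minimal Python sketch. It is not the content of `pdf-to-txt.sh`, which may work differently. It assumes the PDFBox 3 command-line app accepts `export:text -i <input> -o <output>`, that the jar is at `lib/pdfbox-app-3.jar`, and that `java` is on the `PATH`. The function names `build_command` and `convert_all` are made up for this example.

```python
import subprocess
from pathlib import Path

# Paths taken from this README; adjust if your layout differs.
PDFBOX_JAR = Path("lib/pdfbox-app-3.jar")
SRC_DIR = Path("data/nat-ai/orig")
DST_DIR = Path("data/nat-ai/text")


def build_command(pdf_path: Path, txt_path: Path, jar: Path = PDFBOX_JAR) -> list[str]:
    """Build a PDFBox 3 command line that extracts text from one PDF.

    Assumption: the `export:text` subcommand with `-i` and `-o` options.
    """
    return ["java", "-jar", str(jar), "export:text",
            "-i", str(pdf_path), "-o", str(txt_path)]


def convert_all(src: Path = SRC_DIR, dst: Path = DST_DIR) -> list[Path]:
    """Convert every PDF in `src` to a .txt file with the same stem in `dst`."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf in sorted(src.glob("*.pdf")):
        txt = dst / (pdf.stem + ".txt")
        # check=True raises CalledProcessError if PDFBox exits with an error.
        subprocess.run(build_command(pdf, txt), check=True)
        written.append(txt)
    return written


if __name__ == "__main__":
    for path in convert_all():
        print(path)
```

Building the command in a separate function makes it easy to print and inspect the exact call before running it on the whole folder.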

## Visualization

- Try the visualization notebook: https://github.com/bact/policy-topic-model/blob/main/notebooks/topic-vis.ipynb

## Dependencies

- stopwordsiso - for the stopword list
- NLTK - for the lemmatizer
- scikit-learn - for the topic model (Latent Dirichlet Allocation, LDA)
- pyLDAvis - for visualization
- Apache PDFBox 3 - required for text extraction from PDF
  - Download it from https://pdfbox.apache.org/download.html
  - Rename the jar file to `pdfbox-app-3.jar` and put it inside the `lib/` directory
  - Apache PDFBox 3 is licensed under the Apache License, Version 2.0
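
To show how the Python dependencies above could fit together, here is a rough sketch. It is not the code from the notebooks. The helper names `tokenize` and `fit_topics`, the English stopword list (`"en"`), the token filter, and the default of 10 topics are all assumptions made for illustration. The pyLDAvis step is left out.

```python
import re


def tokenize(text, stopwords, lemmatize=lambda word: word):
    """Lowercase, keep alphabetic tokens, lemmatize, drop stopwords and tokens under 3 letters."""
    tokens = re.findall(r"[a-z]+", text.lower())
    lemmas = (lemmatize(token) for token in tokens)
    return [t for t in lemmas if len(t) > 2 and t not in stopwords]


def fit_topics(docs, n_topics=10, n_top_words=8):
    """Fit an LDA model on raw document strings and return the top words per topic."""
    # Third-party imports are local so that tokenize() works without them installed.
    import stopwordsiso
    from nltk.stem import WordNetLemmatizer  # needs nltk.download("wordnet") once
    from sklearn.decomposition import LatentDirichletAllocation
    from sklearn.feature_extraction.text import CountVectorizer

    stop = stopwordsiso.stopwords("en")  # assumption: English documents
    lemmatizer = WordNetLemmatizer()

    # A callable analyzer replaces scikit-learn's built-in tokenization.
    vectorizer = CountVectorizer(
        analyzer=lambda doc: tokenize(doc, stop, lemmatizer.lemmatize)
    )
    doc_term_matrix = vectorizer.fit_transform(docs)

    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    lda.fit(doc_term_matrix)

    vocab = vectorizer.get_feature_names_out()
    return [
        [vocab[i] for i in topic.argsort()[::-1][:n_top_words]]
        for topic in lda.components_
    ]
```

With the text files from the pre-processing step, usage could look like `fit_topics([p.read_text() for p in Path("data/nat-ai/text").glob("*.txt")])`, after importing `Path` from `pathlib`.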
