https://github.com/prak112/document-wordcloud
Insights of commonly used words in the document represented through a word cloud
- Host: GitHub
- URL: https://github.com/prak112/document-wordcloud
- Owner: prak112
- License: MIT
- Created: 2021-04-27T15:37:45.000Z (about 5 years ago)
- Default Branch: main
- Last Pushed: 2023-06-14T20:13:37.000Z (about 3 years ago)
- Last Synced: 2025-01-15T01:41:55.763Z (over 1 year ago)
- Topics: wordcloud-visualization
- Language: Jupyter Notebook
- Homepage:
- Size: 4.28 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Approach
## Extraction of commonly used terms in the GYBN policy briefs
- Use the PyPDF2 library to extract text from the .pdf file
- Build a dictionary that maps each identified word to its occurrence count
- Filter out common stop-words (chosen based on the context of the document)
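
The extraction steps above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the stop-word list, and the tokenizing regex are all assumptions, and the `PdfReader` call assumes PyPDF2 2.0 or later (older releases exposed `PdfFileReader` instead).

```python
import re
from collections import Counter

# Hypothetical stop-word list; the README does not say which words were filtered.
STOP_WORDS = {"the", "and", "of", "to", "in", "a", "for", "is", "on", "that"}


def extract_text(pdf_path):
    """Return the text of every page in the PDF, joined by newlines."""
    # Imported lazily so the counting helper below works without PyPDF2 installed.
    from PyPDF2 import PdfReader  # assumes PyPDF2 >= 2.0

    reader = PdfReader(pdf_path)
    # extract_text() may yield nothing for image-only pages, hence the fallback.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def count_words(text, stop_words=STOP_WORDS):
    """Map each lower-cased word to its count, skipping stop-words."""
    words = re.findall(r"[a-z]+", text.lower())
    return dict(Counter(w for w in words if w not in stop_words))
```

A call such as `count_words(extract_text("policy_brief.pdf"))` would then produce the word-count dictionary, where `policy_brief.pdf` is a placeholder file name. In practice the stop-word set would be extended with terms that are frequent in the briefs but carry little meaning.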
## Evaluation of the terms to identify meaning
- Load the word counts into a pandas DataFrame
- Visualize a word cloud from the DataFrame, following this [DataCamp tutorial](https://www.datacamp.com/community/tutorials/wordcloud-python)
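
As a rough sketch of these two steps, assuming the word counts are already available as a plain dictionary: the column names, image size, and output file name below are illustrative choices, not taken from the notebook. The example relies on the `wordcloud` package's `WordCloud.generate_from_frequencies` method.

```python
import pandas as pd


def counts_to_frame(counts):
    """Load a {word: count} dictionary into a DataFrame, most frequent first."""
    df = pd.DataFrame(list(counts.items()), columns=["word", "count"])
    return df.sort_values("count", ascending=False).reset_index(drop=True)


def draw_wordcloud(df, out_path="wordcloud.png"):
    """Render the DataFrame's word counts as a word cloud image file."""
    # Imported lazily so the DataFrame helper works without wordcloud installed.
    from wordcloud import WordCloud

    frequencies = dict(zip(df["word"], df["count"]))
    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(out_path)  # hypothetical output path
    return cloud
```

Keeping the counts in a DataFrame makes it easy to inspect or trim the top terms (for example with `df.head(50)`) before drawing the cloud.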
## Output
