https://github.com/tomhalloin/Springboard-Berkshire

Topic model analysis of Berkshire Hathaway annual letters (Completed Capstone Project #2)

gensim nlp spacy springboard textacy topic-modeling

Host: GitHub
URL: https://github.com/tomhalloin/Springboard-Berkshire
Owner: tomhalloin
License: mit
Created: 2020-03-31T20:55:44.000Z (about 5 years ago)
Default Branch: master
Last Pushed: 2022-12-08T07:26:54.000Z (over 2 years ago)
Last Synced: 2024-08-14T22:31:48.837Z (9 months ago)
Topics: gensim, nlp, spacy, springboard, textacy, topic-modeling
Language: Java
Homepage:
Size: 25.7 MB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 22
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

awesome-springboard-capstones - GitHub

README

# Springboard Capstone Project II: Topic Model Analysis of Berkshire Hathaway's Shareholder Letters

This project analyzes Berkshire Hathaway's annual shareholder letters using natural language processing in Python. The approaches include three extractive summarization methods: [LexRank](https://raw.githubusercontent.com/toshimelonhead/Springboard-Berkshire/master/Outputs/Summaries/LexRank_Summaries_summaries.txt), [TextRank](https://raw.githubusercontent.com/toshimelonhead/Springboard-Berkshire/master/Outputs/Summaries/TextRank_Summaries_summaries.txt), and [Latent Semantic Analysis](https://raw.githubusercontent.com/toshimelonhead/Springboard-Berkshire/master/Outputs/Summaries/LSA_Summaries_summaries.txt). Topic modeling was done with Gensim's Mallet wrapper and with the Java version of Mallet LDA.
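To illustrate what an extractive summarizer such as TextRank does, here is a minimal, dependency-free sketch. It is not the project's code (the notebook may use a library for all three methods). It splits a letter into sentences with a naive regex, scores each sentence pair with the word-overlap similarity from the original TextRank paper, runs a weighted PageRank over that graph, and returns the top `k` sentences in their original order. The function names and the `damping`/`iterations` defaults are illustrative assumptions.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", sentence.lower())


def similarity(a, b):
    """Word overlap normalized by log sentence lengths, as in the TextRank paper."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    if denom <= 0:
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Weighted PageRank over the sentence-similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence collects score from its neighbours, split by edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n) if out_sums[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=3):
    """Return the k highest-ranked sentences, kept in document order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

LexRank differs mainly in the similarity measure (typically TF-IDF cosine similarity, often with a threshold), while LSA ranks sentences from a singular value decomposition of the term-sentence matrix rather than from a graph.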

If you plan to run this code, first point the file locations and the Mallet paths at the corresponding files on your own computer; otherwise, the code will not run. I recommend skipping the scraping notebook and using the letters that come with the repository instead, because Berkshire's website usually blocks me when I try to scrape multiple letters at once.
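A minimal sketch of how the path settings could be kept in one place so they only need editing once. Everything here is an assumption rather than the notebook's actual configuration: the `MALLET_PATH` environment variable, the example install locations, the `Letters` folder name, and the one-`.txt`-file-per-year layout are all hypothetical placeholders.

```python
import os
from pathlib import Path

# Hypothetical locations: edit these to match your own machine.
MALLET_CANDIDATES = [
    os.environ.get("MALLET_PATH", ""),                      # optional override (hypothetical name)
    str(Path.home() / "mallet-2.0.8" / "bin" / "mallet"),   # example install location
]
LETTERS_DIR = Path("Letters")  # hypothetical folder holding the bundled letters


def resolve_mallet_path(candidates):
    """Return the first candidate that points at an existing file, or raise."""
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            return candidate
    raise FileNotFoundError("Mallet not found; set MALLET_PATH or edit MALLET_CANDIDATES")


def load_letters(directory):
    """Read every .txt letter in `directory`, keyed by file stem (e.g. '1977')."""
    return {p.stem: p.read_text(encoding="utf-8")
            for p in sorted(Path(directory).glob("*.txt"))}
```

Failing early with a clear error is friendlier than letting the topic-modeling step crash later on a missing binary.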

Also note that the final topics change from run to run, even with the same random seed, so the final topic distributions in the notebook and in the report differ slightly.
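Because topic numbering and content can shift between runs, one way to gauge how much the topics actually changed is to align two runs by the overlap of their top words. This sketch is not part of the project: it uses Jaccard similarity and a simple greedy one-to-one matching, and assumes each run is given as a list of topics, each a list of top words.

```python
def jaccard(a, b):
    """Jaccard similarity of two word collections (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def match_topics(run_a, run_b):
    """Greedily pair topics from two runs by top-word overlap.

    run_a, run_b: lists of topics, each a list of top words.
    Returns (index_in_a, index_in_b, similarity) tuples, best matches first.
    """
    candidates = sorted(
        ((jaccard(ta, tb), i, j)
         for i, ta in enumerate(run_a)
         for j, tb in enumerate(run_b)),
        reverse=True,
    )
    used_a, used_b, matches = set(), set(), []
    for score, i, j in candidates:
        # Skip pairs whose topics were already matched to something better.
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches.append((i, j, score))
    return matches
```

Pairs with low similarity point to topics that genuinely changed rather than merely being renumbered. Greedy matching is simple but not guaranteed optimal; the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) finds the best overall pairing.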

[Notebook to Scrape the Letters](https://github.com/toshimelonhead/Springboard-Berkshire/blob/master/Notebooks/Final%20Version/Scraping_Letters.ipynb)

[Notebook for Everything Else](https://nbviewer.jupyter.org/github/toshimelonhead/Springboard-Berkshire/blob/e0c3270166722a21765e415b4de800396537ec99/Notebooks/Final%20Version/Final_Version.ipynb)

[Final Writeup](https://github.com/toshimelonhead/Springboard-Berkshire/blob/master/Reports/Final%20Paper%202.0.pdf)

[Final Presentation](https://github.com/toshimelonhead/Springboard-Berkshire/blob/master/Reports/Final%20Presentation.pdf)
