https://github.com/toshimelonhead/Springboard-Berkshire

Topic model analysis of Berkshire Hathaway annual letters (Completed Capstone Project #2)
gensim nlp spacy springboard textacy topic-modeling

# Springboard Capstone Project II: Topic Model Analysis of Berkshire Hathaway's Shareholder Letters

This project analyzes Berkshire Hathaway's annual shareholder letters using natural language processing in Python. It applies three extractive summarization methods, [LexRank](https://raw.githubusercontent.com/toshimelonhead/Springboard-Berkshire/master/Outputs/Summaries/LexRank_Summaries_summaries.txt), [TextRank](https://raw.githubusercontent.com/toshimelonhead/Springboard-Berkshire/master/Outputs/Summaries/TextRank_Summaries_summaries.txt), and [Latent Semantic Analysis](https://raw.githubusercontent.com/toshimelonhead/Springboard-Berkshire/master/Outputs/Summaries/LSA_Summaries_summaries.txt), and topic modeling using both the Mallet wrapper from Gensim and a Java version of Mallet LDA.
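To give a feel for what graph-based extractive summarization does, here is a minimal, standard-library-only sketch in the style of TextRank. It is not the code used in the notebook: the function names, the naive sentence splitter, and the use of unique-word sets for the overlap measure are all simplifying assumptions made for illustration.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (an assumption): break after ., ! or ? plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercased set of unique words in a sentence."""
    return set(re.findall(r"[a-z']+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalized by log sentence lengths (TextRank-style)."""
    if len(a) < 2 or len(b) < 2:
        return 0.0  # skip one-word sentences so the log denominator is never zero
    return len(a & b) / (math.log(len(a)) + math.log(len(b)))


def textrank_summary(text, n_sentences=2, damping=0.85, iterations=50):
    """Return the n highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= n_sentences:
        return sentences
    tokens = [tokenize(s) for s in sentences]
    # Weighted, undirected sentence graph stored as a dense matrix.
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sums = [sum(row) for row in weights]
    scores = [1.0] * n
    # Weighted PageRank by simple power iteration.
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sums[j] * scores[j]
                for j in range(n)
                if out_sums[j] > 0
            )
            for i in range(n)
        ]
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]


# Made-up sample text, not a quotation from any letter.
sample = (
    "Insurance float funds our investments. "
    "Our insurance operations generated float again this year. "
    "The weather in Omaha was pleasant. "
    "Float from insurance lets us buy businesses."
)
print(textrank_summary(sample, n_sentences=2))
```

Sentences that share vocabulary with many other sentences accumulate score, while an off-topic sentence with no overlap stays at the baseline and is dropped from the summary.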

If you plan to run this code, set the file locations and shortcuts for Mallet to the corresponding paths on your computer; otherwise, the code will not run. I recommend skipping the scraping notebook and using the letters included in the repository instead, because Berkshire's website usually denies me access when I try to scrape multiple letters at once.
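One way to avoid editing hard-coded paths is to read the Mallet location from an environment variable and fail early with a clear message. The sketch below is only a suggestion, not what the notebook does: the helper name `resolve_mallet_binary` is hypothetical, and it assumes a standard Mallet download where the launcher lives at `bin/mallet` (or `bin/mallet.bat` on Windows) under the directory named by `MALLET_HOME`.

```python
import os
from pathlib import Path


def resolve_mallet_binary(env_var="MALLET_HOME"):
    """Return the path to the Mallet launcher, or raise a descriptive error.

    Hypothetical helper: assumes env_var points at the unpacked Mallet
    directory that contains a bin/ folder.
    """
    home = os.environ.get(env_var)
    if not home:
        raise EnvironmentError(f"Set {env_var} to your Mallet install directory")
    launcher = "mallet.bat" if os.name == "nt" else "mallet"
    binary = Path(home).expanduser() / "bin" / launcher
    if not binary.is_file():
        raise FileNotFoundError(f"No Mallet launcher found at {binary}")
    return binary


# Example use (commented out because it depends on your local install):
# mallet_path = str(resolve_mallet_binary())
```

The resulting string can then be passed wherever the notebook expects the Mallet path. Note that the Gensim Mallet wrapper (`gensim.models.wrappers.LdaMallet`) exists only in Gensim 3.x and was removed in Gensim 4.0, so an older Gensim version is likely needed to run that part of the code.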

Also note that the final topics change from run to run, even with the same random seed, so the final topic distributions in the notebook differ slightly from those in the report.
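If you want to check how much two runs actually differ, one simple option is to compare the top words of each topic with Jaccard similarity and pair each topic with its closest counterpart in the other run. This is a rough sketch and not part of the original analysis: the function names are hypothetical, the word lists are invented, and the matching is greedy, so two topics in one run may map to the same topic in the other.

```python
def jaccard(words_a, words_b):
    """Jaccard similarity between two collections of top words."""
    a, b = set(words_a), set(words_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_topics(run_a, run_b):
    """For each topic in run_a, find the most similar topic in run_b.

    Each run is a list of topics; each topic is a list of its top words.
    Returns (index_in_a, index_in_b, similarity) tuples.
    """
    matches = []
    for i, words_a in enumerate(run_a):
        scores = [jaccard(words_a, words_b) for words_b in run_b]
        best = max(range(len(run_b)), key=scores.__getitem__)
        matches.append((i, best, scores[best]))
    return matches


# Invented top-word lists standing in for two Mallet runs.
run_1 = [["insurance", "float", "premium"], ["railroad", "energy", "utility"]]
run_2 = [["utility", "energy", "capital"], ["float", "insurance", "underwriting"]]

for a_idx, b_idx, score in match_topics(run_1, run_2):
    print(f"run 1 topic {a_idx} ~ run 2 topic {b_idx} (Jaccard {score:.2f})")
```

High similarities for every pair suggest the runs found essentially the same topics in a different order; low similarities point to topics that genuinely shifted.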

[Notebook to Scrape the Letters](https://github.com/toshimelonhead/Springboard-Berkshire/blob/master/Notebooks/Final%20Version/Scraping_Letters.ipynb)

[Notebook for Everything Else](https://nbviewer.jupyter.org/github/toshimelonhead/Springboard-Berkshire/blob/e0c3270166722a21765e415b4de800396537ec99/Notebooks/Final%20Version/Final_Version.ipynb)

[Final Writeup](https://github.com/toshimelonhead/Springboard-Berkshire/blob/master/Reports/Final%20Paper%202.0.pdf)

[Final Presentation](https://github.com/toshimelonhead/Springboard-Berkshire/blob/master/Reports/Final%20Presentation.pdf)