https://github.com/r2d4/blog-embeddings
Script to generate embeddings from a blog and use GPT-3.5 to categorize the embedding space
- Host: GitHub
- URL: https://github.com/r2d4/blog-embeddings
- Owner: r2d4
- Created: 2023-05-28T05:46:13.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2024-01-02T21:17:28.000Z (almost 2 years ago)
- Last Synced: 2025-06-15T04:46:14.805Z (4 months ago)
- Topics: bert-embeddings, embeddings, gpt, t-sne
- Language: Jupyter Notebook
- Homepage: https://matt-rickard.com/two-years-of-daily-blogging
- Size: 2.32 MB
- Stars: 11
- Watchers: 2
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Blog Embeddings
An analysis of two years of daily blogging on [matt-rickard.com](https://matt-rickard.com).
1. I embedded all my posts using BERT (a transformer model pre-trained on a large corpus of English text). BERT-base produces 768-dimensional vectors.
2. Then I ran them through t-SNE (t-distributed stochastic neighbor embedding), a technique for visualizing high-dimensional data by projecting it into two dimensions.
3. Finally, I separated the two-dimensional space into equally sized bins and asked GPT-3.5 to develop a category name for each set of post titles.
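
Step 1 can be sketched roughly as follows, assuming the Hugging Face `transformers` library with `bert-base-uncased` and mean pooling over token embeddings (the notebook's exact model and pooling choice may differ; the post texts here are placeholders):

```python
# Sketch of step 1: embed texts with BERT (bert-base-uncased, 768-dim hidden size).
import torch
from transformers import AutoModel, AutoTokenizer

def embed(texts):
    tok = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")
    model.eval()
    enc = tok(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc).last_hidden_state       # (batch, seq_len, 768)
    mask = enc["attention_mask"].unsqueeze(-1)     # zero out padding tokens
    return (out * mask).sum(1) / mask.sum(1)       # mean-pooled (batch, 768)

vecs = embed(["A post about Kubernetes", "A post about compilers"])
print(vecs.shape)  # torch.Size([2, 768])
```

Mean pooling over the attention mask is one common way to get a single fixed-size vector per document; using the `[CLS]` token embedding is another.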
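
Step 2 is a standard t-SNE projection; a minimal sketch with scikit-learn, using random vectors in place of the real post embeddings:

```python
# Sketch of step 2: project 768-dim vectors down to 2-D with t-SNE.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 768))  # placeholder for the BERT vectors

# perplexity must be smaller than the number of samples; 30 is sklearn's default
coords = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)
print(coords.shape)  # (200, 2)
```

Note that t-SNE distances are only locally meaningful, which is why the next step groups nearby points into bins rather than reading anything into global distances.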
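
Step 3 (the equally sized bins plus a GPT-3.5 prompt per bin) might look like the sketch below. The grid size, titles, and prompt wording are illustrative, and the OpenAI API call itself is left as a comment:

```python
# Sketch of step 3: grid the 2-D plane into equal bins, collect post titles
# per bin, and build a categorization prompt for each bin.
from collections import defaultdict
import numpy as np

def bin_titles(coords, titles, n_bins=4):
    """Group titles by which equally sized grid cell their 2-D point falls in."""
    x_edges = np.linspace(coords[:, 0].min(), coords[:, 0].max(), n_bins + 1)
    y_edges = np.linspace(coords[:, 1].min(), coords[:, 1].max(), n_bins + 1)
    bins = defaultdict(list)
    for (x, y), title in zip(coords, titles):
        i = min(np.searchsorted(x_edges, x, side="right") - 1, n_bins - 1)
        j = min(np.searchsorted(y_edges, y, side="right") - 1, n_bins - 1)
        bins[(i, j)].append(title)
    return bins

def category_prompt(titles):
    return ("Here are some blog post titles:\n- " + "\n- ".join(titles) +
            "\nSuggest one short category name for this group.")

coords = np.array([[0.0, 0.0], [0.1, 0.2], [9.0, 9.0]])
titles = ["Unix pipes", "Shell tricks", "Startup lessons"]
bins = bin_titles(coords, titles, n_bins=2)
# for each bin, send category_prompt(bin_titles) to GPT-3.5 via the OpenAI API
```

Clamping the bin index to `n_bins - 1` keeps the points sitting exactly on the maximum edge inside the grid.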