https://github.com/chris-santiago/bookmarks-topics

Using unsupervised learning and language modeling to cluster and reorganize web bookmarks.

bert-embeddings bertopic bookmarks clustering generative-modeling hdbscan hydra llm openai taskfile umap unsupervised-learning


Host: GitHub
URL: https://github.com/chris-santiago/bookmarks-topics
Owner: chris-santiago
Created: 2024-11-21T03:44:08.000Z (6 months ago)
Default Branch: main
Last Pushed: 2024-12-06T02:18:54.000Z (6 months ago)
Last Synced: 2025-03-20T08:48:06.541Z (2 months ago)
Language: Jupyter Notebook
Homepage:
Size: 290 KB
Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


README

# bookmarks-topics

This project is a continuation of the stale [bookmarks_clustering](https://github.com/chris-santiago/bookmarks_clustering) project. It has been updated to use newer embedding and generative models, mostly via the [BERTopic](https://maartengr.github.io/BERTopic/index.html) library.

# Usage

## Prerequisites

1. This project uses [Task](https://taskfile.dev/) to run and manage tasks, so you'll need to install Task on your machine first.
2. This project uses OpenAI's API, so you'll need an OpenAI API key. Place it in a `.env` file in the project's root directory, using `OPENAI_KEY` as the variable name and your API key as the value. For example:

```bash
OPENAI_KEY=sk-proj-_mySuperSecretOpenAIkey
```

3. Export your bookmarks to an HTML file. In Google Chrome, use the **Export bookmarks** option in the Bookmark manager. *Note: this project was developed using Google Chrome bookmarks.*
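How the project actually loads `.env` isn't shown here (a library such as python-dotenv is a common choice, but that is an assumption). As a minimal stdlib sketch of the same idea, the hypothetical helper `load_env_file` below parses simple `KEY=value` lines and copies them into the process environment without overwriting variables that are already set:

```python
import os
from pathlib import Path
from typing import Dict


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=value lines from a .env file (hypothetical helper)."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values  # no .env file: nothing to load
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that isn't an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key] = value
        # Don't clobber variables already set in the shell.
        os.environ.setdefault(key, value)
    return values
```

After calling `load_env_file()` once at startup, the key is available as `os.environ["OPENAI_KEY"]`. Full `.env` parsers also handle `export` prefixes, inline comments, and escape sequences, which this sketch ignores.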

## Setup

Clone this repo and install the project and dependencies:

```bash
git clone https://github.com/chris-santiago/bookmarks-topics.git
cd bookmarks-topics
conda env create -f environment.yaml
pip install .
```

## Quick Start

Once you've completed the prerequisites and set up the project environment, you can run the entire pipeline with:

```bash
task cluster-bookmarks -- "bookmarks.input_path=your/path/to/bookmarks.html"
```

This parses your bookmarks file and fetches content from every bookmarked URL before running the clustering algorithm. **You may not want to organize ALL of your bookmarks, but rather a subset.** In that case, you can pass a comma-separated list of specific folder names in Hydra's bracketed list syntax:

```bash
task cluster-bookmarks -- "bookmarks.input_path=your/path/to/bookmarks.html" "bookmarks.folders=[My first folder,My second folder]"
```
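Chrome exports bookmarks in the Netscape bookmark file format: folders are `<H3>` headings, each followed by a `<DL>` list of its children, and bookmarks are `<A HREF>` links. The sketch below parses that format with Python's stdlib `html.parser` and keeps only the bookmarks below selected folders, roughly what the `bookmarks.folders` override asks for. The class and function names are hypothetical, and the project's own parser and folder-matching rule may differ (here a bookmark is kept if *any* folder in its path matches).

```python
from __future__ import annotations

from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Iterable


@dataclass(frozen=True)
class Bookmark:
    title: str
    url: str
    folders: tuple[str, ...]  # folder path from the top, e.g. ("Bookmarks bar", "ML")


class NetscapeBookmarkParser(HTMLParser):
    """Collect bookmarks from a Netscape-format export (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True decodes entities like &amp;
        self.bookmarks: list[Bookmark] = []
        self._stack: list[str | None] = []  # one entry per open <DL>
        self._pending_folder: str | None = None  # last <H3> seen, awaiting its <DL>
        self._in_h3 = False
        self._href: str | None = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names.
        if tag == "h3":
            self._in_h3, self._text = True, []
        elif tag == "dl":
            # A <DL> opens the contents of the folder named by the preceding <H3>.
            self._stack.append(self._pending_folder)
            self._pending_folder = None
        elif tag == "a":
            self._href, self._text = dict(attrs).get("href"), []

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_h3 = False
            self._pending_folder = "".join(self._text).strip()
        elif tag == "dl" and self._stack:
            self._stack.pop()
        elif tag == "a":
            if self._href:
                path = tuple(name for name in self._stack if name)
                title = "".join(self._text).strip()
                self.bookmarks.append(Bookmark(title, self._href, path))
            self._href = None

    def handle_data(self, data):
        # Only text inside <H3> (folder names) or <A> (titles) matters.
        if self._in_h3 or self._href is not None:
            self._text.append(data)


def parse_bookmarks(html_text: str) -> list[Bookmark]:
    parser = NetscapeBookmarkParser()
    parser.feed(html_text)
    parser.close()
    return parser.bookmarks


def filter_by_folders(bookmarks: Iterable[Bookmark], folders: Iterable[str]) -> list[Bookmark]:
    """Keep bookmarks that sit anywhere below one of the named folders."""
    wanted = set(folders)
    return [b for b in bookmarks if wanted.intersection(b.folders)]
```

Tracking a stack of open `<DL>` elements is what lets nested folders produce a full path such as `("Bookmarks bar", "My first folder")`, rather than only the innermost folder name.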

Once the run completes, your reorganized bookmarks are placed in a newly created `outputs/topics/` directory within the project's root directory. That directory is organized by date and time; find the folder that corresponds to your most recent run and import its `new_bookmarks.html` file back into your browser. A breakdown of bookmarks by topic is also available in the `bookmarks_topics.json` file in that same folder.

**Note**: If you haven't added `task` to your PATH, replace `task` with `./bin/task` in the commands above.
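Rather than browsing the date/time folders under `outputs/topics/` by hand, a small helper can locate the latest output. This is a sketch under assumptions: it should be run from the project root, it assumes each run writes `new_bookmarks.html` somewhere below `outputs/topics/`, and the function name `latest_run_file` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def latest_run_file(root: str = "outputs/topics",
                    name: str = "new_bookmarks.html") -> Optional[Path]:
    """Return the most recently modified `name` below `root`, or None.

    Compares file modification times instead of parsing the date/time
    folder names, so it doesn't depend on the exact naming scheme.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return None  # no runs yet
    candidates = list(root_path.rglob(name))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

For example, `latest_run_file()` returns the file to import into your browser, and `latest_run_file(name="bookmarks_topics.json")` finds the topic breakdown from the latest run.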

### Example Output

#### HTML

```html
Bookmarks

JavaScript D3.js
```