https://github.com/pncnmnp/bookmark-manager
NLP based approach to automatically categorize your bookmarks!
- Host: GitHub
- URL: https://github.com/pncnmnp/bookmark-manager
- Owner: pncnmnp
- License: mit
- Created: 2019-08-18T11:09:43.000Z (about 6 years ago)
- Default Branch: master
- Last Pushed: 2022-08-28T03:08:13.000Z (about 3 years ago)
- Last Synced: 2023-03-03T13:05:47.254Z (over 2 years ago)
- Topics: bookmark-manager, multinomial-naive-bayes
- Language: Python
- Size: 642 KB
- Stars: 21
- Watchers: 1
- Forks: 6
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Bookmark-Manager
NLP-based approach to automatically categorize your bookmarks!
## In-Depth
To understand this project in depth, refer to my technical paper: [Bookmark Classification using Multinomial Naive Bayes Model](https://pncnmnp.github.io/blogs/bookmark-classification.pdf)
## How does this work?
* Enter your bookmarks in the **`./links.json`** file.
* Run `categorize.py` to categorize them.
* `scrape_filter_link.py` contains the classes used to scrape information from each URL.
## What bookmarks is it categorizing?
It can categorize a variety of bookmarks. Currently, it supports every category that has a directory in `./corpus/`.
## Will I have to enter my bookmark links manually?
To a certain extent! For example, **Firefox** lets you back up your bookmarks in JSON format. You can extract the `uri` values from that JSON file and feed them into `./links.json`.
To back up your bookmarks in **Firefox**, press Ctrl+Shift+O, then go to `Import and Backup` and choose `Backup`.
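The extraction step can be scripted. The sketch below walks the tree in a Firefox bookmarks backup (the `.json` file produced by `Backup`, not the compressed `.jsonlz4` automatic backups), collects every `uri` that points to a web page, and writes the result to `./links.json`. It assumes `links.json` should hold a plain JSON array of URL strings; that format and the helper names `iter_uris` and `firefox_backup_to_links` are hypothetical, so check what `categorize.py` actually reads before relying on it.

```python
import json
from typing import Any, Iterator


def iter_uris(node: dict[str, Any]) -> Iterator[str]:
    """Recursively yield every web "uri" in a Firefox bookmarks backup tree."""
    uri = node.get("uri")
    # Keep only web links; backups also contain entries such as "place:" queries.
    if isinstance(uri, str) and uri.startswith(("http://", "https://")):
        yield uri
    # Folders keep their contents in a "children" list.
    for child in node.get("children", []):
        yield from iter_uris(child)


def firefox_backup_to_links(backup_path: str, links_path: str = "./links.json") -> list[str]:
    """Hypothetical helper: copy the URLs from a backup file into links.json."""
    with open(backup_path, encoding="utf-8") as f:
        tree = json.load(f)
    # dict.fromkeys drops duplicate URLs while keeping first-seen order.
    links = list(dict.fromkeys(iter_uris(tree)))
    # Assumption: links.json is a flat JSON array of URL strings.
    with open(links_path, "w", encoding="utf-8") as f:
        json.dump(links, f, indent=2)
    return links
```

Calling `firefox_backup_to_links("bookmarks-backup.json")` (the file name is whatever you saved the backup as) would overwrite `./links.json` with the deduplicated URLs.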
**Chrome** users can check this post on [Super User](https://superuser.com/questions/325394/how-to-export-my-bookmarks-via-cli-in-google-chrome/1349857).
## Will the code create a directory structure with my bookmarks?
**No. The mapping of each URL to its category is stored as a *dict* in a JSON file, `result.json`.**
The keys are your bookmark URLs and the values are their categories.
## Can I see a demo?
Sure, here's one (the highlighted part is what gets stored in `result.json`):
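Because `result.json` is a flat *dict* from bookmark to category, a few lines of Python can turn it into a folder-like, category-to-bookmarks view without creating any directories. This is a minimal sketch, assuming each value is a single category name string; the function names `group_by_category` and `print_result_tree` are hypothetical and not part of the project.

```python
import json
from collections import defaultdict


def group_by_category(mapping: dict[str, str]) -> dict[str, list[str]]:
    """Invert a {bookmark: category} dict into {category: [bookmarks]}."""
    groups: dict[str, list[str]] = defaultdict(list)
    for bookmark, category in mapping.items():
        groups[category].append(bookmark)
    return dict(groups)


def print_result_tree(path: str = "result.json") -> None:
    """Print bookmarks indented under their category, like a folder listing."""
    # Assumption: result.json maps each URL to a single category string.
    with open(path, encoding="utf-8") as f:
        result = json.load(f)
    for category, bookmarks in sorted(group_by_category(result).items()):
        print(f"{category}/")
        for bookmark in bookmarks:
            print(f"    {bookmark}")
```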
## Can I improve the corpus by adding more categories to the `./corpus/` directory?
**Yes, you can! New categories are picked up from `./corpus/` without any code changes.**
To add your own corpora:
* Create a directory with a *unique* category name in `./corpus/`
* Inside `./corpus/your-category-dir`, add your corpus text to a **JSON** file in the format `{"text": "_your_corpus_text_here_"}` (**NOTE:** you can add multiple JSON files to a category directory).
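The project's topics and linked paper name a multinomial naive Bayes model. The sketch below is a from-scratch illustration of that technique over the `./corpus/` layout described above (one directory per category, JSON files with a `"text"` field); it is not the code in `categorize.py`. The tokenizer, the smoothing parameter `alpha`, the uniform class prior, and all function and class names here are assumptions.

```python
import json
import math
import re
from collections import Counter
from pathlib import Path


def tokenize(text: str) -> list[str]:
    # Deliberately simple tokenizer (assumption): lowercase alphabetic runs.
    return re.findall(r"[a-z]+", text.lower())


def load_corpus(root: str = "./corpus") -> dict[str, str]:
    """Map each category directory name to the joined "text" of its JSON files."""
    corpus: dict[str, str] = {}
    for category_dir in sorted(Path(root).iterdir()):
        if category_dir.is_dir():
            texts = [
                json.loads(path.read_text(encoding="utf-8"))["text"]
                for path in sorted(category_dir.glob("*.json"))
            ]
            corpus[category_dir.name] = " ".join(texts)
    return corpus


class MultinomialNB:
    """Multinomial naive Bayes over word counts with additive smoothing."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # smoothing strength; 1.0 is Laplace smoothing

    def fit(self, corpus: dict[str, str]) -> "MultinomialNB":
        # Per-category word counts, their totals, and the shared vocabulary.
        self.counts = {c: Counter(tokenize(t)) for c, t in corpus.items()}
        self.totals = {c: sum(cnt.values()) for c, cnt in self.counts.items()}
        self.vocab = set().union(*self.counts.values())
        return self

    def log_likelihood(self, category: str, tokens: list[str]) -> float:
        # log P(tokens | category) = sum over tokens of log P(word | category)
        denom = self.totals[category] + self.alpha * len(self.vocab)
        return sum(
            math.log((self.counts[category][t] + self.alpha) / denom) for t in tokens
        )

    def predict(self, text: str) -> str:
        # Words never seen in training carry no evidence, so they are dropped.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        # Uniform class prior (assumption), so it drops out of the argmax.
        return max(self.counts, key=lambda c: self.log_likelihood(c, tokens))
```

Under these assumptions, `MultinomialNB().fit(load_corpus()).predict(page_text)` would return the best-scoring category name for the scraped text of one page. Adding a new category directory changes only the data passed to `fit`, which is why no code changes are needed.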
The next time you run `categorize.py`, it will take the new or modified corpora into account.
## License
The code is released under the MIT License.