https://github.com/brunoarine/org-similarity
Emacs package that helps org-mode users (re)discover similar documents
https://github.com/brunoarine/org-similarity
bm25 elisp emacs org-mode org-roam python semantic-similarity similarity-search tf-idf
Last synced: 5 months ago
JSON representation
Emacs package that helps org-mode users (re)discover similar documents
- Host: GitHub
- URL: https://github.com/brunoarine/org-similarity
- Owner: brunoarine
- License: gpl-3.0
- Created: 2020-11-26T00:28:26.000Z (almost 5 years ago)
- Default Branch: main
- Last Pushed: 2024-02-14T14:07:56.000Z (over 1 year ago)
- Last Synced: 2025-05-05T06:16:49.458Z (5 months ago)
- Topics: bm25, elisp, emacs, org-mode, org-roam, python, semantic-similarity, similarity-search, tf-idf
- Language: Emacs Lisp
- Homepage:
- Size: 438 KB
- Stars: 90
- Watchers: 4
- Forks: 12
- Open Issues: 4
-
Metadata Files:
- Readme: README.org
- License: LICENSE
Awesome Lists containing this project
- awesome-org-roam - org-similarity - A package to help Emacs org-mode users discover similar or related files. Under the hood, it uses Python and scikit-learn for text feature extraction, and nltk for text pre-processing. (Tools)
README
#+TITLE: org-similarity
#+HTML:![]()
=org-similarity= is a package to help Emacs [[https://orgmode.org][org-mode]] users (re)discover similar documents in relation to the current buffer or to an input query using lexical similarity calculation algorithms like TF-IDF and BM25.
#+ATTR_HTML: :style margin-left: auto; margin-right: auto;
[[./assets/example.gif]]** Table of contents :TOC:
- [[#introduction][Introduction]]
- [[#pre-requisites][Pre-requisites]]
- [[#installation][Installation]]
- [[#loading-the-package][Loading the package]]
- [[#usage][Usage]]
- [[#configuration][Configuration]]** Introduction
A good note-taking system should help you connect your ideas. But what if you have many years worth of notes in your computer, and you can't remember the older ones?
That's why I've written =org-similarity=. As an avid =org-roam= user with more than a thousand notes, I can't count on my brain to recall them all.
=org-similarity= makes it easier to retrieve those long forgotten files that are possibly buried in everyone's piles of notes. It's different from a simple keyword search. This package is levered by [[https://github.com/brunoarine/findlike]], which looks at the words and phrases used in your reference document or a specific question you have, and then finds other files that use similar language. It’s a bit like doing a Google search across your own files.
** Pre-requisites
- Emacs 27.1+
- Python 3.6+
- Git 2.23+ (for manual installation)** Installation
You can install =org-similarity= using =straing.el=, its Doom Emacs wrapper, or manually cloning the repository. In either case, you can choose to use the =main= branch for a stable version of the repository, =develop= for the latest additions, or ={RELEASE_VERSION}= for a specific release version (e.g. =v1.0.0=). Just replace the =branch= property accordingly in the instructions below.
*Using straight.el*
Add the following to your =init.el=:
#+begin_src elisp
(straight-use-package '(org-similarity :type git :host github :repo
"brunoarine/org-similarity" :branch "main"))
#+end_src*Doom Emacs*
Add the following to =packages.el=:
#+begin_src elisp
(package! org-similarity :recipe (:host github :repo "brunoarine/org-similarity"
:branch "main"))
#+end_src*Manual installation*
Clone the repository:
#+begin_src sh
git clone -b main https://github.com/brunoarine/org-similarity
#+end_srcThen add the package to your =load-path= in your =init.el=:
#+begin_src elisp
(add-to-list 'load-path "/path/to/org-similarity")
#+end_src** Loading the package
Load the package using the =require= function in your =init.el=:
#+begin_src elisp
(require 'org-similarity)
#+end_srcOr using =use-package=:
#+begin_src elisp
(use-package org-similarity
:load-path "path/to/org-similarity")
#+end_src** Usage
Upon running the commands below for the first time, =org-similarity=
will ask you to install the necessary Python dependencies (if they are not found in the configured environment---see [[#configuration][Configuration]] for more info). By pressing =y= to continue, it will create a virtual environment under the hood and download the requirements automatically. Once finished, you can close the installation log buffer, and run the command again.| Command | Description |
|----------------------------------+---------------------------------------------------------------------------------------|
| =M-x org-similarity-insert-list= | Inserts a list of files that are similar to the current buffer at the end of it. |
| =M-x org-similarity-sidebuffer= | Shows a list of files that are similar to the current buffer in a side buffer. |
| =M-x org-similarity-query= | Lets the user input a query and shows a list of related documents in the side buffer. |** Configuration
There are a few variables that can be set to customize how =org-similarity= operates and generates the list of similar documents:
#+begin_src elisp
;; Directory to scan for possibly similar documents.
;; org-roam users might want to change it to `org-roam-directory'.
(setq org-similarity-directory org-directory);; Filename extension to scan for similar text. By default, it will
;; only scan Org-mode files, but you can change it to scan other
;; kind of files as well.
(setq org-similarity-file-extension-pattern "*.org");; Changing this value will impact stopwords filtering and word stemmer.
;; The following languages are supported: Arabic, Danish, Dutch, English, Finnish,
;; French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian,
;; Spanish and Swedish.
(setq org-similarity-language "english");; Algorithm to use when generating the scores list. The possible choices are
;; `tfidf' or `bm25'. Default is `tfidf' and it generally works better in
;; most cases. However, `bm25' may be a bit more robust in rare cases, depending
;; on the size of your notes.
(setq org-similarity-algorithm "tfidf");; How many similar entries to list at the end of the buffer.
(setq org-similarity-number-of-documents 10);; Minimum document size (in number of characters) to be included in the corpus.
;; It includes every character, including the file properties drawer.
;; Default is 0 (include all documents, even empty ones).
(setq org-similarity-min-chars 0);; Whether to prepend the list entries with similarity scores.
(setq org-similarity-show-scores nil);; Similarity score threshold. All results with a similarity score below this
;; value will be omitted from the final list.
;; Default is 0.05.
(setq org-similarity-threshold 0.05);; Whether the resulting list of similar documents will point to ID property or
;; filename. Default is nil.
;; However, I recommend setting it to `t' if you use `org-roam' v2.
(setq org-similarity-use-id-links nil);; Scan for files inside `org-similarity-directory' recursively.
(setq org-similarity-recursive-search nil);; Filepath to a custom Python interpreter (e.g. '/path/to/venv/bin/python'
;; If the package's requirements aren't met, `org-similarity' will try to
;; install or upgrade them automatically. If `nil', the package will create
;; and use a virtual environment in the same directory where `org-similarity'
;; is located (usually `~/.emacs.d/.local' if you installed via a package
;; manager, or in the path where you cloned this repo and loaded the package
;; manually).
(setq org-similarity-custom-python-interpreter nil);; Remove first result from the scores list. Useful if the current buffer is
;; saved in the searched directory, and you don't want to see it included
;; in the list. Default is nil."
(setq org-similarity-remove-first nil);; Text to show in the list heading. You can set it to "" if you
;; wish to hide the heading altogether.
(setq org-similarity-heading "** Related notes");; String to prepend the list items. You can set it to "* " to turn each
;; item into org headings, or "- " to turn them into an unordered org list.
;; Set the variable to "" to hide prefixes.
(setq org-similarity-prefix "- ");; Ignore org front-matter when calculating similarity scores. This option can
;; be useful if you think the results are inappropriately biased due to the
;; presence of some values in the the Org files' Properties drawer, like
;; filetags or categories.
(setq org-similarity-ignore-frontmatter nil)
#+end_src