https://github.com/raphaelsty/knowledge
Open-source personal bookmarks search engine
https://github.com/raphaelsty/knowledge
bookmarks github hacker-news knowledge-base search-engine twitter zotero
Last synced: about 1 month ago
JSON representation
Open-source personal bookmarks search engine
- Host: GitHub
- URL: https://github.com/raphaelsty/knowledge
- Owner: raphaelsty
- License: gpl-3.0
- Created: 2023-02-22T22:45:41.000Z (over 2 years ago)
- Default Branch: main
- Last Pushed: 2025-03-17T00:12:28.000Z (8 months ago)
- Last Synced: 2025-03-17T07:54:13.715Z (8 months ago)
- Topics: bookmarks, github, hacker-news, knowledge-base, search-engine, twitter, zotero
- Language: Python
- Homepage: https://raphaelsty.github.io/knowledge/
- Size: 11.1 GB
- Stars: 615
- Watchers: 5
- Forks: 31
- Open Issues: 1
-
Metadata Files:
- Readme: readme.md
- License: LICENSE
Awesome Lists containing this project
README
# Knowledge
**Knowledge** is a web application that automatically transforms the digital footprint into a personal search engine. It fetches content you interact with from various platforms—**GitHub**, **HackerNews**, and **Zotero**—and organizes it into a navigable knowledge graph.
---
## 🌟 Features
- **🤖 Automatic Aggregation:** Daily, automated extraction of GitHub stars, HackerNews upvotes, and Zotero library.
- **🔍 Powerful Search:** A built-in search engine to instantly find any item you've saved or interacted with.
- **🕸️ Knowledge Graph:** Navigate bookmarks through a graph of automatically extracted topics and their connections.
My Personal Knowledge Base is available at [raphaelsty.github.io/knowledge](https://raphaelsty.github.io/knowledge/).
---
## 🛠️ How It Works
A GitHub Actions workflow runs twice a day to perform the following tasks:
1. **Extracts Content** from specified accounts:
- GitHub Stars
- HackerNews Upvotes
- Zotero Records
2. **Processes and Stores Data** in the `database/` directory:
- `database.json`: Contains all the raw records.
- `triples.json`: Stores the knowledge graph data (topics and relationships).
- `retriever.pkl`: The serialized search engine model.
3. **Deploys Updates**:
- The backend API is automatically updated and pushed to the Fly.io instance.
- The frontend on GitHub Pages is refreshed with the latest data.
The backend is built with FastAPI and deployed on Fly.io, which offers a free tier suitable for this project. The frontend is a static site hosted on GitHub Pages. The search engine is powered by multiple [cherche](https://github.com/raphaelsty/cherche) lexical models and features a final [pylate-rs](https://github.com/lightonai/pylate-rs) model, which is compiled from Rust to WebAssembly (WASM) to run directly in the client's browser.
## 🚀 Getting Started: Installation & Deployment
Follow these steps to deploy your own instance of Knowledge.
### 1\. Fork & Clone
First, fork this repository to your own GitHub account and then clone it to your local machine.
### 2\. Configuration
#### A. Configure Secrets
The application requires API keys and credentials to function. These must be set as **Repository secrets** in your forked repository's settings (`Settings` > `Secrets and variables` > `Actions`).
Secret
Service
Required
Description
FLY_API_TOKEN
Fly.io
Yes
Allows the GitHub Action to deploy your application. See the Fly.io section for instructions.
ZOTERO_API_KEY
Zotero Settings
Optional
An API key to access your Zotero library.
ZOTERO_LIBRARY_ID
Zotero
Optional
The ID of the Zotero group library you want to index.
HACKERNEWS_USERNAME
Hacker News
Optional
HackerNews username to fetch upvoted posts.
HACKERNEWS_PASSWORD
Hacker News
Optional
HackerNews password.
#### B. Specify Sources
Next, edit the `sources.yml` file at the root of the repository to specify which GitHub users' starred repositories you want to track.
```yml
github:
- "raphaelsty"
- "gbolmier"
- "MaxHalford"
```
### 3\. Deployment
#### A. Deploy the API to Fly.io
1. **Install `flyctl`**, the Fly.io command-line tool. Instructions can be found [here](https://fly.io/docs/hands-on/install-flyctl/).
2. **Sign up and log in** to Fly.io via the command line:
```sh
flyctl auth signup
flyctl auth login
```
3. **Get API token** and add it to your GitHub repository secrets as `FLY_API_TOKEN`:
```sh
flyctl auth token
```
4. **Launch the app.** Follow the [Fly.io launch documentation](https://fly.io/docs/hands-on/launch-app/). This will generate a `fly.toml` file. You won't need a database.
> ⚠️ **Update API URLs**
> After deploying, you must replace all instances of `https://knowledge.fly.dev` in the `docs/index.html` file with your own Fly.io app URL (e.g., `https://app_name.fly.dev`).
#### B. Set up GitHub Pages
1. Go to your forked repository's settings (`Settings` > `Pages`).
2. Under `Build and deployment`, select the **Source** as `Deploy from a branch` and choose the `main` branch with the `/docs` folder.
> ⚠️ **Update CORS Origins**
> After your GitHub Pages site is live, you must add its URL to the `origins` list in the `api/api.py` file to allow cross-origin requests.
```python
origins = [
"https://your-github-username.github.io", # Add your GitHub Pages URL here
]
```
---
## 💸 Cost Management
This project is designed to be affordable, but you are responsible for the costs incurred on Fly.io. Here is how to keep them in check:
> ⚠️ **Bound Fly.io Concurrency**
> To prevent costs from scaling unexpectedly, define connection limits in the `fly.toml` file.
```toml
[services.concurrency]
hard_limit = 6
soft_limit = 3
type = "connections"
```
> ⚠️ **Select a modest Fly.io VM**
> A small virtual machine is sufficient. A **shared-cpu-1x@1024MB** is a good starting point.
---
## 💻 Local Development
To run the API on local machine for development, simply run the following command from the root of the repository:
```sh
make launch
```
---
## 🔌 Zotero Integration
The Zotero integration allows you to save academic papers, articles, and other documents, which will then be automatically indexed by your search engine.
- **Browser Extension:** Use the Zotero Connector extension for your browser to easily save documents from the web.
- **Mobile App:** The Zotero mobile app lets you add documents on the go. Any uploads will be indexed within a few hours.
---
## 💡 Acknowledgements
My personal Knowledge Base is inspired by and extracts resources from the Knowledge Base of François-Paul Servant, namely [Semanlink](http://www.semanlink.net/sl/home).
## 📜 License
This project is licensed under the **GNU General Public License v3.0**.
Knowledge Copyright (C) 2023 Raphaël Sourty