An open API service indexing awesome lists of open source software.

https://github.com/autoparallel/learner

Making learning sh*t less annoying
https://github.com/autoparallel/learner

automation learning papers research

Last synced: 11 months ago
JSON representation

Making learning sh*t less annoying

Awesome Lists containing this project

README

          

# learner
*A Rust-powered academic research management system*

[![Library](https://img.shields.io/badge/lib-learner-blue)](https://crates.io/crates/learner)
[![Crates.io](https://img.shields.io/crates/v/learner)](https://crates.io/crates/learner)
[![docs.rs](https://img.shields.io/docsrs/learner)](https://docs.rs/learner)
  |  
[![CLI](https://img.shields.io/badge/cli-learnerd-blue)](https://crates.io/crates/learnerd)
[![Crates.io](https://img.shields.io/crates/v/learnerd)](https://crates.io/crates/learnerd)
[![CI](https://github.com/autoparallel/learner/actions/workflows/check.yaml/badge.svg)](https://github.com/autoparallel/learner/actions/workflows/check.yaml)
[![License](https://img.shields.io/crates/l/learner)](LICENSE)

learner header

[Features](#features) |
[Installation](#installation) |
[Usage](#usage) |
[Configuration](#configuration) |
[Roadmap](#roadmap) |
[Contributing](#contributing) |
[Development](#development) |
[SDK](#sdk) |
[License](#license) |
[Acknowledgements](#acknowledgements)

---
## Features

- Paper Metadata Management
- Support for arXiv, IACR, and DOI sources
- Automatic source detection from URLs or identifiers
- Full metadata extraction including authors and abstracts

- Local Database
- SQLite-based storage with full-text search
- Configurable document storage
- Platform-specific defaults

- Interactive Interfaces
- Terminal User Interface (TUI) with vim-style navigation
- Command-line interface (CLI) for scripting and automation with shell CLI completions
- Search, filter, and preview functionality
- Document management and viewing
- Daemon support for background operations

## Installation

### Library

```toml
[dependencies]
learner = { version = "*" } # Uses latest version
```

### CLI Tool

```bash
cargo +nightly install learnerd --features tui
```

This installs both the CLI tool and TUI interface, accessible via the `learner` command.

To obtain shell completions for `learner`:
```
# replace fish with your shell: bash, zsh or whatever
# then, move completions to somewhere reasonable, and source them from your shell setup config.
learner -g fish > learner_completions.fish
source learner_completions.fish
```

## Usage

### Library Usage

```rust
use learner::{Paper, Database};

#[tokio::main]
async fn main() -> Result> {
let db = Database::open(Database::default_path()).await?;

// Add papers from various sources
let paper = Paper::new("https://arxiv.org/abs/2301.07041").await?;
paper.save(&db).await?;

// Download associated document
let storage = Database::default_storage_path();
paper.download_pdf(&storage).await?;

Ok(())
}
```

### Command Line Interface

```bash
# Initialize database
learner init --default-retrievers

# Add papers
learner add 2301.07041
learner add "https://arxiv.org/abs/2301.07041" --pdf
learner add "10.1145/1327452.1327492" --no-pdf

# Search papers
learner search "quantum computing"
learner search "quantum" --author "Feynman" --detailed
learner search "neural" --source arxiv --before 2023

# Remove papers
learner remove "outdated paper"
learner remove "temp" --force --remove-pdf
```

### Terminal User Interface
If you install with
```
cargo install learnerd --features tui
```
you can get access to a Terminal User Interface (TUI). To launch the interactive TUI just do:
```bash
learner
```

TUI navigation:
- `↑`/`k`, `↓`/`j`: Navigate papers
- `←`/`h`, `→`/`l`: Switch panes
- `:`: Enter command mode
- `o`: Open selected PDF
- `q`: Quit

TUI commands:
```bash
:add # Add a paper
:remove # Remove paper(s)
:search # Search papers
```

(TODO:) Search within TUI supports all filters:
```bash
:search "quantum" --author "Feynman"
:search "neural" --source arxiv --before 2023
```

### System Daemon Management

`learnerd` can run as a background service for paper monitoring and updates.
Currently, there are no distinct processes it runs but there is a tracking issue: [issue #83](https://github.com/Autoparallel/learner/issues/83).

#### System Service
```bash
# Install and start
sudo learnerd daemon install
sudo systemctl enable --now learnerd # Linux
sudo launchctl load /Library/LaunchDaemons/learnerd.daemon.plist # macOS

# Remove
sudo learnerd daemon uninstall
```

#### Logs
- Linux: /var/log/learnerd/
- macOS: /Library/Logs/learnerd/

Files: `learnerd.log` (main, rotated daily), `stdout.log`, `stderr.log`

#### Troubleshooting

- **Permission Errors:** Check ownership of log directories
- **Won't Start:** Check system logs and remove stale PID file if present
- **Installation:** Run commands as root/sudo

## Configuration

The `learner` system uses a flexible configuration system that allows customization of paper sources, storage paths, and retrieval behavior.

### Default Locations

- **Config**:
- Linux: `~/.config/learner/config.toml`
- macOS: `~/Library/Application Support/learner/config.toml`
- Windows: `%APPDATA%\learner\config.toml`

- **Database**:
- Linux: `~/.local/share/learner/learner.db`
- macOS: `~/Library/Application Support/learner/learner.db`
- Windows: `%APPDATA%\learner\learner.db`

- **Papers**:
- Linux/macOS: `~/Documents/learner/papers`
- Windows: `Documents\learner\papers`

### Configuration File

The configuration file (`config.toml`) allows you to customize:
```toml
# Base configuration
[config]
database_path = "/custom/path/to/db.sqlite" # Where the datbase itself is stored
storage_path = "/custom/path/to/papers" # Where the documents are stored
retrievers_path = "/custom/path/to/papers" # Where configuration for retrievers are stored
```

### Adding Custom Sources

1. Create a source configuration in TOML:
```toml
[sources.new_source]
name = "New Paper Source"
base_url = "https://api.example.com"
pattern = "^PREFIX-\\d+$" # Regex for identifier validation
endpoint_template = "/api/v1/papers/{identifier}"
headers = { "API-Key" = "your-key" } # Optional headers

# For JSON responses
response_format = { type = "json" }
field_maps.title = { path = "data.title" }
field_maps.abstract = { path = "data.description" }
field_maps.pdf_url = {
path = "data.files.pdf",
transform = { type = "url", base = "https://cdn.example.com", suffix = ".pdf" }
}

# For XML responses
response_format = { type = "xml" }
field_maps.title = { path = "paper/title" }
field_maps.authors = { path = "paper/authors/author" }
```
Put this TOML configuration file in your `~/.learner/retrievers/` (or equivalent) directory.
Examples can be found in `crates/learner/config/retrievers/`.

### Source Requirements

Custom sources must provide:
1. A unique identifier pattern (regex)
2. An API endpoint that returns paper metadata
3. Field mappings for required metadata:
- Title
- Authors
- Abstract
- Publication date
- Optional: PDF URL, DOI

### Supported Response Formats

- **JSON**:
- Path-based field extraction
- Value transformations (dates, URLs)
- Array handling for authors/references

- **XML**:
- XPath-style field selection
- Namespace handling
- Multiple value aggregation

## Project Structure

1. `learner` - Core library
- Paper metadata extraction and management
- Database operations and search
- PDF handling and source-specific clients
- Error handling and type safety

2. `learnerd` - CLI application
- Paper and document management interface
- System daemon capabilities
- Logging and diagnostics

## Roadmap

- [ ] Generic LLM integration (similar to the configurable `Retriever` abstraction)
- [ ] RAG system
- [ ] Document version control and annotations
- [ ] Paper discovery and streaming
- [ ] Configurable daemon process (e.g., watch file system, RSS, automated LLM querying)
- [ ] REST API and Daemonize so `learner` can be a plugin with/for other apps (e.g., Raycast, Syncthing)
- [ ] Database improvements (more searchable fields, tags, organization)
- [ ] TUI improvements (organization, flexibility, in-terminal paper reading)
- [ ] Citation analysis and related works.

## Contributing

Contributions welcome! Please open an issue before making major changes.

### CI Workflow

Our automated pipeline ensures:

- Code Quality
- rustfmt and taplo for consistent formatting
- clippy for Rust best practices
- cargo-udeps for dependency management
- cargo-semver-checks for API compatibility

- Testing
- Full test suite across workspace and platforms

All checks must pass before merging pull requests.

## Development

This project uses [just](https://github.com/casey/just) as a command runner.

```bash
# Setup
cargo install just
just setup

# Common commands
just test # run tests
just fmt # format code
just ci # run all checks
just build-all # build all targets
```

> [!TIP]
> Running `just setup` and `just ci` locally is a quick way to get up to speed and see that the repo is working on your system!

## SDK
This repository now supplies a very basic SDK for validating a `Retriever` and a `Resource` TOML configurations.
To work with this SDK, use:
```
# Setup
just setup-sdk

# Validations
just validate-retriever # optionally supply url/identifer
just validate-resource

# Examples
just validate-retriever crates/learner/config/retrievers/arxiv.toml 2301.07041
just validate-resource crates/learner/config/resources/thesis.toml
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- [arXiv API](https://arxiv.org/help/api/index) for paper metadata
- [IACR](https://eprint.iacr.org/) for cryptography papers
- [CrossRef](https://www.crossref.org/) for DOI resolution
- [SQLite](https://www.sqlite.org/) for local database support

---


Made for making learning sh*t less annoying.