https://github.com/anycable/uptriever

A CLI to upload documentation chunks to Trieve

Host: GitHub
URL: https://github.com/anycable/uptriever
Owner: anycable
License: mit
Created: 2024-07-24T15:50:02.000Z (almost 2 years ago)
Default Branch: master
Last Pushed: 2025-01-03T20:27:07.000Z (over 1 year ago)
Last Synced: 2025-04-15T19:07:51.820Z (over 1 year ago)
Topics: documentation, trieveai
Language: Ruby
Homepage:
Size: 19.5 KB
Stars: 4
Watchers: 3
Forks: 0
Open Issues: 1
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE.txt

[![Gem Version](https://badge.fury.io/rb/uptriever.svg)](https://rubygems.org/gems/uptriever)

[![Build](https://github.com/palkan/uptriever/workflows/Build/badge.svg)](https://github.com/palkan/uptriever/actions)

# Uptriever

Uptriever is a CLI to upload documentation source files (HTML, Markdown) to [Trieve][] for search indexing.

## Installation

Install Uptriever as a Ruby gem (Ruby 3.1+ is required):

```sh
gem install uptriever
```

## Usage

Currently, Uptriever requires an index configuration file (`.trieve.yml`) in the documentation root folder. The file lists the files to index and their metadata. A minimal example that indexes every Markdown file looks as follows:

```yml
hostname: https://myproject.example/docs

pages:
  - "**/*.md"
```

The `hostname` field is used to generate the `link` property for chunks (see [Trieve API](https://docs.trieve.ai/api-reference/chunk/create-or-upsert-chunk-or-chunks)).
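
To make the role of `hostname` concrete, here is a minimal Ruby sketch of how a link could be derived from it. This is not Uptriever's actual implementation: the `chunk_link` helper is hypothetical, and dropping the file extension is an assumption about how the published pages are served.

```ruby
# Hypothetical helper, not part of Uptriever's API.
# Builds a chunk link from the configured hostname and a page path
# relative to the documentation root.
def chunk_link(hostname, relative_path)
  # Assumption: published pages are served without the source extension.
  page = relative_path.delete_prefix("./").sub(/\.(md|html)\z/, "")
  # Avoid a double slash when the hostname already ends with "/".
  "#{hostname.chomp("/")}/#{page}"
end

puts chunk_link("https://myproject.example/docs", "./guides/setup.md")
# prints "https://myproject.example/docs/guides/setup"
```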

The `pages` field contains the list of pages to index. It supports glob patterns.
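
If you want to preview which files a pattern would pick up, the following sketch expands glob patterns with Ruby's standard `Dir.glob`. The `expand_pages` helper is hypothetical, and Uptriever's own matching may differ in details.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: expands glob patterns relative to a docs root
# and returns sorted, de-duplicated relative paths.
def expand_pages(root, patterns)
  patterns.flat_map { |pattern| Dir.glob(pattern, base: root) }.uniq.sort
end

# Build a throwaway docs folder to try the patterns against.
Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "guides"))
  FileUtils.touch(File.join(root, "index.md"))
  FileUtils.touch(File.join(root, "guides", "setup.md"))
  FileUtils.touch(File.join(root, "logo.png"))

  # Only the Markdown files match; logo.png is skipped.
  puts expand_pages(root, ["**/*.md"])
end
```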

With the config in place, you can run the `uptriever` executable to perform the indexing:

```sh
$ uptriever -d ./docs --api-key= --dataset=
Groups: |===========================|
Chunks: |===========================|
```

## Full-featured example

Why do we need a configuration file? To leverage Trieve features such as groups, tags, and weights. Here is a real-life example:

```yml
# Ignore patterns for globs in pages
ignore:
  - "**/*/Readme.md"

hostname: https://docs.anycable.io

# Prepend file paths with this prefix.
# Useful when you store documentation in multiple sources.
url_prefix: anycable-go/

# Make sure the following chunk groups are created
groups:
  - name: PRO version
    tracking_id: pro
  - name: Server
    tracking_id: server
  - name: Client
    tracking_id: client
  - name: Go package
    tracking_id: package

# Default metadata for pages (can be overridden)
defaults:
  groups: ["server"]
  tags: ["docs"]

pages:
  # You can use a dictionary to define source paths
  # along with metadata
  - source: "./apollo.md"
    groups: ["pro", "server"]
  - source: "./binary_formats.md"
    groups: ["pro", "server", "client"]
  - "./broadcasting.md"
  - "./broker.md"
  - "./health_checking.md"
  - "./instrumentation.md"
  - source: "./library.md"
    groups: ["package"]
  - "./pubsub.md"
  - source: "./js/**/*.md"
    groups: ["client"]
```
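
To illustrate how `defaults` and per-page metadata might combine, here is a small Ruby sketch. It is an assumption-laden illustration rather than Uptriever's actual code: the `resolve_pages` helper is hypothetical, and it assumes that a key set on a page replaces the default value entirely instead of being merged with it.

```ruby
require "yaml"

# Hypothetical sketch, not Uptriever's implementation: normalizes each
# `pages` entry into a hash and fills in missing metadata from `defaults`.
def resolve_pages(config)
  defaults = config.fetch("defaults", {})

  config.fetch("pages", []).map do |entry|
    # A bare string is shorthand for { "source" => string }.
    page = entry.is_a?(Hash) ? entry : { "source" => entry }
    # Assumption: page-level keys win over defaults.
    defaults.merge(page)
  end
end

config = YAML.safe_load(<<~YAML)
  defaults:
    groups: ["server"]
    tags: ["docs"]
  pages:
    - source: "./library.md"
      groups: ["package"]
    - "./pubsub.md"
YAML

# Print the resolved metadata for each page.
resolve_pages(config).each do |page|
  puts "#{page["source"]}: groups=#{page["groups"].join(",")} tags=#{page["tags"].join(",")}"
end
```

Under these assumptions, `./library.md` ends up in the `package` group only, while `./pubsub.md` inherits the default `server` group. Both keep the default `docs` tag.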

## Contributing

Bug reports and pull requests are welcome on GitHub at [https://github.com/palkan/uptriever](https://github.com/palkan/uptriever).

## Credits

This gem was generated via the [`newgem` template](https://github.com/palkan/newgem) by [@palkan](https://github.com/palkan).

## License

The gem is available as open source under the terms of the [MIT License](http://opensource.org/licenses/MIT).

[Trieve]: https://trieve.ai
