Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/compscidr/ipfs_indexer
An ipfs indexer / search engine built in rust
https://github.com/compscidr/ipfs_indexer
hacktoberfest indexer ipfs rust rust-lang
Last synced: 2 months ago
JSON representation
An ipfs indexer / search engine built in rust
- Host: GitHub
- URL: https://github.com/compscidr/ipfs_indexer
- Owner: compscidr
- License: gpl-3.0
- Created: 2021-10-01T22:12:49.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2024-10-31T16:31:39.000Z (3 months ago)
- Last Synced: 2024-10-31T17:28:04.072Z (3 months ago)
- Topics: hacktoberfest, indexer, ipfs, rust, rust-lang
- Language: Rust
- Homepage:
- Size: 385 KB
- Stars: 4
- Watchers: 3
- Forks: 0
- Open Issues: 3
-
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# ipfs_indexer
[![.github/workflows/build-and-test.yml](https://github.com/compscidr/ipfs_indexer/actions/workflows/build-and-test.yml/badge.svg)](https://github.com/compscidr/ipfs_indexer/actions/workflows/build-and-test.yml)An ipfs indexer / search engine built in rust.
## Build notes
On Ubuntu you may need to apt install: `libssl-dev`.As of libp2p 0.44.0, it seems to require rust nightly: https://stackoverflow.com/questions/69848319/unable-to-specify-edition2021-in-order-to-use-unstable-packages-in-rust
Did init as a "binary" - not sure if this makes sense, or if other people think we should split this into a library
bit and an application bit. I suppose we can always change it later as it grows.Following this guide for libp2p:
https://github.com/libp2p/rust-libp2p/blob/master/src/tutorial.rsTrying to follow best practices from here:
https://doc.rust-lang.org/cargo/commands/cargo-init.htmlAdding Cargo.lock to version control - it *seems* like it might be best practice for a binary (app):
https://stackoverflow.com/questions/62861623/should-cargo-lock-be-committed-when-the-crate-is-both-a-rust-library-and-an-exec## Building on mac
Requirements: `xcode`
Run `xcode-select --install` if you do not have xcode installed, need to update xcode, or run into xcode related build errors
## Running with logging output
- Run `cargo build` to build
- Run `RUST_LOG=info ./target/debug/ipfs_indexer` to see logging output (adjust level accordingly)
- Run `RUST_LOG=info ./target/debug/ipfs_indexer 127.0.0.1:8080` to use your own ipfs gateway instead of ipfs.ioBy default runs an endpoing on `0.0.0.0:9090` so you can go to
- http://localhost:9090/status
- http://localhost:9090/enqueue/somecid
- http://localhost:9090/search/somequery## Running with docker
From the docker directory, run `docker-compose up`. Currently image is only ~26MB.## CI
I setup a workflow that should run a build at least on push, but doesnt run any tests because I have no idea how test
suites work yet for rust.## What Needs to be Done
- Discover content to be indexed, add them to the index queue
- [ ] Listen in on the gossip protocol **Jason working on**
- [X] Start from some collection of pages on ipfs.io/ipfs
- Implement an index queue processor
- [X] Fetch the ipfs content
- [X] Process the page for more ipfs links, Add those links into the index queue
- Index the pages somehow
- Ranked keywords by frequency or something?
- Need to update to support more than just html content (look at header and index files)
- Update the except to be flexible - for images, it could be a small crop render of the original image, for videos it could be a gif preview render
- Store the index somehow (start with in-memory, then figure out how to do storage later) - **Conor working on**
- A hashmap of map[keyword] -> sorted tree where the entries are sorted by keyword frequency and entries contain ipfs hash? - **Conor working on**
- Will probably want to think of ejection mechanism sooner than later so we can eject to storage (least recently used? oldest? who knows?)
- Farther out - need to think about how the store will be designed
- [X] Probably also want to store an excerpt, page title of the page to present to front-end
- [X] Implement a backend API which a future front-end can use, and in short term we can use to debug
- search -> search result
- ordered list of , possibly grouped by text, images, videos, other
- stats:
- indexed entries
- outstanding index queue
- memory used / free
- stoage used / free
- Implement a front-end which queries the index storage and displays the page title, ipfs/io/ipfs link to the page and excerpt
from the browser
- Feedback loop from what people click on more often to rank those higher
- [X] Might a docker container:
- [X] deploy will auto restart itself on crash (will also make it easy to see consumed memory with docker stats and other tools)
- [X] will be able to deploy with a local ipfs instance all ready to go within the container
- can artificially restrict memory so we can test things like ejection mechanisms
- Farther out -> hook into papertrail or some logging service so we can see what's up if it dies
- Tests! - **Conor working on**