https://github.com/alexmolas/microsearch

Host: GitHub
URL: https://github.com/alexmolas/microsearch
Owner: alexmolas
License: mit
Created: 2024-02-05T15:15:00.000Z (almost 2 years ago)
Default Branch: main
Last Pushed: 2024-07-16T08:59:59.000Z (over 1 year ago)
Last Synced: 2024-08-14T07:08:02.686Z (over 1 year ago)
Language: Python
Size: 48.8 KB
Stars: 375
Watchers: 3
Forks: 33
Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

jimsghstars - alexmolas/microsearch - (Python)

README

# microsearch

`microsearch` is a minimal search engine written in Python, designed for simplicity and efficiency. You can run searches directly from Python, or deploy a FastAPI app that exposes a search endpoint and a small website. The goal is to give you a straightforward way to deploy your own search engine and search documents from your favorite blogs. The project also includes a script that asynchronously downloads all the posts from a list of RSS feeds.

## Features
- **Python Implementation**: `microsearch` is entirely implemented in Python, making it accessible and easy to understand for developers with varying levels of experience.

- **FastAPI App Deployment**: The project provides an option to deploy a FastAPI app, allowing users to interact with the search engine through a dedicated endpoint and a user-friendly website.

- **RSS Feed Crawling Script**: To populate the search engine with data, `microsearch` offers a script for asynchronously downloading posts from a series of RSS feeds. This feature ensures that users can conveniently aggregate content from their chosen blogs.
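To illustrate what "asynchronously downloading" means here, below is a minimal sketch of concurrent fetching with `asyncio`. It is not the code in `download_content.py`: the names `fetch_feed` and `download_all` and the `max_concurrency` parameter are hypothetical, and `fetch_feed` only simulates a request so the example runs without network access.

```python
import asyncio


async def fetch_feed(url: str) -> str:
    """Hypothetical stand-in for a real HTTP request (no network access)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"<rss>content of {url}</rss>"


async def download_all(urls: list[str], max_concurrency: int = 5) -> dict[str, str]:
    """Fetch all feeds concurrently, with at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> tuple[str, str]:
        # The semaphore caps how many fetches run at the same time.
        async with semaphore:
            return url, await fetch_feed(url)

    # gather() schedules every fetch and waits for all of them to finish.
    results = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(results)


if __name__ == "__main__":
    # Placeholder URLs, for illustration only.
    feeds = ["https://example.com/feed.xml", "https://example.org/rss"]
    pages = asyncio.run(download_all(feeds))
    for url, body in pages.items():
        print(url, len(body))
```

Because the requests overlap instead of running one after another, total time is governed by the slowest feeds rather than by the sum of all of them. The semaphore is one way to avoid opening too many connections at once.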

## Getting started

The first step is to clone this repo:

```bash
git clone https://github.com/alexmolas/microsearch.git
```

Then, I recommend you install everything in a virtual environment. I usually use `virtualenv` but any other environment manager should work.

```bash
virtualenv -p python3.10 venv
```

Activate the environment:

```bash
source venv/bin/activate
```

Then install the package and its dependencies:

```bash
pip install .
```

## Crawl data

Now we need to download the content of the blogs. I've included a list of example feeds in [`feeds.txt`](https://github.com/alexmolas/microsearch/blob/main/feeds.txt), but feel free to use your own. To download the content, run:

```bash
python download_content.py --feed-path feeds.txt
```

## Launch app

Finally, once the content has been crawled and stored, you can run the app with:

```bash
python -m app.app --data-path output.parquet
```

Then navigate to [http://127.0.0.1:8000/](http://127.0.0.1:8000/) to query the engine.
