Demonstration of unstructured data search with Elastic + Jina
https://github.com/joeywhelan/unstructured-es

elasticsearch jinaai python unstructured-data


# Unstructured Data Search with Elastic + Jina
## Contents
1. [Summary](#summary)
2. [Architecture](#architecture)
3. [Features](#features)
4. [Prerequisites](#prerequisites)
5. [Installation](#installation)
6. [Usage](#usage)

## Summary
This is a demonstration of four search scenarios run against iFixit technical product manuals using Elasticsearch and Jina models.

## Architecture
![architecture](assets/arch.png)

## Features
- Runs as a Jupyter notebook
- Builds an Elastic Serverless deployment via Terraform
- Creates a data set from iFixit technical manuals
- Uses the Jina Reader to parse the manual contents
- Uses the Jina embeddings v5 model to embed the manual content
- Performs four search scenarios that demonstrate the enhanced search capabilities
- Deletes the entire deployment via Terraform
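
The Reader and embedding steps listed above can be sketched as plain HTTP requests. This is a minimal illustration rather than the notebook's actual code: it assumes direct calls to the public Jina endpoints (`https://r.jina.ai/` for the Reader, `https://api.jina.ai/v1/embeddings` for embeddings), while the notebook may instead route embeddings through Elasticsearch. The helper names, the iFixit URL, and the model identifier passed in by the caller are hypothetical placeholders.

```python
import json
import urllib.request

# Public Jina endpoints (assumed here; the notebook may use a different path).
READER_BASE = "https://r.jina.ai/"
EMBEDDINGS_URL = "https://api.jina.ai/v1/embeddings"


def build_reader_request(page_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request asking the Jina Reader to return page_url as clean text."""
    return urllib.request.Request(
        READER_BASE + page_url,
        headers={"Authorization": f"Bearer {api_key}"},
    )


def build_embedding_request(texts: list[str], model: str, api_key: str) -> urllib.request.Request:
    """Build a POST request that embeds a batch of texts with the given model.

    The model identifier is supplied by the caller; check the Jina
    documentation for the exact name of the embeddings model you intend to use.
    """
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send a prepared request and return the response body as text."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    import os

    key = os.environ["JINA_API_KEY"]  # hypothetical environment variable name
    # Hypothetical manual URL, used only for illustration.
    manual_text = send(build_reader_request("https://www.ifixit.com/Guide/example", key))
    print(manual_text[:200])
```

Keeping request construction separate from sending makes the payloads easy to inspect before any API credits are spent.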

## Prerequisites
- Terraform
- Elastic Cloud account and API key
- Jina API key
- Python

## Installation
- Edit `terraform.tfvars.sample` and rename it to `terraform.tfvars`
- Create a Python virtual environment
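
The virtual environment step can be scripted with the standard library `venv` module. This is a minimal sketch; the directory name `.venv` and the helper `create_env` are assumptions, since the repository does not prescribe them here.

```python
import venv
from pathlib import Path


def create_env(env_dir=".venv", with_pip=True) -> Path:
    """Create a virtual environment at env_dir and return its path."""
    venv.create(env_dir, with_pip=with_pip)
    return Path(env_dir)


if __name__ == "__main__":
    env_path = create_env()
    print(f"Created virtual environment at {env_path.resolve()}")
```

After activating the environment, install Jupyter and the notebook's dependencies into it before running the notebook.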

## Usage
- Execute the notebook