https://github.com/datastaxdevs/mini-demo-astradb-glean
Demo showing how to index AstraDB data into Glean
- Host: GitHub
- URL: https://github.com/datastaxdevs/mini-demo-astradb-glean
- Owner: datastaxdevs
- Created: 2024-09-16T14:24:28.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2024-09-16T20:59:49.000Z (over 1 year ago)
- Last Synced: 2024-09-18T00:15:14.227Z (over 1 year ago)
- Language: Jupyter Notebook
- Size: 87.5 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# mini-demo-astradb-glean
Demo showing how to index AstraDB data into Glean
You can follow this tutorial entirely in Google Colab, or follow the instructions below to run it locally.
## Work in a Colab
[Open in Colab](https://colab.research.google.com/github/datastaxdevs/mini-demo-astradb-glean/blob/main/AstraDB_Glean_Integration.ipynb)
## Run Locally
### 1.1 Set Up AstraDB
ℹ️ [Astra Reference documentation](https://docs.datastax.com/en/astra-db-serverless/databases/create-database.html)
`✅ 1.1.a`: Create an Astra ACCOUNT
Go to [https://astra.datastax.com](https://astra.datastax.com) and register with a `Google` or `GitHub` account.

`✅ 1.1.b`: Create an Astra Database
Get to the databases dashboard (by clicking on Databases in the left-hand navigation bar, expanding it if necessary), and click the `[Create Database]` button on the right.

- **ℹ️ Fields Description**

| Field | Description |
|-------|-------------|
| **Vector Database vs Serverless Database** | Choose `Vector Database`. In June 2023, Cassandra introduced support for vector search to enable generative AI use cases. |
| **Database name** | Only a label: it does not need to be unique and is not used to initialize a connection (keep it between 2 and 50 characters). It is recommended to have one database for each of your applications. The free tier is limited to 5 databases. |
| **Cloud Provider** | Choose whichever you like. Click a cloud provider logo, pick an area in the list, and finally pick a region. We recommend choosing the region closest to you to reduce latency. On the free tier, there is very little difference. |
| **Cloud Region** | Pick a region close to you that is available for the selected cloud provider and your plan. |

If all fields are filled in properly, clicking the "Create Database" button will start the process.

It should take a couple of minutes for your database to become `Active`.

`✅ 1.1.c`: Create an Astra TOKEN
To connect to your database, you need the API endpoint and a token. The API endpoint is shown on the database screen, next to a small icon that copies the URL to your clipboard. It should look like `https://<database-id>-<region>.apps.astra.datastax.com`.

To get a token, click the `[Generate Token]` button on the right. It generates a token that you can copy to your clipboard.
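
Before pasting the two values into your configuration, a quick sanity check can catch copy-and-paste mistakes. The sketch below is only illustrative: the helper name `check_astra_credentials` is hypothetical, and the regular expression assumes the endpoint embeds a UUID database ID followed by a region name and that tokens start with `AstraCS:`. Treat both as assumptions rather than an official validation rule.

```python
import re

# Assumed shape: https://<database-id>-<region>.apps.astra.datastax.com,
# where the database ID is a lowercase UUID. Illustration only.
_ENDPOINT_RE = re.compile(
    r"^https://[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
    r"-[a-z0-9-]+\.apps\.astra\.datastax\.com$"
)


def check_astra_credentials(endpoint: str, token: str) -> list[str]:
    """Return a list of problems; an empty list means both values look plausible."""
    problems = []
    # Tolerate surrounding whitespace and a trailing slash from copy and paste.
    if not _ENDPOINT_RE.match(endpoint.strip().rstrip("/")):
        problems.append("API endpoint does not look like an Astra DB URL")
    # Assumption: application tokens carry the "AstraCS:" prefix.
    if not token.strip().startswith("AstraCS:"):
        problems.append("token does not start with 'AstraCS:'")
    return problems
```

A check like this only looks at the shape of the strings; it cannot tell whether the token is actually valid for the database.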
## 2. Installation
### 2.1 Python Environment
- `✅ 2.1.a`: Create and activate a virtual environment
```console
python3 -m venv venv
```
_macOS / Linux_
```console
source venv/bin/activate
```
_Windows_
```console
venv\Scripts\activate
```
- `✅ 2.1.b`: Install the dependencies
```console
pip install astrapy==1.4.1 --no-deps
pip install -r requirements.txt
```
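
If you want to confirm that the virtual environment is active and that the pinned `astrapy` version was installed, a small check along these lines may help. It is a minimal sketch using only the standard library; the function names `in_virtualenv` and `installed_version` are hypothetical helpers and are not part of this repository.

```python
import sys
from importlib import metadata


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv (prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    # After the pinned install step, this should report 1.4.1.
    print("astrapy version:", installed_version("astrapy"))
```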
- `✅ 2.1.c`: Edit `.env`
_Copy `.env.example` to `.env` and fill in the values_
```ini
# Astra Configuration
export ASTRA_DB_APPLICATION_TOKEN=
export ASTRA_DB_API_ENDPOINT=
export ASTRA_DB_COLLECTION_NAME="plain_collection"
# Glean Configuration
export GLEAN_CUSTOMER=
export GLEAN_DATASOURCE_NAME=
export GLEAN_API_TOKEN=
```
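
Empty values in `.env` are an easy mistake to miss. As a rough illustration, the sketch below parses lines of the `export KEY=value` form shown above and reports which keys are still unset. How the repository's script actually loads `.env` is not shown here, so `parse_env_text` and `missing_keys` are hypothetical helpers and not the project's own loader.

```python
REQUIRED_KEYS = (
    "ASTRA_DB_APPLICATION_TOKEN",
    "ASTRA_DB_API_ENDPOINT",
    "ASTRA_DB_COLLECTION_NAME",
    "GLEAN_CUSTOMER",
    "GLEAN_DATASOURCE_NAME",
    "GLEAN_API_TOKEN",
)


def parse_env_text(text: str) -> dict[str, str]:
    """Parse `export KEY=value` or `KEY=value` lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):].lstrip()
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """List the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env_text(open(".env").read()))` returns the names that still need a value.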
- `✅ 2.1.d`: Run the script
```console
python3 astra-glean-import-job.py
```
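
The script itself is not reproduced here. To give a feel for what indexing AstraDB data into Glean could involve, the sketch below maps an Astra-style document to a Glean-style document and builds (but does not send) an indexing request. Everything in it is an assumption for illustration: the helper names, the `title` and `content` field names on the Astra side, and the endpoint path and payload shape on the Glean side should all be checked against the Glean Indexing API documentation and your datasource configuration, which may require additional fields.

```python
import json
import urllib.request


def to_glean_document(astra_doc: dict, datasource: str) -> dict:
    """Map one Astra document to a Glean-style document (field names are assumptions)."""
    doc_id = str(astra_doc["_id"])  # Astra documents carry their key in `_id`
    return {
        "datasource": datasource,
        "id": doc_id,
        # `title` and `content` are hypothetical field names in the collection.
        "title": astra_doc.get("title", doc_id),
        "body": {"mimeType": "text/plain", "textContent": astra_doc.get("content", "")},
    }


def build_index_request(customer: str, datasource: str, api_token: str,
                        astra_docs: list) -> urllib.request.Request:
    """Build a POST request for a batch of documents without sending it."""
    payload = {
        "datasource": datasource,
        "documents": [to_glean_document(d, datasource) for d in astra_docs],
    }
    return urllib.request.Request(
        # Assumed endpoint shape for the Glean Indexing API.
        url=f"https://{customer}-be.glean.com/api/index/v1/indexdocuments",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Here `customer`, `datasource`, and `api_token` would come from `GLEAN_CUSTOMER`, `GLEAN_DATASOURCE_NAME`, and `GLEAN_API_TOKEN`. Passing the returned request to `urllib.request.urlopen` would send it.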