https://github.com/pangeacyber/pangea-multipass
Pangea Multipass is the authorization checker for systems like Google Workspace, Jira, and more.
- Host: GitHub
- URL: https://github.com/pangeacyber/pangea-multipass
- Owner: pangeacyber
- License: MIT
- Created: 2024-11-12T18:08:12.000Z (over 1 year ago)
- Default Branch: main
- Last Pushed: 2025-02-11T18:23:48.000Z (over 1 year ago)
- Last Synced: 2025-02-11T19:29:35.435Z (over 1 year ago)
- Topics: authorization, llms, rag
- Language: Python
- Homepage:
- Size: 676 KB
- Stars: 13
- Watchers: 7
- Forks: 1
- Open Issues: 2
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Pangea Multipass: Your Authorization Helper
Pangea Multipass is a Python library for checking user access to upstream data sources.
In practice, you can use it to check whether a specific user has access to a file in Google Drive, a ticket in Jira, or a page in Confluence. We designed the library to be extensible so that it can eventually support Slack channels, GitHub repositories, Salesforce opportunities, and more.
We originally built it to help our customers mitigate data leaks in their Retrieval-Augmented Generation (RAG) applications. In a RAG architecture, the application inserts additional context at inference time. If you don't check the user's authorization to that context, you could inadvertently leak sensitive information.
While this is useful in AI/LLM apps, we've abstracted this to work independently so you can use it in any app.
Check out the `/examples` folder for AI-specific and generic examples.
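To make the RAG risk concrete, here is a minimal sketch of checking authorization before retrieved context reaches the prompt. It does not use the Pangea Multipass API: `Chunk`, `ACL`, `can_read`, and `build_prompt` are hypothetical names, and the in-memory ACL stands in for a real lookup against the upstream source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """A retrieved piece of context and the upstream resource it came from."""
    resource_id: str
    text: str


# Hypothetical ACL: resource ID -> users allowed to read it.
# A real app would ask the upstream system (Google Drive, Jira, ...) instead.
ACL = {
    "doc-public": {"alice", "bob"},
    "doc-finance": {"alice"},
}


def can_read(user: str, chunk: Chunk) -> bool:
    """Return True only if the user is explicitly listed for the resource."""
    return user in ACL.get(chunk.resource_id, set())


def build_prompt(user: str, question: str, retrieved: list[Chunk]) -> str:
    """Build a prompt from only the chunks the user is authorized to read."""
    allowed = [c.text for c in retrieved if can_read(user, c)]
    context = "\n".join(allowed)
    return f"Context:\n{context}\n\nQuestion: {question}"


retrieved = [
    Chunk("doc-public", "Office hours are 9 to 5."),
    Chunk("doc-finance", "Q3 revenue forecast: confidential."),
]

# Same retrieval results, different users: only alice sees the finance chunk.
bob_prompt = build_prompt("bob", "When is the office open?", retrieved)
alice_prompt = build_prompt("alice", "When is the office open?", retrieved)
```

The key design choice is that filtering happens after retrieval and before prompt construction, so an unauthorized chunk never reaches the model regardless of how it was retrieved.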
## Features
- **Document Reading**: Supports document content extraction for use in processing and enrichment.
- **Metadata Enrichment**: Includes enrichers for hashing, constant value setting, and custom metadata.
- **Metadata Filtering**: Provides flexible operators to filter document metadata for customized queries.
- **Authorization Processing**: Manages authorized and unauthorized nodes with customizable node processors.
- **Extensible**: Built on abstract base classes, allowing easy extension and customization of functionality.
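The enrichment and filtering features can be pictured with a small sketch. The names below (`MetadataEnricher`, `ConstantEnricher`, `HashEnricher`, `OPERATORS`, `matches`) are illustrative assumptions, not the library's actual classes; consult the source and `EXTENDING.md` for the real interfaces.

```python
import hashlib
import operator
from abc import ABC, abstractmethod
from typing import Any, Callable


class MetadataEnricher(ABC):
    """Hypothetical base class: derive one metadata value from a document."""

    def __init__(self, key: str) -> None:
        self.key = key

    @abstractmethod
    def extract(self, content: str) -> Any:
        ...

    def enrich(self, content: str, metadata: dict[str, Any]) -> None:
        # Store the extracted value under this enricher's key.
        metadata[self.key] = self.extract(content)


class ConstantEnricher(MetadataEnricher):
    """Sets the same value on every document, e.g. the data source name."""

    def __init__(self, key: str, value: Any) -> None:
        super().__init__(key)
        self.value = value

    def extract(self, content: str) -> Any:
        return self.value


class HashEnricher(MetadataEnricher):
    """Stores a SHA-256 hex digest of the document content."""

    def extract(self, content: str) -> str:
        return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Filter operators map a name to a comparison between a metadata value and a target.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "ne": operator.ne,
    "in": lambda value, options: value in options,
}


def matches(metadata: dict[str, Any], key: str, op: str, target: Any) -> bool:
    """Return True if metadata[key] exists and satisfies the named operator."""
    if key not in metadata:
        return False
    return OPERATORS[op](metadata[key], target)


content = "Quarterly planning notes"
metadata: dict[str, Any] = {}
for enricher in (ConstantEnricher("source", "gdrive"), HashEnricher("sha256")):
    enricher.enrich(content, metadata)

is_gdrive = matches(metadata, "source", "eq", "gdrive")
is_jira_or_confluence = matches(metadata, "source", "in", {"jira", "confluence"})
```

Adding a new enricher in this sketch means subclassing `MetadataEnricher` and implementing `extract`, which is the kind of extension point the abstract-base-class design is meant to allow.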
## Installation
To install `pangea-multipass`, you can use [Poetry](https://python-poetry.org/) for dependency management:
```bash
poetry add pangea-multipass
```
## Usage
There are full runnable demos in the `pangea_multipass_lib/examples` directory, but here are the key aspects.
Using a set of Google Drive credentials (set up by following the steps in the `llama_index_examples` folder), you initialize the data source:
```python
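from llama_index.readers.google import GoogleDriveReader

# Load every document in the folder using the admin token.
gdrive_reader = GoogleDriveReader(
    folder_id=gdrive_fid, token_path=admin_token_filepath, credentials_path=credentials_filepath
)
documents = gdrive_reader.load_data(folder_id=gdrive_fid)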
```
This gives you the list of documents in the folder. You can then use the processors to split them into authorized and unauthorized lists:
```python
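# `creds` is assumed to hold the credentials of the user whose access is being checked.
gdrive_processor = LlamaIndexGDriveProcessor(creds)
node_processor = NodePostprocessorMixer([gdrive_processor])
# Documents the user may read, followed by the ones that were filtered out.
authorized_docs = node_processor.postprocess_nodes(documents)
unauthorized_docs = node_processor.get_unauthorized_nodes()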
```
In general, the authorized list is the more important of the two, but you may also want to log the event or notify an admin when a user attempts to access a folder where they have only limited access. It could be an attempt at data theft, or the user's permissions could simply be incomplete.
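One way to act on the unauthorized list is to log each denied resource so an admin can review it. This is a minimal sketch using Python's standard `logging` module; `report_unauthorized`, the `file_id` metadata key, and the sample data are hypothetical and stand in for whatever your nodes actually carry.

```python
import logging

logger = logging.getLogger("multipass.audit")


def report_unauthorized(user: str, unauthorized: list[dict]) -> list[str]:
    """Log one warning per denied resource and return the denied IDs."""
    denied_ids = [item.get("file_id", "<unknown>") for item in unauthorized]
    for file_id in denied_ids:
        logger.warning("user=%s was denied access to resource=%s", user, file_id)
    return denied_ids


# Hypothetical metadata for nodes that failed the authorization check.
unauthorized_metadata = [{"file_id": "1AbC"}, {"file_id": "9XyZ"}, {}]
denied = report_unauthorized("bob@example.com", unauthorized_metadata)
```

Returning the denied IDs keeps the function easy to test and lets the caller decide whether a large number of denials warrants an alert rather than a log entry.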
## Roadmap
At release, this library supports Google Workspace, Confluence, and Jira. Our top priorities for new integrations are:
- Box
- Dropbox
- Office 365
Other systems we plan to support, or would welcome contributions for, include:
- Zoom
- Salesforce
- GitLab
- Zendesk
- Notion
- SharePoint
- Asana
- HubSpot
Check out `EXTENDING.md` for the specific structure and requirements for extending Pangea Multipass for your data sources. Pull requests are welcome.