https://github.com/vianneymi/monggregate

Library to make the MongoDB aggregation framework and pipelines easy to use in Python.

aggregation-framework aggregation-pipeline data-science data-wrangling database mongodb nosql pandas pydantic pymongo query-builder query-engine

Last synced: 7 months ago

Host: GitHub
URL: https://github.com/vianneymi/monggregate
Owner: VianneyMI
License: mit
Created: 2022-09-14T20:50:54.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2025-06-25T21:47:27.000Z (8 months ago)
Last Synced: 2025-07-04T17:13:06.848Z (7 months ago)
Topics: aggregation-framework, aggregation-pipeline, data-science, data-wrangling, database, mongodb, nosql, pandas, pydantic, pymongo, query-builder, query-engine
Language: Python
Homepage: https://vianneymi.github.io/monggregate/
Size: 1.67 MB
Stars: 21
Watchers: 2
Forks: 4
Open Issues: 41
Metadata Files:
- Readme: readme.md
- Contributing: docs/contributing.md
- License: LICENSE

README

# 📊 **Monggregate**

## 📋 **Overview**

Monggregate is a library that aims to simplify the use of MongoDB aggregation pipelines in Python.

It's a lightweight query builder for MongoDB aggregation pipelines, based on [pydantic](https://docs.pydantic.dev/latest/) and compatible with all MongoDB drivers and ODMs.

### ✨ **Features**

- 🔄 Provides an Object Oriented Programming (OOP) interface to the aggregation pipeline.

- 🎯 Allows you to focus on your requirements rather than MongoDB syntax.

- 📚 Integrates all the MongoDB documentation and allows you to quickly refer to it without having to navigate to the website.

- 🔍 Enables autocompletion on the various MongoDB features.

- 🔗 Offers a pandas-style way to chain operations on data.

- 💻 Mimics the syntax of familiar tools such as pandas.
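
To illustrate what an object-oriented, chainable interface over pipeline stages can look like, here is a minimal sketch. `TinyPipeline` and its methods are hypothetical names invented for this illustration; this is not monggregate's implementation, only the general builder pattern that the features above describe.

```python
from typing import Any, Dict, List


class TinyPipeline:
    """Hypothetical chainable builder (illustration only, not monggregate)."""

    def __init__(self) -> None:
        self.stages: List[Dict[str, Any]] = []

    def match(self, **conditions: Any) -> "TinyPipeline":
        # Keyword arguments become the filter document of a $match stage.
        self.stages.append({"$match": conditions})
        return self  # returning self is what allows method chaining

    def sort(self, by: str, descending: bool = False) -> "TinyPipeline":
        # MongoDB uses 1 for ascending order and -1 for descending order.
        self.stages.append({"$sort": {by: -1 if descending else 1}})
        return self

    def limit(self, value: int) -> "TinyPipeline":
        self.stages.append({"$limit": value})
        return self

    def export(self) -> List[Dict[str, Any]]:
        # Return a copy of the stage list, ready to hand to a driver.
        return list(self.stages)


stages = TinyPipeline().match(year=1999).sort(by="title").limit(5).export()
print(stages)
```

Each method appends one stage dictionary and returns the builder itself, which is why calls can be chained in a pandas-like style.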

## 📥 **Installation**

> 💡 The package is available on PyPI:

```shell
pip install monggregate
```

## 🚀 **Usage**

> 📘 The examples below use the MongoDB `sample_mflix` sample database.

### 🔰 **Basic Pipeline usage**

```python
import os

from dotenv import load_dotenv
import pymongo

from monggregate import Pipeline

# Load the connection string from the environment instead of hard-coding it.
# You need a .env file that defines MONGODB_URI (which includes your password).
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database
db = client["sample_mflix"]

# Create the pipeline
pipeline = Pipeline()

# The pipeline below returns the most recent movie titled "A Star Is Born"
pipeline.match(
    title="A Star Is Born"
).sort(
    by="year",
    descending=True  # newest first, so limit(1) keeps the most recent movie
).limit(
    value=1
)

# Execute the pipeline
cursor = db["movies"].aggregate(pipeline.export())

# Print the results
results = list(cursor)
print(results)
```
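
For comparison, the same query (the most recent movie titled "A Star Is Born") can be written by hand as a raw aggregation pipeline. This is a sketch of the MongoDB syntax the builder spares you from writing; it is an assumption that `pipeline.export()` yields an equivalent list, and the exact dictionaries it produces may differ.

```python
# Hand-written raw pipeline (assumed equivalent, not actual export() output).
raw_pipeline = [
    {"$match": {"title": "A Star Is Born"}},  # keep only matching titles
    {"$sort": {"year": -1}},                  # -1 sorts newest first
    {"$limit": 1},                            # keep a single document
]

# Each stage is a single-key dict whose key is the stage operator.
stage_names = [next(iter(stage)) for stage in raw_pipeline]
print(stage_names)  # ['$match', '$sort', '$limit']
```

A list like this can be passed directly to `db["movies"].aggregate(...)` in pymongo.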

### 🌟 **Advanced Usage, with MongoDB Operators**

```python
import os

from dotenv import load_dotenv
import pymongo

from monggregate import Pipeline, S

# Load the connection string from the environment
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database
db = client["sample_mflix"]

# Create the pipeline
pipeline = Pipeline()
pipeline.match(
    year=S.type_("number")  # Filter out documents whose year field is not a number
).group(
    by="year",
    query={
        "movie_count": S.sum(1),  # Count the movies for each year
        "movie_titles": S.push("$title")  # Collect the titles for each year
    }
).sort(
    by="_id",
    descending=True
).limit(10)

# Execute the pipeline
cursor = db["movies"].aggregate(pipeline.export())

# Print the results
results = list(cursor)
print(results)
```
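
Written by hand, the pipeline above would look roughly like the sketch below. The mapping from `S.type_`, `S.sum` and `S.push` to `$type`, `$sum` and `$push` is an assumption based on their names; the list is standard MongoDB syntax, not captured output from `pipeline.export()`.

```python
# Hand-written raw pipeline (assumed equivalent, not actual export() output).
raw_group_pipeline = [
    # "number" is MongoDB's alias for all numeric BSON types.
    {"$match": {"year": {"$type": "number"}}},
    {
        "$group": {
            "_id": "$year",                      # group key: one bucket per year
            "movie_count": {"$sum": 1},          # add 1 per document in the bucket
            "movie_titles": {"$push": "$title"}, # collect titles into an array
        }
    },
    {"$sort": {"_id": -1}},  # after $group, the year lives in _id
    {"$limit": 10},
]

group_stage = raw_group_pipeline[1]["$group"]
print(sorted(group_stage))  # ['_id', 'movie_count', 'movie_titles']
```

Note that the sort is on `_id` rather than `year`, because `$group` stores the grouping key under `_id`.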

### 🔥 **Even More Advanced Usage with Expressions**

```python
import os

from dotenv import load_dotenv
import pymongo

from monggregate import Pipeline, S

# Load the connection string from the environment
load_dotenv(verbose=True)
MONGODB_URI = os.environ["MONGODB_URI"]

# Connect to your MongoDB cluster
client = pymongo.MongoClient(MONGODB_URI)

# Get a reference to the "sample_mflix" database
db = client["sample_mflix"]

# Build a reusable expression: the number of comments attached to a movie
comments_count = S.size(S.comments)

# Create the pipeline
pipeline = Pipeline()
pipeline.lookup(
    right="comments",     # collection to join with
    right_on="movie_id",  # field in the comments collection
    left_on="_id",        # field in the movies collection
    name="comments"       # name of the output array field
).add_fields(
    comments_count=comments_count
).match(
    expression=comments_count > 2  # keep movies with more than 2 comments
).limit(1)

# Execute the pipeline
cursor = db["movies"].aggregate(pipeline.export())

# Print the results
results = list(cursor)
print(results)
```
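
As a point of reference, a hand-written pipeline for the same join might look like the sketch below. It assumes that `right`, `left_on`, `right_on` and `name` correspond to the `from`, `localField`, `foreignField` and `as` fields of `$lookup`, and that the comparison expression ends up inside `$expr`; the actual output of `pipeline.export()` may be structured differently.

```python
# Reusable raw expression: size of the joined "comments" array.
comments_count_expr = {"$size": "$comments"}

# Hand-written raw pipeline (assumed equivalent, not actual export() output).
raw_lookup_pipeline = [
    {
        "$lookup": {
            "from": "comments",          # collection to join with
            "localField": "_id",         # field in the movies collection
            "foreignField": "movie_id",  # field in the comments collection
            "as": "comments",            # output array field
        }
    },
    {"$addFields": {"comments_count": comments_count_expr}},
    # Aggregation expressions inside $match must be wrapped in $expr.
    {"$match": {"$expr": {"$gt": [comments_count_expr, 2]}}},
    {"$limit": 1},
]

print([next(iter(stage)) for stage in raw_lookup_pipeline])
# ['$lookup', '$addFields', '$match', '$limit']
```

Defining the expression once and reusing it in two stages mirrors how `comments_count` is reused in the builder version.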

## 🔍 **Going Further**

* 📚 Check out the [full documentation](https://vianneymi.github.io/monggregate/) for more examples.

* 📝 Check out this [Medium article](https://medium.com/@vianney.mixtur_39698/mongo-db-aggregations-pipelines-made-easy-with-monggregate-680b322167d2).
