https://github.com/vk/mongodf

A mongoDB to pandas DataFrame converter with a pandas filter style

Host: GitHub
URL: https://github.com/vk/mongodf
Owner: VK
Created: 2021-12-22T22:02:54.000Z (about 3 years ago)
Default Branch: main
Last Pushed: 2024-09-16T14:30:16.000Z (4 months ago)
Last Synced: 2024-10-29T20:57:13.487Z (2 months ago)
Language: Python
Size: 2.73 MB
Stars: 0
Watchers: 2
Forks: 0
Open Issues: 9
Metadata Files:
- Readme: README.md

# MongoDf

[![Python Package](https://github.com/VK/mongodf/actions/workflows/python-publish.yml/badge.svg)](https://github.com/VK/mongodf/actions/workflows/python-publish.yml)

[![PyPI](https://img.shields.io/pypi/v/mongodf?logo=pypi)](https://pypi.org/project/mongodf)

[![Documentation](https://github.com/VK/mongodf/workflows/Documentation/badge.svg)](https://vk.github.io/mongodf)

A mongoDB to pandas DataFrame converter with a pandas filter style.

## Install

```
pip install mongodf
```

## Filter Example

```python
import mongodf
import pymongo

mongo = pymongo.MongoClient("mongodb://mongo:27017")

# create a dataframe from a mongoDB collection
df = mongodf.from_mongo(mongo, "DB", "Collection")

# filter values
df = df[(df["colA"] == "Test") & (df.ColB.isin([1, 2]))]

# filter columns
df = df[["colA", "colC"]]

# compute a pandas.DataFrame
df.compute()
```

|   | colA | colC |
|---|------|------|
| 0 | Test | NaN  |
| 1 | Test | 12   |
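
To illustrate the idea behind a "pandas filter style", the sketch below shows one way such expressions could be translated into a MongoDB query document and projection before any data is read. The `Column`, `Filter`, and `projection` names are hypothetical and written only for this illustration; they are not mongodf's actual classes, and the library's real translation may differ.

```python
class Column:
    """Hypothetical lazy column: comparisons build MongoDB query fragments."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, value):
        # df["colA"] == "Test"  ->  {"colA": {"$eq": "Test"}}
        return Filter({self.name: {"$eq": value}})

    def isin(self, values):
        # df.ColB.isin([1, 2])  ->  {"ColB": {"$in": [1, 2]}}
        return Filter({self.name: {"$in": list(values)}})


class Filter:
    """Hypothetical wrapper around a MongoDB query document."""

    def __init__(self, query):
        self.query = query

    def __and__(self, other):
        # The & operator combines two filters with $and.
        return Filter({"$and": [self.query, other.query]})

    def __or__(self, other):
        # The | operator combines two filters with $or.
        return Filter({"$or": [self.query, other.query]})


def projection(columns):
    """Build a MongoDB projection that keeps only the named columns."""
    proj = {name: 1 for name in columns}
    proj["_id"] = 0  # drop MongoDB's _id field unless asked for
    return proj


flt = (Column("colA") == "Test") & Column("ColB").isin([1, 2])
print(flt.query)
print(projection(["colA", "colC"]))
```

A query and projection of this shape could then be passed to `pymongo`'s `Collection.find(filter, projection)`, so that only matching rows and the selected columns leave the database.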

## Cache Example

```python
import plotly.express as px
from mongodf import MongoDFCache  # assumed import path; the original snippet omits it

df = px.data.gapminder()

cache = MongoDFCache(
    host="mongodb://mongo:27017",
    database="mongodfcache",
    expire_after_seconds=20,
)

# put the dataframe into the mongo cache
# the name can be auto generated, array_group can be a list of cols
df_id = cache.cache_dataframe(df, "test_df", array_group=True)

# get a mongodf without reading all the data
cdf = cache.get_dataframe(df_id)

# get the metadata and the content of the dataframe
cdf.get_meta()
cdf.compute()
```
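
The `expire_after_seconds` argument suggests that cached frames are discarded after a time-to-live. How mongodf implements this is not stated here. The sketch below only illustrates the expected behaviour with a hypothetical in-memory `TTLCache` class and an injectable clock, so it runs without a MongoDB server.

```python
import time


class TTLCache:
    """Hypothetical in-memory stand-in for a cache with expiring entries."""

    def __init__(self, expire_after_seconds, clock=time.monotonic):
        self.expire_after_seconds = expire_after_seconds
        self._clock = clock
        self._entries = {}  # name -> (stored_at, value)

    def put(self, name, value):
        # Remember when the value was stored so its age can be checked later.
        self._entries[name] = (self._clock(), value)
        return name

    def get(self, name):
        stored_at, value = self._entries[name]  # KeyError if never stored
        if self._clock() - stored_at >= self.expire_after_seconds:
            # The entry outlived its time-to-live: drop it and report a miss.
            del self._entries[name]
            raise KeyError(name)
        return value


# Use a fake clock so the example does not have to sleep.
now = [0.0]
ttl_cache = TTLCache(expire_after_seconds=20, clock=lambda: now[0])
key = ttl_cache.put("test_df", [{"country": "Austria", "year": 2007}])

now[0] = 10.0
print(ttl_cache.get(key))  # still cached after 10 seconds

now[0] = 25.0
try:
    ttl_cache.get(key)
except KeyError:
    print("expired")  # gone after more than 20 seconds
```

If the library relies on a MongoDB TTL index (an assumption), expiry would be less exact than in this sketch, because MongoDB removes expired documents with a background task that runs periodically rather than at the exact deadline.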