https://github.com/vk/mongodf
A mongoDB to pandas DataFrame converter with a pandas filter style
- Host: GitHub
- URL: https://github.com/vk/mongodf
- Owner: VK
- Created: 2021-12-22T22:02:54.000Z (about 3 years ago)
- Default Branch: main
- Last Pushed: 2024-09-16T14:30:16.000Z (4 months ago)
- Last Synced: 2024-10-29T20:57:13.487Z (2 months ago)
- Language: Python
- Size: 2.73 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 9
Metadata Files:
- Readme: README.md
# MongoDf
[![Python Package](https://github.com/VK/mongodf/actions/workflows/python-publish.yml/badge.svg)](https://github.com/VK/mongodf/actions/workflows/python-publish.yml)
[![PyPI](https://img.shields.io/pypi/v/mongodf?logo=pypi)](https://pypi.org/project/mongodf)
[![Documentation](https://github.com/VK/mongodf/workflows/Documentation/badge.svg)](https://vk.github.io/mongodf)

A mongoDB to pandas DataFrame converter with a pandas filter style.

## Install
```
pip install mongodf
```

## Filter Example
```python
import mongodf
import pymongo

mongo = pymongo.MongoClient("mongodb://mongo:27017")

# create a dataframe from a mongoDB collection
df = mongodf.from_mongo(mongo, "DB", "Collection")

# filter values
df = df[(df["colA"] == "Test") & (df.ColB.isin([1, 2]))]

# filter columns
df = df[["colA", "colC"]]

# compute a pandas.DataFrame
df.compute()
```

|   | colA | colC |
|---|------|------|
| 0 | Test | NaN  |
| 1 | Test | 12   |

## Cache Example
```python
import plotly.express as px
# assumption: MongoDFCache is exported at the top level of the mongodf package
from mongodf import MongoDFCache

df = px.data.gapminder()

cache = MongoDFCache(
    host="mongodb://mongo:27017",
    database="mongodfcache",
    expire_after_seconds=20,
)

# put the dataframe into the mongo cache
# the name can be auto generated, array_group can be a list of cols
id = cache.cache_dataframe(df, "test_df", array_group=True)

# get a mongodf without reading all the data
cdf = cache.get_dataframe(id)

# get the metadata and the content of the dataframe
cdf.get_meta()
cdf.compute()
```
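
Returning to the filter example: it selects rows where `colA` equals `"Test"` and `ColB` is 1 or 2, then keeps only `colA` and `colC`. For readers more familiar with raw MongoDB queries, the sketch below builds a hand-written filter and projection document with the same meaning. The helper names `build_filter` and `build_projection` are hypothetical, and the documents are an illustration of the idea, not necessarily what mongodf generates internally.

```python
# Hand-written MongoDB equivalents of the pandas-style filter from the
# filter example. These dictionaries are illustrative assumptions and are
# not taken from mongodf's internals.


def build_filter(col_a_value, col_b_values):
    """Return a filter document: colA == value AND ColB in values."""
    return {
        "$and": [
            {"colA": {"$eq": col_a_value}},
            {"ColB": {"$in": list(col_b_values)}},
        ]
    }


def build_projection(columns):
    """Return a projection that keeps only the given fields and drops _id."""
    projection = {name: 1 for name in columns}
    projection["_id"] = 0
    return projection


query = build_filter("Test", [1, 2])
projection = build_projection(["colA", "colC"])

# With a live server, both documents could be passed to pymongo, e.g.:
#   mongo["DB"]["Collection"].find(query, projection)
print(query)
print(projection)
```

The pandas-style expression stays shorter and composes with `&` the way a regular DataFrame mask does, which is the convenience the filter example is demonstrating.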