https://github.com/imprv-ai/date-a-scientist
Query dataframes and find issues in your notebook snippets, as if a professional data scientist were pair coding with you. Currently just a thin wrapper around an amazing library called pandas-ai by sinaptik-ai!
data-science dataframes notebook-jupyter pandas pandas-ai
Last synced: 8 months ago
- Host: GitHub
- URL: https://github.com/imprv-ai/date-a-scientist
- Owner: imprv-ai
- License: mit
- Created: 2024-07-01T14:47:53.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2025-01-10T14:39:15.000Z (over 1 year ago)
- Last Synced: 2025-08-27T03:37:07.545Z (9 months ago)
- Topics: data-science, dataframes, notebook-jupyter, pandas, pandas-ai
- Language: Jupyter Notebook
- Homepage:
- Size: 748 KB
- Stars: 3
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README

-----------------
# date-a-scientist
Query dataframes and find issues in your notebook snippets, as if a professional data scientist were pair coding with you.
Currently just a thin wrapper around an amazing library called `pandas-ai` by sinaptik-ai!
## How to use it?
```python
import pandas as pd

from date_a_scientist import DateAScientist

df = pd.DataFrame(
    [
        {"name": "Alice", "age": 25, "city": "New York"},
        {"name": "Bob", "age": 30, "city": "Los Angeles"},
        {"name": "Charlie", "age": 35, "city": "Chicago"},
    ]
)

ds = DateAScientist(
    df=df,
    llm_openai_api_token=...,  # your OpenAI API token goes here
    llm_model_name="gpt-3.5-turbo",  # by default, it uses "gpt-4o"
)

# should return "Alice"
ds.chat("What is the name of the first person?")
```
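The `...` placeholder above stands in for a real token. One common way to keep it out of a notebook is to read it from an environment variable. The sketch below assumes a variable named `OPENAI_API_KEY` and a hypothetical helper `scientist_kwargs`; neither is part of `date-a-scientist`. Only the three keyword arguments are taken from the example above.

```python
import os


def scientist_kwargs(df, model_name="gpt-4o", env_var="OPENAI_API_KEY"):
    """Build keyword arguments for DateAScientist (hypothetical helper).

    The environment variable name is an assumption, not something the
    library is known to require.
    """
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return {
        "df": df,
        "llm_openai_api_token": token,
        "llm_model_name": model_name,
    }


# Usage (requires date-a-scientist to be installed):
# ds = DateAScientist(**scientist_kwargs(df))
```

Failing early when the variable is missing gives a clearer error than passing an empty token along to the LLM client.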
Additionally, we can pass a description of each column, so that more meaningful questions can be asked:
```python
ds = DateAScientist(
    df=df,
    llm_openai_api_token=...,  # your OpenAI API token goes here
    llm_model_name="gpt-3.5-turbo",  # by default, it uses "gpt-4o"
    column_descriptions={
        "name": "The name of the person",
        "age": "The age of the person",
        "city": "The city where the person lives",
    },
)

# should return DataFrame with Chicago rows
ds.chat("Who lives in Chicago?")
```
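Since `column_descriptions` is keyed by column name, a typo in a key could go unnoticed. The README does not say whether the library validates these keys, so the following is only a sketch of a pre-flight check. `check_column_descriptions` is a hypothetical helper, not part of `date-a-scientist`.

```python
def check_column_descriptions(columns, column_descriptions):
    """Return (missing, unknown) column names (hypothetical helper).

    missing: columns that have no description
    unknown: description keys that match no column
    """
    columns = list(columns)
    missing = [c for c in columns if c not in column_descriptions]
    unknown = [k for k in column_descriptions if k not in columns]
    return missing, unknown


columns = ["name", "age", "city"]  # in practice: df.columns
descriptions = {
    "name": "The name of the person",
    "age": "The age of the person",
    "citty": "The city where the person lives",  # deliberate typo
}

missing, unknown = check_column_descriptions(columns, descriptions)
print(missing)  # ['city']
print(unknown)  # ['citty']
```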
Finally, if you want to see the code that was generated for a question, you can use `ds.code()`:
```python
ds.code("Who lives in Chicago?")
```
This returns the code with Monokai syntax highlighting. If you want the plain code as a string instead, you can use:
```python
ds.code("Who lives in Chicago?", return_as_string=True)
```
## Inspirations
- https://github.com/sinaptik-ai/pandas-ai
- https://levelup.gitconnected.com/create-copilot-inside-your-notebooks-that-can-chat-with-graphs-write-code-and-more-e9390e2b9ed8