https://github.com/forgineer/pandas-liteql

A simple pandas extension that enables users to execute SQL statements against DataFrames using in-memory SQLite.
https://github.com/forgineer/pandas-liteql

data-engineering pandas pandas-dataframe pandas-extension python python3 sqlite

Last synced: about 1 month ago
JSON representation

A simple pandas extension that enables users to execute SQL statements against DataFrames using in-memory SQLite.

Host: GitHub
URL: https://github.com/forgineer/pandas-liteql
Owner: forgineer
License: mit
Created: 2023-09-06T00:48:34.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-05-20T04:55:00.000Z (about 1 year ago)
Last Synced: 2025-03-16T17:06:50.124Z (2 months ago)
Topics: data-engineering, pandas, pandas-dataframe, pandas-extension, python, python3, sqlite
Language: Python
Homepage:
Size: 107 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

Awesome Lists containing this project

README

        


    




---

# What is pandas-liteql?

**pandas-liteql** is a simple [pandas](https://pandas.pydata.org/) extension that enables users to execute SQL statements against DataFrames using in-memory [SQLite](https://www.sqlite.org/index.html). It is meant to streamline data manipulation and analysis tasks. For more detailed information and examples on **pandas-liteql**, visit the [documentation pages](https://forgineer.pythonanywhere.com/pandas-liteql).

# What pandas-liteql is not

**pandas-liteql** is not a competitor to libraries such as [PySpark](https://spark.apache.org/docs/latest/api/python/index.html) or [DuckDB](https://duckdb.org/) that can perform SQL queries on larger data sets and perform more advanced data science use-cases. Rather, it is inspired by those projects and similar libraries that have performed the same function, but have since been abandoned or were not as user-friendly.

# Installing pandas-liteql

**pandas-liteql** requires a minimum of Python 3.7 and the following libraries:

| Library    | Version      |

|------------|--------------|

| Pandas     | `>= 1.3.5`   |

| SQLAlchemy | `>= 1.4.36`  |

Assuming these prerequisites are already installed, adding **pandas-liteql** is as simple as...

```

pip install pandas-liteql

```

# Examples

Below are some usage examples to load, query, and drop data from the in-memory SQLite sessions established with **pandas-liteql** and pandas DataFrame integration.

## Loading

Start by loading your DataFrame with the `load` function. When **pandas-liteql** is imported, an in-memory SQLite session is created where data can be loaded to.

```python

import pandas as pd

from src import pandas_liteql as lql

# Data set creation

person_data = {

    'name': ['Bill', 'Ted', 'Abraham', 'Genghis', 'Napoleon'],

    'age': [25, 24, 56, 64, 51],

    'email': ['[email protected]', '[email protected]',

              '[email protected]', '[email protected]',

              '[email protected]']

}

# DataFrame creation

person_df = pd.DataFrame(data=person_data)

# Loading the DataFrame to in-memory SQLite as the 'person' table

# The 'person' variable is also a LiteQL class containing the table name and schema information

person = lql.load(df=person_df, table_name='person')

print(f'Table name: {person.name}')

print(person.schema)

```

Output:

```

Table name: person

    name    type  nullable default autoincrement  primary_key

0  index  BIGINT      True    None          auto            0

1   name    TEXT      True    None          auto            0

2    age  BIGINT      True    None          auto            0

3  email    TEXT      True    None          auto            0

```

## Querying

Next, query the table using the `query` function. Using SQL syntax, the loaded table can be queried and the results will be returned as a pandas DataFrame.

```python

bill_and_ted = lql.query(sql='SELECT * FROM person WHERE age < 30')

print(bill_and_ted)

```

Output:

```

   index  name  age               email

0      0  Bill   25  [email protected]

1      1   Ted   24   [email protected]

```

## Dropping

If finished with a table within the flow of a script, you can simply drop it with the `drop` function to preserve memory.

```python

lql.drop(table_name='person')

```

## The DataFrame SQL Accessor

Lastly, for a more simplistic approach, you can use the `liteql.sql` accessor to perform the same functions above in one line and return the result as a pandas DataFrame. This approach requires that you query from the `liteql` table that is loaded from the DataFrame, queried, and then dropped.

```python

import pandas as pd

import pandas_liteql as lql

# Data set creation

person_data = {

    'name': ['Bill', 'Ted', 'Abraham', 'Genghis', 'Napoleon'],

    'age': [25, 24, 56, 64, 51],

    'email': ['[email protected]', '[email protected]',

              '[email protected]', '[email protected]',

              '[email protected]']

}

# DataFrame creation

person_df = pd.DataFrame(data=person_data)

bill_and_ted = person_df.liteql.sql('SELECT * FROM liteql WHERE age < 30')

print(bill_and_ted)

```

Output:

```

   index  name  age               email

0      0  Bill   25  [email protected]

1      1   Ted   24   [email protected]

```

# Contributing

Currently, **pandas-liteql** will not be receiving any additional updates. Contributions will not be accepted here, but feel free to fork this project if you desire.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/forgineer/pandas-liteql

Awesome Lists containing this project

README