Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/juancarlospaco/faster-than-csv

Faster CSV for Python
https://github.com/juancarlospaco/faster-than-csv

csv csv-data csv-parser csv-parsing csv-to-html csv-to-json cython faster-than-csv process-csv python python3 speed speedup static-memory-allocation static-typing tabular-data tsv tsv-parser type-safe

Last synced: 3 months ago
JSON representation

Faster CSV for Python

Awesome Lists containing this project

README

        

# Faster-than-CSV

[![Benchmark Results](https://raw.githubusercontent.com/juancarlospaco/faster-than-csv/master/results_graph.png "Benchmark Results")](https://youtu.be/QiKwnlyhKrk?t=5)

![](https://img.shields.io/github/languages/top/juancarlospaco/faster-than-csv?style=for-the-badge)
![](https://img.shields.io/github/languages/count/juancarlospaco/faster-than-csv?logoColor=green&style=for-the-badge)
![](https://img.shields.io/github/stars/juancarlospaco/faster-than-csv?style=for-the-badge "Star faster-than-csv on GitHub!")
![](https://img.shields.io/maintenance/yes/2022?style=for-the-badge)
![](https://img.shields.io/github/languages/code-size/juancarlospaco/faster-than-csv?style=for-the-badge)
![](https://img.shields.io/github/issues-raw/juancarlospaco/faster-than-csv?style=for-the-badge "Bugs")
![](https://img.shields.io/github/issues-pr-raw/juancarlospaco/faster-than-csv?style=for-the-badge "PRs")
![](https://img.shields.io/github/commit-activity/y/juancarlospaco/faster-than-csv?style=for-the-badge)
![](https://img.shields.io/github/last-commit/juancarlospaco/faster-than-csv?style=for-the-badge "Commits")

| Library | Time (Speed) |
|-------------------------------|--------------|
| Pandas `read_csv()` | `20.09` |
| NumPy `fromfile()` | `3.88` |
| NumPy `genfromtxt()` | `4.00` |
| NumPy `loadtxt()` | `1.26` |
| csv (std lib) | `0.40` |
| csv (list) | `0.38` |
| csv (map) | `0.37` |
| Faster_than_csv | `0.08` |

- This CSV Lib is ~300 Lines of Code.

- Benchmarks run on Docker from Dockerfile on this repo.
- Speed is IRL time to complete 10000 CSV Parsings.
- Lines Of Code counted using [CLOC](https://github.com/AlDanial/cloc).
- Direct dependencies of the package when ready to run.
- Benchmarks run on Docker from Dockerfile on this repo.
- Stats as of year 2021.
- x86_64 64Bit AMD, SSD, Arch Artix Linux.

# Use

```python
import faster_than_csv as csv

csv.csv2list("example.csv") # See Docs for more info.
# Custom Separators supported.
csv.csv2json("example.csv", indentation=4) # CSV to JSON, Pretty-Printed.

csv.csv2htmltable("example.csv") # CSV to HTML+CSS Table (No JavaScript).

csv.read_clipboard() # CSV from the Clipboard.

csv.diff_csvs("example.csv", "anotherfile.csv") # Diff optimized for CSVs.
```
- Input: CSV, TSV, Clipboard, File, URL, Custom.
- Output: CSV, TSV, HTML, JSON, NDJSON, Diff, File, Custom.

# csv2dict()

**Description:**
Takes a path of a CSV file string, process CSV and returns a list of dictionaries.
This is very similar to `pandas.read_csv(filename)`.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.

**Returns:**
Data from the CSV, `dict` type.

# csv2list()

**Description:**
Takes a path of a CSV file string, process CSV and returns a list.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.

**Returns:**
Data from the CSV, `list` type.

# read_clipboard()

**Description:**
Reads CSV string from Clipboard, process CSV and returns a list of dictionaries.
This is very similar to `pandas.read_clipboard()`. This works on Linux, Mac, Windows.

**Arguments:**

- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.

**Returns:**
Data from the CSV, `dict` type.

# csv2json()

**Description:**
Takes a path of a CSV file string, process CSV and returns JSON.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.
- `indentation` Pretty-Printed or Minified JSON output, `int` type, optional, `0` is Minified, `4` is Pretty-Printed, you can use any integer to adjust the indentation.

**Returns:**
Data from the CSV as JSON Minified Single-line string computer-friendly, `str` type.

# csv2ndjson()

**Description:**
Takes a path of a CSV file string, process CSV and returns NDJSON.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string.
- `ndjson_file_path` path of the NDJSON file, `str` type, required, must not be empty string.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.

**Returns:** None.
Data from the CSV as NDJSON https://github.com/ndjson/ndjson-spec, `str` type.

# csv2htmltable()

**Description:**
Takes a path of a CSV file string, process CSV and returns the data rendered on HTML Table.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string, defaults to `""`, if its empty string then No file is written.
- `html_file_path` path of the CSV file, `str` type, optional, can be empty string.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.
- `header_html` HTML Header, `str` type, optional, defaults to Bulma CSS, can be empty string.

**Returns:**
Data from the CSV as HTML Table, `str` type, raw HTML (no style at all).

# csv2karax()

![](https://user-images.githubusercontent.com/22755228/117183486-482b2a00-ade0-11eb-88e6-d8eeb28951ca.png)

**Description:**
Takes a path of a CSV file string, process CSV and returns the data rendered as a [Karax](https://github.com/karaxnim/karax#karax) HTML Table.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.

**Returns:** Karax DSL, `str` type.

# csv2terminal()

**Description:**
Takes a path of a CSV file string, process CSV and prints to terminal a colored prety-printed table.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string, defaults to `""`, if its empty string then No file is written.
- `column_width` column width of the wider column, required, `int` type, must not be `0`, must not be negative.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.

**Returns:** None.

# csv2xml()

**Description:**
Takes a path of a CSV file string, process CSV and returns a Valid XML string.
Output is guaranteed to be always Valid XML.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string.
- `separator` Separator character of the CSV data, `str` type, optional, defaults to `','`, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.
- `header_xml` XML Header of the XML string, `str` type, optional, can be empty string, defaults to `"\n"`.

**Returns:** XML, `str` type.

# tsv2csv()

**Description:**
Takes a path of a CSV file string, process CSV and returns a TSV.

**Arguments:**
- `csv_file_path` path of the CSV file, `str` type, required, must not be empty string.
- `separator1` Separator character of the CSV data, `str` type, optional, must not be empty string.
- `separator2` Separator character of the CSV data, `str` type, optional, must not be empty string.
- `quote` Quote character of the CSV data, `str` type, optional, defaults to `'"'`, must not be empty string.

**Returns:**
Data from the CSV as TSV, `str` type.

# diff_csvs()

**Description:**
Takes 2 paths of 2 CSV files, process CSV and returns the Diff of the 2 CSV.

**Arguments:**
- `csv_file_path0` path of the CSV file, `str` type, required, must not be empty string, file must exist.
- `csv_file_path1` path of the CSV file, `str` type, required, must not be empty string, file must exist.

**Returns:** Diff.

[**For more Examples check the Examples and Tests.**](https://github.com/juancarlospaco/faster-than-csv/blob/master/examples/example.py)

Instead of having a pair of functions with a lot of arguments that you should provide to make it work,
we have tiny functions with very few arguments that do one thing and do it as fast as possible.

# Install

- `pip install faster_than_csv`

# Docker

- Make a quick test drive on Docker!.

```bash
$ ./build-docker.sh
$ ./run-docker.sh
$ ./run-benchmark.sh # Inside Docker.
```

# Dependencies

- **None**

# Platforms

- ✅ Linux
- ✅ Windows
- ✅ Mac
- ✅ Android
- ✅ Raspberry Pi
- ✅ BSD

# Requisites

- Python 3.6+ 64Bit.

# Windows

- If installation fails on Windows, just use the Source Code:

![win-compile](https://user-images.githubusercontent.com/1189414/63147831-b8bf6100-bfd5-11e9-9e6e-91d61040f139.png "Git Clone and Compile on Windows 10 with only Git and Nim installed, just 2 commands!")

- Git Clone and Compile on Windows 10 on just 2 commands!.
- [Alternatively you can try Docker for Windows.](https://docs.docker.com/docker-for-windows)
- [Alternatively you can try WSL for Windows.](https://docs.microsoft.com/en-us/windows/wsl/about)
- **The file extension must be `.pyd`, NOT `.dll`.**

# Stars

![Star faster-than-csv on GitHub](https://starchart.cc/juancarlospaco/faster-than-csv.svg "Star faster-than-csvon GitHub!")

# Contributors

- [SekouDiaoNlp](https://github.com/SekouDiaoNlp)

# FAQ

- Whats the idea, inspiration, reason, etc ?.

[Feel free to Fork, Clone, Download, Improve, Reimplement, Play with this Open Source. Make it 10 times faster, 10 times smaller.](http://tonsky.me/blog/disenchantment)

- This requires Cython ?.

No.

- This runs on PyPy ?.

No.

- This runs on Python2 ?.

I dunno. (Not supported)

- How can I Install it ?.

https://github.com/juancarlospaco/faster-than-csv/releases

If you dont understand how to install it, you can just download, extract, put the files on the same folder as your `*.py` file and you are good to go.

- How can be faster than NumPy ?.

I dunno.

- How can be faster than Pandas ?.

I dunno.

- Why needs 64Bit ?.

Maybe it works on 32Bit, but is not supported, integer sizes are too small, and performance can be worse.

- Why needs Python 3 ?.

Maybe it works on Python 2, but is not supported, and performance can be worse, we suggest to migrate to Python3.

- Can I wrap the functions on a `try: except:` block ?.

Functions do not have internal `try: except:` blocks,
so you can wrap them inside `try: except:` blocks if you need very resilient code.

- PIP fails to install or fails build the wheel ?.

Add at the end of the PIP install command:

` --isolated --disable-pip-version-check --no-cache-dir --no-binary :all: `

Not my Bug.

- How to Build the project ?.

`build.sh`

- How to Package the project ?.

`package.sh`

- This requires Nimble ?.

No.

- Whats the unit of measurement for speed ?.

Unmmodified raw output of Python `timeit` module.

Please send Pull Request to Python to improve the output of `timeit`.

# Send Crypto, request features, donate today


Bitcoin BTC

**BEP20 Binance Smart Chain Network BSC**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```
**BTC Bitcoin Network**
```
1Pnf45MgGgY32X4KDNJbutnpx96E4FxqVi
```


Ethereum ETH Dai DAI Uniswap UNI Axie Infinity AXS Smooth Love Potion SLP

**BEP20 Binance Smart Chain Network BSC**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```
**ERC20 Ethereum Network**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```


Tether USDT

**BEP20 Binance Smart Chain Network BSC**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```
**ERC20 Ethereum Network**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```
**TRC20 Tron Network**
```
TWGft53WgWvH2mnqR8ZUXq1GD8M4gZ4Yfu
```


Solana SOL

**BEP20 Binance Smart Chain Network BSC**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```
**SOL Solana Network**
```
FKaPSd8kTUpH7Q76d77toy1jjPGpZSxR4xbhQHyCMSGq
```


Cardano ADA

**BEP20 Binance Smart Chain Network BSC**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```
**ADA Cardano Network**
```
DdzFFzCqrht9Y1r4Yx7ouqG9yJNWeXFt69xavLdaeXdu4cQi2yXgNWagzh52o9k9YRh3ussHnBnDrg7v7W2hSXWXfBhbo2ooUKRFMieM
```


Sandbox SAND Decentraland MANA

**ERC20 Ethereum Network**
```
0xb78c4cf63274bb22f83481986157d234105ac17e
```


Algorand ALGO

**ALGO Algorand Network**
```
WM54DHVZQIQDVTHMPOH6FEZ4U2AU3OBPGAFTHSCYWMFE7ETKCUUOYAW24Q
```


Binance

https://pay.binance.com/en/checkout/e92e536210fd4f62b426ea7ee65b49c3