Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/dexplo/dexplo
Data exploration library with a pandas-like API
Last synced: 4 days ago
- Host: GitHub
- URL: https://github.com/dexplo/dexplo
- Owner: dexplo
- License: bsd-3-clause
- Created: 2018-01-25T19:08:59.000Z (almost 7 years ago)
- Default Branch: master
- Last Pushed: 2020-06-16T12:57:14.000Z (over 4 years ago)
- Last Synced: 2024-09-19T01:39:45.594Z (about 2 months ago)
- Language: Python
- Size: 1.02 MB
- Stars: 74
- Watchers: 8
- Forks: 7
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
- awesome-dataframes - dexplo - Data exploration library with a pandas-like API. (Libraries)
README
# dexplo
[![](https://img.shields.io/pypi/v/dexplo)](https://pypi.org/project/dexplo)
[![Build Status](https://travis-ci.org/dexplo/dexplo.svg?branch=master)](https://travis-ci.org/dexplo/dexplo)
[![PyPI - License](https://img.shields.io/pypi/l/dexplo)](LICENSE)

A data analysis library comparable to pandas
# Installation
You must have cython installed. Run `python setup.py build_ext --use-cython -i`

# Main Goals
* A minimal set of features
* Be as explicit as possible
* There should be one-- and preferably only one --obvious way to do it.

### Data Structures
* Only DataFrames
* No Series

### Only Scalar Data Types
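As a rough illustration of two points in this section (strings stored as a categorical, and nulls allowed in every type), the sketch below encodes a string column as integer codes plus a table of unique values. It is not dexplo's implementation: the function names and the `-1` null sentinel are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

NULL_CODE = -1  # hypothetical sentinel marking a missing string


def encode_categorical(values: List[Optional[str]]) -> Tuple[List[int], List[str]]:
    """Encode strings as integer codes plus a table of unique categories."""
    categories: List[str] = []
    positions: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value is None:
            # Nulls get the sentinel and never enter the category table.
            codes.append(NULL_CODE)
            continue
        if value not in positions:
            positions[value] = len(categories)
            categories.append(value)
        codes.append(positions[value])
    return codes, categories


def decode_categorical(codes: List[int], categories: List[str]) -> List[Optional[str]]:
    """Rebuild the original column, mapping the sentinel back to None."""
    return [None if code == NULL_CODE else categories[code] for code in codes]


codes, categories = encode_categorical(["a", "b", None, "a"])
print(codes)       # [0, 1, -1, 0]
print(categories)  # ['a', 'b']
```

With this layout each distinct string is stored once, and operations that depend only on the distinct values can run over the category table instead of every row.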
All data types allow nulls
- [x] bool - always 8 bits
- [x] int
- [x] float
- [x] str - stored as a categorical
- [x] datetime
- [x] timedelta

### Column Labels
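The two column-name rules in this section (names must be strings, names must be unique) can be checked with a few lines of Python. The helper below is a hypothetical sketch written for this example, not a function taken from dexplo.

```python
from typing import List, Sequence


def validate_column_names(columns: Sequence[object]) -> List[str]:
    """Return the names as a list, or raise if a rule is violated."""
    names: List[str] = []
    seen = set()
    for name in columns:
        if not isinstance(name, str):
            raise TypeError(f"column name {name!r} is not a string")
        if name in seen:
            raise ValueError(f"column name {name!r} is duplicated")
        seen.add(name)
        names.append(name)
    return names


print(validate_column_names(["name", "age"]))  # ['name', 'age']
```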
* No hierarchical index
* Column names must be strings
* Column names must be unique

### Row Labels
* No row labels for now
* Only a number display on the output

### Subset Selection
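The selection rules listed in this section can be sketched with a small stand-in class. `TinyFrame` is hypothetical and is not dexplo's DataFrame: it stores plain Python lists, returns the selection as a list of rows, and supports only integer slices for columns. Those choices are assumptions made to keep the example short.

```python
from typing import Dict, List, Tuple, Union

Scalar = Union[bool, int, float, str, None]
RowKey = Union[int, slice]
ColKey = Union[int, str, slice]


class TinyFrame:
    """Hypothetical column store; not dexplo's DataFrame."""

    def __init__(self, data: Dict[str, List[Scalar]]) -> None:
        self.columns: List[str] = list(data)
        self._data = data

    def _column_positions(self, key: ColKey) -> List[int]:
        # Column names are always strings, so a str key is a label and an
        # int key is a position; the two can never be confused.
        if isinstance(key, str):
            return [self.columns.index(key)]
        if isinstance(key, slice):
            return list(range(len(self.columns))[key])
        if isinstance(key, int):
            return [range(len(self.columns))[key]]
        raise TypeError("columns are selected by label, integer, or slice")

    def __getitem__(self, key: Tuple[RowKey, ColKey]) -> List[List[Scalar]]:
        # Both a row selection and a column selection are required.
        if not (isinstance(key, tuple) and len(key) == 2):
            raise TypeError("selection needs both rows and columns: df[rows, cols]")
        rows, cols = key
        if not isinstance(rows, (int, slice)):
            raise TypeError("rows are selected by integer location only")
        n_rows = len(self._data[self.columns[0]]) if self.columns else 0
        if isinstance(rows, slice):
            row_positions = list(range(n_rows)[rows])
        else:
            row_positions = [range(n_rows)[rows]]
        col_positions = self._column_positions(cols)
        return [[self._data[self.columns[c]][r] for c in col_positions]
                for r in row_positions]


df = TinyFrame({"name": ["Ann", "Bob", "Cy"], "age": [31, 45, 27]})
print(df[0, "name"])  # [['Ann']]
print(df[1:, 1])      # [[45], [27]]
print(df[:, :])       # [['Ann', 31], ['Bob', 45], ['Cy', 27]]
```

In this sketch `df["name"]` raises a `TypeError`, because a selection without an explicit row part is rejected.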
* Only one way to select data - `[ ]`
* Subset selection will be explicit and necessitate both rows and columns
* Rows will be selected only by integer location
* Columns will be selected by either label or integer location. Since column names must be strings, this will not be ambiguous
* Slice notation is also OK

### Development
* Must use type hints
* Must use Python 3.6+ (f-strings)
* numpy

### Advantages over pandas
* Easier to write idiomatically
* String processing will be much faster
* Nulls allowed in each data type
* Nearly all operations will be faster

## API
#### Attributes
- [x] size
- [x] shape
- [x] values
- [x] dtypes

#### Methods
**Stats**
- [x] abs
- [x] all
- [x] any
- [x] argmax
- [x] argmin
- [x] clip
- [x] corr
- [x] count
- [x] cov
- [x] cummax
- [x] cummin
- [x] cumprod
- [x] cumsum
- [x] describe
- [x] max
- [x] min
- [x] median
- [x] mean
- [x] mode
- [x] nlargest
- [x] nsmallest
- [x] prod
- [x] quantile
- [x] rank
- [x] round
- [x] std
- [x] streak
- [x] sum
- [x] var
- [x] unique
- [x] nunique
- [x] value_counts

**Selection**
- [x] drop
- [x] head
- [x] isin
- [x] rename
- [x] sample
- [x] select_dtypes
- [x] tail
- [x] where

**Missing Data**
- [x] isna
- [x] dropna
- [x] fillna
- [ ] interpolate

**Other**
- [x] append
- [x] astype
- [x] factorize
- [x] groupby
- [x] iterrows
- [ ] join
- [x] melt
- [x] pivot
- [x] replace
- [x] rolling
- [x] sort_values
- [x] to_csv

**Other (after 0.1 release)**
- [ ] cut
- [ ] plot
- [ ] profile

**Functions**
- [x] read_csv
- [ ] read_sql
- [ ] concat

**Group By** - specifically with `groupby` method
- [x] agg
- [x] all
- [x] apply
- [x] any
- [x] corr
- [x] count
- [x] cov
- [x] cumcount
- [x] cummax
- [x] cummin
- [x] cumsum
- [x] cumprod
- [x] head
- [x] first
- [ ] fillna
- [x] filter
- [x] last
- [x] max
- [x] median
- [x] min
- [x] ngroups
- [x] nunique
- [x] prod
- [ ] quantile
- [ ] rank
- [ ] rolling
- [x] size
- [x] sum
- [x] tail
- [x] var

**str** - `df.str.`
- [x] capitalize
- [x] cat
- [x] center
- [x] contains
- [x] count
- [x] endswith
- [x] find
- [x] findall
- [x] get
- [x] get_dummies
- [x] isalnum
- [x] isalpha
- [x] isdecimal
- [x] isdigit
- [x] islower
- [x] isnumeric
- [x] isspace
- [x] istitle
- [x] isupper
- [x] join
- [x] len
- [x] ljust
- [x] lower
- [x] lstrip
- [x] partition
- [x] repeat
- [x] replace
- [x] rfind
- [x] rjust
- [x] rpartition
- [x] rsplit
- [x] rstrip
- [x] slice
- [x] slice_replace
- [x] split
- [x] startswith
- [x] strip
- [x] swapcase
- [x] title
- [x] translate
- [x] upper
- [x] wrap
- [x] zfill

**dt** - `df.dt.`
- [x] ceil
- [x] day
- [x] day_of_week
- [x] day_of_year
- [x] days_in_month
- [x] floor
- [ ] freq
- [x] hour
- [x] is_leap_year
- [x] is_month_end
- [x] is_month_start
- [x] is_quarter_end
- [x] is_quarter_start
- [x] is_year_end
- [x] is_year_start
- [x] microsecond
- [x] millisecond
- [x] minute
- [x] month
- [x] nanosecond
- [x] quarter
- [x] round
- [x] second
- [x] strftime
- [x] to_pydatetime
- [x] to_pytime
- [ ] tz
- [ ] tz_convert
- [ ] tz_localize
- [x] weekday_name
- [x] week_of_year
- [x] year

**td** - `df.td.`
- [ ] ceil
- [ ] components
- [x] days
- [ ] floor
- [ ] freq
- [x] microseconds
- [x] milliseconds
- [x] nanoseconds
- [ ] round
- [x] seconds
- [ ] to_pytimedelta