https://github.com/arogozhnikov/sloths

when pandas is too much hassle

Host: GitHub
URL: https://github.com/arogozhnikov/sloths
Owner: arogozhnikov
Created: 2023-11-16T07:22:38.000Z (over 1 year ago)
Default Branch: main
Last Pushed: 2024-01-11T01:00:28.000Z (over 1 year ago)
Last Synced: 2025-02-08T09:46:52.293Z (4 months ago)
Size: 7.81 KB
Stars: 1
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# sloths

when pandas is too much hassle

The sloths package targets the scenario where you want a dict of dicts.

Or a dict of dicts of dicts of lists.

In 'normal' Python, this turns into a mess after just a couple of levels.

**Comment from later:** I realized that this concept is missing a lot of the functionality needed to support my critical scenarios, so I'll be reworking it.

**before**

```python
from collections import defaultdict

# this one is ugly
year2company2employee2diplomas = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

# this one is more or less ok
for year in years:
    for company in companies:
        for name, surname, diploma in get_diplomas_for_year_and_company(year, company):
            year2company2employee2diplomas[year][company][name, surname].append(diploma)

# when we need to iterate over the data, it is just terrible
for year, company2employee2diplomas in year2company2employee2diplomas.items():
    for company, employee2diplomas in company2employee2diplomas.items():
        for (name, surname), diplomas in employee2diplomas.items():
            for diploma in diplomas:
                finally_we_can_do_something(year, company, name, surname, diploma)

# that's especially terrible if e.g. we only needed a list of all achievements for a company.
```
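To make the last comment concrete, here is a minimal plain-Python sketch of what collecting all diplomas per company takes when the data is already nested. The toy data and the helper name `diplomas_by_company` are made up for illustration and are not part of sloths.

```python
from collections import defaultdict

# Toy data (invented) in the same shape as above:
# year -> company -> (name, surname) -> [diplomas]
year2company2employee2diplomas = {
    2022: {"Acme": {("Ann", "Lee"): ["BSc"]}},
    2023: {
        "Acme": {("Ann", "Lee"): ["MSc"], ("Bob", "Ray"): ["PhD"]},
        "Initech": {("Cy", "Fox"): ["BA"]},
    },
}


def diplomas_by_company(nested):
    """Collect every diploma per company, ignoring year and employee."""
    result = defaultdict(list)
    # We still have to walk every level, even the ones we do not care about.
    for company2employee2diplomas in nested.values():
        for company, employee2diplomas in company2employee2diplomas.items():
            for diplomas in employee2diplomas.values():
                result[company].extend(diplomas)
    return dict(result)


company2diplomas = diplomas_by_company(year2company2employee2diplomas)
for company, diplomas in company2diplomas.items():
    print(f"{company} has in total {len(diplomas)}")
```

Every new grouping needs another hand-written helper like this one, tied to the exact nesting order chosen up front.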

Now, pandas does not help much until the data is completely collected: appending data to a pandas object on the go is quite a bad idea.

It also has other issues, such as automatic type conversion, which you don't want applied to your data without seeing the effect.

Sloth essentially works as universal storage: you throw data in and change its shape later.
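The "store now, reshape later" idea can be sketched in plain Python, assuming we simply keep flat records and decide on the grouping only when reading. The names `records`, `add`, and `group` are hypothetical and only illustrate the idea; this is not how sloths is implemented.

```python
from collections import defaultdict

# Flat storage: one dict per observation, no shape decided yet.
records = []


def add(year, company, name, surname, diploma):
    """Throw one observation into the storage."""
    records.append(
        {"year": year, "company": company, "name": name,
         "surname": surname, "diploma": diploma}
    )


def group(records, key_fields, value_field):
    """Group `value_field` values by the tuple of `key_fields`."""
    grouped = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        grouped[key].append(record[value_field])
    return dict(grouped)


add(2023, "Acme", "Ann", "Lee", "MSc")
add(2023, "Acme", "Bob", "Ray", "PhD")
add(2022, "Initech", "Cy", "Fox", "BA")

# The shape is chosen at read time, and can differ between reads.
by_company = group(records, ["company"], "diploma")
by_year_company = group(records, ["year", "company"], "diploma")
```

The same stored data serves both groupings without any restructuring in between.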

**after**

```python
sloth = Sloth()
sloth[year][company].append_at((name, surname), diploma)

for company, diplomas in sloth.iterate('year:company:name surname:[diploma] -> company [diploma]'):
    print(f'{company} has in total {len(diplomas)}')

# and that's it, diplomas are grouped by company
```

Nested collections are created automatically, and so is the list (using `append_at` is what signals that a list is wanted).

There is no need to plan the structure in advance anymore.
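For intuition, automatic creation of nested collections can be approximated with a small `dict` subclass. `AutoNest` below is a hypothetical stand-in written for this sketch; only the method name `append_at` is borrowed from the example above, and nothing here reflects the actual sloths internals.

```python
class AutoNest(dict):
    """Hypothetical stand-in for illustration, not the sloths implementation."""

    def __missing__(self, key):
        # Reading an absent key creates a nested AutoNest on the fly.
        child = AutoNest()
        self[key] = child
        return child

    def append_at(self, key, value):
        # The list is created on first use, then appended to.
        self.setdefault(key, []).append(value)


store = AutoNest()
store[2023]["Acme"].append_at(("Ann", "Lee"), "MSc")
store[2023]["Acme"].append_at(("Ann", "Lee"), "MBA")
store[2022]["Initech"].append_at(("Cy", "Fox"), "BA")

print(store[2023]["Acme"][("Ann", "Lee")])
```

This covers only the write side; the reshaping done by `iterate` in the example above is not attempted in this sketch.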
