https://github.com/arne-cl/pandas-gotchas

List of gotchas in Pandas (the Python data analysis library).

Host: GitHub
URL: https://github.com/arne-cl/pandas-gotchas
Owner: arne-cl
Created: 2020-09-02T07:57:20.000Z (almost 5 years ago)
Default Branch: master
Last Pushed: 2021-11-10T10:06:42.000Z (over 3 years ago)
Last Synced: 2025-01-06T09:12:24.345Z (6 months ago)
Topics: list, pandas, python
Homepage:
Size: 7.81 KB
Stars: 2
Watchers: 2
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

# pandas-gotchas

This is a list of gotchas I found in [Pandas](https://pandas.pydata.org/) (the Python data analysis library).

## grouping / aggregation

- [Dropping nuisance columns in groupby is a nuisance #21664](https://github.com/pandas-dev/pandas/issues/21664)

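  - Before version 2.0, Pandas silently drops a column if the chosen aggregation method doesn't work on it. Since 2.0, such an aggregation raises a `TypeError` instead, unless you pass `numeric_only=True` or select the columns to aggregate explicitly.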

- [pandas GroupBy columns with NaN (missing) values](https://stackoverflow.com/questions/18429491/pandas-groupby-columns-with-nan-missing-values)

  - Pandas silently drops rows when grouping by a column that contains `NaN`.

  - You can avoid this behaviour by using `.groupby(..., dropna=False)` (available since Pandas 1.1).
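
The `NaN` behaviour above can be reproduced with a small sketch. The frame and its column names (`group`, `value`) are made up for illustration, and `dropna=False` assumes Pandas 1.1 or newer:

```python
import numpy as np
import pandas as pd

# Hypothetical example data: one row has a missing group key.
df = pd.DataFrame({
    "group": ["a", "a", np.nan, "b"],
    "value": [1, 2, 3, 4],
})

# Default behaviour: the row whose key is NaN is left out of the result.
default_sums = df.groupby("group")["value"].sum()

# dropna=False keeps NaN as a group of its own.
all_sums = df.groupby("group", dropna=False)["value"].sum()

print(len(default_sums), default_sums.sum())  # 2 7
print(len(all_sums), all_sums.sum())          # 3 10
```

Selecting the column to aggregate explicitly (`["value"]`) also sidesteps the nuisance-column problem, because no other column takes part in the aggregation.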

  

## membership in series

- [How to determine whether a Pandas Column contains a particular value](https://stackoverflow.com/questions/21319929/how-to-determine-whether-a-pandas-column-contains-a-particular-value)

  - `x in series` tells you if `x` is in the `index` of `series`, not whether it is one of its values.

  - Use `x in series.values` to check if `x` is among the actual values of `series`.
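
A minimal sketch of the difference, using a made-up series with string labels as its index. `Series.isin` is shown as an alternative that is not part of the original note:

```python
import pandas as pd

# Hypothetical series: the values are integers, the index labels are strings.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

label_found = "a" in s           # True: "a" is an index label
value_as_label = 10 in s         # False: 10 is a value, not an index label
value_found = 10 in s.values     # True: checks the underlying values
via_isin = s.isin([10]).any()    # True: equivalent check using Series.isin
```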

## filter series / column by substring

To check which elements of a column start with the prefix `field_`,
run `df.my_column.str.startswith('field_')`. If the column contains missing values, using the result as a boolean mask can fail with
`ValueError: Cannot mask with non-boolean array containing NA / NaN values`.
To avoid the error, add `na=False`, which makes missing values count as `False`:

```
df.my_column.str.startswith('field_', na=False)
```
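
As a sketch of how the resulting mask is typically used, the example below filters a made-up frame. The data is hypothetical; only `my_column` and the `field_` prefix come from the text above:

```python
import pandas as pd

# Hypothetical data with one missing value in the column.
df = pd.DataFrame({"my_column": ["field_a", None, "other", "field_b"]})

# With na=False the missing entry becomes False, so the mask is purely boolean.
mask = df.my_column.str.startswith("field_", na=False)

# The mask can now be used to select the matching rows.
matches = df[mask]
print(matches.my_column.tolist())  # ['field_a', 'field_b']
```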

## joining / merging

- [values in a Pandas index column **do not have to be unique**](https://stackoverflow.com/questions/20199129/pandas-get-duplicated-indexes/52449411) (unlike values in a `PRIMARY KEY` column in SQL)

  - If you do a LEFT JOIN on two tables, you might expect the result to have as many rows as the left table. That only holds if the join key is unique in the right table.

  - In Pandas, for a `.join()` or `.merge()` to work the same way, you have to remove rows with duplicate join keys from the right table,

    e.g. by calling `df_right.drop_duplicates(subset='common_column_name')` before `pd.merge(df_left, df_right, on='common_column_name', how='left')`. Without `subset`, `drop_duplicates()` only removes rows that are identical in every column.
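
The row-count effect can be seen in a short sketch. The frames and the column name `key` are invented for illustration. The `validate` argument of `pd.merge` is an addition not mentioned in the original note; it makes Pandas raise an error if the right-hand keys are not unique:

```python
import pandas as pd

# Hypothetical tables: key 1 appears twice on the right-hand side.
df_left = pd.DataFrame({"key": [1, 2, 3], "left_val": ["a", "b", "c"]})
df_right = pd.DataFrame({"key": [1, 1, 2], "right_val": ["x", "y", "z"]})

# A plain left merge repeats the left row for every matching right row.
merged = pd.merge(df_left, df_right, on="key", how="left")
print(len(df_left), len(merged))  # 3 4

# Keep one right-hand row per key, then merge again.
right_unique = df_right.drop_duplicates(subset="key")
merged_unique = pd.merge(
    df_left,
    right_unique,
    on="key",
    how="left",
    validate="many_to_one",  # raises MergeError if right keys are duplicated
)
print(len(merged_unique))  # 3
```

Note that `drop_duplicates(subset="key")` keeps the first row per key by default, so which right-hand row survives depends on the row order.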

## See also

[Prabhant Sing. Gotchas of Pandas (Pydata Delhi).](https://github.com/prabhant/Talk-Pandas-Gotchas/blob/master/Pandas%20Gotchas.ipynb)
