# Pandas Find Significantly Different Rows (pandas-FSDR)

This is a relatively simple function that finds rows in a pandas
DataFrame where the values in two columns are significantly
different. Significantly different in this context means that the
values have a relative difference or an absolute difference greater than
the corresponding threshold; exceeding either threshold is enough. By
default this function returns a chunk of human-readable text describing
the differences.

For example, given a DataFrame of:

```
             UK  World
Biology      50     40
Geography    75     80
Computing   100     50
Maths      1500   1600
```

and a relative difference threshold of 30% (the default) and an absolute
difference threshold of 75, this function would return the following text:

  - Maths is significantly smaller for UK (1500 for UK compared to 1600 for World)
  - Computing is significantly larger for UK (100 for UK compared to 50 for World)

Maths is flagged because its absolute difference (100) exceeds 75, and
Computing because its relative difference (50% of the UK value) exceeds 30%.

This output is easily configurable using the parameters described below.

I wrote this while doing some data analysis for a client many years ago, and thought it would be worth sharing.

### Example
```
import pandas as pd

# Assumes FSDR has been imported from this repository's code

# Set up some sample data
df = pd.DataFrame({'UK': [50, 75, 100, 1500],
                   'World': [40, 80, 50, 1600]},
                  index=['Biology', 'Geography', 'Computing', 'Maths'])
print(df)

result = FSDR(df, 'UK', 'World', rel_thresh=30, abs_thresh=75)

print(result)

# Prints:
#
# - **Maths** is significantly __smaller__ for UK (1500 for UK compared to 1600 for World)
# - **Computing** is significantly __larger__ for UK (100 for UK compared to 50 for World)
#
```

### Full documentation

There is just one function, `FSDR`, which takes the parameters below:

  - `df`: the pandas DataFrame to process.
  - `main_col`: the column to compare values in. Percentage differences are
    calculated relative to this column.
  - `other_col`: the column to compare values to (i.e. the 'other column' to
    `main_col`).
  - `rel_thresh`: the threshold, in percent, above which a relative difference
    is considered significant. Set to None to disable the relative threshold.
    (Default: 30)
  - `abs_thresh`: the threshold above which an absolute difference is
    considered significant. Set to None to disable the absolute threshold.
    (Default: None)
  - `return_text`: set to True to return human-readable text descriptions of
    the significant differences. If False, return a list of row index values
    for rows which have significant differences. (Default: True)
  - `markdown`: set to True to return Markdown-formatted text, wrapped in a
    Markdown display object for display in a Jupyter notebook. If False,
    return plain text. (Default: True)
  - `value_suffix`: a string appended to each value in the text output. For
    example, this allows all values to be followed by % if they are
    percentages. (Default: '')
  - `comparison_text_larger`: the string used in the text output when
    describing a value that is larger than another value. (Default: 'larger')
  - `comparison_text_smaller`: the string used in the text output when
    describing a value that is smaller than another value. (Default: 'smaller')
  - `value_format_str`: a format string used to format values in the text
    output. For example, '.2f' for floating-point numbers with two decimal
    places. (Default: '')
  - `intro_text`: text to be included before the list of significant
    differences. For proper Markdown formatting this should end in '\n\n'.
    (Default: '')
  - `min_value`: minimum value (of either `main_col` or `other_col`) to be used
    when calculating significant differences. All rows with values lower
    than this are excluded from all calculations. Set to None to disable
    minimum value checking. (Default: None)

Returns:
    A chunk of human-readable text describing significant differences in
    the DataFrame (by default, if `return_text` is True). Otherwise, a list
    of index values for rows where there are significant differences.
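To make the selection rule concrete, here is a minimal sketch of the row-selection step only, roughly what `FSDR` returns when `return_text=False`. It is not the code in this repository. The name `find_significant_rows` is hypothetical. Two details are assumptions. First, a row counts as significant when either enabled threshold is strictly exceeded, which is inferred from the example above. Second, `min_value` drops a row if either of its two values falls below it, which is one reading of the parameter description.

```python
import pandas as pd


def find_significant_rows(df, main_col, other_col, rel_thresh=30,
                          abs_thresh=None, min_value=None):
    """Hypothetical sketch of FSDR's row selection (not the real code).

    Returns index values of rows whose relative difference (in percent,
    relative to main_col) exceeds rel_thresh OR whose absolute difference
    exceeds abs_thresh. Either threshold can be disabled with None.
    """
    data = df[[main_col, other_col]]

    # Assumption: a row is excluded if EITHER value is below min_value.
    if min_value is not None:
        keep = (data[main_col] >= min_value) & (data[other_col] >= min_value)
        data = data[keep]

    abs_diff = (data[other_col] - data[main_col]).abs()
    # Percentage difference relative to main_col; pandas yields inf for x/0
    # and NaN for 0/0, and NaN never compares greater than the threshold.
    rel_diff = abs_diff / data[main_col].abs() * 100

    # Start with "not significant" and OR in each enabled threshold test.
    significant = pd.Series(False, index=data.index)
    if rel_thresh is not None:
        significant |= rel_diff > rel_thresh
    if abs_thresh is not None:
        significant |= abs_diff > abs_thresh

    # Index values of the flagged rows, in DataFrame order.
    return significant[significant].index.tolist()


# Same sample data as the README example
df = pd.DataFrame({'UK': [50, 75, 100, 1500],
                   'World': [40, 80, 50, 1600]},
                  index=['Biology', 'Geography', 'Computing', 'Maths'])

print(find_significant_rows(df, 'UK', 'World', rel_thresh=30, abs_thresh=75))
# ['Computing', 'Maths']
```

Unlike the text output in the example, which lists Maths before Computing, this sketch returns rows in DataFrame order. How `FSDR` orders its results is not documented here.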