https://github.com/scottvr/fplit

A Python source code file-splitting tool that intelligently separates function calls from a single file into individual files while preserving setup context (global initialization, local prints, etc.).
Host: GitHub
URL: https://github.com/scottvr/fplit
Owner: scottvr
License: mit
Created: 2024-12-27T04:42:29.000Z (about 1 year ago)
Default Branch: main
Last Pushed: 2025-02-20T10:12:18.000Z (11 months ago)
Last Synced: 2025-02-20T10:35:33.607Z (11 months ago)
Topics: code, demo, file, python, splitter, tests
Language: Python
Homepage:
Size: 84 KB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

# fplit

fplit reads Python source files containing one or more function calls and intelligently splits them into separate, standalone script files (e.g., to use as unit tests or examples).

Necessary context, such as imports, boilerplate setup code for well-known libraries, and related statements such as print() calls, debug output, comments, and docstrings, is detected via proximity and a string similarity threshold and added as appropriate to the main execution block of each generated script file (one file per function).

If the generated files are not intended for execution (e.g., they are generated as docs or for other reference purposes), this context can be omitted by passing the `--funcdefs-only` option to fplit on the command line.

The name 'fplit' is a combination of 'file' and 'split', with a typographical nod to the historical 'long s' (ſ) character: 'split' would have appeared as 'ſplit' in historical typography, and the 'ſ' often looks like an 'f' to a modern-day reader.

## Overview

`fplit` reads in Python source files containing one or more function _*calls*_ (typically in a `__main__` block or at module level) and splits them into separate, self-contained files. Necessary context for successful execution of the functions (such as imports, setup code, and related statements) is preserved from the source file and added to the main execution block of each generated script. 

Or, if the source contains function _*definitions*_, and the generated files are not intended for execution, this context can be skipped via the `--funcdefs-only` option.
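How fplit locates function calls internally is not described here. As a rough illustration only, the sketch below uses the standard-library `ast` module to list the plain function calls found in a `__main__` block, falling back to module-level statements when no such block exists. The helper names `is_main_guard` and `find_demo_calls` are hypothetical and are not part of fplit.

```python
import ast
from typing import List

SOURCE = '''
import logging

logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    data = process_data(sample_input)
    print("Data processed successfully")
    plot_results(data)
'''


def is_main_guard(node: ast.stmt) -> bool:
    """Return True for an `if __name__ == "__main__":` statement."""
    if not isinstance(node, ast.If):
        return False
    test = node.test
    return (
        isinstance(test, ast.Compare)
        and isinstance(test.left, ast.Name)
        and test.left.id == "__name__"
        and len(test.ops) == 1
        and isinstance(test.ops[0], ast.Eq)
        and len(test.comparators) == 1
        and isinstance(test.comparators[0], ast.Constant)
        and test.comparators[0].value == "__main__"
    )


def find_demo_calls(source: str, ignore=("print",)) -> List[str]:
    """List names of plain function calls in the main block (or at module level)."""
    tree = ast.parse(source)
    main_blocks = [n for n in tree.body if is_main_guard(n)]
    # Prefer the __main__ block; otherwise scan module-level statements.
    statements = main_blocks[0].body if main_blocks else tree.body
    names: List[str] = []
    for stmt in statements:
        # Definitions are not calls; skip their bodies entirely.
        if isinstance(stmt, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        for node in ast.walk(stmt):
            # Only bare names such as foo(...); attribute calls like a.b() are skipped.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                name = node.func.id
                if name not in ignore and name not in names:
                    names.append(name)
    return names


print(find_demo_calls(SOURCE))
```

Attribute calls such as `logging.basicConfig(...)` are deliberately left out of the result in this sketch, since they look like setup code rather than functions to demonstrate.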

### Key Features

- Splits Python files into function-call-specific demonstration files

  - alternatively, into function-specific files containing only the actual function definition (e.g., for reference/documentation purposes)

- Intelligently preserves setup code and configuration (or optionally, doesn't)

- Maintains imports and necessary context (or optionally, doesn't)

- Handles both explicit `__main__` blocks and module-level code

- Smart detection and inclusion of related print statements and comments (or optionally, not)

- Configurable pattern matching for setup code detection
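To give a feel for what a similarity threshold on print statements could mean, here is a minimal sketch that scores a print message against a function name with the standard-library `difflib.SequenceMatcher`. This is an assumption for illustration: the scoring fplit actually uses, its default threshold, and the helper names `similarity` and `is_related_print` are not taken from the project.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_related_print(message: str, func_name: str, threshold: float = 0.5) -> bool:
    """Guess whether a print message refers to the given function name."""
    target = func_name.replace("_", " ")
    words = message.lower().split()
    # Compare the name against each run of words of the same length as the name.
    n = len(target.split())
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return max(similarity(window, target) for window in windows) >= threshold


print(is_related_print("Data processed successfully", "process_data"))
print(is_related_print("Generated visualization", "process_data"))
```

A message like "Generated visualization" shares no words with `plot_results`, so text similarity alone would miss it. That is presumably why proximity to the call is considered as well.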

## Command Line Options

```
usage: fplit.py [-h] [-o OUTPUT_DIR] [-v] [--wrap-main] [--no-setup]
                [--list-patterns]
                [--disable-patterns PATTERN [PATTERN ...]]
                [--enable-patterns PATTERN [PATTERN ...]] [--show-setup]
                [--patterns-dir PATTERNS_DIR] [--skip-user-patterns]
                [--skip-project-patterns]
                source_file

positional arguments:
  source_file             Python source file to split

optional arguments:
  -h, --help              show help message and exit
  -o OUTPUT_DIR           output directory for split files (default: current directory)
  -v, --verbose           increase output verbosity (use -v or -vv)
  --wrap-main             always wrap code in __main__ blocks
  --no-setup              skip preservation of module-level setup code
  --funcdefs-only         extracts only function definitions to unique files
  --show-setup            show detected module-level setup code without splitting
  --list-patterns         list all available setup patterns
  --disable-patterns      disable specific setup patterns
  --enable-patterns       enable only specified patterns
  --patterns-dir          directory containing custom pattern definitions
  --skip-user-patterns    skip loading user pattern overrides
  --skip-project-patterns skip loading project-specific patterns
  --similarity-threshold  threshold for print statement similarity
```

## Usage

Basic usage:

```bash
python fplit.py demo.py                 # Split into current directory
python fplit.py demo.py -o output_dir   # Split into specified directory
python fplit.py demo.py -v              # Show progress
python fplit.py demo.py -vv             # Show detailed debug info
```

### Example - Function Call Extraction and Demonstration

The default mode of operation is to extract function *calls* into single-purpose, fully runnable scripts, each demonstrating one specific function.

Given the following as input file `demo.py`:

```python
import logging
import matplotlib.pyplot as plt
import numpy as np

# Configure logging
logging.basicConfig(level=logging.INFO)

# Set plot style
plt.style.use('seaborn')

if __name__ == "__main__":
    # Demo the data processing
    data = process_data(sample_input)
    print("Data processed successfully")

    # Visualize results
    plot_results(data)
    print("Generated visualization")
```

Running:

```bash
python fplit.py demo.py
```

will create separate files for each function call, preserving necessary setup, comments, and print statements:

```python
### generated file: process_data_demo.py
import logging
import matplotlib.pyplot as plt
import numpy as np

# Configure logging
logging.basicConfig(level=logging.INFO)

if __name__ == "__main__":
    data = process_data(sample_input)
    print("Data processed successfully")
    exit(0)
```

```python
### generated file: plot_results_demo.py
import logging
import matplotlib.pyplot as plt
import numpy as np

# Configure logging
logging.basicConfig(level=logging.INFO)

# Set plot style
plt.style.use('seaborn')

if __name__ == "__main__":
    plot_results(data)
    print("Generated visualization")
    exit(0)
```

### Example - Function Reference Extraction

This mode extracts only the function definitions from your source code, which is useful for creating reference libraries or cataloging implementations.

Given the following as input file `source.py`:

```python
import numpy as np
from typing import List

def quicksort(arr: List[int]) -> List[int]:
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)

def binary_search(arr: List[int], target: int) -> int:
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1

if __name__ == "__main__":
    test_arr = [3, 6, 8, 10, 1, 2, 1]
    sorted_arr = quicksort(test_arr)
    idx = binary_search(sorted_arr, 6)
```

Running:

```bash
python fplit.py source.py --funcdefs-only
```

will create a separate file for each function definition, _*NOT*_ preserving any surrounding setup, comments, etc.:

```python
### generated file: quicksort.py
def quicksort(arr: List[int]) -> List[int]:
    if len(arr) <= 1:
        return arr
    pivot = arr[len(arr) // 2]
    left = [x for x in arr if x < pivot]
    middle = [x for x in arr if x == pivot]
    right = [x for x in arr if x > pivot]
    return quicksort(left) + middle + quicksort(right)
```

```python
### generated file: binary_search.py
def binary_search(arr: List[int], target: int) -> int:
    left, right = 0, len(arr) - 1
    while left <= right:
        mid = (left + right) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            left = mid + 1
        else:
            right = mid - 1
    return -1
```

This mode:

- Extracts only function definitions

- Names files directly after the functions

- Excludes imports, setup code, and main blocks

- Preserves function signatures and type hints

- Creates a clean reference library of implementations

This is particularly useful when:

- Creating an algorithm reference library

- Extracting reusable functions from existing code

- Building a catalog of implementation patterns

- Preparing code examples for documentation
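The behavior described above can be approximated in a few lines with the standard-library `ast` module. The sketch below is not fplit's implementation; `split_function_defs` is a hypothetical helper that writes each top-level function definition to a file named after the function, leaving out imports, setup code, and the main block.

```python
import ast
import tempfile
from pathlib import Path
from typing import List


def split_function_defs(source: str, output_dir: Path) -> List[Path]:
    """Write each top-level function definition to <name>.py under output_dir."""
    tree = ast.parse(source)
    output_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Exact source text of the definition, signature and type hints included.
            segment = ast.get_source_segment(source, node)
            path = output_dir / f"{node.name}.py"
            path.write_text(segment + "\n", encoding="utf-8")
            written.append(path)
    return written


SOURCE = '''
from typing import List

def double(x: int) -> int:
    return x * 2

def first(items: List[int]) -> int:
    return items[0]

if __name__ == "__main__":
    print(double(first([1, 2])))
'''

with tempfile.TemporaryDirectory() as tmp:
    for path in split_function_defs(SOURCE, Path(tmp)):
        print(path.name)
```

One limitation of this sketch: `ast.get_source_segment` starts at the `def` line, so decorators and comments placed above a function are not carried over.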

 

## Common Python Library Setup Pattern Detection

fplit intelligently detects and preserves setup code for many popular Python libraries. Here's what each pattern matches:

### Data Science & ML

- **NumPy**: Random seeds, print options, error settings (but not array operations)

- **Pandas**: Display options, default settings (but not data operations)

- **Matplotlib**: Style settings, backend config (but not actual plotting)

- **Seaborn**: Theme setting, style config (but not visualizations)

- **Plotly**: Template selection, renderer config (but not plotting)

- **TensorFlow**: GPU/device config, random seeds (but not model ops)

- **PyTorch**: Device selection, seeds, cudnn config (but not training)

- **JAX**: Platform selection, precision config (but not computations)

- **Scikit-learn**: Random state setup (but not model operations)

### Web & API

- **FastAPI**: App creation, middleware setup (but not routes)

- **Django**: Settings adjustment (but not views)

- **Requests**: Session creation, auth setup (but not API calls)

- **SQLAlchemy**: Engine creation, pool config (but not queries)

### Testing & Debug

- **Pytest**: Skip conditions, import checking (but not test functions)

- **Logging**: Logger creation, level setting (but not log messages)

- **Warnings**: Warning filters (but not warning raises)

### Other

- **OpenCV**: Threading config, window params (but not image ops)

- **Ray**: Init config, resource setup (but not computations)

- **Random**: Seed setting (but not generation)

- **Environment Variables**: Environment variable setting
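The distinction drawn in each entry above (configuration calls match, ordinary library use does not) can be illustrated with a small regex table. This is a sketch under stated assumptions: the `SETUP_PATTERNS` table, its regexes, and the `match_setup_pattern` helper are hypothetical, and fplit's real pattern format is documented in its Pattern Configuration Guide rather than here.

```python
import re
from typing import Iterable, Optional

# Hypothetical pattern table: each name maps to regexes that match setup
# statements only, not everyday use of the library.
SETUP_PATTERNS = {
    "numpy": [r"^np\.random\.seed\(", r"^np\.set_printoptions\(", r"^np\.seterr\("],
    "matplotlib": [r"^plt\.style\.use\(", r"^matplotlib\.use\("],
    "logging": [r"^logging\.basicConfig\(", r"^\w+\s*=\s*logging\.getLogger\("],
    "random": [r"^random\.seed\("],
}


def match_setup_pattern(line: str, disabled: Iterable[str] = ()) -> Optional[str]:
    """Return the name of the first enabled pattern matching the line, else None."""
    stripped = line.strip()
    skip = set(disabled)
    for name, regexes in SETUP_PATTERNS.items():
        if name in skip:
            continue
        if any(re.match(regex, stripped) for regex in regexes):
            return name
    return None


for statement in ("np.random.seed(42)", "arr = np.zeros(3)", "plt.style.use('seaborn')"):
    print(statement, "->", match_setup_pattern(statement))
```

Passing a `disabled` collection mimics the idea behind `--disable-patterns`: a statement that would otherwise be kept as setup is no longer recognized.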

### Setup Patterns Configuration Guide

[Setup Patterns Configuration Guide](https://github.com/scottvr/fplit/blob/main/Pattern_Configuration_Guide.md)

## Installation

```bash
git clone https://github.com/scottvr/fplit.git
cd fplit
python -m pip install -r requirements.txt
```

## TODO

Contributions are welcome. Here are some things on the todo list:

- Additional setup patterns for other popular libraries

- Smarter handling of function dependencies

- Support for async/await syntax

- Configuration file support

## License

MIT