https://github.com/lablnet/stepwright

A powerful web scraping library built with Playwright that provides a declarative, step-by-step approach to web automation and data extraction.
https://github.com/lablnet/stepwright
declerative json mit-license scraping structured
Last synced: 23 days ago
JSON representation
A powerful web scraping library built with Playwright that provides a declarative, step-by-step approach to web automation and data extraction.
Host: GitHub
URL: https://github.com/lablnet/stepwright
Owner: lablnet
License: mit
Created: 2025-10-22T13:16:55.000Z (4 months ago)
Default Branch: main
Last Pushed: 2025-10-31T07:09:17.000Z (3 months ago)
Last Synced: 2025-10-31T08:37:46.651Z (3 months ago)
Topics: declerative, json, mit-license, scraping, structured
Language: Python
Homepage: https://pypi.org/project/stepwright/
Size: 152 KB
Stars: 1
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
Awesome Lists containing this project

README

          # StepWright

A powerful web scraping library built with Playwright that provides a declarative, step-by-step approach to web automation and data extraction.

## Features

- 🚀 **Declarative Scraping**: Define scraping workflows using Python dictionaries or dataclasses

- 🔄 **Pagination Support**: Built-in support for next button and scroll-based pagination

- 📊 **Data Collection**: Extract text, HTML, values, and files from web pages

- 🔗 **Multi-tab Support**: Handle multiple tabs and complex navigation flows

- 📄 **PDF Generation**: Save pages as PDFs or trigger print-to-PDF actions

- 📥 **File Downloads**: Download files with automatic directory creation

- 🔁 **Looping & Iteration**: ForEach loops for processing multiple elements

- 📡 **Streaming Results**: Real-time result processing with callbacks

- 🎯 **Error Handling**: Graceful error handling with configurable termination

- 🔧 **Flexible Selectors**: Support for ID, class, tag, and XPath selectors

- 🔁 **Retry Logic**: Automatic retry on failure with configurable delays

- 🎛️ **Conditional Execution**: Skip or execute steps based on JavaScript conditions

- ⏳ **Smart Waiting**: Wait for selectors before actions with configurable timeouts

- 🔀 **Fallback Selectors**: Multiple selector fallbacks for increased robustness

- 🖱️ **Enhanced Clicks**: Double-click, right-click, modifier keys, and force clicks

- ⌨️ **Input Enhancements**: Clear before input, human-like typing delays

- 🔍 **Data Transformations**: Regex extraction, JavaScript transformations, default values

- 🌐 **Page Actions**: Reload, get URL/title, meta tags, cookies, localStorage, viewport

- 🤖 **Human-like Behavior**: Random delays to mimic human interaction

- ✅ **Element State Checks**: Require visible/enabled before actions

## Installation

```bash

# Using pip

pip install stepwright

# Using pip with development dependencies

pip install stepwright[dev]

# From source

git clone https://github.com/lablnet/stepwright.git

cd stepwright

pip install -e .

```

## Quick Start

### Basic Usage

```python

import asyncio

from stepwright import run_scraper, TabTemplate, BaseStep

async def main():

    templates = [

        TabTemplate(

            tab="example",

            steps=[

                BaseStep(

                    id="navigate",

                    action="navigate",

                    value="https://example.com"

                ),

                BaseStep(

                    id="get_title",

                    action="data",

                    object_type="tag",

                    object="h1",

                    key="title",

                    data_type="text"

                )

            ]

        )

    ]

    results = await run_scraper(templates)

    print(results)

if __name__ == "__main__":

    asyncio.run(main())

```

## API Reference

### Core Functions

#### `run_scraper(templates, options=None)`

Main function to execute scraping templates.

**Parameters:**

- `templates`: List of `TabTemplate` objects

- `options`: Optional `RunOptions` object

**Returns:** `List[Dict[str, Any]]`

```python

results = await run_scraper(templates, RunOptions(

    browser={"headless": True}

))

```

#### `run_scraper_with_callback(templates, on_result, options=None)`

Execute scraping with streaming results via callback.

**Parameters:**

- `templates`: List of `TabTemplate` objects

- `on_result`: Callback function for each result (can be sync or async)

- `options`: Optional `RunOptions` object

```python

async def process_result(result, index):

    print(f"Result {index}: {result}")

await run_scraper_with_callback(templates, process_result)

```

### Types

#### `TabTemplate`

```python

@dataclass

class TabTemplate:

    tab: str

    initSteps: Optional[List[BaseStep]] = None      # Steps executed once before pagination

    perPageSteps: Optional[List[BaseStep]] = None   # Steps executed for each page

    steps: Optional[List[BaseStep]] = None          # Single steps array

    pagination: Optional[PaginationConfig] = None

```

#### `BaseStep`

```python

@dataclass

class BaseStep:

    id: str

    description: Optional[str] = None

    object_type: Optional[SelectorType] = None  # 'id' | 'class' | 'tag' | 'xpath'

    object: Optional[str] = None

    action: Literal[

        "navigate", "input", "click", "data", "scroll", 

        "eventBaseDownload", "foreach", "open", "savePDF", 

        "printToPDF", "downloadPDF", "downloadFile",

        "reload", "getUrl", "getTitle", "getMeta", "getCookies", 

        "setCookies", "getLocalStorage", "setLocalStorage", 

        "getSessionStorage", "setSessionStorage", "getViewportSize", 

        "setViewportSize", "screenshot", "waitForSelector", "evaluate"

    ] = "navigate"

    value: Optional[str] = None

    key: Optional[str] = None

    data_type: Optional[DataType] = None        # 'text' | 'html' | 'value' | 'default' | 'attribute'

    wait: Optional[int] = None

    terminateonerror: Optional[bool] = None

    subSteps: Optional[List["BaseStep"]] = None

    autoScroll: Optional[bool] = None

    

    # Retry configuration

    retry: Optional[int] = None                  # Number of retries on failure (default: 0)

    retryDelay: Optional[int] = None            # Delay between retries in ms (default: 1000)

    

    # Conditional execution

    skipIf: Optional[str] = None                 # JavaScript expression - skip step if true

    onlyIf: Optional[str] = None                # JavaScript expression - execute only if true

    

    # Element waiting and state

    waitForSelector: Optional[str] = None        # Wait for selector before action

    waitForSelectorTimeout: Optional[int] = None # Timeout for waitForSelector in ms (default: 30000)

    waitForSelectorState: Optional[Literal["visible", "hidden", "attached", "detached"]] = None

    

    # Multiple selector fallbacks

    fallbackSelectors: Optional[List[Dict[str, str]]] = None  # List of {object_type, object}

    

    # Click enhancements

    clickModifiers: Optional[List[ClickModifier]] = None  # ['Control', 'Meta', 'Shift', 'Alt']

    doubleClick: Optional[bool] = None            # Perform double click

    forceClick: Optional[bool] = None            # Force click even if not visible/actionable

    rightClick: Optional[bool] = None            # Perform right click

    

    # Input enhancements

    clearBeforeInput: Optional[bool] = None      # Clear input before typing (default: True)

    inputDelay: Optional[int] = None           # Delay between keystrokes in ms

    

    # Data extraction enhancements

    required: Optional[bool] = None             # Raise error if extraction returns None/empty

    defaultValue: Optional[str] = None          # Default value if extraction fails

    regex: Optional[str] = None                 # Regex pattern to extract from data

    regexGroup: Optional[int] = None            # Regex group to extract (default: 0)

    transform: Optional[str] = None             # JavaScript expression to transform data

    

    # Timeout configuration

    timeout: Optional[int] = None                # Step-specific timeout in ms

    

    # Navigation enhancements

    waitUntil: Optional[Literal["load", "domcontentloaded", "networkidle", "commit"]] = None

    

    # Human-like behavior

    randomDelay: Optional[Dict[str, int]] = None # {min: ms, max: ms} for random delay

    

    # Element state checks

    requireVisible: Optional[bool] = None        # Require element visible (default: True for click)

    requireEnabled: Optional[bool] = None       # Require element enabled

    

    # Skip/continue logic

    skipOnError: Optional[bool] = None          # Skip step if error occurs (default: False)

    continueOnEmpty: Optional[bool] = None      # Continue if element not found (default: True)

```

#### `RunOptions`

```python

@dataclass

class RunOptions:

    browser: Optional[dict] = None  # Playwright launch options

    onResult: Optional[Callable] = None

```

## Step Actions

### Navigate

Navigate to a URL.

```python

BaseStep(

    id="go_to_page",

    action="navigate",

    value="https://example.com"

)

```

### Input

Fill form fields.

```python

BaseStep(

    id="search",

    action="input",

    object_type="id",

    object="search-box",

    value="search term"

)

```

### Click

Click on elements.

```python

BaseStep(

    id="submit",

    action="click",

    object_type="class",

    object="submit-button"

)

```

### Data Extraction

Extract data from elements.

```python

BaseStep(

    id="get_title",

    action="data",

    object_type="tag",

    object="h1",

    key="title",

    data_type="text"

)

```

### ForEach Loop

Process multiple elements.

```python

BaseStep(

    id="process_items",

    action="foreach",

    object_type="class",

    object="item",

    subSteps=[

        BaseStep(

            id="get_item_title",

            action="data",

            object_type="tag",

            object="h2",

            key="title",

            data_type="text"

        )

    ]

)

```

### File Operations

#### Event-Based Download

```python

BaseStep(

    id="download_file",

    action="eventBaseDownload",

    object_type="class",

    object="download-link",

    value="./downloads/file.pdf",

    key="downloaded_file"

)

```

#### Download PDF/File

```python

BaseStep(

    id="download_pdf",

    action="downloadPDF",

    object_type="class",

    object="pdf-link",

    value="./output/document.pdf",

    key="pdf_file"

)

```

#### Save PDF

```python

BaseStep(

    id="save_pdf",

    action="savePDF",

    value="./output/page.pdf",

    key="pdf_file"

)

```

## Pagination

### Next Button Pagination

```python

PaginationConfig(

    strategy="next",

    nextButton=NextButtonConfig(

        object_type="class",

        object="next-page",

        wait=2000

    ),

    maxPages=10

)

```

### Scroll Pagination

```python

PaginationConfig(

    strategy="scroll",

    scroll=ScrollConfig(

        offset=800,

        delay=1500

    ),

    maxPages=5

)

```

### Pagination Strategies

#### paginationFirst

Paginate first, then collect data from each page:

```python

TabTemplate(

    tab="news",

    initSteps=[...],

    perPageSteps=[...],  # Collect data from each page

    pagination=PaginationConfig(

        strategy="next",

        nextButton=NextButtonConfig(...),

        paginationFirst=True  # Go to next page before collecting

    )

)

```

#### paginateAllFirst

Paginate through all pages first, then collect all data at once:

```python

TabTemplate(

    tab="articles",

    initSteps=[...],

    perPageSteps=[...],  # Collect all data after all pagination

    pagination=PaginationConfig(

        strategy="next",

        nextButton=NextButtonConfig(...),

        paginateAllFirst=True  # Load all pages first

    )

)

```

## Advanced Features

### Proxy Support

```python

from stepwright import run_scraper, RunOptions

results = await run_scraper(templates, RunOptions(

    browser={

        "proxy": {

            "server": "http://proxy-server:8080",

            "username": "user",

            "password": "pass"

        }

    }

))

```

### Custom Browser Options

```python

results = await run_scraper(templates, RunOptions(

    browser={

        "headless": False,

        "slow_mo": 1000,

        "args": ["--no-sandbox", "--disable-setuid-sandbox"]

    }

))

```

### Streaming Results

```python

async def process_result(result, index):

    print(f"Result {index}: {result}")

    # Process result immediately (e.g., save to database)

    await save_to_database(result)

await run_scraper_with_callback(

    templates, 

    process_result,

    RunOptions(browser={"headless": True})

)

```

### Data Placeholders

Use collected data in subsequent steps:

```python

BaseStep(

    id="get_title",

    action="data",

    object_type="id",

    object="page-title",

    key="page_title",

    data_type="text"

),

BaseStep(

    id="save_with_title",

    action="savePDF",

    value="./output/{{page_title}}.pdf",  # Uses collected page_title

    key="pdf_file"

)

```

### Index Placeholders

Use loop index in foreach steps:

```python

BaseStep(

    id="process_items",

    action="foreach",

    object_type="class",

    object="item",

    subSteps=[

        BaseStep(

            id="save_item",

            action="savePDF",

            value="./output/item_{{i}}.pdf",      # i = 0, 1, 2, ...

            # or

            value="./output/item_{{i_plus1}}.pdf" # i_plus1 = 1, 2, 3, ...

        )

    ]

)

```

## Error Handling

Steps can be configured to terminate on error:

```python

BaseStep(

    id="critical_step",

    action="click",

    object_type="id",

    object="important-button",

    terminateonerror=True  # Stop execution if this fails

)

```

Without `terminateonerror=True`, errors are logged but execution continues.

## Advanced Step Options

### Retry Logic

Automatically retry failed steps with configurable delays:

```python

BaseStep(

    id="click_button",

    action="click",

    object_type="id",

    object="flaky-button",

    retry=3,              # Retry up to 3 times

    retryDelay=1000        # Wait 1 second between retries

)

```

### Conditional Execution

Execute or skip steps based on JavaScript conditions:

```python

# Skip step if condition is true

BaseStep(

    id="optional_click",

    action="click",

    object_type="id",

    object="optional-button",

    skipIf="document.querySelector('.modal').classList.contains('hidden')"

)

# Execute only if condition is true

BaseStep(

    id="conditional_data",

    action="data",

    object_type="id",

    object="dynamic-content",

    key="content",

    onlyIf="document.querySelector('#dynamic-content') !== null"

)

```

### Wait for Selector

Wait for elements to appear before performing actions:

```python

BaseStep(

    id="click_after_load",

    action="click",

    object_type="id",

    object="target-button",

    waitForSelector="#loading-indicator",      # Wait for this selector

    waitForSelectorTimeout=5000,               # Timeout: 5 seconds

    waitForSelectorState="hidden"              # Wait until hidden

)

```

### Fallback Selectors

Provide multiple selector options for increased robustness:

```python

BaseStep(

    id="click_with_fallback",

    action="click",

    object_type="id",

    object="primary-button",                   # Try this first

    fallbackSelectors=[

        {"object_type": "class", "object": "btn-primary"},

        {"object_type": "class", "object": "submit-btn"},

        {"object_type": "xpath", "object": "//button[contains(text(), 'Submit')]"}

    ]

)

```

### Click Enhancements

Advanced click options for different interaction types:

```python

# Double click

BaseStep(

    id="double_click",

    action="click",

    object_type="id",

    object="item",

    doubleClick=True

)

# Right click (context menu)

BaseStep(

    id="right_click",

    action="click",

    object_type="id",

    object="context-menu-trigger",

    rightClick=True

)

# Click with modifier keys (Ctrl/Cmd+Click)

BaseStep(

    id="multi_select",

    action="click",

    object_type="class",

    object="item",

    clickModifiers=["Control"]  # or ["Meta"] for Mac

)

# Force click (click hidden elements)

BaseStep(

    id="force_click",

    action="click",

    object_type="id",

    object="hidden-button",

    forceClick=True

)

```

### Input Enhancements

More control over input behavior:

```python

# Clear input before typing (default: True)

BaseStep(

    id="clear_and_input",

    action="input",

    object_type="id",

    object="search-box",

    value="new search term",

    clearBeforeInput=True  # Clear existing value first

)

# Human-like typing with delays

BaseStep(

    id="human_like_input",

    action="input",

    object_type="id",

    object="form-field",

    value="slowly typed text",

    inputDelay=100  # 100ms delay between each character

)

```

### Data Extraction Enhancements

Advanced data extraction and transformation options:

```python

# Extract with regex

BaseStep(

    id="extract_price",

    action="data",

    object_type="id",

    object="price",

    key="price",

    regex=r"\$(\d+\.\d+)",        # Extract dollar amount

    regexGroup=1                   # Get first capture group

)

# Transform extracted data with JavaScript

BaseStep(

    id="transform_data",

    action="data",

    object_type="id",

    object="raw-data",

    key="processed",

    transform="value.toUpperCase().trim()"  # JavaScript transformation

)

# Required field with default value

BaseStep(

    id="get_required_data",

    action="data",

    object_type="id",

    object="important-field",

    key="important",

    required=True,                 # Raise error if not found

    defaultValue="N/A"            # Use if extraction fails

)

# Continue even if element not found

BaseStep(

    id="optional_data",

    action="data",

    object_type="id",

    object="optional-content",

    key="optional",

    continueOnEmpty=True           # Don't raise error if not found

)

```

### Element State Checks

Validate element state before actions:

```python

BaseStep(

    id="click_visible",

    action="click",

    object_type="id",

    object="button",

    requireVisible=True,           # Ensure element is visible

    requireEnabled=True            # Ensure element is enabled

)

```

### Random Delays

Add human-like random delays to actions:

```python

BaseStep(

    id="human_like_action",

    action="click",

    object_type="id",

    object="button",

    randomDelay={"min": 500, "max": 2000}  # Random delay between 500-2000ms

)

```

### Skip on Error

Skip steps that fail instead of stopping execution:

```python

BaseStep(

    id="optional_step",

    action="click",

    object_type="id",

    object="optional-button",

    skipOnError=True  # Continue even if this step fails

)

```

## Page Actions

### Reload Page

Reload the current page with optional wait conditions:

```python

BaseStep(

    id="reload",

    action="reload",

    waitUntil="networkidle"  # Wait for network to be idle

)

```

### Get Current URL

```python

BaseStep(

    id="get_url",

    action="getUrl",

    key="current_url"  # Store in collector

)

```

### Get Page Title

```python

BaseStep(

    id="get_title",

    action="getTitle",

    key="page_title"

)

```

### Get Meta Tags

```python

# Get specific meta tag

BaseStep(

    id="get_description",

    action="getMeta",

    object="description",  # Meta name or property

    key="meta_description"

)

# Get all meta tags

BaseStep(

    id="get_all_meta",

    action="getMeta",

    key="all_meta_tags"  # Returns dictionary of all meta tags

)

```

### Cookies Management

```python

# Get all cookies

BaseStep(

    id="get_cookies",

    action="getCookies",

    key="cookies"

)

# Get specific cookie

BaseStep(

    id="get_session_cookie",

    action="getCookies",

    object="session_id",

    key="session"

)

# Set cookie

BaseStep(

    id="set_cookie",

    action="setCookies",

    object="preference",

    value="dark_mode"

)

```

### LocalStorage & SessionStorage

```python

# Get localStorage value

BaseStep(

    id="get_storage",

    action="getLocalStorage",

    object="user_preference",

    key="preference"

)

# Set localStorage value

BaseStep(

    id="set_storage",

    action="setLocalStorage",

    object="theme",

    value="dark"

)

# Get all localStorage items

BaseStep(

    id="get_all_storage",

    action="getLocalStorage",

    key="all_storage"

)

# SessionStorage (same pattern)

BaseStep(

    id="get_session",

    action="getSessionStorage",

    object="temp_data",

    key="data"

)

```

### Viewport Operations

```python

# Get viewport size

BaseStep(

    id="get_viewport",

    action="getViewportSize",

    key="viewport"

)

# Set viewport size

BaseStep(

    id="set_viewport",

    action="setViewportSize",

    value="1920x1080"  # or "1920,1080" or "1920 1080"

)

```

### Screenshot

```python

# Full page screenshot

BaseStep(

    id="screenshot",

    action="screenshot",

    value="./screenshots/page.png",

    data_type="full"  # Full page, omit for viewport only

)

# Element screenshot

BaseStep(

    id="element_screenshot",

    action="screenshot",

    object_type="id",

    object="content-area",

    value="./screenshots/element.png",

    key="screenshot_path"

)

```

### Wait for Selector

Explicit wait for element state:

```python

BaseStep(

    id="wait_for_element",

    action="waitForSelector",

    object_type="id",

    object="dynamic-content",

    value="visible",      # visible, hidden, attached, detached

    wait=5000,            # Timeout in ms

    key="wait_result"     # Stores True/False

)

```

### Evaluate JavaScript

Execute custom JavaScript:

```python

BaseStep(

    id="custom_js",

    action="evaluate",

    value="() => document.querySelector('.counter').textContent",

    key="counter_value"

)

```

## Complete Example

```python

import asyncio

from pathlib import Path

from stepwright import (

    run_scraper,

    TabTemplate,

    BaseStep,

    PaginationConfig,

    NextButtonConfig,

    RunOptions

)

async def main():

    templates = [

        TabTemplate(

            tab="news_scraper",

            initSteps=[

                BaseStep(

                    id="navigate",

                    action="navigate",

                    value="https://news-site.com"

                ),

                BaseStep(

                    id="search",

                    action="input",

                    object_type="id",

                    object="search-box",

                    value="technology"

                )

            ],

            perPageSteps=[

                BaseStep(

                    id="collect_articles",

                    action="foreach",

                    object_type="class",

                    object="article",

                    subSteps=[

                        BaseStep(

                            id="get_title",

                            action="data",

                            object_type="tag",

                            object="h2",

                            key="title",

                            data_type="text"

                        ),

                        BaseStep(

                            id="get_content",

                            action="data",

                            object_type="tag",

                            object="p",

                            key="content",

                            data_type="text"

                        ),

                        BaseStep(

                            id="get_link",

                            action="data",

                            object_type="tag",

                            object="a",

                            key="link",

                            data_type="value"

                        )

                    ]

                )

            ],

            pagination=PaginationConfig(

                strategy="next",

                nextButton=NextButtonConfig(

                    object_type="id",

                    object="next-page",

                    wait=2000

                ),

                maxPages=5

            )

        )

    ]

    # Run scraper

    results = await run_scraper(templates, RunOptions(

        browser={"headless": True}

    ))

    # Process results

    for i, article in enumerate(results):

        print(f"\nArticle {i + 1}:")

        print(f"Title: {article.get('title')}")

        print(f"Content: {article.get('content')[:100]}...")

        print(f"Link: {article.get('link')}")

if __name__ == "__main__":

    asyncio.run(main())

```

## Development

### Setup

```bash

# Clone repository

git clone https://github.com/lablnet/stepwright.git

cd stepwright

# Create virtual environment

python -m venv venv

source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode

pip install -e ".[dev]"

# Install Playwright browsers

playwright install chromium

```

### Running Tests

```bash

# Run all tests

pytest

# Run with verbose output

pytest -v

# Run specific test file

pytest tests/test_scraper.py

# Run specific test class

pytest tests/test_scraper.py::TestGetBrowser

# Run specific test

pytest tests/test_scraper.py::TestGetBrowser::test_create_browser_instance

# Run with coverage

pytest --cov=src --cov-report=html

# Run integration tests only

pytest tests/test_integration.py

```

### Project Structure

```

stepwright/

├── src/

│   ├── __init__.py

│   ├── step_types.py      # Type definitions and dataclasses

│   ├── helpers.py         # Utility functions

│   ├── executor.py        # Core step execution logic

│   ├── parser.py          # Public API (run_scraper)

│   ├── scraper.py         # Low-level browser automation

│   ├── handlers/          # Action-specific handlers

│   │   ├── __init__.py

│   │   ├── data_handlers.py      # Data extraction handlers

│   │   ├── file_handlers.py      # File download/PDF handlers

│   │   ├── loop_handlers.py      # Foreach/open handlers

│   │   └── page_actions.py       # Page-related actions (reload, getUrl, etc.)

│   └── scraper_parser.py  # Backward compatibility

├── tests/

│   ├── __init__.py

│   ├── conftest.py        # Pytest configuration

│   ├── test_page.html     # Test HTML page

│   ├── test_page_enhanced.html  # Enhanced test page for new features

│   ├── test_scraper.py    # Core scraper tests

│   ├── test_parser.py     # Parser function tests

│   ├── test_new_features.py  # Tests for new features

│   └── test_integration.py # Integration tests

├── pyproject.toml         # Package configuration

├── setup.py               # Setup script

├── pytest.ini             # Pytest configuration

├── README.md              # This file

└── README_TESTS.md        # Detailed test documentation

```

### Code Quality

```bash

# Format code with black

black src/ tests/

# Lint with flake8

flake8 src/ tests/

# Type checking with mypy

mypy src/

```

## Module Organization

The codebase follows separation of concerns:

- **step_types.py**: All type definitions (BaseStep, TabTemplate, etc.)

- **helpers.py**: Utility functions (placeholder replacement, locator creation, condition evaluation)

- **executor.py**: Core execution logic (execute steps, handle pagination, retry logic)

- **parser.py**: Public API (run_scraper, run_scraper_with_callback)

- **scraper.py**: Low-level Playwright wrapper (navigate, click, get_data)

- **handlers/**: Action-specific handlers organized by functionality

  - **data_handlers.py**: Data extraction logic with transformations

  - **file_handlers.py**: File download and PDF operations

  - **loop_handlers.py**: Foreach loops and new tab/window handling

  - **page_actions.py**: Page-related actions (reload, getUrl, cookies, storage, etc.)

- **scraper_parser.py**: Backward compatibility wrapper

You can import from the main module or specific submodules:

```python

# From main module (recommended)

from stepwright import run_scraper, TabTemplate, BaseStep

# From specific modules

from stepwright.step_types import TabTemplate, BaseStep

from stepwright.parser import run_scraper

from stepwright.helpers import replace_data_placeholders

```

## Testing

See [README_TESTS.md](README_TESTS.md) for detailed testing documentation.

## Contributing

1. Fork the repository

2. Create a feature branch (`git checkout -b feature/amazing-feature`)

3. Make your changes

4. Add tests for new functionality

5. Ensure all tests pass (`pytest`)

6. Commit your changes (`git commit -m 'Add amazing feature'`)

7. Push to the branch (`git push origin feature/amazing-feature`)

8. Open a Pull Request

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Support

- 🐛 Issues: [GitHub Issues](https://github.com/lablnet/stepwright/issues)

- 📖 Documentation: [README.md](README.md) and [README_TESTS.md](README_TESTS.md)

- 💬 Discussions: [GitHub Discussions](https://github.com/lablnet/stepwright/discussions)

## Acknowledgments

- Built with [Playwright](https://playwright.dev/)

- Inspired by declarative web scraping patterns

- Original TypeScript version: [framework-Island/stepwright](https://github.com/framework-Island/stepwright)

## Author

Muhammad Umer Farooq ([@lablnet](https://github.com/lablnet))
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/lablnet/stepwright

Awesome Lists containing this project

README