https://github.com/fabarca/andar

Provides an abstraction layer for creating and parsing paths in a programmatic way via templates.
Host: GitHub
URL: https://github.com/fabarca/andar
Owner: fabarca
License: mit
Created: 2025-08-05T16:43:53.000Z (9 months ago)
Default Branch: main
Last Pushed: 2026-01-04T21:03:32.000Z (4 months ago)
Last Synced: 2026-01-05T06:34:51.166Z (4 months ago)
Topics: best-practices, desing-patterns, parser, path, path-manager, python, template
Language: Python
Homepage: https://fabarca.github.io/andar/
Size: 796 KB
Stars: 2
Watchers: 0
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE

README

# Andar Package

Andar is a Python library that provides an abstraction layer for managing path structures, helping to create and parse paths programmatically via templated file paths.

## Install Package

With pip:

```bash
pip install andar
```

## Key features

### Clean code

Andar promotes clean code by using a composition approach to avoid inheritance hell.

Furthermore, it lets you define path conventions in a single place using a clear and intuitive syntax.

Templated path strings with field definitions help avoid error-prone split/index syntax.
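To make the contrast concrete, here is a minimal sketch using only the standard library. It is not Andar's implementation; the regular expression and variable names are assumptions chosen for illustration. It parses the same path twice: once with positional split/index code, and once with named fields declared in a single place.

```python
import re

path = "/data/reports/summary__2025-12-31.csv"

# Split/index style: every position is an implicit assumption about the layout.
parts = path.split("/")
stem, extension = parts[3].rsplit(".", 1)
base_name, suffix = stem.split("__")
by_index = {
    "base_folder": parts[1],
    "subfolder": parts[2],
    "base_name": base_name,
    "suffix": suffix,
    "extension": extension,
}

# Named-field style: the convention is written once and read back by name.
PATH_RE = re.compile(
    r"/(?P<base_folder>[^/]+)/(?P<subfolder>[^/]+)"
    r"/(?P<base_name>[^/_]+)__(?P<suffix>[^/.]+)\.(?P<extension>[^/.]+)"
)
match = PATH_RE.fullmatch(path)
by_name = match.groupdict() if match else None

print(by_index == by_name)
```

Both approaches recover the same fields here, but only the second one fails loudly (no match) when a path does not follow the convention.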

### Reusability

Andar allows using a single path convention via a PathModel for both generating and parsing paths.

PathModels can be reused to create new path conventions with minimal effort without modifying the parent PathModel.

### Separation of Concerns

Andar helps separate the I/O layer from the path generation layer, resulting in code that is easier to maintain.

### Predictability

Andar provides field name checking via regular expressions, and functions to assert a bijection between path generation and path parsing.
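The bijection idea can be illustrated with a small round-trip check. This is a standard-library sketch, not Andar's own assertion functions (which this README does not name); `build`, `parse` and `is_round_trip` are hypothetical helpers.

```python
import re

TEMPLATE = "/{folder}/{prefix}_{name}.{ext}"
PATTERN = re.compile(
    r"/(?P<folder>[^/]+)/(?P<prefix>[^/_]+)_(?P<name>[^/.]+)\.(?P<ext>[^/.]+)"
)


def build(fields):
    """Generate a path from a dict of field values."""
    return TEMPLATE.format(**fields)


def parse(path):
    """Recover the field values from a path, or fail if it does not match."""
    match = PATTERN.fullmatch(path)
    if match is None:
        raise ValueError(f"path does not match the convention: {path!r}")
    return match.groupdict()


def is_round_trip(fields):
    """True when parsing the generated path gives back the same fields."""
    return parse(build(fields)) == fields


ok = is_round_trip(
    {"folder": "data", "prefix": "raw", "name": "orders", "ext": "csv"}
)
# An underscore inside `prefix` is ambiguous with the "_" separator,
# so the parsed fields differ from the ones used to build the path.
ambiguous = is_round_trip(
    {"folder": "data", "prefix": "raw_v2", "name": "orders", "ext": "csv"}
)
print(ok, ambiguous)
```

The second call shows why field patterns matter: a value that contains the separator still produces a valid-looking path, but parsing it no longer returns the original fields.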

### Flexibility

Andar allows for a quick start just by defining a path template, thanks to its predefined fields and patterns. It also includes more advanced capabilities for customizing field parsing and generation via regular expressions and string converters, while maintaining a simple syntax.

### Lightweight

Andar is written using only the Python standard library, so it is very lightweight and has no external dependencies.

## Concepts

### PathModel

PathModel is the main class for defining path conventions and managing path structures. It is based on two main components: templates and fields.

Templates are strings that define the names of the fields in the path structure using a simple syntax (inspired by f-strings), for example: `"/{folder}/{prefix}_{name}_{suffix}.{ext}"`

Fields are the basic components that map an object to a string in order to build or parse a path. Fields are defined via a class named FieldConf (see next section).

A PathModel can be defined with only the template string, because every field has a default configuration.

Once a PathModel is defined, it can be used to generate a new path or to parse an existing path to get its fields. See [Quick Start](#quick-start) for a simple example. For more details, check the [Docs](https://fabarca.github.io/andar/reference/andar/path_model/).
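As a rough intuition for how one template can serve both directions, the sketch below compiles an f-string-like template into a regular expression with named groups. It uses only the standard library and is an assumption about the general technique, not a description of Andar's internals; `template_to_regex` and its default pattern are hypothetical.

```python
import re
from string import Formatter


def template_to_regex(template, patterns=None, default=r"[^/_.]+"):
    """Compile an f-string-like template into a regex with named groups.

    Note: this sketch does not support a field name appearing twice.
    """
    patterns = patterns or {}
    regex = ""
    for literal, field_name, _spec, _conversion in Formatter().parse(template):
        # Literal text between fields must match exactly.
        regex += re.escape(literal)
        if field_name:
            field_pattern = patterns.get(field_name, default)
            regex += f"(?P<{field_name}>{field_pattern})"
    return re.compile(regex)


template = "/{folder}/{prefix}_{name}_{suffix}.{ext}"
regex = template_to_regex(template)

# Parsing: the regex recovers the fields by name.
fields = regex.fullmatch("/data/raw_orders_v1.csv").groupdict()
# Generation: the same template rebuilds the path from those fields.
path = template.format(**fields)
print(fields, path)
```

The per-field `patterns` argument plays the role that field definitions play in the text above: it lets one field accept a different set of characters than the default.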

### FieldConf

FieldConf is the class that defines how to parse and build a given field. It can be customized by specifying its regex pattern and how to convert the input object to a string and vice versa. It also comes with a handy way to automatically manage dates and datetimes. See the [Examples](#examples) section for some applied use cases. For more details, check the [Docs](https://fabarca.github.io/andar/reference/andar/field_conf/).
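The combination of a pattern and a pair of converters can be pictured with the following standard-library sketch. `SketchField` is a hypothetical stand-in written for illustration; it is not FieldConf, and its methods are assumptions rather than Andar's API.

```python
import datetime as dt
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchField:
    """Hypothetical field definition: a regex plus optional date conversion."""

    pattern: str
    date_format: Optional[str] = None

    def to_str(self, value):
        """Convert an object to the string used when building a path."""
        if self.date_format is not None:
            text = value.strftime(self.date_format)
        else:
            text = str(value)
        if not re.fullmatch(self.pattern, text):
            raise ValueError(f"{text!r} does not match {self.pattern!r}")
        return text

    def from_str(self, text):
        """Convert a parsed string back to an object."""
        if not re.fullmatch(self.pattern, text):
            raise ValueError(f"{text!r} does not match {self.pattern!r}")
        if self.date_format is not None:
            return dt.datetime.strptime(text, self.date_format).date()
        return text


date_path = SketchField(pattern=r"\d{4}/\d{2}/\d{2}", date_format="%Y/%m/%d")
text = date_path.to_str(dt.date(2025, 12, 1))
value = date_path.from_str(text)
print(text, value)
```

Validating against the pattern in both directions keeps generation and parsing consistent: a value that could not be parsed back is rejected before it ends up in a path.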

## Quick Start

Simple PathModel definition using default field configurations:

```python
from andar import PathModel

simple_path_model = PathModel(
    template="/{base_folder}/{subfolder}/{base_name}__{suffix}.{extension}"
)
```

Generate a path:

```python
result_path = simple_path_model.get_path(
    base_folder="parent_folder",
    subfolder="other_folder",
    base_name="mydata",
    suffix="2000-01-01",
    extension="csv",
)
print(result_path)
```

```python
"/parent_folder/other_folder/mydata__2000-01-01.csv"
```

Parse a path:

```python
file_path = "/data/reports/summary__2025-12-31.csv"
parsed_fields = simple_path_model.parse_path(file_path)
print(parsed_fields)
```

```python
{
    'base_folder': 'data',
    'subfolder': 'reports',
    'base_name': 'summary',
    'suffix': '2025-12-31',
    'extension': 'csv',
}
```

## Examples

### How to create a path generator / parser for a date tree structure

Define a PathModel that follows a date tree folder structure with a datetime suffix, using the following template and fields:

```python
from andar import FieldConf, PathModel, SafePatterns

date_archived_pm = PathModel(
    template="{base_path}/{subfolder}/{date_path}/{date_prefix}_{name}_{datetime_suffix}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "subfolder": FieldConf(pattern=SafePatterns.NAME),
        "date_path": FieldConf(pattern=r"\d{4}/\d{2}/\d{2}", date_format="%Y/%m/%d"),
        "date_prefix": FieldConf(pattern=r"\d{4}-\d{2}-\d{2}", date_format="%Y-%m-%d"),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "datetime_suffix": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)
```

Then, to generate the paths, iterate over the dates:

```python
import datetime as dt

base_path = "/company/reports"
subfolder = "finance"
report_name = "revenue"
extension = "xls"

start_date = dt.date(2025, 12, 1)
report_date_list = [start_date + dt.timedelta(days=d) for d in range(10)]

for report_date in report_date_list:
    creation_datetime = dt.datetime.now()
    report_path = date_archived_pm.get_path(
        base_path=base_path,
        subfolder=subfolder,
        date_path=report_date,
        date_prefix=report_date,
        name=report_name,
        datetime_suffix=creation_datetime,
        ext=extension,
    )
    print(report_path)
```

To parse existing paths, use a library that supports recursive search (e.g. pathlib, glob, os) and outputs a full path for each file:

```python
import pathlib

base_path = "/company/reports"
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]

for file_path in path_list:
    parsed_fields = date_archived_pm.parse_path(file_path)
    print(parsed_fields)
```

### How to define path conventions for a datalake

For example, Data Mesh proposes conventions for separating data into domains, layers and products. This could be implemented with the following PathModel template and fields:

```python
from andar import FieldConf, PathModel, SafePatterns

data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{date}_{product}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product date
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)
```

To improve traceability, it is good practice to also include the run datetime (i.e. the generation date) as a simple versioning system:

```python
from andar import FieldConf, PathModel, SafePatterns

data_mesh_pm = PathModel(
    template="/{domain}/{layer}/{product}/{aggregation}/{product_date}_{product}_{run_datetime}.{ext}",
    fields={
        "domain": FieldConf(pattern=SafePatterns.NAME),  # sales, marketing, HR, finance, etc
        "layer": FieldConf(pattern=SafePatterns.NAME),  # raw, intermediate, mart, etc
        "product": FieldConf(pattern=SafePatterns.NAME),  # orders, revenues, taxes, campaigns, etc
        "aggregation": FieldConf(pattern=SafePatterns.NAME),  # daily, weekly, monthly, etc
        "product_date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),  # product target date
        "run_datetime": FieldConf(pattern=r"\d{8}_\d{6}", datetime_format="%Y%m%d_%H%M%S"),  # generation datetime
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),  # csv, xls, parquet, etc
    },
)
```
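With a run datetime in the file name, picking the most recent version of each product becomes a small grouping step over the parsed fields. The sketch below works on plain dictionaries and uses only the standard library. It assumes that parsed records carry `product`, `product_date` and `run_datetime` keys with datetime values; the records are hand-written for illustration, not actual parser output.

```python
import datetime as dt

# Assumed shape of parsed records; values are illustrative.
parsed_records = [
    {
        "product": "orders",
        "product_date": dt.datetime(2025, 12, 1),
        "run_datetime": dt.datetime(2025, 12, 2, 6, 0, 0),
    },
    {
        "product": "orders",
        "product_date": dt.datetime(2025, 12, 1),
        "run_datetime": dt.datetime(2025, 12, 3, 6, 0, 0),  # later re-run
    },
    {
        "product": "orders",
        "product_date": dt.datetime(2025, 12, 2),
        "run_datetime": dt.datetime(2025, 12, 3, 6, 0, 0),
    },
]


def latest_runs(records):
    """Keep, per (product, product_date), the record with the newest run."""
    latest = {}
    for record in records:
        key = (record["product"], record["product_date"])
        current = latest.get(key)
        if current is None or record["run_datetime"] > current["run_datetime"]:
            latest[key] = record
    return list(latest.values())


selected = latest_runs(parsed_records)
for record in selected:
    print(record["product_date"].date(), record["run_datetime"])
```

Older runs stay on disk untouched, which is what makes the run datetime usable as a simple version history.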

### How to reorganize files and folders in a datalake

In this example we will reorganize a flat file structure into a nested one.

First define the two PathModels, the old one and the new one:

```python
from andar import FieldConf, PathModel, SafePatterns

old_flat_pm = PathModel(
    template="{base_path}/{category}_{name}_{date}.{ext}",
    fields={
        "base_path": FieldConf(pattern=SafePatterns.DIRPATH),
        "category": FieldConf(pattern=SafePatterns.NAME),
        "name": FieldConf(pattern=SafePatterns.FIELD),
        "date": FieldConf(pattern=r"\d{8}", datetime_format="%Y%m%d"),
        "ext": FieldConf(pattern=SafePatterns.EXTENSION),
    },
)

# we can just update the template if the fields are the same
new_nested_pm = old_flat_pm.update(
    template="{base_path}/{category}/{date}/{name}.{ext}"
)
```

Example of creating files in a temporary directory with a flat structure, using the old PathModel:

```python
import pathlib
import tempfile
import datetime as dt

base_path = tempfile.mkdtemp()
start_date = dt.datetime(2025, 12, 1)
date_list = [start_date + dt.timedelta(days=d) for d in range(10)]

for date in date_list:
    file_path = old_flat_pm.get_path(
        base_path=base_path,
        category="sales",
        name="orders",
        date=date,
        ext="csv",
    )
    print(file_path)
    pathlib.Path(file_path).touch()  # create an empty file
```

Example of nesting file paths using parse_path of the old PathModel and get_path of the new PathModel:

```python
# First list existing files in target base path
search_folder = pathlib.Path(base_path)
path_list = [str(i) for i in search_folder.rglob("*") if i.is_file()]

for file_path in path_list:
    parsed_fields = old_flat_pm.parse_path(file_path)
    # As the fields are the same we can reuse them directly
    new_file_path = new_nested_pm.get_path(**parsed_fields)
    # create new parent directories
    pathlib.Path(new_file_path).parent.mkdir(parents=True, exist_ok=True)
    # move old file to new location using the new name
    pathlib.Path(file_path).replace(new_file_path)
```

The same strategy could be adapted to flatten a nested path structure using PathModels.
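Whichever direction the files move, it can be worth checking first that no two old paths map to the same new path, since `pathlib.Path.replace` overwrites an existing target. The following dry-run sketch is not part of Andar; `find_collisions` and `drop_date` are hypothetical helpers, and the mapping function stands in for a parse-then-generate step.

```python
import re
from collections import defaultdict


def find_collisions(old_paths, to_new_path):
    """Return {new_path: [old_paths]} for targets claimed more than once."""
    targets = defaultdict(list)
    for old_path in old_paths:
        targets[to_new_path(old_path)].append(old_path)
    return {new: olds for new, olds in targets.items() if len(olds) > 1}


def drop_date(path):
    """Hypothetical lossy mapping: removes the 8-digit date from the name."""
    return re.sub(r"_\d{8}(?=\.)", "", path)


old_paths = [
    "/lake/sales_orders_20251201.csv",
    "/lake/sales_orders_20251202.csv",
    "/lake/hr_staff_20251201.csv",
]
collisions = find_collisions(old_paths, drop_date)
print(collisions)
```

A new convention that keeps every field of the old one, as in the example above, cannot collide; the check matters when the new template drops or merges fields.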

## Documentation

See the [official documentation](https://fabarca.github.io/andar) to learn more.

## Package name origin

The package name originates from a verse by the Spanish poet Antonio Machado:

> "Caminante, no hay camino, se hace camino al **andar**."
> ("Traveler, there is no path; the path is made by walking.")
>
> Antonio Machado