{"id":23074161,"url":"https://github.com/elcarrillo/structpy","last_synced_at":"2026-04-24T11:34:10.876Z","repository":{"id":265834465,"uuid":"896708451","full_name":"elcarrillo/StructPy","owner":"elcarrillo","description":"StructPy is a Python-based command-line tool designed for academics and scientists to manage data projects effectively. It simplifies workflows by creating structured project directories, generating timestamped filenames, validating datasets, and backing up projects seamlessly.","archived":false,"fork":false,"pushed_at":"2024-12-31T03:21:12.000Z","size":475,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-09T00:47:53.492Z","etag":null,"topics":["command-line-tool","data","database","file-structure","organization","python","science-tool"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/elcarrillo.png","metadata":{"files":{"readme":"README.md","changelog":"changelog.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-01T04:54:51.000Z","updated_at":"2024-12-31T03:36:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"074d8fcb-f124-4e81-9908-410627d851a3","html_url":"https://github.com/elcarrillo/StructPy","commit_stats":null,"previous_names":["elcarrillo/structpy"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcarrillo%2FStructPy","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcarrillo%2FStructPy/tags","relea
ses_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcarrillo%2FStructPy/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/elcarrillo%2FStructPy/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/elcarrillo","download_url":"https://codeload.github.com/elcarrillo/StructPy/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246992488,"owners_count":20865822,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-tool","data","database","file-structure","organization","python","science-tool"],"created_at":"2024-12-16T08:22:00.373Z","updated_at":"2026-04-24T11:34:05.849Z","avatar_url":"https://github.com/elcarrillo.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cp align=\"center\"\u003e\n  \u003cimg src=\"./logo.png\" alt=\"StructPy Logo\" width=\"400\"/\u003e\n\u003c/p\u003e\n\n# **StructPy**\n\nStructPy is a Python-based command-line tool designed for academics and scientists to manage data projects effectively. 
It simplifies workflows by creating structured project directories, generating timestamped filenames, validating datasets, and backing up projects seamlessly.\n\n---\n\n[![Python](https://img.shields.io/badge/python-3.8%2B-blue)](https://python.org)\n[![License](https://img.shields.io/badge/license-MIT-green)](LICENSE)\n[![Last Commit](https://img.shields.io/github/last-commit/elcarrillo/StructPy)](https://github.com/elcarrillo/StructPy)\n[![Open Issues](https://img.shields.io/github/issues/elcarrillo/StructPy)](https://github.com/elcarrillo/StructPy/issues)\n[![Stars](https://img.shields.io/github/stars/elcarrillo/StructPy?style=social)](https://github.com/elcarrillo/StructPy/stargazers)\n[![Code Quality](https://img.shields.io/codefactor/grade/github/elcarrillo/StructPy)](https://www.codefactor.io/repository/github/elcarrillo/StructPy)\n![Build Status](https://github.com/elcarrillo/StructPy/actions/workflows/ci.yml/badge.svg)\n\n## **Features**\n\n1. **Create Project Directory Structures**:\n   - Automatically generate consistent directories and subdirectories based on a YAML configuration file.\n2. 
**Generate Filenames**:\n   - Create unique, timestamped filenames with customizable prefixes and extensions.\n3. **Validate Datasets**:\n   - Check for missing values, duplicate columns, and outliers in CSV files.\n4. **Backup Projects**:\n   - Compress entire project directories into `.zip` archives for easy sharing and safekeeping.\n\n---\n\n## **Installation**\n\n1. Clone the repository:\n   ```bash\n   git clone https://github.com/elcarrillo/StructPy.git\n   cd StructPy\n   ```\n\n2. Install the package in editable mode using `setup.py`:\n   ```bash\n   pip install -e .\n   ```\n\n3. Verify installation:\n   ```bash\n   structpy --help\n   ```\n\n---\n\n## **File Structure**\n\nThe following is the structure of the project:\n\n```\nStructPy/\n├── src/                     # Source code\n│   ├── __init__.py\n│   ├── main.py\n│   ├── core_features.py\n│   ├── validate.py\n│   ├── config_loader.py\n├── tests/                   # Unit tests\n│   ├── __init__.py\n│   ├── conftest.py\n│   ├── test_core_features.py\n├── examples/                # Example configurations/files\n│   └── ...\n├── .gitignore               # Git ignore rules\n├── LICENSE                  # License file\n├── README.md                # Project documentation\n├── requirements.txt         # Project dependencies\n├── pytest.ini               # Pytest configuration\n├── setup.py                 # Installation script\n├── logo.png                 # package logo\n├── config.yaml              # Main configuration file\n├── CHANGELOG.md             # changelog\n\n\n```\n\n---\n\n## **Configuration**\n\nCustomize the tool's behavior using the `config.yaml` file in the project root.\n\n**Example `config.yaml`:**\n```yaml\ndirectories:\n  input: []\n  output:\n    - plots\n    - tables\n  logs: []\n  temp: []\n\nbackup:\n  default_archive_name: \"project_backup\"\n\nfilename:\n  default_prefix: \"datafile\"\n  default_extension: \"csv\"\n\nvalidation:\n  check_missing_values: true\n  
check_duplicate_columns: true\n  missing_value_threshold: 0.1  # Allow up to 10% missing values\n```\n\n---\n\n## **Usage**\n\nTo use the tool via the command-line interface, run the `structpy` command. The syntax follows this format:\n\n```\nstructpy {positional_argument} [--option OPTION_PARAMETER]\n```\n\nFor a complete list of positional arguments and available options, use the `--help` flag:\n\n```\nstructpy --help\n```\n\nHere are some examples of typical workflows to get you started:\n\n### **1. Create Project Directories**\n```bash\nstructpy create_dirs --path ./experiment_001\n```\n\n**Result:**\nThe following directory structure is created at `./experiment_001`:\n```\nexperiment_001/\n├── input/\n├── output/\n│   ├── plots/\n│   ├── tables/\n├── logs/\n├── temp/\n```\n\n---\n\n### **2. Generate Filenames**\n```bash\nstructpy generate_filename --prefix results --ext csv\n```\n\n**Output:**\n```\nGenerated filename: results_20241121-153200.csv\n```\n\nYou can customize the prefix and extension using the command-line arguments or rely on defaults from `config.yaml`.\n\n---\n\n### **3. Validate a Dataset**\n```bash\nstructpy validate --file data.csv\n```\n\n**Example Output:**\n```\nInfo: Missing values detected (5.0%)\nWarning: Duplicate column names found\nWarning: Outliers detected in column temperature\nData validation complete.\n```\n\n---\n\n### **4. Backup a Project**\n```bash\nstructpy backup --path ./experiment_001 --archive experiment_backup\n```\n\n**Result:**\nThe project is compressed into `experiment_backup.zip`.\n\n---\n\n## **Advanced Usage Example**\n\n### **Scenario**: Managing a Collaborative Research Project\n\nIn this scenario, StructPy is used to manage multiple experiments within a single project folder.\n\n1. 
**Initialize the Project:**\n   Create a base directory for the project:\n   ```bash\n   structpy create_dirs --path ./collaborative_project\n   ```\n\n   Resulting structure:\n   ```\n   collaborative_project/\n   ├── input/\n   ├── output/\n   │   ├── plots/\n   │   ├── tables/\n   ├── logs/\n   ├── temp/\n   ```\n\n2. **Run Multiple Experiments:**\n   Create subdirectories for each experiment under the base project directory:\n   ```bash\n   structpy create_dirs --path ./collaborative_project/experiment_01\n   structpy create_dirs --path ./collaborative_project/experiment_02\n   ```\n\n3. **Validate Data for Experiment 1:**\n   ```bash\n   structpy validate --file ./collaborative_project/experiment_01/data.csv\n   ```\n\n   Example output:\n   ```\n   Warning: Missing values detected (2.5%)\n   Info: No duplicate column names found.\n   Warning: Outliers detected in column velocity\n   Data validation complete.\n   ```\n\n4. **Generate Results Filenames:**\n   Generate a unique filename for storing processed results:\n   ```bash\n   structpy generate_filename --prefix exp1_results --ext csv\n   ```\n\n   Output:\n   ```\n   Generated filename: exp1_results_20241121-154500.csv\n   ```\n\n5. **Backup the Project:**\n   Back up the entire collaborative project into a single `.zip` archive:\n   ```bash\n   structpy backup --path ./collaborative_project --archive collaborative_project_backup\n   ```\n\n   Result:\n   ```\n   collaborative_project_backup.zip\n   ```\n\n---\n\n## **Development**\n\n### **Run Tests**\nTo check that the tool works as expected, run the test suite:\n```bash\npytest tests/ -v\n```\n\n### **Contributing**\n1. Fork the repository.\n2. Create a feature branch:\n   ```bash\n   git checkout -b my-feature\n   ```\n3. Commit your changes and push to your branch:\n   ```bash\n   git commit -m \"Add new feature\"\n   git push origin my-feature\n   ```\n4. 
Open a pull request.\n\n---\n\n## **Future Enhancements**\n- Add support for other file formats (e.g., `.json`, `.h5`).\n- Extend validation to support user-defined rules.\n- Integrate with cloud storage solutions for backups.\n\n---\n\n## **License**\nThis project is licensed under the MIT License. See the `LICENSE` file for details.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felcarrillo%2Fstructpy","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Felcarrillo%2Fstructpy","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Felcarrillo%2Fstructpy/lists"}