# CSViper

CSViper is a command-line tool that automates CSV analysis and generates SQL scripts and Python programs to load the data into relational databases. It supports both MySQL and PostgreSQL backends and is designed for scenarios where the database is hosted remotely while the CSV file resides on the local machine.

## Features

- **CSV Analysis**: Automatically detects the CSV format (delimiter, quote character) and analyzes the column structure
- **Column Normalization**: Converts column names to SQL-safe identifiers, with intelligent handling of duplicate names
- **Multi-Database Support**: Generates scripts for both MySQL and PostgreSQL
- **Modular Design**: Four phases that can be run together or one at a time
- **Standalone Scripts**: Generates self-contained Python import scripts for easy deployment
- **Intelligent File Discovery**: The invoker system automatically finds and processes the latest matching data file
- **Pattern-Based Matching**: Uses glob patterns to handle timestamped or versioned file names
- **Full Compilation Workflow**: A single command takes a CSV file from analysis to ready-to-run import scripts
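The CSV analysis and column normalization features can be approximated with the standard library. This is a minimal sketch only: `detect_dialect` and `normalize_columns` are hypothetical names, and the candidate delimiters, the snake_case rule, and the `_2`/`_3` suffixes for duplicates are assumptions. CSViper's own rules in `metadata_extractor.py` and `column_normalizer.py` may differ.

```python
from __future__ import annotations

import csv
import re


def detect_dialect(sample: str) -> tuple[str, str]:
    """Guess (delimiter, quotechar) from a text sample, e.g. the first few KB of a file."""
    # Restricting the candidates keeps the sniffer from picking letters or spaces.
    dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
    return dialect.delimiter, dialect.quotechar


def normalize_columns(headers: list[str]) -> list[str]:
    """Map raw header text to unique, lowercase, SQL-safe identifiers (assumed rules)."""
    used: set[str] = set()
    result = []
    for raw in headers:
        # Collapse every run of characters outside [0-9a-z] into one underscore.
        base = re.sub(r"[^0-9a-z]+", "_", raw.strip().lower()).strip("_") or "column"
        if base[0].isdigit():
            base = "col_" + base  # PostgreSQL identifiers must not start with a digit
        # Duplicates get a numeric suffix: first_name, first_name_2, ...
        candidate, n = base, 1
        while candidate in used:
            n += 1
            candidate = f"{base}_{n}"
        used.add(candidate)
        result.append(candidate)
    return result


if __name__ == "__main__":
    sample = 'name,"City, State",zip\nAlice,"Boston, MA",02101\n'
    print(detect_dialect(sample))
    print(normalize_columns(["First Name", "first-name", "2024 Total", ""]))
```

Passing an explicit `delimiters` string to `csv.Sniffer.sniff` is a deliberate choice: without it, the sniffer may settle on an unusual character when a sample is short or irregular.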
## Installation

### From Source (Development)

1. Clone the repository:

```bash
git clone https://github.com/ftrotter/csviper.git
cd csviper
```

2. Create and activate a virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

4. Install in development mode:

```bash
pip install -e .
```

### From PyPI (Coming Soon)

```bash
pip install csviper
```
## Usage

CSViper operates in four phases that can be run together or separately:

### Phase 1: Extract Metadata

Analyze a CSV file and extract column information:

```bash
python -m csviper extract_metadata --from_csv=data.csv --output_dir=./output/
```

Options:

- `--from_csv`: Path to the CSV file to analyze (required)
- `--output_dir`: Output directory (defaults to a directory named after the CSV file, without its extension)
- `--overwrite_previous`: Overwrite existing output files

### Phase 2: Generate SQL Scripts

Generate CREATE TABLE and data import SQL scripts:

```bash
python -m csviper build_sql --from_metadata_json=output/data.metadata.json --output_dir=./output/
```
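Phase 2 turns the extracted metadata into DDL. As a hedged illustration, the sketch below assumes the metadata reduces to a mapping from normalized column name to the widest value observed (the Phase 1 output is described as containing column width information, but its exact JSON layout is not shown in this README). The function name `create_table_sql`, the all-text column types, and the 255-character cutoff for switching to `TEXT` are assumptions, not CSViper's documented rules.

```python
from __future__ import annotations

# Identifier quoting differs by backend: backticks for MySQL, double quotes for PostgreSQL.
QUOTES = {"mysql": "`", "postgresql": '"'}


def create_table_sql(table: str, widths: dict[str, int], database_type: str) -> str:
    """Build a CREATE TABLE statement with one text column per CSV column.

    `widths` maps each normalized column name to the longest value seen in it.
    """
    if database_type not in QUOTES:
        raise ValueError(f"unsupported database type: {database_type}")
    q = QUOTES[database_type]
    columns = []
    for name, width in widths.items():
        # Hypothetical cutoff: wide columns become TEXT to stay clear of row-size limits.
        col_type = "TEXT" if width > 255 else f"VARCHAR({max(width, 1)})"
        columns.append(f"    {q}{name}{q} {col_type}")
    return f"CREATE TABLE {q}{table}{q} (\n" + ",\n".join(columns) + "\n);"


if __name__ == "__main__":
    print(create_table_sql("sales_data", {"order_id": 8, "notes": 900}, "postgresql"))
```

Storing every column as text is a common choice for a staging table: no value can fail a type conversion during the bulk load, and typed casts can be applied afterward in SQL.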
### Phase 3: Generate Import Scripts

Create standalone Python scripts for data import:

```bash
python -m csviper build_import_script --from_resource_dir=./output/ --output_dir=./output/
```

### Phase 4: Invoke Compiled Scripts (New!)

Execute compiled import scripts with automatic file discovery:

```bash
python -m csviper invoke-compiled-script --run_import_from=./output/ --import_data_from_dir=./data_directory/ --database_type=postgresql
```

Options:

- `--run_import_from`: Directory containing the compiled CSViper scripts and metadata (required)
- `--import_data_from_dir`: Directory to search for data files (required)
- `--database_type`: Database type, either `mysql` or `postgresql` (required)

### Running All Phases Together

`full_compile` runs Phases 1 through 3 in one step, producing ready-to-run import scripts:

```bash
python -m csviper full_compile --from_csv=data.csv --output_dir=./output/ --overwrite_previous
```

## Development Setup

### Prerequisites

- Python 3.8 or higher
- Virtual environment (recommended)

### Setting up the Development Environment

1. Clone the repository and navigate to the project directory
2. Source the virtual environment setup script:

```bash
source source_me_to_get_venv.sh
```

3. Install development dependencies:

```bash
pip install -e ".[dev]"
```

### Running Tests

```bash
pytest
```

### Code Formatting

```bash
black src/
```

### Linting

```bash
flake8 src/
```

## Project Structure

```tree
csviper/
├── src/csviper/
│   ├── __init__.py              # Package initialization
│   ├── __main__.py              # CLI entry point
│   ├── column_normalizer.py     # Column name normalization utilities
│   ├── metadata_extractor.py    # CSV analysis and metadata extraction
│   ├── mysql_generator.py       # MySQL SQL generation (coming soon)
│   ├── postgresql_generator.py  # PostgreSQL SQL generation (coming soon)
│   └── script_generators/       # Python script generation (coming soon)
├── tests/                       # Test files
├── AI_Instructions/             # Development documentation
├── setup.py                     # Package setup configuration
├── requirements.txt             # Project dependencies
└── ReadMe.md                    # This file
```

## Output Files

CSViper generates several files during processing:

### Phase 1 Output

- `{filename}.metadata.json`: Contains the CSV structure analysis, normalized column names, and column width information

### Phase 2 Output

- `{filename}.create_table_mysql.sql`: MySQL CREATE TABLE script
- `{filename}.create_table_postgres.sql`: PostgreSQL CREATE TABLE script
- `{filename}.import_data_mysql.sql`: MySQL data import script
- `{filename}.import_data_postgres.sql`: PostgreSQL data import script

### Phase 3 Output

- `go.mysql.py`: Standalone Python script for MySQL database import
- `go.postgresql.py`: Standalone Python script for PostgreSQL database import
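Because the CSV lives on the local machine and the database is remote, the import step needs a client-side bulk load rather than a server-side file read. MySQL's `LOAD DATA LOCAL INFILE` and PostgreSQL's `COPY ... FROM STDIN` are the standard mechanisms for this. The sketch below shows one plausible shape for the statements in the `import_data_*.sql` files; whether CSViper emits exactly these statements is an assumption, and `mysql_load_sql` / `postgres_copy_sql` are hypothetical names.

```python
from __future__ import annotations


def _mysql_literal(value: str) -> str:
    # MySQL treats backslash as an escape character by default, so escape it first.
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def _pg_literal(value: str) -> str:
    # With standard_conforming_strings on (the default since 9.1), only quotes are doubled.
    return "'" + value.replace("'", "''") + "'"


def mysql_load_sql(table: str, csv_path: str, delimiter: str, quotechar: str) -> str:
    """Client-side bulk load: the MySQL client reads csv_path and streams it to the server."""
    return (
        f"LOAD DATA LOCAL INFILE {_mysql_literal(csv_path)}\n"
        f"INTO TABLE `{table}`\n"
        f"FIELDS TERMINATED BY {_mysql_literal(delimiter)} "
        f"OPTIONALLY ENCLOSED BY {_mysql_literal(quotechar)}\n"
        "IGNORE 1 LINES;"  # skip the header row
    )


def postgres_copy_sql(table: str, delimiter: str, quotechar: str) -> str:
    """COPY ... FROM STDIN: the client connection supplies the file contents."""
    return (
        f'COPY "{table}" FROM STDIN WITH '
        f"(FORMAT csv, HEADER true, DELIMITER {_pg_literal(delimiter)}, "
        f"QUOTE {_pg_literal(quotechar)});"
    )
```

With psycopg2, the PostgreSQL statement would be executed as `cursor.copy_expert(sql, f)` on an open file object. For MySQL, `local_infile` must be enabled on both the server and the client connection, and files with Windows line endings would also need a `LINES TERMINATED BY '\r\n'` clause.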
## Example Workflow

### Option 1: Full Compilation (Recommended)

Run Phases 1 through 3 in a single step:

```bash
python -m csviper full_compile --from_csv=sales_data.csv --output_dir=./sales_data/
```

### Option 2: Step-by-Step Process

1. **Analyze your CSV file**:

```bash
python -m csviper extract_metadata --from_csv=sales_data.csv
```

2. **Review the generated metadata** in `sales_data/sales_data.metadata.json`

3. **Generate SQL scripts**:

```bash
python -m csviper build_sql --from_metadata_json=sales_data/sales_data.metadata.json
```

4. **Create import scripts**:

```bash
python -m csviper build_import_script --from_resource_dir=sales_data/
```

5. **Use the invoker system to import data**:

```bash
python -m csviper invoke-compiled-script --run_import_from=./sales_data/ --import_data_from_dir=./data/ --database_type=postgresql
```

## The Invoker System

The invoker system is CSViper's file discovery and execution engine. It finds the most recently modified data file whose name matches the pattern recorded for your original CSV, then runs the appropriate import script.

### How It Works

1. **File Pattern Matching**: CSViper stores a glob pattern in the metadata file (e.g., `sales_data_*.csv`) that matches files with similar naming conventions
2. **Automatic Discovery**: The invoker searches your data directory for files matching this pattern
3. **Latest File Selection**: It selects the most recently modified matching file
4. **User Confirmation**: It shows you the selected file and asks for confirmation before proceeding
5. **Script Execution**: It runs the appropriate database import script (`go.mysql.py` or `go.postgresql.py`)
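Steps 2 and 3 amount to a glob search followed by picking the newest modification time. Here is a minimal standard-library sketch. It assumes the pattern has already been read from the metadata file; the JSON key that holds it is not documented here, so the pattern is passed in directly. `find_latest_match` is a hypothetical name, and its `recursive` flag mirrors the recursive/non-recursive search mentioned under Benefits.

```python
from __future__ import annotations

import glob
import os


def find_latest_match(data_dir: str, pattern: str, recursive: bool = False) -> str | None:
    """Return the most recently modified file under data_dir matching pattern, or None."""
    if recursive:
        # With recursive=True, "**" matches data_dir itself and every subdirectory.
        search = os.path.join(data_dir, "**", pattern)
    else:
        search = os.path.join(data_dir, pattern)
    # Keep only regular files; a directory could also match the pattern.
    matches = [p for p in glob.glob(search, recursive=recursive) if os.path.isfile(p)]
    if not matches:
        return None
    return max(matches, key=os.path.getmtime)
```

Selection is by modification time, not by any date embedded in the file name. A file that was copied or touched later wins even if its name sorts earlier.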
### Example Invoker Usage

Suppose your data directory contains several data files:

```
data/
├── sales_data_2024-01.csv
├── sales_data_2024-02.csv
├── sales_data_2024-03.csv
└── other_data.csv
```

and the compiled scripts live in:

```
sales_data/
├── sales_data.metadata.json
├── go.mysql.py
├── go.postgresql.py
└── ...
```

Running the invoker:

```bash
python -m csviper invoke-compiled-script --run_import_from=./sales_data/ --import_data_from_dir=./data/ --database_type=postgresql
```

will:

- Find all files matching `sales_data_*.csv`
- Select `sales_data_2024-03.csv` (assuming it is the most recently modified match)
- Ask for your confirmation
- Execute `go.postgresql.py` with the selected file

### Benefits

- **No manual file specification**: Automatically finds the latest data file
- **Pattern-based matching**: Works with timestamped or versioned files
- **Safety confirmation**: Always asks before proceeding
- **Flexible search**: Supports both recursive and non-recursive directory searching
- **Database agnostic**: Works with both MySQL and PostgreSQL scripts
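The confirm-then-execute steps of the invoker, together with the MySQL/PostgreSQL switch, can be sketched as follows. This README does not document the command-line arguments that `go.mysql.py` and `go.postgresql.py` accept, so the sketch passes them through as an opaque `script_args` list instead of guessing. `script_for`, `confirm_and_run`, and the `ask` hook (injected so the prompt can be tested without a terminal) are all hypothetical.

```python
from __future__ import annotations

import os
import subprocess
import sys
from typing import Callable

# Generated script for each --database_type value, as listed under "Phase 3 Output".
SCRIPTS = {"mysql": "go.mysql.py", "postgresql": "go.postgresql.py"}


def script_for(run_import_from: str, database_type: str) -> str:
    """Return the path of the compiled import script for the chosen backend."""
    if database_type not in SCRIPTS:
        raise ValueError(f"--database_type must be one of {sorted(SCRIPTS)}")
    return os.path.join(run_import_from, SCRIPTS[database_type])


def confirm_and_run(
    script_path: str,
    data_file: str,
    script_args: list[str],
    ask: Callable[[str], str] = input,
) -> int | None:
    """Ask before importing; return the script's exit code, or None if declined."""
    reply = ask(f"Import {data_file} using {os.path.basename(script_path)}? [y/N] ")
    if reply.strip().lower() not in ("y", "yes"):
        return None
    # Run with the invoker's own interpreter so the active virtual environment is reused.
    completed = subprocess.run([sys.executable, script_path, *script_args], check=False)
    return completed.returncode
```

Defaulting to "no" on any answer other than `y`/`yes` matches the README's emphasis on always asking before proceeding.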
## Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Add tests for new functionality
5. Run the test suite
6. Submit a pull request

## License

This project is released under the CC0 1.0 Universal public domain dedication; see the LICENSE file for details.

## Roadmap

- [x] Phase 1: CSV metadata extraction and column normalization
- [x] Phase 2: SQL script generation for MySQL and PostgreSQL
- [x] Phase 3: Python import script generation
- [x] Phase 4: Invoker system with automatic file discovery
- [x] Full compilation workflow
- [ ] Enhanced error handling and validation
- [ ] Progress bars for large file processing
- [ ] Configuration file support
- [ ] Additional database backend support
- [ ] Web interface for easier usage
- [ ] Docker containerization

## Support

For questions, issues, or contributions, please visit the [GitHub repository](https://github.com/ftrotter/csviper).