{"id":17793727,"url":"https://github.com/darenasc/auto-fes","last_synced_at":"2025-03-16T20:30:27.316Z","repository":{"id":37202769,"uuid":"249280647","full_name":"darenasc/auto-fes","owner":"darenasc","description":"Automated exploration of files in a folder structure to extract metadata and potential usage of information.","archived":false,"fork":false,"pushed_at":"2024-12-24T10:37:43.000Z","size":692,"stargazers_count":7,"open_issues_count":10,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-16T05:12:12.786Z","etag":null,"topics":["data-exploration","data-profiling","data-science","eda","plain-text","python"],"latest_commit_sha":null,"homepage":"https://darenasc.github.io/auto-fes/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/darenasc.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-03-22T22:01:50.000Z","updated_at":"2025-03-10T19:45:25.000Z","dependencies_parsed_at":"2023-01-25T13:46:03.573Z","dependency_job_id":"567985ea-f230-49a8-af18-b0a510121ad2","html_url":"https://github.com/darenasc/auto-fes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darenasc%2Fauto-fes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darenasc%2Fauto-fes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/darenasc%2Fauto-fes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/darenasc%2Fauto-fes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/darenasc","download_url":"https://codeload.github.com/darenasc/auto-fes/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243928152,"owners_count":20370241,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["data-exploration","data-profiling","data-science","eda","plain-text","python"],"created_at":"2024-10-27T11:12:29.001Z","updated_at":"2025-03-16T20:30:27.310Z","avatar_url":"https://github.com/darenasc.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Automated File Exploration System\n\n![](https://img.shields.io/github/license/darenasc/auto-fes)\n![](https://img.shields.io/github/last-commit/darenasc/auto-fes)\n![](https://img.shields.io/github/stars/darenasc/auto-fes?style=social)\n[![view - Documentation](https://img.shields.io/badge/view-Documentation-526CFE?style=for-the-badge\u0026logo=MaterialForMkDocs\u0026logoColor=white)](https://darenasc.github.io/auto-fes/ \"Go to project documentation\")\n\nAutomated exploration of files containing structured data (`csv`, `txt`, \n`Excel`) in a folder structure to extract metadata and potential usage of \ninformation.\n\nIf you have a bunch of structured data in plain files, this library is for you.\n\n# Installation\n\n```bash\npip install -q git+https://github.com/darenasc/auto-fes.git\npip install -q ydata_profiling sweetviz # to make profiling tools work\n```\n\n## How to use it\n\n### Command 
line\n\n```bash\nafes --help\n\nafes explore --help\nafes explore \u003cPATH_TO_FILES_TO_EXPLORE\u003e\n\nafes generate --help\nafes generate \u003cPATH_TO_FILES_TO_EXPLORE\u003e # or\nafes generate \u003cPATH_TO_FILES_TO_EXPLORE\u003e \u003cOUTPUT_FILE_WITH_CODE\u003e\n\nafes profile --help\nafes profile \u003cPATH_TO_FILES_TO_EXPLORE\u003e # or\nafes profile \u003cPATH_TO_FILES_TO_EXPLORE\u003e \u003cOUTPUTS_PATH_FOR_REPORTS\u003e # or\nafes profile \u003cPATH_TO_FILES_TO_EXPLORE\u003e \u003cOUTPUTS_PATH_FOR_REPORTS\u003e \u003cPROFILE_TOOL\u003e # 'ydata-profiling' or 'sweetviz'\n```\n\n### Python scripts and notebooks\n```python\nfrom afes import afe\n\n# Path to folder with files to be explored\nTARGET_FOLDER = \"\u003cPATH_TO_FILES_TO_EXPLORE\u003e\"\nOUTPUT_FOLDER = \"\u003cPATH_TO_OUTPUTS\u003e\"\n\n# Run exploration on the files\ndf_files = afe.explore_files(TARGET_FOLDER)\n\n# Generate pandas code to load the files\nafe.generate_code(df_files)\n\n# Run profiling on each file\nafe.profile_files(df_files, profile_tool=\"ydata-profiling\", output_path=OUTPUT_FOLDER)\nafe.profile_files(df_files, profile_tool=\"sweetviz\", output_path=OUTPUT_FOLDER)\n```\n\n# What can you do with AFES\n\n* Explore\n* Generate code\n* Profile\n  \n```mermaid\nflowchart LR\n    Explore --\u003e Generate\n    Explore --\u003e Profile\n    Generate --\u003e PandasCode\n    Profile --\u003e ydata-profile@{ shape: doc }\n    Profile --\u003e sweetviz@{ shape: doc }\n```\n\n## Explore\n\n```python\nfrom afes import afe\n\n# Path to folder with files to be explored\nTARGET_FOLDER = \"\u003cPATH_TO_FILES_TO_EXPLORE\u003e\"\n\n# Run exploration on the files\ndf_files = afe.explore_files(TARGET_FOLDER)\ndf_files\n```\n\nThe `df_files` dataframe will look like the following table, depending on the \nfiles you plan to explore.\n\n```\n|      | path                                              | name                     | extension |    size | human_readable |  rows | separator |\n| ---: | 
:------------------------------------------------ | :----------------------- | :-------- | ------: | :------------- | ----: | :-------- |\n|    1 | /content/sample_data/auto_mpg.csv                 | auto_mpg                 | .csv      |   20854 | 20.4 KiB       |   399 | comma     |\n|    2 | /content/sample_data/car_evaluation.csv           | car_evaluation           | .csv      |   51916 | 50.7 KiB       |  1729 | comma     |\n|    3 | /content/sample_data/iris.csv                     | iris                     | .csv      |    4606 | 4.5 KiB        |   151 | comma     |\n|    4 | /content/sample_data/wine_quality.csv             | wine_quality             | .csv      |  414831 | 405.1 KiB      |  6498 | comma     |\n|    5 | /content/sample_data/california_housing_test.csv  | california_housing_test  | .csv      |  301141 | 294.1 KiB      |  3001 | comma     |\n|    6 | /content/sample_data/california_housing_train.csv | california_housing_train | .csv      | 1706430 | 1.6 MiB        | 17001 | comma     |\n```\n\nCheck out the [example.py](src/example.py) file and run it from a terminal \nwith Python, or use the Jupyter \n[notebook](src/notebook-example.ipynb).\n\n## Generate code\n\nUsing the dataframe `df_files` generated in the explore phase, you can generate \nworking Python pandas code to be used. 
\n\nThe function `generate_code()` will generate Python code to load the files using \n`pandas`.\n\n```python\nfrom afes import afe\n\n# Path to folder with files to be explored\nTARGET_FOLDER = \"\u003cPATH_TO_FILES_TO_EXPLORE\u003e\"\nOUTPUT_FOLDER = \"\u003cPATH_TO_OUTPUTS\u003e\"\n\ndf_files = afe.explore_files(TARGET_FOLDER)\nafe.generate_code(df_files)\n```\n\nThe generated code will look like this:\n\n```bash\n### Start of the code ###\nimport pandas as pd\n\ndf_auto_mpg = pd.read_csv('/content/sample_data/auto_mpg.csv', sep = ',')\ndf_car_evaluation = pd.read_csv('/content/sample_data/car_evaluation.csv', sep = ',')\ndf_iris = pd.read_csv('/content/sample_data/iris.csv', sep = ',')\ndf_wine_quality = pd.read_csv('/content/sample_data/wine_quality.csv', sep = ',')\ndf_california_housing_test = pd.read_csv('/content/sample_data/california_housing_test.csv', sep = ',')\ndf_california_housing_train = pd.read_csv('/content/sample_data/california_housing_train.csv', sep = ',')\n\n### End of the code ###\n\n\"code.txt\" has the generated Python code to load the files.\n```\n\nBy default, the code is printed to standard output and also written to the \n`./code.txt` file.\n\n\u003e Note: you can replace the `.txt` extension with `.py` to make it a working \n\u003e Python script.\n\n## Profile\n\nUsing the dataframe `df_files` generated in the explore phase, the function \n`profile_files(df_files)` will automatically load and profile the files using \n[ydata-profiling](https://github.com/ydataai/ydata-profiling) or \n[sweetviz](https://github.com/fbdesignpro/sweetviz).\n\n```python\nfrom afes import afe\n\n# Path to folder with files to be explored\nTARGET_FOLDER = \"\u003cPATH_TO_FILES_TO_EXPLORE\u003e\"\nOUTPUT_FOLDER = \"\u003cPATH_TO_OUTPUTS\u003e\"\n\n# Run exploration on the files\ndf_files = afe.explore_files(TARGET_FOLDER)\n\nafe.profile_files(df_files, profile_tool=\"ydata-profiling\", output_path=OUTPUT_FOLDER) # or\nafe.profile_files(df_files, 
profile_tool=\"sweetviz\", output_path=OUTPUT_FOLDER)\n```\n\nBy default, the files are processed with `ydata-profiling` in order of size, \nstarting with the smallest. The reports are exported in HTML format and stored \nin the directory where the code is running, or in a given directory if you pass \nthe `output_path='\u003cYOUR_OUTPUT_PATH\u003e'` argument.\n\n# Contributing\n\n* Open an [issue](https://github.com/darenasc/auto-fes/issues) to request more \n  functionality or give feedback.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarenasc%2Fauto-fes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdarenasc%2Fauto-fes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdarenasc%2Fauto-fes/lists"}