{"id":16162428,"url":"https://github.com/sunsided/tensorflow-scaffold","last_synced_at":"2025-07-26T10:15:13.702Z","repository":{"id":141993002,"uuid":"133805446","full_name":"sunsided/tensorflow-scaffold","owner":"sunsided","description":"An attempt on creating a best practices TensorFlow project.","archived":false,"fork":false,"pushed_at":"2018-06-17T20:51:26.000Z","size":45592,"stargazers_count":4,"open_issues_count":0,"forks_count":1,"subscribers_count":3,"default_branch":"develop","last_synced_at":"2025-07-02T23:40:02.866Z","etag":null,"topics":["artificial-intelligence","python","tensorflow","work-in-progress"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/sunsided.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-05-17T11:44:44.000Z","updated_at":"2023-06-17T14:18:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"43c58bc5-a9a5-4f61-99cd-2a6d88caa884","html_url":"https://github.com/sunsided/tensorflow-scaffold","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/sunsided/tensorflow-scaffold","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Ftensorflow-scaffold","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Ftensorflow-scaffold/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Ftensorflow-scaffold/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/su
nsided%2Ftensorflow-scaffold/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/sunsided","download_url":"https://codeload.github.com/sunsided/tensorflow-scaffold/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/sunsided%2Ftensorflow-scaffold/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267149769,"owners_count":24043461,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-26T02:00:08.937Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","python","tensorflow","work-in-progress"],"created_at":"2024-10-10T02:30:06.367Z","updated_at":"2025-07-26T10:15:13.684Z","avatar_url":"https://github.com/sunsided.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# TensorFlow Project Scaffold\n\nThis project is meant to provide a starting point for new\nTensorFlow projects. 
It showcases\n\n- [`tf.estimator.Estimator`]-based training using custom\n  `input_fn` and `model_fn` functions, using\n  standard [`tf.estimator.EstimatorSpec`] definitions.\n  - Image files are read using [`tf.gfile.FastGFile`] for source-agnostic, lock-free file loading.\n  - JPEGs are decoded efficiently using [`tf.image.decode_and_crop_jpeg`].\n- Usage of pretrained models using [`tensorflow_hub.Module`].\n- [`tf.data.Dataset`] with `.list_files()` and `.from_generator()`\n   examples.\n  - Interleaved `TFRecord` input streams using [`tf.data.TFRecordDataset`] and \n    [`tf.contrib.data.parallel_interleave`].\n  - GPU prefetching using [`tf.contrib.data.prefetch_to_device`].\n- Automatic snapshotting of parameters with the best\n  validation loss into a separate directory using a custom [`SessionRunHook`].\n\n[`tf.estimator.Estimator`]: https://www.tensorflow.org/api_docs/python/tf/estimator/Estimator\n[`tf.estimator.EstimatorSpec`]: https://www.tensorflow.org/api_docs/python/tf/estimator/EstimatorSpec\n[`tf.gfile.FastGFile`]: https://www.tensorflow.org/api_docs/python/tf/gfile/FastGFile\n[`tf.image.decode_and_crop_jpeg`]: https://www.tensorflow.org/api_docs/python/tf/image/decode_and_crop_jpeg\n[`tensorflow_hub.Module`]: https://www.tensorflow.org/hub/\n[`tf.data.TFRecordDataset`]: https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset\n[`tf.data.Dataset`]: https://www.tensorflow.org/api_docs/python/tf/data/Dataset\n[`tf.contrib.data.parallel_interleave`]: https://www.tensorflow.org/api_docs/python/tf/contrib/data/parallel_interleave\n[`tf.contrib.data.prefetch_to_device`]: https://www.tensorflow.org/api_docs/python/tf/contrib/data/prefetch_to_device\n[`SessionRunHook`]: https://www.tensorflow.org/api_docs/python/tf/train/SessionRunHook\n\nInspirations and sources:\n\n- [Importing Data](https://www.tensorflow.org/programmers_guide/datasets)\n- [Input Pipeline Performance 
Guide](https://www.tensorflow.org/versions/master/performance/datasets_performance)\n- [Preparing a large-scale image dataset with TensorFlow's TFRecord files](https://kwotsin.github.io/tech/2017/01/29/tfrecords.html)\n- [Getting Text into Tensorflow with the Dataset API](https://medium.com/@TalPerry/getting-text-into-tensorflow-with-the-dataset-api-ffb832c8bec6)\n- [How to write into and read from a TFRecords file in TensorFlow](http://www.machinelearninguru.com/deep_learning/tensorflow/basics/tfrecord/tfrecord.html)\n- [Use HParams and YAML to Better Manage Hyperparameters in Tensorflow](https://hanxiao.github.io/2017/12/21/Use-HParams-and-YAML-to-Better-Manage-Hyperparameters-in-Tensorflow/)\n- [generator-tf](https://github.com/jrabary/generator-tf/)\n\n## Structure of the project\n\n- `project`: project modules such as networks, input pipelines, etc.\n- `library`: scripts and boilerplate code\n\nTwo configuration files exist:\n\n- `project.yaml`: Serialized command-line options\n- `hyperparameters.yaml`: Model hyperparameters\n\nHere's an example `hyperparameters.yaml`, with a default hyperparameter\nset (conveniently called `default`), and an additional set named `mobilenet`.\nThe `mobilenet` set inherits from `default` and overrides\nonly the parameters it redefines.\n\n```yaml\ndefault: \u0026DEFAULT\n  # batch_size: 100\n  # num_epoch: 1000\n  # optimizer: Adam\n  learning_rate: 1e-4\n  dropout_rate: 0.5\n  l2_regularization: 1e-8\n  xentropy_label_smoothing: 0.\n  adam_beta1: 0.9\n  adam_beta2: 0.999\n  adam_epsilon: 1e-8\n\nmobilenet:\n  \u003c\u003c: *DEFAULT\n  learning_rate: 1e-5\n  fine_tuning: True\n```\n\nLikewise, the `project.yaml` contains serialized command-line\nparameters:\n\n```yaml\ndefault: \u0026DEFAULT\n  train_batch_size: 32\n  train_epochs: 1000\n  epochs_between_evals: 100\n  hyperparameter_file: hyperparameters.yaml\n  hyperparameter_set: default\n  model: latest\n  model_dir: 
out/current/checkpoints\n  best_model_dir: out/current/best\n\ngtx1080ti:\n  \u003c\u003c: *DEFAULT\n  train_batch_size: 512\n\nthinkpadx201t:\n  \u003c\u003c: *DEFAULT\n  train_batch_size: 10\n  train_epochs: 10\n  epochs_between_evals: 1\n  random_seed: 0\n```\n\nSelecting a configuration set at startup with the `--config_set` command-line\noption makes it easy to store and version the best configurations.\nOptions provided on the command line override values defined\nin `project.yaml`, allowing for quick iteration.\n\n## Run training\n\nTo run a training session while manually overriding configuration\nfrom `project.yaml`, try\n\n```bash\npython run.py \\\n    --xla \\\n    --epochs_between_evals 1000 \\\n    --train_epochs 10000 \\\n    --learning_rate 0.0001\n```\n\n## Prepare the dataset\n\nTo improve processing speed later on, the image files are\nconverted to `TFRecord` format first. For this, run\n\n```bash\npython convert_dataset.py \\\n    --dataset_dir dataset/train \\\n    --tfrecord_filename train \\\n    --tfrecord_dir dataset/train \\\n    --max_edge 384\npython convert_dataset.py \\\n    --dataset_dir dataset/test \\\n    --tfrecord_filename test \\\n    --tfrecord_dir dataset/test \\\n    --max_edge 384\n```\n\nThis example stores image data as JPEG-encoded bytes and decodes\nthem on the fly in the input pipeline. While this leads to much smaller\nTFRecord files compared to storing raw pixel values, it also adds\nnoticeable decoding latency. 
There's a tradeoff here.\n\n## TensorFlow Hub\n\nIn order to use [TensorFlow Hub](https://github.com/tensorflow/hub), install it using e.g.\n\n```bash\npip install tensorflow-hub\n```\n\nWhen initializing a Conda environment from `environment.yaml`, this is\nalready taken care of.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunsided%2Ftensorflow-scaffold","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsunsided%2Ftensorflow-scaffold","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsunsided%2Ftensorflow-scaffold/lists"}