{"id":21221709,"url":"https://github.com/glambard/threadsafe_generator_for_keras","last_synced_at":"2026-04-12T18:51:05.430Z","repository":{"id":202193843,"uuid":"139675236","full_name":"GLambard/threadsafe_generator_for_keras","owner":"GLambard","description":"An ultimate thread safe data generation for Keras","archived":false,"fork":false,"pushed_at":"2018-07-04T07:26:14.000Z","size":4,"stargazers_count":4,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-01-21T17:12:40.069Z","etag":null,"topics":["generator","keras","python","tensorflow","tensorflow-gpu","threadsafe"],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GLambard.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-07-04T06:00:50.000Z","updated_at":"2023-04-25T05:43:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"8abb4f01-3a80-442b-bf50-623b10e2db86","html_url":"https://github.com/GLambard/threadsafe_generator_for_keras","commit_stats":null,"previous_names":["glambard/threadsafe_generator_for_keras"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GLambard%2Fthreadsafe_generator_for_keras","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GLambard%2Fthreadsafe_generator_for_keras/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GLambard%2Fthreadsafe_generator_for_keras/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GLambard%2Fthreadsafe_generator
_for_keras/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GLambard","download_url":"https://codeload.github.com/GLambard/threadsafe_generator_for_keras/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243668234,"owners_count":20328042,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["generator","keras","python","tensorflow","tensorflow-gpu","threadsafe"],"created_at":"2024-11-20T22:31:56.438Z","updated_at":"2025-12-29T18:17:42.548Z","avatar_url":"https://github.com/GLambard.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"# An \"ultimate\" thread safe data generator for Keras :zap::zap::zap: :satisfied:\n\nIf you use [Keras](https://keras.io) for training a neural network's model, you may use ```fit()``` or ```fit_generator()``` \nfunctions (see [here](https://keras.io/models/sequential/#sequential-model-methods) for details). The latter is particularly \nuseful when dealing with big datasets but you usually need to define a data generator which fits your needs. \n\n## Why a thread safe data generator\n1. You want to use more than 1 worker (CPU-thread)\n2. **You don't want your training data to be read more than once per epoch!**\n\n## But how?\n[keras.utils.Sequence()](https://keras.io/utils/#sequence) is your new friend!\n\n## System\n- python 3.6\n- keras 2.1.6\n- tensorflow(-gpu) 1.8.0\n\n## Usage\n```\nfrom keras.utils import Sequence\n''' add necessary libraries from keras and others to define your model, e.g. 
\nfrom keras.models import Sequential\n...\n'''\n\nimport math\nimport multiprocessing\n\nimport numpy as np\n\n# model definition and compilation\nmodel = Sequential()\nmodel.add(...)\n\nmodel.compile(...)\n\nclass data_generator(Sequence):\n    # A Sequence serves each batch index once per epoch, even with several workers\n\n    def __init__(self, x_set, y_set, batch_size):\n        self.x, self.y = x_set, y_set\n        self.batch_size = batch_size\n\n    def __len__(self):\n        # number of batches per epoch (the last batch may be smaller)\n        return int(np.ceil(len(self.x) / float(self.batch_size)))\n\n    def __getitem__(self, idx):\n        # slice out batch number idx\n        batch_x = self.x[idx * self.batch_size:(idx + 1) * self.batch_size]\n        batch_y = self.y[idx * self.batch_size:(idx + 1) * self.batch_size]\n\n        return np.array(batch_x), np.array(batch_y)\n\nbatch_size = 100\nhistory = model.fit_generator(\n    generator = data_generator(x_train_gen, y_train_gen, batch_size),\n    steps_per_epoch = math.ceil(x_train_gen.shape[0]/batch_size),  # matches __len__()\n    epochs = 100,\n    validation_data = data_generator(x_valid_gen, y_valid_gen, batch_size),\n    validation_steps = math.ceil(x_valid_gen.shape[0]/batch_size),  # matches __len__()\n    max_queue_size = 10,\n    workers = multiprocessing.cpu_count(),\n    use_multiprocessing = True,\n    shuffle = True,\n    initial_epoch = 0,\n    # ... any further fit_generator() arguments\n)\n```\nHere, x_set and y_set are NumPy arrays. You can also apply any transformation you like to the data inside \n```__getitem__()``` (see [here](https://keras.io/utils/#sequence) for another example with images).\n\n# Source \n\n[keras.utils.Sequence()](https://keras.io/utils/#sequence)\n\n# I hope it helps! :smiley:\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglambard%2Fthreadsafe_generator_for_keras","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fglambard%2Fthreadsafe_generator_for_keras","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fglambard%2Fthreadsafe_generator_for_keras/lists"}