{"id":30523994,"url":"https://github.com/lightonai/opu-benchmarks","last_synced_at":"2025-08-26T20:52:23.330Z","repository":{"id":37638634,"uuid":"246346561","full_name":"lightonai/opu-benchmarks","owner":"lightonai","description":"ML benchmarks performance featuring LightOn's Optical Processing Unit (OPU) vs CPU and GPU.","archived":false,"fork":false,"pushed_at":"2022-11-22T04:27:57.000Z","size":732,"stargazers_count":21,"open_issues_count":2,"forks_count":0,"subscribers_count":7,"default_branch":"master","last_synced_at":"2023-03-04T05:22:23.981Z","etag":null,"topics":["hardware-acceleration","machine-learning","photonic-computing"],"latest_commit_sha":null,"homepage":"https://medium.com/@LightOnIO/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lightonai.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-03-10T16:04:06.000Z","updated_at":"2022-01-07T04:09:27.000Z","dependencies_parsed_at":"2023-01-21T12:47:00.231Z","dependency_job_id":null,"html_url":"https://github.com/lightonai/opu-benchmarks","commit_stats":null,"previous_names":[],"tags_count":null,"template":null,"template_full_name":null,"purl":"pkg:github/lightonai/opu-benchmarks","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fopu-benchmarks","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fopu-benchmarks/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fopu-benchmarks/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fopu-benchmarks/manifests","owner_url":"https://repos.ecosyste.m
s/api/v1/hosts/GitHub/owners/lightonai","download_url":"https://codeload.github.com/lightonai/opu-benchmarks/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lightonai%2Fopu-benchmarks/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":272254541,"owners_count":24901064,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-26T02:00:07.904Z","response_time":60,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["hardware-acceleration","machine-learning","photonic-computing"],"created_at":"2025-08-26T20:52:22.671Z","updated_at":"2025-08-26T20:52:23.320Z","avatar_url":"https://github.com/lightonai.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# OPU Benchmarks\n\nThis repository contains three different benchmarks to compare the performance of CPU and GPU with LightOn's Optical Processing Unit (OPU).\n\n## Access to Optical Processing Units\n\nTo request access to LightOn Cloud and try our photonic co-processor, please visit: https://cloud.lighton.ai/\n\nFor researchers, we also have a LightOn Cloud for Research program, please visit https://cloud.lighton.ai/lighton-research/ for more information.\n\n## Installation\n\nWe advise creating a `virtualenv` before running these commands. You can create one with `python3 -m venv \u003cvenv_name\u003e`. 
\nActivate it with `source \u003cpath_to_venv\u003e/bin/activate` before proceeding. We used `python 3.5` and `pytorch 1.2` \nfor all the simulations.\n\n- Clone the repository, then run `pip install \u003cpath_to_repo\u003e`. \n\n## Transfer learning on images\n\nThe standard transfer learning procedure with a CNN usually involves choosing a neural network architecture pre-trained on ImageNet (e.g. a VGG, ResNet or DenseNet), and then fine-tuning its weights on the dataset of choice.\n\nThe fine-tuning is typically carried out with the backpropagation algorithm on one or more GPUs, whereas the inference might be carried out on either a CPU or GPU, since the inference phase can be optimized in many ways for faster execution.\n\nThe pipeline involving the OPU is the following:\n\n- Compute the convolutional features by processing the training set through a CNN.\n- Encode the convolutional features in a binary format. We use an encoding scheme based on the sign: +1 if an element is positive, 0 otherwise.\n- Project the encoded convolutional features into a lower-dimensional space with the OPU.\n- Fit a linear classifier to the random features of the training set. We chose the Ridge Classifier because of its fast implementation in scikit-learn.\n\nIn the inference phase we repeat these steps, except for the fitting of the linear classifier, which is instead used to get the class predictions.\n\nThe advantage of this algorithm is that it does not require the computation of gradients in the training phase, and the training itself requires just one pass over the training set. The bottleneck is the fit of the classifier, but its running time can be reduced by projecting to a lower-dimensional space.\n\n### Run it\n\nDownload the dataset from the [Kaggle page](https://www.kaggle.com/alessiocorrado99/animals10). 
The dataset should be in the root folder of this repository, but all scripts have an option to change the path with `-dataset_path`.\n\nUse the script `OPU_training.py` in the `scripts/images` folder. An example call:\n\n```\npython3 OPU_training.py resnet50 Saturn -model_options noavgpool -model_dtype float32 -n_components 2 -device cuda:0 -dataset_path ~/datasets/ -save_path data/\n```\n\nThe script implements the following pipeline:\n\n- Extract the convolutional features of the dataset with a ResNet50 model (without the avgpool at the end). The features \nare extracted in the dtype specified by `model_dtype` (either `float32` or `float16`) with PyTorch on the specified \n`device`: `cuda:n` selects the GPU #n, whereas `cpu` uses the CPU.\n- Encode the data, project it to a space that is half the original size with the OPU (`n_components=2`) and decode \nthe outcome.\n- Fit a Ridge Classifier to the random features matrix of the training dataset. \n\nThe steps are the same for the inference phase, except that the ridge fit is replaced by the evaluation on the \nmatrix of random features of the test set.\n\nTwo arguments, `dataset_path` and `save_path`, let you specify the path to the dataset and\nthe save folder, respectively.\n\nFor the backpropagation training, you can use the `backprop_training.py` script in the same folder.\n \n```\npython3 backprop_training.py $model $dataset Adam $OPU $n_epochs -dataset_path=$dataset_path -save_path=$save_path\n```\n\nTo iterate over multiple models, you can use the `cnn2d.sh` script in the `bash` folder. \nCall `chmod +x cnn2d.sh` and execute with `./cnn2d.sh`. 
There are instructions inside for picking the models for the simulations.\n\n## Graph simulation\n\nOur data consists of a time-evolving graph, and we want to detect changes in its structure, such as an abnormal increase or decrease in the number of connections between nodes or the formation of one or more tightly connected communities (*cliques*). \n\nWe propose to use NEWMA, presented in [this paper](https://ieeexplore.ieee.org/document/9078835). It involves computing two exponentially weighted moving averages (EWMAs) with different forgetting factors. These track a certain function of the adjacency matrix, and a possible change point is flagged whenever the difference between the two EWMAs crosses a threshold.\n\nWith this method, we can detect the formation of a clique in the graph and discriminate it from a simple increase in the number of edges. Once the clique has been detected, we can diagonalize the matrix to recover the eigenvector associated with the second-largest eigenvalue and, from it, the members of the clique.\n\n### Run it\n\nIn the `bash` folder there is a script called `graphs.sh`. Launch it to run the same simulation with \nboth the OPU and GPU for graphs of different sizes.\n \n## Transfer Learning on videos\n\nWe propose to apply the pipeline employed for images to videos. \nTraining on videos can be performed in many different ways. In this document we focus on the method proposed in [Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset](https://arxiv.org/abs/1705.07750), which reports state-of-the-art results on the video action recognition task.\n\n### Run it\n\n#### Datasets and model\nThe training pipeline was developed starting from this paper:  \n[Quo Vadis, Action Recognition? 
A New Model and the Kinetics Dataset](https://arxiv.org/abs/1705.07750).\n\nThe most popular datasets for the action recognition task are:\n\n- [HMDB51](http://serre-lab.clps.brown.edu/resource/hmdb-a-large-human-motion-database/) \n- [UCF101](https://www.crcv.ucf.edu/research/data-sets/ucf101/)\n\nThe videos are in `.avi` format. To use the flow/frames of the videos you can either extract them yourself with `ffmpeg`, or \ndownload them from [here](https://github.com/feichtenhofer/twostreamfusion). We opted to download the pre-extracted flow/frames for better reproducibility of the results.\n \nObtain the archives and extract them, then rename the folder to `frames` or `flow`, depending on the stream you picked. \n\nDownload the three splits for [HMDB51](http://serre-lab.clps.brown.edu/wp-content/uploads/2013/10/test_train_splits.rar) \nand for [UCF101](https://www.crcv.ucf.edu/wp-content/uploads/2019/03/UCF101TrainTestSplits-RecognitionTask.zip) to get the annotation files.\nRename the resulting folder `annotations`.\n\nThe final structure for both datasets should look like this:\n```\ndataset_folder\n|----frames\n|----flow\n|----annotations\n```\nFinally, download the pretrained rgb/flow models from this [git repository](https://github.com/AlxDel/i3d_crf/tree/master/models).   \n   \n#### Simulations\n\nWe recommend using the bash script `cnn3d_i3d.sh` in the `bash` folder. \n - Go to the `bash` folder and open the `cnn3d_i3d.sh` script;\n - Set the `OPU`/`backprop` flag to `true` depending on which simulation you want to launch;\n - Edit the parameters at the top to match the paths to the dataset, script and save folder, along with anything else you might want to change;\n - Make it executable with `chmod +x cnn3d_i3d.sh` and run with `./cnn3d_i3d.sh`.\n \n## Finetuning with RAY\n\nThe method used with the OPU has far fewer hyperparameters, and it is a quick, easy way to get good performance. 
To give an idea of how long it can take to tune the hyperparameters of gradient-based optimization, the following script performs a hyperparameter search using `ray[tune]`.\n\n```\npython3 i3d_backprop_tune.py rgb hmdb51 -pretrained_path_rgb /home/ubuntu/opu-benchmarks/pretrained_weights/i3d_rgb_imagenet_kin.pt -dataset_path /home/ubuntu/datasets_video/HMDB51/ -save_path /home/ubuntu/opu-benchmarks/data/\n```\n\n## Hardware specifics\n\nAll the simulations have been run on a Tesla P100 GPU with 16GB memory and an Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz with 12 cores. \nFor the int8 simulations we use an RTX 2080 with 12GB memory.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flightonai%2Fopu-benchmarks","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flightonai%2Fopu-benchmarks","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flightonai%2Fopu-benchmarks/lists"}