{"id":21482846,"url":"https://github.com/ibmstreams/sample.edge-mnist-notebook","last_synced_at":"2025-07-29T01:35:41.470Z","repository":{"id":74839419,"uuid":"287036208","full_name":"IBMStreams/sample.edge-mnist-notebook","owner":"IBMStreams","description":"MNIST digit recognition notebook sample ","archived":false,"fork":false,"pushed_at":"2020-12-04T23:20:32.000Z","size":6170,"stargazers_count":0,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-06-05T21:02:46.892Z","etag":null,"topics":["edge-computing","samples","stream-processing"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IBMStreams.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-12T14:31:30.000Z","updated_at":"2020-12-04T23:20:34.000Z","dependencies_parsed_at":"2023-07-22T16:30:33.121Z","dependency_job_id":null,"html_url":"https://github.com/IBMStreams/sample.edge-mnist-notebook","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/IBMStreams/sample.edge-mnist-notebook","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fsample.edge-mnist-notebook","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fsample.edge-mnist-notebook/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fsample.edge-mnist-notebook/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/IBMStreams%2Fsample.edge-mnist-notebook/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IBMStreams","download_url":"https://codeload.github.com/IBMStreams/sample.edge-mnist-notebook/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IBMStreams%2Fsample.edge-mnist-notebook/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":267616717,"owners_count":24116162,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-28T02:00:09.689Z","response_time":68,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["edge-computing","samples","stream-processing"],"created_at":"2024-11-23T12:38:15.522Z","updated_at":"2025-07-29T01:35:41.446Z","avatar_url":"https://github.com/IBMStreams.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# sample.edge-mnist-notebook\n\nThis sample demonstrates the use of a Streams Python Notebook, and Edge Analytics in\nCloud Pak for Data, to recognize digit images using a simple scikit-learn ML\nmodel trained with the standard MNIST digit dataset.  
The ML model scores data right at\nthe micro-edge and sends metrics and low-confidence predictions back to an application\nrunning on the CP4D Hub for later analysis.\n\n![Example Digit Predictions](preview.gif)\n\n## Requirements\n\nThis sample requires Cloud Pak for Data (CP4D) and several CP4D services: Streams, Watson Studio,\nand Edge Analytics.  A Streams Instance should be provisioned, and Edge systems should be\navailable. It also requires read/write access to an IBM Event Streams or Kafka topic that is accessible\nto both the CP4D Streams Instance and the Edge systems.  Depending on where the IBM Event\nStreams or Kafka instance is provisioned, the topic may need to be created on IBM Cloud.\n\nPlease see the appropriate documentation links for installing and provisioning each item.  Be sure\nto follow the instructions for your CP4D version.\n\n1. IBM Cloud Pak for Data ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/cpd/install/install.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/cpd/install/install.html))\n2. Edge Analytics beta service on CP4D ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/svc-edge/install.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/svc-edge/install.html))\n3. IBM Streams service on CP4D ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/cpd/svc/streams/install-intro.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/svc-streams/streams-svc-install.html))\n4. Watson Studio service on CP4D ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/wsj/install/install-ws.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/wsj/install/install-ws.html))\n5. 
Streams Instance ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/cpd/svc/streams/provision.html#provision), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/svc-streams/provision.html))\n6. Edge systems ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/svc-edge/admin.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/svc-edge/admin.html))\n7. [IBM Event Streams instance on IBM Cloud](https://ibmstreams.github.io/streamsx.documentation/docs/edgeanalytics/kafka-options#event-streams-in-ibm-cloud)\n   or [other Kafka Options](https://ibmstreams.github.io/streamsx.documentation/docs/edgeanalytics/kafka-options)\n\n## Architectural Overview\n\nThe sample consists of three primary notebooks:\n- `build-edge-application` creates the micro-edge application.\n- `build-metro-application` creates the metro-edge application and submits it to run on the CP4D Hub.\n- `render-metro-views` displays live information from the metro-edge application, which is receiving\n   and aggregating messages from the micro-edge applications.\n\nWhen running on the Edge systems, the **micro-edge application** iterates through a set of test images,\npreparing and scoring them against a digit prediction model.  
It sends aggregated metrics and\nlow-confidence images to a topic in Event Streams, where they are picked up by the **metro-edge application**,\nrunning on the CP4D Hub in a Streams instance, where metrics can be aggregated across multiple micro-edge\napplication instances.\n\nA notebook running in the CP4D Hub can display result data from the metro-edge application (which\nis receiving and aggregating data from the micro-edge applications) in real time,\nincluding dashboards of current digit prediction statistics, uncertain digit prediction images and\nscores, and a mocked-up \"Correction Station\", which could be used to retrain the prediction model\nto improve accuracy.\n\n![Application Architecture](arch.png)\n\n\n## Instructions\n\n### 1. Import the Sample into CP4D as a Project\n\nTo try out the sample, first import it into CP4D as a new Project.\n1. From the Projects interface ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/wsj/getting-started/projects.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/wsj/getting-started/projects.html)), choose \"New Project\".\n2. Import ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/wsj/manage-data/import-project.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/wsj/manage-data/import-project.html)) by choosing \"Create a project from a file\".\n   (Even though you'll be importing from a GitHub repository, you need to use the \"... from a file\" option.)\n3. Select the \"From a Git Repository\" tab.\n4. Enter a Name to identify your project.\n5. Choose a Token if you have already added a GitHub token to CP4D; otherwise, [create a new one](https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token)\n   and add it to CP4D using the \"New Token\" link.\n6. Enter the Repository URL: \u003chttps://github.com/IBMStreams/sample.edge-mnist-notebook.git\u003e\n7. 
Choose the \"main\" branch.\n8. Do _not_ enable on-demand synchronization with this Git repository.\n9. Click the \"Create\" button.\n\nFurther documentation for creating a project and integrating with GitHub is available here ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/wsj/manage-data/git-integration.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/wsj/manage-data/git-integration.html)).\n\n### 2. Build and Deploy Micro-Edge Application\n1. Open the `build-edge-application.jupyter-py36` notebook in CP4D for editing and execution (click the pencil icon to the right\n   of the notebook you want to edit).  In CP4D v3.5, be sure to select Python 3.6.\n2. In the first code cell, be sure the Streams Instance name (`STREAMS_INSTANCE_NAME`) and the Event Streams/Kafka topic\n   (`EVENTSTREAMS_TOPIC`) are set appropriately to match your environment (Requirements 5 and 7, respectively, above).\n   Edit the cell if necessary.\n3. Execute each cell in the notebook.\n   - Be sure to enter your Event Streams/Kafka credentials string in the fourth code cell when prompted.  You should have\n     obtained this string while setting up the Event Streams or Kafka instance (Requirement 7 above).\n4. The last cell submits the build request and waits for the application image to finish building, which might take a while.\n   - After successful completion, the application container image is available in the configured CP4D Docker registry, with\n     the image name `edge-camera-classifier-app:v1`.\n5. After the image is built, package it for deployment ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/svc-edge/usage-register-app.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/svc-edge/usage-register-app.html)),\n   either directly in CP4D or in Edge Application Manager.  
If you wish to change any of the application parameters, do so at this stage\n   (see the documentation for interface details).  The available parameters for this application are:\n   - `parallelism`: By default, each micro-edge application runs only one instance of the ML model, which can limit performance if\n     the input image stream brings in new images faster than the model can score them.  You can add parallel model instances\n     by setting this parameter to a number higher than \"1\", enabling higher image scoring rates.  Be cautious: depending\n     on the size and load of the Edge system you deploy the application to, specifying too many parallel\n     scoring paths here may overload the system.\n   - `confidence`: By default, any image whose highest prediction confidence is less than 0.70 is sent to the metro-edge application\n     for potential manual scoring.  Setting this parameter to another value adjusts that threshold.  The value should be between \"0.00\"\n     (no images are sent to the metro-edge app) and \"1.00\" (all images are sent to the metro-edge app).\n   - `repeat`: By default, the limited set of test images is sent into the application as simulated camera images over and over, forever. You can\n     instead set a repeat count, so that the application stops after sending in the full set of test images that many times. \"0\" repeats forever,\n     \"1\" sends in the full set once, etc.\n   - `delay`: By default, the test images are sent in as fast as the application can handle them.  Setting a delay (in decimal seconds) here artificially\n     slows down the test images by pausing between images for the given `delay`.  
\"0\" sends images as fast as possible, \"0.5\" pauses\n     for half a second between images, etc.\n   - `camera`: When reporting metrics back to the metro-edge application, the micro-edge application includes a camera identifier with its metrics,\n     so that problematic instances can be identified.  This setting controls the camera identifier prefix (the application appends a system-based ID\n     to the end to ensure uniqueness).  By default, the prefix is simply \"Camera\".\n6. Finally, deploy the application to edge systems ([CP4D v3.0](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.0.1/svc-edge/usage-deploy.html), [CP4D v3.5](https://www.ibm.com/support/knowledgecenter/SSQNUZ_3.5.0/svc-edge/usage-deploy.html)).\n7. Optionally, after the application is running on one or more edge systems, you can use the `testing-kafka.jupyter-py36` notebook\n   to directly view the messages the micro-edge application is writing to the Event Streams/Kafka topic, for debugging.\n   - Before running the cells in that notebook, edit the first code cell to set `EVENTSTREAMS_TOPIC` appropriately.\n     Also set `SHOW_IMAGES` to `True` if you wish to see the actual images sent because of low-confidence predictions, along\n     with the possible predictions and scores. If `SHOW_IMAGES` is left at the default `False`, only the aggregated digit\n     prediction and scoring performance metrics are shown.\n   - You'll also need to enter the Event Streams/Kafka credentials string when prompted, as above.\n\n### 3. Build and Submit Metro-Edge Application\n1. Open the `build-metro-application.jupyter-py36` notebook in CP4D for editing and execution.  In CP4D v3.5, be sure to select Python 3.6.\n2. In the first code cell, be sure the Streams Instance name (`STREAMS_INSTANCE_NAME`) and Event Streams/Kafka topic\n   (`EVENTSTREAMS_TOPIC`) are set appropriately to match your environment, as above.  Edit the cell if necessary.\n3. 
Execute each cell in the notebook.\n   - Be sure to enter your Event Streams/Kafka credentials string when prompted, as above.\n4. The last cell submits the build request and waits for the application to finish building.  Once the build finishes, it\n   submits the application as a job in the local CP4D Streams Instance (that is, this application runs on the CP4D Hub,\n   _not_ on an Edge system).\n   - In CP4D v3.0, the running job can be viewed or canceled from the \"Jobs\" tab of the CP4D \"My Instances\" interface. In CP4D v3.5, use the \"Jobs\" tab within the project (click the job name to open the \"Runs\" view, where the current run can be canceled).\n\n### 4. Observe Running System\nOnce both applications are up and running, the micro-edge application sends occasional aggregate performance\nand prediction metrics to the metro-edge application, along with images it had difficulty predicting\n(that is, the prediction confidence was low for all possible digits).  While the metro-edge application could perform\nadditional analytics or actions on those images and metrics across all instances of the micro-edge application,\nthe current metro-edge application simply aggregates them and exposes them as Streams Views so that local notebooks can\nperform interactive analysis of the current behavior.  The `render-metro-views` notebook is an example of this.\n1. Open the `render-metro-views.jupyter-py36` notebook in CP4D for editing and execution.  In CP4D v3.5, be sure to select Python 3.6.\n2. Be sure the Streams Instance name (`STREAMS_INSTANCE_NAME`) is set appropriately to match your environment, as above.\n3. 
Execute the cells in the notebook.\n   -  While the early cells simply set up the Streams View connection queues, the last three sections are more notable\n      and are best executed one at a time, reading each description and interacting with the graphs and images as\n      described in the notebook.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibmstreams%2Fsample.edge-mnist-notebook","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fibmstreams%2Fsample.edge-mnist-notebook","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fibmstreams%2Fsample.edge-mnist-notebook/lists"}