{"id":15222309,"url":"https://github.com/googlecloudplatform/dataflow-video-analytics","last_synced_at":"2026-03-02T07:33:24.737Z","repository":{"id":55961158,"uuid":"271614177","full_name":"GoogleCloudPlatform/dataflow-video-analytics","owner":"GoogleCloudPlatform","description":"Video Clip Analytics By Using Dataflow and Video AI For Object Tracking","archived":false,"fork":false,"pushed_at":"2024-05-01T14:47:57.000Z","size":1300,"stargazers_count":36,"open_issues_count":7,"forks_count":30,"subscribers_count":32,"default_branch":"master","last_synced_at":"2025-04-19T22:18:17.938Z","etag":null,"topics":["analytics","dataflow","mlapi","streaming","video","videoai"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/GoogleCloudPlatform.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-06-11T18:03:13.000Z","updated_at":"2024-12-30T22:25:10.000Z","dependencies_parsed_at":"2024-09-28T15:11:33.946Z","dependency_job_id":"b09f8100-b54f-4cf4-885a-ec5e2bcad315","html_url":"https://github.com/GoogleCloudPlatform/dataflow-video-analytics","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/GoogleCloudPlatform/dataflow-video-analytics","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdataflow-video-analytics","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdataflow-video-analytics/tags","releases_url":"https://repos.ec
osyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdataflow-video-analytics/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdataflow-video-analytics/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/GoogleCloudPlatform","download_url":"https://codeload.github.com/GoogleCloudPlatform/dataflow-video-analytics/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/GoogleCloudPlatform%2Fdataflow-video-analytics/sbom","scorecard":{"id":57964,"data":{"date":"2025-08-11","repo":{"name":"github.com/GoogleCloudPlatform/dataflow-video-analytics","commit":"e457fd8b7c6b7a89b09ef560eb62e13dffb7aee5"},"scorecard":{"version":"v5.2.1-40-gf6ed084d","commit":"f6ed084d17c9236477efd66e5b258b9d4cc7b389"},"score":3.6,"checks":[{"name":"Binary-Artifacts","score":10,"reason":"no binaries found in the repo","details":null,"documentation":{"short":"Determines if the project has generated executable (binary) artifacts in the source repository.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#binary-artifacts"}},{"name":"Token-Permissions","score":-1,"reason":"No tokens found","details":null,"documentation":{"short":"Determines if the project's workflows follow the principle of least privilege.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#token-permissions"}},{"name":"Code-Review","score":1,"reason":"Found 3/17 approved changesets -- score normalized to 1","details":null,"documentation":{"short":"Determines if the project requires human code review before pull requests (aka merge requests) are merged.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#code-review"}},{"name":"Packaging","score":-1,"reason":"packaging workflow not detected","details":["Warn: no GitHub/GitLab publishing 
workflow detected."],"documentation":{"short":"Determines if the project is published as a package that others can easily download, install, easily update, and uninstall.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#packaging"}},{"name":"Maintained","score":0,"reason":"0 commit(s) and 0 issue activity found in the last 90 days -- score normalized to 0","details":null,"documentation":{"short":"Determines if the project is \"actively maintained\".","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#maintained"}},{"name":"Dangerous-Workflow","score":-1,"reason":"no workflows found","details":null,"documentation":{"short":"Determines if the project's GitHub Action workflows avoid dangerous patterns.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#dangerous-workflow"}},{"name":"CII-Best-Practices","score":0,"reason":"no effort to earn an OpenSSF best practices badge detected","details":null,"documentation":{"short":"Determines if the project has an OpenSSF (formerly CII) Best Practices Badge.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#cii-best-practices"}},{"name":"Pinned-Dependencies","score":-1,"reason":"no dependencies found","details":null,"documentation":{"short":"Determines if the project has declared and pinned the dependencies of its build process.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#pinned-dependencies"}},{"name":"Vulnerabilities","score":10,"reason":"0 existing vulnerabilities detected","details":null,"documentation":{"short":"Determines if the project has open, known unfixed vulnerabilities.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#vulnerabilities"}},{"name":"Fuzzing","score":0,"reason":"project is not 
fuzzed","details":["Warn: no fuzzer integrations found"],"documentation":{"short":"Determines if the project uses fuzzing.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#fuzzing"}},{"name":"License","score":9,"reason":"license file detected","details":["Info: project has a license file: LICENSE:0","Warn: project license file does not contain an FSF or OSI license."],"documentation":{"short":"Determines if the project has defined a license.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#license"}},{"name":"Signed-Releases","score":-1,"reason":"no releases found","details":null,"documentation":{"short":"Determines if the project cryptographically signs release artifacts.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#signed-releases"}},{"name":"Security-Policy","score":0,"reason":"security policy file not detected","details":["Warn: no security policy file detected","Warn: no security file to analyze","Warn: no security file to analyze","Warn: no security file to analyze"],"documentation":{"short":"Determines if the project has published a security policy.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#security-policy"}},{"name":"Branch-Protection","score":-1,"reason":"internal error: error during branchesHandler.setup: internal error: githubv4.Query: Resource not accessible by integration","details":null,"documentation":{"short":"Determines if the default and release branches are protected with GitHub's branch protection settings.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#branch-protection"}},{"name":"SAST","score":0,"reason":"SAST tool is not run on all commits -- score normalized to 0","details":["Warn: 0 commits out of 20 are checked with a SAST 
tool"],"documentation":{"short":"Determines if the project uses static code analysis.","url":"https://github.com/ossf/scorecard/blob/f6ed084d17c9236477efd66e5b258b9d4cc7b389/docs/checks.md#sast"}}]},"last_synced_at":"2025-08-15T01:08:24.812Z","repository_id":55961158,"created_at":"2025-08-15T01:08:24.812Z","updated_at":"2025-08-15T01:08:24.812Z"},"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":279970432,"owners_count":26252761,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-10-19T02:00:07.647Z","response_time":64,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","dataflow","mlapi","streaming","video","videoai"],"created_at":"2024-09-28T15:11:31.847Z","updated_at":"2025-10-20T01:30:55.695Z","avatar_url":"https://github.com/GoogleCloudPlatform.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Video Analytics Solution Using Dataflow \u0026 Video AI\nThis repo contains reference implementations of video analytics solutions that use Dataflow \u0026 Video AI. The goal is to provide an easy-to-use, end-to-end solution for processing large-scale unstructured video data by bringing multiple data streams together to drive insight using Video AI. \n\n## Table of Contents  \n* [Object Detection in Video Clips](#object-detection-in-video-clips).  
\n\t* [Reference Architecture](#reference-architecture-using-video-intelligence-api).\n\t* [Build \u0026 Run Using Dataflow Flex Template](#build-run).\n\t* [Test Using a Drone Video Clip Dataset from Kaggle](#test).\n\t* [Custom Json Output and Filtering](#custom-json-output-and-filtering).\n\n## Object Detection in Video Clips \nMany customers across various industries produce large volumes of unstructured data and are looking for easy-to-use streaming solutions to analyze it in near real time. For example, alarm monitoring companies want to augment motion sensor data with the analysis of video clips (and, eventually, live video feeds) to determine whether dispatching a security team to a customer’s premises is justified, and thereby reduce the false positive rate that drives up their operating costs. This section of the repo shows how you can use this pipeline to detect objects in large-scale video clips and customize the Json response for downstream systems to consume.\n\nFor testing purposes, we use this [dataset](https://www.kaggle.com/kmader/drone-videos) of drone video clips from Kaggle.\n\n### Reference Architecture Using Video Intelligence API\n ![ref_arch](diagram/video_blog_diagram.png)\n\n### How the Pipeline Works\n1. The solution assumes video clips are uploaded to and stored in a GCS bucket, and that a metadata notification is sent to a PubSub topic.\n\n2. The Dataflow pipeline processes the video files in micro-batches, based on the list of features passed as a pipeline argument.\n\n3. The Dataflow pipeline uses the list of entities and the confidence score to filter the Video Intelligence API response and writes the output to the following sinks:\n\t * A nested table in BigQuery for further analysis. \n\t * A PubSub topic, with the Json response customized so that downstream applications can consume it in near real time. \n\n### Build \u0026 Run\n\n1. 
Enable some Google Cloud APIs:\n\n```\ngcloud services enable dataflow.googleapis.com containerregistry.googleapis.com videointelligence.googleapis.com\n```\n\n2. Set some environment variables (replace values with your project ID and preferred region):\n\n```\nexport PROJECT=[PROJECT]\nexport REGION=[REGION]\n```\n\n3. Create two buckets, one to store input video files and another one to store Dataflow Flex template config files:\n\n```\nexport VIDEO_CLIPS_BUCKET=${PROJECT}_videos\nexport DATAFLOW_TEMPLATE_BUCKET=${PROJECT}_dataflow_template_config\ngsutil mb -c standard -l ${REGION} gs://${VIDEO_CLIPS_BUCKET}\ngsutil mb -c standard -l ${REGION} gs://${DATAFLOW_TEMPLATE_BUCKET}\n```\n\n4. Create required topics and subscriptions as below\n\n```\nexport GCS_NOTIFICATION_TOPIC=\"gcs-notification-topic\"\nexport GCS_NOTIFICATION_SUBSCRIPTION=\"gcs-notification-subscription\"\nexport OBJECT_DETECTION_TOPIC=\"object-detection-topic\"\nexport OBJECT_DETECTION_SUBSCRIPTION=\"object-detection-subscription\"\nexport ERROR_TOPIC=\"object-detection-error-topic\"\nexport ERROR_SUBSCRIPTION=\"object-detection-error-subscription\"\ngcloud pubsub topics create ${GCS_NOTIFICATION_TOPIC}\ngcloud pubsub subscriptions create ${GCS_NOTIFICATION_SUBSCRIPTION} --topic=${GCS_NOTIFICATION_TOPIC}\ngcloud pubsub topics create ${OBJECT_DETECTION_TOPIC}\ngcloud pubsub subscriptions create ${OBJECT_DETECTION_SUBSCRIPTION} --topic=${OBJECT_DETECTION_TOPIC}\ngcloud pubsub topics create ${ERROR_TOPIC}\ngcloud pubsub subscriptions create ${ERROR_SUBSCRIPTION} --topic=${ERROR_TOPIC}\n```\n\n5. Create a BigQuery dataset and Table. \n\n```\nexport BIGQUERY_DATASET=\"video_analytics\"\nbq mk -d --location=US ${BIGQUERY_DATASET}\n\nbq mk -t \\\n--schema src/main/resources/table_schema.json \\\n--description \"object_tracking_data\" \\\n${PROJECT}:${BIGQUERY_DATASET}.object_tracking_analysis\n```\n\n6. 
Build with Gradle:\n\n```\ngradle spotlessApply -DmainClass=com.google.solutions.df.video.analytics.VideoAnalyticsPipeline \ngradle build -DmainClass=com.google.solutions.df.video.analytics.VideoAnalyticsPipeline \n```  \n\n7. Trigger the pipeline using Gradle run.\n\nThis configuration uses the following defaults:\n\n- 1 second processing time\n- filter for window and person entities with confidence greater than 90%\n\n```\ngradle run -Pargs=\"\n--project=${PROJECT} --region=${REGION}\n--runner=DataflowRunner --streaming --enableStreamingEngine\n--autoscalingAlgorithm=THROUGHPUT_BASED --numWorkers=3 --maxNumWorkers=5 --workerMachineType=n1-highmem-4\n--inputNotificationSubscription=projects/${PROJECT}/subscriptions/${GCS_NOTIFICATION_SUBSCRIPTION}\n--outputTopic=projects/${PROJECT}/topics/${OBJECT_DETECTION_TOPIC}\n--errorTopic=projects/${PROJECT}/topics/${ERROR_TOPIC}\n--features=OBJECT_TRACKING --entities=cat --confidenceThreshold=0.9 --windowInterval=1 \n--tableReference=${PROJECT}:${BIGQUERY_DATASET}.object_tracking_analysis\"\n```\n\n8. Create a Docker image for the Flex Template. \n \n```\ngradle jib -Djib.to.image=gcr.io/${PROJECT}/dataflow-video-analytics:latest\n```\n\n9. Upload the template JSON config file to GCS.\n\n```\ncat \u003c\u003c EOF | gsutil cp - gs://${DATAFLOW_TEMPLATE_BUCKET}/dynamic_template_video_analytics.json\n{\n  \"image\": \"gcr.io/${PROJECT}/dataflow-video-analytics:latest\",\n  \"sdk_info\": {\"language\": \"JAVA\"}\n}\nEOF\n```\n\n10. 
Trigger the pipeline using the Dataflow Flex Template:\n\n```\ngcloud beta dataflow flex-template run \"video-object-tracking\" \\\n--project=${PROJECT} \\\n--region=${REGION} \\\n--template-file-gcs-location=gs://${DATAFLOW_TEMPLATE_BUCKET}/dynamic_template_video_analytics.json \\\n--parameters=\u003c\u003c'EOF'\n^~^autoscalingAlgorithm=\"NONE\"~numWorkers=5~maxNumWorkers=5~workerMachineType=n1-highmem-4\n  ~inputNotificationSubscription=projects/${PROJECT}/subscriptions/${GCS_NOTIFICATION_SUBSCRIPTION}\n  ~outputTopic=projects/${PROJECT}/topics/${OBJECT_DETECTION_TOPIC}\n  ~errorTopic=projects/${PROJECT}/topics/${ERROR_TOPIC}\n  ~features=OBJECT_TRACKING~entities=window,person~confidenceThreshold=0.9~windowInterval=1\n  ~tableReference=${PROJECT}:${BIGQUERY_DATASET}.object_tracking_analysis\n  ~streaming=true\nEOF\n```\n\n### Test\n1. Validate from the Dataflow console that the pipeline is running.\n ![ref_arch](diagram/video_analytics_dag.png)\n \n2. Enable GCS metadata notifications to the PubSub topic. \n\n```\ngsutil notification create -t ${GCS_NOTIFICATION_TOPIC} -f json gs://${VIDEO_CLIPS_BUCKET}\n```\n\n3. Copy test files to the bucket:\n\n```\ngsutil -m cp \"gs://df-video-analytics-drone-dataset/*\" gs://${VIDEO_CLIPS_BUCKET}\n```\n\n4. Validate that the pipeline has successfully processed the data by looking at the element counts in the write transform. \n\n ![t1](diagram/transform_1.png)\n \n ![t2](diagram/transform_2.png)\n \n ![t3](diagram/transform_3.png)\n \n ![t4](diagram/transform_4.png)\n\n### Custom Json Output and Filtering \nThe pipeline uses a nested table in BigQuery to store the API response and also publishes a customized Json message to a PubSub topic so that downstream applications can consume it in near real time. 
This reference implementation shows how you can customize the standard Json response received from the Video Intelligence API by using [Row/Schema](https://github.com/GoogleCloudPlatform/dataflow-video-analytics/blob/master/src/main/java/com/google/solutions/df/video/analytics/common/Util.java) and built-in Beam transforms like [ToJson and Filter](https://github.com/GoogleCloudPlatform/dataflow-video-analytics/blob/master/src/main/java/com/google/solutions/df/video/analytics/common/ResponseWriteTransform.java) by column name. \n\n#### BigQuery Schema \n\n ![t4](diagram/table_schema.png)\n\n* You can use the following query to investigate the different objects and confidence levels found in the video clips from our Kaggle dataset:\n\n```\nSELECT gcsUri, entity \nFROM `video_analytics.object_tracking_analysis` \nWHERE entity LIKE 'person'\nGROUP BY gcsUri, entity\n\n```\n ![t4](diagram/bigquery_table.png)\n\n* In the test pipeline, the arguments \"entities=window,person\" and \"confidenceThreshold=0.9\" make the pipeline filter the response, which may be required for near real time processing by downstream applications. You can use the command below to see a published message from the output subscription. 
\n\n```\ngcloud pubsub subscriptions pull ${OBJECT_DETECTION_SUBSCRIPTION} --auto-ack --limit 1 --project ${PROJECT}\n```\n\n\n* You should see Json output like the following:\n\n```\n{\n   \"file_name\":\"cat.mp4\",\n   \"entity\":\"cat\",\n   \"frame_data\":[\n      {\n         \"processing_timestamp\":\"2020-06-25 13:50:14.964000\",\n         \"timeOffset\":\"0.0\",\n         \"confidence\":0.8674923181533813,\n         \"left\":0.14,\n         \"top\":0.22545259,\n         \"right\":0.74,\n         \"bottom\":0.86\n      },\n      {\n         \"processing_timestamp\":\"2020-06-25 13:50:15.270000\",\n         \"timeOffset\":\"0.12\",\n         \"confidence\":0.8674923181533813,\n         \"left\":0.140104,\n         \"top\":0.22684973,\n         \"right\":0.740104,\n         \"bottom\":0.8611095\n      },\n      {\n         \"processing_timestamp\":\"2020-06-25 13:50:15.273000\",\n         \"timeOffset\":\"0.24\",\n         \"confidence\":0.8674923181533813,\n         \"left\":0.14010431,\n         \"top\":0.22685367,\n         \"right\":0.7401043,\n         \"bottom\":0.861113\n      },\n      {\n         \"processing_timestamp\":\"2020-06-25 13:50:15.275000\",\n         \"timeOffset\":\"0.36\",\n         \"confidence\":0.8674923181533813,\n         \"left\":0.14010426,\n         \"top\":0.22762112,\n         \"right\":0.7401043,\n         \"bottom\":0.8618804\n      },\n      {\n         \"processing_timestamp\":\"2020-06-25 13:50:15.276000\",\n         \"timeOffset\":\"0.48\",\n         \"confidence\":0.8603168725967407,\n         \"left\":0.14002976,\n         \"top\":0.23130082,\n         \"right\":0.7400298,\n         \"bottom\":0.86003596\n      },\n  ....  
\n   ]\n}\n```\n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fdataflow-video-analytics","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgooglecloudplatform%2Fdataflow-video-analytics","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgooglecloudplatform%2Fdataflow-video-analytics/lists"}