{"id":19176886,"url":"https://github.com/googleads/adwords-scripts-linkchecker","last_synced_at":"2025-05-07T19:42:09.600Z","repository":{"id":52951105,"uuid":"94106523","full_name":"googleads/adwords-scripts-linkchecker","owner":"googleads","description":"App Engine-based link checker for AdWords Scripts and Apps Script","archived":false,"fork":false,"pushed_at":"2022-12-14T19:42:30.000Z","size":59,"stargazers_count":12,"open_issues_count":4,"forks_count":15,"subscribers_count":10,"default_branch":"main","last_synced_at":"2025-04-20T02:34:56.433Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/googleads.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-06-12T14:37:56.000Z","updated_at":"2025-01-20T12:54:31.000Z","dependencies_parsed_at":"2023-01-29T00:32:17.755Z","dependency_job_id":null,"html_url":"https://github.com/googleads/adwords-scripts-linkchecker","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleads%2Fadwords-scripts-linkchecker","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleads%2Fadwords-scripts-linkchecker/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleads%2Fadwords-scripts-linkchecker/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/googleads%2Fadwords-scripts-linkchecker/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/googleads","download_u
rl":"https://codeload.github.com/googleads/adwords-scripts-linkchecker/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":252945801,"owners_count":21829660,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-09T10:30:58.191Z","updated_at":"2025-05-07T19:42:09.569Z","avatar_url":"https://github.com/googleads.png","language":"Java","funding_links":[],"categories":[],"sub_categories":[],"readme":"# AdWords Scripts Linkchecker\n\n\n## Overview\n\nThis project is an App Engine application for verifying the status of URLs. 
The\napplication provides a simple API for submitting lists of URLs to be checked and\nquerying their progress.\n\n## Using the application\n\nThere are several ways to use this application for checking URLs:\n\n*   **Within AdWords Scripts**: If your goal is to check the status of URLs,\n    such as landing page URLs, from within AdWords, then you may wish to\n    consider the [AdWords Scripts larger-scale link\n    checker](https://developers.google.com/adwords/scripts/docs/solutions/larger-scale-link-checker)\n    which uses this App Engine solution directly.\n*   **[Within Apps Script](#using-within-apps-script)**: If your aim is to use\n    the application from within Apps Script, then this can again be achieved\n    without building a copy of the application, but instead using the pre-built\n    version maintained on Google Cloud Storage.\n*   **[From a custom client](#using-the-application-from-a-custom-client)**:\n    Even if working with your own custom client, you may wish to avoid building\n    the App Engine application yourself and just deploy the existing build from\n    Cloud Storage. Then, use the [linkchecker API](#the-linkchecker-api) to\n    interact with the application.\n*   **[Building the\n    application](#building-and-deploying-your-own-version-of-the-application)**:\n    If you wish to modify the App Engine application itself, you'll want to\n    build it and host it yourself on Cloud Storage before working with it in one\n    of the above ways.\n\n## Using within Apps Script\n\nOne of the easiest ways to use the link checker for your own purposes, beyond\nthe AdWords Scripts solution, is via Apps Script: The means to deploy and\nauthenticate with the application, as well as examples of how to interact with\nthe API are already available in a sample script.\n\n1.  
Make a copy of the template spreadsheet for the [AdWords Scripts\n    solution](https://goo.gl/SiacAZ)\n    by clicking **File \u003e Make a copy**.\n\n    The **Cloud Setup** and **App Engine Performance** sheets will be the ones\n    of use, with the others not relevant outside of AdWords Scripts.\n\n2.  Follow the instructions on the **Cloud Setup** sheet. Note that if you are\n    intending to interact with other APIs in your Apps Script solution, for\n    example the DoubleClick Search API, then extra scopes should be added in\n    Step 2 on that sheet.\n\n3.  Once all steps on the **Cloud Setup** sheet are complete, locate the example\n    Apps Script application: Still within the spreadsheet click **Tools \u003e Script\n    editor** and locate the **Example.gs** script.\n\n    Here you will see, within the `main` function, calls to `listOperations`,\n    `createOperation`, `getOperation` and `deleteOperation` respectively. This\n    is preceded by the necessary setup to get authentication up and running with\n    the application.\n\n    Using these examples, combined with [the API reference](#the-linkchecker-api),\n    it is possible to quickly develop a custom script to work with the link\n    checker application.\n\n4.  Tune the settings for the application, such as the number of checks to be\n    performed in parallel, by configuring the **App Engine Performance** sheet.\n\n## Using the application from a custom client\n\n### Deployment\n\nIf you are using a custom client outside of Apps Script, you may wish first to\nuse the spreadsheet described in the above sections to ease the deployment of\nthe application. This is highly recommended because:\n\n1.  The deployment configuration, and location of the Cloud Storage file are all\n    pre-populated.\n2.  
The application settings, such as the TaskQueue configuration and cron setup\n    are all taken care of.\n\nIt is possible to deploy the application yourself by other means, using the [App\nEngine Admin API](https://cloud.google.com/appengine/docs/admin-api/), and the\nApp Engine API to configure the Task Queue and cron. If you require this, then\nthe best option is to examine the source code of the spreadsheet.\n\n1.  Make a copy of the [template\n    spreadsheet](https://goo.gl/SiacAZ)\n    by clicking **File \u003e Make a copy**.\n2.  Click **Tools \u003e Script editor** and locate **CloudSetup.gs**. This script\n    has the details of all calls to the App Engine Admin API, and calls for\n    configuring the TaskQueue and cron, which you can transpose to your language\n    of choice.\n\n### Interacting with the linkchecker API\n\nTo interact with the linkchecker API, you must first obtain the\nshared key, which is required with all API calls. This approach was chosen for\nits simplicity when working within AdWords Scripts.\n\nThe shared key is stored in Google Datastore, which can be accessed both by the\nApp Engine application and by any client using the [Datastore\nAPI](https://cloud.google.com/datastore/docs/reference/rest/).\n\nThe shared key should then be set in the HTTP `Authorization` header, for example:\n\n`Authorization: \u003cyour_shared_key\u003e`\n\n### Examples of retrieving the shared key\n\n*   **Apps Script**: As shown in the **CloudSetup.gs** file, in the\n    `getSharedKey_` function.\n\n*   **Java**: Using the [Datastore client\n    library](https://cloud.google.com/datastore/docs/reference/libraries):\n\n```java\nimport com.google.cloud.datastore.Datastore;\nimport com.google.cloud.datastore.DatastoreOptions;\nimport com.google.cloud.datastore.Entity;\nimport com.google.cloud.datastore.Key;\n\npublic String getSharedKey() {\n  String projectId = \"\u003cyour_project_id\u003e\";\n  Datastore datastore = DatastoreOptions\n      .newBuilder()\n      .setProjectId(projectId)\n      .build()\n      .getService();\n  // The shared key is the \"key\" property of the entity of kind SharedKey named \"key\".\n  String kind = \"SharedKey\";\n  String name = \"key\";\n  Key taskKey = 
datastore.newKeyFactory().setKind(kind).newKey(name);\n  Entity retrieved = datastore.get(taskKey);\n\n  return retrieved.getString(\"key\");\n}\n```\n\n*   **Python**: Using the [Datastore client\n    library](https://cloud.google.com/datastore/docs/reference/libraries):\n\n```python\nfrom google.cloud import datastore\n\n\ndef get_shared_key():\n  datastore_client = datastore.Client(project=\"\u003cyour_project_id\u003e\")\n\n  kind = \"SharedKey\"\n  name = \"key\"\n\n  task_key = datastore_client.key(kind, name)\n  entry = datastore_client.get(task_key)\n\n  return entry[\"key\"]\n```\n\n### The linkchecker API\n\nThe application provides an API with the following methods. All operational\nmethods are relative to the account base URL of:\n\n```\nhttps://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1/account/\u003caccount-id\u003e/\n```\n\nwhere:\n\n*   `project-id` is the project ID taken from the Google Cloud console.\n*   `account-id` is a variable provided to allow a single instance of the App\n    Engine application to be used by multiple sources. It can be any numeric\n    value.\n\n| Method            | HTTP request                                     | Description                                            |\n| ----------------- | ------------------------------------------------ | ------------------------------------------------------ |\n| [Add](#add)       | `POST [account_base_url]/batchOperation`         | Submits a batch of URLs to be processed.               |\n| [List](#list)     | `GET [account_base_url]/batchOperation`          | Retrieves a list of current batches and their status.  |\n| [Get](#get)       | `GET [account_base_url]/batchOperation/[id]`     | Retrieves results for a specified batch operation.     |\n| [Delete](#delete) | `DELETE [account_base_url]/batchOperation/[id]`  | Deletes results for a specific operation.              |\n\nFurthermore, the API provides methods for retrieving and modifying settings for\nthe linkchecker. 
All methods are relative to the application base URL of:\n\n```\nhttps://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1\n```\n\n| Method                                | HTTP request                  | Description                        |\n| ------------------------------------- | ----------------------------- | ---------------------------------- |\n| [Get settings](#get-settings)         | `GET [app_base_url]/settings` | Retrieve user-modifiable settings. |\n| [Update settings](#update-settings)   | `PUT [app_base_url]/settings` | Update user-modifiable settings.   |\n\n#### **Add**\n\n##### HTTP Request\n\n```\nPOST https://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1/account/\u003caccount-id\u003e/batchOperation\n```\n\n##### Authorization\n\nThe shared key must be provided in the `Authorization` header.\n\n##### Request body\n\nThe request body should be in JSON format.\n\n| Property              | Value  | Required | Description                                                            |\n| --------------------- | ------ | -------- | ---------------------------------------------------------------------- |\n| `urls[]`              | `list` | Yes      | A list of URL strings for checking, with a maximum of 15000.           |\n| `failureMatchTexts[]` | `list` | No       | A list of strings e.g. \"Out of Office\" that also constitute a failure. 
|\n\n##### Response\n\n```json\n{\n  \"items\": [\n    string\n  ]\n}\n```\n\nProperty  | Value  | Description\n--------- | ------ | ----------------------------------------\n`items[]` | `list` | A list with one entry, the ID of the job.\n\n#### **List**\n\n##### HTTP Request\n\n```\nGET https://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1/account/\u003caccount-id\u003e/batchOperation\n```\n\n##### Authorization\n\nThe shared key must be provided in the `Authorization` header.\n\n##### Request body\n\nThe request body should be empty.\n\n##### Response\n\n```json\n{\n  \"items\": [\n    BatchOperation\n  ]\n}\n```\n\nwhere `BatchOperation` is the following structure:\n\n```json\n{\n  \"createdDate\": datetime,\n  \"batchId\": string,\n  \"status\": string\n}\n```\n\nProperty      | Value      | Description\n------------- | ---------- | ----------------------------------------------\n`createdDate` | `datetime` | The date and time of job creation (RFC 3339).\n`batchId`     | `string`   | The ID of the job.\n`status`      | `string`   | Valid responses are `COMPLETE` or `PROCESSING`.\n\n#### **Get**\n\n##### HTTP Request\n\n```\nGET https://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1/account/\u003caccount-id\u003e/batchOperation/\u003cid\u003e\n```\n\n##### Authorization\n\nThe shared key must be provided in the `Authorization` header.\n\n##### Parameters\n\nParameter | Value    | Description\n--------- | -------- | ------------------------------------------\n`id`      | `string` | The ID of the job to retrieve results for.\n\n##### Request body\n\nThe request body should be empty.\n\n##### Response\n\n```json\n{\n  \"errors\": [\n    BatchOperationError\n  ],\n  \"status\": string,\n  \"batchId\": string,\n  \"checkedUrlCount\": integer\n}\n```\n\n| Property          | Value                 | Required | Description                                                                             |\n| ----------------- | 
--------------------- | -------- | --------------------------------------------------------------------------------------- |\n| `errors[]`        | `BatchOperationError` | No       | If errors were encountered, will be present as a list of `BatchOperationError` objects. |\n| `batchId`         | `string`              | Yes      | The ID of the job.                                                                      |\n| `status`          | `string`              | Yes      | Valid responses are `COMPLETE` or `PROCESSING`.                                         |\n| `checkedUrlCount` | `integer`             | Yes      | If the job is complete, contains the total number of URLs checked, otherwise is zero.   |\n\nwhere `BatchOperationError` is the following structure:\n\n```json\n{\n  \"url\": string,\n  \"message\": string\n}\n```\n\n#### **Delete**\n\n##### HTTP Request\n\n```\nDELETE https://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1/account/\u003caccount-id\u003e/batchOperation/\u003cid\u003e\n```\n\n##### Authorization\n\nThe shared key must be provided in the `Authorization` header.\n\n##### Parameters\n\nParameter | Value    | Description\n--------- | -------- | ----------------------------\n`id`      | `string` | The ID of the job to delete.\n\n##### Request body\n\nThe request body should be empty.\n\n##### Response\n\nThe response is empty.\n\n#### **Get Settings**\n\n##### HTTP Request\n\n```\nGET https://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1/settings\n```\n\n##### Authorization\n\nThe shared key must be provided in the `Authorization` header.\n\n##### Response\n\n```json\n{\n \"rateInChecksPerMinute\": integer,\n \"userAgentString\": string\n}\n```\n\n| Property                | Value     | Description                                                 |\n| ----------------------- | --------- | ----------------------------------------------------------- |\n| `rateInChecksPerMinute` | `integer` | The number of URLs to 
check per minute per parallel worker. |\n| `userAgentString`       | `string`  | The User-Agent to use with each request.                    |\n\n#### **Update Settings**\n\n##### HTTP Request\n\n```\nPUT https://\u003cproject-id\u003e.appspot.com/_ah/api/batchLinkChecker/v1/settings\n```\n\n##### Authorization\n\nThe shared key must be provided in the `Authorization` header.\n\n##### Request body\n\n```json\n{\n \"rateInChecksPerMinute\": integer,\n \"userAgentString\": string\n}\n```\n\n| Property                | Value     | Required | Description                                                 |\n| ----------------------- | --------- | -------- | ----------------------------------------------------------- |\n| `rateInChecksPerMinute` | `integer` | No       | The number of URLs to check per minute per parallel worker. |\n| `userAgentString`       | `string`  | No       | The User-Agent to use with each request.                    |\n\n##### Response\n\nThe response is the new settings, if updated, as per the *Get Settings* request.\n\n## Building and deploying your own version of the application\n\nYou will need [Maven](https://maven.apache.org/) to build this application.\n\n### Building\n\nTo build the application after cloning the GitHub repository, run:\n\n```\nmvn package\n```\n\n### Deploying\n\n1.  Upload the generated WAR file to [Google Cloud\n    Storage](https://console.cloud.google.com/storage/browser) for your project\n    and enable a public link to the WAR file.\n2.  Use this URL with the [App Engine Admin\n    API](https://cloud.google.com/appengine/docs/admin-api/) when specifying the\n    location of the application. 
This is best seen by examining the template\n    spreadsheet as described in the [Apps Script\n    section](#using-within-apps-script) above, and inspecting the\n    **CloudSetup.gs** script within **Tools \u003e Script editor**, as this\n    illustrates how to use the App Engine Admin API to deploy from a package on\n    Cloud Storage.\n\n## Performance tuning\n\nIf too many parallel tasks are enabled on the App Engine application, or too\nhigh a rate of checking per task is allowed, then where many URLs belong to the\nsame domain, there exists a risk that requests will be blocked, owing to the\nhigh volume of traffic resembling a Denial of Service attack.\n\nThere are two settings that are relevant in controlling performance:\n\n1.  **Number of parallel tasks**: URLs are checked using tasks within an App\n    Engine [Task Queue](https://cloud.google.com/appengine/docs/standard/java/taskqueue/push/).\n    The number of parallel tasks can be configured using the App Engine API.\n    The **App Engine Performance** sheet in the template spreadsheet utilizes\n    this API to allow the user to set the number of tasks.\n\n    If you wish to modify these settings within a custom client, then the format\n    of the API request can be seen in the `updateTaskQueue` function within\n    **CloudSetup.gs**.\n1.  
**Number of requests per minute**: The [linkchecker API](#the-linkchecker-api)\n    provides the means to set the request rate for each parallel task.\n\nUsing the two in conjunction allows an appropriate rate of URL checking to be\nachieved.\n\n## Miscellaneous\n\n### Issue tracker\n- https://github.com/googleads/adwords-scripts-linkchecker/issues\n\n### Support forum\n- https://groups.google.com/forum/#!forum/adwords-scripts\n\n### Authors\n- https://github.com/garanj\n- https://github.com/AnashOommen\n- [AdWords Scripts Team](mailto:adwords-scripts@googlegroups.com)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogleads%2Fadwords-scripts-linkchecker","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fgoogleads%2Fadwords-scripts-linkchecker","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fgoogleads%2Fadwords-scripts-linkchecker/lists"}