{"id":19448063,"url":"https://github.com/scrapinghub/varanus","last_synced_at":"2025-04-25T02:30:41.250Z","repository":{"id":66021375,"uuid":"203850950","full_name":"scrapinghub/varanus","owner":"scrapinghub","description":"A command line spider monitoring tool","archived":false,"fork":false,"pushed_at":"2024-07-06T01:26:15.000Z","size":80,"stargazers_count":8,"open_issues_count":4,"forks_count":7,"subscribers_count":7,"default_branch":"master","last_synced_at":"2025-04-22T05:54:38.981Z","etag":null,"topics":["monitoring","python36","spider"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/scrapinghub.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-08-22T18:27:33.000Z","updated_at":"2024-07-29T14:57:08.000Z","dependencies_parsed_at":null,"dependency_job_id":"63130d2b-aba4-46a0-8fd7-3f34d6aab54e","html_url":"https://github.com/scrapinghub/varanus","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fvaranus","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fvaranus/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fvaranus/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/scrapinghub%2Fvaranus/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/scrapinghub","download_url":"https://codeload.github.com/scra
pinghub/varanus/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250741874,"owners_count":21479682,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["monitoring","python36","spider"],"created_at":"2024-11-10T16:23:39.989Z","updated_at":"2025-04-25T02:30:40.991Z","avatar_url":"https://github.com/scrapinghub.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Varanus\n\nThis tool wraps an API POST request to scrapinghub's job and other data storage\nAPI.\n\n## Requirements\n\n- Python 3.6+\n- Poetry\n\n## Developer Notes\n\nFor those unfamiliar with poetry, it's a virtualenv + package manager.\n\nThe project originally was built to use it, but instead of beginning the\n`poetry new` command,  we do a hybrid clone + poetry install\n\n(Since the project uses a `.lock` file, using pipenv plus some virtualenv\nmanager should also work, but these instructions use poetry.)\n\n### Install\n\n    $ pip install poetry\n\n    $ git clone https://github.com/scrapinghub/varanus.git\n\n    $ cd varanus\n\nWhen you install an application using poetry, a virtualenv is created\nautomagically:\n\n    $ poetry install\n\n    Creating virtualenv varanus-mrejzrgU-py3.8 in /home/mns/.cache/pypoetry/virtualenvs\n\n    Installing dependencies from lock file\n\n    Package operations: 47 installs, 0 updates, 0 removals\n\n      - Installing decorator (4.4.0)\n      - Installing ipython-genutils (0.2.0)\n      - Installing six (1.12.0)\n      - Installing attrs (19.1.0)\n      - Installing certifi 
(2019.6.16)\n      - Installing chardet (3.0.4)\n      - Installing idna (2.8)\n      [ . . . snip . . . ]\n      - Installing zipp (0.5.2)\n      - Installing importlib-metadata (0.19)\n      - Installing atomicwrites (1.3.0)\n      - Installing more-itertools (7.2.0)\n      - Installing pluggy (0.12.0)\n      - Installing py (1.8.0)\n      - Installing pytest (3.10.1)\n      - Installing varanus (0.1.0)\n\n### Usage\n\nExample usage:\n\n    $ poetry run varanus jobs -p 376566 -s dod_953_tripadvisor\n\n    ●▬▬▬▬▬▬▬▬▬●   \u003cResponse [200]\u003e https://storage.scrapinghub.com/jobq/376566/list?content=results\u0026fit_width=False\u0026formatter=table\u0026max_width=0\u0026noindent=False\u0026print_empty=False\u0026project=376566\u0026quote_mode=nonnumeric\u0026start=0\u0026jobmeta=project\u0026jobmeta=spider\u0026jobmeta=spider_args\u0026jobmeta=job_cmd\u0026jobmeta=tags\u0026jobmeta=scrapystats\u0026jobmeta=units\u0026jobmeta=version\u0026jobmeta=priority\u0026jobmeta=pending_time\u0026jobmeta=running_time\u0026jobmeta=finished_time\u0026jobmeta=scheduled_by\u0026jobmeta=state\u0026jobmeta=close_reason\u0026state=finished\u0026spider=dod_953_tripadvisor\u0026count=10 ●  varanus.__patch__:scrapinghub.client.HubstorageClient.request\n\n    +----------------+---------------------+----------+----------+------------------+------------------+-----+-------+-------+--------+----------+----------+-----------------+\n    | Key            | Spider              | Pnd mins | Run mins | Start            | Finish           | Err |  Warn | Items |  Pages | State    | Reason   | Version         |\n    +================+=====================+==========+==========+==================+==================+=====+=======+=======+========+==========+==========+=================+\n    | 376566/418/805 | dod_953_tripadvisor |        0 |        9 | 2020/04/19 19:10 | 2020/04/19 19:19 |   0 |    41 |    73 |    567 | finished | finished | 2233af50-master |\n    
+----------------+---------------------+----------+----------+------------------+------------------+-----+-------+-------+--------+----------+----------+-----------------+\n\n#### Options\n\nTo see the command line arguments run varanus help:\n\n    $ poetry run varanus help\n\n    usage: varanus [--version] [-v | -q] [--log-file LOG_FILE] [-h] [--debug]\n\n    optional arguments:\n      --version            show program's version number and exit\n      -v, --verbose        Increase verbosity of output. Can be repeated.\n      -q, --quiet          Suppress output except warnings and errors.\n      --log-file LOG_FILE  Specify a file to log output. Disabled by default.\n      -h, --help           Show help message and exit.\n      --debug              Show tracebacks on errors.\n\n    Commands:\n      collect        List project collections\n      complete       print bash completion command (cliff)\n      help           print detailed help for another command (cliff)\n      item           List item attributes for a given key\n      job            List job attributes for a given job key\n      jobs           List jobs filtered by various options\n      project        Show project attributes\n      scripts        List the project scripts \u0026 spiders\n      spiders        List the project scripts \u0026 spiders\n      stats          Show jobs statistics\n      workers        List the project scripts \u0026 spiders\n\nYou can also get help for individual commands:\n\n    $ poetry run varanus jobs --help\n\n    usage: varanus jobs [-h] [-f {csv,graph,json,table,value,yaml}] [-c COLUMN]\n                        [--quote {all,minimal,none,nonnumeric}] [--noindent]\n                        [--max-width \u003cinteger\u003e] [--fit-width] [--print-empty]\n                        [--sort-column SORT_COLUMN] [--project PROJECT]\n                        [--spider SPIDER] [--key JOBKEY]\n                        [--all-tags ALL_TAGS [ALL_TAGS ...]]\n                        
[--any-tags HAS_TAG [HAS_TAG ...]]\n                        [--not-tags LACKS_TAG [LACKS_TAG ...]] [--arg WORKER_ARG]\n                        [--count COUNT] [--start START] [--running]\n                        [{all,args,codes,info,results,tags,time}]\n\n    List jobs filtered by various options\n\n    positional arguments:\n      {all,args,codes,info,results,tags,time}\n                            Job listing content\n\n    optional arguments:\n      -h, --help            show this help message and exit\n      --project PROJECT, -p PROJECT\n      --spider SPIDER, -s SPIDER\n                            Filter for given spider name\n      --key JOBKEY, -k JOBKEY\n                            Job key, e.g. 123/456/789 or just 456/789\n      --all-tags ALL_TAGS [ALL_TAGS ...], -t ALL_TAGS [ALL_TAGS ...]\n                            Jobs have all of the tags\n      --any-tags HAS_TAG [HAS_TAG ...]\n                            Jobs have any of the tags\n      --not-tags LACKS_TAG [LACKS_TAG ...]\n                            Jobs do not have any of the tags\n      --arg WORKER_ARG, -a WORKER_ARG\n                            Filter for given argument\n      --count COUNT         How many jobs show\n      --start START         How many jobs to skip\n      --running             Also show running jobs\n\n    output formatters:\n      output formatter options\n\n      -f {csv,graph,json,table,value,yaml}, --format {csv,graph,json,table,value,yaml}\n                            the output format, defaults to table\n      -c COLUMN, --column COLUMN\n                            specify the column(s) to include, can be repeated\n      --sort-column SORT_COLUMN\n                            specify the column(s) to sort the data (columns\n                            specified first have a priority, non-existing columns\n                            are ignored), can be repeated\n\n    CSV Formatter:\n      --quote {all,minimal,none,nonnumeric}\n                            when to 
include quotes, defaults to nonnumeric\n\n    json formatter:\n      --noindent            whether to disable indenting the JSON\n\n    table formatter:\n      --max-width \u003cinteger\u003e\n                            Maximum display width, \u003c1 to disable. You can also use\n                            the CLIFF_MAX_TERM_WIDTH environment variable, but the\n                            parameter takes precedence.\n      --fit-width           Fit the table to the display width. Implied if --max-\n                            width greater than 0. Set the environment variable\n                            CLIFF_FIT_WIDTH=1 to always enable\n      --print-empty         Print empty table if there is no data to show.\n\nAlso, take a look at the `add_argument` calls in\nthe varanus [CLI folder](https://github.com/scrapinghub/varanus/tree/master/src/varanus/cli).\n\n#### Graph\n\nYou can use a Cliff output formatter to display data with Plotly as a graph on\nan HTML page using `-f graph`:\n\n    $ poetry run varanus jobs -f graph\n\n![Job graph](https://user-images.githubusercontent.com/204645/87793796-bc75ec00-c813-11ea-8eed-8bd9d1e615cc.png)\n\n### Debugging\n\nThere are a couple of ways Cliff can assist in debugging.\n\n#### Debug\n\nAdd the `--debug` command-line flag to set `app.options.debug`, which you can\nreference in your program:\n\n    $ poetry run varanus scripts --debug\n\nThen in your code you can use it:\n\n    if app.options.debug:\n        log_response(response)\n\n#### Verbosity\n\nSet the `-v` flag to set the logging level:\n\n    $ poetry run varanus scripts -vv\n\nThe log level is set depending on how many *v*'s you supply:\n\n*  0: level = `warning` if you do not supply any\n*  1: level = `info` if you supply one `-v`\n*  2: level = `debug` if you supply two 
`-vv`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapinghub%2Fvaranus","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fscrapinghub%2Fvaranus","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fscrapinghub%2Fvaranus/lists"}