{"id":17118078,"url":"https://github.com/my8100/logparser","last_synced_at":"2025-04-12T21:33:54.299Z","repository":{"id":35047877,"uuid":"166683483","full_name":"my8100/logparser","owner":"my8100","description":"A tool for parsing Scrapy log files periodically and incrementally, extending the HTTP JSON API of Scrapyd.","archived":false,"fork":false,"pushed_at":"2025-01-05T10:06:10.000Z","size":176,"stargazers_count":92,"open_issues_count":2,"forks_count":26,"subscribers_count":5,"default_branch":"master","last_synced_at":"2025-04-04T01:08:59.162Z","etag":null,"topics":["log-analyse","log-parser","log-parsing","scrapy","scrapy-log-analysis","scrapyd","scrapyd-log-analysis","visualization"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/my8100.png","metadata":{"files":{"readme":"README.md","changelog":"HISTORY.md","contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2019-01-20T16:32:43.000Z","updated_at":"2025-02-19T08:01:33.000Z","dependencies_parsed_at":"2025-01-27T23:10:30.928Z","dependency_job_id":"a0ba94c7-18dc-4c83-af39-3e1c3ec95fcb","html_url":"https://github.com/my8100/logparser","commit_stats":{"total_commits":20,"total_committers":1,"mean_commits":20.0,"dds":0.0,"last_synced_commit":"ed7948b271884af68eb3bb13fa9ee51a4892552c"},"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/my8100%2Flogparser","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/my8100%2Flogparser/tags","releases_url":"https://repos.ecosys
te.ms/api/v1/hosts/GitHub/repositories/my8100%2Flogparser/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/my8100%2Flogparser/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/my8100","download_url":"https://codeload.github.com/my8100/logparser/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248636715,"owners_count":21137509,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["log-analyse","log-parser","log-parsing","scrapy","scrapy-log-analysis","scrapyd","scrapyd-log-analysis","visualization"],"created_at":"2024-10-14T17:53:24.317Z","updated_at":"2025-04-12T21:33:54.274Z","avatar_url":"https://github.com/my8100.png","language":"Python","readme":"# LogParser: A tool for parsing Scrapy log files periodically and incrementally, designed for [*ScrapydWeb*](https://github.com/my8100/scrapydweb).\n\n[![PyPI - logparser Version](https://img.shields.io/pypi/v/logparser.svg)](https://pypi.org/project/logparser/)\n[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/logparser.svg)](https://pypi.org/project/logparser/)\n[![CircleCI](https://circleci.com/gh/my8100/logparser/tree/master.svg?style=shield)](https://circleci.com/gh/my8100/logparser/tree/master)\n[![codecov](https://codecov.io/gh/my8100/logparser/branch/master/graph/badge.svg)](https://codecov.io/gh/my8100/logparser)\n[![Coverage Status](https://coveralls.io/repos/github/my8100/logparser/badge.svg?branch=master)](https://coveralls.io/github/my8100/logparser?branch=master)\n[![Downloads 
- total](https://pepy.tech/badge/logparser)](https://pepy.tech/project/logparser)\n[![GitHub license](https://img.shields.io/github/license/my8100/logparser.svg)](https://github.com/my8100/logparser/blob/master/LICENSE)\n\n\n## Installation\n- Use pip:\n```bash\npip install logparser\n```\n:heavy_exclamation_mark: Note that you may need to run `python -m pip install --upgrade pip` first to get the latest version of logparser. Alternatively, download the tar.gz file from https://pypi.org/project/logparser/#files and install it with `pip install logparser-x.x.x.tar.gz`\n\n- Use git:\n```bash\npip install --upgrade git+https://github.com/my8100/logparser.git\n```\nOr:\n```bash\ngit clone https://github.com/my8100/logparser.git\ncd logparser\npython setup.py install\n```\n\n## Usage\n### To use in Python\n\u003cdetails\u003e\n\u003csummary\u003eView code\u003c/summary\u003e\n\n```python\nIn [1]: from logparser import parse\n\nIn [2]: log = \"\"\"2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)\n   ...: 2018-10-23 18:29:41 [scrapy.statscollectors] INFO: Dumping Scrapy stats:\n   ...: {'downloader/exception_count': 3,\n   ...:  'downloader/exception_type_count/twisted.internet.error.TCPTimedOutError': 3,\n   ...:  'downloader/request_bytes': 1336,\n   ...:  'downloader/request_count': 7,\n   ...:  'downloader/request_method_count/GET': 7,\n   ...:  'downloader/response_bytes': 1669,\n   ...:  'downloader/response_count': 4,\n   ...:  'downloader/response_status_count/200': 2,\n   ...:  'downloader/response_status_count/302': 1,\n   ...:  'downloader/response_status_count/404': 1,\n   ...:  'dupefilter/filtered': 1,\n   ...:  'finish_reason': 'finished',\n   ...:  'finish_time': datetime.datetime(2018, 10, 23, 10, 29, 41, 174719),\n   ...:  'httperror/response_ignored_count': 1,\n   ...:  'httperror/response_ignored_status_count/404': 1,\n   ...:  'item_scraped_count': 2,\n   ...:  'log_count/CRITICAL': 5,\n   ...:  'log_count/DEBUG': 
14,\n   ...:  'log_count/ERROR': 5,\n   ...:  'log_count/INFO': 75,\n   ...:  'log_count/WARNING': 3,\n   ...:  'offsite/domains': 1,\n   ...:  'offsite/filtered': 1,\n   ...:  'request_depth_max': 1,\n   ...:  'response_received_count': 3,\n   ...:  'retry/count': 2,\n   ...:  'retry/max_reached': 1,\n   ...:  'retry/reason_count/twisted.internet.error.TCPTimedOutError': 2,\n   ...:  'scheduler/dequeued': 7,\n   ...:  'scheduler/dequeued/memory': 7,\n   ...:  'scheduler/enqueued': 7,\n   ...:  'scheduler/enqueued/memory': 7,\n   ...:  'start_time': datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)}\n   ...: 2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)\"\"\"\n\nIn [3]: odict = parse(log, headlines=1, taillines=1)\n\nIn [4]: odict\nOut[4]:\nOrderedDict([('head',\n              '2018-10-23 18:28:34 [scrapy.utils.log] INFO: Scrapy 1.5.0 started (bot: demo)'),\n             ('tail',\n              '2018-10-23 18:29:42 [scrapy.core.engine] INFO: Spider closed (finished)'),\n             ('first_log_time', '2018-10-23 18:28:34'),\n             ('latest_log_time', '2018-10-23 18:29:42'),\n             ('runtime', '0:01:08'),\n             ('first_log_timestamp', 1540290514),\n             ('latest_log_timestamp', 1540290582),\n             ('datas', []),\n             ('pages', 3),\n             ('items', 2),\n             ('latest_matches',\n              {'telnet_console': '',\n               'resuming_crawl': '',\n               'latest_offsite': '',\n               'latest_duplicate': '',\n               'latest_crawl': '',\n               'latest_scrape': '',\n               'latest_item': '',\n               'latest_stat': ''}),\n             ('latest_crawl_timestamp', 0),\n             ('latest_scrape_timestamp', 0),\n             ('log_categories',\n              {'critical_logs': {'count': 5, 'details': []},\n               'error_logs': {'count': 5, 'details': []},\n               'warning_logs': {'count': 3, 'details': []},\n         
      'redirect_logs': {'count': 1, 'details': []},\n               'retry_logs': {'count': 2, 'details': []},\n               'ignore_logs': {'count': 1, 'details': []}}),\n             ('shutdown_reason', 'N/A'),\n             ('finish_reason', 'finished'),\n             ('crawler_stats',\n              OrderedDict([('source', 'log'),\n                           ('last_update_time', '2018-10-23 18:29:41'),\n                           ('last_update_timestamp', 1540290581),\n                           ('downloader/exception_count', 3),\n                           ('downloader/exception_type_count/twisted.internet.error.TCPTimedOutError',\n                            3),\n                           ('downloader/request_bytes', 1336),\n                           ('downloader/request_count', 7),\n                           ('downloader/request_method_count/GET', 7),\n                           ('downloader/response_bytes', 1669),\n                           ('downloader/response_count', 4),\n                           ('downloader/response_status_count/200', 2),\n                           ('downloader/response_status_count/302', 1),\n                           ('downloader/response_status_count/404', 1),\n                           ('dupefilter/filtered', 1),\n                           ('finish_reason', 'finished'),\n                           ('finish_time',\n                            'datetime.datetime(2018, 10, 23, 10, 29, 41, 174719)'),\n                           ('httperror/response_ignored_count', 1),\n                           ('httperror/response_ignored_status_count/404', 1),\n                           ('item_scraped_count', 2),\n                           ('log_count/CRITICAL', 5),\n                           ('log_count/DEBUG', 14),\n                           ('log_count/ERROR', 5),\n                           ('log_count/INFO', 75),\n                           ('log_count/WARNING', 3),\n                           ('offsite/domains', 1),\n           
                ('offsite/filtered', 1),\n                           ('request_depth_max', 1),\n                           ('response_received_count', 3),\n                           ('retry/count', 2),\n                           ('retry/max_reached', 1),\n                           ('retry/reason_count/twisted.internet.error.TCPTimedOutError',\n                            2),\n                           ('scheduler/dequeued', 7),\n                           ('scheduler/dequeued/memory', 7),\n                           ('scheduler/enqueued', 7),\n                           ('scheduler/enqueued/memory', 7),\n                           ('start_time',\n                            'datetime.datetime(2018, 10, 23, 10, 28, 35, 70938)')])),\n             ('last_update_time', '2019-03-08 16:53:50'),\n             ('last_update_timestamp', 1552035230),\n             ('logparser_version', '0.8.1')])\n\nIn [5]: odict['runtime']\nOut[5]: '0:01:08'\n\nIn [6]: odict['pages']\nOut[6]: 3\n\nIn [7]: odict['items']\nOut[7]: 2\n\nIn [8]: odict['finish_reason']\nOut[8]: 'finished'\n```\n\n\u003c/details\u003e\n\n### To run as a service\n1. **Make sure that [*Scrapyd*](https://github.com/scrapy/scrapyd) has been installed and started on the current host.**\n2. Start ***LogParser*** via the command `logparser`.\n3. Visit http://127.0.0.1:6800/logs/stats.json **(Assuming the Scrapyd service runs on port 6800.)**\n4. 
Visit http://127.0.0.1:6800/logs/projectname/spidername/jobid.json to get the stats of a job in detail.\n\n### To work with *ScrapydWeb* for visualization\nCheck out https://github.com/my8100/scrapydweb for more info.\n\n![stats](https://raw.githubusercontent.com/my8100/files/master/scrapydweb/screenshots/stats.gif)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmy8100%2Flogparser","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmy8100%2Flogparser","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmy8100%2Flogparser/lists"}