{"id":13583752,"url":"https://github.com/HenryQW/mercury-parser-api","last_synced_at":"2025-04-06T21:32:54.246Z","repository":{"id":36969097,"uuid":"169833306","full_name":"HenryQW/mercury-parser-api","owner":"HenryQW","description":"🐋 A Dockerized drop-in replacement for the Mercury Parser API.","archived":false,"fork":true,"pushed_at":"2024-07-01T06:58:10.000Z","size":3181,"stargazers_count":116,"open_issues_count":7,"forks_count":38,"subscribers_count":6,"default_branch":"master","last_synced_at":"2024-11-06T00:39:23.664Z","etag":null,"topics":["dockerized","mercury","mercury-fulltext","mercury-parser"],"latest_commit_sha":null,"homepage":"https://mercury.postlight.com/web-parser/","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":"postlight/parser-api","license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/HenryQW.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-02-09T04:56:53.000Z","updated_at":"2024-07-25T20:52:10.000Z","dependencies_parsed_at":"2023-02-16T04:45:21.055Z","dependency_job_id":"e72358de-4469-411f-afe3-fa42ed4ef1e0","html_url":"https://github.com/HenryQW/mercury-parser-api","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryQW%2Fmercury-parser-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryQW%2Fmercury-parser-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryQW%2Fmercury-parser-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/HenryQW%2Fmercury-parser-api/manifests","owner_url
":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/HenryQW","download_url":"https://codeload.github.com/HenryQW/mercury-parser-api/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247556361,"owners_count":20957957,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dockerized","mercury","mercury-fulltext","mercury-parser"],"created_at":"2024-08-01T15:03:45.512Z","updated_at":"2025-04-06T21:32:54.009Z","avatar_url":"https://github.com/HenryQW.png","language":"JavaScript","readme":"# Mercury Parser API\n\n[![Docker Pulls](https://img.shields.io/docker/pulls/wangqiru/mercury-parser-api.svg)](https://hub.docker.com/r/wangqiru/mercury-parser-api)\n[![Docker Stars](https://img.shields.io/docker/stars/wangqiru/mercury-parser-api.svg)](https://hub.docker.com/r/wangqiru/mercury-parser-api)\n[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2FHenryQW%2Fmercury-parser-api.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2FHenryQW%2Fmercury-parser-api?ref=badge_shield)\n\nThis repo provides a dockerized drop-in replacement for the [Mercury Parser](https://github.com/postlight/mercury-parser) API.\n\n## Deploy\n\n### Pull And Run\n\n```bash\ndocker run -p 3000:3000 -d wangqiru/mercury-parser-api\n```\n\n### Build Your Own\n\n```bash\ndocker build -t mercury-parser-api .\n```\n\nthen\n\n```bash\ndocker run -p 3000:3000 -d mercury-parser-api\n```\n\n## Usage\n\nGET 
/parser?url=[required:url]\u0026contentType=[optional:contentType]\u0026headers=[optional:url-encoded-headers]\n\n```bash\ncurl localhost:3000/parser?url=https://www.bbc.co.uk/news/science-environment-35876621\n```\n\nResponse\n\n```json\n{\n    \"title\": \"Ash tree set for extinction in Europe\",\n    \"author\": \"Claire Marshall BBC Environment Correspondent\",\n    \"date_published\": null,\n    \"dek\": null,\n    \"lead_image_url\": \"https://ichef.bbci.co.uk/news/1024/branded_news/9736/production/_88901783_88901782.jpg\",\n    \"content\": \"\u003cdiv\u003e\u003cp class=\\\"byline\\\"\u003e \u003cspan class=\\\"byline__name\\\"\u003eBy Claire Marshall\u003c/span\u003e \u003cspan class=\\\"byline__title\\\"\u003eBBC Environment Correspondent\u003c/span\u003e \u003c/p\u003e\u003cdiv class=\\\"story-body__inner\\\"\u003e \u003cfigure class=\\\"media-landscape has-caption full-width lead\\\"\u003e \u003cspan class=\\\"image-and-copyright-container\\\"\u003e \u003cimg class=\\\"js-image-replace\\\" alt=\\\"Ash tree with suspected dieback\\\" src=\\\"https://ichef.bbci.co.uk/news/320/cpsprodpb/9736/production/_88901783_88901782.jpg\\\" width=\\\"976\\\"\u003e \u003cspan class=\\\"off-screen\\\"\u003eImage copyright\u003c/span\u003e \u003cspan class=\\\"story-image-copyright\\\"\u003ePA\u003c/span\u003e \u003c/span\u003e \u003cfigcaption class=\\\"media-caption\\\"\u003e \u003cspan class=\\\"off-screen\\\"\u003eImage caption\u003c/span\u003e \u003cspan class=\\\"media-caption__text\\\"\u003e The chalara dieback has devastated ash trees across Europe \u003c/span\u003e \u003c/figcaption\u003e \u003c/figure\u003e\u003cp class=\\\"story-body__introduction\\\"\u003eThe ash tree is likely to be wiped out in Europe, according to a review of the evidence.\u003c/p\u003e\u003cp\u003eThe trees are being killed off by the fungal disease ash-dieback along with an invasive beetle called the emerald ash borer.\u003c/p\u003e\u003cp\u003eAccording to the research, published in the 
Journal of Ecology, the British countryside will never look the same again.\u003c/p\u003e\u003cp\u003eThe paper says that the ash will most likely be \u0026quot;eliminated\u0026quot; in Europe.\u003c/p\u003e\u003cp\u003eThis could mirror the way Dutch elm disease largely wiped out the elm in the 1980s.\u003c/p\u003e\u003cp\u003e\u003ca href=\\\"http://www.bbc.co.uk/news/science-environment-33744042\\\" class=\\\"story-body__link\\\"\u003eWarning over ash dieback disease\u003c/a\u003e\u003c/p\u003e\u003cp\u003e\u003ca href=\\\"/news/uk-northern-ireland-33480275\\\" class=\\\"story-body__link\\\"\u003e100,000 trees destroyed over disease\u003c/a\u003e\u003c/p\u003e\u003cp\u003e\u003ca href=\\\"http://www.bbc.co.uk/news/science-environment-20171524\\\" class=\\\"story-body__link\\\"\u003eHow to spot ash dieback\u003c/a\u003e\u003c/p\u003e\u003cp\u003eAsh trees are a key part of the treescape of Britain. You don\u0026apos;t have to go to the countryside to see them. In and around towns and cities there are 2.2 million. In woodland, only the oak is more common.\u003c/p\u003e\u003cp\u003eHowever, according to a review led by Dr Peter Thomas of Keele University and published in the Journal of Ecology, \u0026quot;between the fungal disease ash dieback and a bright green beetle called the emerald ash borer, it is likely that almost all ash trees in Europe will be wiped out - just as the elm was largely eliminated by Dutch elm disease\u0026quot;.\u003c/p\u003e\u003cp\u003eAsh dieback, also known as Chalara, is a disease that was first seen in Eastern Europe in 1992. 
It now affects more than 2 million sq km, from Scandinavia to Italy.\u003c/p\u003e\u003cfigure class=\\\"media-landscape no-caption full-width\\\"\u003e \u003c/figure\u003e\u003cfigure class=\\\"media-landscape has-caption full-width\\\"\u003e \u003cdiv class=\\\"image-and-copyright-container\\\"\u003e \u003cspan class=\\\"off-screen\\\"\u003eImage copyright\u003c/span\u003e \u003cspan class=\\\"story-image-copyright\\\"\u003eGetty Images\u003c/span\u003e \u003c/div\u003e \u003cfigcaption class=\\\"media-caption\\\"\u003e \u003cspan class=\\\"off-screen\\\"\u003eImage caption\u003c/span\u003e \u003cspan class=\\\"media-caption__text\\\"\u003e The loss of ash trees won\u0026apos;t just change the landscape, it will have a severe impact on biodiversity \u003c/span\u003e \u003c/figcaption\u003e \u003c/figure\u003e\u003cp\u003eIt was identified in England in 2012 in a consignment of imported infected trees. It has since spread from Norfolk and Suffolk to South Wales. Caused by the fungus \u003ci\u003eHymenoscyphus fraxineus\u003c/i\u003e, it kills the leaves, then the branches, trunk and eventually the whole tree. It has the potential to destroy 95% of ash trees in the UK.\u003c/p\u003e\u003cp\u003eThe emerald ash borer is a bright green beetle that, like ash dieback, is native to Asia. It\u0026apos;s not yet in the UK but is spreading west from Moscow at a rate of 25 miles (41 km) a year and is thought to have reached Sweden.\u003c/p\u003e\u003cp\u003eThe adult beetles feed on ash trees and cause little damage. However the larvae bore under the bark and in to the wood, killing the tree.\u003c/p\u003e\u003cp\u003eAccording to Dr Thomas: \u0026quot;Our European ash is very susceptible to the beetle. 
It is only a matter of time before it spreads across the rest of Europe - including Britain - and the beetle is set to become the biggest threat faced by ash in Europe, potentially far more serious than ash dieback.\u0026quot;\u003c/p\u003e\u003cfigure class=\\\"media-landscape has-caption full-width\\\"\u003e \u003cdiv class=\\\"image-and-copyright-container\\\"\u003e \u003cspan class=\\\"off-screen\\\"\u003eImage copyright\u003c/span\u003e \u003cspan class=\\\"story-image-copyright\\\"\u003eScience Photo Library\u003c/span\u003e \u003c/div\u003e \u003cfigcaption class=\\\"media-caption\\\"\u003e \u003cspan class=\\\"off-screen\\\"\u003eImage caption\u003c/span\u003e \u003cspan class=\\\"media-caption__text\\\"\u003e The emerald ash borer also threatens ash trees \u003c/span\u003e \u003c/figcaption\u003e \u003c/figure\u003e\u003cp\u003eThis won\u0026apos;t just change our landscape - it will have a severe impact on biodiversity. 1,000 species are associated with ash or ash woodland, including 12 types of bird, 55 mammals and 239 invertebrates.\u003c/p\u003e\u003cp\u003eMr Thomas said, \u0026quot;Of these, over 100 species of lichens, fungi and insects are dependent upon the ash tree and are likely to decline or become extinct if the ash was gone.\u003c/p\u003e\u003cp\u003e\u0026quot;Some other trees such as alder, small-leaved lime and rowan can provide homes for some of these species... but if the ash went, the British countryside would never look the same again.\u0026quot;\u003c/p\u003e\u003cp\u003eOne small hope is that some cloned ash trees have shown resistance against the fungus. 
But that won\u0026apos;t protect them against the beetle.\u003c/p\u003e\u003cp\u003eFollow Claire \u003ca href=\\\"http://twitter.com/bbcmarshall\\\" class=\\\"story-body__link-external\\\"\u003eon Twitter.\u003c/a\u003e\u003c/p\u003e \u003c/div\u003e\u003c/div\u003e\",\n    \"next_page_url\": null,\n    \"url\": \"https://www.bbc.co.uk/news/science-environment-35876621\",\n    \"domain\": \"www.bbc.co.uk\",\n    \"excerpt\": \"The ash tree is likely to be wiped out in Europe, according to the largest-ever survey of the species.\",\n    \"word_count\": 585,\n    \"direction\": \"ltr\",\n    \"total_pages\": 1,\n    \"rendered_pages\": 1\n}\n```\n\n## Adding a custom extractor\n\nYou can add [a custom extractor](https://github.com/postlight/parser/blob/main/src/extractors/custom/README.md) to\nthe parser by bind-mounting your customizer module at `/app/customizer`.\n\n```bash\ndocker run -p 3000:3000 -d \\\n    -v \"$(pwd)/my-customizer-dir\":/app/customizer \\\n    mercury-parser-api\n```\n\nIn the above example, the `my-customizer-dir` directory must contain an `index.js`, for example:\n\n```js\nconst NaverMobileBlogExtractor = {\n  domain: 'm.blog.naver.com',\n  title: {\n    selectors: ['.se-title-text'],\n  },\n  author: {\n    selectors: ['.blog_author'],\n  },\n  content: {\n    selectors: ['.se-main-container'],\n  },\n  date_published: {\n    selectors: ['.blog_date'],\n    format: 'YYYY. MM. DD. 
HH:mm',\n    timezone: 'Asia/Seoul',\n  },\n};\n\nfunction customize(parser) {\n  parser.addExtractor(NaverMobileBlogExtractor);\n}\n\nmodule.exports = { customize };\n\nconsole.log('📜My custom extractor is loaded.');\n```\n\n## License\n\nLicensed under either of the following, at your option:\n\n- Apache License, Version 2.0\n  ([LICENSE-APACHE](http://www.apache.org/licenses/LICENSE-2.0))\n- MIT license\n  ([LICENSE-MIT](http://opensource.org/licenses/MIT))\n\n[![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2FHenryQW%2Fmercury-parser-api.svg?type=large)](https://app.fossa.io/projects/git%2Bgithub.com%2FHenryQW%2Fmercury-parser-api?ref=badge_large)\n","funding_links":[],"categories":["JavaScript"],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHenryQW%2Fmercury-parser-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FHenryQW%2Fmercury-parser-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FHenryQW%2Fmercury-parser-api/lists"}