{"id":16817462,"url":"https://github.com/mike442144/seenreq","last_synced_at":"2025-08-31T20:33:14.925Z","repository":{"id":31808142,"uuid":"35374750","full_name":"mike442144/seenreq","owner":"mike442144","description":"Generate an object for testing if a request is sent, request is Mikeal's request.","archived":false,"fork":false,"pushed_at":"2020-10-15T09:46:51.000Z","size":61,"stargazers_count":44,"open_issues_count":4,"forks_count":9,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-27T17:35:25.639Z","etag":null,"topics":["crawler","duplicates-removed","post","request","spider","url"],"latest_commit_sha":null,"homepage":"","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mike442144.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-05-10T14:44:26.000Z","updated_at":"2024-01-22T09:28:24.000Z","dependencies_parsed_at":"2022-08-07T16:31:08.215Z","dependency_job_id":null,"html_url":"https://github.com/mike442144/seenreq","commit_stats":null,"previous_names":[],"tags_count":2,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mike442144%2Fseenreq","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mike442144%2Fseenreq/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mike442144%2Fseenreq/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mike442144%2Fseenreq/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mike442144","download_url":"https://codeload.github.com/mike442144/seenreq/tar.gz/refs/head
s/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":243841207,"owners_count":20356443,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","duplicates-removed","post","request","spider","url"],"created_at":"2024-10-13T10:47:16.872Z","updated_at":"2025-03-17T03:31:42.612Z","avatar_url":"https://github.com/mike442144.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"[![NPM](https://nodei.co/npm/seenreq.png?downloads=true\u0026downloadRank=true\u0026stars=true)](https://nodei.co/npm/seenreq/)\n\n[![build status](https://secure.travis-ci.org/mike442144/seenreq.png)](https://travis-ci.org/mike442144/seenreq)\n[![Dependency Status](https://david-dm.org/mike442144/seenreq/status.svg)](https://david-dm.org/mike442144/seenreq)\n[![NPM download][download-image]][download-url]\n[![NPM quality][quality-image]][quality-url]\n\n[quality-image]: http://npm.packagequality.com/shield/seenreq.svg?style=flat-square\n[quality-url]: http://packagequality.com/#?package=seenreq\n[download-image]: https://img.shields.io/npm/dm/seenreq.svg?style=flat-square\n[download-url]: https://npmjs.org/package/seenreq\n\n# seenreq\nA library to test if a url/request is crawled, usually used in a web crawler. Compatible with [request](https://github.com/request/request) and [node-crawler](https://github.com/bda-research/node-crawler). The 1.x or newer version has quite different APIs and is not compatible with 0.x versions. 
Please read the [upgrade guide](./UPGRADE.md) document.\n\n# Table of Contents\n\n* [Quick Start](#quick-start)\n  * [Installation](#installation)\n  * [Basic Usage](#basic-usage)\n  * [Use Redis](#use-redis)\n  * [Use MongoDB](#use-mongodb)\n* [Class:seenreq](#classseenreq)\n  * [seen.initialize()](#seeninitialize)\n  * [seen.normalize(uri|option[,options])](#seennormalizeurioptionoptions)\n  * [seen.exists(uri|option|array[,options])](#seenexistsurioptionarrayoptions)\n  * [seen.dispose()](#seendispose)\n* [Options](#options)\n\n## Quick Start\n\n### Installation\n\n    $ npm install seenreq --save\n\n### Basic Usage\n\n```javascript\nconst seenreq = require('seenreq')\n, seen = new seenreq();\n\n// url to be normalized\nlet url = \"http://www.GOOGLE.com\";\nconsole.log(seen.normalize(url));//{ sign: \"GET http://www.google.com/\\r\\n\", options: {} }\n\n// request options to be normalized\nlet option = {\n    uri: 'http://www.GOOGLE.com',\n    rupdate: false\n};\n\nconsole.log(seen.normalize(option));//{sign: \"GET http://www.google.com/\\r\\n\", options:{rupdate: false} }\n\nseen.initialize().then(()=\u003e{\n    return seen.exists(url);\n}).then( (rst) =\u003e {\n    console.log(rst[0]);// false if this `request` has never been seen\n    return seen.exists(option);\n}).then( (rst) =\u003e {\n    console.log(rst[0]);// true if the same `request` was seen before\n}).catch( (e) =\u003e {\n    console.error(e);\n});\n```\nWhen you call `exists`, the module normalizes the request first and then checks whether it has been seen.\n\n### Use Redis\n`seenreq` stores keys in memory by default, so memory usage grows as the number of keys increases. Storing keys in Redis solves this problem. Because seenreq uses `ioredis` as its Redis client, all `ioredis` [options](https://github.com/luin/ioredis/blob/master/API.md) are accepted and supported.
You should first install:\n\n```bash\nnpm install seenreq-repo-redis --save\n```\nand then set repo to `redis`:\n\n```javascript\nconst seenreq = require('seenreq')\nlet seen = new seenreq({\n    repo:'redis',// use redis instead of memory\n    host:'127.0.0.1',\n    port:6379,\n    clearOnQuit:false // whether to clear the redis cache when calling dispose(), default true.\n});\n\nseen.initialize().then(()=\u003e{\n    //do stuff...\n}).catch( (e) =\u003e {\n    console.error(e);\n});\n```\n\n### Use MongoDB\nSetup is similar to Redis above:\n\n```bash\nnpm install seenreq-repo-mongo --save\n```\n\n```javascript\nconst seenreq = require('seenreq')\nlet seen = new seenreq({\n    repo:'mongo',\n    url:'mongodb://xxx/seenreq',\n    collection: 'foor'\n});\n```\n\n\n## Class:seenreq\n\nInstance of seenreq\n\n### __seen.initialize()__\nInitializes the repo. Returns a promise.\n\n### __seen.normalize(uri|option[,options])__\n * `uri` is a String, `option` is an Option of [request](https://github.com/request/request) or [node-crawler](https://github.com/bda-research/node-crawler)\n * [options](#options)\n\nReturns a normalized Object: {sign,options}.\n\n### __seen.exists(uri|option|array[,options])__\n * uri|option|array\n * [options](#options)\n\nReturns a promise with a Boolean array, e.g. [true, false, true, false, false].\n\n### __seen.dispose()__\n\nDisposes the resources of the repo. If you are using a repo other than memory, such as Redis, you should call `dispose` to release the connection. Returns a promise.\n\n## Options\n\n * removeKeys: Array, keys to ignore during normalization. For instance, a URL like `http://www.xxx.com/index?ts=1442382602504` has a `ts` property that is a timestamp, and the URL should be treated as the same whenever you visit it.\n * stripFragment: Boolean, Remove the fragment at the end of the URL (Default true).\n * rupdate: Boolean, short for `repo update`.
Store the request in the repo so that `seenreq` can match the same `req` next time (Default true).\n\n# Roadmap\n * Add a `mysql` repo to persist keys to disk.\n * Add key lifetime management.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmike442144%2Fseenreq","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmike442144%2Fseenreq","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmike442144%2Fseenreq/lists"}