https://github.com/pocesar/actor-diff-datasets
- Host: GitHub
- URL: https://github.com/pocesar/actor-diff-datasets
- Owner: pocesar
- License: apache-2.0
- Created: 2020-10-26T04:17:37.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-11-22T05:39:15.000Z (over 5 years ago)
- Last Synced: 2024-10-18T06:28:34.569Z (over 1 year ago)
- Language: JavaScript
- Size: 66.4 KB
- Stars: 1
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
# Diff datasets
Takes one dataset on the Apify platform (`base`), compares it to another (`other`), and outputs the items from `other` that are missing from `base`.
With a compound key (several `uniqueFields`), it can also output only the items that changed.
Whole nested objects can be used as values: they are `JSON.stringify`'d and then turned into a small, space-efficient, non-cryptographic hash.
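A minimal sketch of how such a compound key could be built, assuming the key is the `JSON.stringify` of all `uniqueFields` values in order, hashed with 32-bit FNV-1a. The README does not name the hash function, so FNV-1a and the helpers `fnv1a32` and `compoundKey` are hypothetical. Because every field feeds the same key, a change in any one of them produces a different hash, which is what makes "changed items" detectable.

```javascript
// Hypothetical sketch: the actor's real hash function is not documented.
// 32-bit FNV-1a over the UTF-16 code units of a string, returned as 8 hex digits.
function fnv1a32(str) {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < str.length; i++) {
    hash ^= str.charCodeAt(i);
    // Multiply by the FNV prime 16777619, wrapping to an unsigned 32-bit value
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, '0');
}

// Build one key from several fields. Nested objects are stringified,
// so a whole object can serve as part of the key.
// For brevity this hypothetical helper reads top-level fields only.
function compoundKey(item, uniqueFields) {
  const values = uniqueFields.map((field) => item[field]);
  return fnv1a32(JSON.stringify(values));
}

const before = { pageUrl: 'https://example.com', price: { amount: 10 } };
const after = { pageUrl: 'https://example.com', price: { amount: 12 } };

// The nested price object changed, so the compound key changes
console.log(compoundKey(before, ['pageUrl', 'price']) === compoundKey(after, ['pageUrl', 'price'])); // false
// Keyed on pageUrl alone, both items look the same
console.log(compoundKey(before, ['pageUrl']) === compoundKey(after, ['pageUrl'])); // true
```

The trade-off of a small non-cryptographic hash is that two different keys can occasionally collide; with 32 bits, a genuinely new item could then be mistaken for an existing one.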
## Example
```js
// Pre-v3 Apify SDK API (`Apify.main`, `Apify.call`)
const Apify = require('apify');

Apify.main(async () => {
    await Apify.call('pocesar/diff-datasets', {
        // The order of "base" and "other" matters:
        // items from "other" whose key already exists in "base" are not output
        baseDatasetId: 'LdNAlaOY1aKGhwAah',
        otherDatasetId: 'Bzu1pgOjenN43VhPY',
        uniqueFields: [
            // simple primitive field value, like string, number, boolean
            'pageUrl',
            // you can use lodash.get notation to get nested items;
            // here `sub.fields.0` works like `sub.fields[0]` for an object like
            // {
            //   pageUrl: "https://pageurl",
            //   sub: {
            //     fields: [
            //       {...},
            //       {...}
            //     ]
            //   }
            // }
            'sub.fields.0',
            // you can also use .length to count array items or string characters
            'sub.fields.length',
            'pageUrl.length',
        ],
    });
});
```
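To illustrate how paths like `sub.fields.0` and `pageUrl.length` resolve, here is a simplified, hypothetical `getPath` helper standing in for lodash.get so the sketch has no dependencies. It only splits on dots; lodash.get itself also accepts bracket notation such as `sub.fields[0]`. Array indexes and `length` work through ordinary property access, on arrays and strings alike.

```javascript
// Hypothetical, simplified stand-in for lodash.get: dot-separated paths only.
function getPath(object, path) {
  let current = object;
  for (const key of path.split('.')) {
    // Stop as soon as a missing intermediate value is reached
    if (current === null || current === undefined) return undefined;
    // Plain property access: "0" indexes arrays, "length" works on arrays and strings
    current = current[key];
  }
  return current;
}

const item = {
  pageUrl: 'https://pageurl',
  sub: { fields: [{ id: 1 }, { id: 2 }] },
};

console.log(getPath(item, 'pageUrl'));           // https://pageurl
console.log(getPath(item, 'sub.fields.0'));      // { id: 1 }
console.log(getPath(item, 'sub.fields.length')); // 2
console.log(getPath(item, 'pageUrl.length'));    // 15
console.log(getPath(item, 'sub.missing.0'));     // undefined
```

Note that `sub.fields.0` resolves to a whole object, which is the case where the value is `JSON.stringify`'d before hashing.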
## Limitations
* Every key read from the `base` dataset is kept in memory, so the more items it has, the more memory the run needs.
* Saving the in-memory `Set` to the key-value store may fail when it holds too many items.
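Both limitations follow from the basic approach: every key from `base` has to be in a `Set` before any item from `other` can be checked. A minimal in-memory sketch of that technique, assuming plain arrays in place of Apify datasets and a hypothetical `keyOf` function in place of the actor's compound-key hashing; it is not the actor's actual code.

```javascript
// Hypothetical sketch: arrays stand in for the two Apify datasets,
// and keyOf stands in for the actor's compound-key hashing.
function diffItems(baseItems, otherItems, keyOf) {
  // Pass 1: remember every key seen in "base".
  // This Set grows linearly with the size of the base dataset.
  const seen = new Set();
  for (const item of baseItems) {
    seen.add(keyOf(item));
  }
  // Pass 2: keep only the "other" items whose key is not in "base".
  return otherItems.filter((item) => !seen.has(keyOf(item)));
}

// Compound key over two fields, so a changed price yields a new key
const keyOf = (item) => JSON.stringify([item.pageUrl, item.price]);

const base = [
  { pageUrl: 'https://a', price: 1 },
  { pageUrl: 'https://b', price: 2 },
];
const other = [
  { pageUrl: 'https://a', price: 1 }, // unchanged: skipped
  { pageUrl: 'https://b', price: 3 }, // price changed: output
  { pageUrl: 'https://c', price: 4 }, // new item: output
];

console.log(diffItems(base, other, keyOf).map((item) => item.pageUrl));
// [ 'https://b', 'https://c' ]
```

Storing short hashes instead of full `JSON.stringify` output keeps each `Set` entry small, but the number of entries still equals the number of distinct keys in `base`.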
## License
Apache 2.0