https://github.com/npm/download-counts
Background jobs and a minimal service for collecting and delivering download counts
- Host: GitHub
- URL: https://github.com/npm/download-counts
- Owner: npm
- Archived: true
- Created: 2014-02-17T07:27:29.000Z (over 10 years ago)
- Default Branch: master
- Last Pushed: 2020-03-24T19:01:08.000Z (over 4 years ago)
- Last Synced: 2024-05-09T22:12:09.667Z (about 2 months ago)
- Language: JavaScript
- Size: 1.96 MB
- Stars: 328
- Watchers: 36
- Forks: 27
- Open Issues: 12
Metadata Files:
- Readme: README.md
Lists
- awesome-npm - Stats API
README
# npm stats microservice
__Note!__ This code base isn't what npm uses to serve download counts anymore, and its documentation is likely to drift out of correctness as time passes. See [the registry API documentation](https://github.com/npm/registry/blob/master/docs/download-counts.md) for up-to-date usage info.
Gives you download counts. Eventually, maybe other stuff.
Our blog has an explanation of [how npm download counts work](http://blog.npmjs.org/post/92574016600/numeric-precision-matters-how-npm-download-counts), including "what counts as a download?"
## Data source
npm's raw log data is continuously written to a series of buckets on AWS S3. Once per day, soon
after UTC midnight, a map-reduce cluster is spun up that crunches the previous day's logs and
pushes them into the database. Because this is UTC this creates some slightly unintuitive results,
e.g. if you are on the west coast on the 19th of September, the data for the 19th of September will
become available at 5pm (because UTC already moved to the 20th) during the winter, but not until 6pm
during the summer, because the US observes daylight saving time but UTC is fixed.
## Point values
Gets the total downloads for a given period, for all packages or a specific package.
`GET https://api.npmjs.org/downloads/point/{period}[/{package}]`
### Examples
- All packages, last day:
  - /downloads/point/last-day
- All packages, specific date:
  - /downloads/point/2014-02-01
- Package "express", last week:
  - /downloads/point/last-week/express
- Package "express", given date range:
  - /downloads/point/2014-02-01:2014-02-08/express
- Package "jquery", last 30 days:
  - /downloads/point/last-month/jquery
- Package "jquery", specific month:
  - /downloads/point/2014-01-01:2014-01-31/jquery
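The URL patterns in these examples can be assembled programmatically. The following is a minimal sketch, not code from this repository: `pointUrl` is a hypothetical helper name, and the base URL is simply the host shown in the endpoint template.

```javascript
// Hypothetical helper (not part of this code base): builds a
// /downloads/point URL from a period and an optional package name.
const BASE_URL = "https://api.npmjs.org";

function pointUrl(period, pkg) {
  const path = `/downloads/point/${period}`;
  // Leaving out the package asks for the total across all packages.
  return BASE_URL + (pkg ? `${path}/${pkg}` : path);
}

console.log(pointUrl("last-day"));
console.log(pointUrl("2014-02-01:2014-02-08", "express"));
```

Keeping the period as an opaque string means the same helper covers keywords, single dates, and date ranges.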
### Parameters
Acceptable values for `{period}` are a single date (`YYYY-MM-DD`), a date range (`YYYY-MM-DD:YYYY-MM-DD`) as in the examples above, or one of:
- last-day
  - Gets downloads for the last available day. In practice, this will usually be "yesterday" (in UTC), but if stats for that day have not yet landed, it will be the day before.
- last-week
  - Gets downloads for the last 7 available days.
- last-month
  - Gets downloads for the last 30 available days.
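A client may want to check a period string before sending a request. The sketch below is an assumption-laden illustration: `isValidPeriod` is a hypothetical name, the date and range forms are inferred from the examples above, and the check that a range's start does not follow its end is a client-side sanity check rather than documented server behaviour.

```javascript
// Hypothetical validator for the {period} path segment.
const PERIOD_KEYWORDS = new Set(["last-day", "last-week", "last-month"]);
// Checks the YYYY-MM-DD shape only, not calendar validity.
const DATE_SHAPE = /^\d{4}-\d{2}-\d{2}$/;

function isValidPeriod(period) {
  if (PERIOD_KEYWORDS.has(period)) return true;
  const parts = period.split(":");
  if (parts.length > 2) return false;
  if (!parts.every((part) => DATE_SHAPE.test(part))) return false;
  // Zero-padded ISO dates sort correctly as plain strings.
  return parts.length === 1 || parts[0] <= parts[1];
}

console.log(isValidPeriod("last-week"));
console.log(isValidPeriod("2014-02-01:2014-02-08"));
console.log(isValidPeriod("yesterday"));
```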
### Output
The following incredibly simple JSON is the output:
```javascript
{
  "downloads": 31623,
  "start": "2014-01-01",
  "end": "2014-01-31",
  "package": "jquery"
}
```
If you have not specified a package, that key will not be present. The start and end dates are inclusive.
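Because both dates are inclusive, the number of days covered is one more than the difference between them. A minimal sketch of turning a point response into a daily average follows; `inclusiveDays` and `averagePerDay` are hypothetical helper names, and the sample object reuses the values from the output above.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Number of days a response covers, counting both start and end.
function inclusiveDays(start, end) {
  // Date-only ISO strings are parsed as UTC midnight, so the difference
  // is a whole number of days.
  return Math.round((Date.parse(end) - Date.parse(start)) / MS_PER_DAY) + 1;
}

// Average downloads per day for a point response.
function averagePerDay(point) {
  return point.downloads / inclusiveDays(point.start, point.end);
}

const pointSample = {
  downloads: 31623,
  start: "2014-01-01",
  end: "2014-01-31",
  package: "jquery",
};

console.log(inclusiveDays(pointSample.start, pointSample.end)); // 31
console.log(averagePerDay(pointSample).toFixed(1)); // 1020.1
```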
## Ranges
Gets the downloads per day for a given period, for all packages or a specific package.
`GET https://api.npmjs.org/downloads/range/{period}[/{package}]`
### Examples
- Downloads per day, all packages, last 7 days:
  - /downloads/range/last-week
- Downloads per day, all packages, specific date range:
  - /downloads/range/2014-02-07:2014-02-14
- Downloads per day, package "jquery", last 30 days:
  - /downloads/range/last-month/jquery
- Downloads per day, package "jquery", specific date range:
  - /downloads/range/2014-01-03:2014-02-03/jquery
### Parameters
Same as for /downloads/point.
### Output
Responses are very similar to the point API, except that `downloads` is now an array with one object per day, each holding the `day` and the `downloads` count for that day:
```javascript
{
  "downloads": [
    {
      "day": "2014-02-27",
      "downloads": 1904088
    },
    ..
    {
      "day": "2014-03-04",
      "downloads": 7904294
    }
  ],
  "start": "2014-02-25",
  "end": "2014-03-04",
  "package": "somepackage"
}
```
As before, the package key will not be present if you have not specified a package.
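A range response can be reduced back to a total, or scanned for its busiest day. The sketch below is illustrative only: `summarizeRange` is a hypothetical helper, and the sample data contains just the two entries visible in the output above, not a complete response.

```javascript
// Hypothetical helper: totals a range response and finds its busiest day.
function summarizeRange(range) {
  let total = 0;
  let peak = null;
  for (const entry of range.downloads) {
    total += entry.downloads;
    if (peak === null || entry.downloads > peak.downloads) peak = entry;
  }
  return { total, peak };
}

// Illustrative data: only the two entries shown in the sample output.
const rangeSample = {
  downloads: [
    { day: "2014-02-27", downloads: 1904088 },
    { day: "2014-03-04", downloads: 7904294 },
  ],
  start: "2014-02-25",
  end: "2014-03-04",
  package: "somepackage",
};

const summary = summarizeRange(rangeSample);
console.log(summary.total, summary.peak.day); // 9808382 2014-03-04
```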
## Bulk Queries
To perform a bulk query, you can hit the range or point endpoints with a
comma-separated list of packages rather than a single package, e.g.,
`/downloads/point/last-day/npm,express`
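Building such a URL from a list of package names could look like the sketch below. `bulkUrl` is a hypothetical helper, and the host is assumed to be the one used by the point and range endpoints. This document does not describe the response shape for bulk queries, so the sketch stops at the URL.

```javascript
// Hypothetical helper: builds a bulk query URL for either endpoint.
function bulkUrl(endpoint, period, packages) {
  if (endpoint !== "point" && endpoint !== "range") {
    throw new Error(`unknown endpoint: ${endpoint}`);
  }
  if (packages.length === 0) {
    throw new Error("at least one package is required");
  }
  // Commas separate the package names and are left unencoded.
  return `https://api.npmjs.org/downloads/${endpoint}/${period}/${packages.join(",")}`;
}

console.log(bulkUrl("point", "last-day", ["npm", "express"]));
```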
## Development
The code requires Node.js and a MySQL database to talk to. We have a conveniently
pre-configured VM available for download. First, install VirtualBox:
https://www.virtualbox.org/wiki/Downloads
Then install Vagrant:
https://www.vagrantup.com/downloads.html
Now cd into the root of this repo and run:
`vagrant up`
When you see "Done!" you are ready to rock.
### Running the web service
Install dependencies:
`npm install`
You will need a config file:
`cp test/config.dev.js config.js`
For development, you shouldn't need to change anything in it
unless your VM didn't come up at the usual IP (192.168.33.10).
Run the server on port 3000:
`node index.js 3000`
Test that it's working:
`curl "http://localhost:3000/downloads/point/2014-03-01"`
You can ssh into the VM to play with MySQL or whatever:
`vagrant ssh`
### Importing data from S3 (npm, Inc. only)
New data is generated daily and stored in S3. You can get it with the
backfill script like so:
`node scripts/backfill.js YYYY-MM-DD N`
- YYYY-MM-DD is the date you want new data to start from. If omitted, the script starts importing from the first available data, which is a bad idea except when creating a new production host.
- N is the number of days to import after that date. If omitted, the script imports all available days.
So to get everything after April 1, for instance, run:
`node scripts/backfill.js 2014-04-01`
For the AWS JS SDK to work, you must have a `~/.aws/credentials` file
containing a `[default]` profile:
```
[default]
aws_access_key_id = XXXXX
aws_secret_access_key = YYYYY
```
where XXXXX and YYYYY are your AWS access credentials. The production server has
its own credentials specifically for this purpose.