Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

https://github.com/npm/download-counts

Background jobs and a minimal service for collecting and delivering download counts

Host: GitHub
URL: https://github.com/npm/download-counts
Owner: npm
Archived: true
Created: 2014-02-17T07:27:29.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2020-03-24T19:01:08.000Z (over 4 years ago)
Last Synced: 2024-05-09T22:12:09.667Z (about 2 months ago)
Language: JavaScript
Size: 1.96 MB
Stars: 328
Watchers: 36
Forks: 27
Open Issues: 12
Metadata Files:
- Readme: README.md

Lists

awesome-npm - Stats API

README

# npm stats microservice

__Note!__ This code base isn't what npm uses to serve download counts anymore, and its documentation is likely to drift out of correctness as time passes. See [the registry API documentation](https://github.com/npm/registry/blob/master/docs/download-counts.md) for up-to-date usage info.

Gives you download counts. Eventually, maybe other stuff.

Our blog has an explanation of [how npm download counts work](http://blog.npmjs.org/post/92574016600/numeric-precision-matters-how-npm-download-counts), including "what counts as a download?"

## Data source

npm's raw log data is continuously written to a series of buckets on AWS S3. Once per day, soon
after UTC midnight, a map-reduce cluster is spun up that crunches the previous day's logs and
pushes the results into the database. Because days are counted in UTC, this produces some slightly
unintuitive results. For example, if you are on the US west coast on the 19th of September, the data
for the 19th of September becomes available at 5pm during the winter (because UTC has already moved
on to the 20th), but not until 6pm during the summer, because the US observes daylight saving time
while UTC is fixed.
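To see where the one-hour seasonal shift comes from, the sketch below converts UTC midnight into Pacific time for a winter and a summer date. It only shows when the UTC day rolls over; the data itself lands some time after that, once the daily job has finished. The function name `utcMidnightLocalHour` and the `America/Los_Angeles` zone are illustrative choices, not part of this service.

```javascript
// Returns the local hour (0-23) at which the given UTC day begins
// (00:00 UTC) in the given IANA time zone.
function utcMidnightLocalHour(utcDay, timeZone) {
  const midnightUtc = new Date(utcDay + 'T00:00:00Z');
  const parts = new Intl.DateTimeFormat('en-US', {
    timeZone,
    hour: '2-digit',
    hourCycle: 'h23',
  }).formatToParts(midnightUtc);
  return Number(parts.find((p) => p.type === 'hour').value);
}

// Winter: Pacific Standard Time is UTC-8.
console.log(utcMidnightLocalHour('2014-01-20', 'America/Los_Angeles')); // 16
// Summer: Pacific Daylight Time is UTC-7.
console.log(utcMidnightLocalHour('2014-09-20', 'America/Los_Angeles')); // 17
```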

## Point values

Gets the total downloads for a given period, for all packages or a specific package.

`GET https://api.npmjs.org/downloads/point/{period}[/{package}]`

### Examples

- All packages, last day: `/downloads/point/last-day`
- All packages, specific date: `/downloads/point/2014-02-01`
- Package "express", last week: `/downloads/point/last-week/express`
- Package "express", given 7-day period: `/downloads/point/2014-02-01:2014-02-08/express`
- Package "jquery", last 30 days: `/downloads/point/last-month/jquery`
- Package "jquery", specific month: `/downloads/point/2014-01-01:2014-01-31/jquery`

### Parameters

Acceptable values for `{period}` are:

- `last-day`: Gets downloads for the last available day. In practice, this will usually be "yesterday" (in GMT), but if stats for that day have not yet landed, it will be the day before.
- `last-week`: Gets downloads for the last 7 available days.
- `last-month`: Gets downloads for the last 30 available days.
- A specific date (`YYYY-MM-DD`) or a date range (`YYYY-MM-DD:YYYY-MM-DD`), as in the examples above.
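A client may want to check a period string before sending a request. The following is a minimal sketch that accepts the three named periods plus the date and date-range forms used in the examples. The helper names `isValidDay` and `isValidPeriod` are hypothetical, and the rule that a range must not end before it starts is an assumption rather than documented behaviour of the service.

```javascript
const NAMED_PERIODS = ['last-day', 'last-week', 'last-month'];
const DAY_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

// True if `s` is a real calendar day written as YYYY-MM-DD.
function isValidDay(s) {
  if (!DAY_PATTERN.test(s)) return false;
  const d = new Date(s + 'T00:00:00Z');
  // Reject unparseable dates and dates that roll over (e.g. February 30).
  return !Number.isNaN(d.getTime()) && d.toISOString().slice(0, 10) === s;
}

// True for a named period, a single day, or a start:end range.
function isValidPeriod(period) {
  if (NAMED_PERIODS.includes(period)) return true;
  const days = period.split(':');
  if (days.length > 2 || !days.every(isValidDay)) return false;
  // Assumption: a range must not end before it starts.
  // ISO dates compare correctly as plain strings.
  return days.length === 1 || days[0] <= days[1];
}

console.log(isValidPeriod('last-week'));             // true
console.log(isValidPeriod('2014-02-01:2014-02-08')); // true
console.log(isValidPeriod('last-year'));             // false
```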

### Output

The following incredibly simple JSON is the output:

```javascript
{
  "downloads": 31623,
  "start": "2014-01-01",
  "end": "2014-01-31",
  "package": "jquery"
}
```

If you have not specified a package, that key will not be present. The start and end dates are inclusive.
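As a rough illustration of calling the point endpoint from JavaScript, the sketch below builds the URL and reads the JSON body. The names `pointUrl` and `getPointDownloads` are made up for this example, and it assumes a runtime with a global `fetch` (Node.js 18+ or a browser) and that the endpoint still responds as described here.

```javascript
const API_ROOT = 'https://api.npmjs.org';

// Builds the point URL; `pkg` is optional (omit it for all packages).
function pointUrl(period, pkg) {
  const base = `${API_ROOT}/downloads/point/${period}`;
  return pkg ? `${base}/${pkg}` : base;
}

// Fetches the total for a period and returns the parsed JSON body.
// Assumes a global `fetch` is available.
async function getPointDownloads(period, pkg) {
  const res = await fetch(pointUrl(period, pkg));
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  return res.json();
}

// Usage (performs a network request, so it is left commented out):
// getPointDownloads('last-month', 'jquery')
//   .then((body) => console.log(body.downloads, body.start, body.end));
```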

## Ranges

Gets the downloads per day for a given period, for all packages or a specific package.

`GET https://api.npmjs.org/downloads/range/{period}[/{package}]`

### Examples

- Downloads per day, last 7 days: `/downloads/range/last-week`
- Downloads per day, specific 7 days: `/downloads/range/2014-02-07:2014-02-14`
- Downloads per day, last 30 days: `/downloads/range/last-month/jquery`
- Downloads per day, specific 30 day period: `/downloads/range/2014-01-03:2014-02-03/jquery`

### Parameters

Same as for /downloads/point.

### Output

Responses are very similar to the point API, except that `downloads` is now an array of days, with the download count for each day:

```javascript
{
  "downloads": [
    {
      "day": "2014-02-27",
      "downloads": 1904088
    },
    ..
    {
      "day": "2014-03-04",
      "downloads": 7904294
    }
  ],
  "start": "2014-02-25",
  "end": "2014-03-04",
  "package": "somepackage"
}
```

As before, the package key will not be present if you have not specified a package.
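A range response can be collapsed into a single total by summing the per-day counts, which one would expect to line up with the point value for the same period. The sketch below does that on a hand-written sample object; the function name `totalDownloads` and the sample numbers are invented for illustration.

```javascript
// Sums the per-day counts of a range response into one total.
function totalDownloads(rangeResponse) {
  return rangeResponse.downloads.reduce(
    (sum, entry) => sum + entry.downloads,
    0
  );
}

// Hand-written sample in the shape shown above (numbers are made up).
const sampleRange = {
  downloads: [
    { day: '2014-03-03', downloads: 10 },
    { day: '2014-03-04', downloads: 32 },
  ],
  start: '2014-03-03',
  end: '2014-03-04',
  package: 'somepackage',
};

console.log(totalDownloads(sampleRange)); // 42
```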

## Bulk Queries

To perform a bulk query, you can hit the range or point endpoints with a
comma-separated list of packages rather than a single package, e.g.,

`/downloads/point/last-day/npm,express`
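Building such a path from a list of package names could look like the sketch below. The helper name `bulkPath` is hypothetical. The response shape for bulk queries is not described here, so the example stops at constructing the path.

```javascript
// Builds a bulk query path for the point or range endpoint.
function bulkPath(kind, period, packages) {
  if (kind !== 'point' && kind !== 'range') {
    throw new Error('kind must be "point" or "range"');
  }
  if (!Array.isArray(packages) || packages.length === 0) {
    throw new Error('packages must be a non-empty array');
  }
  return `/downloads/${kind}/${period}/${packages.join(',')}`;
}

console.log(bulkPath('point', 'last-day', ['npm', 'express']));
// /downloads/point/last-day/npm,express
```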

## Development

The code requires Node.js and a MySQL database to talk to. We have a conveniently
pre-configured VM available for download. First, install VirtualBox:

https://www.virtualbox.org/wiki/Downloads

And then install Vagrant:

https://www.vagrantup.com/downloads.html

Now just `cd` into the root of this repo and run:

`vagrant up`

When you see "Done!" you are ready to rock.

### Running the web service

Install dependencies:

`npm install`

You will need a config file:

`cp test/config.dev.js config.js`

For development, you shouldn't need to change anything in here
unless your VM didn't come up at the usual IP (192.168.33.10).

Run the server on port 3000:

`node index.js 3000`

Test that it's working:

`curl "http://localhost:3000/downloads/point/2014-03-01"`

You can ssh into the VM to play with MySQL or whatever:

`vagrant ssh`

### Importing data from S3 (npm, Inc. only)

New data is generated daily and stored in S3. You can get it with the
backfill script like so:

`node scripts/backfill.js YYYY-MM-DD N`

`YYYY-MM-DD` is the date you want new data to start. If omitted,
it will start importing from the first available data, which is
a bad idea except when creating a new production host.

`N` is the number of days to import after that date. If omitted,
it will import all available days. So to get everything after
April 1, for instance, run:

`node scripts/backfill.js 2014-04-01`

For the AWS JS SDK to work, you must have a `~/.aws/credentials` file
containing:

```
[default]
aws_access_key_id = XXXXX
aws_secret_access_key = YYYYY
```

Here `XXXXX` and `YYYYY` stand for your AWS access key ID and secret access key.
The production server has its own credentials specifically for this purpose.