Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/openknowledge-archive/activityapi

[Deprecated] An API which aggregates online activity of the Open Knowledge
https://github.com/openknowledge-archive/activityapi

Last synced: 4 months ago
JSON representation

[Deprecated] An API which aggregates online activity of the Open Knowledge

Host: GitHub
URL: https://github.com/openknowledge-archive/activityapi
Owner: openknowledge-archive
Archived: true
Created: 2012-07-31T23:30:24.000Z (over 12 years ago)
Default Branch: master
Last Pushed: 2021-12-13T19:39:45.000Z (about 3 years ago)
Last Synced: 2024-08-01T12:18:53.683Z (7 months ago)
Language: CSS
Homepage:
Size: 300 KB
Stars: 18
Watchers: 23
Forks: 1
Open Issues: 11
Metadata Files:
- Readme: README.md

Awesome Lists containing this project

awesome-starred - openknowledge-archive/activityapi - [Deprecated] An API which aggregates online activity of the Open Knowledge (others)

README

# OKFN Activity API

http://activityapi.herokuapp.com

http://okfnlabs.org/dashboard

## Local installation

#### Prerequisites:

# Create a Postgres database and configure an environment variable:
export DATABASE_URL=postgresql://user:pass@localhost/mydatabase

#### Run the front-end

# Install the dashboard requirements
virtualenv env.dashboard
. env.dashboard/bin/activate
pip install -r requirements.txt
# Start the server
./frontend.py

#### Run the Twitter scraper

# Grab a daily snapshot of twitter statistics for our accounts.
# The account list is coded into the script.
./snapshot_twitter.py --verbose

#### Run the Facebook scraper

# Grab a daily snapshot of facebook likes statistics for our account.
./snapshot_facebook.py --verbose

#### Run the Google Analytics scraper

# This runs a daily job to snapshot the web hits of all our websites.
# You'll need to mess around with getting OAuth set up to create a sample.dat authentication file.
# Eg. try the example scripts from: https://developers.google.com/analytics/solutions/articles/hello-analytics-api'
# We store this file into an environment variable, GOOGLEANALYTICS_AUTH.
# If you're hooked up to heroku, you can just steal it from there (heroku run 'echo $GOOGLEANALYTICS_AUTH')
./snapshot_googleanalytics.py daily --verbose

#### Run the Github scraper

# This runs a daily job to snapshot the watchers, forks and other statistics of our repositories.
./snapshot_github.py daily --verbose

#### Run the Mailman scraper

# The scraper connects to the public Mailman archives to download user activity.
./getactivity_mailman.py --verbose

# It also runs a daily job to snapshot the number of subscribers and daily posts to each mailing list.
# This accesses the subscriber roster, which requires a password set in an environment variable.
# NOTE: This will record 0 posts-per-day unless the activity scraper has been run first!
export MAILMAN_ADMIN=password1234...
./snapshot_mailman.py --verbose

## Hooking up to Heroku

The system is deployed on Heroku. `Procfile` declares the web frontend to be run. The Scheduler add-on runs each of the `snapshot_*.py` and `getactivity_*.py` scripts at regular intervals to keep the database up-to-date.

Installation:

* Get set up with the Heroku toolbelt
* Get your Heroku account authorized as a collaborator on the activityapi project
* Add the heroku remote and start pushing. Try to keep it in sync with Github!

git remote add heroku [email protected]:activityapi.git

## Interactive Mode

You can interact with the database and scrapers via Python's interactive terminal. For example:

(%activityapi)$ DATABASE_URL=postgresql://localhost/zephod python
Python 2.7.1 (r271:86832, Jun 16 2011, 16:59:05)
[GCC 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2335.15.00)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from lib.backend import model,Session

Pull snapshots from the database:

>>> query = Session.query(model.SnapshotOfFacebook).order_by(model.SnapshotOfFacebook.timestamp.desc()).limit(5)
>>> query[0].toJson()
{'timestamp': '2013-02-20', 'likes': 2840}
>>>
>>> from pprint import pprint
>>> pprint( [ x.toJson() for x in query ] )
[{'likes': 2840, 'timestamp': '2013-02-20'},
{'likes': 2820, 'timestamp': '2013-02-18'},
{'likes': 2812, 'timestamp': '2013-02-17'},
{'likes': 2798, 'timestamp': '2013-02-16'},
{'likes': 2775, 'timestamp': '2013-02-15'}]