https://github.com/csev/gmane-cache

This will function as a caching server for gmane content

Cache for the gmane service
---------------------------

This is a front end that caches the content of a mailing list
hosted on gmane.org, primarily to off-load their site when
some other process (e.g. 10,000 students doing their homework)
is going to pound the heck out of a particular mailing list.

You can play with an implementation of this at URLs like

http://gmane.dr-chuck.net/gmane.comp.cms.sakai.devel/12/13

Here, 12 and 13 give a range of message numbers. This caches the
gmane content in a MySQL database on my 1and1 ISP and then the URLs
are further cached using my CloudFlare account. You can compare
this with the original from gmane at:

http://download.gmane.org/gmane.comp.cms.sakai.devel/12/13
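
The two URLs above share one shape: a base, the list name, and two message numbers. A minimal sketch of that pattern follows. The function name `range_url` is made up for illustration and is not part of this repository.

```python
CACHE_BASE = "http://gmane.dr-chuck.net"    # cached front end from this README
ORIGIN_BASE = "http://download.gmane.org"   # original gmane download host


def range_url(base, list_name, start, end):
    """Build <base>/<list>/<start>/<end> for a range of message numbers."""
    return f"{base}/{list_name}/{start}/{end}"


cached = range_url(CACHE_BASE, "gmane.comp.cms.sakai.devel", 12, 13)
origin = range_url(ORIGIN_BASE, "gmane.comp.cms.sakai.devel", 12, 13)
print(cached)
print(origin)
```

Only the base differs between the two, which is why a client can be pointed at the cache by changing a single setting.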

My cached copy scales very nicely and is much quicker after the
messages have been retrieved once from gmane into my 1and1 database.

For fun, take a look at the developer console on my cached copy.
I have a little response header in there to show what is happening
behind the scenes.
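
If you would rather look at the response headers from a script than from the browser, something like the sketch below should work. It lists every header, because the name of the extra header is not given here. Both function names are illustrative only.

```python
import urllib.request


def fetch_headers(url, timeout=10):
    """Fetch a URL and return its response headers as (name, value) pairs."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return list(response.headers.items())


def format_headers(headers):
    """Turn (name, value) pairs into 'Name: value' lines, sorted by name."""
    ordered = sorted(headers, key=lambda pair: pair[0].lower())
    return [f"{name}: {value}" for name, value in ordered]
```

To use it, pass one of the cached URLs to `fetch_headers` and print the lines that `format_headers` returns.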

Configuration
-------------

Copy *config-dist.php* to *config.php* and edit it to set up
the database table and various settings:

    $CFG = new stdClass();

    $CFG->pdo = 'mysql:host=127.0.0.1;port=8889;dbname=gmane'; // MAMP
    $CFG->dbuser = 'fred';
    $CFG->dbpass = 'zap';

    $CFG->expire = 7*24*60*60; // A week
    $CFG->maxtext = 200000;

    // Only add these at the end and keep the same order unless
    // you completely empty out the messages table.
    $ALLOWED = array(
        'gmane.comp.cms.sakai.devel'
    );
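
The Python sketch below illustrates two of these settings. It is not the PHP in this repository, and both rules are assumptions: that `expire` is the number of seconds a cached message stays fresh, and that a list is identified by its position in `$ALLOWED`. The second would explain why the order must not change while the messages table has rows in it. The names `is_stale` and `list_id` are made up for this example.

```python
import time

EXPIRE = 7 * 24 * 60 * 60                  # mirrors $CFG->expire: one week in seconds
ALLOWED = ["gmane.comp.cms.sakai.devel"]   # mirrors $ALLOWED


def is_stale(fetched_at, now=None, expire=EXPIRE):
    """Assumed rule: a cached row is stale once it is older than `expire` seconds."""
    if now is None:
        now = time.time()
    return now - fetched_at > expire


def list_id(list_name, allowed=ALLOWED):
    """Assumed scheme: a list is identified by its position in the allowed list."""
    return allowed.index(list_name)   # raises ValueError if the list is not allowed


print(EXPIRE)                                   # 604800
print(is_stale(fetched_at=0, now=EXPIRE + 1))   # True
print(list_id("gmane.comp.cms.sakai.devel"))    # 0
```

Under that assumption, inserting a new list in the middle of `ALLOWED` would shift the position of every list after it, so stored rows would point at the wrong list.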

Pre-Filling Your Database
-------------------------

You can run the PY4E crawler and point it at your own cache by changing the base URL.

    python3 gmane.py

This is a restartable crawler, as it stores its current state in SQLite. Pitch
the in-progress database when you are done or want to restart the crawl.

    rm content.sqlite
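
To show what "restartable" means here, the sketch below resumes a crawl from whatever is already stored in SQLite. It is not the PY4E `gmane.py` code. The table name `Messages`, the `crawl` function, and the `fake_fetch` stand-in are all hypothetical, and an in-memory database is used so that the example touches no files.

```python
import sqlite3


def crawl(conn, fetch, limit):
    """Fetch up to `limit` messages, resuming after the highest id already stored."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS Messages (id INTEGER PRIMARY KEY, body TEXT)"
    )
    row = conn.execute("SELECT MAX(id) FROM Messages").fetchone()
    start = 1 if row[0] is None else row[0] + 1
    fetched = 0
    for msg_id in range(start, start + limit):
        body = fetch(msg_id)
        if body is None:      # nothing more to retrieve
            break
        conn.execute("INSERT INTO Messages (id, body) VALUES (?, ?)", (msg_id, body))
        conn.commit()         # commit per message so an interrupted run keeps its progress
        fetched += 1
    return fetched


def fake_fetch(msg_id):
    """Stand-in for an HTTP request to <base URL>/<list>/<id>/<id + 1>."""
    return f"message {msg_id}" if msg_id <= 5 else None


conn = sqlite3.connect(":memory:")          # a real crawl would use a file such as content.sqlite
first = crawl(conn, fake_fetch, limit=3)    # stores ids 1 to 3
second = crawl(conn, fake_fetch, limit=3)   # resumes at 4 and stops after 5
print(first, second)
```

Deleting the database file throws away the stored ids, so the next run starts again from the first message.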