https://github.com/dbaio/portsfallout
An easy way to search/query the FreeBSD pkg-fallout archive
- Host: GitHub
- URL: https://github.com/dbaio/portsfallout
- Owner: dbaio
- License: bsd-2-clause
- Created: 2020-07-31T01:13:13.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2025-03-11T10:09:03.000Z (12 months ago)
- Last Synced: 2025-09-01T19:54:19.270Z (6 months ago)
- Topics: django, django-application, django-rest-framework, hacktoberfest, python, python-script, python3, scraper, scraping, scraping-python, scraping-websites, scrapy
- Language: Python
- Homepage: https://portsfallout.com/
- Size: 445 KB
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.rst
- Changelog: changelog.rst
- License: LICENSE
Ports Fallout
=============
https://portsfallout.com/
- Django application
- Web crawling (Scrapy)
An easy way to search the FreeBSD pkg-fallout reports.
Be nice!
Running
-------
Install all requirements:
::

    django
    requests
    scrapy
    djangorestframework
    python-dateutil
    dnspython
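Before going further, it can be handy to verify that everything on the list above is actually installed. A minimal stdlib-only sketch (the distribution names are assumed to match the requirements list above):

```python
from importlib import metadata

# Distribution names as listed in the requirements above.
REQUIRED = [
    "django",
    "requests",
    "scrapy",
    "djangorestframework",
    "python-dateutil",
    "dnspython",
]

def missing_requirements(required=REQUIRED):
    """Return the subset of `required` that is not installed."""
    missing = []
    for dist in required:
        try:
            metadata.version(dist)
        except metadata.PackageNotFoundError:
            missing.append(dist)
    return missing

if __name__ == "__main__":
    missing = missing_requirements()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all requirements installed")
```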
Copy the sample ``settings.py`` and configure your database access:
::

    $ cp portsfallout/settings_dev.py portsfallout/settings.py
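The contents of ``settings_dev.py`` are not shown here; the database section of a Django ``settings.py`` generally takes the shape below. All names and credentials are placeholders, and the PostgreSQL backend is only one example of an engine:

```python
# Example database block for portsfallout/settings.py.
# Engine, name, and credentials are placeholders -- adjust to your setup.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "portsfallout",
        "USER": "portsfallout",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "5432",
    }
}
```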
Create initial database:
::

    $ python manage.py migrate
    Operations to perform:
      Apply all migrations: admin, auth, contenttypes, ports, sessions
    Running migrations:
      Applying contenttypes.0001_initial... OK
      Applying auth.0001_initial... OK
      Applying admin.0001_initial... OK
      Applying admin.0002_logentry_remove_auto_add... OK
      Applying admin.0003_logentry_add_action_flag_choices... OK
      Applying contenttypes.0002_remove_content_type_name... OK
      Applying auth.0002_alter_permission_name_max_length... OK
      Applying auth.0003_alter_user_email_max_length... OK
      Applying auth.0004_alter_user_username_opts... OK
      Applying auth.0005_alter_user_last_login_null... OK
      Applying auth.0006_require_contenttypes_0002... OK
      Applying auth.0007_alter_validators_add_error_messages... OK
      Applying auth.0008_alter_user_username_max_length... OK
      Applying auth.0009_alter_user_last_name_max_length... OK
      Applying auth.0010_alter_group_name_max_length... OK
      Applying auth.0011_update_proxy_permissions... OK
      Applying ports.0001_initial... OK
      Applying sessions.0001_initial... OK
Populate database (ports and fallout info):
::

    $ ./scripts/cron-import-index.sh
    $ ./scripts/cron-scrapy.sh
Start web-server:
::

    $ python manage.py runserver
You can also fetch older fallouts:
::

    $ cd scripts

    # Crawl messages from a specific month (verbose)
    $ scrapy runspider -O scrapy_output/2021-May.json \
        -a scrapydate="2021-May" pkgfallout_scrapy_spider.py

    # Then import all .json files into the database
    $ python import-scrapy.py
More info in ``scripts/pkgfallout_scrapy_spider.py``.
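The ``scrapydate`` argument above uses labels like ``2021-May``. Assuming that format (year plus abbreviated English month name, which is what the example suggests), a label for an arbitrary or previous month can be computed with the stdlib:

```python
from datetime import date, timedelta

def month_label(d: date) -> str:
    # "2021-May"-style label, matching the scrapydate example above
    # (assumed format: %Y-%b with English month abbreviations).
    return d.strftime("%Y-%b")

def last_month_label(today: date) -> str:
    # Step back to the last day of the previous month, then format it.
    first_of_month = today.replace(day=1)
    return month_label(first_of_month - timedelta(days=1))
```

For example, ``month_label(date(2021, 5, 1))`` yields ``"2021-May"``, the same label used in the ``scrapy runspider`` command above.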
Cron jobs
---------
Execution for keeping the database always updated:
::

    # Update ports tree reference in the database
    30 0 * * * /portsfallout/scripts/cron-import-index.sh

    # Fetch/import all pkg-fallout's reports from the Mlmmj archive of the
    # current month. Requests are cached, only new fallouts are fetched.
    45 0 * * * /portsfallout/scripts/cron-scrapy.sh

    # Fetch/import pkg-fallout's from the last month
    30 10 * * * /portsfallout/scripts/cron-scrapy.sh lastmonth

    # Update DNS values of the pkg-fallout servers
    45 3 * * * python manage.py server_update
    # Same, with output suppressed:
    # 45 3 * * * python manage.py server_update -v 0