https://github.com/eepp/findtb
Tarball finder
- Host: GitHub
- URL: https://github.com/eepp/findtb
- Owner: eepp
- License: mit
- Created: 2020-04-20T16:35:19.000Z (over 4 years ago)
- Default Branch: master
- Last Pushed: 2021-03-12T18:20:14.000Z (over 3 years ago)
- Last Synced: 2024-10-12T00:41:30.074Z (about 1 month ago)
- Topics: crawling, tarball
- Language: Python
- Size: 7.81 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.adoc
- License: LICENSE
README
// Render with Asciidoctor
= findtb
Philippe Proulx
:toc:

image:https://img.shields.io/pypi/v/findtb.svg?label=Latest%20version[link="https://pypi.python.org/pypi/findtb"]

_**findtb**_ is a Python{nbsp}3 package which finds
https://en.wikipedia.org/wiki/Tar_(computing)[tarball] URLs within an
HTML (HTTP) page or FTP directory.

findtb finds tarball URLs directly in the URL's resource as well as in
minor/patch version directories, if any.

== Installation
Install findtb from https://pypi.org/project/findtb/[PyPI]:
----
$ sudo pip3 install findtb
----

== Usage

Use `findtb.find_tarball_urls()` to get a set of all the tarball URLs
found from a given URL:

[source,python]
----
import findtb
import pprint

# find tarball URLs
urls = findtb.find_tarball_urls('ansible',
                                'https://releases.ansible.com/ansible/')

# print URLs
pprint.pprint(urls)
----

----
{'https://releases.ansible.com/ansible/ansible-1.1.tar.gz',
'https://releases.ansible.com/ansible/ansible-1.2.1.tar.gz',
'https://releases.ansible.com/ansible/ansible-1.2.2.tar.gz',
'https://releases.ansible.com/ansible/ansible-1.2.3.tar.gz',
'https://releases.ansible.com/ansible/ansible-1.2.tar.gz',
'https://releases.ansible.com/ansible/ansible-1.3.0.tar.gz',
...
'https://releases.ansible.com/ansible/ansible-2.9.0rc5.tar.gz',
'https://releases.ansible.com/ansible/ansible-2.9.1.tar.gz',
'https://releases.ansible.com/ansible/ansible-2.9.2.tar.gz',
'https://releases.ansible.com/ansible/ansible-2.9.3.tar.gz',
'https://releases.ansible.com/ansible/ansible-2.9.4.tar.gz',
'https://releases.ansible.com/ansible/ansible-2.9.5.tar.gz',
'https://releases.ansible.com/ansible/ansible-2.9.6.tar.gz',
'https://releases.ansible.com/ansible/ansible-2.9.7.tar.gz'}
----

The first parameter is the project's name as expected in the tarball
names. The second parameter is the URL from which to start the search.

You can pass a `logging.Logger` object to `findtb.find_tarball_urls()`.
This can be useful as the search process can be long:

[source,python]
----
import findtb
import pprint
import logging

# configure logging
logging.basicConfig(style='{',
                    format='{asctime} [{name}] {{{levelname}}}: {message}')

# create logger
logger = logging.getLogger('findtb')
logger.setLevel(logging.INFO)

# find tarball URLs
urls = findtb.find_tarball_urls('glib',
                                'ftp://ftp.gnome.org/pub/gnome/sources/glib/',
                                logger)

# print URLs
pprint.pprint(urls)
----

----
2020-04-20 12:08:22,200 [findtb] {INFO}: FTP: connecting to `ftp.gnome.org`.
2020-04-20 12:08:27,908 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/`.
2020-04-20 12:08:28,061 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/`.
2020-04-20 12:08:28,824 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.39`.
2020-04-20 12:08:28,983 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.39`.
2020-04-20 12:08:29,770 [findtb] {INFO}: FTP: changing working directory to `..`.
2020-04-20 12:08:29,922 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.19`.
2020-04-20 12:08:30,079 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.19`.
2020-04-20 12:08:30,871 [findtb] {INFO}: FTP: changing working directory to `..`.
2020-04-20 12:08:31,027 [findtb] {INFO}: FTP: changing working directory to `/pub/gnome/sources/glib/2.60`.
2020-04-20 12:08:31,177 [findtb] {INFO}: FTP: listing files in `/pub/gnome/sources/glib/2.60`.
...
{'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.1/glib-1.1.12.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.1/glib-1.1.15.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.0.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.10.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.2.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.5.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/1.2/glib-1.2.6.tar.gz',
...
'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.4.tar.bz2',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.4.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.5.tar.bz2',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.5.tar.gz',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.6.tar.bz2',
'ftp://ftp.gnome.org/pub/gnome/sources/glib/2.9/glib-2.9.6.tar.gz'}
----

`findtb.find_tarball_urls()` is quite low-level: it returns a set of
strings. Use `findtb.find_tarballs()` to get a set of `findtb.Tarball`
objects instead.

A `findtb.Tarball` object contains useful properties such as `name`
(tarball file name) and `version` (a `packaging.version.Version`
object):

[source,python]
----
import findtb

# find tarballs
tarballs = findtb.find_tarballs('python',
                                'https://www.python.org/ftp/python/')

# print names and versions
for tarball in tarballs:
    print(tarball.name, tarball.version.release)
----

----
Python-3.7.0b2.tgz (3, 7, 0)
Python-3.4.0a1.tgz (3, 4, 0)
Python-2.4.6.tgz (2, 4, 6)
Python-3.3.4rc1.tar.xz (3, 3, 4)
Python-3.1.3rc1.tar.bz2 (3, 1, 3)
python-3.6.0b4-embed-win32.zip None
Python-3.7.0a1.tgz (3, 7, 0)
python-3.5.0rc3-embed-amd64.zip None
...
Python-2.2.1.tgz (2, 2, 1)
python-3.9.0a4-embed-win32.zip None
Python-2.7.1rc1.tgz (2, 7, 1)
Python-3.4.0a3.tgz (3, 4, 0)
Python-3.5.2.tar.xz (3, 5, 2)
Python-3.0a2.tgz (3, 0)
Python-3.6.6rc1.tgz (3, 6, 6)
----

Any networking/parsing error raises `findtb.Error`.

== Limitations
findtb is not guaranteed to work for all projects. Its very
sophisticated algorithms rely on nasty regular expressions and there are
dozens of software versioning schemes in the wild.

Feel free to contribute if findtb does not work for you.
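To illustrate why version extraction from file names is fragile, here is a minimal sketch of a regular expression that pulls a version out of a tarball name. This is not findtb's actual implementation: the pattern and the `guess_version()` helper are hypothetical and only cover plain dotted versions with an optional alphanumeric suffix.

```python
import re

# Hypothetical pattern: `<project>-<dotted version><suffix>` followed by
# `.tar.gz`, `.tar.bz2`, `.tar.xz`, or `.tgz`.
_TARBALL_RE = re.compile(
    r'^(?P<project>.+?)-'
    r'(?P<version>\d+(?:\.\d+)*[0-9A-Za-z]*)'
    r'\.(?:tar\.(?:gz|bz2|xz)|tgz)$',
    re.IGNORECASE,
)


def guess_version(file_name, project):
    """Return the version string found in `file_name`, or None."""
    m = _TARBALL_RE.match(file_name)

    # Reject names that do not match or that belong to another project.
    if m is None or m.group('project').lower() != project.lower():
        return None

    return m.group('version')


for file_name, project in [
    ('ansible-2.9.0rc5.tar.gz', 'ansible'),
    ('Python-3.7.0b2.tgz', 'python'),
    ('glib-2.9.4.tar.bz2', 'glib'),
    ('python-3.6.0b4-embed-win32.zip', 'python'),
]:
    print(file_name, guess_version(file_name, project))
```

A name such as `foo-1.0-rc1.tar.gz` already falls outside this pattern (the helper returns `None`), which hints at why a general solution needs to handle many more cases.
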