{"id":19232058,"url":"https://github.com/0xibra/blackfeed","last_synced_at":"2025-02-23T11:27:37.552Z","repository":{"id":57415500,"uuid":"227772011","full_name":"0xIbra/blackfeed","owner":"0xIbra","description":"Python package that allows you easily and elastically download thousands of files concurrently.","archived":false,"fork":false,"pushed_at":"2022-04-30T16:03:57.000Z","size":65,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-01-30T23:47:39.221Z","etag":null,"topics":["batch-processing","concurrency","downloader","http","python"],"latest_commit_sha":null,"homepage":"https://pypi.org/project/blackfeed/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/0xIbra.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2019-12-13T06:29:54.000Z","updated_at":"2022-04-30T16:04:01.000Z","dependencies_parsed_at":"2022-09-26T17:40:45.110Z","dependency_job_id":null,"html_url":"https://github.com/0xIbra/blackfeed","commit_stats":null,"previous_names":["ibra-akv/blackfeed"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xIbra%2Fblackfeed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xIbra%2Fblackfeed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xIbra%2Fblackfeed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/0xIbra%2Fblackfeed/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/0xIbra","download_url":"https://codeload.github.com/0xI
bra/blackfeed/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240307371,"owners_count":19780813,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["batch-processing","concurrency","downloader","http","python"],"created_at":"2024-11-09T16:05:13.954Z","updated_at":"2025-02-23T11:27:37.518Z","avatar_url":"https://github.com/0xIbra.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# BlackFeed\n\u003e BlackFeed is a micro Python library that lets you download and upload files concurrently.\n\u003e You can save files locally, or upload them to your cloud storage without writing them to disk.\n\n### Packages required\n\u003e Installed automatically with **pip**\n- requests\n- boto3\n\n## Install\n```bash\npip install blackfeed\n```\n\n## Usage\nDownload and upload files to AWS S3\n**For this to work, AWS CLI must be configured**\n```python\nfrom blackfeed import Downloader\nfrom blackfeed.adapter import S3Adapter\n\nqueue = [\n    {\n        'url': 'https://www.example.com/path/to/image.jpg', # Required\n        'destination': 'some/key/image.jpg' # S3 key - Required \n    },{\n        'url': 'https://www.example.com/path/to/image2.jpg',\n        'destination': 'some/key/image2.jpg' \n    }\n]\n\ndownloader = Downloader(\n    S3Adapter(bucket='bucketname'),\n    multi=True, # If True, uploads files to S3 using multithreading\n    stateless=False, # If False, generates and stores MD5 hashes of files in a state file\n    state_id='flux_states', # name of 
the file where hashes will be stored (states.txt); optional\n    bulksize=200 # Number of concurrent downloads\n)\ndownloader.process(queue)\nstats = downloader.get_stats() # Returns a dict with information about the process\n```\n\n### Download files with states\nLoading states is useful when you don't want to download the same file twice.\n```python\nfrom blackfeed import Downloader\nfrom blackfeed.adapter import S3Adapter\n\nqueue = [\n...\n]\n\ndownloader = Downloader(\n    S3Adapter(bucket='bucketname'),\n    multi=True,\n    stateless=False,\n    state_id='filename'\n)\n\n# You can add a callback function if needed\n# This function will be called after each bulk is processed\ndef callback(responses):\n    # response: {\n    #    'destination': destination of the file (local path or S3 key),\n    #    'url': URL the file was downloaded from,\n    #    'httpcode': HTTP code returned by the server,\n    #    'status': True|False,\n    #    'content-type': MIME type of the downloaded resource, e.g. image/jpeg\n    # }\n    # responses: response[]\n\n    pass # Your logic\n\ndownloader.set_callback(callback)\n\ndownloader.load_states('filename') # This will load states from \"filename.txt\"\ndownloader.process(queue)\nstats = downloader.get_stats() # Statistics \n```\n\n## ElasticDownloader\n\u003e Lets you download/retrieve files from FTP, SFTP and HTTP/S servers easily.\n\n### Examples\n#### Downloading a file from FTP\n```python\nfrom blackfeed import ElasticDownloader\n\nuri = 'ftp://user:password@ftp.server.com/path/to/file.csv'\n\nretriever = ElasticDownloader()\nres = retriever.download(uri, localpath='/tmp/myfile.csv') # localpath is optional\n# .download() returns False if there was an error, or the local path of the downloaded file on success.\nprint(res)\n```\n```bash\n/tmp/myfile.csv\n```\n\n#### Retrieving the binary content of a file from FTP\n```python\nfrom blackfeed import ElasticDownloader\n\nuri = 
'ftp://user:password@ftp.server.com/path/to/file.csv'\n\nretriever = ElasticDownloader()\nres = retriever.retrieve(uri) # Return type: io.BytesIO | False\n\nwith open('/tmp/myfile.csv', 'wb') as f:\n    f.write(res.getvalue())\n```\n**ElasticDownloader** can handle FTP, SFTP and HTTP URIs automatically.\nUse the **download** method to save a file locally, and the **retrieve** method to get the binary content of a file.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xibra%2Fblackfeed","html_url":"https://awesome.ecosyste.ms/projects/github.com%2F0xibra%2Fblackfeed","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2F0xibra%2Fblackfeed/lists"}