{"id":31956283,"url":"https://github.com/davanstrien/data-lifeboat-converter","last_synced_at":"2026-04-02T02:05:45.068Z","repository":{"id":300275858,"uuid":"1004904420","full_name":"davanstrien/data-lifeboat-converter","owner":"davanstrien","description":"Convert Data Lifeboats from Flickr Foundation to HuggingFace datasets and Spaces","archived":false,"fork":false,"pushed_at":"2025-06-20T10:41:38.000Z","size":238,"stargazers_count":2,"open_issues_count":1,"forks_count":0,"subscribers_count":0,"default_branch":"main","last_synced_at":"2025-10-23T19:52:00.845Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/davanstrien.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-06-19T11:07:10.000Z","updated_at":"2025-08-30T23:46:26.000Z","dependencies_parsed_at":"2025-06-20T20:13:06.954Z","dependency_job_id":"c1ce5084-1f6e-4b3c-abce-bad48825f8f0","html_url":"https://github.com/davanstrien/data-lifeboat-converter","commit_stats":null,"previous_names":["davanstrien/data-lifeboat-converter"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/davanstrien/data-lifeboat-converter","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fdata-lifeboat-converter","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fdata-lifeboat-converter/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fdata-lifeboat-converter/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fdata-lifeboat-converter/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/davanstrien","download_url":"https://codeload.github.com/davanstrien/data-lifeboat-converter/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/davanstrien%2Fdata-lifeboat-converter/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":31294398,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-04-02T01:43:37.129Z","status":"online","status_checked_at":"2026-04-02T02:00:08.535Z","response_time":89,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-10-14T14:48:18.646Z","updated_at":"2026-04-02T02:05:45.021Z","avatar_url":"https://github.com/davanstrien.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Data Lifeboat to HuggingFace Converter\n\nConvert [Data Lifeboats](https://www.flickr.org/new-research-report-on-data-lifeboat/) from the [Flickr Foundation](https://www.flickr.org/) into HuggingFace datasets and interactive Spaces for preservation, research, and machine learning.\n\n## 🚀 Quick Start\n\n### Installation Options\n\nWe suggest using [`uv`](https://docs.astral.sh/uv/) for installing this tool. 
\n\n#### Option 1: Install Tool Globally (Recommended)\n\n```bash\n# Install uv (if not already installed)\ncurl -LsSf https://astral.sh/uv/install.sh | sh\n\n# Install the tool globally from GitHub\nuv tool install git+https://github.com/davanstrien/data-lifeboat-converter.git\n\n# Now use it anywhere\nlifeboat-to-hf path/to/Data_Lifeboat --push-to-hub username/dataset-name\n```\n\n#### Option 2: Run Directly from GitHub\n\n```bash\n# Run without installing (uvx runs the tool in a temporary environment)\nuvx --from git+https://github.com/davanstrien/data-lifeboat-converter.git lifeboat-to-hf path/to/Data_Lifeboat --push-to-hub username/dataset-name\n```\n\n#### Option 3: Clone and Run Locally\n\n```bash\n# Clone the repository and run the script with uv\ngit clone https://github.com/davanstrien/data-lifeboat-converter.git\ncd data-lifeboat-converter\nuv run lifeboat_to_hf_dataset.py path/to/Data_Lifeboat --push-to-hub username/dataset-name\n```\n\n### Basic Usage\n\n```bash\n# Upload dataset to HuggingFace Hub (creates both raw and processed versions)\nlifeboat-to-hf path/to/Data_Lifeboat --push-to-hub username/dataset-name\n\n# Upload the dataset and create an interactive Space in one command\nlifeboat-to-hf path/to/Data_Lifeboat --push-to-hub username/dataset-name --create-space username/space-name\n\n# Save dataset locally for testing\nlifeboat-to-hf path/to/Data_Lifeboat --save-local ./output\n```\n\n## 📦 What This Tool Creates\n\n### 1. **Raw Data Lifeboat** (`repo-name-raw`)\n- **Purpose:** Digital preservation in original format\n- **Contents:** Complete Data Lifeboat archive with built-in viewer\n- **Access:** `data/LIFEBOAT_NAME/viewer/index.html`\n- **Use cases:** Archival research, digital humanities, web archaeology\n\n### 2. **Processed Dataset** (`repo-name`)\n- **Purpose:** Machine learning and analysis\n- **Contents:** HuggingFace Dataset with Image features and structured metadata\n- **Access:** Loadable with the standard `datasets` library\n- **Use cases:** Computer vision, NLP research, data analysis\n\n### 3. **Interactive Space** (optional)\n- **Purpose:** Web-based Data Lifeboat viewer\n- **Technology:** Dynamic Docker Space that downloads the Data Lifeboat at runtime\n- **Features:** Full Data Lifeboat functionality; handles multi-gigabyte collections\n- **Use cases:** Public access, demos, collaborative research\n\n## 📋 Examples\n\n### Upload Datasets\n\n```bash\n# Upload both raw and processed versions (default)\nlifeboat-to-hf Commons_1K_2025 --push-to-hub myusername/flickr-commons-1k\n\n# Upload only the raw Data Lifeboat\nlifeboat-to-hf Commons_1K_2025 --push-to-hub myusername/flickr-commons-1k --raw-only\n\n# Upload only the processed dataset\nlifeboat-to-hf Commons_1K_2025 --push-to-hub myusername/flickr-commons-1k --processed-only\n\n# Make repositories private\nlifeboat-to-hf Commons_1K_2025 --push-to-hub myusername/flickr-commons-1k --private\n```\n\n### Create Interactive Spaces\n\n```bash\n# Create a Space (the raw dataset repo is auto-detected from the upload)\nlifeboat-to-hf Commons_1K_2025 --push-to-hub myusername/dataset --create-space myusername/viewer\n\n# Create a Space from an existing raw dataset repo\nlifeboat-to-hf Commons_1K_2025 --create-space myusername/viewer --raw-dataset-repo-id myusername/existing-raw\n\n# Create a private Space\nlifeboat-to-hf Commons_1K_2025 --push-to-hub myusername/dataset --create-space myusername/viewer --private\n```\n\n### Local Development\n\n```bash\n# Save locally for testing\nlifeboat-to-hf Commons_1K_2025 --save-local ./my-dataset\n\n# Save only the processed dataset locally\nlifeboat-to-hf Commons_1K_2025 --save-local ./output --processed-only\n```\n\n## 🔧 Command Options\n\n| Option | Description | Example |\n|--------|-------------|---------|\n| `lifeboat_path` | Path to Data Lifeboat directory (required) | `Commons_1K_2025` |\n| `--push-to-hub` | Upload to HuggingFace Hub | `username/dataset-name` |\n| `--create-space` | Create an interactive Space | `username/space-name` |\n| `--raw-dataset-repo-id` | Raw dataset repo used by the Space (auto-detected if not specified) | `username/raw-repo` |\n| `--save-local` | Save dataset locally instead of uploading | `./output-directory` |\n| `--private` | Make repositories private | (flag) |\n| `--raw-only` | Upload only the raw Data Lifeboat | (flag) |\n| `--processed-only` | Upload only the processed dataset | (flag) |\n\n## 🌐 Interactive Spaces\n\nDynamic Spaces download the Data Lifeboat at runtime, which lets them host large, multi-gigabyte collections:\n\n### How They Work\n1. A user visits the Space URL\n2. On startup, the Docker container downloads the raw Data Lifeboat from the HuggingFace Hub\n3. An HTTP server serves the complete archive\n4. All Data Lifeboat features work: viewer, search, metadata browsing\n\n### Benefits\n- **Large collections** - Can host multi-gigabyte Data Lifeboats\n- **Automatic provisioning** - Content is downloaded when the Space starts\n- **Archival integrity** - Serves the Data Lifeboat exactly as created\n- **Free hosting** - Leverages HuggingFace infrastructure\n\n## 📊 Dataset Features\n\n### Processed Dataset Structure\n```python\nfrom datasets import load_dataset\n\n# Load the dataset (returns a DatasetDict keyed by split)\ndataset = load_dataset(\"username/dataset-name\")\n\n# List the columns of the train split\nprint(dataset['train'].column_names)\n# ['image', 'thumbnail', 'photo_id', 'title', 'description', \n#  'uploader_username', 'license_label', 'date_taken', 'tags', ...]\n\n# Work with images and metadata\nfor example in dataset['train']:\n    image = example['image']  # PIL Image\n    title = example['title']\n    tags = example['tags']    # List of tag strings\n    # ... analysis code\n```\n\n### Rich Metadata Included\n- **Images:** Original photos + thumbnails\n- **Descriptive:** Titles, descriptions, tags\n- **Technical:** Upload dates, formats, dimensions\n- **Social:** View counts, favorites, comments\n- **Geographic:** Latitude/longitude (when available)\n- **Legal:** License information and URLs\n\n## 🔐 Authentication\n\nTo upload, or to access private repositories, authenticate with HuggingFace:\n\n```bash\n# Log in to the HuggingFace Hub\nhuggingface-cli login\n\n# Or set the token as an environment variable\nexport HUGGINGFACE_HUB_TOKEN=\"your_token_here\"\n```\n\n## 🐛 Troubleshooting\n\n### Common Issues\n\n**\"Repository not found\" errors:**\n- Ensure you're authenticated for private repos\n- Check that repository names are spelled correctly\n- Verify that the raw dataset exists when creating Spaces\n\n**Memory issues with large collections:**\n- Use `--raw-only` for very large Data Lifeboats\n- Process smaller subsets locally first with `--save-local`\n\n**Space startup failures:**\n- Check that the raw dataset repository exists and is accessible\n- Review the Space logs in the HuggingFace Spaces interface\n\n### Getting Help\n\n1. Check the Space logs on HuggingFace for runtime errors\n2. Verify your Data Lifeboat has the expected structure\n3. Test locally with `--save-local` before uploading\n\n## 📚 About Data Lifeboats\n\nData Lifeboats are self-contained digital preservation archives created by the [Flickr Foundation](https://www.flickr.org/). 
They preserve not just images, but complete social and cultural context:\n\n- ✅ Original high-quality photos\n- ✅ Complete metadata (titles, descriptions, tags, dates, locations)\n- ✅ Interactive web viewer (no external dependencies)\n- ✅ Structured data for research and analysis\n\nLearn more about Data Lifeboats and their role in digital preservation at [flickr.org](https://www.flickr.org/).\n\n---\n\n*This tool helps bridge digital preservation (Data Lifeboats) with modern ML/research infrastructure (HuggingFace), ensuring valuable cultural collections remain accessible for future generations.*\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavanstrien%2Fdata-lifeboat-converter","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdavanstrien%2Fdata-lifeboat-converter","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdavanstrien%2Fdata-lifeboat-converter/lists"}