{"id":28410638,"url":"https://github.com/chriskyfung/instapaperscraper","last_synced_at":"2026-05-21T13:34:41.318Z","repository":{"id":296333464,"uuid":"993009273","full_name":"chriskyfung/InstapaperScraper","owner":"chriskyfung","description":"Effortlessly scrape Instapaper bookmarks and format them into CSV, JSON, and SQLite using Python—no API key required","archived":false,"fork":false,"pushed_at":"2026-04-28T09:49:06.000Z","size":342,"stargazers_count":3,"open_issues_count":0,"forks_count":1,"subscribers_count":0,"default_branch":"master","last_synced_at":"2026-04-28T10:32:07.117Z","etag":null,"topics":["bookmark-manager","csv-export","csv-exporter","data-exporter","instapaper","json-export","json-exporter","scraper","sqlite-exporter","web-scraping"],"latest_commit_sha":null,"homepage":"https://chriskyfung.github.io/blog/productivity/instapaper-scraper-v1/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/chriskyfung.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE","code_of_conduct":".github/CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":".github/SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null},"funding":{"github":"chriskyfung","buy_me_a_coffee":"chriskyfung"}},"created_at":"2025-05-30T04:13:21.000Z","updated_at":"2026-04-28T09:48:47.000Z","dependencies_parsed_at":"2025-05-30T05:31:33.269Z","dependency_job_id":"92772f35-84a3-404a-8f41-881629218304","html_url":"https://github.com/chriskyfung/InstapaperScraper","commit_stats":null,"previous_names":["chriskyfung/instapapersc
raper"],"tags_count":16,"template":false,"template_full_name":null,"purl":"pkg:github/chriskyfung/InstapaperScraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriskyfung%2FInstapaperScraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriskyfung%2FInstapaperScraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriskyfung%2FInstapaperScraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriskyfung%2FInstapaperScraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/chriskyfung","download_url":"https://codeload.github.com/chriskyfung/InstapaperScraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/chriskyfung%2FInstapaperScraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33302546,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-21T12:23:38.849Z","status":"ssl_error","status_checked_at":"2026-05-21T12:22:11.673Z","response_time":62,"last_error":"SSL_read: unexpected eof while 
reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bookmark-manager","csv-export","csv-exporter","data-exporter","instapaper","json-export","json-exporter","scraper","sqlite-exporter","web-scraping"],"created_at":"2025-06-02T12:08:13.463Z","updated_at":"2026-05-21T13:34:41.306Z","avatar_url":"https://github.com/chriskyfung.png","language":"Python","funding_links":["https://github.com/sponsors/chriskyfung","https://buymeacoffee.com/chriskyfung","https://www.buymeacoffee.com/chriskyfung","https://github.com/sponsors/chriskyfung):","https://www.buymeacoffee.com/chriskyfung):"],"categories":[],"sub_categories":[],"readme":"# Instapaper Scraper\n\n\u003c!-- Badges --\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://pypi.org/project/instapaper-scraper/\"\u003e\n    \u003cimg src=\"https://img.shields.io/pypi/v/instapaper-scraper.svg\" alt=\"PyPI version\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://pepy.tech/projects/instapaper-scraper\"\u003e\n    \u003cimg src=\"https://static.pepy.tech/personalized-badge/instapaper-scraper?period=total\u0026left_text=downloads\" alt=\"PyPI Downloads\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/chriskyfung/InstapaperScraper\"\u003e\n    \u003cimg src=\"https://img.shields.io/python/required-version-toml?tomlFilePath=https%3A%2F%2Fraw.githubusercontent.com%2Fchriskyfung%2FInstapaperScraper%2Frefs%2Fheads%2Fmaster%2Fpyproject.toml\" alt=\"Python Version from PEP 621 TOML\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://github.com/astral-sh/ruff\"\u003e\n    \u003cimg 
src=\"https://img.shields.io/endpoint?url=https%3A%2F%2Fraw.githubusercontent.com%2Fastral-sh%2Fruff%2Fmain%2Fassets%2Fbadge%2Fv2.json\" alt=\"Ruff\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://codecov.io/gh/chriskyfung/InstapaperScraper\"\u003e\n    \u003cimg src=\"https://codecov.io/gh/chriskyfung/InstapaperScraper/graph/badge.svg\" alt=\"Code Coverage\"\u003e\n  \u003c/a\u003e\n  \u003cwbr /\u003e\n  \u003ca href=\"https://github.com/chriskyfung/InstapaperScraper/actions/workflows/ci.yml\"\u003e\n    \u003cimg src=\"https://github.com/chriskyfung/InstapaperScraper/actions/workflows/ci.yml/badge.svg\" alt=\"CI Status\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.gnu.org/licenses/gpl-3.0.en.html\"\u003e\n    \u003cimg src=\"https://img.shields.io/github/license/chriskyfung/InstapaperScraper\" alt=\"GitHub License\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://deepwiki.com/chriskyfung/InstapaperScraper\"\u003e\n    \u003cimg src=\"https://deepwiki.com/badge.svg\" alt=\"Ask DeepWiki\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\nA powerful and reliable Python tool to automate the export of all your saved Instapaper bookmarks into various formats, giving you full ownership of your data.\n\n\u003c!-- Sponsors --\u003e\n\u003cp align=\"center\"\u003e\n  \u003ca href=\"https://github.com/sponsors/chriskyfung\" title=\"Sponsor on GitHub\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Sponsor-GitHub-blue?style=for-the-badge\u0026logo=github-sponsors\u0026colorA=263238\u0026colorB=EC407A\" alt=\"GitHub Sponsors Default\"\u003e\n  \u003c/a\u003e\n  \u003ca href=\"https://www.buymeacoffee.com/chriskyfung\" title=\"Support Coffee\"\u003e\n    \u003cimg src=\"https://img.shields.io/badge/Support-Coffee-ffdd00?style=for-the-badge\u0026logo=buy-me-a-coffee\u0026logoColor=ffdd00\u0026colorA=263238\" alt=\"Buy Me A Coffee\"\u003e\n  \u003c/a\u003e\n\u003c/p\u003e\n\n## ✨ Features\n\n- Scrapes all bookmarks from your Instapaper account.\n- Supports 
scraping from specific folders, including the special \"Liked\" and \"Archive\" collections.\n- Exports data to CSV, JSON, or a SQLite database.\n- Securely stores your session for future runs.\n- Modern, modular, and tested architecture.\n\n## 🚀 Getting Started\n\n### 📋 1. Requirements\n\n- Python 3.10+\n\n### 📦 2. Installation\n\nThis package is available on PyPI and can be installed with pip:\n\n```sh\npip install instapaper-scraper\n```\n\n### 💻 3. Usage\n\nRun the tool from the command line, specifying your desired output format:\n\n```sh\n# Scrape and export to the default CSV format\ninstapaper-scraper\n\n# Scrape and export to JSON\ninstapaper-scraper --format json\n\n# Scrape and export to a SQLite database with a custom name\ninstapaper-scraper --format sqlite --output my_articles.db\n```\n\n## ⚙️ Configuration\n\n### 🔐 Authentication\n\nThe script authenticates using one of the following methods, in order of priority:\n\n1. **Command-line Arguments**: Provide your username and password directly when running the script:\n\n    ```sh\n    instapaper-scraper --username your_username --password your_password\n    ```\n\n2. **Session Files (`.session_key`, `.instapaper_session`)**: The script attempts to load these files in the following order:\n    a.  Path specified by `--session-file` or `--key-file` arguments.\n    b.  Files in the current working directory (e.g., `./.session_key`).\n    c.  Files in the user's configuration directory (`~/.config/instapaper-scraper/`).\n    After the first successful login, the script creates an encrypted `.instapaper_session` file and a `.session_key` file to reuse your session securely.\n\n3. 
**Interactive Prompt**: If no other method is available, the script will prompt you for your username and password.\n\n\u003e **Note on Security:** Your session file (`.instapaper_session`) and the encryption key (`.session_key`) are stored with secure permissions (read/write for the owner only) to protect your credentials.\n\n### 📁 Folder and Field Configuration\n\nYou can define and quickly access your Instapaper folders and set default output fields using a `config.toml` file. The scraper will look for this file in the following locations (in order of precedence):\n\n1. The path specified by the `--config-path` argument.\n2. `config.toml` in the current working directory.\n3. `~/.config/instapaper-scraper/config.toml`\n\nHere is an example of `config.toml`:\n\n```toml\n# Default output filename for the main unread article list\noutput_filename = \"unread-articles.csv\"\n\n# Default output filenames for special collections\nliked_output_filename = \"liked-articles.csv\"\narchive_output_filename = \"archive-articles.csv\"\n\n# Optional fields to include in the output.\n# These can be overridden by command-line flags.\n[fields]\nread_url = false\narticle_preview = false\n\n# Default output format. 
Can be \"csv\", \"json\", or \"sqlite\".\n# This can be overridden by the --format command-line flag.\n[output]\nformat = \"csv\"\n\n[[folders]]\nkey = \"ml\"\nid = \"1234567\"\nslug = \"machine-learning\"\noutput_filename = \"ml-articles.json\"\n\n[[folders]]\nkey = \"python\"\nid = \"7654321\"\nslug = \"python-programming\"\noutput_filename = \"python-articles.db\"\n```\n\n- **output_filename (top-level)**: The default output filename to use when scraping the main unread articles list.\n- **liked_output_filename**: The default output filename for the **Liked** collection.\n- **archive_output_filename**: The default output filename for the **Archive** collection.\n- **[fields]**: A section to control which optional data fields are included in the output.\n    -   `read_url`: Set to `true` to include the Instapaper read URL for each article.\n    -   `article_preview`: Set to `true` to include the article's text preview.\n- **[output]**: A section to control the output file generation.\n    -   `format`: Sets the default output format (`csv`, `json`, or `sqlite`). This is overridden by the `--format` command-line flag.\n- **[[folders]]**: Each `[[folders]]` block defines a specific folder.\n    -   **key**: A short alias for the folder.\n    -   **id**: The folder ID from the Instapaper URL.\n    -   **slug**: The human-readable part of the folder URL.\n    -   **output_filename (folder-specific)**: A preset output filename for scraped articles from this specific folder.\n\nWhen a `config.toml` file is present and no `--folder` argument is provided, the scraper will prompt you to select a folder. 
The special **Liked** and **Archive** collections will always be available as the first options.\n\nYou can also specify a folder directly using the `--folder` argument.\n- Use `--folder=liked` or `--folder=archive` to scrape these special collections.\n- For folders defined in your `config.toml`, use their `key`, `id`, or `slug`.\n- Use `--folder=none` to explicitly disable folder mode and scrape your main list of unread articles.\n\n### 💻 Command-line Arguments\n\n| Argument | Description |\n| --- | --- |\n| `--config-path \u003cpath\u003e`| Path to the configuration file. Searches `~/.config/instapaper-scraper/config.toml` and `config.toml` in the current directory by default. |\n| `--folder \u003cvalue\u003e` | Specify a folder by key, ID, or slug from your `config.toml`. **Requires a configuration file to be loaded.** Use `none` to explicitly disable folder mode. If a configuration file is not found or fails to load, and this option is used (not set to `none`), the program will exit. |\n| `--format \u003cformat\u003e` | Output format (`csv`, `json`, `sqlite`). Defaults to the value in `config.toml` or 'csv'. |\n| `--output \u003cfilename\u003e` | Specify a custom output filename. The file extension will be automatically corrected to match the selected format. |\n| `--username \u003cuser\u003e` | Your Instapaper account username. |\n| `--password \u003cpass\u003e` | Your Instapaper account password. |\n| `--[no-]read-url` | Includes the Instapaper read URL. (Old flag `--add-instapaper-url` is deprecated but supported). Can be set in `config.toml`. Overrides config. |\n| `--[no-]article-preview` | Includes the article preview text. (Old flag `--add-article-preview` is deprecated but supported). Can be set in `config.toml`. Overrides config. |\n\n### 📄 Output Formats\n\nYou can control the output format using the `--format` argument. 
The supported formats are:\n\n- `csv` (default): Exports data to `output/bookmarks.csv`.\n- `json`: Exports data to `output/bookmarks.json`.\n- `sqlite`: Exports data to an `articles` table in `output/bookmarks.db`.\n\nIf the `--format` flag is omitted, the script will use the format specified in `config.toml`, or default to `csv` if not configured.\n\nWhen using `--output \u003cfilename\u003e`, the file extension is automatically corrected to match the chosen format. For example, `instapaper-scraper --format json --output my_articles.txt` will create `my_articles.json`.\n\n#### 📖 Opening Articles in Instapaper\n\nThe output data includes a unique `id` for each article. You can use this ID to construct a URL to the article's reader view: `https://www.instapaper.com/read/\u003carticle_id\u003e`.\n\nFor convenience, you can use the `--read-url` flag to have the script include a full, clickable URL in the output.\n\n```sh\ninstapaper-scraper --read-url\n```\n\nThis adds an `instapaper_url` field to each article in the JSON output and an `instapaper_url` column in the CSV and SQLite outputs. The original `id` field is preserved.\n\n## 🛠️ How It Works\n\nThe tool is designed with a modular architecture for reliability and maintainability.\n\n1. **Authentication**: The `InstapaperAuthenticator` handles secure login and session management.\n2. **Scraping**: The `InstapaperClient` iterates through all pages of your bookmarks, fetching the metadata for each article with robust error handling and retries. Shared constants, like the Instapaper base URL, are managed through `src/instapaper_scraper/constants.py`.\n3. **Data Collection**: All fetched articles are aggregated into a single list.\n4. 
**Export**: Finally, the collected data is written to a file in your chosen format (`.csv`, `.json`, or `.db`).\n\n## 📊 Example Output\n\n### 📄 CSV (`output/bookmarks.csv`) (with --add-instapaper-url and --add-article-preview)\n\n```csv\n\"id\",\"instapaper_url\",\"title\",\"url\",\"article_preview\"\n\"999901234\",\"https://www.instapaper.com/read/999901234\",\"Article 1\",\"https://www.example.com/page-1/\",\"This is a preview of article 1.\"\n\"999002345\",\"https://www.instapaper.com/read/999002345\",\"Article 2\",\"https://www.example.com/page-2/\",\"This is a preview of article 2.\"\n```\n\n### 📄 JSON (`output/bookmarks.json`) (with --add-instapaper-url and --add-article-preview)\n\n```json\n[\n    {\n        \"id\": \"999901234\",\n        \"title\": \"Article 1\",\n        \"url\": \"https://www.example.com/page-1/\",\n        \"instapaper_url\": \"https://www.instapaper.com/read/999901234\",\n        \"article_preview\": \"This is a preview of article 1.\"\n    },\n    {\n        \"id\": \"999002345\",\n        \"title\": \"Article 2\",\n        \"url\": \"https://www.example.com/page-2/\",\n        \"instapaper_url\": \"https://www.instapaper.com/read/999002345\",\n        \"article_preview\": \"This is a preview of article 2.\"\n    }\n]\n```\n\n### 🗄️ SQLite (`output/bookmarks.db`)\n\nA SQLite database file is created with an `articles` table. The table includes `id`, `title`, and `url` columns. If the `--add-instapaper-url` flag is used, an `instapaper_url` column is also included. 
This feature is fully backward-compatible and will automatically adapt to the user's installed SQLite version, using an efficient generated column on modern versions (3.31.0+) and a fallback for older versions.\n\n## 🤗 Support and Community\n\n- **🐛 Bug Reports:** For any bugs or unexpected behavior, please [open an issue on GitHub](https://github.com/chriskyfung/InstapaperScraper/issues).\n- **💬 Questions \u0026 General Discussion:** For questions, feature requests, or general discussion, please use our [GitHub Discussions](https://github.com/chriskyfung/InstapaperScraper/discussions).\n\n## 🙏 Support the Project\n\n`Instapaper Scraper` is a free and open-source project that requires significant time and effort to maintain and improve. If you find this tool useful, please consider supporting its development. Your contribution helps ensure the project stays healthy, active, and continuously updated.\n\n- **[Sponsor on GitHub](https://github.com/sponsors/chriskyfung):** The best way to support the project with recurring monthly donations. Tiers with special rewards like priority support are available!\n- **[Buy Me a Coffee](https://www.buymeacoffee.com/chriskyfung):** Perfect for a one-time thank you.\n\n## 🤝 Contributing\n\nContributions are welcome! Whether it's a bug fix, a new feature, or documentation improvements, please feel free to open a pull request.\n\nPlease read the **[Contribution Guidelines](CONTRIBUTING.md)** before you start.\n\n## 🧑‍💻 Development \u0026 Testing\n\nThis project uses `pytest` for testing, `ruff` for code formatting and linting, and `mypy` for static type checking. 
A `Makefile` is provided to simplify common development tasks.\n\n### 🚀 Using the Makefile\n\nThe most common commands are:\n-   `make install`: Installs development dependencies.\n-   `make format`: Formats the entire codebase.\n-   `make check`: Runs the linter, type checker, and test suite.\n-   `make test`: Runs the test suite.\n-   `make build`: Builds the distributable packages.\n\nRun `make help` to see all available commands.\n\n### 🔧 Setup\n\nTo install the development dependencies:\n\n```sh\npip install -e .[dev]\n```\n\nTo set up the pre-commit hooks:\n\n```sh\npre-commit install\n```\n\n### ▶️ Running the Scraper\n\nTo run the scraper directly without installing the package:\n\n```sh\npython -m src.instapaper_scraper.cli\n```\n\n### ✅ Testing\n\nTo run the tests, execute the following command from the project root (or use `make test`):\n\n```sh\npytest\n```\n\nTo check test coverage (or use `make test-cov`):\n\n```sh\npytest --cov=src/instapaper_scraper --cov-report=term-missing\n```\n\n### ✨ Code Quality\n\nYou can use the `Makefile` for convenience (e.g., `make format`, `make lint`).\n\nTo format the code with `ruff`:\n\n```sh\nruff format .\n```\n\nTo check for linting errors with `ruff`:\n\n```sh\nruff check .\n```\n\nTo run static type checking with `mypy`:\n\n```sh\nmypy src\n```\n\nTo run license checks:\n\n```sh\nlicensecheck --zero\n```\n\n\n## 📜 Disclaimer\n\nThis script requires valid Instapaper credentials. Use it responsibly and in accordance with Instapaper’s Terms of Service.\n\n## 📄 License\n\nThis project is licensed under the terms of the **GNU General Public License v3.0**. 
See the [LICENSE](LICENSE) file for the full license text.\n\n## Contributors\n\n[![Contributors](https://contrib.rocks/image?repo=chriskyfung/InstapaperScraper)](https://github.com/chriskyfung/InstapaperScraper/graphs/contributors)\n\nMade with [contrib.rocks](https://contrib.rocks).\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchriskyfung%2Finstapaperscraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fchriskyfung%2Finstapaperscraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fchriskyfung%2Finstapaperscraper/lists"}