{"id":31039152,"url":"https://github.com/urpagin/doujinstyle-scraper","last_synced_at":"2025-09-14T07:48:41.123Z","repository":{"id":310010075,"uuid":"1038365657","full_name":"Urpagin/doujinstyle-scraper","owner":"Urpagin","description":"Ethically scrapes doujinstyle.com","archived":false,"fork":false,"pushed_at":"2025-08-28T21:14:38.000Z","size":65,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-09-06T04:36:19.542Z","etag":null,"topics":["async","asynchronous","asyncio","data-extraction","doujinstyle","download-links","get-requests","html-parsing","http","http-requests","httpx","item-id-extraction","json-export","post-requests","python","python3","requests","scrape","scraper","web-scraping"],"latest_commit_sha":null,"homepage":"https://doujinstyle.com/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Urpagin.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2025-08-15T04:13:20.000Z","updated_at":"2025-08-28T21:14:42.000Z","dependencies_parsed_at":"2025-08-15T06:22:57.772Z","dependency_job_id":"40b33676-7280-430f-9256-60f4eb7e754b","html_url":"https://github.com/Urpagin/doujinstyle-scraper","commit_stats":null,"previous_names":["urpagin/doujinstyle-scraper"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Urpagin/doujinstyle-scraper","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Urpagin%2Fdoujinstyle-scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/Urpagin%2Fdoujinstyle-scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Urpagin%2Fdoujinstyle-scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Urpagin%2Fdoujinstyle-scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Urpagin","download_url":"https://codeload.github.com/Urpagin/doujinstyle-scraper/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Urpagin%2Fdoujinstyle-scraper/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":275076529,"owners_count":25401315,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-09-14T02:00:10.474Z","response_time":75,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["async","asynchronous","asyncio","data-extraction","doujinstyle","download-links","get-requests","html-parsing","http","http-requests","httpx","item-id-extraction","json-export","post-requests","python","python3","requests","scrape","scraper","web-scraping"],"created_at":"2025-09-14T07:48:40.217Z","updated_at":"2025-09-14T07:48:41.102Z","avatar_url":"https://github.com/Urpagin.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# doujinstyle-scraper\n\nEthically scrapes doujinstyle.com.\n\n\u003e [!IMPORTANT]\n\u003e **Work in 
progress** *(nothing so far! -a literal stub right now)*\n\n## doujinstyle.com 🌐\n\n\u003e DoujinStyle functions as an index of content found publicly on the Internet\n\n*[https://doujinstyle.com/?p=dmca](https://doujinstyle.com/?p=dmca)*\n\nIn this case, \"content\" is mostly music.\n\n\u003e [!WARNING]\n\u003e Tested and developed using Python 3.13. I don't expect the app to run on versions below 3.12.\n\n## Format 📦\n\nExports each entry into a single JSON file containing W.I.P.\n\n## Time \u0026 Breakage ⏳\n\nSince we are scraping and parsing the website's public HTML, and not any kind of API, it is very likely this\nproject will not\nlast long. The website need only become prettier, modifying or adding HTML, and the existing parser will most\nlikely break.\n\nIt is also likely the website may modernize in a way that adds a cruel CAPTCHA or rate limiter.\n\nThere is also the possibility of the website being taken down, somehow. At the time of writing, \"Version 3\" is written near the\nsite's logo, implying other versions of the website might have been taken down, or just modernized.\n\n![doujinstyle site logo](./doujinstyle-logo.png)\n\n## Motivation 💿\n\nWhile searching for a high-quality FLAC recording\nof [LEMON MELON COOKIE](https://youtu.be/5l8VZEyNRH8) ([TAK](https://www.youtube.com/channel/UCktjMRvuBnE_XLVWIMa2H1w)),\nI stumbled upon this website. It immediately sparked a flame of need within me: the need to **SCRAPE**. doujinstyle.com\nlooked so *docile and scrapable*, I couldn't resist scraping it to the bone!\n\n## Requests \u0026 Inner Workings ⚡\n\nLet N be the number of IDs you want to fetch.\nThe program does 2 * N HTTP requests:\n\n* One HTTP GET to fetch the contents of the item page.\n* One HTTP POST on the download form to fetch the download link.\n\nThe POST also redirects, so the real count may be more than 2 * N, but for the sake of simplicity we'll say it's 2 * N.\n\nI reckon this POST request allows the website to count the number 
of times an item has been downloaded,\nvisible with the `# of Downloads:` label on each item.\n\n---\n\n\u003e [!NOTE]\n\u003e Replace `\u003citem_id\u003e` with the ID of the item.\n\n1. The HTTP GET request URL looks like so:\n\n```text\nhttps://doujinstyle.com/?p=page\u0026type=1\u0026id=\u003citem_id\u003e\n```\n\nThis returns the same HTML that is sent when visiting via a web browser.\n\n2. The HTTP POST request data is as follows:\n\n```json\n{\n  \"type\": \"1\",\n  \"id\": \"\u003citem_id\u003e\",\n  \"source\": \"0\",\n  \"download_link\": \"\"\n}\n```\n\nThis returns the download link associated with the item (usually Mediafire or Mega).\n\nIt can be sent to either an item URL `https://doujinstyle.com/?p=page\u0026type=1\u0026id=\u003citem_id\u003e` or directly\nto the base URL `https://doujinstyle.com/`; both seem to work.\n\nConcerning the values of the POST data:\n\n* `type`: I don't know what it means, only that sometimes, e.g., ID=6, when setting it to `1`\n  this download URL is returned:\n\n```text\nhttps://mega.nz/#!ZE5UXYIA!VYp8h5mG1_pgQA8PebVN0gEElMjNAOijtUZf-_-dxLc\n```  \n\nAnd when setting it to `2`, this one is returned:\n\n```text\nhttps://mega.nz/#!8QMF3YBI!Bj7OJnXHpfTBnr6jfY5O_k_oXVyEV8OMUpPIxH1OERM\n```  \n\nThe URLs differ, but the first one seems to be the right one: when a user clicks on the 'Download' button,\nthe site redirects to that same URL.\n\n* `id`: The item ID.\n\n* `source`: I don't know what it means. It is set by default to `0`. Maybe it selects a different CDN; however, when\n  set to `1`, the posted URL is returned, not the download link. When set to `` (empty string), the POST\n  request still seems to function.\n\n* `download_link`: I don't know what it means. 
Only that it must be present as an empty string for\n  the download URL to be returned; otherwise, the posted URL is returned.\n\n### App Components\n\nThe app has three main components:\n\n* The `logger`, which initializes an app logger.\n* The `fetcher`, which does all the asynchronous requests to the website.\n* The `parser`, which parses the response from the `fetcher` to get usable data.\n\nThe `fetcher` and `parser` communicate via a callback function that is called whenever the `fetcher` has fetched\ndata.\n\n## Find Highest Item ID 🗻\n\n1. Visit [doujinstyle.com](https://doujinstyle.com/) and click on the title of the latest item (top left-hand corner)\n2. Copy the URL's ID following this format: `https://doujinstyle.com/?p=page\u0026type=1\u0026id=\u003citem_id\u003e`\n3. `\u003citem_id\u003e` is the latest, highest ID.\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furpagin%2Fdoujinstyle-scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Furpagin%2Fdoujinstyle-scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Furpagin%2Fdoujinstyle-scraper/lists"}