{"id":16510570,"url":"https://github.com/wragge/recordsearch_data_scraper","last_synced_at":"2025-11-27T16:05:43.569Z","repository":{"id":37682991,"uuid":"356488581","full_name":"wragge/recordsearch_data_scraper","owner":"wragge","description":null,"archived":false,"fork":false,"pushed_at":"2023-01-20T03:28:28.000Z","size":1246,"stargazers_count":0,"open_issues_count":1,"forks_count":1,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-02-12T23:46:56.683Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/wragge.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2021-04-10T06:06:57.000Z","updated_at":"2022-03-03T00:13:16.000Z","dependencies_parsed_at":"2023-02-11T23:20:30.667Z","dependency_job_id":null,"html_url":"https://github.com/wragge/recordsearch_data_scraper","commit_stats":null,"previous_names":[],"tags_count":6,"template":false,"template_full_name":"fastai/nbdev_template","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wragge%2Frecordsearch_data_scraper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wragge%2Frecordsearch_data_scraper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wragge%2Frecordsearch_data_scraper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/wragge%2Frecordsearch_data_scraper/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/wragge","download_url":"https://codeload.github.com/wragge/recordsearch_data_scraper/tar.gz/r
efs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":241476423,"owners_count":19968916,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-11T15:55:57.506Z","updated_at":"2025-11-27T16:05:43.524Z","avatar_url":"https://github.com/wragge.png","language":"Jupyter Notebook","readme":"RecordSearch Data Scraper\n================\n\n\u003c!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! --\u003e\n\nThe National Archives of Australia’s online database, RecordSearch,\ncontains lots of rich, historical data. Unfortunately there’s no API, so\nwe have to resort to screen scrapers to get it out in reusable form.\nThis is a library of scrapers to extract data about the main entities in\nRecordSearch – Items, Series, and Agencies – from both individual\nrecords, and search results.\n\nThe main classes are:\n\n- `RSItem()` – an individual item\n- `RSItemSearch()` – an advanced search for items\n- `RSSeries()` – an individual series\n- `RSSeriesSearch()` – an advanced search for series\n- `RSAgency()` – an individual agency\n- `RSAgencySearch()` – an advanced search for agencies\n\nRecordSearch makes use of an odd assortment of sessions, redirects, and\nhidden forms, which make scraping a challenge. Please let me know if\nsomething isn’t working as expected, as problems can be difficult to pin\ndown!\n\nThis is a replacement for the original Recordsearch_tools library. 
The\nmain changes are:\n\n- Requirements have been updated (dropping RoboBrowser, which seems to be\n  no longer maintained)\n- The full range of search parameters is now supported for Items,\n  Series, and Agencies\n- There’s a built-in cache for improved efficiency and speed\n\nSee the\n[documentation](https://wragge.github.io/recordsearch_data_scraper/) for\nmore details. And check out the [RecordSearch\nsection](https://glam-workbench.net/recordsearch/) of the GLAM Workbench\nfor examples of what’s possible.\n\n## Install\n\n`pip install recordsearch-data-scraper`\n\n## How to use\n\nRetrieve an item using its Item ID.\n\n``` python\nfrom recordsearch_data_scraper.scrapers import *\n\nitem = RSItem('3445411')\n```\n\nView the item data.\n\n``` python\nitem.data\n```\n\n    {'title': 'WRAGGE Clement Lionel Egerton : SERN 647 : POB Cheadle England : POE Enoggera QLD : NOK  (Father) WRAGGE Clement Lindley',\n     'identifier': '3445411',\n     'series': 'B2455',\n     'control_symbol': 'WRAGGE C L E',\n     'digitised_status': True,\n     'digitised_pages': 47,\n     'access_status': 'Open',\n     'access_decision_reasons': [],\n     'location': 'Canberra',\n     'retrieved': '2021-04-25T21:12:22.620414+10:00',\n     'contents_date_str': '1914 - 1920',\n     'contents_start_date': '1914',\n     'contents_end_date': '1920',\n     'access_decision_date_str': '12 Apr 2001',\n     'access_decision_date': '2001-04-12'}\n\nSearch for items.\n\n``` python\nsearch = RSItemSearch(kw='wragge')\n```\n\nView the total number of items in the results set.\n\n``` python\nsearch.total_results\n```\n\n    209\n\nAccess the first page of results.\n\n``` python\nitems = search.get_results()\n```\n\nView the first result.\n\n``` python\nitems['results'][0]\n```\n\n    {'series': 'A2479',\n     'control_symbol': '17/1306',\n     'title': 'The Wragge Estate. 
Property for sale.',\n     'identifier': '149309',\n     'access_status': 'Open',\n     'location': 'Canberra',\n     'contents_date_str': '1917 - 1917',\n     'contents_start_date': '1917',\n     'contents_end_date': '1917',\n     'digitised_status': True}\n\nThe Series and Agency classes follow exactly the same pattern. See the\n[docs](https://wragge.github.io/recordsearch_data_scraper/) for more\nexamples.\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwragge%2Frecordsearch_data_scraper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fwragge%2Frecordsearch_data_scraper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fwragge%2Frecordsearch_data_scraper/lists"}