{"id":20710033,"url":"https://github.com/oxylabs/python-cache-tutorial","last_synced_at":"2025-07-25T19:41:58.521Z","repository":{"id":166317522,"uuid":"641774075","full_name":"oxylabs/python-cache-tutorial","owner":"oxylabs","description":"A guide to caching web scraping scripts in Python.","archived":false,"fork":false,"pushed_at":"2025-02-11T13:03:34.000Z","size":427,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-02-11T14:22:57.220Z","etag":null,"topics":["api-python","cache","caching","jason-database-python","json-database","python","python-ecommerce","python-image-scraper","python-web-crawler","python-web-scraper","python-web-scraping","scraper","web-scraping-python"],"latest_commit_sha":null,"homepage":"https://oxylabs.io/blog/python-cache-how-to-use-effectively","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oxylabs.png","metadata":{"files":{"readme":"readme.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-17T06:26:37.000Z","updated_at":"2025-02-11T13:03:38.000Z","dependencies_parsed_at":"2024-04-19T11:56:13.372Z","dependency_job_id":null,"html_url":"https://github.com/oxylabs/python-cache-tutorial","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fpython-cache-tutorial","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fpython-cache-tutorial/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/G
itHub/repositories/oxylabs%2Fpython-cache-tutorial/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oxylabs%2Fpython-cache-tutorial/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oxylabs","download_url":"https://codeload.github.com/oxylabs/python-cache-tutorial/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242980784,"owners_count":20216285,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["api-python","cache","caching","jason-database-python","json-database","python","python-ecommerce","python-image-scraper","python-web-crawler","python-web-scraper","python-web-scraping","scraper","web-scraping-python"],"created_at":"2024-11-17T02:09:37.029Z","updated_at":"2025-07-25T19:41:58.505Z","avatar_url":"https://github.com/oxylabs.png","language":"Python","readme":"# Python Cache: How to Speed Up Your Code with Effective Caching\n\n[![Oxylabs promo code](https://raw.githubusercontent.com/oxylabs/product-integrations/refs/heads/master/Affiliate-Universal-1090x275.png)](https://oxylabs.io/pages/gitoxy?utm_source=877\u0026utm_medium=affiliate\u0026groupid=877\u0026utm_content=python-cache-tutorial-github\u0026transaction_id=102f49063ab94276ae8f116d224b67)\n\n[![](https://dcbadge.vercel.app/api/server/eWsVUJrnG5)](https://discord.gg/GbxmdGhZjq)\n\n  * [How to implement a cache in Python](#how-to-implement-a-cache-in-python)\n    + [Install the required libraries](#install-the-required-libraries)\n    + [Method 1: Python caching using a manual 
decorator](#method-1-python-caching-using-a-manual-decorator)\n    + [Method 2: Python caching using LRU cache decorator](#method-2-python-caching-using-lru-cache-decorator)\n    + [Performance comparison](#performance-comparison)\n\nThis article will show you how to use caching in Python in your web\nscraping tasks. You can read the [\u003cu\u003efull\narticle\u003c/u\u003e](https://oxylabs.io/blog/python-cache-how-to-use-effectively)\non our blog, where we delve deeper into the different caching\nstrategies.\n\n## How to implement a cache in Python\n\nThere are several ways to implement caching in Python, depending on the\ncaching strategy you need. Here we’ll see two methods of Python caching for a\nsimple web scraping example. If you’re new to web scraping, take a look\nat our [\u003cu\u003estep-by-step Python web scraping\nguide\u003c/u\u003e](https://oxylabs.io/blog/python-web-scraping).\n\n### Install the required libraries\n\nWe’ll use the [\u003cu\u003erequests\nlibrary\u003c/u\u003e](https://pypi.org/project/requests/) to make HTTP requests\nto a website. Install it with\n[\u003cu\u003epip\u003c/u\u003e](https://pypi.org/project/pip/) by entering the following\ncommand in your terminal:\n\n```bash\npython -m pip install requests\n```\n\nThe other modules we’ll use in this project, specifically `time` and\n`functools`, are part of the Python standard library (we used Python\n3.11.2), so you don’t have to install them.\n\n### Method 1: Python caching using a manual decorator\n\nA [\u003cu\u003edecorator\u003c/u\u003e](https://peps.python.org/pep-0318/) in Python is a\nfunction that accepts another function as an argument and returns a new\nfunction. We can alter the behavior of the original function using a\ndecorator without changing its source code.\n\nOne common use case for decorators is to implement caching. 
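Before the caching version, here’s what a bare-bones decorator looks like (the names `logged` and `add` are illustrative only, not part of the scraping example):

```python
# A minimal decorator sketch: `logged` wraps any function
# and prints a message before calling it.
def logged(func):
    def wrapper(*args, **kwargs):
        print(f'Calling {func.__name__}...')
        return func(*args, **kwargs)
    return wrapper


@logged
def add(a, b):
    return a + b


print(add(2, 3))  # prints 'Calling add...' then 5
```

A caching decorator follows the same shape, with the wrapper deciding whether to call the wrapped function at all.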
This\ninvolves creating a dictionary that stores the function's results so\nthey can be reused on future calls.\n\nLet’s start by creating a simple function that takes a URL as an\nargument, requests that URL, and returns the response text:\n\n```python\ndef get_html_data(url):\n    response = requests.get(url)\n    return response.text\n```\n\nNow, let's create a memoized version of this function:\n\n```python\ndef memoize(func):\n    cache = {}\n\n    def wrapper(*args):\n        if args in cache:\n            return cache[args]\n        else:\n            result = func(*args)\n            cache[args] = result\n            return result\n\n    return wrapper\n\n\n@memoize\ndef get_html_data_cached(url):\n    response = requests.get(url)\n    return response.text\n```\n\nHere, the `memoize` decorator creates a `cache` dictionary to hold the\nresults of previous function calls. The `wrapper` function checks whether\nthe current input arguments have already been cached and, if so, returns\nthe cached result. If not, it calls the original function, stores the\nresult in the cache, and returns it.\n\nBy adding `@memoize` above the function definition, we apply the memoize\ndecorator to the `get_html_data` function. This produces a new\nmemoized function that we’ve called `get_html_data_cached`. 
It only makes\na single network request for a URL and then stores the response in the\ncache for subsequent requests.\n\nLet’s use the `time` module to compare the execution speeds of the\n`get_html_data` function and the memoized `get_html_data_cached` function:\n\n```python\nimport time\n\n\nstart_time = time.time()\nget_html_data('https://books.toscrape.com/')\nprint('Time taken (normal function):', time.time() - start_time)\n\n\nstart_time = time.time()\nget_html_data_cached('https://books.toscrape.com/')\nprint('Time taken (memoized function using manual decorator):', time.time() - start_time)\n```\n\nHere’s what the complete code looks like:\n\n```python\n# Import the required modules\nimport time\nimport requests\n\n\n# Function to get the HTML Content\ndef get_html_data(url):\n    response = requests.get(url)\n    return response.text\n\n\n# Memoize function to cache the data\ndef memoize(func):\n    cache = {}\n\n    # Inner wrapper function to store the data in the cache\n    def wrapper(*args):\n        if args in cache:\n            return cache[args]\n        else:\n            result = func(*args)\n            cache[args] = result\n            return result\n\n    return wrapper\n\n\n# Memoized function to get the HTML Content\n@memoize\ndef get_html_data_cached(url):\n    response = requests.get(url)\n    return response.text\n\n\n# Get the time it took for a normal function\nstart_time = time.time()\nget_html_data('https://books.toscrape.com/')\nprint('Time taken (normal function):', time.time() - start_time)\n\n# Get the time it took for a memoized function (manual decorator)\nstart_time = time.time()\nget_html_data_cached('https://books.toscrape.com/')\nprint('Time taken (memoized function using manual decorator):', time.time() - start_time)\n```\n\nAnd here’s the output:\n\n![](images/output_normal_memoized.png)\n\nCompare the execution times of the two functions. 
Both take almost\nthe same time here, but the real advantage of caching shows up when a\nresult is accessed again.\n\nSince we’re making only one request, the memoized function still has to\nperform the network request on its first call. Therefore, with our example, a\nsignificant time difference in execution isn’t expected. However, if you\nincrease the number of calls to these functions, the time difference\nwill grow significantly (see [\u003cu\u003ePerformance\nComparison\u003c/u\u003e](#performance-comparison)).\n\n### Method 2: Python caching using LRU cache decorator\n\nAnother method to implement caching in Python is to use the built-in\n`@lru_cache` decorator from `functools`. This decorator implements caching\nusing the least recently used (LRU) strategy. An LRU cache has a fixed\nsize (set by `maxsize`), which means that once it’s full, it discards the\nentries that haven’t been used recently.\n\nTo use the `@lru_cache` decorator, we can create a new function for\nextracting HTML content and place the decorator name at the top. Make\nsure to import `lru_cache` from the `functools` module before using the\ndecorator:\n\n```python\nfrom functools import lru_cache\n\n\n@lru_cache(maxsize=None)\ndef get_html_data_lru(url):\n    response = requests.get(url)\n    return response.text\n```\n\nIn the above example, the `get_html_data_lru` function is memoized using the\n`@lru_cache` decorator. The cache can grow indefinitely when the `maxsize`\noption is set to `None`.\n\n
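Before the complete sample, here’s a self-contained sketch (with a hypothetical `square` function, no network) of how LRU eviction and the `cache_info()` counters behave when `maxsize` is finite:

```python
from functools import lru_cache


# With maxsize=2, the least recently used entry is evicted on overflow.
@lru_cache(maxsize=2)
def square(n):
    return n * n


square(1)  # miss; cache holds {1}
square(2)  # miss; cache holds {1, 2}
square(1)  # hit; 1 becomes most recently used
square(3)  # miss; evicts 2, the least recently used entry
info = square.cache_info()
print(info.hits, info.misses, info.currsize)  # prints: 1 3 2
```

`cache_info()` is a handy way to check whether your cache is actually being hit; `cache_clear()` resets it.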
Here’s the complete code sample:\n\n```python\n# Import the required modules\nfrom functools import lru_cache\nimport time\nimport requests\n\n\n# Function to get the HTML Content\ndef get_html_data(url):\n    response = requests.get(url)\n    return response.text\n\n\n# Memoized using LRU Cache\n@lru_cache(maxsize=None)\ndef get_html_data_lru(url):\n    response = requests.get(url)\n    return response.text\n\n\n# Get the time it took for a normal function\nstart_time = time.time()\nget_html_data('https://books.toscrape.com/')\nprint('Time taken (normal function):', time.time() - start_time)\n\n# Get the time it took for a memoized function (LRU cache)\nstart_time = time.time()\nget_html_data_lru('https://books.toscrape.com/')\nprint('Time taken (memoized function with LRU cache):', time.time() - start_time)\n```\n\nThis produced the following output:\n\n![](images/output_normal_lru.png)\n\n### Performance comparison\n\nIn the following table, we’ve determined the execution times of all\nthree functions for different numbers of requests to these functions:\n\n| **No. 
of requests** | **Time taken by normal function** | **Time taken by memoized function (manual decorator)** | **Time taken by memoized function (lru_cache decorator)** |\n|---------------------|-----------------------------------|--------------------------------------------------------|-----------------------------------------------------------|\n| 1                   | 2.1 Seconds                       | 2.0 Seconds                                            | 1.7 Seconds                                               |\n| 10                  | 17.3 Seconds                      | 2.1 Seconds                                            | 1.8 Seconds                                               |\n| 20                  | 32.2 Seconds                      | 2.2 Seconds                                            | 2.1 Seconds                                               |\n| 30                  | 57.3 Seconds                      | 2.22 Seconds                                           | 2.12 Seconds                                              |\n\nAs the number of requests to the functions increases, you can see a\nsignificant reduction in execution times using the caching strategy. The\nfollowing comparison chart depicts these results:\n\n![](images/comparison-chart.png)\n\nThe comparison results clearly show that using a caching strategy in\nyour code can significantly improve overall performance and speed.\n\nFeel free to visit our [\u003cu\u003eblog\u003c/u\u003e](https://oxylabs.io/blog) for an\narray of intriguing web scraping topics that will keep you hooked!\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fpython-cache-tutorial","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foxylabs%2Fpython-cache-tutorial","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foxylabs%2Fpython-cache-tutorial/lists"}