https://github.com/sezanzeb/goodreads-recommender
  
  
Filters books in lists and shelves, or makes recommendations based on your previous reads.
- Host: GitHub
- URL: https://github.com/sezanzeb/goodreads-recommender
- Owner: sezanzeb
- License: gpl-3.0
- Created: 2024-08-10T13:19:27.000Z (about 1 year ago)
- Default Branch: main
- Last Pushed: 2024-08-18T15:33:31.000Z (about 1 year ago)
- Last Synced: 2025-04-04T02:41:21.152Z (7 months ago)
- Topics: audiobooks, books, goodreads, reading, recommendation-engine, recommendation-system, recommender, recommender-engine, recommender-system, scraper
- Language: Python
- Homepage:
- Size: 126 KB
- Stars: 1
- Watchers: 1
- Forks: 1
- Open Issues: 0
- Metadata Files:
  - Readme: readme.md
  - License: LICENSE
 
          📖 📚
Advanced Goodreads Filters and Recommendations
I'm not happy with the filtering and recommendation options on Goodreads. If you know
some Python, you can use this library to write small scripts that filter Goodreads for
you.
It filters books in lists and shelves, or makes recommendations based on your previous
reads. The custom filter callback gives you powerful, fine-grained control over which
books to include in or remove from your recommendations, based on whatever criteria you
choose. `strict_filter` is an example implementation.
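The exact callback signature isn't spelled out here, so the following is only a sketch of the idea. It assumes that a filter factory (like `strict_filter(...)`) returns a callable that receives a book and returns `True` to keep it. The `Book` dataclass and `my_filter` below are hypothetical stand-ins; check `strict_filter` and the library's `Book` class for the real interface.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical stand-in for the library's `Book` class; the real class has its
# own methods, which may be named differently.
@dataclass
class Book:
    title: str
    genres: list[str] = field(default_factory=list)
    rating: float = 0.0
    has_audiobook: bool = False


BookFilter = Callable[[Book], bool]


def my_filter(
    avoid_genres: list[str], minimum_rating: float, require_audiobook: bool = False
) -> BookFilter:
    """Build a hypothetical callback that returns True for books to keep."""

    def keep(book: Book) -> bool:
        if any(genre in book.genres for genre in avoid_genres):
            return False
        if book.rating < minimum_rating:
            return False
        if require_audiobook and not book.has_audiobook:
            return False
        return True

    return keep


books = [
    Book("Assassin's Apprentice", ["fantasy", "fiction"], 4.18, True),
    Book("Example Robot Novel", ["science-fiction", "robots"], 4.0, True),
    Book("Example Low-Rated Book", ["fantasy"], 2.5, True),
]
keep = my_filter(avoid_genres=["robots"], minimum_rating=3)
print([book.title for book in books if keep(book)])  # ["Assassin's Apprentice"]
```

Returning a closure from a factory keeps the configuration (genres, minimum rating) separate from the per-book check, which matches how `strict_filter(...)` is called with its options and then handed to `book_filter`.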
Downloads are cached in the `goodreads_cache` directory, so the next time you run your
script, it will be a lot faster.
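Caching speeds up repeated runs because each page only has to be downloaded once. The layout of the real `goodreads_cache` directory isn't described here. The sketch below only illustrates the general technique of storing each page in a file named after a hash of its URL. `cached_get` and the `fetch` callable are hypothetical names, not part of the library.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cached_get(url: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the page for `url`, calling `fetch` only if it isn't cached yet.

    Hypothetical sketch of a URL-keyed disk cache, not the library's code.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL so that it becomes a safe, fixed-length file name.
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.html"
    if path.exists():
        return path.read_text(encoding="utf-8")
    html = fetch(url)
    path.write_text(html, encoding="utf-8")
    return html


calls: list[str] = []


def fake_fetch(url: str) -> str:
    """Stand-in for a real HTTP request that records each call."""
    calls.append(url)
    return "<html>example</html>"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "goodreads_cache"
    first = cached_get("https://www.goodreads.com/", fake_fetch, cache)
    second = cached_get("https://www.goodreads.com/", fake_fetch, cache)

print(len(calls))  # 1: the second call was served from the cache
```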
If this tool stops working, please try to make a backward-compatible fix so that existing
cached files keep working, and open a pull request.
When the `parse_args` argument of `recommend` or `bootstrap_list_service` is `True`, you
can run your script with `--help` to display the available command-line options.
The `Book` class provides various methods that you can use in your own custom filters.
Parsing a page can sometimes fail. In that case, the scraper should usually skip that
particular page and carry on with its job.
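The skip-and-continue behaviour can be sketched like this. `parse_all` and `parse_rating` are hypothetical names, not functions from the library. The point is that an exception raised while parsing one page is logged and that page is dropped, while the remaining pages are still processed.

```python
import logging
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def parse_all(pages: Iterable[str], parse: Callable[[str], T]) -> list[T]:
    """Parse each page, logging and skipping any page that fails (sketch)."""
    results: list[T] = []
    for page in pages:
        try:
            results.append(parse(page))
        except Exception:  # a single broken page must not stop the whole scrape
            logger.warning("Could not parse page, skipping it", exc_info=True)
    return results


def parse_rating(page: str) -> float:
    """Hypothetical parser that raises ValueError on unexpected content."""
    return float(page)


print(parse_all(["4.18", "not a rating", "4.23"], parse_rating))  # [4.18, 4.23]
```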
Requires Python 3.11 or newer.
```bash
sudo apt install python3-bs4
git clone https://github.com/sezanzeb/goodreads-recommender.git
cd goodreads-recommender
pip install -e .
```
# Recommendations Based on Previous Reads
```python
#!/usr/bin/env python
from goodreads_recommender.bootstrap import recommend
from goodreads_recommender.filters.strict_filter import strict_filter


def main():
    # Cookie extracted from the browser (Firefox):
    # - Open goodreads.com and log in
    # - Open the developer tools (F12)
    # - Go to the "Network" tab
    # - Open any page on goodreads
    # - Find the "html" request to any https://www.goodreads.com/... page
    # - Go to the "Request Headers"
    # - Check the switch to view the raw headers, otherwise they are truncated
    # - Copy the value of the "Cookie" header here.
    cookie = '...'

    # I don't know if goodreads eventually blocks users who scrape their website. Use
    # at your own risk. The cookie and the `user_id` don't have to belong to the same
    # user; the cookie is only needed to scrape profile pages.
    recommend(
        user_id=1234,
        cookie=cookie,
        output_file="./recommendations.txt",
        verbose=True,
        # optional:
        book_filter=strict_filter(
            important_genres=["fantasy", "adult"],
            avoid_genres=["robots", "aliens"],
            require_audiobook=True,
        ),
        number_of_recommendations=20,
    )


if __name__ == "__main__":
    main()
```
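Pasting the cookie into the script means it is stored in the script file. As an alternative, this small sketch reads it from an environment variable instead. `GOODREADS_COOKIE` and `load_cookie` are names made up for this example; the library does not look for them itself.

```python
import os


def load_cookie(env_var: str = "GOODREADS_COOKIE") -> str:
    """Return the Goodreads cookie stored in `env_var` (a hypothetical variable name)."""
    cookie = os.environ.get(env_var, "").strip()
    if not cookie:
        # Stop with a readable message instead of scraping without a cookie.
        raise SystemExit(f"Please set {env_var} to the value of the Cookie header.")
    return cookie
```

In the script above you would then write `cookie = load_cookie()` and start it with `GOODREADS_COOKIE='...' python your_script.py`.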
The results in `recommendations.txt` are sorted by how strongly each book is
recommended, starting with the best one.
Truncated example output from `recommendations.txt`:
```
# Raw
25307.Robin_Hobb                   41452-the-farseer-trilogy (6)      77197.Assassin_s_Apprentice        1995    4.18    fantasy, fiction, h...
4763.John_Scalzi                   40789-old-man-s-war (15)           51964.Old_Man_s_War                2005    4.23    science-fiction, fi...
153394.Suzanne_Collins             73758-the-hunger-games (8)         2767052-the-hunger-games           2008    4.34    young-adult, fictio...
58.Frank_Herbert                   45935-dune (20)                    53764.The_Great_Dune_Trilogy       1979    4.37    science-fiction, fi...
...
# Filtered
25307.Robin_Hobb                   41452-the-farseer-trilogy (6)      77197.Assassin_s_Apprentice        1995    4.18    fantasy, fiction, h...
4763.John_Scalzi                   40789-old-man-s-war (15)           51964.Old_Man_s_War                2005    4.23    science-fiction, fi...
58.Frank_Herbert                   45935-dune (20)                    53764.The_Great_Dune_Trilogy       1979    4.37    science-fiction, fi...
346732.George_R_R_Martin           43790-a-song-of-ice-and-fire (11)  13496.A_Game_of_Thrones            1996    4.44    fantasy, fiction, e...
...
```
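If you want to post-process the results in a script, a line like the ones above can be split into its columns. This sketch assumes, based only on the example output, that the columns (author, series, book, year, what appears to be the average rating, genres) are separated by at least two spaces and that every column is present. Rows with an empty column, as well as the `# Raw` and `# Filtered` headlines, would need extra handling. `Row` and `parse_row` are hypothetical helpers.

```python
import re
from typing import NamedTuple


class Row(NamedTuple):
    author: str
    series: str
    book: str
    year: int
    rating: float
    genres: str


def parse_row(line: str) -> Row:
    """Split one output line on runs of two or more spaces (assumed format)."""
    author, series, book, year, rating, genres = re.split(r"\s{2,}", line.strip())
    return Row(author, series, book, int(year), float(rating), genres)


line = (
    "25307.Robin_Hobb                   41452-the-farseer-trilogy (6)      "
    "77197.Assassin_s_Apprentice        1995    4.18    fantasy, fiction, h..."
)
row = parse_row(line)
print(row.book, row.year, row.rating)  # 77197.Assassin_s_Apprentice 1995 4.18
```

Splitting on two or more spaces, rather than on fixed column widths, keeps single spaces inside a column (such as `(6)` or the comma-separated genres) intact.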

# Filtering Lists and Shelves
```python
from goodreads_recommender.bootstrap import bootstrap_list_service
from goodreads_recommender.filters.strict_filter import strict_filter

list_service = bootstrap_list_service(
    book_filter=strict_filter(
        important_genres=["fantasy", "adult"],
        avoid_genres=["robots", "aliens"],
        minimum_rating=3,
        require_audiobook=True,
    ),
    verbose=True,
    output_file="./output.txt",
)

# Add a "Fantasy" section to output.txt, using books from various lists and shelves.
# They are filtered according to the configuration above.
# Repeat this step multiple times with different configurations to extend output.txt.
list_service.scan_books(
    name="Fantasy",
    list_ids=["176302.Best_Cozy_Fantasy_Books"],
    shelf_ids=["fantasy"],
)

# As many further `list_service.scan_books` calls as you like may follow. Each result
# is appended to output.txt.
```
The results in `output.txt` are sorted by author and series. The output looks similar to
that of `recommend`, with the `name` passed to `scan_books` as the headline of each section.
## Development
```bash
sudo apt install mypy pylint black
black goodreads_recommender
pylint -E goodreads_recommender
mypy goodreads_recommender
```