{"id":26943522,"url":"https://github.com/jim-by/scrape_analysis_books","last_synced_at":"2026-04-09T11:37:51.590Z","repository":{"id":285549666,"uuid":"958520735","full_name":"Jim-by/scrape_analysis_books","owner":"Jim-by","description":"Scraping book data from the website books.toscrape.com and performing analysis on the collected data.","archived":false,"fork":false,"pushed_at":"2025-04-01T10:45:20.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-04-01T11:32:08.128Z","etag":null,"topics":["beautifulsoup","json","matplotlib","numpy","pandas","python","scipy","scraping","seaborn"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Jim-by.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-04-01T10:32:25.000Z","updated_at":"2025-04-01T10:45:24.000Z","dependencies_parsed_at":"2025-04-01T11:42:15.327Z","dependency_job_id":null,"html_url":"https://github.com/Jim-by/scrape_analysis_books","commit_stats":null,"previous_names":["jim-by/scrape_analysis_books"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jim-by%2Fscrape_analysis_books","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jim-by%2Fscrape_analysis_books/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Jim-by%2Fscrape_analysis_books/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/host
s/GitHub/repositories/Jim-by%2Fscrape_analysis_books/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Jim-by","download_url":"https://codeload.github.com/Jim-by/scrape_analysis_books/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246856687,"owners_count":20844974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","json","matplotlib","numpy","pandas","python","scipy","scraping","seaborn"],"created_at":"2025-04-02T17:15:42.726Z","updated_at":"2025-12-30T23:11:02.459Z","avatar_url":"https://github.com/Jim-by.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Books Scraping and Analysis Project\n## Overview\nThis project scrapes book data from the website books.toscrape.com and analyzes the collected data. It consists of two main parts: data scraping and data analysis.\n\n## Data Scraping\nThe data scraping part of the project uses the Python libraries requests and BeautifulSoup to extract book data from the website. The scraped data includes each book's title, price, availability, and description, and is stored in a JSON file named `books.json`.\n\n## Data Analysis\nThe data analysis part of the project uses the Python libraries pandas, numpy, matplotlib, and seaborn to analyze the collected data. 
The analysis includes:\n\n* Overview of the data\n* Handling missing values\n* Price analysis (average, median, highest, and lowest prices)\n* Availability analysis (average and median availability)\n* Category analysis (number of books by category)\n* Price distribution by category\n* Correlation analysis between price and availability\n* Top 5 most expensive and top 5 cheapest books\n\n## Requirements\nTo run the project, install the following Python libraries:\n\n* requests\n* beautifulsoup4\n* pandas\n* numpy\n* matplotlib\n* seaborn\n* scipy\n\nYou can install them with pip:\n\n```bash\npip install -r requirements.txt\n```\n\n## Usage\n1. Clone the repository: `git clone https://github.com/Jim-by/scrape_analysis_books.git`\n2. Navigate to the project directory: `cd scrape_analysis_books`\n3. Run the data scraping script: `python src/scraping.py`\n4. Run the data analysis script: `python src/analysis.py`\n\n## Results\nThe results of the analysis are stored in the `data/analysis` directory. They include:\n\n* `cleaned_books.csv`: cleaned data in CSV format\n* `price_distribution.png`: histogram of the price distribution\n* `books_by_category.png`: bar chart of the number of books per category\n* `price_by_category.png`: box plot of the price distribution by category\n* `price_vs_availability.png`: scatter plot of price against availability, showing their correlation\n\n## License\nThis project is licensed under the MIT License. See the LICENSE file for details.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjim-by%2Fscrape_analysis_books","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjim-by%2Fscrape_analysis_books","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjim-by%2Fscrape_analysis_books/lists"}