{"id":25977020,"url":"https://github.com/insectmk/douban-crawler","last_synced_at":"2026-03-10T05:03:00.481Z","repository":{"id":276262511,"uuid":"928749490","full_name":"insectmk/douban-crawler","owner":"insectmk","description":"豆瓣电影Top250爬虫及数据展示","archived":false,"fork":false,"pushed_at":"2025-02-11T09:30:33.000Z","size":15998,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-03-05T04:32:30.030Z","etag":null,"topics":["analysis","crawler","django","echarts","mysql","python3","website"],"latest_commit_sha":null,"homepage":"","language":"CSS","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/insectmk.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2025-02-07T07:01:32.000Z","updated_at":"2025-02-11T09:27:50.000Z","dependencies_parsed_at":"2025-02-07T07:39:39.171Z","dependency_job_id":null,"html_url":"https://github.com/insectmk/douban-crawler","commit_stats":null,"previous_names":["insectmk/douban-crawler"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/insectmk/douban-crawler","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insectmk%2Fdouban-crawler","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insectmk%2Fdouban-crawler/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insectmk%2Fdouban-crawler/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insectmk%2Fdouban-crawler/manifests","owner_url":"https://repos.ecosyst
e.ms/api/v1/hosts/GitHub/owners/insectmk","download_url":"https://codeload.github.com/insectmk/douban-crawler/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/insectmk%2Fdouban-crawler/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":30325599,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-03-10T01:36:58.598Z","status":"online","status_checked_at":"2026-03-10T02:00:06.579Z","response_time":106,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analysis","crawler","django","echarts","mysql","python3","website"],"created_at":"2025-03-05T04:29:56.167Z","updated_at":"2026-03-10T05:03:00.463Z","avatar_url":"https://github.com/insectmk.png","language":"CSS","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Douban Crawler\n\n## Project Overview\n\n### Features\n\nCrawls the Douban Top 250 movie data and displays it on a website\n\nCrawled data is stored in a MySQL database and can be refreshed by crawling again\n\nThe data is presented as **tables**, **charts**, and a **word cloud**\n\nThe page and feature design is based on [Douban_Flask](https://gitee.com/xiaobai_long/Douban_Flask)\n\n### Architecture\n\nThe frontend uses Bootstrap 5, jQuery, Vue 3, and Element Plus\n\nThe backend uses Django with Django templates\n\nThe database is MySQL\n\n**Software versions**\n\n- Python `3.13.2`\n- Django `5.1.6`\n- MySQL `8.0`\n\n## 
Screenshots\n\n![image-20250211104206291](./README.assets/image-20250211104206291.png)\n\n![image-20250211104416642](./README.assets/image-20250211104416642.png)\n\n![image-20250211104450561](./README.assets/image-20250211104450561.png)\n\n![image-20250211104513225](./README.assets/image-20250211104513225.png)\n\n## Running\n\n1. Configure the database\n\n   Install [MySQL 8.0](https://dev.mysql.com/downloads/mysql/8.0.html). The following script creates the database and user:\n\n   ```sql\n   -- Create the douban_crawler database with the utf8mb4 character set\n   CREATE DATABASE IF NOT EXISTS douban_crawler CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\n   \n   -- Create the douban_crawler user with the password douban_crawler\n   CREATE USER 'douban_crawler'@'%' IDENTIFIED BY 'douban_crawler';\n   \n   -- Grant the douban_crawler user all privileges on the douban_crawler database\n   GRANT ALL PRIVILEGES ON douban_crawler.* TO 'douban_crawler'@'%';\n   \n   -- Reload the privilege tables\n   FLUSH PRIVILEGES;\n   ```\n\n   Edit `douban_crawler/douban_crawler/settings.py` and fill in the database connection details:\n\n   ```python\n   \"\"\"\n   Language and time zone settings\n   \"\"\"\n   LANGUAGE_CODE = 'zh-hans'\n   \n   TIME_ZONE = 'Asia/Shanghai'\n   \n   USE_I18N = True\n   \n   USE_TZ = True\n   \n   # Database\n   DATABASES = {\n       # Use MySQL by default\n       'default': {\n           'ENGINE': 'django.db.backends.mysql',\n           'NAME': 'douban_crawler',\n           'USER': 'douban_crawler',\n           'PASSWORD': 'douban_crawler',\n           'HOST': 'localhost',\n           'PORT': '3306',\n       }\n   }\n   ```\n\n2. Install the dependencies by running:\n\n   ```bash\n   pip install -r requirements.txt\n   ```\n\n3. Initialize the database: change into the `douban_crawler` directory and run:\n\n   ```bash\n   python manage.py migrate\n   ```\n\n4. 
Start the project: change into the `douban_crawler` directory and run:\n\n   ```bash\n   python manage.py runserver\n   ```\n\n## Miscellaneous\n\n### 403 errors\n\nDouban detects crawlers, and a request that is flagged fails with HTTP 403. To work around this, add your cookie to the request headers by editing `douban_crawler/main/config.py`:\n\n```python\n# Request headers used by the crawler\nCRAWLER_HEADERS = {\n    \"User-Agent\": \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/132.0.0.0 Safari/537.36\",\n    \"Cookie\": '''your cookie string'''\n}\n```\n\n**Getting the cookie**: log in to [Douban](https://movie.douban.com/), then look it up in the browser's developer console:\n\n![image-20250211112636547](./README.assets/image-20250211112636547.png)\n\n### pip\n\n```bash\n# Generate requirements.txt\npip freeze \u003e requirements.txt\n# Install the dependencies\npip install -r requirements.txt\n```\n\n## References\n\n[Douban_Flask: Douban Movie Top 250 data analysis](https://gitee.com/xiaobai_long/Douban_Flask)\n\n[A very detailed guide to data crawling with Python (for complete beginners)](https://blog.csdn.net/bookssea/article/details/107309591)\n\n[Easy introduction to Python: 403 errors when crawling the Douban Top 250 (418 errors, crawlers)](https://blog.csdn.net/weixin_42710807/article/details/121187996)\n\n[How to manage static files (e.g. images, JavaScript, CSS)](https://docs.djangoproject.com/zh-hans/5.1/howto/static-files/)\n\n[Django official documentation](https://docs.djangoproject.com/zh-hans/5.1/)\n\n[Step-by-step guide to connecting Django to MySQL](https://developer.aliyun.com/article/1458957)\n\n[Bootstrap 5 official documentation](https://getbootstrap.com/docs/5.3/getting-started/introduction/)\n\n[ECharts charts](https://echarts.apache.org/zh/index.html)\n\n[Django built-in template tags and filters](https://docs.djangoproject.com/zh-hans/5.1/ref/templates/builtins/)\n\n[Vue 3 official documentation](https://cn.vuejs.org/guide/introduction.html)\n\n[Element Plus official documentation](https://element-plus.org/zh-CN/guide/design.html)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finsectmk%2Fdouban-crawler","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Finsectmk%2Fdouban-crawler","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Finsectmk%2Fdouban-crawler/lists"}