https://github.com/lemonhu/finance-qa-spider
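Text data collection (crawling) from financial Q&A platforms. Data sources: the Shanghai Stock Exchange, the Shenzhen Stock Exchange, Panorama Network (p5w.net), and Sina Guba.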
deprecated-repo finance python2 question-answering scrapy-crawler
Last synced: 7 months ago
- Host: GitHub
- URL: https://github.com/lemonhu/finance-qa-spider
- Owner: lemonhu
- Created: 2017-08-20T06:59:16.000Z (almost 9 years ago)
- Default Branch: master
- Last Pushed: 2017-08-20T07:16:50.000Z (almost 9 years ago)
- Last Synced: 2025-06-29T21:02:57.222Z (11 months ago)
- Topics: deprecated-repo, finance, python2, question-answering, scrapy-crawler
- Language: Python
- Homepage:
- Size: 10.7 KB
- Stars: 39
- Watchers: 2
- Forks: 9
- Open Issues: 1
Metadata Files:
- Readme: README.md
README
# Building a Financial Q&A Text Database with the Scrapy Framework
---
## Development Language
Python
## Development Platform
Eclipse + PyDev
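## Data Sources
1. Q&A system on the Shanghai Stock Exchange's official platform
http://sns.sseinfo.com/qa.do
2. Q&A system on the Shenzhen Stock Exchange's official platform
http://irm.cninfo.com.cn/szse/index.html
3. Panorama Network (p5w.net) investor relations interaction platform
http://rs.p5w.net/index/company/showQuestionPage.shtml
4. Sina Guba (Sina stock forum)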
http://guba.sina.com.cn/?s=channel&chi
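The four sources listed here could be kept in one small registry that each spider reads its start URL from. The following is a minimal sketch, not the repository's actual code: the spider names (`sse_qa`, `szse_qa`, `p5w_qa`, `sina_guba`) and the helper functions are hypothetical, and the example targets Python 3 even though the repository is tagged `python2`. The URLs are copied verbatim from the README.

```python
from urllib.parse import urlparse

# Start URLs copied verbatim from the README.
# The dictionary keys are hypothetical spider names, not taken from the repository.
SOURCES = {
    "sse_qa": "http://sns.sseinfo.com/qa.do",
    "szse_qa": "http://irm.cninfo.com.cn/szse/index.html",
    "p5w_qa": "http://rs.p5w.net/index/company/showQuestionPage.shtml",
    "sina_guba": "http://guba.sina.com.cn/?s=channel&chi",
}


def allowed_domain(url):
    """Return the host part of a URL, e.g. for a spider's allowed_domains."""
    return urlparse(url).hostname


def spider_settings(name):
    """Build the name / allowed_domains / start_urls attributes for one source."""
    url = SOURCES[name]
    return {
        "name": name,
        "allowed_domains": [allowed_domain(url)],
        "start_urls": [url],
    }
```

A Scrapy spider class could then copy `name`, `allowed_domains`, and `start_urls` from `spider_settings(...)` into its class attributes, which keeps every source URL in one place.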
## Database Table shse_qa
mysql> CREATE TABLE IF NOT EXISTS `shse_qa`(
-> `current_time` TIMESTAMP NOT NULL,
-> `user_name` VARCHAR(100) NOT NULL,
-> `company_name` VARCHAR(100) NOT NULL,
-> `company_id` INT(20) NOT NULL,
-> `question_time` VARCHAR(100) NOT NULL,
-> `question_content` TEXT NOT NULL,
-> `answer_time` VARCHAR(100),
-> `answer_content` TEXT
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
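
To illustrate how a scraped question/answer record might be written into this table, here is a minimal sketch that builds a parameterized `INSERT` statement from a plain dictionary. It is an assumption about usage, not the repository's pipeline code: `build_insert`, `COLUMNS`, and `REQUIRED` are hypothetical names, and the `%s` placeholders assume a DB-API driver with the `format` paramstyle (as used by MySQLdb and PyMySQL). The column names and the `NOT NULL` constraints come from the table definition above.

```python
# Column order follows the CREATE TABLE statement for shse_qa.
COLUMNS = (
    "current_time",
    "user_name",
    "company_name",
    "company_id",
    "question_time",
    "question_content",
    "answer_time",
    "answer_content",
)

# Columns declared NOT NULL in the table definition.
REQUIRED = COLUMNS[:6]


def build_insert(row, table="shse_qa"):
    """Return (sql, params) for inserting one scraped Q&A record.

    Missing nullable columns (answer_time, answer_content) become None,
    which a DB-API driver sends to MySQL as NULL.
    """
    missing = [c for c in REQUIRED if row.get(c) is None]
    if missing:
        raise ValueError("missing required columns: " + ", ".join(missing))

    column_list = ", ".join("`{}`".format(c) for c in COLUMNS)
    placeholders = ", ".join(["%s"] * len(COLUMNS))
    sql = "INSERT INTO `{}` ({}) VALUES ({})".format(table, column_list, placeholders)
    params = tuple(row.get(c) for c in COLUMNS)
    return sql, params
```

With an open DB-API cursor, the result would be passed as `cursor.execute(sql, params)`, so that values are escaped by the driver instead of being formatted into the SQL string.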