{"id":18860351,"url":"https://github.com/lining0806/ridgecvtest","last_synced_at":"2025-10-04T01:41:52.989Z","repository":{"id":80295353,"uuid":"126132213","full_name":"lining0806/ridgecvtest","owner":"lining0806","description":"量化交易股票预测系统","archived":false,"fork":false,"pushed_at":"2018-03-21T06:40:05.000Z","size":90,"stargazers_count":40,"open_issues_count":0,"forks_count":15,"subscribers_count":3,"default_branch":"master","last_synced_at":"2025-03-28T01:50:04.975Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lining0806.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2018-03-21T06:17:44.000Z","updated_at":"2025-03-16T12:17:20.000Z","dependencies_parsed_at":null,"dependency_job_id":"c11478b8-b1d6-4054-9395-f0e704496424","html_url":"https://github.com/lining0806/ridgecvtest","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lining0806%2Fridgecvtest","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lining0806%2Fridgecvtest/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lining0806%2Fridgecvtest/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lining0806%2Fridgecvtest/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lining0806","download_url":"https://codeload.github.com/lining0806/ridgecvtest/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github",
"repositories_count":248881304,"owners_count":21176828,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-08T04:23:40.834Z","updated_at":"2025-10-04T01:41:47.919Z","avatar_url":"https://github.com/lining0806.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"optforwardtest system\n=====================\n\n\ncommand:  \n\n    python optforwardtest.py -i ./data/IF.csv -of ./results/results.csv -op ./results/results.png -nj 3 -lp 10 -dn 1 -cl knn\nhelp:  \n    \n    -i the input file\n    -of the output file which records time index, predict label, and close\n    -op the output file which describes sigsum and accuracy\n    -nj number of jobs to run in parallel\n    -lp length of periods to predict, not number of points to predict\n    -dn length of periods to shift\n    -cl classifier\n\t\n**'''1. DATA PREPARING'''**  \nkey point:  \n\n    rs_num and nrows:\n    \tyou can change rs_num and nrows to determine the beginning and last time for the csv file reading \n    resample_time:\n    \tyou can change resample time, by time-resampling way or point-resampling way\n    zero_propotion:\n    \tyou can change the variable, which describes the propotion of zero, to calculate the train data label\n\t\t\n**'''2. FEATURE EXTRACTION'''**  \nkey point:\n \n\tfeatures: \n\twindow_size:\n\t\tlength of data default, you can change it if you like\n\ttest_size:\n\t\tdiff_n default, or greater than diff_n\n\t\t\n**'''3. 
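CLASSIFICATION'''**  \n* in this stage, you can perform classification in either the multi_feature way or the multi_classification way  \n* in the multi_feature way, different kinds of features may be generated together for one classifier  \n* in the multi_classification way, each classifier outputs its own results, and you can combine them by voting or another scheme  \n* a simple classifier named mean classifier is added, based on the assumption that the next close is the mean close of the current window, including the current point\n\n\u003e\nThe window length is window_size, and diff_n here is the timeshift.  \ndiff_n can be 1, 2, 3, 4, 5, and so on. For example, when diff_n equals 2, the label of a training point is the direction obtained by comparing the current close with the close 2 points later.  \nIn other words, the label of the current close uses the close of the 2nd point in the future.  \nSo when splitting trainset and testset, test_size must be greater than or equal to diff_n, which guarantees that trainset never peeks at close information that comes after testset. (Hence test_size=diff_n.)  \nWhat we predict is the label of testset, in particular test[-1]. The label obtained for test[-1] tells us whether the close 2 points later rises or falls.  \nSo when sliding the window, each step must land on the 2nd point after the current prediction point. (Hence step=diff_n.)  \nWhat gets written to the file is therefore the close, index, and predicted label of a prediction point every 2 minutes, and the sigsum curve can be plotted from this information.  \n\u003e\nresampletime is the sampling applied to the data in each window we read; for the last point test[-1], it means thinning out the data before it. We resample because the data may be too redundant.  \nFor the resampled window, however, labels are still defined by diff_n: the label of a point is the direction obtained by comparing the current close with the close 2 points later.  \n\n\n**Forward-testing approach:**\n\n    First slice the raw data into windows with a step of diff_n, because the prediction is the direction diff_n ahead.\n    Then resample each slice.\n    When computing labels, use periods=int(diff_n/resample_time), then generate the features.\n    To keep feature computation between adjacent training points consistent, resample_time should equal diff_n.\n    The test set size is int(diff_n/resample_time).\n\n**Non-forward-testing approach:** \n\n    First resample the raw data.\n    When computing labels, use periods=int(diff_n/resample_time), then generate the features.\n    To keep feature computation between adjacent training points consistent, resample_time should equal diff_n.\n    Then slice and generate features with a window step of int(diff_n/resample_time), because the prediction is the direction diff_n ahead.\n    The test set size is int(diff_n/resample_time).\n\n\n## The Default Project for LiNing\n\n##### Please Do Not make any change without permission~\n\n### Settings that are commonly changed:\n\n    file_path\n    size1\n    size2\n    resample_num_list\n    timeshift_num_list\n    results_dir\n\tdefault_nrows\n    whether timeshift equals resample or not\n    num\n    dynamic\n    \n    step\n    target\n    maxlag = [30, 90]\n  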
  windowsize = [50000, 900000]\n    threshold = [95, 100]\n    search = {\n        'algorithm':{'ridgecv':None},\n        # 'algorithm':{'ridgecv':None,'elasticnetcv':None,'knnreg':None,'linearsvr':None},\n        'maxlag':maxlag,\n        'windowsize':windowsize,\n        'threshold':threshold,\n        }\n    num_evals = 100\n    optunity.maximize or optunity.maximize_structured\n\n### Optunity Revision Note\n\n\tA recurring problem:\n\top starts, say, 30 processes, but far fewer are actually computing; many processes with the same name just sit idle, waiting.\n\t\n\tThe real cause lies inside optunity:\n\tIn the PSO source, the definition of suggest_from_box contains:\n\td = dict(kwargs)\n\tif num_evals \u003e 1000:\n\t\td['num_particles'] = 100\n\telif num_evals \u003e= 200:\n\t\td['num_particles'] = 20\n\telif num_evals \u003e= 10:\n\t\td['num_particles'] = 10\n\telse:\n\t\td['num_particles'] = num_evals\n\td['num_generations'] = int(math.ceil(float(num_evals) / d['num_particles']))\n\treturn d\n\t\n\tIn the PSO source, the definition of optimize contains:\n\tpop = [self.generate() for _ in range(self.num_particles)]\n\tbest = None\n\t\n\tfor g in range(self.num_generations):\n\t\tfitnesses = pmap(evaluate, list(map(self.particle2dict, pop)))\n\t\n\tThis shows that the number of processes actually running is determined neither by num_evals itself nor by the number of processes you start, but by num_particles: the map operation runs over the particles of one generation.\n\tWith num_evals = 3, 3 evaluations run at a time and occupy 3 processes, so with 30 processes started, 27 sit waiting.\n\tWith num_evals = 30, 10 evaluations run at a time and occupy 10 processes, so with 30 processes started, 20 sit waiting.\n\tWith num_evals = 300, 20 evaluations run at a time and occupy 20 processes, so with 30 processes started, 10 sit waiting.\n\tThe total number of evaluations, however, does depend on num_evals and may be slightly smaller than num_evals.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flining0806%2Fridgecvtest","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flining0806%2Fridgecvtest","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flining0806%2Fridgecvtest/lists"}