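Positive samples need to be divided into a training set, a test set, and a validation set. The problem to solve is how to generate each of these sets in bulk at the required size.
Put all positive samples in all.txt, one per line, then split them by a chosen ratio to produce the other files.
Code: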
#!/usr/bin/env python
# -*- coding: UTF-8 -*-
'''edited by zr 2017/6/3'''
import random

# Read every sample name from all.txt, one per line.
flpath = '/home/zr/projects/all.txt'
with open(flpath) as fpath:
    dataset = [line.strip() for line in fpath]

# Shuffle, then put the first 75% in one set and the remainder in the other.
random.shuffle(dataset)
posnm = int(len(dataset) * 0.75)
posset = dataset[:posnm]
negset = dataset[posnm:]

# Write each subset to its own file; "with" closes the file on exit
# (the original "f1.closed" only read an attribute and never closed it).
with open('/home/zr/projects/pos.txt', 'w') as f1:
    for name in posset:
        f1.write(name + '\n')
with open('/home/zr/projects/neg.txt', 'w') as f2:
    for name in negset:
        f2.write(name + '\n')
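The script creates pos.txt and neg.txt in /home/zr/projects automatically. Because the files are opened in 'w' mode, any existing copies are overwritten.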
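
The script above produces two files, while the stated goal is three sets (training, test, validation). The following is a minimal sketch of how the same shuffle-and-slice idea could be extended to any number of parts. The names split_samples and write_list, the seed parameter, and the 0.6/0.2/0.2 default ratios are assumptions made for illustration and are not part of the original script.

```python
import random


def split_samples(samples, ratios=(0.6, 0.2, 0.2), seed=None):
    """Shuffle samples and cut them into consecutive parts sized by ratios.

    Hypothetical helper: the default ratios are illustrative only.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    # A private Random instance makes the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    parts, start = [], 0
    for i, ratio in enumerate(ratios):
        if i == len(ratios) - 1:
            # The last part takes whatever is left, so no sample is dropped.
            end = len(shuffled)
        else:
            # Truncate with int(), as the original script does.
            end = start + int(len(shuffled) * ratio)
        parts.append(shuffled[start:end])
        start = end
    return parts


def write_list(path, names):
    """Write one name per line, overwriting any existing file at path."""
    with open(path, "w") as f:
        for name in names:
            f.write(name + "\n")
```

To use it, read all.txt into a list as before, call split_samples on that list, and pass each returned part to write_list with a file name of your choice (for example train.txt, test.txt, and val.txt, which are hypothetical names here). Passing a fixed seed gives the same split on every run, which helps when the lists need to be regenerated.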