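Now let's write a simple function that builds a dictionary for each PE file. The keys are the feature fields and the values are the feature values, so every sample can be represented as a Python dictionary object, as shown below: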
    import os

    import pandas as pd
    import pedump  # module providing PEFile, as used below

    def pe2vec():
        # `direct` is the root directory holding the samples; it must be defined before calling
        dataset = {}
        for subdir, dirs, files in os.walk(direct):
            for f in files:
                file_path = os.path.join(subdir, f)
                try:
                    pe = pedump.PEFile(file_path)
                    dataset[str(f)] = pe.Construct()
                except Exception as e:
                    # report files that cannot be parsed and move on
                    print(e)
        return dataset

    # now that we have a dictionary let's put it in a clean csv file
    def vec2csv(dataset):
        df = pd.DataFrame(dataset)
        infected = df.transpose()  # transpose to have the features as columns and samples as rows
        # utf-8 is preferred
        infected.to_csv('dataset.csv', sep=',', encoding='utf-8')
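To make the orientation of the table concrete, here is a minimal sketch with two invented samples. The file names, the feature names (`NumberOfSections`, `SizeOfCode`) and all values are assumptions for illustration, not the actual output of `pe.Construct()`. It shows why the transpose is needed, and that `pd.DataFrame.from_dict(..., orient="index")` should produce the same one-row-per-sample layout in a single step.

```python
import pandas as pd

# Hypothetical output of pe2vec(): one feature dictionary per sample.
# File names, feature names and values are invented for illustration.
dataset = {
    "sample_a.exe": {"NumberOfSections": 3, "SizeOfCode": 4096},
    "sample_b.exe": {"NumberOfSections": 5, "SizeOfCode": 1024},
}

# A dict of dicts puts the outer keys (the samples) in the columns ...
df = pd.DataFrame(dataset)
# ... so transposing yields one row per sample and one column per feature.
table = df.transpose()

# Alternative that builds the row-per-sample layout directly.
by_rows = pd.DataFrame.from_dict(dataset, orient="index")

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = table.to_csv(index_label="sample")
print(csv_text)
```

Naming the index column with `index_label` is optional; without it the first header cell of the CSV is simply left empty.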
Next, we get ready to process this data.
(2) Exploring the data
This step is not required, but it gives you an intuitive feel for the data.
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    malicious = pd.read_csv("bucket-set.csv")
    clean = pd.read_csv("clean-set.csv")
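As a rough idea of what a first look at the two sets could involve, the sketch below uses small in-memory tables in place of the CSV files. The feature names, the values, and the `label` convention (1 for malicious, 0 for clean) are all assumptions made for illustration.

```python
import pandas as pd

# Toy stand-ins for the two CSV files; every value here is invented.
malicious = pd.DataFrame(
    {"NumberOfSections": [3, 8, 5], "SizeOfCode": [4096, 512, 1024]},
    index=["mal_a.exe", "mal_b.exe", "mal_c.exe"],
)
clean = pd.DataFrame(
    {"NumberOfSections": [4, 5], "SizeOfCode": [8192, 16384]},
    index=["clean_a.exe", "clean_b.exe"],
)

# Tag each set (assumed convention: 1 = malicious, 0 = clean), then stack them.
malicious["label"] = 1
clean["label"] = 0
samples = pd.concat([malicious, clean])

# How many samples per class, and how do the feature averages differ?
class_counts = samples["label"].value_counts()
feature_means = samples.groupby("label").mean()

print(samples.shape)
print(class_counts)
print(feature_means)
```

Comparing per-class averages like this is only a quick sanity check; it can hint at which features separate the two classes before any model is trained.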
Article title: Applications of Machine Learning in Malware Detection
Address: http://www.17bianji.com/lsqh/34783.html