- t1 = time.time()
- centroids = k_means(signals, 100, 100)
- t2 = time.time()
- print("Took {} seconds".format(t2 - t1))
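That took a little over 3.5 minutes. Not bad! But we want it to finish even faster.
k-means++ implementation
Our next implementation applies the k-means++ algorithm. Its goal is to choose better initial centroids: the first centroid is a random data point, and each subsequent one is sampled with probability proportional to its squared distance from the nearest centroid chosen so far. Let's see whether this optimization helps...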
- import numpy as np
- def init_centroids(data, num_clust):
-     # k-means++ seeding: the first centroid is a uniformly random data point
-     centroids = np.zeros([num_clust, data.shape[1]])
-     centroids[0, :] = data[np.random.randint(data.shape[0])]
-     for i in range(1, num_clust):
-         # squared distance from every point to its nearest already-chosen centroid
-         D2 = np.min([np.linalg.norm(data - c, axis=1)**2 for c in centroids[0:i, :]], axis=0)
-         # sample the next centroid with probability proportional to D2
-         probs = D2 / D2.sum()
-         cumprobs = probs.cumsum()
-         ind = np.where(cumprobs >= np.random.random())[0][0]
-         centroids[i, :] = data[ind]
-     return centroids
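The seeding function above recomputes the distance to every previously chosen centroid on each pass. As a rough sketch of one possible refinement (not something the article itself does), the nearest-centroid distance can be kept as a running minimum and updated only against the newest centroid. The name `init_centroids_incremental` and the `seed` parameter are assumptions made for this illustration.

```python
import numpy as np

def init_centroids_incremental(data, num_clust, seed=None):
    """k-means++ seeding that keeps a running minimum of squared distances.

    Hypothetical variant for illustration; `seed` is an added parameter.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    centroids = np.empty((num_clust, data.shape[1]), dtype=float)

    # First centroid: a uniformly random data point.
    centroids[0] = data[rng.integers(n)]
    # Squared distance from each point to its nearest chosen centroid so far.
    d2 = np.sum((data - centroids[0]) ** 2, axis=1)

    for i in range(1, num_clust):
        total = d2.sum()
        if total == 0:
            # Every point coincides with a chosen centroid: fall back to uniform.
            ind = rng.integers(n)
        else:
            # Sample proportionally to the squared distance.
            ind = rng.choice(n, p=d2 / total)
        centroids[i] = data[ind]
        # Only the newest centroid can reduce any point's nearest distance.
        d2 = np.minimum(d2, np.sum((data - centroids[i]) ** 2, axis=1))
    return centroids
```

Because points that already serve as centroids have zero distance, they get zero probability of being picked again, which is the same behaviour as the version above.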
- def k_means(data, num_clust, num_iter):
-     centroids = init_centroids(data, num_clust)
-     # keep a copy so the comparison still works if centroids are updated in place
-     last_centroids = centroids.copy()
-     for n in range(num_iter):
-         closest = closest_centroids(data, centroids)
-         centroids = move_centroids(data, closest, centroids)
-         # stop early once the centroids no longer change
-         if np.array_equal(last_centroids, centroids):
-             print("Early finish!")
-             break
-         last_centroids = centroids.copy()
-     return centroids
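The loop above calls `closest_centroids` and `move_centroids`, which are not shown in this section. For anyone who wants to run it in isolation, here is a minimal stand-in that assumes plain Euclidean distance and mean-based updates. The bodies below are an assumption for illustration and may differ from the article's own versions.

```python
import numpy as np

def closest_centroids(data, centroids):
    """Index of the nearest centroid for every row of `data` (assumed Euclidean)."""
    # Squared distances via ||x||^2 - 2 x.c + ||c||^2, shape (n_points, n_centroids).
    d2 = (
        np.sum(data ** 2, axis=1)[:, None]
        - 2.0 * data @ centroids.T
        + np.sum(centroids ** 2, axis=1)[None, :]
    )
    return np.argmin(d2, axis=1)

def move_centroids(data, closest, centroids):
    """Return new centroids as the mean of their assigned points."""
    new_centroids = centroids.copy()
    for k in range(centroids.shape[0]):
        members = data[closest == k]
        # An empty cluster keeps its previous position.
        if len(members) > 0:
            new_centroids[k] = members.mean(axis=0)
    return new_centroids
```

Returning a fresh array from `move_centroids` keeps the early-stopping comparison in `k_means` meaningful, since the previous centroids are never overwritten.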
- t1 = time.time()
- centroids = k_means(signals, 100, 100)
- t2 = time.time()
- print("Took {} seconds".format(t2 - t1))
Parallel implementation
- import ipyparallel as ipp
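The rest of the ipyparallel code is not included in this section. To give a rough idea of how the assignment step can be split across workers, here is a sketch that uses the standard library's `concurrent.futures` instead of ipyparallel. The function names and the `num_workers` parameter are assumptions for illustration, and whether it actually runs faster depends on the data size and on how much of the work NumPy performs outside the interpreter lock.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def _assign_chunk(chunk, centroids):
    # Squared Euclidean distance from each row of the chunk to each centroid.
    d2 = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return np.argmin(d2, axis=1)

def closest_centroids_chunked(data, centroids, num_workers=4):
    """Assign points to centroids chunk by chunk (hypothetical helper).

    Uses threads from the standard library, not ipyparallel.
    """
    # Split the rows into roughly equal chunks, one per worker.
    chunks = np.array_split(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map preserves chunk order, so the labels line up with `data`.
        parts = list(pool.map(lambda c: _assign_chunk(c, centroids), chunks))
    return np.concatenate(parts)
```

Each point's assignment is independent of the others, which is why this step splits cleanly; the centroid update still needs the labels from all chunks.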
Article title: How to Optimize K-Means Clustering Speed for Time Series Data?
URL: http://www.17bianji.com/lsqh/38787.html