Unveiling the power of unsupervised learning through a step-by-step implementation of the K-Means algorithm, transforming raw data into meaningful clusters.
1. Implementation using NumPy only
Step 1: Import NumPy and Matplotlib
import numpy as np
import matplotlib.pyplot as plt
Step 2: Generate sample data

# Generate two Gaussian clusters of 100 points each
np.random.seed(0)  # set the random seed for reproducibility
# np.random.normal(mean, std, (rows, columns)) draws from N(mean, std^2)
X = np.concatenate([np.random.normal(0, 1, (100, 2)),
                    np.random.normal(5, 1, (100, 2))])
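As a quick sanity check of the data-generation step, the stacked array should contain 200 two-dimensional points, with the two clusters centered near 0 and 5 (this is a standalone check, not part of the original snippet):

```python
import numpy as np

np.random.seed(0)
X = np.concatenate([np.random.normal(0, 1, (100, 2)),
                    np.random.normal(5, 1, (100, 2))])

print(X.shape)         # (200, 2)
print(X[:100].mean())  # roughly 0, the mean of the first cluster
print(X[100:].mean())  # roughly 5, the mean of the second cluster
```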
Step 3: Initialize centroids randomly

k = 2  # number of clusters
# Pick k distinct row indices from 0..X.shape[0]-1 and use those points
# as the initial centroids; replace=False ensures the same point is not
# chosen twice.
centroids = X[np.random.choice(X.shape[0], k, replace=False)]
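To see what this line does in isolation, here is the same pattern on a tiny toy array (the array and seed below are illustrative, not from the post):

```python
import numpy as np

np.random.seed(1)
toy = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

# Two distinct row indices from 0..3
idx = np.random.choice(toy.shape[0], 2, replace=False)
# Fancy indexing picks those rows as the initial centroids
starts = toy[idx]

print(idx)           # two distinct indices
print(starts.shape)  # (2, 2)
```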
Step 4: Iterate

# K-means iterations
max_iterations = 100
for _ in range(max_iterations):
    # Assign each point to its nearest centroid
    distances = np.sqrt(np.sum((X[:, np.newaxis, :] - centroids)**2, axis=2))
    labels = np.argmin(distances, axis=1)
    # Update each centroid to the mean of the points assigned to it
    new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
    # Stop early once the centroids no longer move
    if np.allclose(centroids, new_centroids):
        break
    centroids = new_centroids
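The broadcasting trick in the assignment step can be verified against a naive double loop; this is a standalone sanity check with small random data, not part of the original post:

```python
import numpy as np

np.random.seed(2)
X = np.random.rand(6, 2)          # 6 points in 2-D
centroids = np.random.rand(3, 2)  # 3 centroids

# Broadcast: (6, 1, 2) - (3, 2) -> (6, 3, 2), then reduce over the last axis
fast = np.sqrt(np.sum((X[:, np.newaxis, :] - centroids) ** 2, axis=2))

# Naive: one Euclidean distance per (point, centroid) pair
slow = np.array([[np.linalg.norm(x - c) for c in centroids] for x in X])

print(np.allclose(fast, slow))  # True
```

Row i, column j of the distance matrix is the distance from point i to centroid j, which is why `np.argmin(..., axis=1)` yields each point's nearest centroid.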
Step 5: Plot the results
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='*', s=200, c='red')
plt.title('K-means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()
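One way to make the snippet reusable is to wrap steps 3 and 4 in a function; `kmeans` here is a hypothetical helper name, a sketch of the same algorithm rather than something from the post (like the original, it does not handle the rare empty-cluster case):

```python
import numpy as np

def kmeans(X, k, max_iterations=100, seed=0):
    """Plain NumPy k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], k, replace=False)]
    for _ in range(max_iterations):
        distances = np.sqrt(np.sum((X[:, np.newaxis, :] - centroids) ** 2, axis=2))
        labels = np.argmin(distances, axis=1)
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(centroids, new_centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Usage on the same kind of two-cluster data as above
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
labels, centroids = kmeans(X, 2)
print(labels.shape, centroids.shape)  # (200,) (2, 2)
```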