Wednesday, 30 April 2025

K-Means Clustering: A Practical Implementation Guide


Unveiling the power of unsupervised learning through a step-by-step implementation of the K-Means algorithm, transforming raw data into meaningful clusters.

1. Implementation Using NumPy Only

Step 1: Import NumPy and Matplotlib

import numpy as np
import matplotlib.pyplot as plt

Step 2: Generate sample data

# Generate sample data
np.random.seed(0)  # Set the random seed for reproducibility (same "dice roll" every run)

# np.random.normal(mean, std, (rows, columns)) draws samples from a normal distribution.
# X has shape (200, 2): two Gaussian blobs of 100 points each, centred at 0 and at 5.
X = np.concatenate([np.random.normal(0, 1, (100, 2)),
                    np.random.normal(5, 1, (100, 2))])

Step 3: Initialize centroids randomly

k = 2  # number of clusters

# np.random.choice picks k (here 2) distinct row indices between 0 and X.shape[0] - 1.
# replace=False ensures the same index is not chosen twice.
# X.shape[0] is the number of data points (rows).
centroids = X[np.random.choice(X.shape[0], k, replace=False)]

Step 4: Iterate until convergence

# K-means iterations
max_iterations = 100
for _ in range(max_iterations):
  # Assign each point to its nearest centroid:
  # X[:, np.newaxis, :] has shape (n, 1, 2) and broadcasts against the (k, 2) centroids,
  # giving an (n, k) matrix of Euclidean distances.
  distances = np.sqrt(np.sum((X[:, np.newaxis, :] - centroids)**2, axis=2))
  labels = np.argmin(distances, axis=1)

  # Update each centroid to the mean of the points assigned to it
  new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])

  # Stop once the centroids no longer move
  if np.allclose(centroids, new_centroids):
    break

  centroids = new_centroids

Step 5: Plot the results

plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='*', s=200, c='red')
plt.title('K-means Clustering')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

2. Implementation Using scikit-learn
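
The same clustering can be reproduced with scikit-learn's KMeans estimator. Below is a minimal sketch that assumes the X array from step 2 (and matplotlib) is still in scope; the parameter values shown (n_clusters=2, random_state=0, n_init=10) simply mirror the NumPy version above.

from sklearn.cluster import KMeans

# Fit K-means with the same number of clusters as the NumPy version
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10)
labels = kmeans.fit_predict(X)        # cluster label for each data point
centroids = kmeans.cluster_centers_   # final centroid coordinates

# Plot the result, reusing the same style as step 5
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.scatter(centroids[:, 0], centroids[:, 1], marker='*', s=200, c='red')
plt.title('K-means Clustering (scikit-learn)')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

scikit-learn handles centroid initialization (k-means++ by default), the iteration loop, and the convergence check internally, so the manual loop from step 4 is not needed here.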

