K-means clustering is a way to group numeric data into clusters. In its standard Euclidean form, the algorithm repeats one loop: assign each point to the nearest center, then move each center to the mean of the points assigned to it.
In plain language, it tries to make points in the same group look close to one another and points in different groups look farther apart. It is fast and useful, but it only works well when those "groups" are reasonably compact and distance is meaningful.
What k-means clustering is optimizing
In the standard Euclidean form, k-means tries to make points inside each cluster as close as possible to that cluster's centroid. A common objective is

$$J = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - \mu_i \rVert^2$$

Here, $C_i$ is the $i$th cluster and $\mu_i$ is its centroid.
This is the within-cluster sum of squares. Smaller values mean the assigned points sit more tightly around their centroids.
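As a concrete reference, here is a minimal Python sketch of that objective, assuming `X` is an (n, d) NumPy array of points, `labels` holds each point's cluster index, and `centroids` holds the cluster means (all names are illustrative):

```python
import numpy as np

def wcss(X, labels, centroids):
    """Within-cluster sum of squares for a given assignment."""
    return sum(
        np.sum((X[labels == j] - centroids[j]) ** 2)  # squared distances to centroid j
        for j in range(len(centroids))
    )
```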
That objective explains the two parts of the algorithm:
- If the centroids are fixed, the best move is to assign each point to its nearest centroid.
- If the assignments are fixed, the best centroid is the mean of the assigned points.
That is why the update rule is not arbitrary. The "means" in k-means are literal arithmetic means.
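If you want to see both rules side by side, here is an illustrative NumPy sketch; the toy arrays are assumptions made up for the example, not from any particular library:

```python
import numpy as np

X = np.array([[0.0, 0.0], [1.0, 0.0], [9.0, 9.0], [10.0, 9.0]])
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])

# Assignment step: index of the nearest centroid for every point.
dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
labels = dists.argmin(axis=1)

# Update step: each centroid becomes the literal mean of its points.
centroids = np.array([X[labels == j].mean(axis=0) for j in range(len(centroids))])
```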
How the k-means algorithm works
The usual loop is short:
- Choose initial centroids.
- Assign each point to the nearest centroid.
- Recompute each centroid as the mean of its assigned points.
- Repeat until assignments stop changing or the improvement is very small.
This process usually converges quickly, but not necessarily to the best possible clustering. Different starting centroids can lead to different final answers, so initialization matters.
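Putting the pieces together, here is a minimal, self-contained sketch of the loop (Lloyd's algorithm) with random initialization, assuming `X` is a float (n, d) NumPy array; it is illustrative rather than production code:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: repeat assign/update until labels stop changing."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct points drawn from the data.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid for each point.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            if np.any(labels == j):  # guard against an empty cluster
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

Running this from several different seeds and keeping the result with the lowest within-cluster sum of squares is the usual way to cope with that initialization sensitivity.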
K-means clustering example step by step
Take these one-dimensional data points: 1, 2, 3, 10, 11, 12.
Suppose we want k = 2 clusters and we start with centroids at 1 and 12. This is a good example because the centroids actually move after the first update.
Step 1: assign points to the nearest centroid
Points 1, 2, and 3 are closer to 1.
Points 10, 11, and 12 are closer to 12.
So the clusters are {1, 2, 3} and {10, 11, 12}.
Step 2: update the centroids
The new centroid of the first cluster is (1 + 2 + 3) / 3 = 2.
The new centroid of the second cluster is (10 + 11 + 12) / 3 = 11.
Both centroids moved, from 1 to 2 and from 12 to 11.
Step 3: assign again
Now check the nearest centroid again using 2 and 11.
Points 1, 2, and 3 still belong to the first cluster, and points 10, 11, and 12 still belong to the second cluster. Because the assignments no longer change, the algorithm has converged.
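The same two iterations can be checked in a few lines of code; this sketch simply replays the arithmetic above:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
centroids = np.array([1.0, 12.0])

for step in range(2):
    # Assignment: nearest centroid for each 1-D point.
    labels = np.abs(X[:, None] - centroids[None, :]).argmin(axis=1)
    # Update: each centroid is the mean of its assigned points.
    centroids = np.array([X[labels == j].mean() for j in range(2)])
    print(step, labels, centroids)  # centroids settle at [2. 11.]
```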
This is a clean example because the data naturally separates into two compact groups. Real datasets are messier, which is where k-means can start to mislead you.
When k-means works well
K-means works best when these conditions are roughly true:
- The features are numeric.
- Euclidean distance is a reasonable way to measure similarity.
- The clusters are fairly compact, not long or curved.
- The features have been scaled so one variable does not dominate the rest.
If those conditions fail, the output can still look tidy while missing the real structure in the data.
Common k-means mistakes
Treating k-means as a universal clustering method
K-means works best when clusters are reasonably compact and the mean is a sensible summary. It is not a good default for every dataset.
Ignoring feature scaling
If one feature is measured on a much larger scale than another, that feature can dominate the distance calculation. Standardizing or normalizing features is often important before running k-means.
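As a sketch of why this matters, consider a hypothetical dataset with age in years and income in dollars; without scaling, income differences dominate every distance. The data below is made up for illustration, and the snippet assumes scikit-learn is available:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical (age, income) rows: income is thousands of times larger,
# so unscaled Euclidean distance is driven almost entirely by income.
X = np.array([[25, 30_000], [27, 90_000], [55, 31_000], [53, 88_000]], dtype=float)

# Standardize each feature to zero mean and unit variance, then cluster.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```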
Assuming the answer is unique
K-means can converge to different local minima from different starting points. That is why repeated runs or methods like k-means++ initialization are commonly used.
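In scikit-learn, for example, both ideas are built in: `init="k-means++"` chooses spread-out starting centroids, and `n_init` reruns the whole algorithm and keeps the run with the lowest objective. The toy data here is only to make the snippet runnable:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))  # toy data for illustration

km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(X)
print(km.inertia_)  # within-cluster sum of squares of the best of the 10 runs
```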
Using non-numeric or poorly encoded features
Because centroids are means, standard k-means is built for numeric variables. If a feature is categorical, taking an arithmetic mean may not make sense.
Using it on strongly non-spherical clusters
If the true groups are long, curved, or very uneven in density, k-means may split one natural group or merge two different ones. The method prefers compact, centroid-based clusters.
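A classic illustration, if scikit-learn is available, is the two-moons dataset: the true groups are curved, so k-means recovers them poorly, which an agreement score against the known labels makes visible:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaved half-moon shapes with known ground-truth labels.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=0)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(adjusted_rand_score(y_true, labels))  # 1.0 is perfect; this lands well below it
```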
Forgetting that outliers can pull centroids
Because centroids are means, extreme values can shift them noticeably. If outliers are important in your data, check this before trusting the result.
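A one-line check makes the effect concrete: the mean, and therefore the centroid, chases extreme values.

```python
import numpy as np

cluster = np.array([1.0, 2.0, 3.0])
print(cluster.mean())                    # 2.0
print(np.append(cluster, 100.0).mean())  # 26.5: one outlier moved the mean far away
```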
Where k-means clustering is used
K-means is often used for exploratory grouping, customer or behavior segmentation, image color quantization, and as a quick baseline in unsupervised learning.
It is most useful when you have numeric features, want a fast simple model, and expect clusters that are roughly compact in Euclidean space.
A simple mental model
Imagine placing movable pins on a scatterplot. Every point attaches to the nearest pin. Then each pin slides to the average location of the points attached to it. Repeat that until the pins stop moving much.
That picture is not just intuition. It is almost the full algorithm.
Try a similar clustering problem
Take a small set of points on a line, pick a small k (say k = 2), and run one full assign-update cycle by hand. Then change the starting centroids or add one outlier and see how the result changes. If you want to go one step further, try your own version on a small dataset and compare what happens before and after feature scaling.