How We Project Words to 3D
The Challenge
Word Space 3D needs to project 512-dimensional word embeddings into a 3D space that players can see and navigate. This is a classic dimensionality reduction problem, but ours comes with a twist: the game has a mystery word, and we want a player's position in 3D to directly reflect how close they are to finding it.
Generic algorithms like PCA, t-SNE, and UMAP optimize for preserving neighborhood structure or variance. They know nothing about our game: a word at 90% similarity and a word at 20% similarity might land the same distance from center, so position doesn't encode progress.
We wanted position itself to tell the story.
Our Approach: Mystery-Aligned Linear Projection
Instead of a general-purpose algorithm, we align the entire 3D space to the current mystery word. The result: distance from center directly encodes how close you are to winning.
The x-axis: Your Progress
The x-axis maps directly to similarity percentile:
x = (1 - percentile / 100) ^ 0.6
- Percentile ranks your word against all 6,017 vocabulary words by cosine similarity to the mystery word
- Hot words (high percentile) cluster near the center (x close to 0)
- Cold words (low percentile) spread outward (x approaching 1)
- The power curve (0.6 exponent) gives more visual spread to the hot words that matter most during gameplay, preventing them from collapsing into an indistinguishable clump at the center
The mystery word itself is forced to the exact origin (0, 0, 0).
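The curve is easy to tabulate. A sketch with a hypothetical `percentile_to_x` helper (it just applies the formula above; it is not the game's actual API) shows how the 0.6 exponent stretches the hot end of the axis:

```python
# Illustrative helper implementing the x-axis formula from the text.
def percentile_to_x(percentile: float) -> float:
    """Map a similarity percentile (0-100) to an x coordinate in [0, 1]."""
    return (1 - percentile / 100) ** 0.6

# Hot words are stretched apart; cold words are compressed toward x = 1.
for p in (100, 99, 90, 50, 0):
    print(f"percentile {p:3d} -> x = {percentile_to_x(p):.3f}")
```

Under a linear mapping the top 1% of words would occupy 1% of the axis; with the 0.6 exponent they occupy roughly the first 6%, so near-winning guesses stay visually separable.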
The y and z axes: Semantic Variety
Once we've assigned each word its x-position, we still have two axes to fill. We use PCA on embedding residuals:
- Compute residuals: For each word, subtract the component along the mystery word direction. What's left is the part of the embedding orthogonal to the mystery-similarity axis.
- Fit PCA: Run PCA on these residuals to find the two directions of greatest variance.
- Normalize: Scale y and z to [-1, 1] using the vocabulary's min/max values.
This preserves semantic clusters in the remaining two dimensions. Animals still group together. Emotions still cluster. The 3D space has structure beyond just "how close to the answer."
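The three steps can be sketched end to end with NumPy and scikit-learn. Everything below runs on synthetic random embeddings; `mystery_emb` and `vocab_embs` are illustrative stand-ins for the game's actual data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
mystery_emb = rng.normal(size=512)
mystery_emb /= np.linalg.norm(mystery_emb)                 # unit mystery direction
vocab_embs = rng.normal(size=(6017, 512))
vocab_embs /= np.linalg.norm(vocab_embs, axis=1, keepdims=True)

# 1. Residuals: remove each word's component along the mystery direction.
sims = vocab_embs @ mystery_emb                            # cosine similarities
residuals = vocab_embs - np.outer(sims, mystery_emb)

# 2. Fit PCA on the residuals; keep the two highest-variance directions.
pca = PCA(n_components=2).fit(residuals)
yz = pca.transform(residuals)

# 3. Normalize each axis to [-1, 1] using the vocabulary's min/max.
lo, hi = yz.min(axis=0), yz.max(axis=0)
yz_norm = 2 * (yz - lo) / (hi - lo) - 1
```

By construction every residual is orthogonal to the mystery direction, so the y/z plane carries only information the x-axis doesn't already encode.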
Putting It Together
```python
# For each word in the vocabulary:
similarity = np.dot(word_embedding, mystery_embedding)      # cosine similarity (unit vectors)
rank = np.searchsorted(sorted_similarities, similarity)     # binary search over sorted vocab similarities
percentile = 100.0 * rank / len(sorted_similarities)
x = (1 - percentile / 100) ** 0.6
residual = word_embedding - similarity * mystery_embedding  # component orthogonal to the mystery direction
y, z = pca_model.transform(residual.reshape(1, -1))[0]      # PCA on the orthogonal component
y, z = normalize_to_range(y, z, (-1, 1))                    # scale to [-1, 1]
```
Why This Beats General-Purpose Algorithms
Traditional dimensionality reduction algorithms (PCA, t-SNE, UMAP) optimize for a generic goal: preserve as much structure as possible from the high-dimensional space. That's great for data visualization, but our use case is specific — we're building a game.
Position = progress. In our projection, moving toward the center means moving toward the answer. You don't need arrows, color gradients, or score readouts to understand your situation — spatial position itself communicates it. Hot words are physically close to the mystery word at the origin. Cold words are far away.
Game-meaningful axes. The x-axis has a direct interpretation (percentile rank). The y and z axes show semantic variety within each "temperature zone." A generic reducer would distribute information across all three axes with no interpretable meaning on any single one.
Deterministic. The same mystery word always produces the same projection. No stochastic variation between runs.
No pre-trained model files. The projection is computed fresh for each mystery word at startup. Nothing to save, version, or deploy.
Why the Cone Shape Is Geometry, Not an Artifact
When you look at the 3D visualization, you'll notice the words form a roughly conical shape: tight near the origin, spreading outward. This isn't a rendering bug — it's the natural geometry of cosine similarity in high dimensions.
Words very similar to the mystery word (high percentile, x near 0) have embeddings nearly collinear with the mystery direction. Their residuals — the orthogonal component that feeds y and z — are small, so they cluster tightly. Words far from the mystery word (low percentile, x near 1) have large residuals and span more of the orthogonal subspace, so they spread wide.
The set of vectors with cosine similarity > 0.9 to any fixed direction forms a narrow cone in 512-dimensional space. The set with similarity ~0.3 forms a wide band. Our projection faithfully renders this structure. The cone is the truth about how embedding space is organized around any given word.
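The geometry can be stated exactly. Assuming unit-normalized embeddings (an assumption here, not stated explicitly above), a word with cosine similarity s to the mystery direction has a residual of length sqrt(1 - s²):

```python
# For unit vectors w and m with s = cos(w, m), the orthogonal remainder
# w - s*m has length sqrt(1 - s^2).
def residual_norm(similarity: float) -> float:
    return (1.0 - similarity ** 2) ** 0.5

print(residual_norm(0.9))  # hot word: short residual, tight cone (~0.436)
print(residual_norm(0.3))  # cold word: long residual, wide spread (~0.954)
```

So the cone's radius at each x is pinned down by similarity alone: a word at similarity 0.9 sits more than twice as close to the axis as one at 0.3.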
Why Percentile Rank With Power Scaling Is Principled
Raw cosine similarity is a poor visual encoding because its distribution is concentrated: most word pairs have similarity between 0.1 and 0.5, so a linear mapping would waste most of the visual range on cold words while cramming all the interesting warm and hot words into a tiny sliver near the origin.
Percentile rank solves this by making the x-axis uniform over the vocabulary: every 1% slice contains the same number of words. This is a standard technique in statistics — a probability integral transform, converting an arbitrary distribution into a uniform one.
The power scaling (^0.6) on top is a mild concave compression that gives more visual real estate to the hot region where the player is actually searching. It's the same idea as gamma correction in imaging — perceptual linearity matters more than mathematical linearity when a human is the consumer.
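The probability-integral-transform claim is easy to check numerically. Here a skewed Beta(2, 8) sample stands in for the real (concentrated) similarity distribution; percentile ranks of the sample come out uniform regardless of its shape:

```python
import numpy as np

rng = np.random.default_rng(0)
sims = rng.beta(2, 8, size=6017)             # skewed stand-in for cosine similarities
sorted_sims = np.sort(sims)

ranks = np.searchsorted(sorted_sims, sims)   # each word's position in the sorted array
percentiles = 100.0 * ranks / len(sims)

# Every 10-point percentile slice now holds ~10% of the vocabulary,
# even though the raw values are bunched near the low end.
counts, _ = np.histogram(percentiles, bins=10, range=(0, 100))
print(counts)
```

The bin counts come out essentially equal, which is exactly the "every 1% slice contains the same number of words" property the x-axis relies on.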
Why We Moved Away From Our Previous Approach
We originally used UMAP (Uniform Manifold Approximation and Projection) — a popular nonlinear dimensionality reduction algorithm. It worked, but had significant drawbacks for our game:
- Slow initialization: Fitting and loading the UMAP model took 3-10 seconds, plus a JIT warmup penalty on the first transform
- Stochastic: Results varied with the random seed, requiring `random_state` pinning
- Positions weren't game-meaningful: A word with 90% similarity might appear no closer to center than one with 40%. Position and score were disconnected.
- Model files: Required a pre-trained `.joblib` file (~50MB) to be deployed alongside the server
- No per-mystery adaptation: A single UMAP model was trained once on the vocabulary, regardless of which word was the daily mystery. The projection couldn't adapt.
Our mystery-aligned projection initializes in ~0.07 seconds for 6,017 words, is fully deterministic, and produces coordinates where position directly encodes gameplay progress.
Technical Details
Initialization
Called once when the daily mystery word is set:
```python
service = MysteryProjectionService()
service.set_mystery(
    mystery_emb=mystery_embedding,     # 512-dim vector
    all_words=vocabulary_words,        # 6,017 words
    all_embeddings=vocabulary_matrix,  # (6017, 512) array
    mystery_word="telescope",          # forced to the origin (0, 0, 0)
)
```
Time: ~0.07s for 6,017 words on a single CPU core.
Dependencies: NumPy, scikit-learn (PCA only).
Runtime Transform
For vocabulary words, coordinates come from a pre-computed cache (O(1) dict lookup). For out-of-vocabulary guesses:
```python
x, y, z = service.transform("banana", banana_embedding)
```
This computes the similarity, percentile, residual, and PCA transform on the fly — typically under 1ms.
Percentile Lookup
Percentile rank is computed via binary search (np.searchsorted) on a pre-sorted array of all vocabulary similarities. O(log n) per query.
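A sketch of that lookup, with a toy six-word similarity array standing in for the real pre-sorted 6,017-entry one:

```python
import numpy as np

# Hypothetical pre-sorted vocabulary similarities (computed once at init).
sorted_similarities = np.sort(np.array([0.05, 0.12, 0.31, 0.44, 0.71, 0.93]))

def percentile_rank(similarity: float) -> float:
    """Fraction of vocabulary words colder than this similarity, as 0-100."""
    rank = np.searchsorted(sorted_similarities, similarity)  # O(log n) binary search
    return 100.0 * rank / len(sorted_similarities)

print(percentile_rank(0.50))  # 4 of 6 vocabulary words are colder
```

Because the array is sorted once up front, each guess costs a single binary search rather than a scan over the whole vocabulary.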
Coordinate Ranges
| Axis | Range | Meaning |
|---|---|---|
| x | [0, 1] | Native range from power curve; 0 = mystery word, 1 = lowest percentile |
| y | [-1, 1] | Normalized PCA component 1 of residuals |
| z | [-1, 1] | Normalized PCA component 2 of residuals |
Summary
| Property | Mystery-Aligned Projection |
|---|---|
| Initialization | ~0.07s for 6,017 words |
| Per-word transform | <1ms (cached) / ~1ms (new) |
| Deterministic | Yes |
| Model files needed | None (computed at runtime) |
| Position encodes progress | Yes (x-axis = percentile) |
| Preserves semantic clusters | Yes (y/z via PCA on residuals) |
| Adapts to mystery word | Yes (re-initialized daily) |
| Dependencies | NumPy, scikit-learn |
The right projection isn't the one that preserves the most generic structure — it's the one that makes the game make sense.
Further Reading
- Why 512 Dimensions? — Embedding dimension research
- PCA (scikit-learn)
- Cosine Similarity (Wikipedia)