How We Project Words to 3D

How Word Space 3D uses a mystery-aligned linear projection to turn 512-dimensional embeddings into meaningful 3D positions

Updated: February 7, 2026


The Challenge

Word Space 3D needs to project 512-dimensional word embeddings into a 3D space that players can see and navigate. This is a classic dimensionality reduction problem, but ours comes with a twist: the game has a mystery word, and we want a player's position in 3D to directly reflect how close they are to finding it.

Generic algorithms like PCA, t-SNE, and UMAP optimize for preserving neighborhood structure or variance. They know nothing about our game: a word at 90% similarity and a word at 20% similarity might land the same distance from center, so position doesn't encode progress.

We wanted position itself to tell the story.


Our Approach: Mystery-Aligned Linear Projection

Instead of a general-purpose algorithm, we align the entire 3D space to the current mystery word. The result: distance from center directly encodes how close you are to winning.

The x-axis: Your Progress

The x-axis maps directly to similarity percentile:

x = (1 - percentile / 100) ^ 0.6
  • Percentile ranks your word against all 6,017 vocabulary words by cosine similarity to the mystery word
  • Hot words (high percentile) cluster near the center (x close to 0)
  • Cold words (low percentile) spread outward (x approaching 1)
  • The power curve (0.6 exponent) gives more visual spread to the hot words that matter most during gameplay, preventing them from collapsing into an indistinguishable clump at the center

The mystery word itself is forced to the exact origin (0, 0, 0).
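The x-axis mapping above can be sketched as a tiny helper (the function name and exponent parameter are illustrative, not the game's actual code):

```python
def percentile_to_x(percentile: float, exponent: float = 0.6) -> float:
    """Map a similarity percentile (0-100) to an x coordinate in [0, 1].

    Hot words (percentile near 100) land near x = 0; cold words near x = 1.
    """
    return (1.0 - percentile / 100.0) ** exponent

# The concave exponent spreads the hot end: under this curve the top 10%
# of words occupy roughly x in [0, 0.25] rather than [0, 0.10] linearly.
print(percentile_to_x(100.0))  # mystery word itself -> 0.0
print(percentile_to_x(90.0))   # a hot word
print(percentile_to_x(0.0))    # coldest word -> 1.0
```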

The y and z axes: Semantic Variety

Once we've assigned each word its x-position, we still have two axes to fill. We use PCA on embedding residuals:

  1. Compute residuals: For each word, subtract the component along the mystery word direction. What's left is the part of the embedding orthogonal to the mystery-similarity axis.
  2. Fit PCA: Run PCA on these residuals to find the two directions of greatest variance.
  3. Normalize: Scale y and z to [-1, 1] using the vocabulary's min/max values.

This preserves semantic clusters in the remaining two dimensions. Animals still group together. Emotions still cluster. The 3D space has structure beyond just "how close to the answer."
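The three-step recipe can be sketched with NumPy and scikit-learn, assuming unit-normalized embeddings so that a dot product equals cosine similarity (the random data here is a stand-in for the real vocabulary matrix):

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative stand-in for the (6017, 512) vocabulary embedding matrix
rng = np.random.default_rng(0)
emb = rng.normal(size=(6017, 512))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
mystery = emb[0]

# 1. Residuals: remove each word's component along the mystery direction
sims = emb @ mystery                       # cosine similarities, shape (6017,)
residuals = emb - np.outer(sims, mystery)  # orthogonal to the mystery vector

# 2. PCA: the two directions of greatest variance in the residuals
pca = PCA(n_components=2).fit(residuals)
yz = pca.transform(residuals)              # shape (6017, 2)

# 3. Normalize each axis to [-1, 1] using the vocabulary min/max
lo, hi = yz.min(axis=0), yz.max(axis=0)
yz = 2.0 * (yz - lo) / (hi - lo) - 1.0
```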

Putting It Together

import numpy as np

# For each word in the vocabulary (embeddings are unit-normalized,
# so the dot product equals cosine similarity):
similarity = np.dot(word_embedding, mystery_embedding)

# Percentile rank via binary search on the pre-sorted similarity array
rank = np.searchsorted(sorted_similarities, similarity)
percentile = 100 * rank / len(sorted_similarities)
x = (1 - percentile / 100) ** 0.6

# Remove the component along the mystery direction, then project it
residual = word_embedding - similarity * mystery_embedding
y, z = pca_model.transform(residual.reshape(1, -1))[0]  # PCA on the orthogonal component
y, z = normalize_to_range(y, z, (-1, 1))  # scale to vocabulary min/max

Why This Beats General-Purpose Algorithms

Traditional dimensionality reduction algorithms (PCA, t-SNE, UMAP) optimize for a generic goal: preserve as much structure as possible from the high-dimensional space. That's great for data visualization, but our use case is specific — we're building a game.

Position = progress. In our projection, moving toward the center means moving toward the answer. You don't need arrows, color gradients, or score readouts to understand your situation — spatial position itself communicates it. Hot words are physically close to the mystery word at the origin. Cold words are far away.

Game-meaningful axes. The x-axis has a direct interpretation (percentile rank). The y and z axes show semantic variety within each "temperature zone." A generic reducer would distribute information across all three axes with no interpretable meaning on any single one.

Deterministic. The same mystery word always produces the same projection. No stochastic variation between runs.

No pre-trained model files. The projection is computed fresh for each mystery word at startup. Nothing to save, version, or deploy.


Why the Cone Shape Is Geometry, Not an Artifact

When you look at the 3D visualization, you'll notice the words form a roughly conical shape: tight near the origin, spreading outward. This isn't a rendering bug — it's the natural geometry of cosine similarity in high dimensions.

Words very similar to the mystery word (high percentile, x near 0) have embeddings nearly collinear with the mystery direction. Their residuals — the orthogonal component that feeds y and z — are small, so they cluster tightly. Words far from the mystery word (low percentile, x near 1) have large residuals and span more of the orthogonal subspace, so they spread wide.

The set of vectors with cosine similarity > 0.9 to any fixed direction forms a narrow cone in 512-dimensional space. The set with similarity ~0.3 forms a wide band. Our projection faithfully renders this structure. The cone is the truth about how embedding space is organized around any given word.
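A quick numeric check of this geometry: for unit vectors, the residual left after removing the mystery component has norm sqrt(1 - s²), where s is the cosine similarity, so it shrinks toward zero as s approaches 1. This sketch constructs vectors with exact similarity s and measures their residuals (the construction is illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
mystery = rng.normal(size=512)
mystery /= np.linalg.norm(mystery)

for s in (0.9, 0.5, 0.3):
    # Build a unit vector with exact cosine similarity s to mystery
    noise = rng.normal(size=512)
    noise -= (noise @ mystery) * mystery   # orthogonal component
    noise /= np.linalg.norm(noise)
    w = s * mystery + np.sqrt(1 - s**2) * noise

    residual = w - (w @ mystery) * mystery
    print(s, np.linalg.norm(residual))     # norms ~0.436, ~0.866, ~0.954
```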


Why Percentile Rank With Power Scaling Is Principled

Raw cosine similarity is a poor visual encoding because its distribution is concentrated: most word pairs have similarity between 0.1 and 0.5, so a linear mapping would waste most of the visual range on cold words while cramming all the interesting warm and hot words into a tiny sliver near the origin.

Percentile rank solves this by making the x-axis uniform over the vocabulary: every 1% slice contains the same number of words. This is a standard technique in statistics — a probability integral transform, converting an arbitrary distribution into a uniform one.

The power scaling (^0.6) on top is a mild concave compression that gives more visual real estate to the hot region where the player is actually searching. It's the same idea as gamma correction in imaging — perceptual linearity matters more than mathematical linearity when a human is the consumer.
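The uniformizing effect of the rank transform is easy to demonstrate. In this sketch a skewed beta distribution stands in for real similarity scores; after converting to percentile ranks, every 1% slice holds the same number of words:

```python
import numpy as np

rng = np.random.default_rng(2)
sims = rng.beta(2, 5, size=6017)      # skewed stand-in for similarity scores

# Probability integral transform via ranks: 0..n-1, rescaled to 0..100
ranks = sims.argsort().argsort()
percentiles = 100.0 * ranks / (len(sims) - 1)

# Each of the 100 bins now holds ~60 of the 6,017 words
counts, _ = np.histogram(percentiles, bins=100)
print(counts.min(), counts.max())
```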


Why We Moved Away From Our Previous Approach

We originally used UMAP (Uniform Manifold Approximation and Projection) — a popular nonlinear dimensionality reduction algorithm. It worked, but had significant drawbacks for our game:

  • Slow initialization: Fitting and loading the UMAP model took 3-10 seconds, plus a JIT warmup penalty on first transform
  • Stochastic: Results varied with random seed, requiring random_state pinning
  • Positions weren't game-meaningful: A word with 90% similarity might appear no closer to center than one with 40%. Position and score were disconnected.
  • Model files: Required a pre-trained .joblib file (~50MB) to be deployed alongside the server
  • No per-mystery adaptation: A single UMAP model was trained once on the vocabulary, regardless of which word was the daily mystery. The projection couldn't adapt.

Our mystery-aligned projection initializes in ~0.07 seconds for 6,017 words, is fully deterministic, and produces coordinates where position directly encodes gameplay progress.


Technical Details

Initialization

Called once when the daily mystery word is set:

service = MysteryProjectionService()
service.set_mystery(
    mystery_emb=mystery_embedding,      # 512-dim vector
    all_words=vocabulary_words,          # 6,017 words
    all_embeddings=vocabulary_matrix,    # (6017, 512) array
    mystery_word="telescope"             # Force to origin
)

Time: ~0.07s for 6,017 words on a single CPU core.

Dependencies: NumPy, scikit-learn (PCA only).

Runtime Transform

For vocabulary words, coordinates come from a pre-computed cache (O(1) dict lookup). For out-of-vocabulary guesses:

x, y, z = service.transform("banana", banana_embedding)

This computes the similarity, percentile, residual, and PCA transform on the fly — typically under 1ms.

Percentile Lookup

Percentile rank is computed via binary search (np.searchsorted) on a pre-sorted array of all vocabulary similarities. O(log n) per query.
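A minimal sketch of that lookup, assuming a pre-sorted 1-D similarity array (the helper name is hypothetical):

```python
import numpy as np

# Pre-sorted similarities of all 6,017 vocabulary words (illustrative data)
sorted_sims = np.sort(np.random.default_rng(3).uniform(-0.2, 0.8, 6017))

def percentile_rank(sim: float) -> float:
    """Percentage of vocabulary words with similarity <= sim."""
    idx = np.searchsorted(sorted_sims, sim, side="right")  # O(log n)
    return 100.0 * idx / len(sorted_sims)

print(percentile_rank(sorted_sims[-1]))  # highest similarity -> 100.0
```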

Coordinate Ranges

| Axis | Range | Meaning |
|------|-------|---------|
| x | [0, 1] | Native range from power curve; 0 = mystery word, 1 = lowest percentile |
| y | [-1, 1] | Normalized PCA component 1 of residuals |
| z | [-1, 1] | Normalized PCA component 2 of residuals |

Summary

| Property | Mystery-Aligned Projection |
|----------|----------------------------|
| Initialization | ~0.07s for 6,017 words |
| Per-word transform | <1ms (cached) / ~1ms (new) |
| Deterministic | Yes |
| Model files needed | None (computed at runtime) |
| Position encodes progress | Yes (x-axis = percentile) |
| Preserves semantic clusters | Yes (y/z via PCA on residuals) |
| Adapts to mystery word | Yes (re-initialized daily) |
| Dependencies | NumPy, scikit-learn |

The right projection isn't the one that preserves the most generic structure — it's the one that makes the game make sense.


Further Reading