How We Project Words to 3D
The Challenge
Word Space 3D needs to project 512-dimensional word embeddings into a 3D space that players can see and navigate. This is a classic dimensionality reduction problem, but ours comes with a twist: the game has a mystery word, and we want a player's position in 3D to directly reflect how close they are to finding it.
Generic algorithms like PCA, t-SNE, and UMAP optimize for preserving neighborhood structure or variance. They know nothing about our game: a word at 90% similarity and a word at 20% similarity might land the same distance from center, so position doesn't encode progress.
We wanted position itself to tell the story.
Our Approach: Mystery-Aligned Linear Projection
Instead of a general-purpose algorithm, we align the entire 3D space to the current mystery word. The result: distance from center directly encodes how close you are to winning.
The x-axis: Your Progress
The x-axis maps directly to similarity percentile:
x = (1 - percentile / 100) ^ 0.6
- Percentile ranks your word against all 6,017 vocabulary words by cosine similarity to the mystery word
- Hot words (high percentile) cluster near the center (x close to 0)
- Cold words (low percentile) spread outward (x approaching 1)
- The power curve (0.6 exponent) gives more visual spread to the hot words that matter most during gameplay, preventing them from collapsing into an indistinguishable clump at the center
The mystery word itself is forced to the exact origin (0, 0, 0).
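The curve is easy to tabulate. A sketch with a hypothetical `percentile_to_x` helper (it just applies the formula above; it is not the game's actual API) shows how the 0.6 exponent stretches the hot end of the axis:

```python
# Illustrative helper implementing the x-axis formula from the text.
def percentile_to_x(percentile: float) -> float:
    """Map a similarity percentile (0-100) to an x coordinate in [0, 1]."""
    return (1 - percentile / 100) ** 0.6

# Hot words are stretched apart; cold words are compressed toward x = 1.
for p in (100, 99, 90, 50, 0):
    print(f"percentile {p:3d} -> x = {percentile_to_x(p):.3f}")
```

Under a linear mapping the top 1% of words would occupy 1% of the axis; with the 0.6 exponent they occupy roughly the first 6%, so near-winning guesses stay visually separable.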
The y and z axes: Semantic Variety
Once we've assigned each word its x-position, we still have two axes to fill. We use PCA on embedding residuals:
- Compute residuals: For each word, subtract the component along the mystery word direction. What's left is the part of the embedding orthogonal to the mystery-similarity axis.
- Fit PCA: Run PCA on these residuals to find the two directions of greatest variance.
- Normalize: Scale y and z to [-1, 1] using the vocabulary's min/max values.
This preserves semantic clusters in the remaining two dimensions. Animals still group together. Emotions still cluster. The 3D space has structure beyond just "how close to the answer."
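The three steps can be sketched end to end with NumPy and scikit-learn. Everything below runs on synthetic random embeddings; `mystery_emb` and `vocab_embs` are illustrative stand-ins for the game's actual data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
mystery_emb = rng.normal(size=512)
mystery_emb /= np.linalg.norm(mystery_emb)                 # unit mystery direction
vocab_embs = rng.normal(size=(6017, 512))
vocab_embs /= np.linalg.norm(vocab_embs, axis=1, keepdims=True)

# 1. Residuals: remove each word's component along the mystery direction.
sims = vocab_embs @ mystery_emb                            # cosine similarities
residuals = vocab_embs - np.outer(sims, mystery_emb)

# 2. Fit PCA on the residuals; keep the two highest-variance directions.
pca = PCA(n_components=2).fit(residuals)
yz = pca.transform(residuals)

# 3. Normalize each axis to [-1, 1] using the vocabulary's min/max.
lo, hi = yz.min(axis=0), yz.max(axis=0)
yz_norm = 2 * (yz - lo) / (hi - lo) - 1
```

By construction every residual is orthogonal to the mystery direction, so the y/z plane carries only information the x-axis doesn't already encode.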
Putting It Together
```python
# For each word in the vocabulary:
similarity = np.dot(word_embedding, mystery_embedding)      # cosine similarity (unit vectors)
rank = np.searchsorted(sorted_similarities, similarity)     # binary search over sorted vocab similarities
percentile = 100.0 * rank / len(sorted_similarities)
x = (1 - percentile / 100) ** 0.6
residual = word_embedding - similarity * mystery_embedding  # component orthogonal to the mystery direction
y, z = pca_model.transform(residual.reshape(1, -1))[0]      # PCA on the orthogonal component
y, z = normalize_to_range(y, z, (-1, 1))                    # scale to [-1, 1]
```
Why This Beats General-Purpose Algorithms
Traditional dimensionality reduction algorithms (PCA, t-SNE, UMAP) optimize for a generic goal: preserve as much structure as possible from the high-dimensional space. That's great for data visualization, but our use case is specific — we're building a game.
Position = progress. In our projection, moving toward the center means moving toward the answer. You don't need arrows, color gradients, or score readouts to understand your situation — spatial position itself communicates it. Hot words are physically close to the mystery word at the origin. Cold words are far away.
Game-meaningful axes. The x-axis has a direct interpretation (percentile rank). The y and z axes show semantic variety within each "temperature zone." A generic reducer would distribute information across all three axes with no interpretable meaning on any single one.
Deterministic. The same mystery word always produces the same projection. No stochastic variation between runs.
No pre-trained model files. The projection is computed fresh for each mystery word at startup. Nothing to save, version, or deploy.
Why the Cone Shape Is Geometry, Not an Artifact
When you look at the 3D visualization, you'll notice the words form a roughly conical shape: tight near the origin, spreading outward. This isn't a rendering bug — it's the natural geometry of cosine similarity in high dimensions.
Words very similar to the mystery word (high percentile, x near 0) have embeddings nearly collinear with the mystery direction. Their residuals — the orthogonal component that feeds y and z — are small, so they cluster tightly. Words far from the mystery word (low percentile, x near 1) have large residuals and span more of the orthogonal subspace, so they spread wide.
The set of vectors with cosine similarity > 0.9 to any fixed direction forms a narrow cone in 512-dimensional space. The set with similarity ~0.3 forms a wide band. Our projection faithfully renders this structure. The cone is the truth about how embedding space is organized around any given word.
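The geometry can be stated exactly. Assuming unit-normalized embeddings (an assumption here, not stated explicitly above), a word with cosine similarity s to the mystery direction has a residual of length sqrt(1 - s²):

```python
# For unit vectors w and m with s = cos(w, m), the orthogonal remainder
# w - s*m has length sqrt(1 - s^2).
def residual_norm(similarity: float) -> float:
    return (1.0 - similarity ** 2) ** 0.5

print(residual_norm(0.9))  # hot word: short residual, tight cone (~0.436)
print(residual_norm(0.3))  # cold word: long residual, wide spread (~0.954)
```

So the cone's radius at each x is pinned down by similarity alone: a word at similarity 0.9 sits more than twice as close to the axis as one at 0.3.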
Why Percentile Rank With Power Scaling Is Principled
Raw cosine similarity is a poor visual encoding because its distribution is concentrated: most word pairs have similarity between 0.1 and 0.5, so a linear mapping would waste most of the visual range on cold words while cramming all the interesting warm and hot words into a tiny sliver near the origin.
Percentile rank solves this by making the x-axis uniform over the vocabulary: every 1% slice contains the same number of words. This is a standard technique in statistics — a probability integral transform, converting an arbitrary distribution into a uniform one.
The power scaling (^0.6) on top is a mild concave compression that gives more visual real estate to the hot region where the player is actually searching. It's the same idea as gamma correction in imaging — perceptual linearity matters more than mathematical linearity when a human is the consumer.
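The probability-integral-transform claim is easy to check numerically. Here a skewed Beta(2, 8) sample stands in for the real (concentrated) similarity distribution; percentile ranks of the sample come out uniform regardless of its shape:

```python
import numpy as np

rng = np.random.default_rng(0)
sims = rng.beta(2, 8, size=6017)             # skewed stand-in for cosine similarities
sorted_sims = np.sort(sims)

ranks = np.searchsorted(sorted_sims, sims)   # each word's position in the sorted array
percentiles = 100.0 * ranks / len(sims)

# Every 10-point percentile slice now holds ~10% of the vocabulary,
# even though the raw values are bunched near the low end.
counts, _ = np.histogram(percentiles, bins=10, range=(0, 100))
print(counts)
```

The bin counts come out essentially equal, which is exactly the "every 1% slice contains the same number of words" property the x-axis relies on.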
Why We Moved Away From Our Previous Approach
We originally used UMAP (Uniform Manifold Approximation and Projection) — a popular nonlinear dimensionality reduction algorithm. It worked, but had significant drawbacks for our game:
- Slow initialization: Fitting and loading the UMAP model took 3-10 seconds, plus a JIT warmup penalty on the first transform
- Stochastic: Results varied with the random seed, requiring `random_state` pinning
- Positions weren't game-meaningful: A word with 90% similarity might appear no closer to center than one with 40%. Position and score were disconnected.
- Model files: Required a pre-trained `.joblib` file (~50MB) to be deployed alongside the server
- No per-mystery adaptation: A single UMAP model was trained once on the vocabulary, regardless of which word was the daily mystery. The projection couldn't adapt.
Our mystery-aligned projection initializes in ~0.07 seconds for 6,017 words, is fully deterministic, and produces coordinates where position directly encodes gameplay progress.
Technical Details
Initialization
Called once when the daily mystery word is set:
```python
service = MysteryProjectionService()
service.set_mystery(
    mystery_emb=mystery_embedding,     # 512-dim vector
    all_words=vocabulary_words,        # 6,017 words
    all_embeddings=vocabulary_matrix,  # (6017, 512) array
    mystery_word="telescope",          # forced to the origin (0, 0, 0)
)
```
Time: ~0.07s for 6,017 words on a single CPU core.
Dependencies: NumPy, scikit-learn (PCA only).
Runtime Transform
For vocabulary words, coordinates come from a pre-computed cache (O(1) dict lookup). For out-of-vocabulary guesses:
```python
x, y, z = service.transform("banana", banana_embedding)
```
This computes the similarity, percentile, residual, and PCA transform on the fly — typically under 1ms.
Percentile Lookup
Percentile rank is computed via binary search (np.searchsorted) on a pre-sorted array of all vocabulary similarities. O(log n) per query.
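A sketch of that lookup, with a toy six-word similarity array standing in for the real pre-sorted 6,017-entry one:

```python
import numpy as np

# Hypothetical pre-sorted vocabulary similarities (computed once at init).
sorted_similarities = np.sort(np.array([0.05, 0.12, 0.31, 0.44, 0.71, 0.93]))

def percentile_rank(similarity: float) -> float:
    """Fraction of vocabulary words colder than this similarity, as 0-100."""
    rank = np.searchsorted(sorted_similarities, similarity)  # O(log n) binary search
    return 100.0 * rank / len(sorted_similarities)

print(percentile_rank(0.50))  # 4 of 6 vocabulary words are colder
```

Because the array is sorted once up front, each guess costs a single binary search rather than a scan over the whole vocabulary.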
Coordinate Ranges
| Axis | Range | Meaning |
|---|---|---|
| x | [0, 1] | Native range from power curve; 0 = mystery word, 1 = lowest percentile |
| y | [-1, 1] | Normalized PCA component 1 of residuals |
| z | [-1, 1] | Normalized PCA component 2 of residuals |
Summary
| Property | Mystery-Aligned Projection |
|---|---|
| Initialization | ~0.07s for 6,017 words |
| Per-word transform | <1ms (cached) / ~1ms (new) |
| Deterministic | Yes |
| Model files needed | None (computed at runtime) |
| Position encodes progress | Yes (x-axis = percentile) |
| Preserves semantic clusters | Yes (y/z via PCA on residuals) |
| Adapts to mystery word | Yes (re-initialized daily) |
| Dependencies | NumPy, scikit-learn |
The right projection isn't the one that preserves the most generic structure — it's the one that makes the game make sense.
Further Reading
- Why 512 Dimensions? — Embedding dimension research
- PCA (scikit-learn)
- Cosine Similarity (Wikipedia)