Embedding Dimensions: Why We Use 512
The Question
When building Word Space 3D, we had a choice: OpenAI's text-embedding-3-small model can output anywhere from 256 to 1536 dimensions. Would using more dimensions give us a better 3D visualization?
Our answer: No. We use 512 dimensions, and here's why.
What Are Embedding Dimensions?
When we send a word like "telescope" to OpenAI's embedding API, we get back a vector — a list of numbers that represents that word's meaning in a high-dimensional space. The number of dimensions determines how many numbers are in that list:
- 256 dimensions = 256 numbers
- 512 dimensions = 512 numbers
- 1536 dimensions = 1536 numbers
More dimensions means more "axes" to capture semantic relationships. Intuitively, you might think more is better. But for single words, that's not the case.
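Per OpenAI's documentation, a shortened embedding is just the leading entries of the longer one, re-normalized to unit length — which is what the API's `dimensions` parameter does for you. A minimal sketch on a toy vector (a stand-in, not a real 1536-dim embedding):

```python
import math

def shorten_embedding(vec, dims):
    """Truncate an embedding to its first `dims` entries and re-normalize
    to unit length, mirroring the API's `dimensions` parameter."""
    truncated = vec[:dims]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated]

full = [0.5, 0.5, 0.5, 0.5]        # toy stand-in for a full-length embedding
short = shorten_embedding(full, 2)  # keep only the first 2 "dimensions"
print(short)
```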
Why 512 Is the Sweet Spot for Single Words
1. Semantic Complexity Matters
A single word like "telescope" has limited semantic content:
- It's related to astronomy, optics, observation, science
- It's similar to microscope, binoculars, observatory
- It's distant from words like "happiness" or "breakfast"
These relationships don't need 1536 dimensions to capture. The word's meaning fits comfortably in 512 dimensions with room to spare. Extra dimensions would just be encoding noise.
Compare this to a full paragraph about telescopes — discussing Hubble, gravitational lensing, and the history of astronomical observation. That content has enough semantic richness to benefit from more dimensions.
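These "related / similar / distant" relationships are what cosine similarity measures. A toy sketch with hand-made 4-dimensional vectors — purely illustrative stand-ins, not real API embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 for similar
    directions, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical axes: (optics, science, food, emotion)
telescope  = [0.9, 0.8, 0.0, 0.1]
microscope = [0.8, 0.9, 0.1, 0.0]
breakfast  = [0.0, 0.1, 0.9, 0.2]

print(cosine_similarity(telescope, microscope))  # high: related meanings
print(cosine_similarity(telescope, breakfast))   # low: distant meanings
```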
2. The Curse of Dimensionality
This is a well-known phenomenon in machine learning: as dimensions increase, distance metrics become less meaningful.
Imagine measuring distances in 2D: some points are clearly close, others clearly far. Now imagine a 1000-dimensional space. Mathematically, all points start to look roughly equidistant, and the meaningful differences get drowned out.
Our projection relies on cosine similarity and PCA on residuals, both of which lose discriminative power as dimensions grow. For our ~6,000 single words, 512 dimensions keeps distances meaningful while avoiding this degradation.
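This concentration effect is easy to observe directly. A small sketch comparing the spread of pairwise distances between random points in 2 dimensions versus 1000:

```python
import random

def distance_spread(dim, n_points=60, seed=0):
    """(max - min) pairwise distance divided by the mean distance, for
    random points in the unit cube. The ratio shrinks as `dim` grows,
    i.e. distances concentrate and points look roughly equidistant."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    dists = []
    for i in range(n_points):
        for j in range(i + 1, n_points):
            d = sum((a - b) ** 2 for a, b in zip(pts[i], pts[j])) ** 0.5
            dists.append(d)
    mean = sum(dists) / len(dists)
    return (max(dists) - min(dists)) / mean

print(distance_spread(2))     # large spread: near and far points differ a lot
print(distance_spread(1000))  # small spread: distances bunch together
```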
3. OpenAI's Own Design
OpenAI offers two embedding models:
- text-embedding-3-small (up to 1536 dims): optimized for shorter, simpler content
- text-embedding-3-large (up to 3072 dims): better for complex documents
The "small" model isn't just cheaper — it's architecturally designed for content like single words and short phrases. Using it at 512 dimensions puts us in its optimal range.
Interestingly, OpenAI's benchmarks show the small model at 512 dimensions (62.3% MTEB score) outperforms the large model truncated to 256 dimensions (62.0%). Model architecture matters more than raw dimension count.
4. Projection Quality Depends on Vocabulary Size, Not Dimensions
Our projection computes cosine similarities and PCA on embedding residuals to produce 3D coordinates. Its quality depends primarily on:
- Number of data points: More words = better PCA estimation of residual structure
- Embedding quality: How well the 512-dim vectors capture single-word semantics
- Similarity distribution: A well-spread distribution gives more informative percentile ranks
We improved projection quality by expanding from 2,405 to 6,017 words — not by increasing embedding dimensions. More data points give PCA more information about semantic variance in the residual space.
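To make the pipeline concrete, here is a toy sketch of that projection with random vectors standing in for real embeddings. It assumes "residuals" means the components orthogonal to an anchor direction; the anchor vector and function name are illustrative, not our production code:

```python
import numpy as np

def project_3d(embeddings, anchor, power_exponent=0.6):
    """Toy projection sketch: x from the percentile rank of cosine
    similarity to an anchor vector, (y, z) from PCA on the residuals
    left after removing the anchor-direction component."""
    E = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    a = anchor / np.linalg.norm(anchor)
    sims = E @ a                      # cosine similarities to the anchor

    # x-axis: percentile rank mapped through the power exponent
    ranks = sims.argsort().argsort()
    percentile = 100.0 * ranks / (len(sims) - 1)
    x = (1.0 - percentile / 100.0) ** power_exponent

    # y, z axes: top-2 principal components of the residuals
    residuals = E - np.outer(sims, a)
    residuals -= residuals.mean(axis=0)
    _, _, vt = np.linalg.svd(residuals, full_matrices=False)
    yz = residuals @ vt[:2].T

    return np.column_stack([x, yz])

rng = np.random.default_rng(0)
coords = project_3d(rng.normal(size=(100, 512)), rng.normal(size=512))
print(coords.shape)  # one (x, y, z) row per word
```

More input rows give the SVD a better estimate of the residual variance directions, which is why growing the vocabulary improved the projection.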
When Would Higher Dimensions Help?
Higher dimensions are beneficial when embedding:
- Long documents with multiple topics and themes
- Technical content with fine-grained distinctions
- Nuanced passages where subtle differences matter
None of these apply to our vocabulary of common single words.
The Practical Benefits of 512
Beyond quality, 512 dimensions gives us:
| Benefit | Impact |
|---|---|
| Lower storage and transfer costs | ~3x less data than 1536-dimension vectors |
| Smaller database | ~8MB vs ~24MB for embeddings |
| Faster projection | Less computation per point |
| Faster similarity calculations | Smaller vectors to compare |
Our Configuration
```
# Embedding settings
model = "text-embedding-3-small"
dimensions = 512

# Projection settings
power_exponent = 0.6  # x-axis: (1 - percentile/100)^0.6
pca_components = 2    # y, z axes from residual PCA
```
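As a quick sanity check on the x-axis mapping: the 0.6 exponent stretches the high-percentile (most similar) end of the axis compared with a linear map, as this small sketch shows:

```python
def x_coord(percentile, power_exponent=0.6):
    """The x-axis mapping from the config: percentile 100 (most similar)
    lands at x = 0, percentile 0 at x = 1."""
    return (1 - percentile / 100) ** power_exponent

for p in (0, 50, 90, 99):
    # With the 0.6 exponent, the 50th percentile sits at x ≈ 0.66 rather
    # than 0.5, compressing the dissimilar end and spreading the similar end.
    print(p, round(x_coord(p), 3))
```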
Summary
For single-word embeddings:
- 512 dimensions captures full semantic content
- More dimensions adds noise, not signal
- Projection quality improves with more words, not more dimensions
- We get cost and performance benefits as a bonus
The lesson: match your embedding dimensions to your content complexity. For single words, 512 is the sweet spot.