Add support for configuring Embedding dimensions #47
Labels: enhancement (New feature or request)
Thanks for the kind words about RubyLLM! Your suggestion to add configurable embedding dimensions makes perfect sense - both the global and per-request implementation approach look good. Feel free to open a PR with this feature. Just make sure to include tests verifying it works correctly with the underlying APIs. Looking forward to your contribution!
crmne added a commit that referenced this issue on Apr 23, 2025
This PR implements the ability to specify custom dimensions when generating embeddings, as requested in issue #47.

### What's included

- Added support for passing a `dimensions` parameter to the `embed` method
- Implemented dimensions handling in both the OpenAI and Gemini providers
- Added tests verifying that the dimensions parameter works correctly
- Optimized the Gemini provider's `embed` method to reduce unnecessary API calls and lower token usage: it now uses the `batchEmbedContents` endpoint in a single request, for both single and multiple text embeddings
- Modernized the Gemini embeddings code following the DIP (dependency inversion principle), as implemented in `openai/embeddings.rb`
- Removed the `promptTokenCount` attribute, since the Gemini embeddings API response does not include it

### Implementation notes

I've decided to implement only the per-request dimension configuration, not the global configuration option initially proposed in the issue. Each embedding model has its own default dimensions, so a global setting could be confusing.

With this implementation, users can set the embedding dimensions like:

```ruby
embedding = RubyLLM.embed(
  "Ruby is a programmer's best friend",
  model: "text-embedding-3-small",
  dimensions: 512
)
```

### References

- OpenAI API docs: https://platform.openai.com/docs/api-reference/embeddings
- Gemini API docs: https://ai.google.dev/api/embeddings

Resolves #47

Co-authored-by: Carmine Paolino <[email protected]>
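To illustrate the batching optimization the PR describes, here is a minimal sketch of how multiple texts can be packed into one Gemini `batchEmbedContents` request body. Field names (`requests`, `content`, `parts`, `outputDimensionality`) follow the public Gemini REST API; the helper name and model string are illustrative, and the actual request building inside RubyLLM may differ.

```ruby
require "json"

# Build one batchEmbedContents payload for many texts, instead of issuing
# one embedContent call per text.
def batch_embed_payload(texts, model: "models/text-embedding-004", dimensions: nil)
  requests = texts.map do |text|
    req = {
      model: model,
      content: { parts: [{ text: text }] }
    }
    # outputDimensionality is Gemini's name for the embedding size parameter
    req[:outputDimensionality] = dimensions if dimensions
    req
  end
  { requests: requests }
end

payload = batch_embed_payload(["Ruby", "is great"], dimensions: 512)
puts JSON.generate(payload)
```

The same payload shape serves both the single-text and multi-text cases, which is what lets the provider collapse everything into one HTTP request.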
crmne added a commit that referenced this issue on May 6, 2025
First of all, I want to thank you for creating RubyLLM, I’m particularly impressed with the clean code, design and documentation.
I’ve been exploring the embeddings functionality and noticed that there isn’t a way to explicitly configure the dimensions of the generated embeddings.
For applications using pgvector or similar vector databases, this capability would allow developers to optimize their vector storage and query performance while maintaining consistency across all embeddings generated from the same model.
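The pgvector constraint mentioned above is worth making concrete: a pgvector column is declared with a fixed dimension, so every embedding stored in it must have exactly that many components. A small sketch (table and column names are illustrative):

```ruby
# A pgvector column declares its dimension up front; embeddings of any other
# size are rejected by the database, which is why a configurable output size
# on the embedding model matters.
CREATE_ITEMS_SQL = <<~SQL
  CREATE TABLE items (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(512)  -- must match the model's configured output size
  );
SQL

puts CREATE_ITEMS_SQL
```

If the model's output size can be pinned at request time, every row in the table is guaranteed to match the declared `vector(512)` column.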
Several embedding models already let you configure the output dimension; for example, OpenAI's text-embedding-3 models accept a `dimensions` parameter, and Gemini's embedding models accept `outputDimensionality`.
I’d like to propose adding support for configuring embedding dimensions, both globally and per-request.
This would let users keep their vector storage compact and their embeddings consistent across requests.
The implementation could look something like:
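The original snippet did not survive extraction; the following is a hedged reconstruction of the proposal's shape. The `default_embedding_dimensions` setting is a hypothetical name for the global option (which, per the maintainer's merged PR, was ultimately not implemented); the per-request `dimensions:` keyword matches what was merged.

```ruby
# Global default (hypothetical, ultimately not implemented):
RubyLLM.configure do |config|
  config.default_embedding_dimensions = 512
end

# Per-request override (this is the shape that was merged):
embedding = RubyLLM.embed(
  "Ruby is a programmer's best friend",
  model: "text-embedding-3-small",
  dimensions: 512
)
```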
I would be happy to implement this feature myself. Please let me know if this aligns with your vision for the project and if you have any specific implementation preferences or considerations.
Again, thank you for this work, and I look forward to potentially contributing to this project!