Typesense

This section walks you through setting up the TypesenseVectorStore to store document embeddings and perform similarity searches.

Typesense is an open-source, typo-tolerant search engine optimized for instant, sub-50 ms searches while providing an intuitive developer experience.

Prerequisites

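- A Typesense instance, either:
  - Typesense Cloud (recommended)
  - the Docker image typesense/typesense:latest
- An EmbeddingModel instance to compute the document embeddings. Several options are available.
  - If required, an API key for the EmbeddingModel to generate the embeddings stored by the TypesenseVectorStore.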

Auto-configuration

Spring AI provides Spring Boot auto-configuration for the Typesense Vector Store. To enable it, add the following dependency to your project's Maven pom.xml file:

<dependency>
        <groupId>org.springframework.ai</groupId>
        <artifactId>spring-ai-typesense-spring-boot-starter</artifactId>
</dependency>

or to your Gradle build.gradle build file:

dependencies {
    implementation 'org.springframework.ai:spring-ai-typesense-spring-boot-starter'
}

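Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Refer to the Repositories section to add the milestone and/or snapshot repositories to your build file.

In addition, you need a configured EmbeddingModel bean. Refer to the EmbeddingModel section for more information.

Here is an example of the required bean: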

@Bean
public EmbeddingModel embeddingModel() {
    // Can be any other EmbeddingModel implementation.
    return new OpenAiEmbeddingModel(new OpenAiApi(System.getenv("SPRING_AI_OPENAI_API_KEY")));
}

To connect to Typesense, you need to provide the access details for your instance. A simple configuration can be provided through Spring Boot's application.yml:

spring:
  ai:
    vectorstore:
      typesense:
          collectionName: "vector_store"
          embeddingDimension: 1536
          client:
              protocol: http
              host: localhost
              port: 8108
              apiKey: xyz

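Check the list of configuration properties to learn about the vector store's default values and configuration options.

You can now autowire the Typesense Vector Store in your application and use it: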

@Autowired VectorStore vectorStore;

// ...

List<Document> documents = List.of(
    new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("meta1", "meta1")),
    new Document("The World is Big and Salvation Lurks Around the Corner"),
    new Document("You walk forward facing the past and you turn back toward the future.", Map.of("meta2", "meta2")));

// Add the documents to Typesense
vectorStore.add(documents);

// Retrieve documents similar to a query
List<Document> results = this.vectorStore.similaritySearch(SearchRequest.query("Spring").withTopK(5));

Configuration properties

You can use the following properties in your Spring Boot configuration to customize the Typesense vector store.

Property	Description	Default value
`spring.ai.vectorstore.typesense.client.protocol`	HTTP protocol	`http`
`spring.ai.vectorstore.typesense.client.host`	Hostname	`localhost`
`spring.ai.vectorstore.typesense.client.port`	Port	`8108`
`spring.ai.vectorstore.typesense.client.apiKey`	API key	`xyz`
`spring.ai.vectorstore.typesense.initialize-schema`	Whether to initialize the required schema	`false`
`spring.ai.vectorstore.typesense.collection-name`	Collection name	`vector_store`
`spring.ai.vectorstore.typesense.embedding-dimension`	Embedding dimension	`1536`
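
The embedding dimension setting is expected to match the length of the vectors your EmbeddingModel produces (the manual configuration on this page passes embeddingModel.dimensions() for that reason). As a minimal sketch of a sanity check you might run before adding documents, the following uses only the Java standard library; the class and method names are hypothetical and are not part of Spring AI or the Typesense client.

```java
import java.util.List;

// Hypothetical helper, not part of Spring AI: checks embeddings against a configured dimension.
public class EmbeddingDimensionCheck {

    /** Throws if any embedding's length differs from the configured dimension. */
    static void requireDimension(List<float[]> embeddings, int expectedDimension) {
        for (int i = 0; i < embeddings.size(); i++) {
            int actual = embeddings.get(i).length;
            if (actual != expectedDimension) {
                throw new IllegalArgumentException(
                    "Embedding " + i + " has dimension " + actual + ", expected " + expectedDimension);
            }
        }
    }

    public static void main(String[] args) {
        // A small dimension keeps the example readable; the documented default is 1536.
        int configuredDimension = 4;
        requireDimension(List.of(new float[4], new float[4]), configuredDimension);
        System.out.println("all embeddings match the configured dimension");
    }
}
```

A mismatch here usually means the configured dimension was copied from a different embedding model than the one actually in use.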

Metadata filtering

You can also use the generic, portable metadata filters with the TypesenseVectorStore.

For example, you can use the text expression language:

vectorStore.similaritySearch(
   SearchRequest
      .query("The World")
      .withTopK(TOP_K)
      .withSimilarityThreshold(SIMILARITY_THRESHOLD)
      .withFilterExpression("country in ['UK', 'NL'] && year >= 2020"));

or programmatically, using the expression DSL:

FilterExpressionBuilder b = new FilterExpressionBuilder();

vectorStore.similaritySearch(
   SearchRequest
      .query("The World")
      .withTopK(TOP_K)
      .withSimilarityThreshold(SIMILARITY_THRESHOLD)
      .withFilterExpression(b.and(
         b.in("country", "UK", "NL"),
         b.gte("year", 2020)).build()));

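The portable filter expressions are automatically converted into Typesense search filters. For example, the following portable filter expression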

country in ['UK', 'NL'] && year >= 2020

is converted into the Typesense filter

country: ['UK', 'NL'] && year: >=2020
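
To make the conversion concrete, here is a minimal sketch of how a small expression tree could be rendered in the Typesense filter syntax shown above. It is a toy illustration only: the Expr, In, Gte, and And types are hypothetical, cover just the two operators in this example, and are not Spring AI's actual converter.

```java
import java.util.List;
import java.util.stream.Collectors;

// Toy illustration only; not Spring AI's filter expression converter.
public class TypesenseFilterSketch {

    /** A node of a tiny, hypothetical filter expression tree. */
    interface Expr {
        String toTypesense();
    }

    /** Represents: key in [v1, v2, ...] */
    static final class In implements Expr {
        private final String key;
        private final List<String> values;

        In(String key, String... values) {
            this.key = key;
            this.values = List.of(values);
        }

        @Override
        public String toTypesense() {
            // Quote each value and join them into a bracketed list.
            String joined = values.stream()
                .map(v -> "'" + v + "'")
                .collect(Collectors.joining(", "));
            return key + ": [" + joined + "]";
        }
    }

    /** Represents: key >= value */
    static final class Gte implements Expr {
        private final String key;
        private final long value;

        Gte(String key, long value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public String toTypesense() {
            return key + ": >=" + value;
        }
    }

    /** Represents: left && right */
    static final class And implements Expr {
        private final Expr left;
        private final Expr right;

        And(Expr left, Expr right) {
            this.left = left;
            this.right = right;
        }

        @Override
        public String toTypesense() {
            return left.toTypesense() + " && " + right.toTypesense();
        }
    }

    public static void main(String[] args) {
        Expr filter = new And(new In("country", "UK", "NL"), new Gte("year", 2020));
        // prints: country: ['UK', 'NL'] && year: >=2020
        System.out.println(filter.toTypesense());
    }
}
```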

Manual configuration

If you prefer not to use auto-configuration, you can configure the Typesense Vector Store manually. Add the Typesense Vector Store dependency to your project:

<dependency>
  <groupId>org.springframework.ai</groupId>
  <artifactId>spring-ai-typesense</artifactId>
</dependency>

Refer to the Dependency Management section to add the Spring AI BOM to your build file.

Then create a TypesenseVectorStore bean in your Spring configuration:

@Bean
public VectorStore vectorStore(Client client, EmbeddingModel embeddingModel) {

    TypesenseVectorStoreConfig config = TypesenseVectorStoreConfig.builder()
        .withCollectionName("test_vector_store")
        .withEmbeddingDimension(embeddingModel.dimensions())
        .build();

    return new TypesenseVectorStore(client, embeddingModel, config);
}

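@Bean
public Client typesenseClient() {
    // typesenseContainer is assumed to be a running Typesense container (for example, a Testcontainers instance);
    // otherwise substitute your own host and port.
    List<Node> nodes = new ArrayList<>();
    nodes.add(new Node("http", typesenseContainer.getHost(), typesenseContainer.getMappedPort(8108).toString()));

    // Arguments: nodes, connection timeout, API key.
    Configuration configuration = new Configuration(nodes, Duration.ofSeconds(5), "xyz");
    return new Client(configuration);
}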

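Creating the TypesenseVectorStore as a bean is more convenient and is the preferred approach. If you decide to create it manually, however, you must call TypesenseVectorStore#afterPropertiesSet() after setting the properties and before using the client.

Then, in your main code, create some documents: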

List<Document> documents = List.of(
   new Document("Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!! Spring AI rocks!!", Map.of("country", "UK", "year", 2020)),
   new Document("The World is Big and Salvation Lurks Around the Corner", Map.of()),
   new Document("You walk forward facing the past and you turn back toward the future.", Map.of("country", "NL", "year", 2023)));

Now add the documents to your vector store:

vectorStore.add(documents);

Finally, retrieve documents similar to a query:

List<Document> results = vectorStore.similaritySearch(
   SearchRequest
      .query("Spring")
      .withTopK(5));

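If all goes well, you should retrieve the document containing the text "Spring AI rocks!!".

If the documents are not retrieved in the expected order, or the search results are not what you expected, check the embedding model you are using.

The embedding model can significantly affect search results (for example, if your data is in Spanish, make sure to use a Spanish or multilingual embedding model).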
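
To see why the embedding model matters, it helps to remember that a similarity search ranks stored vectors by how close they are to the query vector. The sketch below assumes cosine similarity as the measure and uses tiny hand-written two-dimensional vectors in place of real embeddings; the class and method names are hypothetical and the code does not call Typesense or Spring AI.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of top-K ranking by cosine similarity; not how Typesense is implemented.
public class CosineTopKSketch {

    /** Cosine similarity of two vectors of equal length; returns 0 if either vector is all zeros. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Returns the indices of the k stored vectors most similar to the query, best match first. */
    static List<Integer> topK(float[] query, List<float[]> stored, int k) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            indices.add(i);
        }
        // Sort by descending similarity to the query.
        indices.sort(Comparator.comparingDouble((Integer i) -> cosine(query, stored.get(i))).reversed());
        return new ArrayList<>(indices.subList(0, Math.min(k, indices.size())));
    }

    public static void main(String[] args) {
        List<float[]> stored = List.of(
            new float[] {1f, 0f},      // index 0: same direction as the query
            new float[] {0f, 1f},      // index 1: orthogonal to the query
            new float[] {0.9f, 0.1f}); // index 2: close to the query
        float[] query = {1f, 0f};
        // prints: [0, 2]
        System.out.println(topK(query, stored, 2));
    }
}
```

If an embedding model places unrelated texts close together, for example because it was not trained on the language of your data, the ranking degrades no matter how the vector store is configured.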