How can I limit results of vector search based on injected source #846

petrarca · 2025-08-03T09:34:57Z

Now that multiple sources in a flow work (works great, thank you!), I would like to scope the vector search, especially with PG Vector, to one or more injected sources by source name. The search itself can easily be done in SQL (again, for PG only).

I did not find a way to pass the source or something similar from the embeddings/chunk to the exporter:

...
    code_embeddings.export(
        # How can I add additional data here?
        "my_table_name",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
            )
        ],
    )

The filename is relative to its source, like "some/path/inside/source", so I cannot use it for filtering.
Any idea how to solve that?

georgeh0 (Maintainer) · 2025-08-03T15:04:41Z

You can add a new field when you collect the rows to be exported, to identify the source, e.g.

code_embeddings.collect(
    # Extra field identifying the source; use a different value per source.
    source_key="/your/source/path/or/anything",
    filename=doc["filename"],
    location=chunk["location"],
    text=chunk["text"],
    embedding=chunk["embedding"],
)

Here source_key can be any value (e.g. a string or an integer). It becomes an additional field in your exported database. Use a different value for the data from each source. You likely need it to be part of the primary key too:

code_embeddings.export(
    "my_table_name",
    cocoindex.targets.Postgres(),
    # source_key is now part of the primary key.
    primary_key_fields=["source_key", "filename", "location"],
    vector_indexes=[
        cocoindex.VectorIndexDef(
            field_name="embedding",
            metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
        )
    ],
)
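
To illustrate why source_key likely belongs in the primary key, here is a minimal sketch in plain Python (no CocoIndex or Postgres involved). It models an upsert keyed on the primary key fields. The `upsert` helper, the row values and the location format are hypothetical and only serve to show the collision when two sources contain the same relative filename.

```python
def upsert(table, row, primary_key_fields):
    """Insert or overwrite a row, keyed on the given primary key fields."""
    key = tuple(row[field] for field in primary_key_fields)
    table[key] = row


# Two hypothetical sources that both contain a file at the same relative path.
rows = [
    {"source_key": "repo_a", "filename": "src/main.py", "location": "0:120", "text": "chunk from repo_a"},
    {"source_key": "repo_b", "filename": "src/main.py", "location": "0:120", "text": "chunk from repo_b"},
]

without_source = {}  # keyed on (filename, location) only
with_source = {}     # keyed on (source_key, filename, location)
for row in rows:
    upsert(without_source, row, ["filename", "location"])
    upsert(with_source, row, ["source_key", "filename", "location"])

print(len(without_source))  # 1: the second row overwrote the first
print(len(with_source))     # 2: both rows are kept
```
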

Now your database will have a source_key column that you can read and query against!
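
A scoped search could then look like the sketch below. It only builds the SQL text and the parameters, so it runs without a database; executing the query requires a Postgres driver such as psycopg and the pgvector extension. The helper name `build_scoped_search` and the example source keys are assumptions for illustration, and the column names simply follow the fields collected above. `<=>` is pgvector's cosine distance operator.

```python
import re


def build_scoped_search(table_name, query_embedding, source_keys, top_k=5):
    """Build a pgvector similarity query restricted to the given source keys.

    Returns (sql, params) using %s placeholders, as accepted by psycopg.
    """
    # Table names cannot be passed as query parameters, so validate before interpolating.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    # pgvector accepts a text literal like '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    sql = (
        f"SELECT source_key, filename, location, text, "
        f"1 - (embedding <=> %s::vector) AS score "
        f"FROM {table_name} "
        f"WHERE source_key = ANY(%s) "
        f"ORDER BY embedding <=> %s::vector "
        f"LIMIT %s"
    )
    params = (vector_literal, list(source_keys), vector_literal, top_k)
    return sql, params


# Hypothetical usage: search only within two of the injected sources.
sql, params = build_scoped_search("my_table_name", [0.1, 0.2, 0.3], ["repo_a", "repo_b"])
print(sql)
print(params)
```

With a psycopg cursor, the pair can be passed directly as `cur.execute(sql, params)`. The query embedding has to come from the same embedding model that produced the stored vectors.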

3 replies

petrarca Aug 3, 2025
Author

Excellent! That is what I was looking for. It works. Thanks for the fast reply.
It would be great to have some typing support in Python for such arguments.

georgeh0 Aug 3, 2025
Maintainer

That's great!

Note that the field names here (like source_key, filename, location, text and embedding) are not fixed arguments: you can freely come up with your own names (e.g. source_key could be source_id) and add whatever new fields you need. You have full control here :)

petrarca Aug 3, 2025
Author

Got it. That makes it super flexible. Thank you for your great work here! I like it very much.
