Now that multiple sources in one flow are supported (works great, thank you!), I would like to scope the vector search, especially with pgvector, to one or more of the sources in the flow, using the source name. The filtering itself is easy to do in SQL (again, for Postgres only). However, I did not find a way to pass the source name, or something similar, from the embeddings/chunks through to the exporter:

    ...
    code_embeddings.export(
        # how can I add additional data here?
        "my_table_name",
        cocoindex.targets.Postgres(),
        primary_key_fields=["filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
            )
        ],
    )

The filename is relative to the source root, like "some/path/inside/source", so I cannot use it for filtering.
Replies: 1 comment 3 replies
You can add a new field when you collect the rows to be exported, to identify the source, e.g.

    code_embeddings.collect(  # the same collector that is exported below
        source_key="/your/source/path/or/anything",
        filename=doc["filename"],
        location=chunk["location"],
        text=chunk["text"],
        embedding=chunk["embedding"],
    )

Here the `source_key` can be any value (e.g. a string or an integer). It becomes an additional field in your exported database. Give a different value to the data from each of your sources. You will likely need it to be part of the primary key too:

    code_embeddings.export(
        "my_table_name",
        cocoindex.targets.Postgres(),
        # source_key keeps rows from different sources distinct
        primary_key_fields=["source_key", "filename", "location"],
        vector_indexes=[
            cocoindex.VectorIndexDef(
                field_name="embedding",
                metric=cocoindex.VectorSimilarityMetric.COSINE_SIMILARITY,
            )
        ],
    )

Now your database will have a `source_key` column.
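On the query side, scoping the search to one or more sources could then look roughly like the sketch below. It only builds the SQL text and parameters and is not part of CocoIndex. The helper name `build_scoped_search` is hypothetical, and the column names are assumed to match the fields collected in this thread. The table name is passed in by the caller because the actual name depends on how the Postgres target is configured. `<=>` is pgvector's cosine distance operator, so `1 - distance` gives the cosine similarity.

```python
import re
from typing import Any, Sequence

# Table names cannot be bound as query parameters, so allow only plain identifiers.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_scoped_search(
    table: str,
    source_keys: Sequence[str],
    query_vector: Sequence[float],
    top_k: int = 5,
) -> tuple[str, list[Any]]:
    """Return (sql, params) for a cosine search limited to the given sources.

    Hypothetical helper: assumes columns source_key, filename, location,
    text and embedding, as collected in the flow discussed in this thread.
    """
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if not source_keys:
        raise ValueError("at least one source_key is required")

    # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_vector) + "]"

    sql = (
        "SELECT source_key, filename, location, text, "
        "1 - (embedding <=> %s::vector) AS score "
        f"FROM {table} "
        "WHERE source_key = ANY(%s) "
        "ORDER BY embedding <=> %s::vector "
        "LIMIT %s"
    )
    # Parameter order follows the order of the %s placeholders above.
    params = [vector_literal, list(source_keys), vector_literal, top_k]
    return sql, params
```

With a psycopg-style driver, the result would typically be run as `cursor.execute(sql, params)`, where the query vector comes from the same embedding model that the flow uses. Treat this as a starting point under the assumptions above, not as the library's own query API.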