PGVector
LangChain.js supports vector search in a generic PostgreSQL database through the pgvector Postgres extension.
Setup
To work with PGVector, you need to install the pg package:
- npm: npm install pg
- Yarn: yarn add pg
- pnpm: pnpm add pg
Set up a self-hosted pgvector instance with docker-compose
The examples on this page also use the @langchain/openai and @langchain/community packages:
- npm: npm install @langchain/openai @langchain/community
- Yarn: yarn add @langchain/openai @langchain/community
- pnpm: pnpm add @langchain/openai @langchain/community
pgvector provides a prebuilt Docker image that can be used to quickly set up a self-hosted Postgres instance.
Create a file named docker-compose.yml with the following contents:
# Run this command to start the database:
# docker-compose up --build
version: "3"
services:
  db:
    hostname: 127.0.0.1
    image: ankane/pgvector
    ports:
      # Host port 5433 matches the connection settings used in the examples below
      - 5433:5432
    restart: always
    environment:
      - POSTGRES_DB=api
      - POSTGRES_USER=myuser
      - POSTGRES_PASSWORD=ChangeMe
    volumes:
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql
Then, in the same directory, run docker compose up to start the container.
You can find more information on how to set up pgvector in the official repository.
Usage
Do not use user-generated data, such as usernames, as input for table or column names.
Doing so may lead to SQL injection!
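One way to act on this warning is to resolve table names from a fixed allowlist and reject anything that is not a plain identifier. The sketch below is illustrative only: assertSafeIdentifier, resolveTableName, and the docs_archive table are hypothetical names, not part of LangChain.js or pg.

```typescript
// Hypothetical helpers (not part of LangChain.js or pg).
// Accept only plain, unquoted PostgreSQL identifiers: a letter or underscore
// followed by letters, digits, or underscores, at most 63 characters in total.
const SAFE_IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]{0,62}$/;

function assertSafeIdentifier(name: string): string {
  if (!SAFE_IDENTIFIER.test(name)) {
    throw new Error(`Unsafe SQL identifier: ${JSON.stringify(name)}`);
  }
  return name;
}

// Table names chosen by the developer, never taken from user input.
// "docs_archive" is a made-up second table for illustration.
const ALLOWED_TABLES = new Set(["testlangchain", "docs_archive"]);

function resolveTableName(requested: string): string {
  if (!ALLOWED_TABLES.has(requested)) {
    throw new Error(`Unknown table: ${JSON.stringify(requested)}`);
  }
  return assertSafeIdentifier(requested);
}

// The returned value can then be used as tableName in the store configuration.
const tableName = resolveTableName("testlangchain");
console.log(tableName);
```

The allowlist is the main safeguard here. The pattern check is a second line of defense in case the allowlist is ever populated from configuration.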
The following is a complete example of using PGVectorStore:
import { OpenAIEmbeddings } from "@langchain/openai";
import {
  DistanceStrategy,
  PGVectorStore,
} from "@langchain/community/vectorstores/pgvector";
import { PoolConfig } from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const config = {
  postgresConnectionOptions: {
    type: "postgres",
    host: "127.0.0.1",
    port: 5433,
    user: "myuser",
    password: "ChangeMe",
    database: "api",
  } as PoolConfig,
  tableName: "testlangchain",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
  // supported distance strategies: cosine (default), innerProduct, or euclidean
  distanceStrategy: "cosine" as DistanceStrategy,
};

const pgvectorStore = await PGVectorStore.initialize(
  new OpenAIEmbeddings(),
  config
);

await pgvectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
  { pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
]);

const results = await pgvectorStore.similaritySearch("water", 1);

console.log(results);

/*
  [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]
*/

// Filtering is supported
const results2 = await pgvectorStore.similaritySearch("water", 1, {
  a: 2,
});

console.log(results2);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2 } } ]
*/

// Filtering on multiple values using "in" is supported too
const results3 = await pgvectorStore.similaritySearch("water", 1, {
  a: {
    in: [2],
  },
});

console.log(results3);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2 } } ]
*/

// Delete every document whose metadata matches the filter
await pgvectorStore.delete({
  filter: {
    a: 1,
  },
});

const results4 = await pgvectorStore.similaritySearch("water", 1);

console.log(results4);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2 } } ]
*/

// Filtering using arrayContains (?|) is supported
const results5 = await pgvectorStore.similaritySearch("water", 1, {
  b: {
    arrayContains: ["tag1"],
  },
});

console.log(results5);

/*
  [ Document { pageContent: "what's this", metadata: { a: 2, b: ['tag1', 'tag2'] } } ]
*/

// Close the connection pool created by initialize()
await pgvectorStore.end();
API Reference:
- OpenAIEmbeddings from @langchain/openai
- DistanceStrategy from @langchain/community/vectorstores/pgvector
- PGVectorStore from @langchain/community/vectorstores/pgvector
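The three filter shapes used in the example above (plain equality, in, and arrayContains) can be pictured as predicates over each document's metadata. The sketch below is an in-memory approximation for building intuition, not the SQL that PGVectorStore generates. matchesFilter and its types are hypothetical, and it assumes arrayContains matches when any listed value is present, following the semantics of the PostgreSQL ?| operator mentioned in the example.

```typescript
// In-memory illustration only; the real filtering happens in PostgreSQL.
type Scalar = string | number | boolean;
type FilterValue = Scalar | { in: Scalar[] } | { arrayContains: Scalar[] };
type MetadataFilter = Record<string, FilterValue>;
type Metadata = Record<string, unknown>;

function matchesFilter(metadata: Metadata, filter: MetadataFilter): boolean {
  // Every key in the filter must be satisfied (logical AND).
  return Object.entries(filter).every(([key, condition]) => {
    const value = metadata[key];
    if (typeof condition === "object") {
      if ("in" in condition) {
        // "in": the stored value equals one of the listed values.
        return condition.in.some((candidate) => candidate === value);
      }
      // "arrayContains": the stored array holds at least one listed value.
      const wanted = condition.arrayContains;
      return Array.isArray(value) && wanted.some((item) => value.includes(item));
    }
    // Plain value: simple equality (strict here; the database compares stored JSON values).
    return value === condition;
  });
}

const docs: Metadata[] = [
  { a: 2, b: ["tag1", "tag2"] },
  { a: 1, b: ["tag2"] },
];

console.log(docs.filter((m) => matchesFilter(m, { a: 2 })).length);
console.log(docs.filter((m) => matchesFilter(m, { a: { in: [1, 2] } })).length);
console.log(docs.filter((m) => matchesFilter(m, { b: { arrayContains: ["tag1"] } })).length);
```

With the two sample documents, the first and third filters each match one document and the second matches both.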
You can also specify a collectionTableName and a collectionName to partition vectors between multiple users or namespaces.
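As a rough sketch of that partitioning, the helper below derives one collection per namespace while every namespace shares the same vector table and collection table. collectionConfigFor, the user_ prefix, and the namespace pattern are assumptions made for illustration. The table names mirror the ones used in the examples on this page.

```typescript
// Hypothetical helper, not part of LangChain.js.
interface CollectionConfig {
  tableName: string;
  collectionTableName: string;
  collectionName: string;
}

function collectionConfigFor(namespace: string): CollectionConfig {
  // Keep namespaces short and predictable so collection names stay readable.
  if (!/^[a-z0-9_-]{1,48}$/.test(namespace)) {
    throw new Error(`Invalid namespace: ${JSON.stringify(namespace)}`);
  }
  return {
    tableName: "testlangchain", // vector table shared by every namespace
    collectionTableName: "collections", // table that lists the collections
    collectionName: `user_${namespace}`, // one collection per namespace
  };
}

const aliceConfig = collectionConfigFor("alice");
const bobConfig = collectionConfigFor("bob");
console.log(aliceConfig.collectionName, bobConfig.collectionName);
// prints "user_alice user_bob"
```

The returned fields would be spread into the store configuration next to the connection options and column names, so that each store instance only reads and writes its own collection.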
Advanced: reusing connections
You can reuse connections by creating a pool, then creating new PGVectorStore instances directly via the constructor.
Note that you should call .initialize() at least once to create your tables before using the constructor.
import { OpenAIEmbeddings } from "@langchain/openai";
import { PGVectorStore } from "@langchain/community/vectorstores/pgvector";
import pg from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const reusablePool = new pg.Pool({
  host: "127.0.0.1",
  port: 5433,
  user: "myuser",
  password: "ChangeMe",
  database: "api",
});

const originalConfig = {
  pool: reusablePool,
  tableName: "testlangchain",
  collectionName: "sample",
  collectionTableName: "collections",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
};

// Set up the DB.
// Can skip this step if you've already initialized the DB.
// await PGVectorStore.initialize(new OpenAIEmbeddings(), originalConfig);
const pgvectorStore = new PGVectorStore(new OpenAIEmbeddings(), originalConfig);

await pgvectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2 } },
  { pageContent: "Cat drinks milk", metadata: { a: 1 } },
]);

const results = await pgvectorStore.similaritySearch("water", 1);

console.log(results);

/*
  [ Document { pageContent: 'Cat drinks milk', metadata: { a: 1 } } ]
*/

// A second store on the same pool, pointing at a different collection
const pgvectorStore2 = new PGVectorStore(new OpenAIEmbeddings(), {
  pool: reusablePool,
  tableName: "testlangchain",
  collectionTableName: "collections",
  collectionName: "some_other_collection",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
});

const results2 = await pgvectorStore2.similaritySearch("water", 1);

console.log(results2);

/*
  []
*/

// Close the shared pool once all stores are done with it
await reusablePool.end();
API Reference:
- OpenAIEmbeddings from @langchain/openai
- PGVectorStore from @langchain/community/vectorstores/pgvector
Create HNSW Index
By default, the extension performs an exact nearest neighbor search using a sequential scan, which gives 100% recall. You might consider creating an HNSW index for approximate nearest neighbor (ANN) search to speed up similaritySearchVectorWithScore execution time. To create the HNSW index on your vector column, use the createHnswIndex() method.
The optional method parameters include:
- dims: The number of dimensions in your vector data (maximum 2000). For example, use 1536 for OpenAI's text-embedding-ada-002 model and 1024 for amazon.titan-embed-text-v2:0.
- m: The maximum number of connections per layer (16 by default).
- efConstruction: The size of the dynamic candidate list for constructing the graph (64 by default).
- distanceFunction: The name of the distance function to use. If omitted, it is selected automatically based on the distanceStrategy.
More information is available at the pgvector GitHub project.
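To make the parameters above more concrete, the sketch below builds the kind of CREATE INDEX statement that pgvector accepts for an HNSW index. It is not the exact statement issued by createHnswIndex(). buildHnswIndexSql, the index name, and the cast of the column to a fixed dimension are assumptions for illustration. The operator classes (vector_cosine_ops, vector_ip_ops, vector_l2_ops) and the m and ef_construction options are pgvector's own.

```typescript
// Illustrative only: builds a pgvector-style HNSW index statement as a string.
type Strategy = "cosine" | "innerProduct" | "euclidean";

// pgvector operator classes for indexes on vector columns.
const OPERATOR_CLASS: Record<Strategy, string> = {
  cosine: "vector_cosine_ops",
  innerProduct: "vector_ip_ops",
  euclidean: "vector_l2_ops",
};

interface HnswIndexSpec {
  table: string; // must be a trusted identifier, never user input
  column: string; // must be a trusted identifier, never user input
  dims: number;
  strategy: Strategy;
  m?: number;
  efConstruction?: number;
}

function buildHnswIndexSql(spec: HnswIndexSpec): string {
  const { table, column, dims, strategy, m = 16, efConstruction = 64 } = spec;
  if (!Number.isInteger(dims) || dims < 1 || dims > 2000) {
    throw new Error(`dims must be an integer between 1 and 2000, got ${dims}`);
  }
  return [
    // The index name is a made-up convention for this sketch.
    `CREATE INDEX IF NOT EXISTS ${table}_${column}_hnsw_idx`,
    // Assumption: the column has no fixed dimension, so it is cast to vector(dims).
    `ON ${table} USING hnsw ((${column}::vector(${dims})) ${OPERATOR_CLASS[strategy]})`,
    `WITH (m = ${m}, ef_construction = ${efConstruction});`,
  ].join(" ");
}

console.log(
  buildHnswIndexSql({
    table: "testlangchain",
    column: "vector",
    dims: 1536,
    strategy: "cosine",
  })
);
```

The operator class has to agree with the distance operator used at query time, which is presumably why the distance function is derived from distanceStrategy rather than chosen independently.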
import { OpenAIEmbeddings } from "@langchain/openai";
import {
  DistanceStrategy,
  PGVectorStore,
} from "@langchain/community/vectorstores/pgvector";
import { PoolConfig } from "pg";

// First, follow set-up instructions at
// https://js.langchain.com/docs/modules/indexes/vector_stores/integrations/pgvector

const config = {
  postgresConnectionOptions: {
    type: "postgres",
    host: "127.0.0.1",
    port: 5433,
    user: "myuser",
    password: "ChangeMe",
    database: "api",
  } as PoolConfig,
  tableName: "testlangchain",
  columns: {
    idColumnName: "id",
    vectorColumnName: "vector",
    contentColumnName: "content",
    metadataColumnName: "metadata",
  },
  // supported distance strategies: cosine (default), innerProduct, or euclidean
  distanceStrategy: "cosine" as DistanceStrategy,
};

const pgvectorStore = await PGVectorStore.initialize(
  new OpenAIEmbeddings(),
  config
);

// create the index
await pgvectorStore.createHnswIndex({
  dims: 1536,
  efConstruction: 64,
  m: 16,
});

await pgvectorStore.addDocuments([
  { pageContent: "what's this", metadata: { a: 2, b: ["tag1", "tag2"] } },
  { pageContent: "Cat drinks milk", metadata: { a: 1, b: ["tag2"] } },
]);

// Embed the query text, then search by vector
const model = new OpenAIEmbeddings();
const query = await model.embedQuery("water");
const results = await pgvectorStore.similaritySearchVectorWithScore(query, 1);

console.log(results);

await pgvectorStore.end();
API Reference:
- OpenAIEmbeddings from @langchain/openai
- DistanceStrategy from @langchain/community/vectorstores/pgvector
- PGVectorStore from @langchain/community/vectorstores/pgvector