OpenSearch
OpenSearch is a scalable, flexible, and extensible open-source software suite for search, analytics, and observability applications licensed under Apache 2.0.
OpenSearch
is a distributed search and analytics engine based onApache Lucene
.
This notebook shows how to use functionality related to the OpenSearch
database.
To run, you should have an OpenSearch instance up and running: see here for an easy Docker installation.
similarity_search
by default performs the Approximate k-NN Search which uses one of the several algorithms like lucene, nmslib, faiss recommended for
large datasets. To perform brute force search we have other search methods known as Script Scoring and Painless Scripting.
Check this for more details.
Installationโ
Install the Python client.
%pip install --upgrade --quiet opensearch-py langchain-community
We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.
import getpass
import os
if "OPENAI_API_KEY" not in os.environ:
os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import OpenSearchVectorSearch
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
from langchain_community.document_loaders import TextLoader
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
similarity_search using Approximate k-NNโ
similarity_search
using Approximate k-NN
Search with Custom Parameters
docsearch = OpenSearchVectorSearch.from_documents(
docs, embeddings, opensearch_url="http://localhost:9200"
)
# If using the default Docker installation, use this instantiation instead:
# docsearch = OpenSearchVectorSearch.from_documents(
# docs,
# embeddings,
# opensearch_url="https://localhost:9200",
# http_auth=("admin", "admin"),
# use_ssl = False,
# verify_certs = False,
# ssl_assert_hostname = False,
# ssl_show_warn = False,
# )
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query, k=10)
print(docs[0].page_content)
docsearch = OpenSearchVectorSearch.from_documents(
docs,
embeddings,
opensearch_url="http://localhost:9200",
engine="faiss",
space_type="innerproduct",
ef_construction=256,
m=48,
)
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)