vLLM Chat
vLLM can be deployed as a server that mimics the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. This server can be queried in the same format as OpenAI API.
Overviewโ
This will help you getting started with vLLM chat models, which leverage the langchain-openai
package. For detailed documentation of all ChatOpenAI
features and configurations head to the API reference.
Integration detailsโ
Class | Package | Local | Serializable | JS support | Package downloads | Package latest |
---|---|---|---|---|---|---|
ChatOpenAI | langchain_openai | โ | beta | โ |
Model featuresโ
Specific model features-- such as tool calling, support for multi-modal inputs, support for token-level streaming, etc.-- will depend on the hosted model.
Setupโ
See the vLLM docs here.
To access vLLM models through LangChain, you'll need to install the langchain-openai
integration package.
Credentialsโ
Authentication will depend on specifics of the inference server.
If you want to get automated tracing of your model calls you can also set your LangSmith API key by uncommenting below:
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")
Installationโ
The LangChain vLLM integration can be accessed via the langchain-openai
package:
%pip install -qU langchain-openai
Instantiationโ
Now we can instantiate our model object and generate chat completions:
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts.chat import (
ChatPromptTemplate,
HumanMessagePromptTemplate,
SystemMessagePromptTemplate,
)
from langchain_openai import ChatOpenAI
inference_server_url = "http://localhost:8000/v1"
llm = ChatOpenAI(
model="mosaicml/mpt-7b",
openai_api_key="EMPTY",
openai_api_base=inference_server_url,
max_tokens=5,
temperature=0,
)
Invocationโ
messages = [
SystemMessage(
content="You are a helpful assistant that translates English to Italian."
),
HumanMessage(
content="Translate the following sentence from English to Italian: I love programming."
),
]
llm.invoke(messages)
AIMessage(content=' Io amo programmare', additional_kwargs={}, example=False)
Chainingโ
We can chain our model with a prompt template like so:
from langchain_core.prompts import ChatPromptTemplate
prompt = ChatPromptTemplate(
[
(
"system",
"You are a helpful assistant that translates {input_language} to {output_language}.",
),
("human", "{input}"),
]
)
chain = prompt | llm
chain.invoke(
{
"input_language": "English",
"output_language": "German",
"input": "I love programming.",
}
)