FlockMTL is a DuckDB extension that integrates language model (LLM) capabilities directly into your queries and workflows. This experimental extension enables DuckDB users to add semantic analysis (classification, filtering, completion, all w/ structured output) and embeddings using GPT models—all from within SQL commands. Following the tradition of declarativity, we introduce an administrative view of MODEL
(s) and PROMPT
(s) akin to TABLE
(s).
While the extension will load successfully, it requires the environment variable OPENAI_API_KEY
in your environment to make requests. If not found, the use of our functions will throw an error regarding obtaining an API key.
Install the extension as a Community Extension.
Run:
INSTALL flockmtl FROM community;
LOAD flockmtl;
Next, we demonstrate how to use the DuckDB LLM Extension and as an example we would like to analyze product reviews. Specifically, we’ll focus on generating text, classifying reviews as positive or negative, and working with text embeddings using a product_reviews
table with attributes review_text
, review_id
, and customer_name
.
llm_complete
generates text based on a given prompt and LLM model. The following examples show how this functionality can be used in real-world scenarios.
-
Basic Text Generation – Product Description:
Imagine you have a list of product names, and you want to automatically generate product descriptions based on the product name.SELECT product_name, llm_complete('generate_product_description', 'default', {'product_name': product_name}, {'max_tokens': 150}) AS product_description FROM products;
Result:
product_name product_description Running Shoes "These lightweight running shoes offer superior comfort and breathability for long-distance runners. Featuring a cushioned sole and durable upper, they ensure maximum performance on various terrains." -
Example: Sentiment Classification – Complex Review:
Classify whether a review is positive based on more than just the review text, such as review length and other metadata. This could help you get a more nuanced classification.SELECT review_id, llm_complete('is_positive', 'default', {'text': review_text, 'length': LENGTH(review_text)}, {'max_tokens': 100}) AS results FROM product_reviews;
Result:
review_id results 12345 "Positive - Great in-depth feedback, long review indicating high engagement." -
Example: Generate Responses to Negative Reviews:
Automatically generate personalized responses to negative reviews, allowing your team to respond quickly and professionally.SELECT review_id, llm_complete('generate_response', 'default', {'review': review_text, 'sentiment': 'negative'}, {'max_tokens': 100}) AS response FROM product_reviews WHERE llm_filter('is_negative', 'default', {'text': review_text});
Result:
review_id response 54321 "We're sorry to hear about your experience. We value your feedback and would like to offer a solution. Please contact our support team for further assistance."
llm_complete_json
is useful when you want structured output, such as detailed classification or multi-part answers.
-
Example: Sentiment and Detailed Analysis:
Classify whether a review is positive or negative and provide a detailed analysis of the sentiment in JSON format.SELECT review_id, llm_complete_json('detailed_sentiment_analysis', 'default', {'text': review_text}, {'max_tokens': 200}) AS analysis FROM product_reviews;
Result:
review_id analysis 112233 {"sentiment": "negative", "confidence": 0.85, "keywords": ["bad service", "slow response"]} -
Example: Review Summary in JSON Format:
Summarize a review and return a JSON output with sentiment, summary, and key points.SELECT review_id, llm_complete_json('summarize_review', 'default', {'text': review_text}, {'max_tokens': 150}) AS summary FROM product_reviews;
Result:
review_id summary 332211 {"summary": "The customer experienced a slow response time from support but was satisfied with the product quality overall.", "sentiment": "mixed", "key_points": ["slow response", "good product quality"]}
Use llm_filter
to filter records based on LLM-powered classifications, such as identifying specific sentiments or themes in product reviews.
-
Example: Filter Reviews Mentioning Specific Features:
Filter reviews to return only those that mention specific product features, such as "battery life" or "performance."SELECT review_id, customer_name, review_text FROM product_reviews WHERE llm_filter('mentions_feature', 'default', {'text': review_text, 'feature': 'battery life'});
Result:
review_id customer_name review_text 56789 John Doe "The battery life on this phone is fantastic! It lasted me a full two days without needing a recharge." -
Example: Filter Reviews with High Confidence in Sentiment:
Filter reviews based on the confidence level of sentiment classification, showing only those with high confidence in the result.SELECT review_id, customer_name, review_text FROM product_reviews WHERE llm_filter('is_positive', 'default', {'text': review_text, 'confidence_threshold': 0.9});
Result:
review_id customer_name review_text 99887 Jane Smith "Absolutely loved this product! Exceeded all my expectations."
llm_embedding
generates vector embeddings for text, which can be used for tasks like semantic similarity, clustering, or advanced search.
- Example: Find Reviews Similar to a Target Review:
Generate embeddings for each review and compare them to a target review to find similar reviews based on their semantic content.Result:WITH input_embedding AS ( SELECT llm_embedding({'text': 'Comfortable to wear, but the sound quality is just average. Expected better for the price.'}, 'text-embedding-3-small') AS embedding ), review_embeddings AS ( SELECT review_id, array_distance(input_embedding.embedding::DOUBLE[1536], llm_embedding({'text': review_text}, 'text-embedding-3-small')::DOUBLE[1536]) AS similarity FROM product_reviews, input_embedding ) SELECT review_id AS similar_review_id FROM review_embeddings ORDER BY similarity LIMIT 1;
similar_review_id 4
You can manage LLM prompts and models dynamically in DuckDB using the following commands:
-
Get All Prompts:
GET PROMPTS;
-
Get Specific Prompt:
Retrieve the content of a specific prompt by name.GET PROMPT <prompt_name>;
-
Create a New Prompt:
CREATE PROMPT(<prompt_name>, <prompt_text>);
-
Update an Existing Prompt:
UPDATE PROMPT(<prompt_name>, <new_prompt_text>);
-
Delete a Prompt:
DELETE PROMPT <prompt_name>;
-
Get All Models:
GET MODELS;
-
Get Specific Model:
Retrieve the details of a specific model by name.GET MODEL <model_name>;
-
Create a New Model:
CREATE MODEL(<model_name>, <model_type>, <max_tokens>);
-
Update a Model:
UPDATE MODEL(<model_name>, <model_type>, <max_tokens>);
-
Delete a Model:
DELETE MODEL <model_name>;