FlockMTL Extension

FlockMTL is a DuckDB extension that integrates large language model (LLM) capabilities directly into your queries and workflows. This experimental extension lets DuckDB users perform semantic analysis (classification, filtering, completion, all with structured output) and compute embeddings using GPT models, all from within SQL commands. In keeping with SQL's declarative tradition, the extension exposes an administrative view of MODEL(s) and PROMPT(s), akin to TABLE(s). The extension loads without credentials, but it requires the environment variable OPENAI_API_KEY to be set in order to make requests; if the variable is missing, calls to its functions raise an error explaining how to obtain an API key.


Table of Contents

  • Installation
  • Documentation
      1. Text Generation with llm_complete
      2. JSON Output with llm_complete_json
      3. Filtering with llm_filter
      4. Text Embedding with llm_embedding
      5. Prompt Management
      6. Model Management

Installation

Install the extension as a Community Extension.

Run:

INSTALL flockmtl FROM community;
LOAD flockmtl;

Documentation

The following examples show how to use the FlockMTL extension to analyze product reviews. Specifically, we focus on generating text, classifying reviews as positive or negative, and working with text embeddings, using a product_reviews table with the attributes review_id, customer_name, and review_text.

1. Text Generation with llm_complete

llm_complete generates text based on a given prompt and LLM model. The following examples show how this functionality can be used in real-world scenarios.

  • Basic Text Generation – Product Description:
    Imagine you have a list of product names, and you want to automatically generate a product description for each one.

    SELECT product_name, llm_complete('generate_product_description', 'default', {'product_name': product_name}, {'max_tokens': 150}) AS product_description 
    FROM products;

    Result:

    product_name  | product_description
    Running Shoes | "These lightweight running shoes offer superior comfort and breathability for long-distance runners. Featuring a cushioned sole and durable upper, they ensure maximum performance on various terrains."
  • Example: Sentiment Classification – Complex Review:
    Classify whether a review is positive based on more than just the review text, such as review length and other metadata. This could help you get a more nuanced classification.

    SELECT review_id, llm_complete('is_positive', 'default', {'text': review_text, 'length': LENGTH(review_text)}, {'max_tokens': 100}) AS results 
    FROM product_reviews;

    Result:

    review_id | results
    12345     | "Positive - Great in-depth feedback, long review indicating high engagement."
  • Example: Generate Responses to Negative Reviews:
    Automatically generate personalized responses to negative reviews, allowing your team to respond quickly and professionally.

    SELECT review_id, llm_complete('generate_response', 'default', {'review': review_text, 'sentiment': 'negative'}, {'max_tokens': 100}) AS response 
    FROM product_reviews
    WHERE llm_filter('is_negative', 'default', {'text': review_text});

    Result:

    review_id | response
    54321     | "We're sorry to hear about your experience. We value your feedback and would like to offer a solution. Please contact our support team for further assistance."

2. JSON Output with llm_complete_json

llm_complete_json is useful when you want structured output, such as detailed classification or multi-part answers.

  • Example: Sentiment and Detailed Analysis:
    Classify whether a review is positive or negative and provide a detailed analysis of the sentiment in JSON format.

    SELECT review_id, llm_complete_json('detailed_sentiment_analysis', 'default', {'text': review_text}, {'max_tokens': 200}) AS analysis 
    FROM product_reviews;

    Result:

    review_id | analysis
    112233    | {"sentiment": "negative", "confidence": 0.85, "keywords": ["bad service", "slow response"]}
  • Example: Review Summary in JSON Format:
    Summarize a review and return a JSON output with sentiment, summary, and key points.

    SELECT review_id, llm_complete_json('summarize_review', 'default', {'text': review_text}, {'max_tokens': 150}) AS summary 
    FROM product_reviews;

    Result:

    review_id | summary
    332211    | {"summary": "The customer experienced a slow response time from support but was satisfied with the product quality overall.", "sentiment": "mixed", "key_points": ["slow response", "good product quality"]}

3. Filtering with llm_filter

Use llm_filter to filter records based on LLM-powered classifications, such as identifying specific sentiments or themes in product reviews.

  • Example: Filter Reviews Mentioning Specific Features:
    Filter reviews to return only those that mention specific product features, such as "battery life" or "performance."

    SELECT review_id, customer_name, review_text 
    FROM product_reviews 
    WHERE llm_filter('mentions_feature', 'default', {'text': review_text, 'feature': 'battery life'});

    Result:

    review_id | customer_name | review_text
    56789     | John Doe      | "The battery life on this phone is fantastic! It lasted me a full two days without needing a recharge."
  • Example: Filter Reviews with High Confidence in Sentiment:
    Filter reviews based on the confidence level of sentiment classification, showing only those with high confidence in the result.

    SELECT review_id, customer_name, review_text 
    FROM product_reviews 
    WHERE llm_filter('is_positive', 'default', {'text': review_text, 'confidence_threshold': 0.9});

    Result:

    review_id | customer_name | review_text
    99887     | Jane Smith    | "Absolutely loved this product! Exceeded all my expectations."
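Because llm_filter returns a boolean, it also combines with ordinary aggregation. The following sketch reuses the mentions_feature prompt from the first example in this section to count feature mentions; it is illustrative only, and the result depends on your data and model.

    SELECT COUNT(*) AS battery_life_mentions
    FROM product_reviews
    WHERE llm_filter('mentions_feature', 'default',
                     {'text': review_text, 'feature': 'battery life'});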

4. Text Embedding with llm_embedding

llm_embedding generates vector embeddings for text, which can be used for tasks like semantic similarity, clustering, or advanced search.

  • Example: Find Reviews Similar to a Target Review:
    Generate embeddings for each review and compare them to a target review to find similar reviews based on their semantic content.

    WITH input_embedding AS (
        SELECT llm_embedding({'text': 'Comfortable to wear, but the sound quality is just average. Expected better for the price.'},
                 'text-embedding-3-small') AS embedding
    ),
    review_embeddings AS (
        SELECT
            review_id,
            array_distance(input_embedding.embedding::DOUBLE[1536],
                           llm_embedding({'text': review_text}, 'text-embedding-3-small')::DOUBLE[1536]) AS distance
        FROM product_reviews, input_embedding
    )
    SELECT review_id AS similar_review_id
    FROM review_embeddings
    ORDER BY distance
    LIMIT 1;
    Result:

    similar_review_id
    4
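Recomputing embeddings for every comparison is slow and incurs repeated API calls, so it is often worth materializing them once. The following is a minimal sketch under the same assumptions as the example above (the llm_embedding signature and the 1536-dimensional text-embedding-3-small model); the table name review_vectors and the query string are illustrative.

    -- Materialize one embedding per review (review_vectors is a hypothetical table name).
    CREATE TABLE review_vectors AS
    SELECT review_id,
           llm_embedding({'text': review_text}, 'text-embedding-3-small')::DOUBLE[1536] AS embedding
    FROM product_reviews;

    -- Find the three reviews closest to an ad-hoc query string, reusing the stored vectors.
    SELECT review_id,
           array_distance(embedding,
                          llm_embedding({'text': 'headphones with weak bass'}, 'text-embedding-3-small')::DOUBLE[1536]) AS distance
    FROM review_vectors
    ORDER BY distance
    LIMIT 3;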

You can manage LLM prompts and models dynamically in DuckDB using the following commands:

5. Prompt Management

  • Get All Prompts:

    GET PROMPTS;
  • Get Specific Prompt:
    Retrieve the content of a specific prompt by name.

    GET PROMPT <prompt_name>;
  • Create a New Prompt:

    CREATE PROMPT(<prompt_name>, <prompt_text>);
  • Update an Existing Prompt:

    UPDATE PROMPT(<prompt_name>, <new_prompt_text>);
  • Delete a Prompt:

    DELETE PROMPT <prompt_name>;
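As a concrete sketch of the commands above (the prompt text is illustrative, and it is assumed that names and text are passed as string literals), the is_positive prompt used earlier could be managed like this:

    CREATE PROMPT('is_positive', 'Decide whether the following review is positive. Answer with a single word: positive or negative.');
    UPDATE PROMPT('is_positive', 'Classify the sentiment of the review strictly as positive or negative.');
    GET PROMPT is_positive;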

6. Model Management

  • Get All Models:

    GET MODELS;
  • Get Specific Model:
    Retrieve the details of a specific model by name.

    GET MODEL <model_name>;
  • Create a New Model:

    CREATE MODEL(<model_name>, <model_type>, <max_tokens>);
  • Update a Model:

    UPDATE MODEL(<model_name>, <model_type>, <max_tokens>);
  • Delete a Model:

    DELETE MODEL <model_name>;
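Likewise, a concrete sketch for models (the model name, type, and token limit are illustrative, and it is assumed that the second argument of llm_complete names a registered model, as 'default' does in the examples above):

    CREATE MODEL('review-model', 'gpt-4o-mini', 2048);

    -- Reference the registered model by name in place of 'default'.
    SELECT review_id,
           llm_complete('is_positive', 'review-model', {'text': review_text}, {'max_tokens': 50}) AS results
    FROM product_reviews;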
