# Bringing Generative AI to Your Data: Semantic Operators in Azure Database for PostgreSQL
by: Arda Aytekin, PhD, Principal Software Engineer; Binnur Gorer, PhD, Senior Software Engineer; Serdar Mumcu, Principal Software Engineer

While vector search has been a crucial component of Generative AI (GenAI) applications and agents, it only scratches the surface of what is possible. Vector search often fails to capture the intricate semantic relationships within enterprise data, leading to a significant loss of valuable knowledge at the foundational level. This limitation restricts an agent's ability to fully understand and reason about the data: the semantic connections remain hidden, and the loss propagates upward. However, with the increasing power, affordability, and accessibility of Large Language Models (LLMs), we now have the tools to uncover these semantic relationships and deliver more accurate results.

Although Azure AI Foundry provides powerful APIs for working with these advanced AI models, integrating them into data-centric applications often requires orchestration layers and external services, which can add operational overhead. To eliminate this friction, we are introducing Semantic Operators (`is_true`, `extract`, `generate`, and `rank`) within the `azure_ai` extension for Azure Database for PostgreSQL. These operators bring the capabilities of LLMs directly into SQL workflows, enabling truth evaluation, high-level data extraction, intelligent text generation, and document reranking, respectively. This empowers you to unlock deeper insights and elevate the performance of your AI applications.

## Meet the Operators

We introduce four core semantic operators, each designed to address a distinct category of semantic analysis challenges. Their real power comes from the flexible input options and the variety of supported LLMs, enabling users to explore the potential of LLMs for their specific use cases and challenges.

### Truth Evaluation

`azure_ai.is_true` evaluates whether a natural language statement is likely true.
It is ideal for filtering noisy or off-topic content. Below is a simple example of how to use the operator:

```sql
WITH statements(stmt) AS (
    VALUES
        ('Surface Pro is made with recycled content, including 100% recycled aluminum alloy in the enclosure of 12-inch and 13-inch devices.'),
        ('Surface Pro is a product of an EU-based startup.'),
        ('My laptop has a great battery life.')
)
SELECT stmt, azure_ai.is_true(stmt) AS truthy
FROM statements;
```

which gives the following result set (subject to variability of LLMs' responses):

| stmt | truthy |
| --- | --- |
| Surface Pro is made with recycled content, including 100% recycled aluminum alloy in the enclosure of 12-inch and 13-inch devices. | True |
| Surface Pro is a product of an EU-based startup. | False |
| My laptop has a great battery life. | NULL |

As the result set shows, `azure_ai.is_true` outputs NULL when the truthiness of the statement cannot be determined and issues a warning with a detailed message (intentionally left out for brevity).

### High-Level Data Extraction

`azure_ai.extract` extracts structured high-level data (e.g., features, entities) from unstructured text. It is particularly useful for identifying named entities, actions, or other key elements within a text, making it easier to organize and analyze information. Below is a simple example from a review processing task:

```sql
WITH reviews(review) AS (
    VALUES
        ('The product has a great battery life.'),
        ('Noise cancellation does not work as advertised. Avoid this product.'),
        ('The product has a good design, but it is a bit heavy. Not recommended for travel.'),
        ('Music quality is good but call quality could have been better.')
)
SELECT
    review,
    azure_ai.extract(
        'Concisely extract features from the review, leave n/a or 0 when unsure: ' || review,
        ARRAY[
            'battery: string - battery life of the product',
            'design: string - design features of the product',
            'sound: string - sound quality (e.g., music, call, noise cancellation) of the product',
            'sentiment: number - sentiment score of the review; 1 (lowest) to 5 (highest)'
        ]
    ) AS data
FROM reviews;
```

This query yields the following result set (again, subject to variability of LLMs' responses):

| review | data |
| --- | --- |
| The product has a great battery life. | {"sound": "n/a", "design": "n/a", "battery": "great battery life", "sentiment": 5} |
| Noise cancellation does not work as advertised. Avoid this product. | {"sound": "Noise cancellation does not work as advertised.", "design": "n/a", "battery": "n/a", "sentiment": 1} |
| The product has a good design, but it is a bit heavy. Not recommended for travel. | {"sound": "n/a", "design": "good design, but a bit heavy", "battery": "n/a", "sentiment": 3} |
| Music quality is good but call quality could have been better. | {"sound": "Music quality is good, call quality could have been better.", "design": "n/a", "battery": "n/a", "sentiment": 3} |

### Intelligent Text Generation

`azure_ai.generate` generates freeform text using an LLM based on the input and system prompts. It can be used to summarize, translate, rewrite, or respond to content, providing users with the ability to create coherent and contextually appropriate text effortlessly. Additionally, its capabilities extend beyond these functions: users can craft different prompts and provide a JSON schema to instruct the LLM to generate structured information for their specific use cases.
Revisiting the previous review processing task, we can get a one-sentence summary of the reviews by employing the freeform version of this operator:

```sql
WITH reviews(review) AS (
    VALUES
        ('The product has a great battery life.'),
        ('Noise cancellation does not work as advertised. Avoid this product.'),
        ('The product has a good design, but it is a bit heavy. Not recommended for travel.'),
        ('Music quality is good but call quality could have been better.')
)
SELECT azure_ai.generate(
    string_agg('Review: ' || review, '\n'),
    system_prompt => 'Summarize the reviews in a single sentence concisely.'
) AS summary
FROM reviews;
```

We get the following response from the LLM:

| summary |
| --- |
| The product has good battery life, design, and music quality, but falls short on noise cancellation, call quality, and portability. |

### Document Reranking

`azure_ai.rank` re-ranks a list of documents based on their relevance to a query. It empowers businesses to substantially improve the accuracy and relevance of information retrieved in search and retrieval-augmented generation (RAG) systems. In addition to supporting LLMs for reranking, the operator also accommodates cross-encoder models specifically designed for this function. To understand this operator's power, let us revisit the previous review processing task and ask ourselves: "Which review best answers the customer query: 'clear calling capability that blocks out background noise'?" Here is an example SQL query with `azure_ai.rank`:

```sql
WITH reviews(id, review) AS (
    VALUES
        (1, 'The product has a great battery life.'),
        (2, 'Noise cancellation does not work as advertised. Avoid this product.'),
        (3, 'The product has a good design, but it is a bit heavy. Not recommended for travel.'),
        (4, 'Music quality is good but call quality could have been better.')
)
SELECT rank, id, review
FROM azure_ai.rank(
    'clear calling capability that blocks out background noise',
    ARRAY(SELECT review FROM reviews ORDER BY id ASC),
    ARRAY(SELECT id FROM reviews ORDER BY id ASC)
) rr
LEFT JOIN reviews r USING (id)
ORDER BY rank ASC;
```

The above query yields the following reranked reviews (a lower rank means higher relevance):

| rank | id | review |
| --- | --- | --- |
| 1 | 2 | Noise cancellation does not work as advertised. Avoid this product. |
| 2 | 4 | Music quality is good but call quality could have been better. |
| 3 | 3 | The product has a good design, but it is a bit heavy. Not recommended for travel. |
| 4 | 1 | The product has a great battery life. |

Let us explore how these operators work in a real-world scenario.

## Use Case: Processing Product Descriptions and Reviews

Imagine you're managing a platform that collects thousands of user reviews for various products daily. With such a vast amount of freeform, user-generated content, efficiently analyzing and surfacing the most meaningful insights becomes crucial. By leveraging automation and AI-driven techniques, you can streamline the review processing workflow to improve searchability, relevance, and moderation efficiency.

For this use case, we will work on the Agentic Shop dataset, which contains product descriptions and the corresponding user reviews for three categories of products: headphones, smartwatches, and tablets. The product table has IDs, names, categories, and freeform text descriptions of the products, as well as the descriptions' embeddings. Similarly, the review table has review IDs, product IDs, and freeform review text, as well as the reviews' embeddings. Both tables have indexes on the embedding columns to support vector (similarity) search.
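The queries that follow assume a schema along these lines. The DDL below is an illustrative sketch only: the table and column names are inferred from the queries in this post, and the actual Agentic Shop dataset may differ in details such as embedding dimensions or index type.

```sql
-- Illustrative sketch of the schema behind the examples; names are
-- inferred from the queries in this post, not the actual dataset DDL.
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_diskann;  -- for DiskANN indexes on Azure

CREATE TABLE product (
    id              integer PRIMARY KEY,
    name            text NOT NULL,
    category        text NOT NULL,        -- e.g., 'headphones'
    description     text,
    description_emb vector(1536)          -- text-embedding-3-small output size
);

CREATE TABLE review (
    id              integer PRIMARY KEY,
    product_id      integer REFERENCES product(id),
    review_text     text,
    review_text_emb vector(1536)
);

-- Indexes on the embedding columns to support vector similarity search.
CREATE INDEX ON product USING diskann (description_emb vector_cosine_ops);
CREATE INDEX ON review  USING diskann (review_text_emb vector_cosine_ops);
```

DiskANN is shown here because it is available on Azure Database for PostgreSQL; a pgvector HNSW index would serve the same purpose.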
Step by step, we will perform the following operations:

1. Run semantic similarity search on product descriptions and reviews,
2. Improve the semantic search using reranking,
3. Filter out the false positives using `azure_ai.is_true`,
4. Extract high-level data to build a graph on top of products and reviews, and
5. Answer complex questions using the graph and semantic operators.

With the help of semantic operators, you can do all of this in SQL!

### Semantic Similarity Search

With the tables, embeddings, and indexes at hand, let us try to list the Top 10 headphones with "good clear calling capability and good reviews about battery life." Using semantic similarity search alone, you can list the Top 10 related products first, and then fetch the Top 10 semantically most relevant reviews for them:

```sql
WITH potential_headphones AS (
    SELECT id AS product_id, name, description
    FROM product
    WHERE category = 'headphones'
    ORDER BY description_emb <=> azure_openai.create_embeddings(
        'text-embedding-3-small', 'good clear calling'
    )::vector ASC
    LIMIT 10
)
SELECT p.product_id, r.id AS review_id, p.name AS product_name,
       p.description, r.review_text
FROM potential_headphones p
LEFT JOIN review r USING (product_id)
ORDER BY review_text_emb <=> azure_openai.create_embeddings(
    'text-embedding-3-small', 'good battery life'
)::vector ASC
LIMIT 10;
```

The query yields the following result set (kept to the Top 5 results and redacted for brevity):

| product_id | review_id | product_name | description | review_text |
| --- | --- | --- | --- | --- |
| 11 | 247 | [redacted] | [redacted] | [redacted] |
| 11 | 244 | [redacted] | [redacted] | [redacted] |
| 15 | 339 | [redacted] | [redacted] | [redacted] |
| 10 | 220 | [redacted] | [redacted] | [redacted] |
| 5 | 114 | [redacted] | [redacted] | [redacted] |

When we inspect the result set carefully, we notice, for instance, that even though the product description for product_id=15 mentions features around "superior noise cancellation" and "high-quality sound performance," it does not specifically mention "good calling capability." This is, of course, expected due to the nature of semantic similarity search and our choice of first retrieving the top products and then fetching the topmost relevant reviews.

### Reranking the Results

To alleviate the previous issue and improve the semantic similarity search results, we can benefit from `azure_ai.rank`. Let us improve the previous query by fetching the Top 50 relevant reviews first, and then reranking the results with a cross-encoder:

```sql
WITH potential_headphones AS (
    SELECT id AS product_id, name, description
    FROM product
    WHERE category = 'headphones'
    ORDER BY description_emb <=> azure_openai.create_embeddings(
        'text-embedding-3-small', 'good clear calling'
    )::vector ASC
    LIMIT 10
),
potential_headphones_reviews AS (
    SELECT p.product_id, r.id AS review_id, p.name AS product_name,
           p.description, r.review_text, ROW_NUMBER() OVER () AS row_id
    FROM potential_headphones p
    LEFT JOIN review r USING (product_id)
    ORDER BY review_text_emb <=> azure_openai.create_embeddings(
        'text-embedding-3-small', 'good battery life'
    )::vector ASC
    LIMIT 50
),
reranked_results AS (
    SELECT id AS row_id, rank
    FROM azure_ai.rank(
        'good clear calling capability and good battery life',
        ARRAY(
            SELECT 'Product Description: ' || description || '. Review: ' || review_text
            FROM potential_headphones_reviews
            ORDER BY row_id ASC
        ),
        ARRAY(
            SELECT row_id
            FROM potential_headphones_reviews
            ORDER BY row_id ASC
        )
    )
)
SELECT phr.product_id, phr.review_id, phr.product_name,
       phr.description, phr.review_text
FROM potential_headphones_reviews phr
LEFT JOIN reranked_results rr USING (row_id)
ORDER BY rr.rank ASC
LIMIT 10;
```

Now, this improved query yields the following result set:

| product_id | review_id | product_name | description | review_text |
| --- | --- | --- | --- | --- |
| 14 | 316 | [redacted] | [redacted] | [redacted] |
| 14 | 319 | [redacted] | [redacted] | [redacted] |
| 8 | 184 | [redacted] | [redacted] | [redacted] |
| 19 | 440 | [redacted] | [redacted] | [redacted] |
| 8 | 176 | [redacted] | [redacted] | [redacted] |

As you can see, we receive different, and semantically more relevant, results than before. Specifically, product_id=15 no longer appears in the Top 5 reranked results. This time, however, when we inspect the results carefully, we notice, for example, that the review text for review_id=319 reads as follows:

> I enjoy these headphones, but they aren't without their flaws. The audio clarity is splendid, and I do find them quite comfortable, even after extended use. However, the battery life is not as long as advertised; I often find myself charging them more frequently than I would like. Still, they do provide solid sound for an overall great listening experience, especially for the price.

The result set now contains reviews with better (semantic) relevance scores against the provided natural language query than before, but the model is not able to differentiate negative reviews from positive ones.
### Filtering out the False Positives

To get rid of the false positives that appear in the reranked reviews, we leverage `azure_ai.is_true` as a filter condition on the reranked results:

```sql
WITH potential_headphones AS (
    -- as before
),
potential_headphones_reviews AS (
    -- as before
),
reranked_results AS (
    -- as before
)
SELECT phr.product_id, phr.review_id, phr.product_name,
       phr.description, phr.review_text
FROM potential_headphones_reviews phr
LEFT JOIN reranked_results rr USING (row_id)
WHERE azure_ai.is_true(
    'The following product description and review satisfies the following customer ask: '
    || 'good clear calling capability and good battery life'
    || '. Product Description: ' || phr.description
    || '. Review: ' || phr.review_text
)
ORDER BY rr.rank ASC
LIMIT 10;
```

Finally, we get the following result set, which excludes the false positives from the reranked results of the previous query:

| product_id | review_id | product_name | description | review_text |
| --- | --- | --- | --- | --- |
| 14 | 316 | [redacted] | [redacted] | [redacted] |
| 8 | 184 | [redacted] | [redacted] | [redacted] |
| 19 | 440 | [redacted] | [redacted] | [redacted] |
| 19 | 435 | [redacted] | [redacted] | [redacted] |
| 8 | 186 | [redacted] | [redacted] | [redacted] |

### Extracting High-Level Data to Build a Graph

Extracting high-level data from product descriptions and reviews provides structured insights into customer experiences. By using `azure_ai.extract`, you can identify the key aspects of the product being discussed and the overall sentiment of the review by configuring the description of the data you want to extract. After obtaining the required high-level data in a structured format, you can build even more complex relationships on your datasets by using a graph representation of the data. To this end, you can leverage the Apache AGE extension in Azure Database for PostgreSQL.
For our use case, let us assume that the following key aspects of the products are important for our customers: design (e.g., dimensions, shape, weight, material), sound quality (e.g., sound/music quality, bass, treble), and water resistance. As such, we need to extract these high-level data from the product descriptions and store them as features of the product in, say, JSONB format. Similarly, we can extract the same set of features, if mentioned at all, from the reviews, together with the sentiments as inferred from the review text by the LLMs. By using `azure_ai.extract`, we can achieve this in two steps, one for the product descriptions:

```sql
SELECT id,
       azure_ai.extract(
           description,
           ARRAY[
               'design: string - comma separated list of design features of the product (e.g., dimensions, shape, weight, material); empty if not mentioned',
               'soundQuality: string - comma separated list of sound quality features of the product (e.g., sound/music quality, bass, treble); empty if not mentioned',
               'waterResistance: string - comma separated list of water resistance features of the product; empty if not mentioned'
           ]
       ) AS features
FROM product;
```

and another one for the reviews:

```sql
SELECT id,
       azure_ai.extract(
           review_text,
           ARRAY[
               'design: string - comma separated list of design features of the product (e.g., dimensions, shape, weight, material); empty if not mentioned',
               'designSentiment: string - overall sentiment of the design features of the product, e.g., positive, negative, neutral, or n/a if not applicable',
               'soundQuality: string - comma separated list of sound quality features of the product (e.g., sound/music quality, bass, treble); empty if not mentioned',
               'soundQualitySentiment: string - overall sentiment of the sound quality features of the product, e.g., positive, negative, neutral, or n/a if not applicable',
               'waterResistance: string - comma separated list of water resistance features of the product; empty if not mentioned',
               'waterResistanceSentiment: string - overall sentiment of the water resistance features of the product, e.g., positive, negative, neutral, or n/a if not applicable',
               'overallSentiment: string - overall sentiment of the review, e.g., positive, negative, neutral, or n/a if not applicable'
           ]
       ) AS features
FROM review;
```

The queries above help extract the high-level data we need to populate a knowledge graph with the following relationship diagram:

> Figure 1. Relationship diagram among products, their features, and the reviews that mention these features. Products have a "has-a" relationship with both features and reviews, and reviews have a "mentions-a" relationship with the features.

Here, product nodes can contain some convenient data in addition to the product IDs (e.g., name, category), feature nodes contain the feature names (e.g., design), and review nodes contain the review IDs. Similarly, the directed edges from product nodes to feature nodes contain feature data (e.g., extracted data from product descriptions for the design feature), edges from product nodes to review nodes contain the overall sentiment of the review, and edges from review nodes to feature nodes contain both the feature data (e.g., extracted data from product reviews for the design feature) and the corresponding feature sentiment (e.g., extracted sentiment from product reviews specifically for the design feature, if present). The graph representation depicted in Figure 1 can be implemented easily with the help of the Apache AGE extension in Azure Database for PostgreSQL.

### Answering Complex Questions

Finally, after we build the graph representation of the relationships among products, reviews, and the features of interest, we are ready to answer complex questions in SQL.
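Before querying, the graph itself has to be populated from the extracted features. The snippet below is a minimal sketch of that step with Apache AGE; the node and edge properties are illustrative assumptions, and a real loader would iterate over the extraction results rather than hard-code a single product.

```sql
-- Illustrative: create the graph and insert one product node with a
-- 'design' feature edge via Apache AGE's openCypher interface.
-- Property values here are made-up examples.
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

SELECT create_graph('shop_graph');

SELECT * FROM cypher('shop_graph', $$
    CREATE (p:product {id: 6, name: 'Nexvo Wireless Earbuds', category: 'headphones'}),
           (f:feature {name: 'design'}),
           (p)-[:HAS_FEATURE {data: 'lightweight, flexible earhooks'}]->(f)
$$) AS (result agtype);
```

The `HAS_REVIEW` and `MENTIONS_FEATURE` edges used in the queries below would be created the same way, carrying the sentiment values extracted by `azure_ai.extract`.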
Let us imagine that we would like to answer the following challenging question: "What are the Top 3 lightweight and waterproof headphones with many good reviews about music quality?" To answer this question, we can start with the following graph query:

```sql
SELECT * FROM cypher(
    'shop_graph',
    $$
        MATCH (p:product {category: 'headphones'})-[:HAS_FEATURE]->(:feature {name: 'design'})
        WITH p
        MATCH (p)-[:HAS_FEATURE]->(:feature {name: 'waterResistance'})
        WITH p
        MATCH (p)-[:HAS_REVIEW]->(r:review)-[:MENTIONS_FEATURE {sentiment: 'positive'}]->(:feature {name: 'soundQuality'})
        RETURN p.id, r.id
    $$
) AS result(product_id integer, review_id integer);
```

This query lists the ID pairs of products that are in the headphones category and have design and water resistance features specified, together with those reviews that have positive sentiment about the specific product's sound quality features. However, our featurization process in the previous section implies that the design feature of a product specification contains not only the weight of the product but also other design elements such as dimensions, shape, and material. As such, we need to join the graph query with a common table expression that filters out those product descriptions that do not explicitly specify the weight of the product. Finally, let us also summarize the design and water resistance features of the product, as well as all the review elements concerning the sound quality of the product.
The final query looks like this:

```sql
WITH graph_query AS (
    SELECT * FROM cypher(
        'shop_graph',
        $$
            MATCH (p:product {category: 'headphones'})-[:HAS_FEATURE]->(:feature {name: 'design'})
            WITH p
            MATCH (p)-[:HAS_FEATURE]->(:feature {name: 'waterResistance'})
            WITH p
            MATCH (p)-[:HAS_REVIEW]->(r:review)-[:MENTIONS_FEATURE {sentiment: 'positive'}]->(:feature {name: 'soundQuality'})
            RETURN p.id, r.id
        $$
    ) AS result(product_id integer, review_id integer)
),
detailed_data AS (
    SELECT g.product_id, g.review_id, p.name,
           p.features->>'design' AS design,
           p.features->>'waterResistance' AS water_resistance,
           r.features->>'soundQuality' AS sound_quality
    FROM graph_query g
    JOIN product p ON g.product_id = p.id
    JOIN review r ON g.review_id = r.id
),
filtered_products AS (
    SELECT DISTINCT ON (product_id) product_id
    FROM detailed_data
    WHERE azure_ai.is_true('Product spec specifically mentions lightweight design: ' || design)
)
SELECT f.product_id AS id,
       d.name,
       azure_ai.generate(
           'Design features: ' || d.design || '. Water resistance features: ' || d.water_resistance,
           system_prompt => 'Summarize the features of the product concisely.'
       ) AS product_summary,
       azure_ai.generate(
           string_agg('Review: ' || d.sound_quality, ','),
           system_prompt => 'Summarize the reviews concisely.'
       ) AS review_summary,
       COUNT(d.review_id) AS review_count
FROM filtered_products f
JOIN detailed_data d ON f.product_id = d.product_id
GROUP BY f.product_id, d.name, d.design, d.water_resistance
ORDER BY review_count DESC
LIMIT 3;
```

We get the following result set:

| id | name | product_summary | review_summary | review_count |
| --- | --- | --- | --- | --- |
| 6 | [redacted] | [redacted] | [redacted] | 16 |
| 8 | [redacted] | [redacted] | [redacted] | 13 |
| 7 | [redacted] | [redacted] | [redacted] | 12 |

There we have it: the Top 3 products with the desired features and the most positive reviews about sound quality! Let us examine the top match, for example:

- **ID.** 6
- **Name.** Nexvo Wireless Earbuds – 48H Playback, LED Display, Earhooks, Mic
- **Product Summary.** The product features a sport-focused design with three pairs of ear caps and flexible earhooks for a secure fit during outdoor activities. It includes a charging case with a dual LED display and offers water resistance suitable for use in rainy conditions.
- **Review Summary.** The reviews consistently praise the product's sound quality, highlighting it as crystal clear, rich, full, and immersive, with good bass response and outstanding clarity.
- **Review Count.** 16

## Final Thoughts

Semantic operators in Azure Database for PostgreSQL are more than just wrappers around LLMs: they represent a paradigm shift in how we think about AI in data systems. By embedding generative AI directly into SQL, we are not only simplifying the integration process but also empowering developers to build smarter, faster, and more maintainable applications. This approach allows for seamless interaction with AI capabilities, enabling developers to leverage the full potential of LLMs to solve complex problems, enhance data analysis, enrich available data with high-level features, and create more intuitive user experiences. The flexibility and power of these operators open new possibilities for innovation, making it easier to develop cutting-edge applications that can adapt to evolving needs and challenges.

# Introducing support for Graph data in Azure Database for PostgreSQL (Preview)
We are excited to announce the addition of the Apache AGE extension in Azure Database for PostgreSQL, a significant advancement that brings graph processing capabilities to the PostgreSQL ecosystem. This new extension provides a powerful toolset for developers looking to combine a graph database with the robust enterprise features of Azure Database for PostgreSQL.

# New Generative AI Features in Azure Database for PostgreSQL
by: Maxim Lukiyanov, PhD, Principal PM Manager

This week at the Microsoft Build conference, we're excited to unveil a suite of new Generative AI capabilities in Azure Database for PostgreSQL flexible server. These features unlock a new class of applications powered by an intelligent database layer, expanding the horizons of what application developers can achieve. In this post, we'll give you a brief overview of these announcements.

Data is the fuel of AI. Looking back, the intelligence of Large Language Models (LLMs) can be reframed as intelligence that emerged from the vast data they were trained on. The LLMs just happened to be the technological leap necessary to extract that knowledge, but the knowledge itself was hidden in the data all along. In modern AI applications, the Retrieval-Augmented Generation (RAG) pattern applies this same principle to real-time data. RAG extracts relevant facts from data on the fly to augment an LLM's knowledge. At Microsoft, we believe this principle will continue to transform technology. Every bit of data will be squeezed dry of every bit of knowledge it holds. And there's no better place to find the most critical and up-to-date data than in databases. Today, we're excited to announce the next steps on our journey to make databases smarter, so they can help you capture the full potential of your data.

## Fast and accurate vector search with DiskANN

First, we're announcing the General Availability of DiskANN vector indexing in Azure Database for PostgreSQL. Vector search is at the heart of the RAG pattern, and it continues to be a cornerstone technology for the new generation of AI agents, giving them contextual awareness and access to fresh knowledge hidden in data. DiskANN brings years of state-of-the-art innovation in vector indexing from Microsoft Research directly to our customers.
This release introduces support for vectors of up to 16,000 dimensions, far surpassing the 2,000-dimension limit of the standard pgvector extension in PostgreSQL. This enables the development of highly accurate applications using high-dimensional embeddings. We've also accelerated index creation with enhanced memory management, parallel index building, and other optimizations, delivering up to 3x faster index builds while reducing disk I/O.

Additionally, we're excited to announce the Public Preview of Product Quantization, a cutting-edge vector compression technique that delivers exceptional compression while maintaining high accuracy. DiskANN Product Quantization enables efficient storage of large vector volumes, making it ideal for production workloads where both performance and cost matter. With Product Quantization enabled, DiskANN offers up to 10x faster performance and 4x cost savings compared to pgvector HNSW. You can learn more about DiskANN in a dedicated blog post.

## Semantic operators in the database

Next, we're announcing the Public Preview of Semantic Operators in Azure Database for PostgreSQL, bringing a new intelligence layer to relational algebra, integrated directly into the SQL query engine. While vector search is foundational to Generative AI (GenAI) apps and agents, it only scratches the surface of what's possible. Semantic relationships between elements of the enterprise data are not visible to vector search. This knowledge exists within the data but is lost at the lowest level of the stack (vector search), and this loss propagates upward, limiting the agent's ability to reason about the data. This is where the new Semantic Operators come in. Semantic Operators leverage LLMs to add semantic understanding of operational data. Today, we're introducing four operators:

- `generate()` – a versatile generation operator capable of ChatGPT-style responses.
- `is_true()` – a semantic filtering operator that evaluates filter conditions and joins in natural language.
- `extract()` – a knowledge extraction operator that extracts hidden semantic relationships and other knowledge from your data, bringing a new level of intelligence to your GenAI apps and agents.
- `rank()` – a highly accurate semantic ranking operator, offering two types of state-of-the-art re-ranking models: Cohere Rank-v3.5 or OpenAI gpt-4.1 from the Azure AI Foundry Model Catalog.

You can learn more about Semantic Operators in a dedicated blog post.

## Graph database and GraphRAG knowledge graph support

Finally, we're announcing the General Availability of GraphRAG support and the General Availability of the Apache AGE extension in Azure Database for PostgreSQL. The Apache AGE extension on Azure Database for PostgreSQL offers a cost-effective, managed graph database service powered by the PostgreSQL engine, and serves as the foundation for building GraphRAG applications. Once extracted, the semantic relationships in the data can be stored in various ways within the database. While relational tables with referential integrity can represent some relationships, this approach is suboptimal for knowledge graphs. Semantic relationships are dynamic; many aren't known ahead of time and can't be effectively modeled by a fixed schema. Graph databases provide a much more flexible structure, enabling knowledge graphs to be expressed naturally. Apache AGE supports openCypher, the emerging standard for querying graph data. openCypher offers an expressive, intuitive language well suited for knowledge graph queries. We believe that combining Semantic Operators with graph support in Azure Database for PostgreSQL creates a compelling data platform for the next generation of AI agents, capable of effectively extracting, storing, and retrieving semantic relationships in your data. You can learn more about graph support in a separate blog post.
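To give a concrete taste of the openCypher interface, a minimal AGE session could look like the following sketch; the graph name, labels, and properties are arbitrary examples, not part of any Azure-specific API.

```sql
-- Enable Apache AGE and run a first openCypher query.
CREATE EXTENSION IF NOT EXISTS age;
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

SELECT create_graph('demo_graph');

-- Create two nodes and a relationship between them.
SELECT * FROM cypher('demo_graph', $$
    CREATE (a:Person {name: 'Alice'})-[:KNOWS]->(b:Person {name: 'Bob'})
$$) AS (result agtype);

-- Read the relationship back.
SELECT * FROM cypher('demo_graph', $$
    MATCH (a:Person)-[:KNOWS]->(b:Person)
    RETURN a.name, b.name
$$) AS (a agtype, b agtype);
```

Note that AGE requires the column definition list (`AS (… agtype)`) after every `cypher()` call.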
## Resources to help you get started

We're also happy to announce the availability of new resources and tools for application developers:

- Model Context Protocol (MCP) is an emerging open protocol designed to integrate AI models with external data sources and services. We have integrated the MCP server for Azure Database for PostgreSQL into the Azure MCP Server, making it easy to connect your agentic apps not only to Azure Database for PostgreSQL but to other Azure services as well through one unified interface. To learn more, refer to this blog post.
- A new Solution Accelerator which showcases all of the capabilities we have announced today working together in one solution, solving real-world problems of ecommerce retail reimagined for the agentic era.
- A new PostgreSQL extension for VS Code for application developers and database administrators alike, bringing a new generation of query editing and Copilot experiences to the world of PostgreSQL.
- New enterprise features making Azure Database for PostgreSQL faster and more secure, described in the accompanying post.

## Begin your journey

Generative AI innovation continues its advancement, bringing new opportunities every month. We're excited for what is to come and look forward to sharing this journey of discovery with our customers. With today's announcements (DiskANN vector indexing, Semantic Operators, and GraphRAG), Azure Database for PostgreSQL is ready to help you explore new boundaries of what's possible. We invite you to begin your Generative AI journey today by exploring our new Solution Accelerator.

# Hacking the migration service in Azure Database for PostgreSQL
## Introduction

Azure Database for PostgreSQL offers a powerful and scalable database solution for developers and enterprises. Whether you're looking to change your networking configuration from VNET integration to Private Endpoint or downsizing storage, the Migration Service in Azure Database for PostgreSQL provides options for smooth migrations. Historically, migrating between flexible servers was manual (using pg_dump/pg_restore or third-party tools), which can be time-consuming and error-prone. This guide introduces an alternative workaround: using the Migration Service to facilitate flex-to-flex migrations, and outlines known networking scenarios where this workaround has been used.

> Note: While flex-to-flex migration isn't officially supported by the Azure Database for PostgreSQL product team, it has been observed to work successfully for many customers.

## Why migrate from Flex to Flex?

There are several common reasons to migrate between Azure Database for PostgreSQL flexible servers:

- Changing from VNET integration to Private Endpoint networking
- Downsizing the storage capacity of a server
- Consolidating databases to optimize costs

Regardless of your specific reason, using the Migration Service in supported scenarios simplifies the process and helps maintain data integrity.

## How does the migration workaround work?

In brief, select "On-Premise Server" as the migration source in the Migration Service, then provide the details of your existing flexible server. This approach leverages the standard migration tool, sidestepping some limitations.
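For comparison, the manual dump/restore path mentioned above looks roughly like the following sketch; the host names, user, and database are placeholders, and additional flags (SSL options, role handling, etc.) may be needed for your environment.

```shell
# Illustrative manual migration between two flexible servers.
# Server names, user, and database below are placeholders.

# Dump the source database in directory format with 4 parallel jobs.
pg_dump --host=source-server.postgres.database.azure.com \
        --username=admin_user --dbname=appdb \
        --format=directory --jobs=4 --file=appdb_dump

# Restore into the (pre-created) target database, also in parallel.
pg_restore --host=target-server.postgres.database.azure.com \
           --username=admin_user --dbname=appdb \
           --jobs=4 appdb_dump
```

The Migration Service workaround avoids orchestrating these steps yourself, which is exactly the time-consuming, error-prone part this guide sidesteps.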
Restrictions & Limitations

Keep in mind that the same restrictions outlined by Microsoft in their official documentation apply here:

- Known issues with the Migration Service
- Prerequisites for migration
- Migration of extensions

Known Networking Scenarios (Unofficially Supported)

| PostgreSQL Flex source | PostgreSQL Flex target | Status     |
|------------------------|------------------------|------------|
| Public Access          | Public Access          | Works      |
| Private Endpoint       | Private Endpoint       | Works      |
| VNET integrated        | Private Endpoint       | Works      |
| VNET integrated        | VNET integrated        | Not tested |

How to perform the migration?

Step 1: Launch the Migration Service.

Step 2: Enter Source Details
- Select "On-Premise" as the source server type
- Choose the migration option
- Choose the migration mode
- Include the Azure Database for PostgreSQL source server details - this is where we input the flexible server credentials

Step 3: Connect to the Target Server
- Choose your target flexible server and complete the migration wizard

Remember to review extension compatibility if you use extensions.

Conclusion

Migrating between Azure PostgreSQL flexible servers using this workaround is a practical option when officially supported scenarios don't fit your environment. Follow the outlined steps for a smooth, reliable migration with minimal disruption. With careful planning and validation, you'll maintain data integrity and build a scalable database infrastructure on Azure. Happy migrating!

Additional Tips & Resources

If you encounter issues related to extensions or other limitations, consider exploring pgcopydb for advanced migration options.

Feedback: We welcome your thoughts and suggestions - as well as net new supported scenarios.

April 2025 Recap: Azure Database for PostgreSQL Flexible Server
Hello Azure Community,

April has brought powerful capabilities to Azure Database for PostgreSQL flexible server: On-Demand Backups are now Generally Available, a new Terraform version for our latest REST API has been released, the Public Preview of the MCP Server is now live, and there are a few other updates that we are excited to share in this blog. Stay tuned as we dive into the details of these new features and how they can benefit you!

Feature Highlights

- General Availability of On-Demand Backups
- Public Preview of Model Context Protocol (MCP) Server
- Additional Tuning Parameters in PG 17
- Terraform resource released for latest REST API version
- General Availability of pg_cron extension in PG 17

General Availability of On-Demand Backups

We are excited to announce the General Availability of on-demand backups for Azure Database for PostgreSQL flexible server. This makes backup management easier to streamline, alongside the automated, scheduled storage volume snapshots that encompass the entire database instance and all associated transaction logs. On-demand backups give you the flexibility to initiate backups at any time, supplementing the existing scheduled backups.

This capability is useful for scenarios such as application upgrades, schema modifications, or major version upgrades. For instance, before making schema changes you can take a database backup; in the unlikely case that you run into issues, you can quickly restore (PITR) the database to a point before the schema changes were initiated. Similarly, during major version upgrades, on-demand backups provide a safety net, allowing you to revert to a previous state if anything goes wrong. Without an on-demand backup, the PITR could take much longer, as it would need to start from the last scheduled snapshot, which could be up to 24 hours old, and then replay the WAL.
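For example, taking a safety backup right before a schema change is a single CLI call. This is a sketch using placeholder resource names; it assumes an Azure CLI version that includes the on-demand backup commands for flexible server.

```shell
# Trigger an on-demand backup before making schema changes.
az postgres flexible-server backup create \
  --resource-group "<rg-name>" \
  --name "<server-name>" \
  --backup-name "pre-schema-change-backup"

# List backups to confirm the new backup is available.
az postgres flexible-server backup list \
  --resource-group "<rg-name>" \
  --name "<server-name>"
```

Once the schema change is verified, the on-demand backup can be deleted to avoid ongoing storage costs.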
During a major version upgrade, Azure Database for PostgreSQL flexible server already takes an on-demand backup behind the scenes for you and then deletes it when the upgrade is successful.

Key Benefits:
- Immediate Backup Creation: Trigger full backups instantly.
- Cost Control: Delete on-demand backups when no longer needed.
- Improved Safety: Safeguard data before major changes or refreshes.
- Easy Access: Use via the Azure portal, CLI, ARM templates, or REST APIs.

For more details on how to get started, check out this announcement blog post. Create your first on-demand backup using the Azure portal or Azure CLI.

Public Preview of Model Context Protocol (MCP) Server

Model Context Protocol (MCP) is a new and emerging open protocol designed to integrate AI models with the environments where your data and tools reside in a scalable, standardized, and secure manner. We are excited to introduce the Public Preview of the MCP Server for Azure Database for PostgreSQL flexible server, which enables your AI applications and models to talk to your data hosted in Azure Database for PostgreSQL flexible servers according to the MCP standard. The MCP Server exposes a suite of tools, including listing databases, tables, and schema information; reading and writing data; creating and dropping tables; listing Azure Database for PostgreSQL configurations; retrieving server parameter values; and more. You can either build custom AI apps and agents with MCP clients to invoke these capabilities, or use AI tools like Claude Desktop and GitHub Copilot in Visual Studio Code to interact with your Azure PostgreSQL data simply by asking questions in plain English. For more details and demos on how to get started, check out this announcement blog post.

Additional Tuning Parameters in PG 17

We now provide an expanded set of configuration parameters in Azure Database for PostgreSQL flexible server (v17) that gives you greater control to optimize your database performance for unique workloads.
You can now tune internal buffer settings such as commit timestamp, multixact member and offset, notify, serializable, subtransaction, and transaction buffers, allowing you to better manage memory and concurrency in high-throughput environments. Additionally, you can configure parallel append, plan cache mode, and event triggers, opening powerful optimization and automation opportunities for analytical workloads and custom logic execution. This gives you more control for memory-intensive and high-concurrency applications, greater control over execution plans, and the ability to run queries in parallel. To get started, all newly modifiable parameters are available now through the Azure portal, Azure CLI, and ARM templates, just like any other server configuration setting. To learn more, visit our Server Parameter Documentation.

Terraform resource released for latest REST API version

A new version of the Terraform resource for Azure Database for PostgreSQL flexible server is now available. It brings several key improvements, including the ability to easily revive dropped databases with geo-redundancy and customer-managed keys (Geo + CMK - Revive Dropped), seamless switchover of read replicas to a new site (Read Replicas - Switchover), improved connectivity through virtual endpoints for read replicas, and the use of on-demand backups for your servers. To get started with Terraform support, please follow this link: Deploy Azure Database for PostgreSQL flexible server with Terraform

General Availability of pg_cron extension in PG 17

We’re excited to announce that the pg_cron extension is now supported in Azure Database for PostgreSQL flexible server major versions, including PostgreSQL 17. This extension enables simple, time-based job scheduling directly within your database, making maintenance and automation tasks easier than ever. You can get started today by enabling the extension through the Azure portal or CLI.
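On flexible server, extensions are allowlisted through server parameters before they can be created. The sketch below, with placeholder server and credential names, shows one way to enable pg_cron and schedule a nightly job from the CLI; note that pg_cron also needs to be added to `shared_preload_libraries`, which requires a server restart.

```shell
# Allowlist pg_cron and load it at server start (restart required).
az postgres flexible-server parameter set \
  --resource-group "<rg-name>" --server-name "<server-name>" \
  --name azure.extensions --value pg_cron
az postgres flexible-server parameter set \
  --resource-group "<rg-name>" --server-name "<server-name>" \
  --name shared_preload_libraries --value pg_cron
az postgres flexible-server restart \
  --resource-group "<rg-name>" --name "<server-name>"

# Create the extension and schedule a nightly vacuum at 03:00 UTC.
psql "host=<server-name>.postgres.database.azure.com dbname=postgres user=<admin-user> sslmode=require" \
  -c "CREATE EXTENSION IF NOT EXISTS pg_cron;" \
  -c "SELECT cron.schedule('nightly-vacuum', '0 3 * * *', 'VACUUM');"
```

Scheduled jobs can later be inspected via the `cron.job` table and removed with `cron.unschedule`.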
To learn more, please refer to the Azure Database for PostgreSQL flexible server list of extensions.

Azure Postgres Learning Bytes 🎓

Setting up alerts for Azure Database for PostgreSQL flexible server using Terraform

Monitoring metrics and setting up alerts for your Azure Database for PostgreSQL flexible server instance is crucial for maintaining optimal performance and troubleshooting workload issues. By configuring alerts, you can track key metrics such as CPU usage and storage, and receive notifications by creating an action group for your alert metrics. This guide walks you through the process of setting up alerts using Terraform.

1. First, create an instance of Azure Database for PostgreSQL flexible server (if not already created).

2. Next, create a Terraform file and add the resources 'azurerm_monitor_action_group' and 'azurerm_monitor_metric_alert' as shown below.

```hcl
resource "azurerm_monitor_action_group" "example" {
  name                = "<action-group-name>"
  resource_group_name = "<rg-name>"
  short_name          = "<short-name>"

  email_receiver {
    name                    = "sendalerts"
    email_address           = "<youremail>"
    use_common_alert_schema = true
  }
}

resource "azurerm_monitor_metric_alert" "example" {
  name                = "<alert-name>"
  resource_group_name = "<rg-name>"
  scopes              = [data.azurerm_postgresql_flexible_server.demo.id]
  description         = "Alert when CPU usage is high"
  severity            = 3
  frequency           = "PT5M"
  window_size         = "PT5M"
  enabled             = true

  criteria {
    metric_namespace = "Microsoft.DBforPostgreSQL/flexibleServers"
    metric_name      = "cpu_percent"
    aggregation      = "Average"
    operator         = "GreaterThan"
    threshold        = 80
  }

  action {
    action_group_id = azurerm_monitor_action_group.example.id
  }
}
```

3. Run the terraform init, plan, and apply commands to create the action group and attach the metric alert to the Azure Database for PostgreSQL flexible server instance.
```shell
terraform init -upgrade
terraform plan -out <file-name>.tfplan
terraform apply <file-name>.tfplan
```

Note: This script assumes you have already created an Azure Database for PostgreSQL flexible server instance. To verify your alert, check the Azure portal under Monitoring -> Alerts -> Alert Rules.

Conclusion

That's a wrap for the April 2025 feature updates! Stay tuned for our Build announcements, as we have many exciting updates and enhancements for Azure Database for PostgreSQL flexible server coming up this month. We've also published our yearly recap blog, highlighting the many improvements and announcements we've delivered over the past year: What's new with Postgres at Microsoft, 2025 edition

We are always dedicated to improving our service with a new array of features. If you have any feedback or suggestions, we would love to hear from you. 📢 Share your thoughts here: aka.ms/pgfeedback

Thanks for being part of our growing Azure Postgres community.