Azure Database for PostgreSQL - Flexible Server
7 TopicsInnovating with PostgreSQL @Build
At this year's Microsoft Build, we're excited to share the latest updates and innovations in Microsoft Azure Database for PostgreSQL. Whether you're building AI powered apps and agents or just looking to uplevel your PostgreSQL experience, we've got sessions packed with insights and tools tailored for developers and technical leaders. As a fully managed, AI-ready open source relational database that offers 58% cost savings over an on-premises PostgreSQL database – Azure Database for PostgreSQL enhances your security, scalability, and management of enterprise workloads. Check out what’s happening with Postgres at Build — in Seattle and online: 🔍 Breakout Sessions BRK211: Building Advanced Agentic Apps with PostgreSQL on Azure What benefits do agentic architectures bring compared to traditional RAG patterns? Find out how we answer this question by exploring advanced agentic capabilities offered by popular GenAI frameworks (LangChain, LlamaIndex, Semantic Kernel) and how they transform RAG applications built on Azure Database for PostgreSQL. Learn how to further improve agentic apps by integrating advanced RAG techniques, making vector search faster with the DiskANN vector search algorithm, and more accurate with Semantic Ranking and GraphRAG. BRK204: What’s New in Microsoft Databases: Empowering AI-Driven App Dev Explore advanced applications powered by Microsoft databases on-premises, on Azure and in Microsoft Fabric. Uncover innovative approaches to scalability and learn about intelligent data processing with AI-driven insights and agentic integrations. See new features with engaging demos across all databases including Azure Database for PostgreSQL. 💻 Demo Session DEM564: Boost Your Development Workflows with PostgreSQL Discover how to transform your development workflow on PostgreSQL. Whether you're building AI-powered apps, managing complex datasets, or just looking to streamline your PostgreSQL experience, this demo will show you how to level up your productivity with PostgreSQL on Azure. 🧪 Hands-On Labs LAB360: Build an Agentic App with PostgreSQL, GraphRAG, and Semantic Kernel Sign up to get hands-on experience building an agent-driven, RAG-based application with Azure Database for PostgreSQL and VS Code. Explore coding and architectural concepts while using DiskANN Index for Vector Search, and integrating Apache AGE for PostgreSQL to extend into a GraphRAG pattern leveraging the Semantic Kernel Agent Framework. 💬 Meet the Experts Have questions? Looking to talk open source, AI agentic apps, or migration? Visit us in the Expert Meetup Zone to connect with the Postgres product teams, engineers, and architects. 🔎 How to find it: Log into the Microsoft Build 2025 website or use the official event mobile app to view the venue map and session schedule. 📍 To find the Expert Meetup zone, check out the official MS Build Event Guide for a venue maps and other logistical information. 🐘Get Started with Azure Database for PostgreSQL Want to try it out firsthand? 🚀 Start building 📘 Explore the documentation Let's connect, code, and grow together at Build 2025!Microsoft PostgreSQL OSS engine team: reflecting on 2024
In "Microsoft PostgreSQL OSS engine team: reflecting on 2024", the first part of a two part blog post, you will learn about who the Microsoft PostgreSQL OSS Engine team is, their code contributions to the upstream PostgreSQL & journey during 2024. In the second part, you will get a sneak preview of upcoming work in 2025 (and PG18 cycle) and more.Managing bloat in PostgreSQL using pgstattuple on Azure Database for PostgreSQL flexible server
Bloat refers to the unused space within database objects like tables and indexes, caused by accumulated dead tuples that have not been reclaimed by the storage engine. This often results from frequent updates, deletions, or insertions, leading to inefficient storage and performance issues. Addressing bloat is crucial for maintaining optimal database performance, as it can significantly impact storage efficiency, increase I/O operations, reduce cache efficiency, prolong vacuum times, and slow down index scans. In this blog post, I will walk you through how to use the pgstattuple extension in PostgreSQL to analyze and understand the physical storage of your database objects. By leveraging pgstattuple, you can identify and quantify the unused space within your tables and indexes. We will guide you through analyzing bloat, interpreting the results, and addressing the bloat to optimize your database and improve its performance. I will be using the pg_repack extension as an alternative to VACUUM FULL and REINDEX. pg_repack is a PostgreSQL extension that removes bloat from tables and indexes and reorganizes them more efficiently. pg_repack works by creating a new copy of the target table or index, applying any changes that occurred during the process, and then swapping the old and new versions atomically. pg_repack doesn't require any downtime or exclusive access locks on the processed table or index, except for a brief period at the beginning and at the end of the operation. Performing a full table repack requires free disk space about twice as large as the target table(s) and its indexes. For example, if the total size of the tables and indexes to be reorganized is 1GB, an additional 2GB of disk space is required. For pg_repack to run successfully on a table you must have either a PRIMARY KEY or a UNIQUE index on a NOT NULL column. Let us dive in and see how you can make the most of this powerful tool. To get more details on the bloat on an Azure Database for PostgreSQL flexible server, you can follow these steps: 1. Installing pgstattuple Add pgstattuple to the azure.extensions server parameter. You must install the extension on the database in which you want to analyze the bloat. To do so, connect to the database of interest and run the following command: CREATE EXTENSION pgstattuple; 2. Analyze the table/index Once the extension is installed, you can use the pgstattuple function to gather the detailed statistics to analyze the bloat on test table. The function provides information such as the number of live tuples, dead tuples, and the percentage of bloat, free space within the pages. To showcase the pgstattuple extension features, I have used a 4 Vcore SKU with PG version 16, created a test table with an index, loaded it with 24 Gb data and kept on generating bloat by performing some update/delete commands on the table leading to bloat. Creating a test table using the below script. CREATE TABLE test_table ( id bigserial PRIMARY KEY, column1 text, column2 text, column3 text, column4 text, column5 text, column6 text, column7 text, column8 text, column9 text, column10 text); Loading the table with 24 GB data using the below script. INSERT INTO test_table (column1, column2, column3, column4, column5, column6, column7, column8, column9, column10) SELECT md5(random()::text), md5(random()::text), md5(random()::text), md5(random()::text), md5(random()::text), md5(random()::text), md5(random()::text), md5(random()::text), md5(random()::text), md5(random()::text) FROM generate_series(1, 22000000); Create an index on this table to depict the usage of pgstatindex function to analyze bloat on an index. CREATE INDEX idx_column1 ON test_table (column1); Run functions pgstattuple on the above table and pgstatindex on the above index without bloat. pgstattuple on unbloated table You can observe the dead_tuple_percent is 0 and free_percent is 4.3 referring to the healthy state of a table. pgstatindex on unbloated index You can observe the avg_leaf_density is 89.07% and leaf_fragementation is 8.1 referring to healthy state of the index. Using the command below, I am generating bloat on the table. UPDATE test_table SET column1 = md5(random()::text), column2 = md5(random()::text), column3 = md5(random()::text), column4 = md5(random()::text), column5 = md5(random()::text), column6 = md5(random()::text) WHERE id % 5 = 0; You can analyze a table using the below query: SELECT * FROM pgstattuple('your_table_name'); What it does: It performs a full scan to gather detailed statistics Performance: On large tables the performance is slower and might take some seconds to minutes depending on the table size due to the full table scans Use case: To diagnose bloat or planning vacuuming strategies like (performing VACUUM/VACUUM FULL) To achieve faster estimates you can use, SELECT * FROM pgstattuple_approx(‘your_table_name’); What it does: Uses sampling to estimate statistics about the table. Accuracy: Results are close but not exact Performance: On large tables the performance is faster as it only considers a sample (subset of pages). Use case: quick insights The function provides the following information about the table. Column Description table_len Table length in bytes (tuple_len+dead_tuple_len+frees_space) and the overhead accounts for the padding (for tuple alignment) and page header (for per page table pointers) tuple_count Number of live tuples tuple_len Length of live tuples in bytes tuple_percent Percentage of live tuples dead_tuple_count Number of dead tuples dead_tuple_len Length of dead tuples in bytes dead_tuple_percent Percentage of dead tuples free_space Total free space in bytes within the pages free_percent Percentage of free space within the pages You should mainly concentrate on the below 3 columns to understand the table bloat and the unused space. dead_tuple_percent column tells us the percentage of dead tuples in the table. It is calculated as below. dead_tuple_percent = dead_tuple_len / table_len * 100 This can be reduced by running VACUUM on the table. However, VACUUM does not reclaim the space. Hence, free_space and free_percent would increase after the VACUUM. free_space and free_percent depict the unused/wasted space within the pages. The space can be reclaimed only by performing a VACUUM FULL on the table. If you see high free_percent it depicts the table needs VACUUM FULL (here instead you could use pg_repack) to reclaim the space. If you observe a dead_tuple_percent anything > 20% you would need to run VACUUM on the table. However, if you observe a free_percent > 50% you would need to run VACUUM FULL on the table. The below 3 snips depict the pgstattuple function run on a bloated table, vacuumed table and the output after pg_repack run on the table. Pgstattuple function run on the bloated table. pgstattuple function run on the table after vacuuming the table. Note: Here you also see a difference in tuple_count as I have performed some delete statements which are not captured in the document. Hence you see a tuple count difference. pgstattuple function on after running pg_repack on the table. Summary of Changes with vacuum and pg_repack run on a bloated table: dead_tuple_count: Reduced to 0 after VACUUM dead_tuple_len: Reduced to 0 after VACUUM dead_tuple_percent: Reduced after VACUUM from 21% to 0 free_space: Increased after VACUUM but significantly reduced after pg_repack free_percent: Increased after VACUUM but drastically reduced after pg_repack Similarly, to analyze an index, you can run: SELECT * FROM pgstatindex('your_index_name'); The function provides the following information about the index. Column Description version B-tree version number tree_level Tree level of the root page index_size Total number of pages in index root_block_no Location of root block internal_pages Number of "internal" (upper-level) pages leaf_pages Number of leaf pages empty_pages Number of empty pages deleted_pages Number of deleted pages avg_leaf_density Average density of leaf pages leaf_fragmentation Leaf page fragmentation Low avg_leaf_density implies underutilized pages. It denotes the percentage of good data in index pages. After VACUUM this column value would further go down as cleaning the dead tuples in the indexes reduces the leaf density further pointing us to increase in unused/wasted space. To reclaim the unused/wasted space REINDEX needs to be performed for the index to be performant. High leaf_fragmentation implies Poor data locality within the pages again a REINDEX would help. If you see avg_leaf_density anything < 20% you would need perform REINDEX. The below 3 snips depict the pgstatindex function run on a bloated table index, vacuumed table index and the output after pg_repack run on the table index. pgstatindex ran on bloated table pgstatindex run on index after Vacuuming the table pgstatindex run on index after pg_repack run Summary of Changes with vacuum and pg_repack run on a bloated index: index_size: Remained high after VACUUM but significantly reduced after pg_repack as the unused/wasted space is reclaimed avg_leaf_density: Reduced significantly after VACUUM from 80%-14% depicting a smaller number of good data on the leaf page and increased to 89% after pg_repack leaf_fragmentation: Remained the same after VACUUM and reduced to 0 after pg_repack depicting no page fragmentation happening Note: The index size is a sum of leaf_pages, empty_pages, deleted_pages and internal_pages. pgstattuple acquires a read lock on the object (table/index). So the results do not reflect an instantaneous snapshot; concurrent updates will affect them. For more information on pgstatginindex and pgstathasindex functions refer to PostgreSQL documentation here. For more insights on pgstattuple with respect to TOAST tables, please refer to the relevant documentation here. 3. Addressing the bloat Once you have identified a bloat, you can address it by taking the following steps. Below are common approaches. VACUUM: Clears dead tuples without reclaiming the space. Pg_repack: Performs VACUUM FULL and REINDEX online and efficiently reorganizes the data. Note: Other unused space, like the one left in heap or index pages due to a configured fill factor lower than 100 or because the remaining available space on a page cannot accommodate a row, given its minimum size, is not considered bloat, while it's also unused space. pgstattuple will help you address bloat! This blog post guided you through understanding and managing bloat in PostgreSQL using the pgstattuple extension. By leveraging this tool, you were able to gain detailed insights into the extent of bloat within their tables and indexes. These insights prove valuable in maintaining efficient storage and ensuring optimal server performance.PostgreSQL 17 In-Place Upgrade – Now in Public Preview
By Varun Dhawan, Principal PM. PostgreSQL 17 is now supported via in-place upgrades on Azure Database for PostgreSQL – Flexible Server. Learn how you can upgrade seamlessly from earlier versions without changing your server or migrating data, and take advantage of the latest PostgreSQL version.