Microsoft PostgreSQL OSS engine team: reflecting on 2024

Windows Server · Saturday at 02:02

A robust PostgreSQL development ecosystem is essential for the success of Azure Database for PostgreSQL. Beyond substantial engineering and product initiatives on the managed service side, Microsoft has invested in the PostgreSQL Open Source (OSS) engine team. This team is comprised of code contributors and committers to the upstream PostgreSQL open-source project, aiming to ensure that development is well-funded, healthy, and thriving.

In this first part of a two part blog post, you will learn about who the Microsoft PostgreSQL OSS Engine team is, their code contributions to upstream PostgreSQL & their journey during 2024. In the second part, you will get a sneak preview of upcoming work in 2025 (and PG18 cycle) and more.

Here are quick pointers of what is in store for you:

Meet our team
What does our team do?
The village beyond our team
What are the team's recent contributions?
See you soon

Meet Our Team

The Microsoft PostgreSQL OSS engine team already had an impressive set of team members:

In 2024 awesome upstream code contributors and committers

joined our group making our team even more well-rounded and versatile.

Microsoft PostgreSQL OSS Engine Team

What does our team do?

Our team actively contributes to various PostgreSQL development projects and plays a leading or co-leading role in significant projects. Additionally, we participate in numerous initiatives aimed at enhancing PostgreSQL code quality and improving the development process.

Examples of team's work includes but not limited to:

Modernizing PostgreSQL APIs
Improving the build system
CI/CD enhancements
Handling bug reports, and
Addressing reported performance regressions,
and more.

Regular activities also involve engagement with other developers, design reviews & discussions, code reviews, and testing patches. The team allocates considerable resources to projects that intersect community interests, contributor interests, and user/customer interests.

In addition to significant upstream code work, the team has also made notable contributions to the community by delivering numerous talks, organizing events, and serving on community committees.

The village beyond our team

PostgreSQL development fundamentally relies on teamwork, making close collaboration and partnership central aspects for our team. Every patch that merges upstream undergoes a rigorous review and vetting process from core PostgreSQL developers who are often from different companies, different countries, and different cultures—involving in-depth discussion, review, and testing on and off the pgsql-hackers mailing list. This article outlines contributions from the perspective of our team. However, it is essential to recognize that the support, review, diligence, and collaboration from numerous core PostgreSQL developers beyond Microsoft were critical for the acceptance of patches into upstream PostgreSQL.

What are the team's recent contributions?

The PostgreSQL development cycle lasts for a year with a major version releasing every year. PostgreSQL 17 was released in September 2024. Below are some areas which our team made significant contributions to PG17.

Async IO – read stream

Adding Async IO and Direct IO has been a long running project led by engineers from our team with involvement and participation from the community. You can read about the evolution of this project led by Andres Freund’s in his talk The path to using AIO in postgres. In PG17 the AIO project took a huge step by adding a read stream interface. This work led by Thomas Munro paves way to add AIO implementations (e.g.: io_uring in Linux) without making changes to the users of this interface in the upcoming releases. It can also use read-ahead advice to drive buffered I/O concurrency in a systematic and centralized way, in preparation for later work on asynchronous I/O. In addition to the streaming read interface, some users of this interface such as pg_prewarm (Nazır Bilal Yavuz), sequential scan (Melanie Plageman) and ANALYZE (Nazır Bilal Yavuz) were part of PG17 as well. You can find more details on this work here: Streaming I/O and vectored I/O (PG Conf EU 2024).

IO combining

Until PG17, PostgreSQL would use single 8K reads when reading data from disk. With sequential read using the read stream interface, vectored read is used when possible thereby consuming multiple 8K pages at the same time. This project was primarily led by Thomas Munro with collaboration across our team and community. On Linux, PG uses preadv instead of pread for cases where it can accumulate the reads in sequential fashion.

Below is a screenshot of PG16 sequential scan, with the top part displaying the SQL query being executed and the bottom part showing the strace output that indicates the system calls made by the Postgres process while executing the query. As you can see, the reads are performed as single 8K reads.

PG16: IO not combined, 8K reads

The figure below shows the same on PG17 with IO combining in action. The resultant I/O calls combine multiple 8K reads into one system call.

PG17: IO combining in action

UNION & IS [NOT] NULL query planner improvements

Before PG17 the planner had to append the sub query results at the top level. This would lead to suboptimal planning. Changes in PG17 adjust the UNION planner to instruct the child planner nodes to provide a presorted input. The child node could then choose the most optimal ways (e.g., indexes) to sort resulting in performance improvements. These patches were contributed by David Rowley and you can find more here: Allow planner to use Merge Append to efficiently implement UNION.

Below you can see how for a simple table the PG16 UNION query would use sequential scan, while in the PG17 it would use the index at the child nodes of the UNION query.

[PG16]$ psql -d postgres psql (16.8) Type "help" for help. postgres=# CREATE TABLE numbers (num int); CREATE TABLE postgres=# CREATE UNIQUE INDEX num_idx ON numbers(num); CREATE INDEX postgres=# INSERT INTO numbers(num) SELECT * FROM generate_series(1, 1000000); INSERT 0 1000000 postgres=# EXPLAIN (COSTS OFF) SELECT num FROM numbers UNION SELECT num FROM numbers; QUERY PLAN ------------------------------------------------- Unique -> Sort Sort Key: numbers.num -> Append -> Seq Scan on numbers -> Seq Scan on numbers numbers_1 (6 rows)

psql -d postgres [PG17]$ psql -d postgres psql (17.4) Type "help" for help. postgres=# CREATE TABLE numbers (num int); CREATE TABLE postgres=# CREATE UNIQUE INDEX num_idx ON numbers(num); CREATE INDEX postgres=# INSERT INTO numbers(num) SELECT * FROM generate_series(1, 1000000); INSERT 0 1000000 postgres=# EXPLAIN (COSTS OFF) SELECT num FROM numbers UNION SELECT num FROM numbers; QUERY PLAN ---------------------------------------------------------------- Unique -> Merge Append Sort Key: numbers.num -> Index Only Scan using num_idx on numbers -> Index Only Scan using num_idx on numbers numbers_1 (5 rows)

Another query layer improvement was w.r.t to handling NULL constraints. The previous planner would always produce a plan resulting in evaluation of NULL/IS NOT NULL qualifications, regardless of if the given column had a NOT NULL constraint or not. However, with PG17, the planner now optimizes by considering NOT NULL constraints. This can mean redundant qualifications (e.g., IS NOT NULL on a NOT NULL column) can be ignored and impossible qualifications (e.g., IS NULL on a NOT NULL column) can prevent scans entirely. You can find more details of these changes from the merged patches from David Rowley here: Add better handling of redundant IS [NOT] NULL quals.

VACUUM WAL volume reduction and performance improvements

In PG17, because of the work done by Melanie Plageman, VACUUM pruning and freezing have been combined. This makes VACUUM faster by reducing the time it takes to emit and replay WAL. Further this also results in generating less WAL thereby saving storage space. Here is a screen shot of WAL inspect showing the differences in records generated:

PG16: Two separate WAL records are generated.PG17: Only one WAL record is generated for pruning and freezing

Libpq performance and cancellation

There are changes in PG17 to reduce the memory copies made during the operations such as COPY TO STDOUT and pg_basebackup. This work was spun off from the project to improve physical replication performance and Mustafa Melih Mutlu was behind these contributions. Additionally, changes from Daniel Gustafsson allows asynchronous cancel to avoid blocking cancel calls on the client front in PG17.

Partitioned tables and query planner improvements

The Bitmapset data structure in PostgreSQL is used heavily by the query planner. In PG17, David Rowley committed a change to modify Bitmapset so that trailing zero words are never stored. This allows short-circuiting of various Bitmapset operations. For example, s1 cannot be a subset of s2 if s1 contains more words. This change helped speed up query planning for queries with partitioned tables having a large number of partitions. The following patch from David Rowley made this possible: Remove trailing zero words from Bitmapsets.

Memory performance enhancements

In PG17 a change was introduced to separate out hot and cold paths during the memory allocation, and run the hot path in a way which reduces the need to setup stack frame, thereby leading to optimizations. You can find details of this change from David Rowley here: Refactor AllocSetAlloc(), separating hot and cold paths. Bump memory context adds an optimized memory context implementation to be used in sorting tuples by removing some bookkeeping which is not typically needed for such scenarios. For example, it removes the header which is used for freeing the memory chunks since only reset of the entire context is needed when sort is used. This reduces the memory usage in sort & incremental sort. The patch from David Rowley on this can be found here Introduce a bump memory allocator.

PG upgrade optimization

The checks for datatype usage during upgrade were improved by using a single connection for check which validates data type. Previously the checks for datatype were connecting separately to each of the databases. This change was introduced by Daniel Gustafsson.

Developer tool

As part of the memory plasticity efforts details of which you can be find in the talk here: Enhancing PostgreSQL Plasticity, we had an intern project kicked off. The intern project led to upstream contribution in the form of pg_buffercache_evict tool which has become a very handy tool when operating on buffer pool. Palak Chaturvedi produced the initial version of this patch with guidance from Thomas Munro and then Thomas took it through the finish line. Details of patch can be found here: Add pg_buffercache_evict() function for testing.

See you soon

With this we conclude the first part of the blog which reflects on the journey of the Microsoft PostgreSQL OSS engine team through 2024. The second part will come out soon which will take you through what is in store during 2025, the PG18 cycle and more. See you soon with the second part: “Microsoft PostgreSQL OSS engine team: previewing 2025”.

Max file size: 75 MBMax attachments: 5

View the full article

Sign In