Monitor PostgreSQL Database Performance

2024-06-21

Monitoring

Introduction

PostgreSQL plays a crucial role in countless applications, from powering websites to supporting critical business operations.
However, ensuring the ongoing health and performance of your database requires attention and proactive maintenance.
In this blog, we’ll explore some parameters and their importance along with a few techniques that you can use to check the health of your PostgreSQL database.
We will also write queries to gather relevant metrics to set up a dashboard for visualizing the health of your PostgreSQL database which can provide valuable insights into its performance and status.
This will allow you to easily monitor key metrics and identify any potential issues or areas for optimization.

1. Database connections

Overview

In PostgreSQL, managing connections is crucial for smooth operation.
Understanding the key parameters and monitoring connection states is essential for effective resource allocation and performance optimization.

Parameters

Max Connections: This parameter determines the maximum number of simultaneous connections allowed. Setting it too low may result in client access denial while setting it too high can strain server resources.
Active Connections: Indicates the current number of active connections. A high number may indicate heavy traffic or potential performance issues.
Other States: PostgreSQL connections can exist in various states such as idle, idle in transaction, waiting, disabled, and others. Each state provides insights into the current activity of the database, from idle connections awaiting queries to active transactions.
You can dive deeper into these different states in this blog .

Query

SELECT count(*) AS active_connections 
FROM pg_stat_activity 
WHERE state = 'active';

Why it matters

Resource Allocation: Tracking maximum and active connections helps optimize resource allocation, preventing server strain and ensuring smooth operation.
Performance Insights: Monitoring active connections provides valuable insights into current workload and potential performance issues, enabling timely optimization.
Bottleneck Detection: Understanding connection states helps identify bottlenecks and inefficiencies, facilitating proactive troubleshooting and optimization.
User Experience: Maintaining an optimal number of connections ensures a seamless user experience, avoiding access denials and performance slowdowns.

2. Cache hit ratio

Overview

The cache hit ratio represents the percentage of database requests that are served directly from memory (cache) rather than requiring disk access.
A higher cache hit ratio indicates efficient memory utilization and better performance, as fewer requests need to access data from disk.

Query

SELECT 
  sum(heap_blks_read) as heap_read,
  sum(heap_blks_hit)  as heap_hit,
  sum(heap_blks_hit) / (sum(heap_blks_hit) + sum(heap_blks_read)) as ratio
FROM 
  pg_statio_user_tables;

The query was sourced from this link .
It retrieves statistics on the number of blocks read from disk (heap_blks_read) and the number of blocks retrieved from memory (heap_blks_hit) for user tables.
It then calculates the cache hit ratio by dividing the number of blocks hit by the total number of blocks read and hit.

Why it Matters

Performance Benchmarking: A cache hit ratio close to 1 indicates that most database requests are served from memory, which is ideal for optimal performance.
Diagnostic Tool: Conversely, a lower cache hit ratio suggests that more requests are requiring disk access, potentially indicating memory or caching issues that need to be addressed.
Target Ratio: This blog mentions “For normal operations of Postgres and performance, you’ll want to have your Postgres cache hit ratio of about 99%.”
However, based on my experience, you should target keeping it at least above 70% (Please note that there is no official mention of this number. It’s based on my observations related to the performance of a fairly large production database).

3. Transaction rate

Overview

Monitoring transactions, specifically transactions per second (TPS), is crucial for ensuring your database’s health.
TPS tells you how many database actions are happening each second, giving insights into performance and workload.

Query

To check TPS in PostgreSQL, you can use this query:

SELECT 
    (xact_commit + xact_rollback) / extract(epoch FROM now() - stats_reset) AS tps 
FROM 
    pg_stat_database 
WHERE 
    datname = current_database();

This query calculates TPS by dividing the total number of committed and rolled-back transactions by the time since the last statistics reset.
The query was sourced from this link .

Why It Matters

Optimize Performance: TPS helps identify performance issues and optimize resource usage.
Plan for Growth: By tracking TPS trends, you can anticipate future resource needs and plan accordingly.
Spot Problems: Sudden changes in TPS can signal underlying issues, allowing for quick troubleshooting.
Meet Expectations: Monitoring TPS ensures your database meets performance expectations and SLAs.

4. Disk space used by tables

Overview

Efficient management of disk space is crucial for optimal performance and longevity in PostgreSQL.
Monitoring table disk space usage provides insights into data distribution, identifies storage-intensive tables, and informs capacity planning and performance optimization efforts.

Query

SELECT 
    s.schemaname AS table_schema,
    s.relname AS table_name,
    pg_size_pretty(pg_total_relation_size(s.relid)) AS total,
    pg_size_pretty(pg_relation_size(s.relid)) AS internal,
    pg_size_pretty(pg_total_relation_size(s.relid) - pg_relation_size(s.relid)) AS external,
    u.n_live_tup AS row_count
FROM pg_catalog.pg_statio_user_tables s
JOIN pg_stat_user_tables u ON s.relid = u.relid
ORDER BY pg_total_relation_size(s.relid) DESC;

The query was sourced from this link .
pg_size_pretty(pg_total_relation_size(s.relid)) AS total: This calculates and formats the total size of the table, including indexes and other dependencies such as TOAST tables and associated shared objects.
pg_size_pretty(pg_relation_size(s.relid)) AS internal: This calculates and formats the internal size of the table, excluding indexes but including other dependencies.
pg_size_pretty(pg_total_relation_size(s.relid) — pg_relation_size(s.relid)) AS external: This calculates and formats the external size of the table, which includes indexes and other dependencies.

Why it matters

Identify Storage Hogs: Monitoring table disk space helps pinpoint tables consuming excessive storage, facilitating actions like data archiving or table optimization.
Optimize Performance: Understanding table disk space usage enables query and index optimization, particularly beneficial for large tables.
Plan for Growth: Tracking disk space usage aids in anticipating storage requirements and planning for scalability, ensuring smooth data volume increases.
Cost Optimization: Efficient management of disk space usage leads to cost savings, especially in cloud environments with usage-based pricing.

5. Dead tuples

Dead tuples are rows that have been marked for deletion but not yet reclaimed by the autovacuum process.
These dead tuples occupy space within the table but do not contribute to query results.
Monitoring dead tuples is essential for maintaining database performance and managing storage efficiently.

Query

SELECT 
    relname, 
    n_dead_tup, 
    last_vacuum, 
    last_autovacuum,
    last_analyze,
    last_autoanalyze
FROM pg_stat_all_tables 
WHERE schemaname = 'public' 
ORDER BY n_dead_tup DESC;

The query was sourced from this link .
This query retrieves information about dead tuples, including the table name ( relname ), the number of dead tuples ( n_dead_tup ), and the timestamps of the last manual and automatic vacuum and analyze operations ( last_vacuum, last_autovacuum, last_analyze, last_autoanalyze ).

Why It Matters

Performance Optimization: Monitoring dead tuples helps identify tables with a high number of dead tuples, indicating potential performance issues. Regular vacuuming and analyzing of such tables can reclaim space and improve query performance.
Storage Management: Dead tuples consume space within the table, affecting storage usage. Identifying and removing dead tuples ensures efficient storage utilization and prevents table bloating.
Preventing Transaction Wraparound: Accumulation of dead tuples over time can lead to transaction ID wraparound issues, potentially causing database downtime. Monitoring and managing dead tuples helps mitigate this risk.
You can read about vacuuming in detail in these blogs:
- Standard Vacuum vs full vacuum
- Auto-Vacuuming in PostgreSQL

6. Index usage

Overview

Indexes play a crucial role in enhancing query performance by providing efficient access paths to data.
Monitoring index usage helps identify tables where sequential scans are prevalent over index scans, indicating potential opportunities for index optimization.

Query

SELECT 
    relname,
    seq_scan - idx_scan AS too_much_seq,
    CASE WHEN seq_scan - coalesce(idx_scan, 0) > 0 THEN 'Missing Index?' ELSE 'OK' END 
FROM pg_stat_all_tables 
WHERE schemaname = 'public' AND pg_relation_size(relname::regclass) > 80000 
ORDER BY too_much_seq DESC;

The query was sourced from this link .
This query retrieves information about table access methods, specifically focusing on the difference between sequential scans (seq_scan) and index scans (idx_scan).
It identifies tables where sequential scans are prevalent and suggests whether additional indexes may be needed.

Why It Matters

Query Performance: Monitoring index usage helps identify tables where sequential scans dominate, potentially indicating inefficient query execution plans. Creating or optimizing indexes for such tables can significantly improve query performance.
Resource Optimization: Indexes provide faster access paths to data compared to sequential scans, resulting in reduced resource consumption and improved overall database performance. Monitoring index usage ensures efficient resource utilization.
Identifying Missing Indexes: Tables with a high number of sequential scans relative to index scans may benefit from additional indexes. Monitoring index usage helps identify such tables and prompts further investigation into potential missing indexes.

7. Long running queries

Overview

Long-running queries can have a significant impact on database performance and user experience.
Monitoring and identifying these queries helps in optimizing database performance, preventing resource contention, and ensuring timely responses to user requests.

Query

SELECT pid, now() - pg_stat_activity.query_start AS duration, query 
FROM pg_stat_activity 
WHERE (now() - pg_stat_activity.query_start) > interval '5 minutes';

This query retrieves information about queries that have been running for more than 5 minutes (interval '5 minutes'). It includes the process ID (pid), the duration of the query (duration), and the query text (query).

Why It Matters

Performance Optimization: Monitoring long-running queries helps identify performance bottlenecks and inefficient query execution plans. Optimizing these queries can significantly improve overall database performance.
Resource Management: Long-running queries consume database resources such as CPU, memory, and I/O, potentially impacting other concurrent queries and user transactions. Identifying and optimizing these queries ensures efficient resource utilization.
User Experience: Timely response to user requests is crucial for a positive user experience. Monitoring long-running queries allows for timely intervention and optimization to ensure optimal response times and user satisfaction.

8. Transaction IDs

Overview

*Transaction IDs - (*XIDs) in PostgreSQL are identifiers assigned to transactions for managing database concurrency and ensuring data consistency.
Monitoring transaction IDs and addressing the transaction wraparound issue is crucial for maintaining database stability and preventing downtime.

Query

SELECT datname, age(datfrozenxid) AS xid_age, datfrozenxid FROM pg_database;

The query was sourced from this link .
This query retrieves information about databases, including the database name (datname), the age of the oldest transaction ID (xid_age), and the frozen transaction ID (datfrozenxid), which is used to determine when transaction wraparound may occur.

Why It Matters

Transaction Wraparound: Transaction wraparound occurs when transaction IDs reach the maximum value and wrap around to the beginning, potentially causing database downtime. Monitoring transaction IDs and addressing the wraparound issue is crucial for preventing this scenario.
Database Stability: Transaction wraparound can lead to database instability and downtime, impacting business operations and user experience. Proactive monitoring and mitigation of transaction wraparound issues ensure database stability and uninterrupted service.
Preventive Measures: Regular monitoring of transaction IDs allows database administrators to take preventive measures such as performing regular vacuuming and ensuring sufficient headroom for transaction IDs to prevent wraparound.