Mail Server Monitoring

Posted on 17 2026

A mail server that is silently failing is worse than one that is loudly broken. Silent failures accumulate: messages deferred for days before bouncing, spam filtering quietly blocking legitimate mail, a blacklisted IP sending every outbound message to the recipient’s spam folder. Monitoring catches these before users notice.

The source material covers Munin graphs for Postfix. This series already has Grafana and InfluxDB running for IoT telemetry. Mail metrics fit naturally into the same stack. This page covers the full monitoring picture: daily log summaries, real-time queue monitoring, Grafana integration, and external deliverability health checks.

Log analysis with pflogsumm

pflogsumm parses Postfix log files and produces a human-readable daily summary: messages delivered, deferred, bounced, rejected; top senders and recipients; delivery times. It is the single most useful tool for understanding what the mail server is doing.

Install it:

sudo apt install -y pflogsumm

Run it manually against today's entries in the current log:

sudo pflogsumm -d today /var/log/mail.log

To receive the summary by email as a daily digest, create a daily cron job:

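sudo tee /etc/cron.daily/mail-report << 'EOF'
#!/usr/bin/env bash
# Daily Postfix log summary, limited to yesterday's entries

YESTERDAY=$(date -d yesterday +%Y-%m-%d)
LOG=/var/log/mail.log

# Read the previous log too, in case rotation has already moved
# yesterday's entries out of the current file
LOGS=("$LOG")
[ -f "$LOG.1" ] && LOGS=("$LOG.1" "$LOG")

pflogsumm \
    -d yesterday \
    --problems-first \
    --detail=5 \
    --verbose-msg-detail \
    --zero-fill \
    "${LOGS[@]}" 2>/dev/null | \
    mail -s "Mail server report for $YESTERDAY on $(hostname -s)" root

EOF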

sudo chmod 0755 /etc/cron.daily/mail-report

The --problems-first flag puts delivery failures and rejections at the top of the report, which is where attention is needed most. The summary lands in your inbox alongside other server mail each morning.

Useful pflogsumm flags

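# Totals only: --detail=0 suppresses every detail list, including the problem lists
pflogsumm --problems-first --detail=0 /var/log/mail.log

# Analyse an older, rotated log (pflogsumm reads standard input when no file is given;
# with delaycompress, mail.log.1 is still uncompressed and .2.gz is the first compressed file)
zcat /var/log/mail.log.2.gz | pflogsumm

# Add smtpd connection statistics to the report
pflogsumm --smtpd-stats /var/log/mail.log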
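
When the period of interest spans a log rotation, the files have to reach pflogsumm oldest first. The following is a minimal sketch, assuming logrotate's numeric suffixes (mail.log.1, mail.log.2.gz, and so on). cat_mail_logs is a helper name invented for this example, not part of pflogsumm.

```shell
#!/usr/bin/env bash
# cat_mail_logs BASE [KEEP]: print BASE and up to KEEP rotations, oldest first.
# Hypothetical helper; assumes logrotate-style names BASE.N or BASE.N.gz.
cat_mail_logs() {
    local base="${1:-/var/log/mail.log}" n="${2:-7}"
    while [ "$n" -ge 1 ]; do
        # A rotation is either compressed or plain, depending on delaycompress
        if [ -f "$base.$n.gz" ]; then gzip -cd "$base.$n.gz"; fi
        if [ -f "$base.$n" ]; then cat "$base.$n"; fi
        n=$((n - 1))
    done
    if [ -f "$base" ]; then cat "$base"; fi
}
```

From a root shell, cat_mail_logs /var/log/mail.log 3 | pflogsumm --problems-first would summarise the current log together with its three newest rotations.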

Real-time log monitoring

For watching what the mail server is doing right now:

# Follow Postfix logs in real time (on Ubuntu the daemons log under
# postfix@-.service, so match every postfix unit)
sudo journalctl -u 'postfix*' -f

# Follow all mail-related logs
sudo tail -f /var/log/mail.log

# Filter for delivery failures only
sudo journalctl -u 'postfix*' -f | grep -E "deferred|bounced|reject"

# Watch authentication failures
sudo journalctl -u dovecot -f | grep -i "auth failed\|authentication failure"

Following a specific message

When debugging a delivery problem, trace a specific message by its queue ID:

# Find the queue ID from the log
sudo grep "from=<sender@example.com>" /var/log/mail.log | tail -5

# Trace all log entries for that queue ID
sudo grep "A1B2C3D4E" /var/log/mail.log
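
The two grep steps above can be combined into one pass over the log. This is a sketch only, assuming the usual Postfix log layout in which the queue ID is the field immediately before from=<...>. trace_sender is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# trace_sender ADDRESS [LOGFILE]: print every log line that belongs to a
# message sent by ADDRESS. Hypothetical helper for illustration.
trace_sender() {
    local sender="$1" log="${2:-/var/log/mail.log}"
    # The log is read twice: the first pass collects the "QUEUEID:" field that
    # precedes from=<ADDRESS>, the second prints lines containing such a field.
    awk -v pat="from=<$sender>" '
        NR == FNR {
            for (i = 2; i <= NF; i++)
                if (index($i, pat) == 1) ids[$(i - 1)] = 1
            next
        }
        {
            for (i = 1; i <= NF; i++)
                if ($i in ids) { print; break }
        }
    ' "$log" "$log"
}
```

Because the log is readable only by root and the adm group on a default Ubuntu install, the function would normally be run from a root shell, for example trace_sender sender@example.com.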

Queue monitoring

A healthy mail server has a small, fast-moving queue. A growing queue signals a problem.

Check the current queue:

# Show all queued messages
mailq

# Count queued messages
mailq | grep -c "^[A-F0-9]"

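# Show deferred messages only: postqueue marks active entries with * and held
# entries with ! after the queue ID, so keep IDs followed directly by a space
postqueue -p | grep -E "^[A-F0-9]+ "

# Queue analysis by destination domain (qshape reads the queue directories, so run it as root)
sudo qshape deferred
sudo qshape active
sudo qshape incoming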

qshape groups queued messages by domain and age, making it easy to spot whether delays are concentrated on a specific destination (suggesting a problem with that remote server) or spread across all destinations (suggesting a local problem).
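
For a quick split of the queue by state, the marker that postqueue -p prints after each queue ID can be counted. This is a sketch only: queue_breakdown is an invented name, and it assumes the default short (hexadecimal) queue IDs.

```shell
#!/usr/bin/env bash
# queue_breakdown: read `postqueue -p` output on stdin and count entries by
# the marker after the queue ID (* = active, ! = on hold, none = waiting,
# which in practice means deferred or still in the incoming queue).
queue_breakdown() {
    awk '
        /^[0-9A-F]+\*/ { active++; next }
        /^[0-9A-F]+!/  { hold++; next }
        /^[0-9A-F]+ /  { waiting++ }
        END { printf "active=%d hold=%d waiting=%d\n", active, hold, waiting }
    '
}
```

Usage would be postqueue -p | queue_breakdown. A waiting count that keeps rising between runs is the cue to look at qshape deferred for the affected domains.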

Queue management

# Force immediate retry of all deferred messages
sudo postqueue -f

# Delete all deferred messages (use with care)
sudo postsuper -d ALL deferred

# Delete a specific message by queue ID
sudo postsuper -d QUEUE_ID

# Put a specific message on hold
sudo postsuper -h QUEUE_ID

# Release a held message
sudo postsuper -H QUEUE_ID
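
Deleting the whole deferred queue is rarely what is wanted. A narrower approach, adapted from the pattern in the postsuper(1) manual page, selects queue IDs by recipient domain and hands them to postsuper on standard input. queue_ids_for_domain is a hypothetical helper, and the sketch assumes the standard postqueue -p layout (queue ID, size, four date fields, sender, then recipients).

```shell
#!/usr/bin/env bash
# queue_ids_for_domain DOMAIN: read `postqueue -p` output on stdin and print
# the queue ID of every message with at least one recipient at DOMAIN.
queue_ids_for_domain() {
    local domain="$1"
    # Drop the header line and the "(reason)" lines, then let awk treat each
    # blank-line-separated entry as one record.
    tail -n +2 | grep -v '^ *(' |
        awk -v d="@$domain" 'BEGIN { RS = "" }
            {
                for (i = 8; i <= NF; i++) {
                    n = length($i) - length(d) + 1
                    if (n > 0 && substr($i, n) == d) { print $1; break }
                }
            }' |
        tr -d '*!'   # strip the active/hold markers from the IDs
}
```

Review the list first with postqueue -p | queue_ids_for_domain example.net, and only then append | sudo postsuper -d - to delete those messages.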

Grafana integration

The Grafana and InfluxDB stack used for IoT telemetry can be extended to visualise mail metrics. Two approaches work well: parsing Postfix logs with Telegraf, or running pflogsumm periodically and pushing its totals to InfluxDB directly.

Telegraf log parsing

If Telegraf is installed as part of the monitoring stack:

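# Add to /etc/telegraf/telegraf.conf
# The telegraf user needs read access to the log (on Ubuntu: sudo usermod -aG adm telegraf)

# inputs.tail with the grok data format replaces the deprecated inputs.logparser plugin
[[inputs.tail]]
    files = ["/var/log/mail.log"]
    from_beginning = false
    data_format = "grok"
    grok_patterns = [
        "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST} postfix/%{WORD:postfix_process}\\[%{NUMBER:pid}\\]: %{GREEDYDATA:message}",
    ]
    name_override = "postfix_log"

[[inputs.exec]]
    # grep -c prints 0 itself when nothing matches; "|| true" only masks its exit status
    commands = [
        "bash -c 'mailq | grep -c \"^[A-F0-9]\" || true'"
    ]
    name_suffix = "_queue_size"
    data_format = "value"
    data_type = "integer"
    interval = "1m"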

Postfix metrics via pflogsumm and curl

A simpler approach: run pflogsumm periodically and push key metrics to InfluxDB:

sudo tee /usr/local/bin/postfix-metrics << 'EOF'
#!/usr/bin/env bash
# Parse pflogsumm output and push metrics to InfluxDB

INFLUXDB_URL="http://10.1.0.17:8086"
INFLUXDB_DB="mail"
HOST=$(hostname -s)
TIMESTAMP=$(date +%s%N)
LOG=/var/log/mail.log

# Run pflogsumm once, limited to today, so each value is a running total for the day
SUMMARY=$(pflogsumm -d today "$LOG" 2>/dev/null)

# Print the count from the first "<number> <label>" line, which is the Grand Totals entry
total() {
    awk -v label="$1" '$1 ~ /^[0-9]+$/ && $2 == label { print $1; exit }' <<< "$SUMMARY"
}

DELIVERED=$(total delivered)
DEFERRED=$(total deferred)
BOUNCED=$(total bounced)
REJECTED=$(total rejected)
# grep -c prints 0 itself when the queue is empty; "|| true" only masks its exit status
QUEUE=$(mailq | grep -c "^[A-F0-9]" || true)

# Write to InfluxDB
curl -s -XPOST "${INFLUXDB_URL}/write?db=${INFLUXDB_DB}" \
    --data-binary "postfix,host=${HOST} delivered=${DELIVERED:-0}i,deferred=${DEFERRED:-0}i,bounced=${BOUNCED:-0}i,rejected=${REJECTED:-0}i,queue=${QUEUE:-0}i ${TIMESTAMP}" \
    > /dev/null
EOF

sudo chmod 0755 /usr/local/bin/postfix-metrics

Add to root's crontab (sudo crontab -e) to run every 15 minutes. The script needs read access to the mail log:

*/15 * * * * /usr/local/bin/postfix-metrics

Grafana dashboard panels

With metrics in InfluxDB, create a Grafana dashboard with panels for:

Delivery rate: messages delivered per hour
Queue size: current queued message count over time
Rejection rate: rejected messages per hour (spike indicates spam attack or misconfiguration)
Deferral rate: deferred messages per hour (spike indicates delivery problems to remote servers)
Bounce rate: bounced messages (persistent spikes indicate list quality or recipient validation issues)

Dovecot monitoring

Monitor IMAP login activity and connection counts:

# Current active IMAP connections
sudo doveadm who

# IMAP authentication failures in the last hour
sudo journalctl -u dovecot --since "1 hour ago" | grep -c "auth failed"

# Dovecot statistics
sudo doveadm stats dump

# Per-user quota usage
sudo doveadm quota get -u you@yourdomain.net

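For recurring monitoring, add a brief Dovecot check to the daily mail report. Output printed outside the pflogsumm pipeline is not part of that email (cron delivers it separately), so group these lines with the pflogsumm command if they should arrive in the same message:

# Append to /etc/cron.daily/mail-report
echo ""
echo "=== Dovecot ==="
# Sum the connection-count column of doveadm who, skipping its header line
echo "Active connections: $(doveadm who 2>/dev/null | awk 'NR > 1 { n += $2 } END { print n + 0 }')"
# grep -c prints 0 itself when nothing matches; "|| true" only masks its exit status
echo "Auth failures (24h): $(journalctl -u dovecot --since '24 hours ago' 2>/dev/null | grep -c 'auth failed' || true)"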
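
A raw failure count does not show whether the failures come from one address or many. The sketch below tallies them per remote IP. It assumes the Dovecot login log format in which failed sessions contain "auth failed" and a rip= field; auth_failures_by_ip is an invented helper name.

```shell
#!/usr/bin/env bash
# auth_failures_by_ip: read Dovecot log lines on stdin and print
# "<count> <remote ip>" for failed logins, busiest address first.
# Each log line is one disconnected session, which may cover several attempts.
auth_failures_by_ip() {
    grep 'auth failed' |
        sed -n 's/.*rip=\([0-9A-Fa-f.:]*\).*/\1/p' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}
```

A typical call would be sudo journalctl -u dovecot --since '1 hour ago' | auth_failures_by_ip | head. One address far ahead of the rest suggests a brute-force source worth blocking.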

External deliverability monitoring

The internal monitoring covers what the server is doing. External monitoring covers whether the mail reaches its destination and how it is treated.

Blacklist checks

Check whether the server’s WAN IP is on any major mail blacklists. Being blacklisted silently causes outbound mail to be rejected or spam-classified.

Check manually:

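# Query the MXToolbox API (it needs an API key, sent in the Authorization header;
# MXTOOLBOX_API_KEY is a placeholder for your own key)
curl -s -H "Authorization: ${MXTOOLBOX_API_KEY}" "https://api.mxtoolbox.com/api/v1/lookup/blacklist/your.wan.ip.address" | jq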

Or use the MXToolbox web interface: https://mxtoolbox.com/blacklists.aspx

Set up automated daily blacklist checking:

sudo tee /etc/cron.daily/blacklist-check << 'EOF'
#!/usr/bin/env bash
# Check if the mail server IP is on common blacklists
# Using local DNS-based RBL queries (dig comes from the bind9-dnsutils package)

MAIL_IP="your.wan.ipv4.address"
REVERSED_IP=$(echo "$MAIL_IP" | awk -F. '{print $4"."$3"."$2"."$1}')
BLACKLISTS=(
    "zen.spamhaus.org"
    "bl.spamcop.net"
    "b.barracudacentral.org"
)

HITS=""
for BL in "${BLACKLISTS[@]}"; do
    # Listings are answered with 127.0.0.x. Other answers (for example Spamhaus
    # 127.255.255.x when queried through a public resolver) are errors, not listings
    RESULT=$(dig +short "${REVERSED_IP}.${BL}" 2>/dev/null | grep -E '^127\.0\.0\.[0-9]+$' | head -n 1)
    if [ -n "$RESULT" ]; then
        HITS+="BLACKLISTED on ${BL}: ${RESULT}"$'\n'
    fi
done

if [ -n "$HITS" ]; then
    printf 'Mail server IP %s is blacklisted. Check and remediate.\n\n%s' "$MAIL_IP" "$HITS" | \
        mail -s "BLACKLIST ALERT: $(hostname -s) mail server IP blacklisted" root
fi

EOF

sudo chmod 0755 /etc/cron.daily/blacklist-check

Mail authentication testing

Send a test message to Port25’s authentication verifier and review the report:

echo "Testing mail authentication" | mail -s "Auth test $(date)" check-auth@verifier.port25.com

The response includes SPF, DKIM, and DMARC check results. Review it after any changes to authentication configuration.
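
Many mailbox providers also record their verdicts in an Authentication-Results header on received mail, so a test message sent to an external account can be checked the same way. This is a sketch, assuming the message has been saved to a file with its full headers; auth_verdicts is an invented helper name.

```shell
#!/usr/bin/env bash
# auth_verdicts FILE: print the distinct spf=, dkim= and dmarc= results found
# in the header section of a saved message (everything before the first
# blank line), so text in the body is never counted.
auth_verdicts() {
    sed '/^$/q' "$1" | grep -oE '(spf|dkim|dmarc)=[a-z]+' | sort -u
}
```

Anything other than three pass results is a reason to revisit the SPF record, the DKIM signing configuration, or the DMARC policy.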

SMTP connectivity testing from outside

Test that the mail server is reachable from external hosts:

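# Run these from outside the network, for example a desktop on a mobile hotspot.
# Many consumer and mobile networks block outbound port 25, so a failure on 25
# alone may come from the test network rather than the server.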
nc -v mail.yourdomain.net 25
nc -v mail.yourdomain.net 587
nc -v mail.yourdomain.net 993
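
Where nc is not installed on the external host, bash's built-in /dev/tcp pseudo-device can perform the same reachability check. This is a minimal sketch; check_port is a hypothetical helper, and the 5-second limit is an arbitrary choice.

```shell
#!/usr/bin/env bash
# check_port HOST PORT: report whether a TCP connection can be opened.
# /dev/tcp is a bash feature, so bash is invoked explicitly, and timeout(1)
# bounds the wait when packets are silently dropped.
check_port() {
    local host="$1" port="$2"
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo "open $host:$port"
    else
        echo "closed $host:$port"
        return 1
    fi
}
```

A loop such as for p in 25 587 993; do check_port mail.yourdomain.net "$p"; done covers the three ports tested above. It only proves that the port accepts connections, not that the service behind it is healthy.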

Mail loop test

Send a test message from an external address to the monitored mailbox and check how long it takes to arrive. The simplest version is manual and only needs a second, external mail account:

# Manual check: send from Gmail or similar, then compare the send time
# with the arrival timestamp in the mailbox or in /var/log/mail.log

Alerting thresholds

Configure alerting for the following conditions:

Condition	Threshold	Action
Queue size	> 50 messages	Email alert
Queue size	> 200 messages	Urgent alert
Blacklist hit	Any	Immediate alert
Auth failures	> 20 in 1 hour	Check for brute force
Disk usage (mail storage)	> 80%	Warning alert
Disk usage (mail storage)	> 90%	Urgent alert
Postfix service down	Any	Immediate alert
Dovecot service down	Any	Immediate alert

Configure these thresholds in the monitoring system. The Grafana alerting setup covered in the server monitoring section handles this once metrics are flowing into InfluxDB.
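
Until Grafana alerting is in place, the queue and disk rows of the table can also be evaluated from a cron script. The sketch below only encodes the thresholds listed above; the function names are invented for this example.

```shell
#!/usr/bin/env bash
# Map a queued-message count to the alert level from the threshold table.
queue_alert_level() {
    if [ "$1" -gt 200 ]; then echo urgent
    elif [ "$1" -gt 50 ]; then echo alert
    else echo ok
    fi
}

# Map a disk usage percentage (integer, no % sign) to its alert level.
disk_alert_level() {
    if [ "$1" -gt 90 ]; then echo urgent
    elif [ "$1" -gt 80 ]; then echo warning
    else echo ok
    fi
}
```

For example, queue_alert_level "$(mailq | grep -c '^[A-F0-9]')" classifies the current queue, and disk_alert_level "$(df --output=pcent PATH | tail -n 1 | tr -dc '0-9')" does the same for disk usage, with PATH replaced by wherever the mail storage lives on your system.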

Logrotate configuration

Postfix logs to /var/log/mail.log. On Ubuntu 24.04, logrotate handles rotation automatically via /etc/logrotate.d/rsyslog. Verify the rotation is configured correctly:

cat /etc/logrotate.d/rsyslog

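The mail log should rotate daily, compressed, with at least 14 days of retention for analysis purposes. If the default is insufficient, create a custom logrotate configuration. logrotate rejects a log file that is listed in two configurations, so first remove the /var/log/mail.log line from /etc/logrotate.d/rsyslog: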

sudo tee /etc/logrotate.d/mail-extended << 'EOF'
/var/log/mail.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    sharedscripts
    postrotate
        /usr/lib/rsyslog/rsyslog-rotate
    endscript
}
EOF

A mail server that nobody is watching is a liability. pflogsumm and the daily report take fifteen minutes to set up and pay back every morning with a clear picture of whether the server is healthy. Add the blacklist check and the external connectivity test and you have covered the most common silent failure modes.