Mail Server Monitoring
A mail server that is silently failing is worse than one that is loudly broken. Silent failures accumulate: messages deferred for days before bouncing, spam filtering quietly blocking legitimate mail, a blacklisted IP sending every outbound message to the recipient’s spam folder. Monitoring catches these before users notice.
The source material covers Munin graphs for Postfix. This series already has Grafana and InfluxDB running for IoT telemetry. Mail metrics fit naturally into the same stack. This page covers the full monitoring picture: daily log summaries, real-time queue monitoring, Grafana integration, and external deliverability health checks.
Log analysis with pflogsumm
pflogsumm parses Postfix log files and produces a human-readable daily summary: messages delivered, deferred, bounced, rejected; top senders and recipients; delivery times. It is the single most useful tool for understanding what the mail server is doing.
Install it:
sudo apt install -y pflogsumm
Run it manually against today's entries. Without -d, pflogsumm summarises the whole file, which usually spans several days:
sudo pflogsumm -d today /var/log/mail.log
Or send the output by email as a daily digest. Create a daily cron job:
sudo tee /etc/cron.daily/mail-report > /dev/null << 'EOF'
#!/usr/bin/env bash
# Daily Postfix log summary for the previous day
YESTERDAY=$(date -d yesterday +%Y-%m-%d)
# Read the rotated log too: after a rotation, yesterday's entries are in mail.log.1
cat /var/log/mail.log.1 /var/log/mail.log 2>/dev/null | \
  pflogsumm -d yesterday \
    --problems-first \
    --detail=5 \
    --verbose-msg-detail \
    --zero-fill 2>/dev/null | \
  mail -s "Mail server report for $YESTERDAY on $(hostname -s)" root
EOF
sudo chmod 0755 /etc/cron.daily/mail-report
The --problems-first flag puts delivery failures and rejections at the top of the report, which is where attention is needed most. The summary lands in your inbox alongside other server mail each morning.
Useful pflogsumm flags
# Grand totals only: --detail=0 suppresses every per-item detail section, including the problem reports
pflogsumm --detail=0 /var/log/mail.log
# Analyse a rotated, compressed log (with delaycompress, mail.log.1 is still uncompressed)
sudo zcat /var/log/mail.log.2.gz | pflogsumm
# Add SMTP daemon connection statistics per client
pflogsumm --smtpd-stats /var/log/mail.log
Real-time log monitoring
For watching what the mail server is doing right now:
# Follow Postfix logs in real time
# (on Ubuntu the daemons run under postfix@-.service, so match the unit with a glob)
sudo journalctl -u 'postfix*' -f
# Follow all mail-related logs
sudo tail -f /var/log/mail.log
# Filter for delivery failures only
sudo journalctl -u 'postfix*' -f | grep -E "deferred|bounced|reject"
# Watch authentication failures
sudo journalctl -u dovecot -f | grep -i "auth failed\|authentication failure"
Following a specific message
When debugging a delivery problem, trace a specific message by its queue ID:
# Find the queue ID from the log
sudo grep "from=<sender@example.com>" /var/log/mail.log | tail -5
# Trace all log entries for that queue ID
sudo grep "A1B2C3D4E" /var/log/mail.log
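The two steps above can be combined into one helper. This is a minimal sketch, assuming short hexadecimal queue IDs and the usual "postfix/<process>[pid]: QUEUEID: ..." line layout; the function name trace_sender is made up for illustration.

```shell
#!/usr/bin/env bash
# trace_sender SENDER [LOGFILE]
# Print every log line that belongs to the most recent message from SENDER.
# Assumption: the queue ID is the token right after "postfix/<process>[pid]: ".
trace_sender() {
  local sender log qid
  sender=$1
  log=${2:-/var/log/mail.log}
  # Take the last "from=<sender>" line and extract its queue ID.
  qid=$(grep -F "from=<${sender}>" "$log" | tail -n 1 |
        sed -n 's/.*postfix[^ ]*\[[0-9]*\]: \([0-9A-F]\{5,\}\): .*/\1/p')
  if [ -z "$qid" ]; then
    echo "no message from ${sender} found in ${log}" >&2
    return 1
  fi
  # Every Postfix line for a message carries " QUEUEID: " after the process tag.
  grep -F " ${qid}: " "$log"
}
```

On the server, the log is readable only by root and the adm group, so the function would typically be run from a root shell, for example trace_sender sender@example.com /var/log/mail.log.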
Queue monitoring
A healthy mail server has a small, fast-moving queue. A growing queue signals a problem.
Check the current queue:
# Show all queued messages
mailq
# Count queued messages
mailq | grep -c "^[A-F0-9]"
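# Show deferred messages only
# (the queue ID carries a trailing * for active and ! for held messages)
postqueue -p | grep "^[A-F0-9]" | grep -v "^[A-F0-9]*[*!]"
# Queue analysis by destination domain (needs root to read the queue directories)
sudo qshape deferred
sudo qshape active
sudo qshape incoming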
qshape groups queued messages by domain and age, making it easy to spot whether delays are concentrated on a specific destination (suggesting a problem with that remote server) or spread across all destinations (suggesting a local problem).
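For a quick per-queue count without qshape, the status marker that mailq appends to each queue ID can be tallied. The sketch below assumes short hexadecimal queue IDs and the standard mailq listing; queue_breakdown is an illustrative name, and entries without a marker are reported as deferred although messages still in the incoming queue look the same.

```shell
#!/usr/bin/env bash
# queue_breakdown: read `postqueue -p` (or `mailq`) output on stdin and
# print one line of the form "active=N hold=N deferred=N".
queue_breakdown() {
  awk '
    /^[0-9A-F]+\*/ { active++;   next }  # trailing * : message is in the active queue
    /^[0-9A-F]+!/  { hold++;     next }  # trailing ! : message is on hold
    /^[0-9A-F]+ /  { deferred++; next }  # no marker  : deferred (or still incoming)
    END { printf "active=%d hold=%d deferred=%d\n", active, hold, deferred }
  '
}
```

Typical use is postqueue -p | queue_breakdown. A deferred count that keeps rising while active stays near zero points at remote delivery problems rather than local load.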
Queue management
# Force immediate retry of all deferred messages
sudo postqueue -f
# Delete all deferred messages (use with care)
sudo postsuper -d ALL deferred
# Delete a specific message by queue ID
sudo postsuper -d QUEUE_ID
# Put a specific message on hold
sudo postsuper -h QUEUE_ID
# Release a held message
sudo postsuper -H QUEUE_ID
Grafana integration
The Grafana and InfluxDB stack used for IoT telemetry can be extended to visualise mail metrics. Two approaches work well: parsing Postfix logs with Telegraf, or running pflogsumm on a schedule and pushing its totals to InfluxDB directly.
Telegraf log parsing
If Telegraf is installed as part of the monitoring stack:
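# Add to /etc/telegraf/telegraf.conf
# inputs.logparser is deprecated; inputs.tail with the grok data format replaces it.
# The telegraf user must be able to read the log (for example via the adm group).
[[inputs.tail]]
  files = ["/var/log/mail.log"]
  from_beginning = false
  data_format = "grok"
  name_override = "postfix_log"
  # Adjust the timestamp pattern if rsyslog writes ISO 8601 timestamps
  grok_patterns = [
    "%{SYSLOGTIMESTAMP:timestamp} %{SYSLOGHOST} postfix/%{WORD:postfix_process:tag}\\[%{NUMBER:pid}\\]: %{GREEDYDATA:message}",
  ]
[[inputs.exec]]
  # grep -c prints 0 itself when the queue is empty and exits non-zero;
  # "|| true" clears the exit status without printing a second 0
  commands = [
    "bash -c 'mailq | grep -c \"^[A-F0-9]\" || true'"
  ]
  name_suffix = "_queue_size"
  data_format = "value"
  data_type = "integer"
  interval = "1m"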
Postfix metrics via pflogsumm and InfluxDB
A simpler approach: run pflogsumm periodically and push key metrics to InfluxDB:
sudo tee /usr/local/bin/postfix-metrics > /dev/null << 'EOF'
#!/usr/bin/env bash
# Parse the pflogsumm grand totals and push them to InfluxDB (1.x write API)
INFLUXDB_URL="http://10.1.0.17:8086"
INFLUXDB_DB="mail"
HOST=$(hostname -s)
TIMESTAMP=$(date +%s%N)
LOG=/var/log/mail.log
# Run pflogsumm once; with -d today the totals are counters since midnight
REPORT=$(pflogsumm -d today "$LOG" 2>/dev/null)
# Print the count from the first "<number> <name>" line, which is the Grand Totals entry.
# A plain grep would also match the per-day and per-hour column headers.
total() { awk -v k="$1" '$1 ~ /^[0-9]+$/ && $2 == k { print $1; exit }' <<< "$REPORT"; }
DELIVERED=$(total delivered)
DEFERRED=$(total deferred)
BOUNCED=$(total bounced)
REJECTED=$(total rejected)
# grep -c prints 0 itself when nothing matches, so do not echo a second 0
QUEUE=$(mailq 2>/dev/null | grep -c "^[A-F0-9]" || true)
# Write to InfluxDB
curl -s -XPOST "${INFLUXDB_URL}/write?db=${INFLUXDB_DB}" \
  --data-binary "postfix,host=${HOST} delivered=${DELIVERED:-0}i,deferred=${DEFERRED:-0}i,bounced=${BOUNCED:-0}i,rejected=${REJECTED:-0}i,queue=${QUEUE:-0}i ${TIMESTAMP}" \
  > /dev/null
EOF
sudo chmod 0755 /usr/local/bin/postfix-metrics
Add to root's crontab (sudo crontab -e), since the script needs to read /var/log/mail.log, to run every 15 minutes:
*/15 * * * * /usr/local/bin/postfix-metrics
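A malformed value, such as a field that picked up two lines of output, makes InfluxDB reject the whole write. One way to guard against that is to build the line-protocol payload in a function that refuses anything that is not a plain integer. This is only a sketch; influx_line is a hypothetical helper, and the measurement and field names simply mirror the script above.

```shell
#!/usr/bin/env bash
# influx_line HOST DELIVERED DEFERRED BOUNCED REJECTED QUEUE TIMESTAMP_NS
# Print one InfluxDB line-protocol record, or fail if a value is not an integer.
influx_line() {
  local host v
  host=$1
  for v in "$2" "$3" "$4" "$5" "$6" "$7"; do
    case $v in
      ''|*[!0-9]*)
        # Empty, multi-line, or non-numeric input ends up here.
        echo "influx_line: not an integer: '${v}'" >&2
        return 1
        ;;
    esac
  done
  printf 'postfix,host=%s delivered=%si,deferred=%si,bounced=%si,rejected=%si,queue=%si %s\n' \
    "$host" "$2" "$3" "$4" "$5" "$6" "$7"
}
```

Printing the payload before sending it (for example influx_line mail 40 3 1 5 2 "$(date +%s%N)") also gives a simple dry run for checking the metrics script by hand.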
Grafana dashboard panels
With metrics in InfluxDB, create a Grafana dashboard with panels for:
- Delivery rate: messages delivered per hour
- Queue size: current queued message count over time
- Rejection rate: rejected messages per hour (spike indicates spam attack or misconfiguration)
- Deferral rate: deferred messages per hour (spike indicates delivery problems to remote servers)
- Bounce rate: bounced messages (persistent spikes indicate list quality or recipient validation issues)
Dovecot monitoring
Monitor IMAP login activity and connection counts:
# Current active IMAP connections
sudo doveadm who
# IMAP authentication failures in the last hour
sudo journalctl -u dovecot --since "1 hour ago" | grep -c "auth failed"
# Dovecot statistics
sudo doveadm stats dump
# Per-user quota usage (requires the quota plugin)
sudo doveadm quota get -u you@yourdomain.net
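For recurring monitoring, add a brief Dovecot check to the daily mail report. The lines must run inside the pipeline that feeds mail, so wrap the pflogsumm command and these lines together in a { ...; } group in /etc/cron.daily/mail-report:
# Add inside the { ...; } group that is piped to mail
echo ""
echo "=== Dovecot ==="
# -1 prints one line per connection; tail skips the header line
echo "Active connections: $(doveadm who -1 2>/dev/null | tail -n +2 | wc -l)"
# grep -c already prints 0 when nothing matches, so no "echo 0" fallback is needed
echo "Auth failures (24h): $(journalctl -u dovecot --since '24 hours ago' 2>/dev/null | grep -c 'auth failed' || true)"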
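A raw failure count does not show whether the failures come from one mistyped password or from a brute-force attempt. Grouping them by remote address helps. The sketch below assumes Dovecot's login lines contain "auth failed" and an "rip=<address>" field, which is the usual format but worth checking against your own logs; auth_failures_by_ip is an illustrative name.

```shell
#!/usr/bin/env bash
# auth_failures_by_ip: read Dovecot log lines on stdin and print
# "COUNT ADDRESS" per remote address, highest count first.
auth_failures_by_ip() {
  grep -i 'auth failed' |
    sed -n 's/.*rip=\([0-9a-fA-F.:]*\).*/\1/p' |  # keep only the remote IP
    sort | uniq -c | sort -rn |                    # count per address, descending
    awk '{ print $1, $2 }'                         # normalise uniq padding
}
```

For example: sudo journalctl -u dovecot --since '1 hour ago' | auth_failures_by_ip | head. A single address with dozens of failures is a candidate for a firewall or fail2ban rule.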
External deliverability monitoring
The internal monitoring covers what the server is doing. External monitoring covers whether the mail reaches its destination and how it is treated.
Blacklist checks
Check whether the server’s WAN IP is on any major mail blacklists. Being blacklisted silently causes outbound mail to be rejected or spam-classified.
Check manually:
# The MXToolbox API requires an API key, sent in the Authorization header
curl -s -H "Authorization: YOUR_API_KEY" "https://api.mxtoolbox.com/api/v1/lookup/blacklist/your.wan.ip.address" | jq .
Or use the MXToolbox web interface: https://mxtoolbox.com/blacklists.aspx
Set up automated daily blacklist checking:
sudo tee /etc/cron.daily/blacklist-check > /dev/null << 'EOF'
#!/usr/bin/env bash
# Check if the mail server IP is on common blacklists
# using DNS-based RBL queries through the local resolver
MAIL_IP="your.wan.ipv4.address"
REVERSED_IP=$(echo "$MAIL_IP" | awk -F. '{print $4"."$3"."$2"."$1}')
# dnsbl.sorbs.net is omitted: SORBS was shut down in June 2024
BLACKLISTS=(
  "zen.spamhaus.org"
  "bl.spamcop.net"
  "b.barracudacentral.org"
)
HITS=""
for BL in "${BLACKLISTS[@]}"; do
  # Keep only 127.x answers, which drops dig error text. 127.255.255.x is a
  # Spamhaus error code (for example a query via a public resolver), not a listing.
  RESULT=$(dig +short "${REVERSED_IP}.${BL}" 2>/dev/null | grep -E '^127\.' | grep -v '^127\.255\.255\.')
  if [ -n "$RESULT" ]; then
    HITS+="BLACKLISTED on ${BL}: ${RESULT//$'\n'/ }"$'\n'
  fi
done
if [ -n "$HITS" ]; then
  printf 'Mail server IP %s is blacklisted. Check and remediate.\n\n%s' "$MAIL_IP" "$HITS" | \
    mail -s "BLACKLIST ALERT: $(hostname -s) mail server IP blacklisted" root
fi
EOF
sudo chmod 0755 /etc/cron.daily/blacklist-check
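The lookup logic is easier to verify when it is split into small functions that can be fed sample answers. The sketch below shows the two pieces: reversing the IPv4 octets for the DNS query, and classifying the answer. The function names are made up for illustration, and the treatment of 127.255.255.x as an error follows Spamhaus's documented return codes; other lists may use different conventions.

```shell
#!/usr/bin/env bash
# reverse_ipv4 A.B.C.D: print D.C.B.A, the form used in DNSBL query names.
reverse_ipv4() {
  echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 }'
}

# dnsbl_verdict: read the output of `dig +short <reversed-ip>.<list>` on stdin
# and print "clean", "listed", or "error".
dnsbl_verdict() {
  local raw answers real
  raw=$(cat)
  # No answer at all (NXDOMAIN) means the address is not listed.
  [ -n "$raw" ] || { echo "clean"; return 0; }
  # Keep only well-formed 127.x.y.z answers; anything else is resolver noise.
  answers=$(printf '%s\n' "$raw" | grep -E '^127\.[0-9]+\.[0-9]+\.[0-9]+$' || true)
  # Drop the 127.255.255.x range, which signals a query problem, not a listing.
  real=$(printf '%s\n' "$answers" | grep -v '^127\.255\.255\.' || true)
  if [ -n "$real" ]; then echo "listed"; else echo "error"; fi
}
```

A manual check then reads: dig +short "$(reverse_ipv4 192.0.2.10).zen.spamhaus.org" | dnsbl_verdict. An "error" verdict usually means the query went through a public resolver or timed out, and should be investigated rather than treated as a clean result.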
Mail authentication testing
Send a test message to Port25’s authentication verifier and review the report:
echo "Testing mail authentication" | mail -s "Auth test $(date)" check-auth@verifier.port25.com
The verifier replies to the sender address with SPF, DKIM, and DMARC check results, so send the test from a mailbox you can read. Review the report after any changes to authentication configuration.
SMTP connectivity testing from outside
Test that the mail server is reachable from external hosts:
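# Test from outside the network, for example the desktop on a mobile hotspot.
# Many mobile and residential providers block outbound port 25, so a timeout
# on port 25 alone may be the provider rather than the server.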
nc -v mail.yourdomain.net 25
nc -v mail.yourdomain.net 587
nc -v mail.yourdomain.net 993
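For a scheduled check from an external host, a version that returns an exit status is easier to script than interactive nc sessions. The following is a rough sketch that relies on bash's /dev/tcp redirection and the coreutils timeout command; port_open and check_mail_ports are hypothetical names, and the check only proves that a TCP connection can be opened, not that the service behind it is healthy.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Succeed if a TCP connection to HOST:PORT can be opened within the timeout.
port_open() {
  local host port wait
  host=$1
  port=$2
  wait=${3:-5}
  # bash opens the socket via /dev/tcp; $0 and $1 inside -c are host and port.
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# check_mail_ports HOST: report the state of the SMTP, submission, and IMAPS ports.
check_mail_ports() {
  local host port
  host=$1
  for port in 25 587 993; do
    if port_open "$host" "$port"; then
      echo "${port} open"
    else
      echo "${port} unreachable"
    fi
  done
}
```

Run from the external machine as check_mail_ports mail.yourdomain.net. A result of "25 unreachable" with the other two ports open is the typical signature of a provider that blocks outbound SMTP.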
Mail loop test
Send a test message from an external address to the monitored mailbox and check how long delivery takes. The manual version needs only a second, external mail account:
# Send a message from Gmail or a similar external service to the monitored mailbox,
# then compare its Date header with the timestamp in the topmost Received header
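The comparison itself can be scripted once the test message has been saved to a file, for example from a Maildir. The sketch below computes the gap between the Date header and the timestamp of the topmost Received header, which is the hop added by the receiving server. It assumes GNU date, reasonably synchronised clocks on both ends, and that the sending client sets Date at submission time; the function names are illustrative only.

```shell
#!/usr/bin/env bash
# header_date FILE: print the value of the first Date header.
header_date() {
  sed -n '/^$/q; s/^[Dd]ate:[[:space:]]*//p' "$1" | head -n 1
}

# last_hop_date FILE: print the timestamp after the final ";" of the topmost
# Received header, with folded continuation lines joined first.
last_hop_date() {
  awk '
    /^$/ { exit }                                   # end of the header block
    /^Received:/ && !seen { seen = 1; buf = $0; next }
    seen && /^[ \t]/ { buf = buf " " $0; next }     # continuation line
    seen { exit }                                   # next header starts
    END {
      if (seen) {
        sub(/.*;[ \t]*/, "", buf)                   # keep text after last ";"
        sub(/[ \t]*\(.*$/, "", buf)                 # drop trailing "(UTC)" comment
        print buf
      }
    }
  ' "$1"
}

# delivery_delay_seconds FILE: seconds between sending and final receipt.
delivery_delay_seconds() {
  local sent_raw recv_raw sent recv
  sent_raw=$(header_date "$1")
  recv_raw=$(last_hop_date "$1")
  [ -n "$sent_raw" ] && [ -n "$recv_raw" ] || return 1
  sent=$(date -d "$sent_raw" +%s) || return 1
  recv=$(date -d "$recv_raw" +%s) || return 1
  echo $(( recv - sent ))
}
```

A delay of a few seconds is normal. Minutes usually indicate greylisting or a deferral on the sending side, which the sender's intermediate Received headers can confirm.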
Alerting thresholds
Configure alerting for the following conditions:
| Condition | Threshold | Action |
|---|---|---|
| Queue size | > 50 messages | Email alert |
| Queue size | > 200 messages | Urgent alert |
| Blacklist hit | Any | Immediate alert |
| Auth failures | > 20 in 1 hour | Check for brute force |
| Disk usage (mail storage) | > 80% | Warning alert |
| Disk usage (mail storage) | > 90% | Urgent alert |
| Postfix service down | Any | Immediate alert |
| Dovecot service down | Any | Immediate alert |
Configure these thresholds in the monitoring system. The Grafana alerting setup covered in the server monitoring section handles this once metrics are flowing into InfluxDB.
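Where a threshold needs to be checked outside Grafana, for example in a cron script on the mail server itself, the table translates into simple comparisons. The sketch below encodes the queue and disk rows only; the function names are hypothetical and the levels are just labels for whatever alerting action is wired up.

```shell
#!/usr/bin/env bash
# queue_alert_level SIZE: map a queue size to ok / warning / urgent
# using the thresholds from the table (more than 50, more than 200).
queue_alert_level() {
  if   [ "$1" -gt 200 ]; then echo "urgent"
  elif [ "$1" -gt 50 ];  then echo "warning"
  else                        echo "ok"
  fi
}

# disk_alert_level PERCENT: map mail storage usage to ok / warning / urgent
# using the thresholds from the table (more than 80%, more than 90%).
disk_alert_level() {
  if   [ "$1" -gt 90 ]; then echo "urgent"
  elif [ "$1" -gt 80 ]; then echo "warning"
  else                       echo "ok"
  fi
}
```

The inputs could come from mailq | grep -c '^[A-F0-9]' for the queue and from df --output=pcent on the mail storage path (with the percent sign stripped) for disk usage; the storage path depends on how the mailboxes are laid out on your server.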
Logrotate configuration
Postfix logs to /var/log/mail.log. On Ubuntu 24.04, logrotate handles rotation automatically via /etc/logrotate.d/rsyslog. Verify the rotation is configured correctly:
cat /etc/logrotate.d/rsyslog
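The mail log should rotate daily, compressed, with at least 14 days of retention for analysis purposes. The Ubuntu default is weekly rotation with four files kept. If that is insufficient, first remove the /var/log/mail.log line from /etc/logrotate.d/rsyslog, because logrotate reports a duplicate log entry error when the same file appears in two configurations. Then create a custom logrotate configuration and dry-run it with sudo logrotate -d /etc/logrotate.conf:
sudo tee /etc/logrotate.d/mail-extended > /dev/null << 'EOF'
/var/log/mail.log {
    daily
    missingok
    rotate 30
    compress
    delaycompress
    sharedscripts
    postrotate
        /usr/lib/rsyslog/rsyslog-rotate
    endscript
}
EOF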
A mail server that nobody is watching is a liability. pflogsumm and the daily report take fifteen minutes to set up and pay back every morning with a clear picture of whether the server is healthy. Add the blacklist check and the external connectivity test and you have covered the most common silent failure modes.