Distributed Top¶

Real-time resource monitoring across all VMs in your fleet with a unified dashboard.

Overview¶

The azlin top command runs a distributed top command across all VMs in your resource group, showing real-time CPU, memory, load averages, and top processes in a unified dashboard that updates every N seconds. This is invaluable for monitoring fleet-wide resource utilization and identifying performance bottlenecks.

Unlike SSH-ing into each VM individually, azlin top provides a centralized view that aggregates metrics from all VMs simultaneously.

Basic Usage¶

# Start distributed top with default settings (10s refresh)
azlin top

# Use 5 second refresh interval
azlin top -i 5

# Monitor specific resource group
azlin top --rg production-vms

# Custom refresh and timeout
azlin top -i 15 -t 10

Press Ctrl+C to exit the dashboard.

Command Options¶

Option	Short	Description	Default
`--resource-group`	`--rg`	Resource group to monitor	Current context RG
`--interval`	`-i`	Refresh interval in seconds	10
`--timeout`	`-t`	SSH timeout per VM in seconds	5
`--config`		Config file path	`~/.azlin/config.toml`
`--help`	`-h`	Show help message

Examples¶

Monitor Default Resource Group¶

Monitor all VMs in your default resource group with 10-second updates:

azlin top

Output:

=== VM: azlin-dev-1 (10.0.1.4) ===
top - 14:23:45 up 2 days, 3:15, 1 user, load average: 0.52, 0.45, 0.38
Tasks: 142 total, 1 running, 141 sleeping
%Cpu(s): 12.3 us, 4.2 sy, 0.0 ni, 83.1 id
MiB Mem: 16384 total, 8192 free, 4096 used, 4096 buff/cache

PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM
1234 azureuser 20   0  4.2g   1.1g   128m S  45.0   7.1  node
5678 azureuser 20   0  2.8g   512m    64m S  12.3   3.2  python3

=== VM: azlin-dev-2 (10.0.1.5) ===
top - 14:23:46 up 5 days, 12:42, 2 users, load average: 0.15, 0.20, 0.18
...

[Updating every 10s. Press Ctrl+C to exit]

Fast Refresh for Active Debugging¶

Use a 5-second refresh when actively debugging performance issues:

azlin top -i 5

This provides near-real-time updates ideal for watching the immediate impact of changes.

Monitor Production Fleet¶

Monitor your production VMs with a longer timeout for slower networks:

azlin top --rg production-fleet -i 15 -t 10

Use Case: Production VMs behind Azure Bastion or with higher latency may need longer timeouts.

Development Environment Monitoring¶

Quick check of development VMs with fast refresh:

azlin top --rg dev-team -i 3

Use Case: During active development, see immediate resource usage from build processes, tests, or local servers.

Understanding the Output¶

The dashboard shows for each VM:

Header Line¶

=== VM: azlin-dev-1 (10.0.1.4) ===

- VM name - Private IP address

System Summary¶

top - 14:23:45 up 2 days, 3:15, 1 user, load average: 0.52, 0.45, 0.38

- Current time - Uptime - Number of logged-in users - Load averages (1, 5, 15 minutes)

Task Statistics¶

Tasks: 142 total, 1 running, 141 sleeping

- Total processes - Running vs sleeping states

CPU Usage¶

%Cpu(s): 12.3 us, 4.2 sy, 0.0 ni, 83.1 id

- us: User space CPU % - sy: System/kernel CPU % - ni: Nice (low priority) CPU % - id: Idle CPU %

Memory Usage¶

MiB Mem: 16384 total, 8192 free, 4096 used, 4096 buff/cache

- Total memory available - Free memory - Used memory - Buffer/cache memory

Top Processes¶

Shows the most resource-intensive processes with: - PID, user, priority, virtual/resident memory - CPU and memory percentage - Command name

Common Use Cases¶

1. Fleet-Wide Performance Monitoring¶

Monitor all VMs to ensure balanced workload distribution:

azlin top --rg compute-cluster -i 10

What to Look For: - Load averages > number of CPUs (overloaded) - Memory usage > 90% (need more RAM) - One VM with high load while others idle (poor distribution)

2. Build Process Monitoring¶

Watch resource usage during parallel builds across your build farm:

azlin top --rg build-farm -i 5

What to Look For: - CPU spikes during compilation - Memory leaks in build processes - I/O bottlenecks

3. Application Performance Tuning¶

Monitor resource usage while load testing your application:

azlin top --rg app-servers -i 3

What to Look For: - CPU saturation under load - Memory growth patterns - Process count explosions

4. Cost Optimization Discovery¶

Identify underutilized VMs that could be downsized:

azlin top --rg all-vms -i 30

What to Look For: - Consistently low CPU usage (< 10%) - Large amounts of free memory - VMs that could be combined or reduced

Performance Considerations¶

Refresh Interval¶

Fast (3-5s): Real-time debugging, high network load
Medium (10s): Default, balanced monitoring
Slow (15-30s): Low-impact monitoring, trend watching

SSH Timeout¶

Short (5s): Local network, Azure VMs in same region
Medium (10s): Cross-region, Bastion connections
Long (15-30s): High latency networks, VPN connections

Network Impact¶

Each refresh makes SSH connections to all VMs. For large fleets (20+ VMs): - Use longer refresh intervals (15-30s) - Increase timeout to prevent false failures - Consider splitting into multiple resource groups

Troubleshooting¶

Some VMs Not Appearing¶

Symptom: Not all VMs show up in the dashboard.

Causes: 1. VMs are stopped or deallocated 2. SSH timeout too short for Bastion connections 3. Network connectivity issues

Solution:

# Check VM status first
azlin list --rg my-rg

# Increase timeout for Bastion VMs
azlin top --rg my-rg -t 15

# Check specific VM connectivity
azlin connect my-vm

"Connection Timeout" Errors¶

Symptom: SSH connection timeout messages for VMs.

Causes: 1. VM is starting up 2. SSH service not responding 3. Network latency too high for timeout

Solution:

# Increase SSH timeout
azlin top -t 20

# Check VM status
azlin status

# Test direct connection
azlin connect problem-vm

Dashboard Updates Slowly¶

Symptom: Long pauses between updates with many VMs.

Cause: Sequential SSH connections to many VMs take time.

Solution:

# Use longer refresh interval
azlin top -i 30

# Split into smaller resource groups
azlin top --rg team-a -i 10
# In another terminal:
azlin top --rg team-b -i 10

High Load But Low CPU¶

Symptom: Load average high but CPU idle percentage is also high.

Cause: Processes waiting for I/O (disk, network).

Action: - Check disk I/O with azlin ps | grep D (uninterruptible sleep) - Review storage performance - Consider faster disk tiers

Tips and Best Practices¶

1. Use Appropriate Refresh Intervals¶

Don't use very fast refresh (< 5s) unless actively debugging. It creates unnecessary SSH connections and network traffic.

# Good for monitoring
azlin top -i 10

# Only when debugging
azlin top -i 3

2. Monitor During Key Operations¶

Run azlin top during deployments, migrations, or load tests to see real-time impact:

# Terminal 1: Run deployment
./deploy.sh

# Terminal 2: Watch resources
azlin top -i 5

3. Combine with Other Monitoring¶

Use azlin top alongside other monitoring commands:

# Check who's logged in
azlin w

# See all processes
azlin ps

# View distributed top
azlin top

4. Create Resource Group Aliases¶

For frequently monitored groups, set up shell aliases:

alias top-prod='azlin top --rg production-fleet -i 15'
alias top-dev='azlin top --rg dev-team -i 5'

5. Watch for Patterns Over Time¶

Run azlin top at different times to understand usage patterns: - Morning: User activity peaks - Midday: Background jobs - Night: Scheduled tasks, backups

Integration with Other Tools¶

With Cost Tracking¶

Identify expensive, underutilized resources:

# Find idle VMs
azlin top -i 60  # Watch for consistent low usage

# Check their costs
azlin cost --by-vm

With Batch Operations¶

Stop idle VMs discovered through monitoring:

# Identify idle VMs via top
azlin top --rg all-vms

# Stop the idle ones
azlin batch stop --tag env=dev

With Fleet Management¶

Monitor fleet-wide command execution:

# Terminal 1: Execute fleet command
azlin fleet run update-packages

# Terminal 2: Watch resource impact
azlin top -i 5

Distributed Top¶

Overview¶

Basic Usage¶

Command Options¶

Examples¶

Monitor Default Resource Group¶

Fast Refresh for Active Debugging¶

Monitor Production Fleet¶

Development Environment Monitoring¶

Understanding the Output¶

Header Line¶

System Summary¶

Task Statistics¶

CPU Usage¶

Memory Usage¶

Top Processes¶

Common Use Cases¶

1. Fleet-Wide Performance Monitoring¶

2. Build Process Monitoring¶

3. Application Performance Tuning¶

4. Cost Optimization Discovery¶

Performance Considerations¶

Refresh Interval¶

SSH Timeout¶

Network Impact¶

Troubleshooting¶

Some VMs Not Appearing¶

"Connection Timeout" Errors¶

Dashboard Updates Slowly¶

High Load But Low CPU¶

Tips and Best Practices¶

1. Use Appropriate Refresh Intervals¶

2. Monitor During Key Operations¶

3. Combine with Other Monitoring¶

4. Create Resource Group Aliases¶

5. Watch for Patterns Over Time¶

Integration with Other Tools¶

With Cost Tracking¶

With Batch Operations¶

With Fleet Management¶

See Also¶