How to Monitor Server Status

I. Checking server status with basic commands

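How the /proc directory works: the kernel exposes hardware and system state as files under the virtual directory /proc, and generates their contents whenever they are read.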

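The proc file system is a pseudo file system: it exists only in memory and takes up no disk space. It provides a file-system-style interface for reading kernel data.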
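Because /proc entries behave like ordinary files, standard shell tools can read them directly. As a minimal sketch (the helper name load_1min is made up for illustration), the following prints the 1-minute load average from /proc/loadavg, whose first three fields are the 1-, 5-, and 15-minute load averages:

```shell
# Print the first field (1-minute load average) of /proc/loadavg-style input.
# load_1min is a hypothetical helper name, not a standard command.
load_1min() {
    read -r one five fifteen rest
    echo "$one"
}

# Live usage, guarded so the snippet is harmless where /proc is unavailable.
if [ -r /proc/loadavg ]; then
    load_1min < /proc/loadavg
fi
```

Reading from stdin rather than a hard-coded path keeps the helper easy to try against saved samples.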

CPU hardware information

cat /proc/cpuinfo: view static CPU hardware information

[fivezh@master ~]$ cat /proc/cpuinfo 
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 69
model name : Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz
stepping : 1
microcode : 0x1d
cpu MHz : 2394.457
cache size : 3072 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm ida arat epb xsaveopt pln pts dtherm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 invpcid
bogomips : 4788.91
clflush size : 64
cache_alignment : 64
address sizes : 42 bits physical, 48 bits virtual
power management:

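Key fields:

physical id: ID of the physical processor package (socket).
cpu cores: number of cores in the physical package that this logical processor belongs to.
core id: ID of the core within its physical package.
siblings: number of logical processors in the same physical package.
processor: ID of the logical processor.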

# physical id: count the physical CPU packages (sockets)
cat /proc/cpuinfo | grep "physical id" | sort -u | wc -l
# cpu cores: number of cores in each physical CPU
cat /proc/cpuinfo | grep "cpu cores" | uniq | cut -d: -f2
# core id: ID of the core within its physical CPU
cat /proc/cpuinfo | grep "core id" | sort -u
# siblings: number of logical processors in the same physical CPU
cat /proc/cpuinfo | grep "siblings"
# processor: count the logical processors
cat /proc/cpuinfo | grep "processor" | wc -l
# Is Hyper-Threading enabled? Compare siblings with cpu cores:
# if siblings is twice cpu cores, Hyper-Threading is on;
# otherwise it is off or unsupported.
cat /proc/cpuinfo | grep "siblings"
cat /proc/cpuinfo | grep "cpu cores"
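The individual greps above can be folded into a single pass. The following is only a sketch: cpu_topology is a hypothetical helper name, and it assumes the x86 field names shown in the sample output (other architectures may omit some of them).

```shell
# Summarize CPU topology from /proc/cpuinfo-formatted text on stdin.
# cpu_topology is a hypothetical helper name used for illustration.
cpu_topology() {
    awk -F: '
        # Split each "key : value" line and trim surrounding whitespace.
        { key = $1; val = $2
          gsub(/^[ \t]+|[ \t]+$/, "", key)
          gsub(/^[ \t]+|[ \t]+$/, "", val) }
        key == "processor"   { logical++ }
        key == "physical id" { if (!(val in seen)) { seen[val] = 1; sockets++ } }
        key == "cpu cores"   { cores = val }
        key == "siblings"    { siblings = val }
        END {
            printf "sockets=%d cores_per_socket=%d siblings=%d logical=%d\n",
                   sockets, cores, siblings, logical
        }'
}

# Live usage, guarded for systems without /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    cpu_topology < /proc/cpuinfo
fi
```

A siblings value that is twice cores_per_socket points to Hyper-Threading being enabled, as described in the comments above.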

Memory hardware information

cat /proc/meminfo: view memory status information

[fivezh@master ~]$ cat /proc/meminfo 
MemTotal: 1003164 kB
MemFree: 243768 kB
MemAvailable: 683912 kB
Buffers: 764 kB
Cached: 534796 kB
SwapCached: 0 kB
Active: 254736 kB
Inactive: 338540 kB
Active(anon): 58104 kB
Inactive(anon): 6388 kB
Active(file): 196632 kB
Inactive(file): 332152 kB
Unevictable: 0 kB
Mlocked: 0 kB
SwapTotal: 2097148 kB
SwapFree: 2097148 kB
Dirty: 18612 kB
Writeback: 0 kB
AnonPages: 57756 kB
Mapped: 21732 kB
Shmem: 6776 kB
Slab: 90112 kB
SReclaimable: 47968 kB
SUnreclaim: 42144 kB
KernelStack: 7744 kB
PageTables: 4008 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 2598728 kB
Committed_AS: 257172 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 199252 kB
VmallocChunk: 34359522160 kB
HardwareCorrupted: 0 kB
AnonHugePages: 8192 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 53120 kB
DirectMap2M: 995328 kB
DirectMap1G: 0 kB
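A single usage figure is often more convenient than the full listing. The sketch below derives a used-memory percentage from MemTotal and MemAvailable; mem_used_percent is a hypothetical helper name, and the formula (total minus available, over total) is one reasonable definition rather than the only one.

```shell
# Print used memory as a percentage, from /proc/meminfo-formatted stdin.
# Assumes the MemAvailable field is present; without it the result reads 100.0.
mem_used_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
            else exit 1
        }'
}

# Live usage, guarded for systems without /proc/meminfo.
if [ -r /proc/meminfo ]; then
    mem_used_percent < /proc/meminfo
fi
```

With the sample values above (MemTotal 1003164 kB, MemAvailable 683912 kB) this prints 31.8.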

CPU and memory utilization

top: show per-process resource usage in real time

[fivezh@master ~]$ top
top - 19:47:39 up 27 min, 2 users, load average: 0.01, 0.17, 0.15
Tasks: 341 total, 2 running, 339 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.5 us, 1.5 sy, 0.0 ni, 95.5 id, 1.4 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 1003164 total, 240996 free, 134148 used, 628020 buff/cache
KiB Swap: 2097148 total, 2097148 free, 0 used. 683368 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 56656 6452 3848 S 0.0 0.6 0:01.38 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:02.62 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root rt 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/0
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcuob/1

free -m: memory usage, in MiB

[fivezh@master ~]$ free -m
total used free shared buff/cache available
Mem: 979 130 238 6 610 667
Swap: 2047 0 2047

vmstat: view virtual memory statistics

[fivezh@master ~]$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 243960 764 624716 0 0 102 489 130 168 2 2 95 1 0
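For scripting, the idle column of vmstat can be turned into a busy percentage. This is a sketch only: vmstat_cpu_busy is a hypothetical helper, and it locates the "id" column by its header name instead of assuming a fixed position. Note that the first data line of vmstat reports averages since boot, so the usage line asks for two samples and the helper keeps the last one.

```shell
# Print CPU busy percentage (100 - idle) from vmstat output on stdin,
# using the last data line. vmstat_cpu_busy is a hypothetical helper name.
vmstat_cpu_busy() {
    awk '
        # Header row: remember which column is labelled "id" (idle).
        $1 == "r" { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
        # Data rows start with a number (the run-queue length).
        col && $1 ~ /^[0-9]+$/ { idle = $col }
        END { if (col) print 100 - idle; else exit 1 }'
}

# Live usage: two samples one second apart; the second reflects current load.
if command -v vmstat >/dev/null 2>&1; then
    vmstat 1 2 | vmstat_cpu_busy
fi
```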

Checking whether Hyper-Threading is enabled

  1. Compare siblings and cpu cores in cpuinfo: if siblings = 2 x cpu cores, Hyper-Threading is enabled

    cat /proc/cpuinfo | grep "siblings"
    cat /proc/cpuinfo | grep "cpu cores"
  2. Check whether cpuinfo lists logical processors (processor) that share the same physical id and core id
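The second check can be automated along these lines. This is a sketch, not a definitive test: ht_enabled is a hypothetical helper, it assumes that "physical id" precedes "core id" within each processor entry (as in the sample output earlier), and virtual machines may hide the real topology.

```shell
# Succeed (exit 0) if two logical processors share a physical id + core id pair.
# Reads /proc/cpuinfo-formatted text on stdin. ht_enabled is a hypothetical name.
ht_enabled() {
    awk -F: '
        { key = $1; val = $2
          gsub(/^[ \t]+|[ \t]+$/, "", key)
          gsub(/^[ \t]+|[ \t]+$/, "", val) }
        key == "physical id" { phys = val }
        key == "core id"     { if (++count[phys "/" val] > 1) shared = 1 }
        END { exit(shared ? 0 : 1) }'
}

# Live usage, guarded for systems without /proc/cpuinfo.
if [ -r /proc/cpuinfo ]; then
    if ht_enabled < /proc/cpuinfo; then
        echo "Hyper-Threading: enabled"
    else
        echo "Hyper-Threading: disabled or unsupported"
    fi
fi
```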

Disk information

sudo df -h: show file system usage, mount points, and related information; see man df
sudo fdisk -l: list disk partitions and their sizes; see man fdisk
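When df is used for alerting instead of reading by eye, its POSIX output format (df -P) is easier to parse because each file system stays on one line. A minimal sketch, assuming a hypothetical helper named disks_over and mount points without spaces:

```shell
# Print "mountpoint usage%" for file systems at or above a usage threshold.
# Reads `df -P` output on stdin; $1 is the threshold in percent.
# disks_over is a hypothetical helper name.
disks_over() {
    threshold=$1
    awk -v limit="$threshold" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)          # "95%" -> "95"
            if (pct + 0 >= limit + 0) print $6, $5
        }'
}

# Live usage: report file systems that are at least 90% full.
df -P | disks_over 90
```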

I/O information

lsof: list open files and the processes that hold them

[fivezh@master ~]$ lsof | more
COMMAND PID TID USER FD TYPE DEVICE SIZE/OFF NODE NAME
systemd 1 root cwd unknown /proc/1/cwd (readlink: Permission denied)
systemd 1 root rtd unknown /proc/1/root (readlink: Permission denied)
systemd 1 root txt unknown /proc/1/exe (readlink: Permission denied)
systemd 1 root NOFD /proc/1/fd (opendir: Permission denied)
kthreadd 2 root cwd unknown /proc/2/cwd (readlink: Permission denied)
kthreadd 2 root rtd unknown /proc/2/root (readlink: Permission denied)
kthreadd 2 root txt unknown /proc/2/exe (readlink: Permission denied)
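The raw lsof listing is long, so a per-command tally is often a more useful first look. The sketch below counts lsof rows per command name; open_files_by_command is a hypothetical helper, and rows such as NOFD entries are counted too, so treat the numbers as approximate.

```shell
# Rank commands by the number of lsof rows they account for.
# Reads lsof output on stdin; $1 is how many commands to show (default 10).
# open_files_by_command is a hypothetical helper name.
open_files_by_command() {
    awk 'NR > 1 { n[$1]++ } END { for (c in n) print n[c], c }' |
        sort -rn | head -n "${1:-10}"
}

# Live usage, guarded because lsof may not be installed.
if command -v lsof >/dev/null 2>&1; then
    lsof 2>/dev/null | open_files_by_command 5
fi
```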

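iotop: shows per-process I/O usage; not installed by default
See: Install Iotop in Linux
iostat: shows I/O statistics per device and partition; not installed by default (part of the sysstat package)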

Network information

netstat -an: list network connections

[fivezh@master ~]$ netstat -an | more
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN
tcp 0 0 127.0.0.1:25 0.0.0.0:* LISTEN
tcp 0 0 192.168.8.5:22 192.168.8.1:5769 ESTABLISHED
tcp 0 52 192.168.8.5:22 192.168.8.1:63401 ESTABLISHED
tcp6 0 0 :::22 :::* LISTEN
tcp6 0 0 ::1:25 :::* LISTEN
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] STREAM LISTENING 15943 /var/run/dbus/system_bus_socket
unix 2 [ ACC ] STREAM LISTENING 19654 private/bounce
unix 2 [ ACC ] STREAM LISTENING 19657 private/defer
unix 2 [ ACC ] STREAM LISTENING 19660 private/trace
unix 2 [ ACC ] STREAM LISTENING 19663 private/verify
unix 2 [ ACC ] STREAM LISTENING 19669 private/proxymap
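To get an overview without paging through the listing, the TCP rows can be grouped by their State column. A small sketch, with tcp_states as a hypothetical helper name; it relies on the state being the sixth field of tcp and tcp6 rows, as in the output above:

```shell
# Count TCP connections per state from `netstat -an` output on stdin.
# tcp_states is a hypothetical helper name.
tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Live usage, guarded because netstat is not installed everywhere.
if command -v netstat >/dev/null 2>&1; then
    netstat -an | tcp_states
fi
```

For the sample output above this would report two ESTABLISHED and four LISTEN sockets.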

dmesg | grep -i eth: show network card model information

[fivezh@master ~]$ dmesg | grep -i eth
[ 2.110507] e1000 0000:02:01.0 eth0: (PCI:66MHz:32-bit) 00:0c:29:1a:01:9c
[ 2.110513] e1000 0000:02:01.0 eth0: Intel(R) PRO/1000 Network Connection
[ 2.113662] systemd-udevd[487]: renamed network interface eth0 to eno16777736

Additional information

getconf LONG_BIT or file /bin/ls: check whether the operating system is 32-bit or 64-bit

[fivezh@master ~]$ getconf LONG_BIT
64
[fivezh@master ~]$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=0xa0cbb02fea1cb40346262515965696d361dbf5ba, stripped

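dmidecode | grep "Product Name": show the server model (requires root)
dmidecode reads the system's DMI (Desktop Management Interface) table, which follows the SMBIOS/DMI standard, and reports information about the BIOS, system, motherboard, processor, memory, cache, and so on.
Server model: dmidecode | grep 'Product Name'
Serial numbers, including the motherboard's: dmidecode | grep 'Serial Number'
System serial number: dmidecode -s system-serial-number
Memory information: dmidecode -t memory
OEM information: dmidecode -t 11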

[fivezh@master ~]$ sudo dmidecode | grep "Product Name" 
Product Name: VMware Virtual Platform
Product Name: 440BX Desktop Reference Platform
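Grepping the full dump returns one line per matching table entry, as the two Product Name lines show. For an inventory script, the -s option returns a single value per keyword. A minimal sketch, with hw_summary as a hypothetical helper name and the choice of the three keywords being an assumption about what is worth recording:

```shell
# Print a few identifying DMI strings, one "keyword: value" pair per line.
# hw_summary is a hypothetical helper name; dmidecode normally needs root.
hw_summary() {
    for key in system-manufacturer system-product-name system-serial-number; do
        printf '%s: %s\n' "$key" "$(dmidecode -s "$key" 2>/dev/null)"
    done
}

# Live usage, guarded: only run when dmidecode exists and we are root.
if command -v dmidecode >/dev/null 2>&1 && [ "$(id -u)" -eq 0 ]; then
    hw_summary
fi
```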

Other tools:

Htop

nmon

nmon for Linux - nmon is short for Nigel’s performance Monitor for Linux. This systems administrator, tuner, benchmark tool gives you a huge amount of important performance information in one go.

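nmon is an all-in-one tool for monitoring system resources on Linux. It covers CPU, memory, file systems, virtual memory, resources, NFS, kernel, top processes, and more. It can also export the monitoring data to a file, from which nmon analyser generates reports.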

Use these keys to toggle statistics on/off:
  c = CPU        l = CPU Long-term   - = Faster screen updates
  m = Memory     j = Filesystems     + = Slower screen updates
  d = Disks      n = Network         V = Virtual Memory
  r = Resource   N = NFS             v = Verbose hints
  k = kernel     t = Top-processes   . = only busy disks/procs
  h = more options                   q = Quit

nmon command-line options:

Hint: nmon [-h] [-s <seconds>] [-c <count>] [-f -d <disks> -t -r <name>] [-x]

-h FULL help information
Interactive-Mode:
read startup banner and type: "h" once it is running
For Data-Collect-Mode (-f)
-f spreadsheet output format [note: default -s300 -c288]
optional
-s <seconds> between refreshing the screen [default 2]
-c <number> of refreshes [default millions]
-d <disks> to increase the number of disks [default 256]
-t spreadsheet includes top processes
-x capacity planning (15 min for 1 day = -fdt -s 900 -c 96)

Version - nmon 14i

Example:

[five@nagios-server ~]$ nmon -s3 -c10 -f
[five@nagios-server ~]$ ls
nagios-server_160411_1857.nmon
[five@nagios-server ~]$ tail -f nagios-server_160411_1857.nmon
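The data file can be seen growing under tail -f; analyze the file with nmon analyser.

nmon -s 300 -c 288 -f -m /tmp
-s 300: collect data every 300 seconds
-c 288: collect 288 samples; 300*288=86400 seconds, exactly one day of data, so each run of the program produces a data file covering one day
-m /tmp: directory in which the data file is written
-f: write the data to a file in spreadsheet format; the file name includes the hostname, date, and time

# Run it as an automated monitoring script from crontab
[root@dhdb sh]# more nmon.sh
#author: skate
#function: monitor system information
#time:2011/08/05

NPATH=/tmp/
# monitoring per 120 seconds
nmon -s 120 -c 720 -f -m "$NPATH"

# monitoring per 300 seconds
#nmon -s 300 -c 288 -f -m "$NPATH"

#delete files older than 365 days
#find /tmp -name '*.nmon' -mtime +365 -exec rm {} \;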
~

[root@dhdb sh]# crontab -l
0 0 * * * sh /oracle/sh/nmon.sh >/dev/null 2>&1
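The -s and -c pairs used above follow from simple arithmetic: the sample count is the period to cover divided by the interval. A small sketch of that calculation, where nmon_count is a hypothetical helper and the 24-hour default mirrors the once-a-day cron schedule:

```shell
# Number of samples (-c) needed to cover a period at a given interval (-s).
# Usage: nmon_count <interval_seconds> [hours]   (hours defaults to 24)
# nmon_count is a hypothetical helper name, not part of nmon.
nmon_count() {
    interval=$1
    hours=${2:-24}
    echo $(( hours * 3600 / interval ))
}

nmon_count 300    # prints 288
nmon_count 120    # prints 720
nmon_count 900    # prints 96, the same pair as the -x preset (-s 900 -c 96)
```

Keeping interval times count equal to the cron period avoids overlapping collectors or gaps between daily files.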

