(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix
Main Entry: trou-ble-shoot-er
From the Merriam-Webster online dictionary
Note: For troubleshooting strategies, see Network Troubleshooting. For general strategies, see Debugging.
ping, traceroute, ngrep, tcpdump, and Nmap are indispensable tools for troubleshooting networking problems.
Troubleshooting, problem analysis, and root cause determination require patience, determination, and experience. It is important to investigate the problem fully and collect all relevant data so that you start troubleshooting on the correct path. Although you may start out on one path and end up on another while resolving a complex problem, over time patience will become your most important skill. Keep a personal log (preferably electronic, so that you can search it): while solving a complex problem you can easily become distracted and forget important facts or findings. Netbooks are perfect for keeping such a log.
The Merriam-Webster online dictionary provides a fitting definition of troubleshooting, and by extension of a Unix system administrator in general: making repairs, dealing with people, and trying to anticipate as well as prevent problems. Like program debugging, troubleshooting Linux problems is very similar to investigating a crime scene. Some "obvious" leads can be, and often are, false. Finding relevant information is not easy; it can take a tremendous amount of time and requires well-organized documentation of the problem. Some highly suspicious suspects without an alibi are actually innocent. You need a plan and the ability to see the big picture so that you are not led off track, as well as clear analytical thinking and experience to get to the root cause.
Both modern Linux/Unix and the TCP/IP stack are very complex systems, and for many subsystems the administrator has little or no understanding of the internals. Several lines of fault analysis can help in this situation:
/bin/dmesg for similar purposes.
/bin/dmesg provides more detailed information in real time, while the log file keeps less information for historical purposes.
The most general strategy is to compare the problematic system with a working system, or with a backup of the current system taken before the problem appeared. Often the problem is the result of miscommunication when several system administrators make changes on the same system. Here configuration management is critical.
As with any complex system, creating a baseline for a Linux system is critically important. The most primitive way is to tar the /etc and /root directories, as well as a couple of other directories that you know contain important configuration files.
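A minimal sketch of that primitive baseline in Python, assuming the script runs as root and that /etc and /root are the directories you care about (extend the list as needed). The function name make_baseline, the output directory, and the timestamped file name are illustrative choices, not part of any standard tool. Unreadable files are skipped and reported instead of aborting the whole archive.

```python
import os
import tarfile
import time


def make_baseline(paths, out_dir):
    """Archive the files under each path into a timestamped .tar.gz in out_dir.

    Files that cannot be read (for example /etc/shadow when not run as root)
    are skipped and returned, so one unreadable file does not abort the baseline.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    archive = os.path.join(out_dir, "baseline-%s.tar.gz" % stamp)
    skipped = []
    with tarfile.open(archive, "w:gz") as tar:
        for top in paths:
            for root, _dirs, files in os.walk(top):
                for name in files:
                    full = os.path.join(root, name)
                    try:
                        # Store names without the leading "/", as GNU tar does.
                        tar.add(full, arcname=full.lstrip("/"), recursive=False)
                    except OSError:
                        skipped.append(full)
    return archive, skipped
```

For example, make_baseline(["/etc", "/root"], "/var/tmp") returns the archive path plus a list of skipped files; two such archives can later be unpacked side by side and compared with diff.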
Supportconfig is the standard SUSE tool for collecting all the information relevant for troubleshooting. A companion tool, the Supportconfig Health Check Report Tool (schealth), helps analyze the collected information. Several tools are also available for Red Hat. See Baseliners.
When an event occurs that causes system or application downtime, the number one priority is to get things working again as quickly as possible. If the event is something you are familiar with, you will approach the issue with the confidence that comes from experience. When the event is new and one that you have not personally resolved, you are likely to tread more cautiously. Confidence comes with experience; the techniques used are generally a combination of proven solutions. If the solution is documented by the application or system vendor, it has been tested and will most likely bring the system or application back online with the least downtime. Collaboration with colleagues is important when the issue falls outside familiar territory. It never hurts to ask a question or two of those who might have seen a similar situation and have more current experience. Finally, determination is an essential quality for anyone facing a troubleshooting task, especially when the situation is unfamiliar.
The Unix syslog daemon provides a wealth of information, both for security and for troubleshooting. It permits creating a centralized log server to which the syslog daemon on each server sends its logs for correlated analysis. Contrary to popular opinion, this does not exclude running local log analyzers on each and every server. The paranoia about log modification typical of half-dollar security specialists is a little naive: in modern systems such modifications are often picked up by the monitoring system. So while centralizing logs is a necessary step toward a good log infrastructure, it is not sufficient. Multilevel log analysis is a better deal: a good, flexible analyzer on each server, plus a different analyzer on the syslog host that is oriented mostly toward correlating logs.
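The local half of such a multilevel setup can start very small. The sketch below assumes traditional BSD-style syslog lines (Mon DD HH:MM:SS host program[pid]: message); RFC 5424 lines or journald output will not match the regular expression and are simply counted as unparsed. summarize is a hypothetical helper name, and grouping by (host, program) is just one reasonable choice.

```python
import re
from collections import Counter

# BSD-style syslog line, e.g.
#   "Jan  5 10:15:01 web01 sshd[1234]: Failed password for root ..."
LINE_RE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) (?P<prog>[^\s:\[]+)(?:\[(?P<pid>\d+)\])?: (?P<msg>.*)$"
)


def summarize(lines, pattern=None):
    """Count messages per (host, program), optionally only those matching pattern.

    Returns (counts, unparsed) where unparsed is the number of lines that did
    not look like BSD-style syslog entries.
    """
    wanted = re.compile(pattern) if pattern else None
    counts = Counter()
    unparsed = 0
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            unparsed += 1
            continue
        if wanted and not wanted.search(m.group("msg")):
            continue
        counts[(m.group("host"), m.group("prog"))] += 1
    return counts, unparsed
```

Running summarize(open("/var/log/messages"), r"Failed password") on each server, and forwarding only the resulting counts, is one way to keep the local analysis cheap while the log host does the correlation.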
Collect the relevant information before calling support. Useful questions to ask include:
There are several dozen useful network troubleshooting tools. Among them (see also Linux Troubleshooting Wiki):
The ping utility sends ICMP echo request packets to the target host or hosts. In the Solaris version of ping, once an ICMP echo reply is received, the message target is alive is displayed, where target is the hostname of the device receiving the ICMP echo requests:
# ping problem.host.com
problem.host.com is alive
The -s option is useful when attempting to connect to a remote host that is down or not available. No output will be produced until an ICMP echo reply is received from the target host. The -R option (record route) can be useful if the traceroute utility is not available. These are the Solaris options; Linux ping sends echo requests continuously by default, and its -s option sets the packet size instead.
Statistics are displayed when the ping -s command is terminated.
# ping -s problem.host.com
Another useful troubleshooting technique using ping is to send ICMP echo requests to the entire network by using the broadcast address as the target host. Using the -s option with the broadcast address provides good information about which systems are available on the network:
# ping -s 172.20.4.255
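The address 172.20.4.255 is the broadcast address only if 172.20.4.0 is a /24 network; with a different netmask the broadcast address changes. Python's standard ipaddress module computes it from an interface address and prefix length. broadcast_for is a hypothetical helper, and 172.20.4.110 is just an arbitrary host address on that network. Keep in mind that many hosts ignore broadcast echo requests (on Linux the net.ipv4.icmp_echo_ignore_broadcasts sysctl is enabled by default), and Linux ping refuses to ping a broadcast address unless you add -b.

```python
import ipaddress


def broadcast_for(interface_cidr):
    """Return the directed broadcast address of the subnet an interface is on."""
    iface = ipaddress.ip_interface(interface_cidr)   # e.g. "172.20.4.110/24"
    return str(iface.network.broadcast_address)
```

With a /24 mask this gives 172.20.4.255, but the same host on a /22 network would have to use 172.20.7.255.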
The ifconfig utility is useful when troubleshooting networking problems. You can use it to display an interface's current status including the settings for the following:
In the simplest setup, tcpdump dumps all traffic to the screen in text format, which you can browse with more or another pager.
tcpdump can also show whether a service seems unresponsive simply because your packets are not reaching the remote machine. In practice you will almost always need to add filters, especially when you run tcpdump on a remote machine where you are logged in via SSH. Otherwise you'll get lots of dumps of SSH packets that carry dumps of SSH packets that carry dumps...
tcpdump -l -i eth0 port 25
This will dump all packets aimed at, or originating from, a TCP or UDP port 25. The '-l' is to do line buffering, so we'll actually see each packet as it crosses the wire.
If you're debugging network connections over an SSH connection, the following will probably be the most frequent way that you'll invoke tcpdump:
tcpdump -l not port 22
And to monitor the communication between Server A.local.net (running tcpdump) and the remote server B.remote.net:
tcpdump -l src or dst B.remote.net
The tcpdump filter syntax is surprisingly powerful: you can usually create a filter that gives you precisely the information you want.
tcpdump prints verbose information about the sniffed traffic with the -v option, and a hex dump of each packet with -x. The -s option sets the snapshot length, that is, how many bytes of each packet are captured; -s 0 captures whole packets (recent tcpdump versions capture full packets by default, but older ones truncated them). One important feature is the ability to write the raw packets to a file using -s 0 -w name_of_file.
You can later analyze the capture with any program that understands the pcap raw dump format, for example tcpdump, Snort, or Wireshark (formerly Ethereal). Almost all useful network programs now understand this format and can reuse raw packet files generated by tcpdump. If you forget the -s 0 option with an older tcpdump, packets will be truncated and you will not be able to analyze them fully, for example with Snort. To print packets from a raw tcpdump file, use the -r option, for example tcpdump -r file_with_capture. You can combine -r with -v (verbose) or -vv to get more information.
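To see what -s actually changes, it helps to look inside a capture file. Each record in a classic pcap file stores both the number of bytes captured (incl_len) and the original length on the wire (orig_len); truncated packets are the ones where the first is smaller. The minimal reader below assumes the classic libpcap format rather than pcapng (which newer Wireshark versions write by default); read_pcap and truncated are hypothetical helper names.

```python
import struct

# Classic pcap magic numbers: microsecond and nanosecond timestamp variants.
PCAP_MAGICS = {0xA1B2C3D4: "us", 0xA1B23C4D: "ns"}


def read_pcap(data):
    """Parse a classic pcap capture held in bytes.

    Returns (snaplen, linktype, records); each record is
    (ts_sec, ts_frac, incl_len, orig_len).
    """
    if len(data) < 24:
        raise ValueError("too short for a pcap global header")
    for endian in ("<", ">"):           # the magic number reveals the byte order
        (magic,) = struct.unpack(endian + "I", data[:4])
        if magic in PCAP_MAGICS:
            break
    else:
        raise ValueError("not a classic pcap file (pcapng is not handled here)")
    # version_major, version_minor, thiszone, sigfigs, snaplen, linktype
    _, _, _, _, snaplen, linktype = struct.unpack(endian + "HHiIII", data[4:24])
    records = []
    offset = 24
    while offset + 16 <= len(data):
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16])
        offset += 16
        if offset + incl_len > len(data):
            break                       # file was cut off in the middle of a packet
        records.append((ts_sec, ts_frac, incl_len, orig_len))
        offset += incl_len              # skip the captured packet bytes
    return snaplen, linktype, records


def truncated(records):
    """Records whose captured length is shorter than the length on the wire."""
    return [r for r in records if r[2] < r[3]]
```

For example, read_pcap(open("capture.pcap", "rb").read()) on a file written with a small snaplen reports many truncated records; a linktype of 1 means Ethernet.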
Two very useful options are (see also Most useful options):
Here are several additional useful options:
tcpdump processes only packets that match the filter expression. The filter expression can be passed on the command line or read from a file using the -F filename option:
tcpdump -F path_to_filter
The tcpdump filter language is similar to Snort's (in fact, the language originated in tcpdump and was later adopted and extended by Snort). For more detailed information on tcpdump and filter expressions, consult the tcpdump man page (the filter syntax itself is described in the pcap-filter man page), either via the man command or online at tcpdump.org.
Here are several examples (see also Filters to Detect, Filters to Protect, and The Mechanics of Writing TCPdump Filters):
tcpdump dst host 10.10.10.10 or src host 10.10.10.10
(This is equivalent to tcpdump host 10.10.10.10.) Here we try to get TCP packets with source port 80:
tcpdump src port 80 and tcp
Write the syntax of a tcpdump command that captures packets containing IP datagrams with a source or destination IP address equal to 10.0.1.12.
tcpdump host 10.0.1.12
Write the syntax of a tcpdump command that captures packets containing ICMP messages with a source or destination IP address equal to 10.0.1.12.
tcpdump icmp and host 10.0.1.12
Write the syntax of a tcpdump command that captures packets containing IP datagrams between two hosts with IP addresses 10.0.1.11 and 10.0.1.12, both on interface eth1.
tcpdump -i eth1 host 10.0.1.11 and host 10.0.1.12
Write a tcpdump filter expression that captures packets containing TCP segments with a source or destination IP address equal to 10.0.1.12.
tcp and host 10.0.1.12
Write a tcpdump filter expression that, in addition to the constraints in the previous question, only captures packets using port number 23.
tcp port 23 and host 10.0.1.12
tcpdump '(host 184.108.40.206 and net 192.168.1) and ((tcp port 80 or port 443))'
Wireshark (formerly Ethereal) can display tcpdump captures in a GUI, and you can filter the raw captures in Wireshark too. It can also act as a data collection tool of its own and will display all the connections it traced during the capture. There are a couple of ways to look for bandwidth hogs.
The Statistics menu has a couple of useful options. Protocol Hierarchy shows what percentage of the packets in the trace belongs to each protocol. In the case of a bandwidth hog, at least the protocol responsible should be easy to spot here.
The Conversations screen is also helpful for finding bandwidth hogs. Since you can sort the conversations by number of packets (or by bytes), the culprit is likely to jump to the top. This isn't always the case, though: the bandwidth could just as easily be consumed by many small connections rather than by one big, heavy one.
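The difference between sorting by packets and sorting by bytes is easy to demonstrate. The sketch below aggregates (source, destination, length) tuples roughly the way a conversation view does, treating both directions as one conversation. The tuple input format and the helper names conversations and top are assumptions for illustration, not Wireshark internals.

```python
from collections import defaultdict


def conversations(packets):
    """Aggregate (src, dst, length) tuples into per-conversation totals.

    Direction is ignored, so A->B and B->A count as one conversation.
    Each value is [packet_count, byte_count].
    """
    totals = defaultdict(lambda: [0, 0])
    for src, dst, length in packets:
        key = tuple(sorted((src, dst)))
        totals[key][0] += 1
        totals[key][1] += length
    return totals


def top(totals, by="bytes", n=5):
    """The n largest conversations, ranked by "bytes" or "packets"."""
    idx = 1 if by == "bytes" else 0
    return sorted(totals.items(), key=lambda kv: kv[1][idx], reverse=True)[:n]
```

With one bulk transfer of large packets and one chatty flow of small packets, top(..., by="packets") and top(..., by="bytes") name different culprits, which is why it is worth checking both columns.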
A typical Ethernet card can operate at several speeds, for example 10 Mbit/s, 100 Mbit/s, and 1 Gbit/s. You can use the ethtool utility to quickly display the speed at which the interface is running.
The ethtool utility can also show whether the Ethernet card is connected to the network (the Link detected line).
An Ethernet card can run in either full-duplex or half-duplex mode, and ethtool reports this too. To check whether interface eth0 is in full-duplex or half-duplex mode, run ethtool eth0; in the output below, the line Duplex: Full indicates that the interface is running in full-duplex mode:
# ethtool eth0
Settings for eth0:
        Supported ports: [ TP ]
        Supported link modes:   10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Supports auto-negotiation: Yes
        Advertised link modes:  10baseT/Half 10baseT/Full
                                100baseT/Half 100baseT/Full
                                1000baseT/Full
        Advertised auto-negotiation: Yes
        Speed: 1000Mb/s
        Duplex: Full
        Port: Twisted Pair
        PHYAD: 1
        Transceiver: internal
        Auto-negotiation: on
        Supports Wake-on: g
        Wake-on: g
        Link detected: yes
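When you check many servers, parsing this output is easier than reading it. A minimal sketch, assuming the Key: value layout shown above (the exact set of lines varies with the driver and the ethtool version). parse_ethtool and looks_degraded are hypothetical helpers, and the expected speed of 1000Mb/s is an assumption to adjust for your network.

```python
def parse_ethtool(text):
    """Extract the settings that matter most from `ethtool <iface>` output."""
    wanted = {"Speed", "Duplex", "Auto-negotiation", "Link detected"}
    result = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in wanted:       # continuation lines have no ":" and are skipped
            result[key] = value.strip()
    return result


def looks_degraded(settings, expected_speed="1000Mb/s"):
    """Flag the classic symptoms: no link, half duplex, or an unexpected speed."""
    problems = []
    if settings.get("Link detected") != "yes":
        problems.append("no link")
    if settings.get("Duplex") == "Half":
        problems.append("half duplex")
    if settings.get("Speed") not in (None, expected_speed):
        problems.append("speed " + settings["Speed"])
    return problems
```

A half-duplex link negotiated at 100Mb/s, a common result of an auto-negotiation mismatch, shows up as two problems in one call.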
You can use the netstat utility to display the status of the system's network interfaces. Of particular interest when troubleshooting networks are the routing tables for the system(s) in question. You can use the -r switch to display a system's routing tables.
By default the displayed routing table shows host and network names, which is of limited use unless you know how they map to addresses (and the lookups can be slow when name resolution is broken). The -n switch displays IP addresses instead of names:
# netstat -rn
# ifconfig -a
A routing table with IP addresses is somewhat easier to understand and troubleshoot, especially when combined with the information from the ifconfig -a output. The verbose mode switch, -v, displays additional information, including the MTU size configured for the interface:
Another useful option is -p, which tells netstat to try to determine which program has the socket open. This is often very useful information. For example, if someone runs nmap against their system and wants to know what is using port 666, running netstat -pa will show which program is listening on that TCP port. (Run it as root; otherwise netstat can identify only your own processes.)
One of the most twisted, but useful invocations is:
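netstat -t -a -n | awk 'NR > 2 {print $6}' | sort | uniq -c | sort -n
This shows a sorted list of how many sockets are in each connection state. The -a switch is needed to include listening sockets, and awk picks the State column regardless of how wide the address columns are (a fixed cut -c offset breaks on long IPv6 addresses). For example:
      9 LISTEN
     21 ESTABLISHED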
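On Linux the same counts can be obtained without depending on netstat's column layout by reading the kernel's socket table directly. This sketch assumes the Linux /proc/net/tcp format, in which the fourth field (st) holds the connection state as a hexadecimal code; count_states and read_states are hypothetical names. On current distributions, ss -tan from iproute2 shows the same sockets.

```python
from collections import Counter

# Hex state codes used in the "st" column of Linux /proc/net/tcp and /proc/net/tcp6.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(proc_text):
    """Count sockets per TCP state in the text of /proc/net/tcp (or tcp6)."""
    counts = Counter()
    for line in proc_text.splitlines()[1:]:   # skip the column header line
        fields = line.split()                 # sl, local_address, rem_address, st, ...
        if len(fields) > 3:
            counts[TCP_STATES.get(fields[3], fields[3])] += 1
    return counts


def read_states(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Combine IPv4 and IPv6 counts; files that are missing are ignored."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as f:
                total += count_states(f.read())
        except OSError:
            pass
    return total
```

read_states().most_common() gives the same picture as the netstat pipeline, and a large or growing CLOSE_WAIT count usually points at an application that is not closing its sockets.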
A quick and dirty way to see which daemons are running and accepting connections on your machine is
netstat -lt
for TCP services and
netstat -lu
for UDP services. Unix domain sockets are usually more abundant than either of these two and a lot less interesting.
If you're having trouble with network throughput for some reason, try
netstat -s
This prints a summary of the network stack's state counters, in far more detail than the RX/TX dropped-frame counters of ifconfig. By looking at which counters are rapidly increasing, you may be able to find out why your network throughput is misbehaving.
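On Linux, netstat -s takes its figures from files such as /proc/net/snmp, which store each protocol's counters as a pair of lines: one with the counter names and one with the values. Comparing two snapshots taken some seconds apart shows which counters are rising. A sketch under those assumptions; parse_snmp and growing are hypothetical names, and /proc/net/netstat uses the same two-line layout.

```python
def parse_snmp(text):
    """Turn /proc/net/snmp-style line pairs into {"Tcp.RetransSegs": value, ...}.

    Each protocol appears twice: first a line of counter names, then a line
    of values in the same order, both starting with the same "Proto:" prefix.
    """
    counters = {}
    names_for = {}
    for line in text.splitlines():
        prefix, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        if prefix not in names_for:
            names_for[prefix] = fields            # header line: counter names
        else:
            names = names_for.pop(prefix)         # value line: matching numbers
            for name, value in zip(names, fields):
                counters[prefix + "." + name] = int(value)
    return counters


def growing(before, after, top=10):
    """Counters that increased between two snapshots, largest increase first."""
    rising = [(k, after[k] - before.get(k, 0))
              for k in after if after[k] > before.get(k, 0)]
    return sorted(rising, key=lambda kv: kv[1], reverse=True)[:top]
```

For example, parse /proc/net/snmp, wait ten seconds, parse it again, and pass both snapshots to growing; a steadily climbing Tcp.RetransSegs usually means packets are being lost somewhere on the path.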
The traceroute utility is useful for network troubleshooting. You can quickly determine whether the expected route is being taken when communicating, or attempting to communicate, with a target network device. As with most network troubleshooting, it is useful to have a benchmark against which the current traceroute output can be compared. The traceroute output can also be used to report network problems to other network troubleshooters. For example, you could say, "Our normal route to a host is from our router called router1-ISP to your routers called rtr-a1 to rtr-c4. Today, however, users are complaining that performance is very slow. Screen refreshes are taking more than 40 seconds when they normally take less than a second. The output from traceroute shows that the route to the host is from our router router1-ISP to your routers called rtr-a1, rtr-d4, rtr-x5, and then to rtr-c4. What is going on?"
The traceroute utility uses the IP time-to-live (TTL) field to force ICMP TIME_EXCEEDED responses from all gateways and routers along the path to the target host. It also tries to force a PORT_UNREACHABLE message from the target host itself by sending its UDP probes to a port that is normally unused. With the -I (ICMP ECHO) option, traceroute instead tries to force an ICMP ECHO_REPLY message from the target host.
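The TTL mechanism is easier to see in a toy model than in packet dumps. The function below sends no packets at all; it only simulates what happens to probes sent with TTL 1, 2, 3, and so on along a known path. The router names are taken from the example above, and simulate_traceroute is a hypothetical name.

```python
def simulate_traceroute(path, max_hops=30):
    """Toy model of the TTL mechanism traceroute relies on.

    path lists the routers between us and the target, with the target last.
    Every router decrements the probe's TTL; the one that brings it to zero
    answers with TIME_EXCEEDED. A probe that reaches the target draws
    PORT_UNREACHABLE, because the UDP port it was sent to is unused there.
    """
    replies = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        for hop, node in enumerate(path, start=1):
            if hop == len(path):              # the probe arrived at the target itself
                replies.append((ttl, node, "PORT_UNREACHABLE"))
                return replies
            remaining -= 1                    # each router decrements TTL
            if remaining == 0:                # ...and reports when it hits zero
                replies.append((ttl, node, "TIME_EXCEEDED"))
                break
    return replies
```

For the path router1-ISP, rtr-a1, rtr-c4, host, the model yields three TIME_EXCEEDED replies followed by one PORT_UNREACHABLE, which is exactly one output line per hop in a real traceroute.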
By default, the traceroute utility resolves IP addresses to hostnames, as shown in the following example:
# traceroute 172.20.4.110
traceroute to 172.20.4.110 (172.20.4.110), 30 hops max, 40 byte packets
1 220.127.116.11 (18.104.22.168) 1.037 ms 0.785 ms 0.702 ms
2 22.214.171.124 (126.96.36.199) 1.452 ms 1.569 ms 0.766 ms
3 * dungeon (188.8.131.52) 1.320 ms *
You can display IP addresses instead of hostnames by using the -n switch, as shown in the following example. In this example, the hostname dungeon for IP address 184.108.40.206 at hop 3 is no longer resolved.
# traceroute -n 172.20.4.110
traceroute to 172.20.4.110 (172.20.4.110), 30 hops max, 40 byte packets
0.534 ms *
diff compares two files and shows the differences between them.
For troubleshooting, this is most often used on configuration files. If one version of a configuration file works but another does not, a `diff` of the two files can be enlightening: it shows changes that are very difficult, if not impossible, to detect with the naked eye. Since it is very easy to miss a small difference in a file, seeing exactly what changed between two versions is a great help.
For example, if foo-2.2 is acting weird where foo-2.1 worked fine, it's not uncommon to `diff` the source code of the two versions to see whether anything related to your problem changed.
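The same comparison is available programmatically through Python's difflib, which produces output in the same unified format as diff -u. The sketch assumes both versions fit in memory; config_diff, the default file labels, and the option to ignore blank and # comment lines are illustrative choices, not part of difflib.

```python
import difflib


def significant_lines(text, ignore_comments=False):
    """Split text into newline-terminated lines, optionally dropping blanks and '#' comments."""
    lines = [line if line.endswith("\n") else line + "\n"
             for line in text.splitlines(keepends=True)]
    if ignore_comments:
        lines = [line for line in lines
                 if line.strip() and not line.lstrip().startswith("#")]
    return lines


def config_diff(old_text, new_text, old_name="working.conf",
                new_name="broken.conf", ignore_comments=False):
    """Return a unified diff (like `diff -u`) of two versions of a config file."""
    return "".join(difflib.unified_diff(
        significant_lines(old_text, ignore_comments),
        significant_lines(new_text, ignore_comments),
        fromfile=old_name, tofile=new_name,
    ))
```

Ignoring comments is handy when one copy of a file was annotated by hand: only the lines that actually change behavior remain in the output.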
Occasionally, you find yourself without a working network. This article is designed to guide you through the basic steps to work out what is wrong. Hopefully, from there you will be able to find out how to resolve your problem.
This guide was written based on an Ubuntu setup, using commands that are installed by default. It should apply to any system that has iproute and mtr installed. The article also assumes you are only dealing with a wired connection. It will mostly apply to a wireless network, but there may be additional steps you need to investigate.
The first thing to do is check that your network card has been detected.
Run "ip link":
% ip link
1: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:a0:c9:92:9c:c0 brd ff:ff:ff:ff:ff:ff
3: sit0: <NOARP> mtu 1480 qdisc noop
    link/sit 0.0.0.0 brd 0.0.0.0
You should see eth0. If this is the case, then your network card was detected correctly.
Let's make sure a cable is plugged in correctly by checking the link status with mii-tool:
% sudo mii-tool
eth0: negotiated 100baseTx-FD flow-control, link ok
If you see "link ok", you have a working Ethernet connection. If you don't, check that your network cable is plugged in securely and wired correctly, and that your switch is working.
The next thing to do is to check that you have an IP address for that network device. You can do that by running "ip addr show dev eth0":
% ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:a0:c9:92:9c:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.187.182.233/27 brd 10.187.182.255 scope global eth0
    inet6 2002:8b0:ed:2:2a0:c9ff:fe92:9cc0/64 scope global dynamic
       valid_lft 2591991sec preferred_lft 604791sec
    inet6 fe80::2a0:c9ff:fe92:9cc0/64 scope link
       valid_lft forever preferred_lft forever
The line you're interested in is the one that starts with inet. If there is no such line, or the address starts with 169.254 (a self-assigned link-local address), you don't have a usable IP address assigned.
You can either get an address dynamically via DHCP or configure it statically. We'll try DHCP first by running "sudo dhclient eth0":
% sudo dhclient eth0
Password:
Internet Systems Consortium DHCP Client V3.0.4
Copyright 2004-2006 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/
Listening on LPF/eth0/00:a0:c9:92:9c:c0
Sending on LPF/eth0/00:a0:c9:92:9c:c0
Sending on Socket/fallback
DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 4
DHCPOFFER from 10.187.182.226
DHCPREQUEST on eth0 to 255.255.255.255 port 67
DHCPACK from 10.187.182.226
bound to 10.187.182.233 -- renewal in 38436 seconds.
If you keep seeing DHCPDISCOVER lines over and over, no DHCP server is answering, which means your router is not providing addresses via DHCP (although I find this quite unlikely).
If you run "ip addr show dev eth0" again, you should now see a new inet line.
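The inet check described above can be scripted. This sketch parses the text of ip addr show, assuming the output format shown earlier, and uses Python's ipaddress module to flag link-local (169.254.0.0/16) addresses and to derive the on-link subnet. For the sample address it yields 10.187.182.224/27, the same network that appears in the routing table below. check_inet is a hypothetical helper name.

```python
import ipaddress


def check_inet(ip_addr_output):
    """Report the IPv4 addresses found in `ip addr show dev IFACE` output.

    Returns a list of (address/prefix, subnet, usable) tuples. An address in
    169.254.0.0/16 is link-local, i.e. self-assigned when DHCP got no answer,
    so it is reported as not usable.
    """
    found = []
    for line in ip_addr_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "inet":   # "inet6" lines are skipped
            iface = ipaddress.ip_interface(fields[1])
            found.append((str(iface), str(iface.network), not iface.ip.is_link_local))
    return found
```

An empty list means no IPv4 address at all; a tuple ending in False means DHCP failed and the system fell back to a link-local address.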
Let's see whether networking works by pinging the machine that gave us an IP address. Take the IP address from the DHCPOFFER line and ping it using "ping <ipaddress>". Press Ctrl-C to stop it.
% ping 10.187.182.226
PING 10.187.182.226 (10.187.182.226) 56(84) bytes of data.
64 bytes from 10.187.182.226: icmp_seq=1 ttl=64 time=0.364 ms
64 bytes from 10.187.182.226: icmp_seq=2 ttl=64 time=0.274 ms
64 bytes from 10.187.182.226: icmp_seq=3 ttl=64 time=0.286 ms
--- 10.187.182.226 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2006ms
rtt min/avg/max/mdev = 0.274/0.308/0.364/0.039 ms
If you get lines like this, then you have working IP networking.
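If you run these checks from a script, the summary lines at the end of ping's output are the part worth parsing. A sketch assuming the Linux iputils output shown above (the regular expressions also accept the "packets received" wording and the round-trip line used by BSD-derived ping); ping_summary is a hypothetical name.

```python
import re

SUMMARY_RE = re.compile(r"(\d+) packets transmitted, (\d+) (?:packets )?received")
RTT_RE = re.compile(r"= ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms")


def ping_summary(output):
    """Return (sent, received, (min, avg, max)) parsed from ping output.

    Missing pieces come back as None, e.g. no RTT line when nothing answered.
    """
    sent = received = rtt = None
    m = SUMMARY_RE.search(output)
    if m:
        sent, received = int(m.group(1)), int(m.group(2))
    m = RTT_RE.search(output)
    if m:
        rtt = tuple(float(x) for x in m.groups()[:3])   # min, avg, max in ms
    return sent, received, rtt
```

Applied to the output above, this returns (3, 3, (0.274, 0.308, 0.364)); a received count of 0 is the scripted equivalent of "no working IP networking yet".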
So we've been given an IP address from somewhere. Let's see whether we've also been given a default route by running "ip route":
% ip route
10.187.182.224/27 dev eth0 proto kernel scope link src 10.187.182.233
default via 10.187.182.225 dev eth0
From this we can see that our default route is via 10.187.182.225 on the eth0 network device. Let's try pinging it:
% ping 10.187.182.225
PING 10.187.182.225 (10.187.182.225) 56(84) bytes of data.
64 bytes from 10.187.182.225: icmp_seq=1 ttl=64 time=0.317 ms
64 bytes from 10.187.182.225: icmp_seq=2 ttl=64 time=0.291 ms
64 bytes from 10.187.182.225: icmp_seq=3 ttl=64 time=0.224 ms
--- 10.187.182.225 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2001ms
rtt min/avg/max/mdev = 0.224/0.277/0.317/0.041 ms
So we know we can at least reach the router.
Now let's see if we can get any further than this. Let's try pinging Ubuntu's web server:
% ping 18.104.22.168
PING 22.214.171.124 (126.96.36.199) 56(84) bytes of data.
64 bytes from 188.8.131.52: icmp_seq=1 ttl=52 time=30.5 ms
64 bytes from 184.108.40.206: icmp_seq=2 ttl=52 time=30.8 ms
64 bytes from 220.127.116.11: icmp_seq=3 ttl=52 time=30.2 ms
--- 18.104.22.168 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2006ms
rtt min/avg/max/mdev = 30.232/30.532/30.836/0.318 ms
If this works, then we have working networking and can move on to checking DNS.
If this doesn't work, we need to find out where the problem lies using mtr (I'd normally suggest traceroute here, but it doesn't seem to be part of the standard Ubuntu install). We will trace the route to Ubuntu's web server again:
% mtr -r -c 1 22.214.171.124
HOST: mojo-jojo                     Loss%  Snt  Last   Avg  Best  Wrst StDev
  1. brian.catnip.org.uk             0.0%    1   0.4   0.4   0.4   0.4   0.0
  2. 10.187.182.201                  0.0%    1   1.1   1.1   1.1   1.1   0.0
  3. careless.aaisp.net.uk           0.0%    1  30.2  30.2  30.2  30.2   0.0
  4. needless.aaisp.net.uk           0.0%    1  28.7  28.7  28.7  28.7   0.0
  5. ge-2-0-216.ipcolo2.London1.L    0.0%    1  30.2  30.2  30.2  30.2   0.0
  6. ae-0-55.bbr1.London1.Level3.    0.0%    1  30.2  30.2  30.2  30.2   0.0
  7. as-0-0.bbr2.London2.Level3.n    0.0%    1  30.3  30.3  30.3  30.3   0.0
  8. ge-3-0-0-55.gar1.London2.Lev    0.0%    1  30.2  30.2  30.2  30.2   0.0
  9. 126.96.36.199                   0.0%    1  30.4  30.4  30.4  30.4   0.0
 10. vlan102.core-l-1.lon2.mnet.n    0.0%    1  29.6  29.6  29.6  29.6   0.0
 11. 188.8.131.52                    0.0%    1  31.7  31.7  31.7  31.7   0.0
 12. 184.108.40.206                  0.0%    1  30.0  30.0  30.0  30.0   0.0
 13. signey.ubuntu.com               0.0%    1  29.9  29.9  29.9  29.9   0.0
This shows every router between us and the remote machine. The first line shows your ADSL router, and the line after that is the remote end of your ADSL line. If your ADSL line is not connected, you won't be able to reach the second hop. Anything beyond that is outside your control, but since the connection works in Windows, a problem there is unlikely.
Another possible reason why you can't reach the second hop is an incorrect default route, but the default route should have been given to you via DHCP along with your IP address.
If this is all working, we can check DNS.
Try looking up a host by name using the host command:
% host www.ubuntu.com
www.ubuntu.com has address 220.127.116.11
If this works, your networking should be working fine. If not, check /etc/resolv.conf. It should look something like this:
% cat /etc/resolv.conf
nameserver 10.187.182.226
nameserver 10.187.182.229
This file lists the DNS name servers. Edit it to use the name servers that you were given by your ISP (if DHCP manages this file, manual edits may be overwritten when the lease is renewed).
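A script that checks DNS settings can read the same file. This sketch extracts the nameserver entries, treating lines that start with # or ; as comments, as resolv.conf(5) describes; nameservers is a hypothetical helper. Note that the glibc resolver uses at most the first three nameserver lines.

```python
def nameservers(resolv_conf_text):
    """Return the addresses on 'nameserver' lines of resolv.conf text, in file order.

    Lines whose first non-blank character is '#' or ';' are comments.
    """
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if not fields or fields[0][0] in "#;":
            continue
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers
```

Each address returned can then be pinged, or queried with host www.ubuntu.com <server>, to see which name server is failing.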
Tcpreplay is a set of Unix tools that allows editing and replaying captured network traffic in pcap (tcpdump) format. It can be used to test a variety of passive and inline network devices, including IPSes, UTMs, routers, firewalls, and NIDSes.
Release focus: Major bugfixes
This release fixes some serious regression bugs that prevented tcprewrite from editing most packets on Intel and other little-endian systems. Some smaller bugfixes and tweaks to improve replay performance were made.
Aaron Turner
- Two Tips for Network Performance Checking
- Network Connectivity Troubleshooting
- Checking Network Settings
- Checking Routing Settings
- Changing the IP Address
- About the Author
... ... ...
b. Use ping with small (1 KB) and large (10 KB) packet sizes. Sometimes routers in the network have issues that depend on packet size, because some of them use different internal queues depending on the size of the packet...
Network Connectivity Troubleshooting
Here is a checklist to help you locate and resolve network connectivity problems.
1. Use ifconfig -a to check that interfaces are plumbed, that is, that they exist in the output. Also, check the network address and netmask of the interface.
To plumb an interface, run the command ifconfig <interface><instance> plumb, for example:
# ifconfig ce1 plumb
Use ifconfig to see if the interface now exists:
# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
ce0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 444.555.666.7 netmask ffffff00 broadcast 444.555.666.255
        ether 5:3:de:de:de:de
ce1: flags=1000842<BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 6
        inet 0.0.0.0 netmask 0
        ether 3:4:aa:bb:cc:dd
Give the interface its IP address and netmask:
# ifconfig ce1 518.104.22.168 netmask 255.255.255.0 up
2. Ping the interface address; it should work!
3. Ping your router/switch. If this fails, check your network settings. (See the Checking Network Settings section of this article.)
4. Ping a host on another network. If that doesn't work, check the routing table. (See the Checking Routing Settings section of this document.)
... ... ...
Follow these guidelines while troubleshooting an IP network:
- Always begin at the network interface layer and work up to the application layer.
- Make sure protocols at each layer of the Internet protocol suite can communicate with the layer above and below it.
To troubleshoot an IP network:
1. Ping successfully.
If you can ping successfully, you have verified IP communications between the network interface layer and the internet layer. Ping relies on the Address Resolution Protocol (ARP) to resolve the next-hop IP address (the target itself on the local network, otherwise the default gateway) to a hardware address; resolved addresses are cached, so ARP is not repeated for every echo request and reply.
2. Establish a session with a host.
If you can establish a session, you have verified TCP/IP session communications from the network interface layer through the application layer.
Note: If you are unable to resolve a problem, you may need to use an IP analyzer (such as Microsoft Network Monitor) to view network activity at each layer.
The first goal in troubleshooting is to make sure you can successfully ping an IP address. Ping a host with its host name only after you can successfully ping the host with its IP address.
To troubleshoot the network interface and internet layers by using the Ping command:
1. Ping the loopback address to verify that TCP/IP was installed and loaded correctly.
If this step is unsuccessful, verify that the system was restarted after TCP/IP was installed and configured.
2. Ping your IP address to verify that it was configured correctly.
If this step is unsuccessful, view the configuration by using the Network application in the Windows NT Control Panel to verify that the address was entered correctly, and verify that the IP address is valid and that it follows addressing guidelines.
3. Ping the IP address of the default gateway to verify that the gateway is functioning and configured correctly.
If this step is unsuccessful, verify that you are using the correct IP address and subnet mask.
4. Ping the IP address of a remote host to verify the connection to the wide area network.
If this step is unsuccessful:
- Make sure that IP routing is enabled.
- Verify that the IP address of the default gateway is correct.
- Make sure that the remote host is functional.
- Verify that the link between routers is operational.
After you can successfully ping the IP address, ping the host name to verify that the name is configured correctly in the HOSTS file.
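The ping sequence above (loopback, own address, default gateway, remote host, then a host name) can be automated so that it stops at the first failing layer. This sketch assumes the Linux iputils ping, where -c sets the number of echo requests and -W the reply timeout in seconds; Windows ping uses different options. ping_once and walk_checklist are hypothetical names, and the pinger argument can be any function returning True or False, which also makes the logic easy to test without a network.

```python
import subprocess


def ping_once(target, timeout=2):
    """Send a single echo request with Linux iputils ping; True if a reply arrived."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout), target],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def walk_checklist(steps, pinger=ping_once):
    """Ping each (description, target) step in order, innermost layer first.

    Returns the description of the first step that fails, or None if all pass.
    """
    for description, target in steps:
        if not pinger(target):
            return description
    return None
```

With the addresses from the Ubuntu walkthrough earlier, the steps might be [("loopback", "127.0.0.1"), ("own address", "10.187.182.233"), ("default gateway", "10.187.182.225"), ("host name", "www.ubuntu.com")]; the returned description tells you which layer to investigate first.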
Verifying TCP/IP Session Communications
The next goal in troubleshooting is to successfully establish a session. Use one of the following methods to verify communications between the network interface layer and the application layer.
To establish a session with a Windows NT–based computer or other RFC-compliant NetBIOS-based host, make a connection with the Net use or Net view command. If this step is unsuccessful:
- Verify that the destination (target) host is NetBIOS-based.
- Confirm that the scope ID on the destination host matches that of the source host.
- Verify that you used the correct NetBIOS name.
- If the destination host is on a remote network, check the LMHOSTS file for the correct entry.
To establish a session with a non-RFC-compliant NetBIOS-based host, use the Telnet or FTP utility to make a connection. If this step is unsuccessful:
- Verify that the destination host is configured with the Telnet daemon or FTP daemon.
- Confirm that you have the correct permissions on the destination host.
- Check the HOSTS file for a valid entry if you are connecting using a host name.
A home directory already exists for this service. Creating a new home directory will cause the existing directory to no longer be a home directory. An alias will be created for the existing home directory.
This message is a warning only. It appears when the new home directory you are trying to add already exists. The maximum number of home directories allowed is one per virtual root.
Invalid Server Name
While trying to connect to a server, you typed an invalid server name. Try to connect again and make sure you type the name correctly.
More than 1 home directory was found. An automatic alias will be generated instead.
When getting the directory entries from the server, Internet Service Manager has determined that a duplicate exists. This duplicate may have been added by using the Registry Editor or in some other way.
No administerable services found.
While trying to connect to a server, you typed the name of a server that has no installed services that Internet Service Manager can administer. That is, WWW, FTP, and gopher services have not been installed on the computer you connected to.
The alias you have given is invalid for a non-home directory.
You're trying to assign the alias '/' to a non-home directory. This alias automatically means home.
The connection attempt failed because there's a version conflict between the server and client software.
This message is an RPC error message. The RPC interface does not match what is expected. This should happen only if you are running a beta administration tool or server. The official error is RPC_S_UNKNOWN_IF.
The service configuration DLL 'filename' failed to load correctly.
The named service configuration DLL (for example, W3scfg.dll) failed to load. The DLL or one of its dependencies could be missing or corrupted. Generally this is a setup problem. Run the Setup program and select Remove All, then reinstall Microsoft Internet Information Server.
Unable to connect to target machine.
This message is an RPC error message that appears while executing an API. The computer could be offline. The system error was EPT_S_NOT_REGISTERED or RPC_S_SERVER_UNAVAILABLE.
Unable to create directory.
The directory name or path you typed in the New Directory Name box cannot be created. It could be an invalid path, or a file with this name may already exist.
15 May 2000 (support.novell.com) This document addresses communication issues that generate about a third of the support calls coming into the TCP/IP group at Novell Technical Support. We recommend that anyone who is implementing TCP/IP in a NetWare 5.x environment read and understand the information presented here.
This article is divided into two parts: understanding the concepts behind IP routing, and troubleshooting common TCP/IP problems. A follow-up article will explain some of the TCP/IP tools that are available for use in troubleshooting problems in a TCP/IP environment.
Concepts Behind TCP/IP Routing
The majority of connectivity issues involve problems with routing table entries. Every packet being processed by a TCP/IP host has a source and destination IP address. Upon receiving each packet, the IP protocol examines the destination address of the packet, compares it with entries in its local routing table, and then decides what action to take:
- If the destination IP address is one of the host's own addresses (that is, the packet is for a local application such as GroupWise, BorderManager Proxy Server, etc.), the packet is passed up to a protocol layer above IP.
- If the packet is destined for another known network, the packet is forwarded through one of the locally-attached network adapters. (This assumes that the TCP/IP host has multiple interfaces and has routing enabled.)
- If neither of the above applies, the packet is discarded.
The TCP/IP routing table can maintain four different types of routes, listed below in the order that they are searched for a match:
- Host (a route to a single, specific destination IP address)
- Subnet (a route to a subnet)
- Network (a route to an entire network)
- Default (used when there is no other match)
IP compares the destination IP address of the packet that it is processing with the entries in the table. If IP finds a host entry that matches the destination IP address, it forwards the packet to the next hop associated with that host entry. Host entries usually appear in routing tables because ICMP (Internet Control Message Protocol) has added them, either through path MTU discovery or in response to an ICMP redirect message. To check this, load the TCPCON utility at the server console prompt and look at the IP Routing Table option to verify whether the protocol associated with that route is ICMP.
IP has three classes of addresses for hosts: Class A, Class B and Class C. Each class has a default subnet mask (for instance, the Class A default is 255.0.0.0). That default applies until a network of that class is broken into smaller networks (i.e., subnetted). Once the network is subnetted, its addresses no longer use the default subnet mask.
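The classful rule can be made concrete with a short sketch. It uses the traditional leading-bit definition of the classes: first octet below 128 is Class A, below 192 is Class B, below 224 is Class C. It relies only on Python's standard `ipaddress` module. The function names `classful_default_mask` and `is_subnetted` are hypothetical helpers for illustration, not part of NetWare or any TCP/IP utility.

```python
import ipaddress
from typing import Optional


def classful_default_mask(addr: str) -> Optional[str]:
    """Return the classful default mask for an IPv4 address, or None for class D/E."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    if first_octet < 128:        # leading bit 0: Class A
        return "255.0.0.0"
    if first_octet < 192:        # leading bits 10: Class B
        return "255.255.0.0"
    if first_octet < 224:        # leading bits 110: Class C
        return "255.255.255.0"
    return None                  # Class D (multicast) and E (reserved) have no default mask


def is_subnetted(addr: str, mask: str) -> bool:
    """True when the configured mask is longer than the class's default mask."""
    default = classful_default_mask(addr)
    if default is None:
        raise ValueError(f"{addr} is a class D/E address with no default mask")
    configured = ipaddress.IPv4Network(f"0.0.0.0/{mask}").prefixlen
    return configured > ipaddress.IPv4Network(f"0.0.0.0/{default}").prefixlen
```

For example, `is_subnetted("172.16.5.1", "255.255.255.0")` is true, because a Class B network has been split into /24 subnets. The same mask on 192.168.1.5 is simply the Class C default.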
So if IP doesn't find a host entry, but does find a subnet entry that matches the packet's destination IP address, IP will forward the packet to the next hop associated with that subnet entry. Subnet entries exist when RIP2 (Routing Information Protocol v2), OSPF (Open Shortest Path First), or static entries have added routes with a non-default subnet mask to the routing table.
If IP doesn't find a subnet entry in the TCP/IP routing table but does find a network entry that matches the destination IP address, IP will forward the packet to the next hop associated with that network entry. (Customers running in default NetWare TCP/IP mode will have network entries.)
Finally, if IP doesn't find a network entry, but does find that a default route entry exists, IP will forward the packet to the next hop associated with that default entry. The default route is most commonly inserted as a static route through NetWare's server console INETCFG utility. However, the route may also be learned via RIP or OSPF. Failure to at least have a default route can often lead to communication problems on the network.
If an IP packet match has not been found in the TCP/IP routing table at this stage, the packet is simply dropped and an ICMP "destination unreachable" message is triggered to notify the sender that the host or network is unreachable.
When a TCP/IP communication problem occurs, the most common reason is that a route entry doesn't exist for the network or host with which you are trying to communicate. When this is the case, you can either add a route entry or try to figure out why the route is missing.
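The lookup order described above (host, subnet, network, default, else drop) can be sketched as a longest-prefix match. This is a substitution, not NetWare's actual code: a host route is a /32, and subnet masks are longer than classful network masks. Under those assumptions, "most specific match wins" reproduces the same search order. The routing table, addresses and the `route` function are hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical routing table: (destination, next hop).
ROUTES = [
    ("10.1.1.7/32", "10.0.0.2"),    # host route, e.g. installed by an ICMP redirect
    ("10.1.1.0/24", "10.0.0.3"),    # subnet route (non-default mask)
    ("10.0.0.0/8", "10.0.0.4"),     # network route (classful Class A mask)
    ("0.0.0.0/0", "192.168.1.1"),   # default route
]
LOCAL_ADDRESSES = {"10.0.0.1"}     # addresses bound to this host's interfaces


def route(dest: str, routes=ROUTES, local=LOCAL_ADDRESSES) -> Optional[str]:
    """Return 'local', the next hop for dest, or None if the packet must be dropped."""
    if dest in local:
        return "local"                      # pass up to the protocol layer above IP
    addr = ipaddress.IPv4Address(dest)
    best_net, best_hop = None, None
    for prefix, next_hop in routes:
        net = ipaddress.IPv4Network(prefix)
        # Longest prefix wins: /32 host beats subnet beats network beats /0 default.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop                         # None: drop and send ICMP "destination unreachable"
```

Calling `route("8.8.8.8", ROUTES[:3])` with the default route removed returns `None`. That is the "missing route" case the text identifies as the most common cause of communication failures.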
Troubleshooting Common TCP/IP Problems
When troubleshooting any networking problem, it is helpful to take a logical approach. Some questions to ask are:
- What does work?
- What doesn't work?
- How are the things that do and don't work related?
- Have the things that don't work ever worked on this computer/network?
- If so, what has changed since the last time it did work?
Troubleshooting a problem "from the bottom up" is often a good way to quickly isolate what's wrong and come up with a solution. The "bottom up" approach from an IP routing perspective is to start by verifying that the problem is not related to the physical layer (cabling, hubs, switches, and so on) or ARP (Address Resolution Protocol). Next, you ensure that the IP routing table is functioning correctly. Finally, you check to see whether the problem is at a generic TCP/UDP or application level.
A pattern is just that: it is not a firm set of rules, but a set of guidelines. If you follow a troubleshooting method consistently, it will help you to find solutions more easily. You will be able to zero in on the root cause of the issue and quickly resolve it. One nice thing about this pattern is that it is neither Linux- nor TCP/IP-specific. You can apply it to a variety of problems. (I make no promises about in-law problems, though.)
To put this pattern into context, each step is described in its own section. Nine steps are involved in the pattern, as shown in Figure 1.
A nine-step problem-solving pattern
Step 1: Clearly Describe the Symptoms
There's no good way to attack a problem until you know what the problem really is. Far too often, system and network administrators hear a rather poor (if not outright misleading) description of the problem. It's then your job to dig in and find out what's really going on.
As you can probably guess, you'll need some interviewing skills to get a clear description of the symptoms from a user. People don't want to hide the truth from you, but they often have predetermined the problem, coloring their perception of the issues involved.
It's a good idea to take notes as you're talking with someone, periodically summarizing the problem description as you go. This can help you spot follow-up questions to ask the user. It can also help jog a user's memory for other tidbits.
Never hesitate to call or email the user back with further questions to clarify the situation. It is certainly better to get all the answers you need up-front, but the reality is that you might not know all the questions that you need to ask until you've gotten your hands dirty working on the problem. If you need more detail, go get it.
Holding your interview at the customer's location also gives you a chance to say, "Show me." This enables you to see what the user is doing and perhaps to identify some more key points about the problem. Sometimes it will also reveal the problem as one of those transient things that just won't show up when you're there to see it.
If you run into a problem that you can't reproduce, you have yet another problem on your hands: what to do about it. The best thing is often to set up a monitoring plan with the user. Get all the details that you can, and tell the user to call you back when the problem recurs. Leave the user with a list of questions to try to answer when calling you back. On your end, you should maintain a log so that you can track details about the problem.
There is no good rule to determine when a problem is clearly stated. This is fairly subjective. If you think it's clear enough, it probably is. If you're not sure, try to describe the problem to someone else. (It really doesn't matter whether that person understands networking. In fact, you could try explaining it to a house plant; it's the process of talking through the problem while describing the symptoms that helps clarify things for you.)
As you're talking with people about the problem, see if there are other hosts with the same symptoms. If people haven't seen this problem, ask them to try to reproduce it. If there isn't anyone else available, try to reproduce it yourself. Knowing whether this problem affects a single host, a local group of hosts, or all the hosts on a network will help you when you hit Step 2.
Some key questions that you should know the answers to are listed here:
- What applications or protocols are affected?
- What hosts are involved?
- What is common between affected hosts?
- When did the problem start?
- Is this a constant problem?
- If the problem is not constant, does it occur at a regular time or interval?
Step 2: Understand the Environment
When you have a clear description of the symptoms, you must be able to understand the environment that the problem occurs in to effectively troubleshoot it. Gaining this understanding is really a twofold job: It requires both identifying the pieces involved in the problem and understanding how those pieces should act when they are not experiencing the problem.
The first task typically means creating a subset of your network map, showing the portions of the network that are involved in the problem. Sometimes this new map will be a logical map, and sometimes you will want to draw it out.
The second task, understanding how things should be behaving, is made much easier if you look at a snapshot of how your network acted before the problem occurred. These snapshots are called baselines and are covered in more detail in Chapter 7 of Networking Linux. In the absence of a baseline, you will need to create a model of the proper behavior of the network from your understanding of its layout, components, and configuration.
Step 3: List Hypotheses
Having made a list of the affected systems (in Step 2), we can begin to list potential causes of the problem. It's safe to brainstorm at this stage because we will be narrowing our search later. In fact, it is better to be overly creative here and end up with extra hypotheses than to miss the actual cause and chase blind leads.
Just like the maps of the problem environment, your list of hypotheses doesn't need to be anything formal. A mental list is normally fine; something scrawled on a piece of scratch paper is even better. Sometimes, though, you'll want a formal document; big network issues affecting lots of people just cry out for formal documents (well, at least the managers involved cry a lot).
Step 4: Prioritize Hypotheses and Narrow Focus
This is the step where we stop making work for ourselves and start making our jobs easier. Although we've just made a list of things that could be the problem, we don't want to research every item on the list if we don't have to. Instead, we can prioritize the potential causes and chase down the most likely ones first. Eventually, we'll either solve the problem or run out of possible causes (in which case we need to go back to Step 3).
As you're prioritizing your list, pay particular attention to recent changes. These are often the source of your problems. Changes meant to improve the environment often have unintended consequences.
Step 5: Create a Plan of Attack
Now that you've identified the most likely causes of the problem, it's time to disprove each of the possible causes in turn. As each of the potential causes is eliminated, you narrow your search further. Eventually you will reach a cause that you can't disprove, and your most recent attempt will have corrected the problem.
One thing you don't want to do is make changes in many areas at once. Making one change at a time, working on only one component per change, ensures that you'll be able to identify the modifications that actually fixed the problem.
You don't need a hard and fast plan for the follow-up steps to take if a test doesn't solve or identify the problem. However, you should at least think about where you're going to go next. Your prioritized list will be of great help as you make plans for the future. Don't be too surprised if your plans take a slight detour, though; crystal balls are notoriously vague.
A final step in preparing your plan is to review it with those holding a stake in solving the problem. This probably includes management, the customer suffering the problem, and anyone working with you in troubleshooting.
Step 6: Act on Your Plan
With a plan in place and reviewed by those with a stake in solving the problem, you're prepared to act.
While you're acting on the plan, take good notes and make sure that you keep copies of configuration files that you're changing. Nothing is worse than finishing off a series of tests, finding that they didn't solve the problem, and then discovering that you introduced a new problem and can't easily back out your changes. It can also be disheartening to have insufficient or misleading information to report at the conclusion of your test.
Step 7: Test Results
You'll never know whether your test has done anything without checking to see if the problem still exists. You'll also never know whether you've introduced new problems with your changes if you don't test. Testing gives you confidence that all is as it should be.
I recommend that you make it a practice to keep a suite of tests that exercise the main functionality of your network. Each time you run into a problem, add a test or two to check for it as well. Given a suite like this and a system to run all the tests, you can feel confident that your network is solid at the end of the day.
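A minimal sketch of such a suite can check that each important service still accepts TCP connections. Real tests would also exercise the application protocol itself. The `CHECKS` list and the example.com host names are hypothetical placeholders; only the standard `socket` module is used.

```python
import socket

# Hypothetical checks: (description, host, port). Replace with your own services.
CHECKS = [
    ("web server", "www.example.com", 80),
    ("mail relay", "mail.example.com", 25),
]


def tcp_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused, timed out, unreachable, and DNS failures
        return False


def run_suite(checks=CHECKS) -> dict:
    """Run every check and return {description: passed}."""
    return {name: tcp_check(host, port) for name, host, port in checks}
```

Each time a new problem is solved, appending one more tuple to the list adds a regression check for it. Running the suite from cron gives the end-of-day confidence the text describes.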
Step 8: Apply Results of Testing to Hypotheses
This is the pay-off step. If your testing has isolated and solved the problem, you're almost done. All that remains is to make the changes introduced in your test a permanent part of the network. If you haven't solved the problem yet, this is where you sit down with your results and your list of hypotheses to see what you've learned.
If the most recent test solved your problem, this step is unnecessary. You've found the problem and (hopefully) corrected it. If your efforts haven't solved the problem (or if you've created a new one), you need to look at how the data from this test affects your prioritized list of possible causes. Does your prioritization need to change? Are more possibilities pointed out by this test? If the test didn't identify and solve your problem, did it eliminate this possible cause? If not, what further tests are needed to make sure that this possible cause isn't the root of your problem?
Step 9: Iterate as Needed
Most often, you won't need to go all the way back to Step 1 or 2. Instead, you'll be able to go back to Step 4 to reprioritize and refocus. You might find that the things you learned in your most recent test point you in a slightly different direction. It is also possible that you will find another possibility; in this case, you can jump back to Step 3 and add it to your list.
If you've completely run out of possible causes or found additional information, you might even want to go all the way back to Step 1 and restate the problem just to make sure that you've not missed the mark completely.
This article is excerpted from Networking Linux: A Practical Guide to TCP/IP by Pat Eyler (New Riders Publishing, 2000, ISBN 0735710317).
Refer to Chapter 6 of this book for more detailed information on the material covered in this article.
Two of the fundamental aspects of Linux system security and troubleshooting are knowing what services are running, and what connections and services are available. We're all familiar with ps for viewing active services. netstat goes a couple of steps further, and displays all available connections, services, and their status. It shows one type of service that ps does not: services run from inetd or xinetd, because inetd/xinetd start them up on demand. If the service is available but not active, such as telnet, all you see in ps is either inetd or xinetd:
$ ps ax | grep -E 'telnet|inetd'
520 ? Ss 0:00 /usr/sbin/inetd
But netstat shows telnet sitting idly, waiting for a connection:
$ netstat --inet -a | grep telnet
tcp 0 0 *:telnet *:* LISTEN
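On Linux, netstat reads kernel tables such as /proc/net/tcp. The following sketch reads the same table to list IPv4 listening sockets, where state code `0A` means LISTEN. It assumes the Linux /proc format and does not cover IPv6 sockets (/proc/net/tcp6). The helper names are hypothetical. An inetd-managed service such as telnet shows up here as a listening port even though ps only shows inetd.

```python
import socket
import struct

TCP_LISTEN = "0A"   # state code the Linux kernel uses for LISTEN in /proc/net/tcp


def decode_addr(hex_addr: str) -> tuple:
    """Convert a /proc/net/tcp address such as '0100007F:0016' into (ip, port)."""
    ip_hex, port_hex = hex_addr.split(":")
    # The kernel prints the IPv4 address as a native-endian 32-bit integer,
    # so packing it back in native order recovers the network-order bytes.
    ip = socket.inet_ntoa(struct.pack("=I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def listening_sockets(proc_text: str) -> list:
    """Return (ip, port) for every LISTEN entry in /proc/net/tcp content."""
    result = []
    for line in proc_text.splitlines()[1:]:      # skip the header line
        fields = line.split()
        if len(fields) > 3 and fields[3] == TCP_LISTEN:
            result.append(decode_addr(fields[1]))
    return result
```

Typical use is `listening_sockets(open("/proc/net/tcp").read())`. Port 23 in the result corresponds to the `*:telnet` line above.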
This netstat invocation shows all activity:
$ netstat -a
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 *:telnet *:* LISTEN
tcp 0 0 *:ipp *:* LISTEN
tcp 0 0 *:smtp *:* LISTEN
tcp 0 0 192.168.1.5:32851 nest.anthill.echid:ircd ESTABLISHED
udp 0 0 *:ipp *:*
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 2 [ ACC ] STREAM LISTENING 1065 /tmp/ksocket-carla/klaunchertDCh2b.slave-socket
unix 2 [ ACC ] STREAM LISTENING 1002 /tmp/ssh-OoMGfFm666/agent.666
unix 2 [ ACC ] STREAM LISTENING 819 private/smtp
Your total output will probably run to a couple hundred lines. (A fun and quick way to count lines of output is netstat -a | wc -l.) You can ignore everything under "Active UNIX domain sockets." Those are local inter-process communications, not network connections. To avoid displaying them at all, do this:
$ netstat --inet -a
This will display only network connections, both listening and established. Already netstat has earned its keep: both the telnet and smtp services are running. This is bad, because I don't want either a telnet or an smtp server running on this machine. So now I know I need to turn them off and reconfigure my startup files so they won't start at boot.
How do you know what services you want running? That is a mondo subject for another day, and an important one. For example, if your system has been compromised, this is one place to find evidence of a Trojan horse or other malware phoning home. In this example, ipp is Internet Printing Protocol, which belongs to CUPS (Common Unix Printing System). If you want your printer to work, this needs to be here. The connection on 192.168.1.5:32851 is my active IRC (Internet Relay Chat) connection. Refer to your /etc/services file to learn more about TCP and UDP ports, and the services assigned to them.
What It Means
"Proto" is short for protocol, which is either TCP or UDP. "Recv-Q" and "Send-Q" mean receiving queue and sending queue. These should normally be zero; if they're not, you might have a problem. Packets should not be piling up in either queue, except briefly, as this example shows:
tcp 0 593 192.168.1.5:34321 venus.euao.com:smtp ESTABLISHED
That happened when I hit the "check mail" button in KMail; a brief queuing of outgoing packets is normal behavior. If the receiving queue is consistently jamming up, you might be experiencing a denial-of-service attack. If the sending queue does not clear quickly, you might have an application that is sending them out too fast, or the receiver cannot accept them quickly enough.
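Because brief queuing is normal, one reasonable approach is to flag only connections whose queues stay non-empty across several snapshots. A single snapshot is not enough. This sketch assumes IPv4 `netstat --inet -an` output with the column layout shown above. The function names are hypothetical.

```python
def queue_backlog(lines):
    """Yield (local, foreign, recv_q, send_q) for tcp/udp lines with a non-empty queue."""
    for line in lines:
        fields = line.split()
        if (len(fields) >= 5 and fields[0] in ("tcp", "udp")
                and fields[1].isdigit() and fields[2].isdigit()):
            recv_q, send_q = int(fields[1]), int(fields[2])
            if recv_q or send_q:
                yield fields[3], fields[4], recv_q, send_q


def persistent_backlog(snapshots):
    """Return (local, foreign) pairs whose queues were non-empty in every snapshot."""
    per_snapshot = [
        {(local, foreign) for local, foreign, _, _ in queue_backlog(text.splitlines())}
        for text in snapshots
    ]
    return set.intersection(*per_snapshot) if per_snapshot else set()
```

The KMail example above would appear in one snapshot but not the next, so `persistent_backlog` correctly ignores it.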
"Local address" is either your IP and port number, or IP and the name of a service. "Foreign address" is the hostname and service you are connected to. The asterisk is a placeholder for IP addresses, which of course cannot be known until a remote host connects. "State" is the current status of the connection. Any TCP state can be displayed here, but these three are the ones you want to see:
LISTEN: waiting to receive a connection
ESTABLISHED: a connection is active
TIME_WAIT: a recently terminated connection; this should last only a minute or two, after which the socket is closed. The socket pair cannot be reused as long as the TIME_WAIT state persists.
UDP is stateless, so the "State" column is always blank.
A socket pair is both sides of a TCP/IP connection, like this example for a locally-attached printer:
localhost:ipp localhost:34493 ESTABLISHED
Or a telnet connection to a remote server:
192.168.1.5:34437 22.214.171.124.pt:telnet ESTABLISHED
A socket is any hostname-port combination, or IP address-port.
Because all these things change often, how do you capture the changes? Run netstat continuously with the -c flag and record the output:
$ netstat --inet -a -c > netstat.txt
Then check email, start and stop services, surf the web, log in to a telnet BBS and play Legend of the Red Dragon; then review your capture file to see what it all looks like.
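Reviewing a long capture by eye is tedious, so a small script can report which connections appeared or disappeared between snapshots. The sketch makes three assumptions: each iteration in the capture begins with the "Active Internet connections" header, the `-n` flag was used, and the output is IPv4 only. The function names are hypothetical.

```python
def split_capture(text: str) -> list:
    """Split a `netstat -c` capture into snapshots, assuming each begins with this header."""
    return [chunk for chunk in text.split("Active Internet connections") if chunk.strip()]


def connections(snapshot: str) -> set:
    """Extract (proto, local, foreign, state) tuples from one `netstat --inet -an` snapshot."""
    conns = set()
    for line in snapshot.splitlines():
        fields = line.split()
        if fields and fields[0] in ("tcp", "udp") and len(fields) >= 5:
            state = fields[5] if len(fields) > 5 else ""  # UDP has no state column
            conns.add((fields[0], fields[3], fields[4], state))
    return conns


def diff_snapshots(before: str, after: str):
    """Return (appeared, disappeared) connection sets between two snapshots."""
    old, new = connections(before), connections(after)
    return new - old, old - new
```

Walking consecutive pairs from `split_capture` shows exactly what changed when you checked mail or started a service.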
If netstat is taking too long, or not resolving a hostname at all, give it the -n flag to turn off DNS lookups:
$ netstat --inet -an
netstat can help diagnose NIC problems. Use the -i flag when you're troubleshooting a flaky connection and you suspect your NIC:
$ netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 28698 0 0 0 33742 0 0 0 BMRU
lo 16436 0 14 0 0 0 14 0 0 0 LRU
You should see large numbers in the RX-OK (received OK) and TX-OK (transmitted OK) columns, and very low numbers in all the others. If you are seeing a lot of RX-ERRs or TX-ERRs, suspect the NIC or the patch cable. This is what the flags mean:
B = broadcast address
L = loopback device
M = multicast
R = interface is running
U = interface is up
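Raw counters are easier to judge as percentages. The sketch below turns `netstat -i` text into per-interface error rates, counting errors, drops and overruns together. It reads the column names from the header line, so it should also cope with netstat versions that omit the Met column. The function names are hypothetical. The counters are cumulative since boot, so comparing two runs over time is more telling than a single reading.

```python
def _error_pct(bad: int, ok: int) -> float:
    """Percentage of frames that were errors, drops, or overruns."""
    total = ok + bad
    return round(100.0 * bad / total, 3) if total else 0.0


def interface_error_rates(netstat_i: str) -> dict:
    """Map interface name -> (rx_error_pct, tx_error_pct) from `netstat -i` text."""
    lines = netstat_i.splitlines()
    start = next((i for i, line in enumerate(lines) if line.split()[:1] == ["Iface"]), None)
    if start is None:
        return {}
    header = lines[start].split()           # column names, so a missing Met column is fine
    rates = {}
    for line in lines[start + 1:]:
        fields = line.split()
        if len(fields) != len(header):
            continue                         # skip blank or malformed lines
        row = dict(zip(header, fields))
        rx_bad = sum(int(row[c]) for c in ("RX-ERR", "RX-DRP", "RX-OVR"))
        tx_bad = sum(int(row[c]) for c in ("TX-ERR", "TX-DRP", "TX-OVR"))
        rates[row["Iface"]] = (_error_pct(rx_bad, int(row["RX-OK"])),
                               _error_pct(tx_bad, int(row["TX-OK"])))
    return rates
```

For the sample output above, both interfaces report (0.0, 0.0), which is what a healthy NIC and cable should show.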
Linux Network Administrator's Guide, by Olaf Kirch & Terry Dawson
IBM Redbooks Debugging UNIX System Services, Lotus Domino, Novell Network Services, and other Applications on OS-390
Implementing, Managing, and Troubleshooting Network Protocols and Services Troubleshooting TCP-IP Connections
Advanced Network Configuration and Troubleshooting (2.205.2)
IBM Redbooks TCP-IP Tutorial and Technical Overview
Sometimes when you talk to a seasoned system or network administrator, he'll tell you that he knows that something is wrong when things don't feel right. This isn't an admission of paranormal powers; it's just a shorthand method for explaining that these experts know how their system or network is supposed to behave and that it isn't acting like that now. These administrators have created a baseline for their environment. Not all of them have done it formally, but the ones who have will have gained significant added benefits.
A Baseline Defined
Several things make up a baseline, but at its heart, a baseline is merely a snapshot of your network the way it normally acts. The least effective form of a baseline is the "sixth sense" that you develop when you've been around something for a while. It seems to work because you notice aberrations subconsciously; you're used to the way things ought to be. Better baselines will be more formal and may include the following components:
- Summarized network utilization data
- Logs of work done on the network
- Maps of the network
- Records of equipment on the network and related configuration data
In Chapter 10, "Network Monitoring Tools," we discussed the ethereal network analyzer. This tool's capability to save capture files (or traces) enables you to maintain a history of your network. If the only traces you have saved represent your troubleshooting efforts, you won't have a very good picture of your network.
You also need to be aware that a lot of things will influence the contents of the traces you collect. Weekend vs. weekday; Monday or Friday vs. the rest of the week; and time of day are all examples of the kinds of factors that will affect your data. Running ethereal (or some other analyzer) at least three times a day, every day, and saving the capture file will give you a much clearer idea of how things normally work.
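To respect the weekday and time-of-day effects described above, a baseline can be kept separately for each weekday and hour. This is a minimal sketch under stated assumptions. The metric is any number you extract from your traces, such as packets per minute. The 3-standard-deviation threshold and the function names are hypothetical choices, not from the text.

```python
import statistics
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Slot = Tuple[int, int]  # (weekday 0=Monday, hour 0-23)


def build_baseline(samples: Iterable[Tuple[datetime, float]]) -> Dict[Slot, Tuple[float, float]]:
    """Group samples by weekday and hour; return {slot: (mean, stdev)}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {
        slot: (statistics.mean(vals), statistics.stdev(vals) if len(vals) > 1 else 0.0)
        for slot, vals in buckets.items()
    }


def is_anomalous(baseline, ts: datetime, value: float, k: float = 3.0) -> bool:
    """True if value lies more than k standard deviations from its slot's mean."""
    slot = (ts.weekday(), ts.hour)
    if slot not in baseline:
        return False                    # no history for this slot, so no verdict
    mean, stdev = baseline[slot]
    if stdev == 0:
        return value != mean
    return abs(value - mean) > k * stdev
```

Bucketing by slot means a quiet Saturday morning is compared with other Saturday mornings, not with a busy Monday.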
Several tools can give you a quick look at your network's behavior: netstat, traceroute, ping, and even the contents of your system logs are all good sources of information.
The netstat tool can show you several important bits of information. Running it with the -M, -i, and -a switches is especially helpful. I typically add the -n switch to netstat as well; this switch turns off name resolution, which is a real boon if DNS is broken or IP addresses don't resolve back to names properly. The -i switch gives you interface-specific information:
$ netstat -i
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 0 0 0 0 39 0 0 0 BRU
lo 3924 0 36 0 0 0 36 0 0 0 LRU
The -M switch gives information pertaining to masqueraded connections:
$ netstat -Mn
IP masquerading entries
prot expire source destination ports
tcp 59:59.96 192.168.1.10 126.96.36.199 1028 -> 80 (61002)
tcp 58:43.75 192.168.1.10 188.8.131.52 622 -> 22 (61001)
udp 16:37.72 192.168.1.10 184.108.40.206 1025 -> 53 (61000)
The -a switch gives connection-oriented output (this output has been abbreviated):
$ netstat -an
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 0.0.0.0:6000 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
udp 0 0 0.0.0.0:111 0.0.0.0:*
raw 0 0 0.0.0.0:1 0.0.0.0:* 7
raw 0 0 0.0.0.0:6 0.0.0.0:* 7
Active UNIX domain sockets (servers and established)
Proto RefCnt Flags Type State I-Node Path
unix 1 [ ] STREAM CONNECTED 1332 /tmp/.X11-unix/X0
unix 1 [ ] STREAM CONNECTED 1330 /tmp/.X11-unix/X0
unix 0 [ ] DGRAM 440
The traceroute tool is especially important for servers that handle connections from disparate parts of the Internet. Setting up several traceroutes to different remote hosts can give you an indication of remote users' connection speeds to your server.
The ping tool can help you watch the performance of a local or remote network in much the same way that traceroute does. It does not give as much detail, but it requires less overhead.
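To watch performance over time, ping output can be reduced to a few numbers worth logging. The sketch assumes the `time=0.512 ms` reply format of Linux ping and the `time<1ms` form printed by Windows. The caller must also supply how many requests were sent, for example the value given to `ping -c`. The function name is hypothetical.

```python
import re
import statistics

RTT_RE = re.compile(r"time[=<]([\d.]+) ?ms")   # matches reply lines, not the summary


def rtt_summary(ping_output: str, sent: int) -> dict:
    """Summarize round-trip times and packet loss from raw ping output."""
    rtts = [float(m.group(1)) for m in RTT_RE.finditer(ping_output)]
    loss = 100.0 * (sent - len(rtts)) / sent if sent else 0.0
    summary = {"received": len(rtts), "loss_pct": loss}
    if rtts:
        summary.update(min=min(rtts), avg=statistics.mean(rtts), max=max(rtts))
    return summary
```

Storing one such summary per host every few minutes builds exactly the kind of performance history this section recommends.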
When users connect to services on your hosts, they leave a trail through your log files. If you use a central logging host and a log reader to grab important entries, you can build a history of how often services are used and when they are most heavily utilized.
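Once logs are collected centrally, a short script can count entries per program and per hour to show when services are busiest. This sketch assumes the traditional BSD syslog line format, for example `Jan 10 14:02:11 host sshd[1234]: ...`. Logs with ISO timestamps will not match. It counts log messages, not connections, so filtering for connect messages is left as an assumption for your own log layout. The names are hypothetical.

```python
import re
from collections import Counter

# month day HH:MM:SS host program[pid]: message
SYSLOG_RE = re.compile(r"^\w{3}\s+\d+\s+(\d{2}):\d{2}:\d{2}\s+\S+\s+([\w./-]+?)(?:\[\d+\])?:")


def usage_by_hour(lines) -> Counter:
    """Count syslog entries per (program, hour); unmatched lines are ignored."""
    counts = Counter()
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m:
            hour, program = m.groups()
            counts[(program, hour)] += 1
    return counts
```

Summing these counters day by day yields the usage history the paragraph above describes.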
You will likely find yourself touching a lot of the equipment on your network, so it is important that you keep good records of what you do. Even seemingly blind trails in troubleshooting may lead you to discover information about your network. In addition, you will find that your documentation will be an invaluable aid the next time you need to troubleshoot a similar problem.
Some people like to carry around a paper notebook to keep their records in; others prefer to keep things online. Both camps have good points, many related to information access. If you keep everything in a notebook and don't have it handy, it does you no good. Similarly, if everything is online and the network is down, you're in bad shape.
My preference is to keep things online, but in a cvs repository. Then you can keep it on a central server or two while also keeping a copy on your laptop/PC/palmtop. If you like, you can even grab printouts. A nice benefit to this is that several people can make updates to documentation and then commit their changes back to the cvs repository when they've finished.
I won't get into the Web vs. flatfile vs. database vs. XML vs. whatever conflict. They all have benefits. Choose the right option for your organization, and stick to it. The important bit is that you have the data, right?
A roundly ignored set of baseline information is the network map. If you have more than two systems in your network and don't have a map, set down this book for 20 minutes and sketch something out. It doesn't have to be pretty, just reasonably accurate. Are you back? Good. Now that you have a map showing what is where, we can get back to work.
Most people want to deal with two kinds of maps. The first is a topological/physical map, which shows what equipment is where and how it is connected. The second is a logical map. This shows what services are provided and what user communities are supported by which servers. If you can combine these two maps, so much the better; color coding, numeric coding, and outlined boxes are all mechanisms that can help with this. A sample map is shown in Figure 1.
Figure 1 A sample network map
Like the information discussed above, I recommend that you keep your maps online and in a couple of places. (cvs can be a good solution here as well.) Nicely done maps also look good on your wall, not to mention that this is a convenient place to find them when a problem breaks out and you need to start troubleshooting.
You should also have accurate records of the hardware and software in your network. At a minimum, you should have a hardware listing of each box on the network, a list of system and application levels (showing currently installed versions and patches), and configurations of the same. If you keep this in cvs, you'll also have a nice mechanism for looking at your history.
If you decide to keep these records, it is vital that they be kept up-to-date. Every time you make a change, you should edit the appropriate file and commit it to cvs. If you fall behind, you'll miss something, and then you'll really be stuck.
This article is excerpted from Networking Linux: A Practical Guide to TCP/IP by Pat Eyler (New Riders Publishing, 2000, ISBN 0735710317).
The Last but not Least: Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand. ~Archibald Putt, Ph.D.
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright in original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links, as it develops like a living tree.
Last modified: March 12, 2019