Softpanorama

May the source be with you, but remember the KISS principle ;-)
(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and  bastardization of classic Unix

TCP/IP Network Troubleshooting


Introduction

The network is now one of the most important parts of the enterprise communication infrastructure: the nervous system of the modern enterprise. Network outages often paralyze an organization, because people cannot find alternative ways to do their work. As a network administrator, your primary concern is maintaining connectivity of all devices (a process often called fault management). It is also important to evaluate your network's performance continually: serious networking problems sometimes begin as performance problems. Paying proper attention to performance can help you address issues before they become serious.

Network troubleshooting means correctly recognizing and diagnosing problems in a production network, usually while working under considerable stress and pressure.

Note: the structure of this note was borrowed from an old Solaris 8 sysadmin course, but the content is different and original...

As in any investigation, you need to avoid jumping to conclusions and calmly collect all relevant facts. You can use the famous "How to Solve It" approach. Among the more network-specific issues:

As most networking problems are repetitive, use your records, the available baseline, and several heuristics.

In general, there is no one correct way to determine the root cause of a networking problem. Like any troubleshooting of complex systems, this is more art than science, and success depends both on your IQ and on your level of experience with the environment.

However, most networking problems are repetitive, and there are several heuristics that you can follow:

Troubleshooting Commandments

  1. If the problem is server related, create a backup of the faulty system before fixing anything. The backup can cover only the configuration files or the complete system. A complete backup is important because troubleshooting is a high-stress activity and it is easy to destroy some files accidentally. Ghost is a great tool for performing quick complete backups, and Ghost 2003 works with Linux ext filesystems. With the current sizes of portable USB drives, most system partitions can be backed up to a USB drive. Typically we are talking about 4-6 GB of information, so it does not take too long and can be done in parallel with the investigation. Such a backup can also be indispensable if the fault disappears on its own: faults that fix themselves often come back on their own too.
  2. Create a baseline of the /etc directory each day after the first login.
  3. Make a backup of each file before changing it. This is a simple rule that every sysadmin knows, but it is violated way too often, and the consequences can be very unpleasant. Following it protects you from the most typical mistake in troubleshooting: losing the initial configuration.
  4. Simplify your environment, if possible. Where possible, try to remove routers and firewalls from the affected network path. Problems are often introduced by network devices. This is typical, for example, of home environments with cheap routers like Linksys.
  5. Rule out the possibility that changes were made by other people. In an enterprise environment the left hand often does not know what the right is doing, and the fault may be due to someone having upgraded a router's operating system or changed a firewall's rule set. Patches are just a special kind of upgrade and can introduce problems too.
  6. Have a testing plan and revise it as you progress. Make sure that you can replicate the reported fault at will. This is important because you should always attempt to re-create the reported fault after making any change. You need to be sure that you are not changing or adding to the problem.
  7. Document all steps and results. This is important because you could forget exactly what you did to fix or change the problem. This is especially true when someone interrupts you as you are about to test a configuration change. You can always revert the system to the faulty state if you backed it up as suggested earlier.
  8. Where possible, make permanent changes to the configuration settings. Temporary changes may be faster to implement, but they cause confusion when the system reboots after a power failure months or even years later and the fault occurs again. Nobody will remember what was done by whom.
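
The two backup rules above (a daily baseline of /etc and a copy of each file before it is changed) are easy to script. The following is only a minimal sketch in Python, not a replacement for a real backup tool; the function names `backup_file` and `baseline_dir` and the naming scheme for the copies are assumptions made for this illustration.

```python
import shutil
import tarfile
import time
from pathlib import Path


def backup_file(path, stamp=None):
    """Copy a file next to itself with a timestamp suffix before editing it."""
    src = Path(path)
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.name}.{stamp}.bak")  # hypothetical naming scheme
    shutil.copy2(src, dst)  # copy2 also keeps the modification time and mode bits
    return dst


def baseline_dir(directory, dest_dir, stamp=None):
    """Archive a configuration directory into a dated .tar.gz file."""
    src = Path(directory)
    stamp = stamp or time.strftime("%Y%m%d")
    archive = Path(dest_dir) / f"{src.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, e.g. "etc/hosts".
        tar.add(src, arcname=src.name)
    return archive
```

A call such as `baseline_dir("/etc", "/var/backups")` would normally need root privileges to read every file, and the destination should be outside the directory being archived; both the destination path and the daily schedule are left to the reader.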

Common Network Problems

Following is a list of some common problems that occur:

Layers-based troubleshooting

When troubleshooting networks, some people prefer to think in layers, similar to the TCP/IP Model, while others prefer to think in terms of functionality. Using the TCP/IP Model layered approach, you could start at either the Physical or the Application layer. Start at either end of the model, test, draw conclusions, move to the next layer, and so on.

The Application Layer

A user complains that an application is not functioning. Assuming the application has everything that it needs, such as disk space, name servers, and the like, determine if the Application layer is functional by using another system.

Application layer programs often have diagnostic capabilities and may report that a remote system is not available. Use tcpdump or Wireshark to determine whether the application program is receiving and sending the expected data.

The Transport Layer and the Internet Layer

These two layers can be bundled together for the purposes of troubleshooting. Determine if the systems can communicate with each other. Look for ICMP messages that can provide clues as to where the problem lies. Could this be a router or switching problem? Are the protocols (TFTP, BOOTP) being routed? Are you attempting to use protocols that cannot be routed? Are the hostnames being translated to the correct IP addresses? Are the correct netmask and broadcast addresses being used? Tests between the client and server can include using ping, traceroute, arp, and snoop.
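
Two of the questions above (are hostnames translated to the expected addresses, and are the netmask and broadcast address correct) can be checked without any special tools. The following is a hedged Python sketch using only the standard library; the helper names `describe_interface`, `same_subnet`, and `resolve` are invented for this illustration, and the addresses in the usage note are examples only.

```python
import ipaddress
import socket


def describe_interface(cidr):
    """Derive the netmask, network, and broadcast address from 'address/prefix'."""
    iface = ipaddress.ip_interface(cidr)
    net = iface.network
    return {
        "address": str(iface.ip),
        "netmask": str(net.netmask),
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
    }


def same_subnet(local_cidr, remote_ip):
    """True if remote_ip is on the local subnet, so no router should be involved."""
    return ipaddress.ip_address(remote_ip) in ipaddress.ip_interface(local_cidr).network


def resolve(hostname):
    """Return the sorted IPv4 addresses a name resolves to, or [] on failure."""
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})
```

For example, `describe_interface("192.168.10.37/26")` reports a netmask of 255.255.255.192 and a broadcast address of 192.168.10.63; if the interface is configured with anything else, the configuration deserves a closer look. Comparing the output of `resolve` with the address you expect catches stale entries in the hosts file or DNS.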

The Network Interface Layer

Use snoop to determine if the network interface is actually functioning. Use the arp command to determine if the ARP cache has the expected Ethernet (MAC) address. Fourth generation hubs and some switches can be configured to block certain MAC addresses.

When troubleshooting connectivity problems here are some useful questions:

The Physical Layer

Check that the link status LED is lit. Test the connection with your laptop or with a known working cable. Keep in mind that the link LED may still be lit even if the transmit line is damaged. Verify that an MDI-X connection or crossover cable is being used when connecting hub to hub. See Faulty Cable for details.

Old News ;-)

[Jul 18, 2013] Random Findings

[Jul 18, 2013] Nine traits of the veteran network admin

July 15, 2013 | InfoWorld

Born or made, network admins share certain defining characteristics. Here are but nine

A few years ago, I wrote a somewhat tongue-in-cheek piece detailing nine traits of the veteran Unix admin. It enjoyed quite a reception and sparked all kinds of debate across the Internet, with people discussing each trait point by point and sparking skirmishes between rival factions. Since then, I've thought about giving network admins the same treatment, but never got around to it. It seems that this is the week. Here are a few of the many traits of the veteran network admin.

Veteran network admin trait No. 1: We already know it's down
Few things are more annoying than having your phone blow up with automated alert messages from your monitoring systems, scrambling to dig into the issue, only to be continually bombarded with humans texting/talking/emailing/calling with the same "Is x down?" question, or even worse, "The network's down!" If the outage is significant, we already know about it, and we are trying to work on it as fast as we possibly can. Continued attempts to deliver elderly information will only impede that effort.

Veteran network admin trait No. 2: If we don't know it's down, it's probably not down
Conversely, if we get a message claiming, "The network's down!" yet we have not been notified by any monitoring system, then the problem is almost certainly the complaining user in question. To users, if there is any resource that cannot be contacted, whether that resource is internal to the network, on the Internet, or perhaps orbiting the earth, that means the network is "down." This apparently includes 404 errors from shady websites, mistyped URLs, or the lack of any sort of network connection on the user's laptop. Nothing is more grating to a network admin than someone claiming the network is "down." No, it isn't -- reboot your laptop.

Veteran network admin trait No. 3: We will ping and test several times before digging into the problem

If we begin looking into a problem, especially across a WAN or long-haul link with several providers in the middle, we will reserve judgment for the first several minutes. This is because these connections are subject to the vagaries of their path, and connectivity problems can come and go like ghosts in the night. A fiber WAN link that was stable a minute ago but is now exhibiting 30 percent packet loss will more than likely fix itself in short order. Only after a probationary period will we start digging deeper into the issue.
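
The "probationary period" described above can be approximated in a script: probe the far end several times, compute the loss percentage, and only treat the problem as real if the loss stays high across several rounds. This is a rough Python sketch under stated assumptions: a TCP connection attempt stands in for ping (real ICMP echo needs raw sockets and usually root), and the names `tcp_probe`, `loss_percent`, and `is_persistent` as well as the 20 percent threshold are invented for illustration.

```python
import socket
import time


def tcp_probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def loss_percent(probe, attempts=10, delay=0.0):
    """Call probe() repeatedly and return the percentage of failed attempts."""
    failures = 0
    for _ in range(attempts):
        if not probe():
            failures += 1
        time.sleep(delay)
    return 100.0 * failures / attempts


def is_persistent(samples, threshold=20.0):
    """True only if every loss sample exceeds the threshold (arbitrary default)."""
    return bool(samples) and all(s > threshold for s in samples)
```

A typical use would be to collect a few rounds, for example `loss_percent(lambda: tcp_probe("192.0.2.10", 22), delay=1.0)` called once a minute (the address and port are placeholders), and dig deeper only when `is_persistent` returns True for the collected samples.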

Veteran network admin trait No. 4: Believe it or not, we've tried turning it off and back on again

Many times, we will "fix" a problem by turning an interface off and back on again. In fact, this may be the first thing we try when troubleshooting an issue. Whether it's a problem with auto-negotiation, a sketchy cable, or sunspots, you'd be surprised at how often dropping and restarting an interface will restore proper operation. While networking may be a science, it's not without its white whale.

Veteran network admin trait No. 5: During an outage, we're not just staring at the screen -- we're following a path in our heads

If you come across a network admin working on a problem, looking at a routing table with a 1,000-yard stare, he's not experiencing performance anxiety or somehow "lost." He's running through dozens of possible scenarios in his head, following the routing and switching paths, and calculating possible problem scenarios. The uninitiated can't quite grok the way a network admin's mind works, but a parallel might be to imagine an intangible maze, then try to solve it. Somewhere in that maze lies a dead end that shouldn't be there. We're looking for that -- then we can take steps to open up the path again. 

 

Veteran network admin trait No. 6: We calculate subnet masks and CIDR as easily as breathing

I think it's safe to say that perhaps outside of the common Class C netmask, the overwhelming majority of humans do not understand subnetting, or CIDR. For network admins, this is as intrinsic and involuntary to our brain as breathing. It's not just knowing that a /28 is a netmask of 255.255.255.240, and that it describes 16 addresses, 14 usable, or where subnet boundaries lie. It's the ability to collapse large numbers of smaller subnets into larger descriptors in order to reduce routing table sizes, ACL applications, and a wide variety of other internal networking tasks. When we see contiguous networks sliced into smaller chunks in an ACL, yet with identical rules applied, we get agita. A /19 is a much better idea than a collection of /24s. The same applies to wildmasks.
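
For readers who do not calculate masks as easily as breathing, the arithmetic mentioned above can be checked with Python's standard `ipaddress` module. This is just an illustrative sketch; the helper names `describe` and `summarize` and the sample address ranges are assumptions of this example, not part of the original article.

```python
import ipaddress


def describe(cidr):
    """Return the netmask, wildcard mask, and address counts of a network."""
    net = ipaddress.ip_network(cidr)
    return {
        "netmask": str(net.netmask),
        "wildcard": str(net.hostmask),  # inverse mask, as used in many ACLs
        "addresses": net.num_addresses,
        "usable": len(list(net.hosts())),
    }


def summarize(cidrs):
    """Collapse contiguous networks into the fewest covering prefixes."""
    nets = (ipaddress.ip_network(c) for c in cidrs)
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


print(describe("192.0.2.16/28"))
# {'netmask': '255.255.255.240', 'wildcard': '0.0.0.15', 'addresses': 16, 'usable': 14}

# Thirty-two contiguous /24s, 10.20.32.0 through 10.20.63.0, fit in one /19.
print(summarize([f"10.20.{i}.0/24" for i in range(32, 64)]))
# ['10.20.32.0/19']
```

Note that `summarize` only merges networks that are truly contiguous and aligned: `10.0.0.0/24` and `10.0.2.0/24` stay separate, which is exactly the boundary check an admin does in their head.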

Veteran network admin trait No. 7: We do not tolerate bugs; they are of the devil

On occasion, conventional troubleshooting or building new networks run into an unexplainable blocking issue. After poring over configurations, sketching out connections, routes, and forwarding tables, and running debugs, one is brought no closer to solving the problem. This is the unholy area of networking inhabited by the software bug. Network admins think of switching and routing software bugs as personal attacks, and they will usually excoriate a vendor when one is discovered. This is because before the determination is made that the problem is due to a bug, nothing makes sense whatsoever. It completely violates years of experience and knowledge, throws waste to logic, and causes immense amounts of stress and turmoil. You might think of it as if you spontaneously transmogrified into a different species. Everything you've ever known suddenly does not apply, yet here you are.

Veteran network admin trait No. 8: We can read live packet streams and write highly complex filters in our sleep

Few items in networking and computing are even remotely similar to how they're portrayed on TV and in movies. The terminal window with text rapidly scrolling past, however, is one of them. In the sys admin world, that's usually a log file tail. In the networking world, it's usually a packet dump. Depending on what we're trying to do, we may quickly call up a packet capture on a live circuit and watch packets fly by in real time. It's not gibberish. We're looking for telltale signs, and we'll winnow down our capture using filters until we've found what we're looking for. Speed is usually of the essence in these cases, so we're well versed in BPF filtering syntax. Also, we tend to notice really strange things like IP headers that "just don't look right" and other oddities that may as well be alphabet soup to most people. Some people speak Klingon. We speak IP.
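
To give a feel for what an IP header that "looks right" contains, here is a small Python sketch that decodes the fixed 20-byte IPv4 header and verifies its checksum. It is an illustration only, not a capture tool: the function names and the returned field names are choices made for this example, and the sample bytes in the usage note are a hand-written header, not captured traffic.

```python
import ipaddress
import struct


def checksum_ok(header):
    """One's-complement sum of all 16-bit words must fold to 0xFFFF."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total == 0xFFFF


def parse_ipv4_header(packet):
    """Decode the fixed part of an IPv4 header from raw bytes."""
    if len(packet) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    (ver_ihl, _tos, total_len, _ident, flags_frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4  # IHL is counted in 32-bit words
    length_valid = 20 <= header_len <= len(packet)
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "total_length": total_len,
        "dont_fragment": bool(flags_frag & 0x4000),
        "ttl": ttl,
        "protocol": proto,  # 1 = ICMP, 6 = TCP, 17 = UDP
        "src": str(ipaddress.IPv4Address(src)),
        "dst": str(ipaddress.IPv4Address(dst)),
        "checksum_ok": length_valid and checksum_ok(packet[:header_len]),
    }
```

Feeding it `bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")` yields a version 4 header, 20 bytes long, TTL 64, protocol 17 (UDP), from 192.168.0.1 to 192.168.0.199, with a valid checksum. A version other than 4, a header length under 20, or a failed checksum are the kind of oddities that make a header "not look right".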

Veteran network admin trait No. 9: We take big risks all the time

Network admins tend to work on many remote devices. Unlike server admins who can pull up a console if the server is otherwise inaccessible via the network, we usually have no such luxury. This means that changes we make to certain devices carry with them the ever-present threat of the loss of connection. Basically, we're always a missed keystroke or two away from making a problem worse, or causing a big problem where none previously existed.

While making a change to a remote switch and applying that change to port 34 instead of port 35, say, we could inadvertently cause a complete lack of connectivity to the remote site, taking who knows what offline until the device is power-cycled or we -- or worse, someone else -- drive over to fix it manually. Ideally, this shouldn't happen in the middle of the night, but given the usual timing of network maintenance windows, it often does. Oh, and we sometimes have to make changes to remote devices knowing that if anything goes slightly wrong, we will lose connectivity -- such as when changing Internet providers at a remote site and reconfiguring the firewall with a script because we have no other persistent connection to it. We live with this understanding all day, every day.

I hope that this insight into the extremely logical, yet consistently dangerous world of the network admin has shed some light on how we work and how we think. I don't expect it to curtail the repeated claims of the network being down, but maybe it's a start. In fact, if you're reading this and you are not a network admin, perhaps you should find the closest one and buy him or her a cup of coffee. They could probably use it.

This story, "Nine traits of the veteran network admin," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

[ Also on InfoWorld: Nine traits of the veteran Unix admin | How to become a certified IT ninja | Get expert networking how-to advice from InfoWorld's Networking Deep Dive PDF special report. | Subscribe to InfoWorld's Data Center newsletter to stay on top of the latest developments. ]

 




Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.

FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.

This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contain some broken links as it develops like a living tree...


Last modified: June 13, 2021