The network is now the most important part of the enterprise communication infrastructure: the nervous system of the modern enterprise. Network outages often paralyze an organization, as people are unable to find alternative ways of performing their activities. As a network administrator, your primary concern is maintaining the connectivity of all devices (a process often called fault management). It is also important to continually evaluate your network's performance: serious networking problems sometimes begin as performance problems. Paying proper attention to performance can help you address issues before they become serious.
Network troubleshooting means correctly recognizing and diagnosing problems in a production network, often while working under considerable stress and pressure.
Note: the structure of this note was borrowed from an old Solaris 8 sysadmin course, but the content is different and original.
As in any investigation, you need to avoid jumping to conclusions and calmly collect all relevant facts. You can use the famous "How to Solve It" approach. Among more network-specific issues
In general, there is no one correct way to determine the root cause of a networking problem. As with any troubleshooting of complex systems, this is more art than science, and success depends both on your IQ and on your level of experience with the environment.
However, most networking problems are repetitive, and there are several heuristics that you can follow:
Search for the error message: quote part or all of it exactly, but without server-specific information such as hostnames, IP addresses, and timestamps. Be sure to put the search string in quotes. If it's a common problem, there's a good chance you will get some hits. Anything that looks like a FAQ is a good start; mailing list archives can also be a good source of information. Just be sure to check the archive indexes for other messages in the same discussion.
If you are using a commercial product, it is a good idea to search the vendor's knowledge base using your vendor account. Vendors sometimes have documents there that help with your particular case.
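Searching for an error message stripped of its server-specific details is easy to get wrong by hand. The following Python sketch is only an illustration: the function name `search_query` and the set of patterns (ISO and syslog-style timestamps, IPv4 addresses with an optional port, bracketed PIDs) are assumptions, and hostnames are removed only when you pass them in explicitly.

```python
import re
from typing import Sequence

# Hypothetical patterns for server-specific details that make a search too narrow.
# Extend or adjust them for your own log formats.
_NOISE = [
    re.compile(r"\b\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?\b"),  # 2019-03-12T10:15:02
    re.compile(r"\b[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\b"),          # Mar 12 10:15:02
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),                 # 192.0.2.7 or 192.0.2.7:25
    re.compile(r"\[\d+\]"),                                              # smtpd[4321]
]


def search_query(message: str, local_names: Sequence[str] = ()) -> str:
    """Return a quoted search string with server-specific details removed."""
    text = message
    for name in local_names:          # hostnames are too varied to match by regex
        text = text.replace(name, " ")
    for pattern in _NOISE:
        text = pattern.sub(" ", text)
    text = re.sub(r"\s+", " ", text)          # collapse leftover whitespace
    text = re.sub(r" ([:,])", r"\1", text)    # no space before ':' or ','
    return '"' + text.strip(" :,") + '"'


print(search_query(
    "Mar 12 10:15:02 mail01 postfix/smtpd[4321]: connect to 192.0.2.25:25: Connection refused",
    local_names=["mail01"],
))
# "postfix/smtpd: connect to: Connection refused"
```

The result is already wrapped in double quotes, so it can be pasted straight into a search engine as an exact-phrase query.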
Following is a list of some common problems that occur:
When troubleshooting networks, some people prefer to think in layers, following the TCP/IP model, while others prefer to think in terms of functionality. Using the layered TCP/IP approach, you can start at either the Physical or the Application layer: start at one end of the model, test, draw conclusions, then move to the next layer, and so on.
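The layer-by-layer loop can be expressed as an ordered list of checks that stops at the first failure. This is a minimal Python sketch, not a complete tool: the helper names (`first_failing_layer`, `link_up`, `resolves`, `tcp_connects`) are hypothetical, and `link_up` assumes a Linux host that exposes interface state under `/sys/class/net`.

```python
import socket
from typing import Callable, List, Optional, Tuple

Check = Callable[[], bool]


def first_failing_layer(checks: List[Tuple[str, Check]]) -> Optional[str]:
    """Run checks bottom-up and return the name of the first layer that fails.

    A check that raises OSError (no such interface, DNS failure, connection
    refused, timeout) is treated as a failure of that layer.
    """
    for layer, check in checks:
        try:
            ok = check()
        except OSError:
            ok = False
        if not ok:
            return layer
    return None  # every layer passed


def link_up(iface: str) -> bool:
    # Linux-specific: the kernel reports "up", "down", "unknown", ...
    with open(f"/sys/class/net/{iface}/operstate") as f:
        return f.read().strip() == "up"


def resolves(host: str) -> bool:
    socket.getaddrinfo(host, None)  # raises socket.gaierror (an OSError) on failure
    return True


def tcp_connects(host: str, port: int, timeout: float = 3.0) -> bool:
    with socket.create_connection((host, port), timeout=timeout):
        return True


# Example (placeholders for your own interface, host and port):
# checks = [
#     ("Physical/Network Interface", lambda: link_up("eth0")),
#     ("Internet (name resolution)", lambda: resolves("www.example.com")),
#     ("Transport (TCP connect)", lambda: tcp_connects("www.example.com", 443)),
# ]
# print(first_failing_layer(checks) or "all layers OK")
```

Ordering the list from the Application layer down instead gives the top-down variant; the runner itself does not care which end you start from.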
A user complains that an application is not functioning. Assuming the application has everything it needs, such as disk space, name servers, and the like, determine whether the Application layer is functional by running the application from another system.
Application layer programs often have diagnostic capabilities and may report that a remote system is not available. Use tcpdump or Wireshark to determine whether the application program is sending and receiving the expected data.
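When capturing an application's traffic, it helps to restrict the capture to the peer and port in question. The sketch below only builds a tcpdump command line; the function name and defaults are assumptions, while `-i`, `-nn`, `-c` and `-w` are standard tcpdump options and `host ... and port ...` is ordinary pcap filter syntax. Running the capture itself normally requires root privileges.

```python
import shlex
from typing import List, Optional


def tcpdump_argv(iface: str, host: str, port: int, count: int = 100,
                 outfile: Optional[str] = None) -> List[str]:
    """Build an argv list for capturing traffic between us and host:port."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid port {port}")
    argv = ["tcpdump", "-i", iface,
            "-nn",               # no DNS or service-name lookups
            "-c", str(count)]    # stop after `count` packets
    if outfile:
        argv += ["-w", outfile]  # raw capture that Wireshark can open later
    argv.append(f"host {host} and port {port}")  # BPF filter expression
    return argv


cmd = tcpdump_argv("eth0", "192.0.2.10", 8080)
print(shlex.join(cmd))
# tcpdump -i eth0 -nn -c 100 'host 192.0.2.10 and port 8080'
# To run it (as root): subprocess.run(cmd, check=True)
```

Passing an argv list rather than a shell string means the filter expression reaches tcpdump as a single argument, with no shell quoting to get wrong.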
The Transport and Internet layers can be bundled together for the purposes of troubleshooting. Determine whether the systems can communicate with each other. Look for ICMP messages that can provide clues as to where the problem lies. Could this be a router or switching problem? Are the protocols (TFTP, BOOTP) being routed? Are you attempting to use protocols that cannot be routed? Are the hostnames being translated to the correct IP addresses? Are the correct netmask and broadcast addresses being used? Tests between the client and server can include ping, traceroute, arp, and snoop (tcpdump on Linux).
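Two of the questions above, whether hostnames map to the expected addresses and whether the netmask and broadcast address agree, can be checked mechanically. A minimal Python sketch using the standard `ipaddress` and `socket` modules; the function names are hypothetical and the checks cover IPv4 only.

```python
import ipaddress
import socket
from typing import List


def check_ipv4_config(address: str, netmask: str, broadcast: str) -> List[str]:
    """Return problems with an interface's IPv4 settings (empty list if consistent)."""
    iface = ipaddress.IPv4Interface(f"{address}/{netmask}")  # ValueError on a bad mask
    net = iface.network
    problems = []
    if ipaddress.IPv4Address(broadcast) != net.broadcast_address:
        problems.append(f"broadcast {broadcast} should be {net.broadcast_address} for {net}")
    if net.prefixlen < 31 and iface.ip in (net.network_address, net.broadcast_address):
        problems.append(f"{iface.ip} is the network or broadcast address of {net}")
    return problems


def same_subnet(a: str, b: str, netmask: str) -> bool:
    """True if a and b are on the same subnet (no router needed) under this netmask."""
    return (ipaddress.IPv4Interface(f"{a}/{netmask}").network
            == ipaddress.IPv4Interface(f"{b}/{netmask}").network)


def resolves_to(hostname: str, expected_ip: str) -> bool:
    """True if the resolver returns expected_ip among the IPv4 addresses for hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    return expected_ip in {info[4][0] for info in infos}


print(check_ipv4_config("192.168.1.10", "255.255.255.0", "192.168.255.255"))
# ['broadcast 192.168.255.255 should be 192.168.1.255 for 192.168.1.0/24']
```

Running `same_subnet` with the client's and the server's netmask in turn is a quick way to spot the classic case where one side believes the other is local and never sends the traffic to its router.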
Use snoop to determine whether the network interface is actually functioning. Use the arp command to determine whether the ARP cache has the expected Ethernet (MAC) address. Fourth-generation hubs and some switches can be configured to block certain MAC addresses.
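On Linux the kernel exposes the ARP cache as text in `/proc/net/arp` (on Solaris you would use `arp -a` instead). A small sketch, assuming that Linux file format, that parses the table and compares an entry against the MAC address you expect; the function names are hypothetical.

```python
from typing import Dict, Optional, Tuple

ATF_COM = 0x02  # Linux <linux/if_arp.h>: entry is complete (hardware address valid)


def parse_proc_net_arp(text: str) -> Dict[str, Tuple[str, str]]:
    """Map IP -> (MAC, device) for complete entries in /proc/net/arp content."""
    table = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 6:
            continue
        ip, _hw_type, flags, mac, _mask, device = fields[:6]
        if int(flags, 16) & ATF_COM:
            table[ip] = (mac.lower(), device)
    return table


def check_arp(table: Dict[str, Tuple[str, str]], ip: str, expected_mac: str) -> Optional[str]:
    """Return a problem description, or None if the cache holds the expected MAC."""
    if ip not in table:
        return f"no complete ARP entry for {ip}"
    mac, device = table[ip]
    if mac != expected_mac.lower():
        return f"{ip} on {device} is {mac}, expected {expected_mac.lower()}"
    return None


# Typical use on a Linux host:
# with open("/proc/net/arp") as f:
#     print(check_arp(parse_proc_net_arp(f.read()), "192.168.1.1", "00:11:22:aa:bb:cc"))
```

An incomplete entry (flags `0x0`, all-zero MAC) usually means ARP requests went out but no reply came back, which points at the link or at a device filtering that MAC address.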
When troubleshooting connectivity problems, here are some useful questions:
Check that the link status LED is lit. Test the connection with your laptop or with a known-good cable. Note that the link LED can be lit even if the transmit line is damaged. When connecting hub to hub, verify that an MDI-X port or a crossover cable is being used. See Faulty Cable for details.
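On a Linux host you can read the equivalent of the link LED remotely: the kernel publishes `operstate`, `carrier`, `speed` and `duplex` for each interface under `/sys/class/net/<iface>/`. The sketch below, with hypothetical function names, reads those files and flags the usual suspects. As with the LED, a carrier seen from one end does not prove that both directions of the cable work, so check from both ends.

```python
from pathlib import Path
from typing import Dict, List


def read_link_attrs(iface: str, sysfs: str = "/sys/class/net") -> Dict[str, str]:
    """Read link attributes for iface; returns {} if the interface does not exist."""
    base = Path(sysfs) / iface
    if not base.is_dir():
        return {}
    attrs = {}
    for name in ("operstate", "carrier", "speed", "duplex"):
        try:
            attrs[name] = (base / name).read_text().strip()
        except OSError:
            pass  # e.g. reading carrier fails with EINVAL while the interface is down
    return attrs


def assess_link(attrs: Dict[str, str]) -> List[str]:
    """Translate raw attributes into likely physical-layer problems."""
    if not attrs:
        return ["interface not found"]
    problems = []
    if attrs.get("operstate") != "up":  # loopback and some virtual devices report "unknown"
        problems.append(f"operstate is {attrs.get('operstate', 'unreadable')}")
    if attrs.get("carrier") != "1":
        problems.append("no carrier: check cable, switch port and link LED")
    if attrs.get("duplex") == "half":
        problems.append("half duplex: possible auto-negotiation mismatch")
    return problems


# Example: print(assess_link(read_link_attrs("eth0")))
```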
July 15, 2013 | InfoWorld
Born or made, network admins share certain defining characteristics. Here are but nine.
A few years ago, I wrote a somewhat tongue-in-cheek piece detailing nine traits of the veteran Unix admin. It enjoyed quite a reception and sparked all kinds of debate across the Internet, with people discussing each trait point by point and sparking skirmishes between rival factions. Since then, I've thought about giving network admins the same treatment, but never got around to it. It seems that this is the week. Here are a few of the many traits of the veteran network admin.
Veteran network admin trait No. 1: We already know it's down
Few things are more annoying than having your phone blow up with automated alert messages from your monitoring systems, scrambling to dig into the issue, only to be continually bombarded with humans texting/talking/emailing/calling with the same "Is x down?" question, or even worse, "The network's down!" If the outage is significant, we already know about it, and we are trying to work on it as fast as we possibly can. Continued attempts to deliver elderly information will only impede that effort.
Veteran network admin trait No. 2: If we don't know it's down, it's probably not down
Conversely, if we get a message claiming, "The network's down!" yet we have not been notified by any monitoring system, then the problem is almost certainly the complaining user in question. To users, if there is any resource that cannot be contacted, whether that resource is internal to the network, on the Internet, or perhaps orbiting the earth, that means the network is "down." This apparently includes 404 errors from shady websites, mistyped URLs, or the lack of any sort of network connection on the user's laptop. Nothing is more grating to a network admin than someone claiming the network is "down." No, it isn't -- reboot your laptop.
Veteran network admin trait No. 3: We will ping and test several times before digging into the problem
If we begin looking into a problem, especially across a WAN or long-haul link with several providers in the middle, we will reserve judgment for the first several minutes. This is because these connections are subject to the vagaries of their path, and connectivity problems can come and go like ghosts in the night. A fiber WAN link that was stable a minute ago but is now exhibiting 30 percent packet loss will more than likely fix itself in short order. Only after a probationary period will we start digging deeper into the issue.
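The probationary period can be made explicit: sample packet loss repeatedly and escalate only when loss stays high for several consecutive samples. A Python sketch under stated assumptions: each sample is the loss percentage of one batch of probes (for example one `ping -c 10` run per minute), and the 5 percent threshold and five-sample window are illustrative defaults, not recommendations from the article.

```python
from typing import Sequence


def loss_percent(replies: Sequence[bool]) -> float:
    """Percentage of probes in one batch that got no reply."""
    if not replies:
        return 0.0
    return 100.0 * sum(1 for r in replies if not r) / len(replies)


def sustained_problem(samples: Sequence[float], threshold: float = 5.0,
                      window: int = 5) -> bool:
    """True only if each of the last `window` loss samples exceeds `threshold`.

    A single bad batch (a transient glitch on a long-haul path) does not trigger.
    """
    if len(samples) < window:
        return False
    return all(s > threshold for s in samples[-window:])


print(loss_percent([True] * 7 + [False] * 3))   # 30.0
print(sustained_problem([0, 0, 30, 0, 0, 0]))   # False: transient spike
print(sustained_problem([30, 28, 35, 31, 40]))  # True: time to dig deeper
```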
Veteran network admin trait No. 4: Believe it or not, we've tried turning it off and back on again
Many times, we will "fix" a problem by turning an interface off and back on again. In fact, this may be the first thing we try when troubleshooting an issue. Whether it's a problem with auto-negotiation, a sketchy cable, or sunspots, you'd be surprised at how often dropping and restarting an interface will restore proper operation. While networking may be a science, it's not without its white whale.
Veteran network admin trait No. 5: During an outage, we're not just staring at the screen -- we're following a path in our heads
If you come across a network admin working on a problem, looking at a routing table with a 1,000-yard stare, he's not experiencing performance anxiety or somehow "lost." He's running through dozens of possible scenarios in his head, following the routing and switching paths, and calculating possible problem scenarios. The uninitiated can't quite grok the way a network admin's mind works, but a parallel might be to imagine an intangible maze, then try to solve it. Somewhere in that maze lies a dead end that shouldn't be there. We're looking for that -- then we can take steps to open up the path again.
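Following a routing path in your head boils down to repeated longest-prefix-match lookups: at each hop the router picks the most specific route that contains the destination. A minimal Python sketch of a single lookup against a hypothetical routing table; real routers also weigh administrative distance and metrics between routes of equal prefix length, which this ignores.

```python
import ipaddress
from typing import Dict, Optional


def lookup(routes: Dict[str, str], destination: str) -> Optional[str]:
    """Return the next hop of the most specific route containing destination."""
    dest = ipaddress.ip_address(destination)
    best_len, best_hop = -1, None
    for prefix, next_hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if dest in net and net.prefixlen > best_len:  # longer prefix wins
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


routes = {  # hypothetical table: prefix -> next hop
    "0.0.0.0/0": "203.0.113.1",     # default route
    "10.0.0.0/8": "10.255.0.1",
    "10.1.0.0/16": "10.1.255.1",
}
print(lookup(routes, "10.1.2.3"))   # 10.1.255.1
print(lookup(routes, "10.2.0.1"))   # 10.255.0.1
print(lookup(routes, "8.8.8.8"))    # 203.0.113.1
```

A dead end in the maze shows up as a lookup that returns `None` (no route at all) or a next hop whose own table sends the packet back.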
Veteran network admin trait No. 6: We calculate subnet masks and CIDR as easily as breathing
I think it's safe to say that perhaps outside of the common Class C netmask, the overwhelming majority of humans do not understand subnetting, or CIDR. For network admins, this is as intrinsic and involuntary to our brain as breathing. It's not just knowing that a /28 is a netmask of 255.255.255.240, and that it describes 16 addresses, 14 usable, or where subnet boundaries lie. It's the ability to collapse large numbers of smaller subnets into larger descriptors in order to reduce routing table sizes, ACL applications, and a wide variety of other internal networking tasks. When we see contiguous networks sliced into smaller chunks in an ACL, yet with identical rules applied, we get agita. A /19 is a much better idea than a collection of /24s. The same applies to wildcard masks.
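Python's standard `ipaddress` module does the same arithmetic and is handy for double-checking a summarization before it goes into an ACL or a routing table. A short IPv4-oriented sketch with hypothetical helper names; the "usable" count follows the classic rule of subtracting the network and broadcast addresses and treats /31 and /32 as special cases.

```python
import ipaddress
from typing import Dict, List, Union


def describe(prefix: str) -> Dict[str, Union[str, int]]:
    net = ipaddress.ip_network(prefix)
    usable = net.num_addresses - 2 if net.prefixlen < 31 else net.num_addresses
    return {
        "netmask": str(net.netmask),
        "wildcard": str(net.hostmask),  # inverse mask, as used in Cisco ACLs
        "addresses": net.num_addresses,
        "usable": usable,
    }


def summarize(prefixes: List[str]) -> List[str]:
    """Collapse contiguous prefixes into the fewest covering networks."""
    nets = [ipaddress.ip_network(p) for p in prefixes]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


print(describe("192.0.2.16/28"))
# {'netmask': '255.255.255.240', 'wildcard': '0.0.0.15', 'addresses': 16, 'usable': 14}
print(summarize([f"10.1.{i}.0/24" for i in range(32, 64)]))
# ['10.1.32.0/19']
```

Thirty-two contiguous /24s starting on a /19 boundary collapse into a single /19 (wildcard mask 0.0.31.255); if one /24 is missing, `collapse_addresses` returns the smallest set of prefixes that still covers exactly what was given.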
Veteran network admin trait No. 7: We do not tolerate bugs; they are of the devil
On occasion, conventional troubleshooting or the building of new networks runs into an unexplainable blocking issue. After poring over configurations, sketching out connections, routes, and forwarding tables, and running debugs, one is brought no closer to solving the problem. This is the unholy area of networking inhabited by the software bug. Network admins think of switching and routing software bugs as personal attacks, and they will usually excoriate a vendor when one is discovered. This is because before the determination is made that the problem is due to a bug, nothing makes sense whatsoever. It completely violates years of experience and knowledge, lays waste to logic, and causes immense amounts of stress and turmoil. You might think of it as if you spontaneously transmogrified into a different species. Everything you've ever known suddenly does not apply, yet here you are.
Veteran network admin trait No. 8: We can read live packet streams and write highly complex filters in our sleep
Few items in networking and computing are even remotely similar to how they're portrayed on TV and in movies. The terminal window with text rapidly scrolling past, however, is one of them. In the sys admin world, that's usually a log file tail. In the networking world, it's usually a packet dump. Depending on what we're trying to do, we may quickly call up a packet capture on a live circuit and watch packets fly by in real time. It's not gibberish. We're looking for telltale signs, and we'll winnow down our capture using filters until we've found what we're looking for. Speed is usually of the essence in these cases, so we're well versed in BPF filtering syntax. Also, we tend to notice really strange things like IP headers that "just don't look right" and other oddities that may as well be alphabet soup to most people. Some people speak Klingon. We speak IP.
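Tools such as tcpdump and Wireshark do the decoding for you, but the checks behind a header that "just doesn't look right" are simple. A minimal Python sketch, assuming you already have the raw bytes of an IPv4 header (for example from a capture file): it checks the version, the header length and the RFC 1071 header checksum. The function names are hypothetical.

```python
import struct
from typing import List


def internet_checksum(data: bytes) -> int:
    """RFC 1071 one's-complement sum over 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def check_ipv4_header(packet: bytes) -> List[str]:
    """List anything that looks wrong in the IPv4 header at the start of packet."""
    if len(packet) < 20:
        return ["shorter than the 20-byte minimum IPv4 header"]
    problems = []
    version, ihl = packet[0] >> 4, packet[0] & 0x0F
    total_length = struct.unpack("!H", packet[2:4])[0]
    header_len = ihl * 4
    if version != 4:
        problems.append(f"version is {version}, not 4")
    if ihl < 5:
        problems.append(f"IHL {ihl} is below the minimum of 5")
    elif total_length < header_len:
        problems.append(f"total length {total_length} < header length {header_len}")
    # A correct header, checksum field included, sums to 0xFFFF, so this yields 0.
    if len(packet) >= header_len >= 20 and internet_checksum(packet[:header_len]) != 0:
        problems.append("bad header checksum")
    return problems


# Sample header: 192.168.0.1 -> 192.168.0.199, TTL 64, UDP, checksum 0xb861
hdr = bytes.fromhex("45 00 00 73 00 00 40 00 40 11 b8 61 c0 a8 00 01 c0 a8 00 c7")
print(check_ipv4_header(hdr))                            # []
print(check_ipv4_header(hdr[:8] + b"\x3f" + hdr[9:]))    # ['bad header checksum']
```

Changing a single byte (here the TTL) without recomputing the checksum is exactly the kind of oddity this catches.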
Veteran network admin trait No. 9: We take big risks all the time
Network admins tend to work on many remote devices. Unlike server admins who can pull up a console if the server is otherwise inaccessible via the network, we usually have no such luxury. This means that changes we make to certain devices carry with them the ever-present threat of the loss of connection. Basically, we're always a missed keystroke or two away from making a problem worse, or causing a big problem where none previously existed.
While making a change to a remote switch and applying that change to port 34 instead of port 35, say, we could inadvertently cause a complete lack of connectivity to the remote site, taking who knows what offline until the device is power-cycled or we -- or worse, someone else -- drive over to fix it manually. Ideally, this shouldn't happen in the middle of the night, but given the usual timing of network maintenance windows, it often does. Oh, and we sometimes have to make changes to remote devices knowing that if anything goes slightly wrong, we will lose connectivity -- such as when changing Internet providers at a remote site and reconfiguring the firewall with a script because we have no other persistent connection to it. We live with this understanding all day, every day.
I hope that this insight into the extremely logical, yet consistently dangerous world of the network admin has shed some light on how we work and how we think. I don't expect it to curtail the repeated claims of the network being down, but maybe it's a start. In fact, if you're reading this and you are not a network admin, perhaps you should find the closest one and buy him or her a cup of coffee. They could probably use it.
This story, "Nine traits of the veteran network admin," was originally published at InfoWorld.com. Read more of Paul Venezia's The Deep End blog at InfoWorld.com.
Copyright © 1996-2018 by Dr. Nikolai Bezroukov. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) in the author's free time and without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Copyright of original materials belongs to their respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the author's present and former employers, SDNP or any other organization the author may be associated with. We do not warrant the correctness of the information provided or its fitness for any purpose.
Last modified: March 12, 2019