(slightly skeptical) Educational society promoting "Back to basics" movement against IT overcomplexity and bastardization of classic Unix
An accidental reboot is a serious blunder, especially if you reboot a production box in the middle of the day. Not all applications survive such an operation gracefully, especially if it happens in the middle of a write operation.
This is one of the CLMs (career-limiting moves) for any administrator. Typically it happens in remote sessions: the administrator opens a shell to one box, then opens another to a different box, becomes distracted, returns to the wrong terminal window and sends the wrong command to the wrong machine. Nobody is perfect.
The rule before issuing any reboot command should be the classic "Stop. Think. Click."
But it is more constructive to prevent such accidents in interactive sessions by redefining reboot and other dangerous commands as aliases that print a warning with the name of the box and pause.
Such scripts are usually called molly-guards. We will discuss them below.
NOTE:
There are several ways to implement this idea. The simplest is an alias:
alias reboot='/root/bin/safereboot'
where safereboot is a simple script that is invoked instead of the original command, something like the following:
#!/bin/bash
#: safe_reboot: verify the name of the server before reboot
#: Nikolai Bezroukov, 2013-2017. Released under Artistic license
#: Version 3.5 (October, 2017)
#:
#: Invocation:
#:    safereboot [short name of the server]
#
DEBUG=0
SHUTDOWN=$(command -v shutdown)
(( DEBUG > 0 )) && SHUTDOWN='echo shutdown'   # debug mode only prints the command
HOST=$(hostname -s)

if [[ -f /root/noreboot ]] ; then
   echo "Reboot of this server is currently prohibited. Please remove the file /root/noreboot to continue..."
   exit 255
fi

function verify_name {
   # $HOSTNAME (set by bash) may be the full name; show it only if it differs
   if [[ "$HOSTNAME" != "$HOST" ]] ; then
      echo "The current server is $HOST (full name $HOSTNAME)"
      echo "Please enter the short name if this is the server you intend to reboot: "
   else
      echo "The current server is $HOST"
      echo "Please enter this name, if this is the server you intend to reboot: "
   fi
   read -r answer
   if [[ "$answer" != "$HOST" ]]; then
      echo "Wrong answer $answer. Reboot cancelled..."
      exit 255
   fi
   return 0
}

function rebootme {
   # number of logged-in sessions that do not belong to root
   users=$(who | grep -v root | wc -l)
   if (( users == 0 )) ; then
      $SHUTDOWN -r now
   else
      echo "ATTENTION: There are $users user sessions on this system"
      who
      echo "You have 1 min to change your mind... Use shutdown -c to cancel"
      $SHUTDOWN -r +1
   fi
}

# The correct short name passed as the first argument skips the prompt
if [[ "$1" = "$HOST" ]] ; then
   rebootme
   exit 0
fi
verify_name && rebootme
exit 0
This can be done in five minutes and is definitely better than no protection at all.
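Keep in mind that an alias works only in interactive shells that read the file defining it, and that typing \reboot or /sbin/reboot bypasses it. Below is a minimal sketch, assuming bash and that the script lives at /root/bin/safereboot as above, of adding the alias to a startup file such as /root/.bashrc exactly once; the function name add_reboot_alias is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the safereboot alias to a shell startup file
# (for example /root/.bashrc) unless an identical line is already present.
add_reboot_alias() {
    local rcfile=$1
    local line="alias reboot='/root/bin/safereboot'"
    touch "$rcfile"                          # create the file if it is missing
    # -x: match the whole line, -F: treat the pattern as a fixed string
    if ! grep -qxF "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}
```

For example, run add_reboot_alias /root/.bashrc, then open a new root shell and check the result with type reboot.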
... /usr/bin are aliases to consolehelper to control access to these commands through PAM.
The name is interesting and has historical significance. It was originally used for the plexiglass covers improvised for the BRS (Big Red Switch) on an IBM 4341 mainframe after a programmer's toddler daughter (named Molly) tripped it twice in one day. Later it was generalized to covers over stop/reset switches on disk drives and networking equipment.
Ubuntu and Debian have a shell script called molly-guard which guards against accidental shutdowns and reboots (a port exists for RHEL 5 and RHEL 6, but not for RHEL 7). It uses an interesting trick: if /usr/sbin comes before /sbin in your PATH, your own script in /usr/sbin is invoked instead of the system executable when a command like reboot is submitted. This works only on systems where those two are separate directories; RHEL 7 need not apply, since there /sbin is a symlink to /usr/sbin ;-). The script was written in 2008 by Martin F. Krafft and has not changed much since then.
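Which file runs is decided purely by PATH order, and this can be checked without touching the real commands. A minimal sketch, assuming bash; the two temporary directories stand in for /usr/sbin (the wrapper) and /sbin (the real command), and the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Print the path the shell would execute for command $1 when PATH=$2.
# The subshell keeps the caller's PATH (and command hash table) untouched.
resolve_with_path() {
    ( PATH=$2; command -v "$1" )
}

# Create stand-ins for /usr/sbin and /sbin under $1, each holding a harmless
# 'reboot' stub that only echoes which copy it is (nothing is rebooted).
make_stub_dirs() {
    local base=$1
    mkdir -p "$base/usr_sbin" "$base/sbin"
    printf '#!/bin/sh\necho wrapper\n' > "$base/usr_sbin/reboot"
    printf '#!/bin/sh\necho real\n'    > "$base/sbin/reboot"
    chmod +x "$base/usr_sbin/reboot" "$base/sbin/reboot"
}
```

On a live system, bash's type -a reboot lists every alias, function and file named reboot in the order the shell would try them.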
NAME
molly-guard - guard against accidental shutdowns/reboots
SYNOPSIS
shutdown [-hV] [--molly-guard-do-nothing] [-- script_options]
halt [-hV] [--molly-guard-do-nothing] [-- script_options]
reboot [-hV] [--molly-guard-do-nothing] [-- script_options]
poweroff [-hV] [--molly-guard-do-nothing] [-- script_options]
DESCRIPTION
molly-guard attempts to prevent you from accidentally shutting down or rebooting machines. It does this by injecting a couple of checks before the existing commands: halt, reboot, shutdown, and poweroff. This happens via scripts with the same names in /usr/sbin, so it only works if you have /usr/sbin before /sbin in your PATH!
Before molly-guard invokes the real command, all scripts in /etc/molly-guard/run.d/ have to run and exit successfully; else, it aborts the command. run-parts(1) is used to process the directory. molly-guard passes any script_options to the scripts, and also populates the environment with the following variables:
· MOLLYGUARD_CMD - the actual command invoked by the user.
· MOLLYGUARD_DO_NOTHING - set to 1 if this is a demo-run.
· MOLLYGUARD_SETTINGS - the path to a shell script snippet which scripts can source to obtain settings.
molly-guard prints the contents of /etc/molly-guard/messages.d/COMMAND or /etc/molly-guard/messages.d/default to the console, if either exists. This is due to /etc/molly-guard/run.d/10-print-message.
GUARDING SSH SESSIONS
molly-guard was primarily designed to shield SSH connections. This functionality (which should arguably be provided by the openssh-server package) is implemented in /etc/molly-guard/run.d/30-query-hostname. This script first tests whether the command is being executed from a tty which has been created by sshd. It also checks whether the variable SSH_CONNECTION is defined. If any of these tests are successful, the script queries the user for the machine's hostname, which should be sufficient to prevent the user from doing something by accident.
You can pass the --pretend-ssh script option to molly-guard to pretend that those tests succeed. Alternatively, setting ALWAYS_QUERY_HOSTNAME in /etc/molly-guard/rc causes the script to always query.
The following situations are still UNGUARDED. If you can think of ways to protect against those, please let me know!
· running sudo within screen or screen within sudo; sudo eats the SSH_CONNECTION variable, and screen creates a new pty.
· executing those commands in a remote terminal window, that is, an XTerm started on a remote machine but displaying on the local X server.
You have been warned.
You can use the --molly-guard-do-nothing switch to prevent anything from happening, e.g. halt --molly-guard-do-nothing.
OPTIONS
--molly-guard-do-nothing
    Cause molly-guard to print the command which would be executed, after processing all scripts, instead of executing it.
-h, --help
    Display usage information.
-V, --version
    Display version information.
SEE ALSO
shutdown(8), halt(1), reboot(8), poweroff(8).
LEGALESE
molly-guard is copyright by martin f. krafft. Andrew Ruthven came up with the idea of using the scripts directory and submitted a patch, which I modified a bit. This manual page was written by martin f. krafft. Permission is granted to copy, distribute and/or modify this document under the terms of the Artistic License 2.0
COPYRIGHT
Copyright © 2008 martin f. krafft
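The run.d logic described in the man page (run every check in order, abort on the first non-zero exit) is small enough to sketch in plain bash. This is an illustration under stated assumptions, not molly-guard's code: run_guard_scripts is a hypothetical name, it does not apply run-parts' file-name filtering, and it relies on glob expansion being sorted, which for names like 10-print-message and 30-query-hostname gives the same order as run-parts:

```shell
#!/usr/bin/env bash
# Run every executable file in directory $1 in sorted order on behalf of
# command $2; stop at the first check that fails and return its exit code.
run_guard_scripts() {
    local dir=$1 cmd=$2 script ret
    # Mirror two of the variables molly-guard exports to its checks
    export MOLLYGUARD_CMD=$cmd MOLLYGUARD_DO_NOTHING=${MOLLYGUARD_DO_NOTHING:-0}
    for script in "$dir"/*; do
        # skip non-files, non-executables and an unmatched glob
        [ -f "$script" ] && [ -x "$script" ] || continue
        ret=0
        "$script" || ret=$?
        if [ "$ret" -ne 0 ]; then
            echo "W: aborting $cmd due to ${script##*/} exiting with code $ret." >&2
            return "$ret"
        fi
    done
    return 0
}
```

A caller would only invoke the real command when run_guard_scripts returns 0.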
The script itself is pretty short: just 126 lines of shell code. It uses only one important "rules file", which definitely could be greatly simplified to just asking for the server name in all cases.
#!/bin/sh
#
# shutdown -- wrapper script to guard against accidental shutdowns
#
# Copyright © martin f. krafft
# Released under the terms of the Artistic Licence 2.0
#
set -eu

ME=molly-guard
VERSION=0.4

SCRIPTSDIR="@cfgdir@/run.d"

CMD="${0##*/}"
EXEC="@REALPATH@/$CMD"

case "$CMD" in
  halt|reboot|shutdown|poweroff|coldreboot|pm-hibernate|pm-suspend|pm-suspend-hybrid)
    if [ ! -f $EXEC ]; then
      echo "E: not a regular file: $EXEC" >&2
      exit 4
    fi
    if [ ! -x $EXEC ]; then
      echo "E: not an executable: $EXEC" >&2
      exit 3
    fi
    ;;
  *)
    echo "E: unsupported command: $CMD" >&2
    exit 1
    ;;
esac

usage()
{
  cat <<-_eousage
Usage: $ME [options] [-- script options]
       (shielding $EXEC)

molly-guard's primary goal is to guard against accidental
shutdowns/reboots. $ME will run all scripts in $SCRIPTSDIR and only
invokes $EXEC if all scripts exited successfully.

Specifying --molly-guard-do-nothing as argument to the command will
make $ME echo the command it would execute rather than actually
executing it.

Options following the double hyphen will be passed unchanged to the
scripts.

Please see molly-guard(8) for more information.

The actual command's help output follows:

_eousage
}

CMDARGS=
SCRIPTARGS=
END_OF_ARGS=0
DO_NOTHING=0
for arg in "$@"; do
  case "$arg" in
    (*-molly-guard-do-nothing) DO_NOTHING=1;;
    (*-help)
      usage 2>&1
      eval $EXEC --help 2>&1
      exit 0
      ;;
    --) END_OF_ARGS=1;;
    *\"*)
      echo 'E: cannot use double-quotes (") in arguments' >&2
      exit 1
      ;;
    *)
      if [ $END_OF_ARGS -eq 0 ]; then
        CMDARGS="${CMDARGS:+$CMDARGS }\"$arg\""
      else
        SCRIPTARGS="${SCRIPTARGS:+$SCRIPTARGS }--arg \"$arg\""
      fi
      ;;
  esac
done

do_real_cmd()
{
  if [ $DO_NOTHING -eq 1 ]; then
    echo "$ME: would run: $EXEC $CMDARGS"
    exit 0
  else
    eval exec $EXEC "$CMDARGS"
  fi
}

if [ $DO_NOTHING -eq 1 ]; then
  echo "I: demo mode; $ME will not do anything due to --molly-guard-do-nothing." >&2
fi

if [ -n "${MOLLYGUARD_CMD:-}" ]; then
  do_real_cmd
fi

MOLLYGUARD_CMD=$CMD; export MOLLYGUARD_CMD
MOLLYGUARD_DO_NOTHING=$DO_NOTHING; export MOLLYGUARD_DO_NOTHING
MOLLYGUARD_SETTINGS="@cfgdir@/rc"; export MOLLYGUARD_SETTINGS

# pass through certain commands
case "$CMD $CMDARGS" in
  (*shutdown\ *-c*|*halt\ *-w*|*halt\ *-f*|*reboot\ *-f*)
    # allow canceling shutdowns, only write wtmp and force immediate halt
    echo "I: executing $CMD $CMDARGS regardless of check results." >&2
    do_real_cmd
    ;;
esac

for script in $(run-parts --test $SCRIPTSDIR); do
  ret=0
  eval $script $SCRIPTARGS || ret=$?
  if [ $ret -ne 0 ]; then
    echo "W: aborting $CMD due to ${script##*/} exiting with code $ret." >&2
    exit $ret
  fi
done

do_real_cmd
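The pass-through case near the end of the wrapper decides which invocations skip the checks entirely. A sketch that extracts that rule into a testable bash function (is_passthrough is a hypothetical name; the patterns are copied from the script above, and the argument quoting mimics how the wrapper builds CMDARGS):

```shell
#!/usr/bin/env bash
# Succeeds if "command args..." would bypass the checks: cancelling a pending
# shutdown, halt -w (only write wtmp), or a forced halt/reboot.
is_passthrough() {
    local cmd=$1; shift
    # Rebuild the "$CMD $CMDARGS" string: each argument wrapped in double quotes
    local line="$cmd" arg
    for arg in "$@"; do
        line="$line \"$arg\""
    done
    case "$line" in
        (*shutdown\ *-c*|*halt\ *-w*|*halt\ *-f*|*reboot\ *-f*) return 0 ;;
    esac
    return 1
}
```

Because these are substring matches on the whole command line, any argument that merely contains -c, such as a wall message like "see -comments", would also let shutdown through unchecked.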
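Analysis of the process tree can be performed with pstree from the psmisc package (psmisc.x86_64), which is available from the standard RHEL repositories, or by compiling it from ftp://ftp.thp.uni-duisburg.de/pub/source/pstree.tar.gz. The rules script that queries the hostname (installed as /etc/molly-guard/run.d/30-query-hostname) walks the process tree via /proc instead: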
#!/bin/sh
#
# 30-ask-hostname - request the user to type in the hostname of the local host
#
# Copyright © 2006-2009 martin f. krafft
# Copyright © 2012 Ludovico Gardenghi
# Copyright © 2014 Josh Triplett
# Copyright © 2015 Francois Marier
# Copyright © 2017 Simó Albert i Beltran
# Released under the terms of the Artistic Licence 2.0
#
set -eu

ME=molly-guard

# Walk up the process tree until PID 1 is reached or a process with 'sshd' in
# its /proc/<pid>/cmdline is met. Return success if such a process is found.
is_child_of_sshd_or_mosh_server() {
  pid=$$
  ppid=$PPID
  # Be a bit paranoid with the guard, should some horribly broken system
  # provide a strange process hierarchy. '[ $pid -ne 1 ]' should be enough for
  # sane systems.
  [ -z "$pid" ] || [ -z "$ppid" ] && return 2
  while [ $pid -gt 1 ] && [ $pid -ne $ppid ]; do
    if egrep -q 'sshd|mosh-server' /proc/$ppid/cmdline; then
      return 0
    fi
    pid=$ppid
    ppid=$(grep ^PPid: /proc/$pid/status | tr -dc 0-9)
  done
  return 1
}

[ -f "$MOLLYGUARD_SETTINGS" ] && . "$MOLLYGUARD_SETTINGS"

PRETEND_SSH=0
for arg in "$@"; do
  case "$arg" in
    (*-pretend-ssh) PRETEND_SSH=1;;
  esac
done

# require an interactive terminal connected to stdin
test -t 0 || exit 0

# we've been asked to always protect this host
case "${ALWAYS_QUERY_HOSTNAME:-0}" in
  0|false|False|no|No|off|Off)
    # only run if we are being called over SSH, that is if the current terminal
    # was created by sshd.
    command -v tty >/dev/null 2>&1 || exit 0
    PTS=$(tty)
    if ! pgrep -f "^sshd.+${PTS#/dev/}\>" >/dev/null \
      && [ -z "${SSH_CONNECTION:-}" ] \
      && ! is_child_of_sshd_or_mosh_server; then
        if [ $PRETEND_SSH -eq 1 ]; then
          echo "I: $ME: this is not an SSH session, but --pretend-ssh was given..." >&2
        else
          exit 0
        fi
    else
      echo "W: $ME: SSH session detected!" >&2
    fi
    ;;
  *)
    echo "I: $ME: $MOLLYGUARD_CMD is always molly-guarded on this system." >&2
    ;;
esac

case "${USE_FQDN:-0}" in
  0|false|False|no|No|off|Off)
    HOSTNAME="$(hostname --short)"
    ;;
  *)
    HOSTNAME="$(hostname --fqdn)"
    ;;
esac

sigh()
{
  echo "Good thing I asked; I won't $MOLLYGUARD_CMD $HOSTNAME ..." >&2
  exit 1
}

trap 'echo;sigh' 1 2 3 9 10 12 15

echo -n "Please type in hostname of the machine to $MOLLYGUARD_CMD: "
read HOSTNAME_USER || :

HOSTNAME="$(echo "$HOSTNAME" | tr '[:upper:]' '[:lower:]')"
HOSTNAME_USER="$(echo "$HOSTNAME_USER" | tr '[:upper:]' '[:lower:]')"

[ "$HOSTNAME_USER" = "$HOSTNAME" ] || sigh

trap - 1 2 3 9 10 12 15

exit 0
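Stripped of the pty and process-tree checks, the core of this rules script is an environment test plus a case-insensitive name comparison. A simplified sketch, assuming bash; looks_like_ssh and hostname_matches are hypothetical names, checking SSH_TTY is an addition not present in the original script, and unlike the original an empty answer never matches:

```shell
#!/usr/bin/env bash
# Succeeds if the environment looks like an SSH login: sshd sets
# SSH_CONNECTION, and SSH_TTY when a pty was allocated. As the man page warns,
# sudo may strip these variables, so this alone is not a complete test.
looks_like_ssh() {
    [ -n "${SSH_CONNECTION:-}" ] || [ -n "${SSH_TTY:-}" ]
}

# Succeeds if answer $2 equals expected host name $1, ignoring case
# (the real script lowercases both sides with tr '[:upper:]' '[:lower:]').
hostname_matches() {
    local expected answer
    expected=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    answer=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')
    [ -n "$answer" ] && [ "$answer" = "$expected" ]
}
```

A guard script could combine them as: looks_like_ssh && { read -r reply; hostname_matches "$(hostname -s)" "$reply" || exit 1; }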
Sep 21, 2015 | bris.ac.uk
Since I was looking at this already and had a few things to investigate and fix in our systemd-using hosts, I checked how plausible it is to insert a molly-guard-like password prompt as part of the reboot/shutdown process on CentOS 7 (i.e. using systemd).
Problems encountered include:
- Asking for a password from a service/unit in systemd: use systemd-ask-password, which needs some agent set up to reply to it correctly?
- The reboot command always walls a message to all logged-in users before it even runs the new reboot-molly unit, as it expects a reboot to happen. The argument --no-wall stops this, but that requires a change to the reboot command. Hence we are back to the original problem of replacing packaged files/symlinks with RPM.
- The reboot.target unit is a "systemd.special" unit, which means that it has some special behaviour and cannot be renamed. We can modify it, of course, by editing the reboot.target file.
- How do we get a systemd unit to run first and block anything later from running until it is complete? (In fact, to abort the reboot, but just this time rather than being set as permanently failed. Reboot failing is a bit of a strange situation for it to be in.) The dependencies appear to work, but the reboot target is quite keen on running other items from the dependency list. I'm more than likely doing something wrong here!
So for now this is shelved. It would be nice to have a solution though, so any hints from systemd experts are gratefully received!
(Note that CentOS 7 uses systemd 208, so new features in later versions that would help won't be available to us.)
This entry was posted by dg12158.
Nov 28, 2009 | www.ubuntugeek.com
molly-guard installs a shell script that overrides the existing shutdown/reboot/halt/poweroff commands and first runs a set of scripts, which all have to exit successfully, before molly-guard invokes the real command. One of the scripts checks for existing SSH sessions. If any of the four commands is called interactively over an SSH session, the shell script prompts you to enter the name of the host you wish to shut down. This should adequately prevent accidental shutdowns and reboots.
This shell script passes through the commands to the respective binaries in /sbin and should thus not get in the way if called non-interactively, or locally.
The tool is basically a replacement for halt, reboot and shutdown to prevent such accidents.
Install molly-guard in Ubuntu
sudo apt-get install molly-guard
Now that it's installed, try it out (on a non-production box). Here you can see it save me from rebooting the box Ubuntu-test:
Ubuntu-test:~$ sudo reboot
W: molly-guard: SSH session detected!
Please type in hostname of the machine to reboot: ruchi
Good thing I asked; I won't reboot Ubuntu-test ...
W: aborting reboot due to 30-query-hostname exiting with code 1.
Ubuntu-Test:~$
By default you're only protected in sessions that look like SSH sessions (have $SSH_CONNECTION set). If, like us, you use a lot of virtual machines and RILOE cards, edit /etc/molly-guard/rc and uncomment ALWAYS_QUERY_HOSTNAME=true. Now you should be prompted in any interactive session.
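Because /etc/molly-guard/rc is simply sourced by the rules script, and any value other than 0/false/no/off counts as enabled, the change can be scripted. A sketch assuming bash, GNU sed, and that the shipped file contains a commented-out line such as # ALWAYS_QUERY_HOSTNAME=true (check your copy first); enable_always_query is a hypothetical name:

```shell
#!/usr/bin/env bash
# Enable ALWAYS_QUERY_HOSTNAME=true in a molly-guard rc file ($1):
# uncomment it if present as a comment, append it if missing, else do nothing.
enable_always_query() {
    local rc=$1
    if grep -q '^ALWAYS_QUERY_HOSTNAME=true' "$rc"; then
        return 0    # already enabled
    elif grep -q '^#[[:space:]]*ALWAYS_QUERY_HOSTNAME=true' "$rc"; then
        # drop the leading "#" and any blanks after it (GNU sed -i edits in place)
        sed -i 's/^#[[:space:]]*\(ALWAYS_QUERY_HOSTNAME=true\)/\1/' "$rc"
    else
        printf '%s\n' 'ALWAYS_QUERY_HOSTNAME=true' >> "$rc"
    fi
}
```

For example: enable_always_query /etc/molly-guard/rc (as root).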
Oct 23, 2017 | matoski.com
... rushing to leave and was still logged into a server. I wanted to shut down my laptop, but I didn't notice that I was still connected to the remote server. Luckily, before pressing Enter I noticed that I was not on my machine but on a remote server. So I thought there should be a very easy way to prevent this from happening again, to me or to anyone else. The first thing we need to do is create a new bash script at /usr/local/bin/confirm with the contents below and with execute permissions:
#!/usr/bin/env bash
# Ask for confirmation, then run the command given as arguments ($1 plus its options)
echo "Before proceeding to perform $1, please ensure you have approval to perform this task"
echo -n "Would you like to proceed y/n? "
read -r reply
if [ "$reply" = y ] || [ "$reply" = Y ]
then
    "$@"
else
    echo "$* cancelled"
fi
Now the only thing left to do is to set up the aliases so they go through this command to confirm instead of calling the command directly.
So I created the following files:
/etc/profile.d/confirm-shutdown.sh
alias shutdown="/usr/local/bin/confirm /sbin/shutdown"
/etc/profile.d/confirm-reboot.sh
alias reboot="/usr/local/bin/confirm /sbin/reboot"
Now when I actually try to do a shutdown/reboot, it prompts me like so:
ilijamt@x1 ~ $ reboot
Before proceeding to perform /sbin/reboot, please ensure you have approval to perform this task
Would you like to proceed y/n? n
/sbin/reboot cancelled
Dec 04, 2012 | serverfault.com
My company makes an embedded Debian Linux device that boots from an ext3 partition on an internal SSD drive. Because the device is an embedded "black box", it is usually shut down the rude way, by simply cutting power to the device via an external switch.
This is normally okay, as ext3's journalling keeps things in order, so other than the occasional loss of part of a log file, things keep chugging along fine.
However, we've recently seen a number of units where after a number of hard-power-cycles the ext3 partition starts to develop structural issues -- in particular, we run e2fsck on the ext3 partition and it finds a number of issues like those shown in the output listing at the bottom of this Question. Running e2fsck until it stops reporting errors (or reformatting the partition) clears the issues.
My question is... what are the implications of seeing problems like this on an ext3/SSD system that has been subjected to lots of sudden/unexpected shutdowns?
My feeling is that this might be a sign of a software or hardware problem in our system, since my understanding is that (barring a bug or hardware problem) ext3's journalling feature is supposed to prevent these sorts of filesystem-integrity errors. (Note: I understand that user-data is not journalled and so munged/missing/truncated user-files can happen; I'm specifically talking here about filesystem-metadata errors like those shown below)
My co-worker, on the other hand, says that this is known/expected behavior because SSD controllers sometimes re-order write commands and that can cause the ext3 journal to get confused. In particular, he believes that even given normally functioning hardware and bug-free software, the ext3 journal only makes filesystem corruption less likely, not impossible, so we should not be surprised to see problems like this from time to time.
Which of us is right?
Embedded-PC-failsafe:~# ls
Embedded-PC-failsafe:~# umount /mnt/unionfs
Embedded-PC-failsafe:~# e2fsck /dev/sda3
e2fsck 1.41.3 (12-Oct-2008)
embeddedrootwrite contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Invalid inode number for '.' in directory inode 46948. Fix<y>? yes
Directory inode 46948, block 0, offset 12: directory corrupted
Salvage<y>? yes
Entry 'status_2012-11-26_14h13m41.csv' in /var/log/status_logs (46956) has deleted/unused inode 47075. Clear<y>? yes
Entry 'status_2012-11-26_10h42m58.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47076. Clear<y>? yes
Entry 'status_2012-11-26_11h29m41.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47080. Clear<y>? yes
Entry 'status_2012-11-26_11h42m13.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47081. Clear<y>? yes
Entry 'status_2012-11-26_12h07m17.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47083. Clear<y>? yes
Entry 'status_2012-11-26_12h14m53.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47085. Clear<y>? yes
Entry 'status_2012-11-26_15h06m49.csv' in /var/log/status_logs (46956) has deleted/unused inode 47088. Clear<y>? yes
Entry 'status_2012-11-20_14h50m09.csv' in /var/log/status_logs (46956) has deleted/unused inode 47073. Clear<y>? yes
Entry 'status_2012-11-20_14h55m32.csv' in /var/log/status_logs (46956) has deleted/unused inode 47074. Clear<y>? yes
Entry 'status_2012-11-26_11h04m36.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47078. Clear<y>? yes
Entry 'status_2012-11-26_11h54m45.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47082. Clear<y>? yes
Entry 'status_2012-11-26_12h12m20.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47084. Clear<y>? yes
Entry 'status_2012-11-26_12h33m52.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47086. Clear<y>? yes
Entry 'status_2012-11-26_10h51m59.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47077. Clear<y>? yes
Entry 'status_2012-11-26_11h17m09.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47079. Clear<y>? yes
Entry 'status_2012-11-26_12h54m11.csv.gz' in /var/log/status_logs (46956) has deleted/unused inode 47087. Clear<y>? yes
Pass 3: Checking directory connectivity
'..' in /etc/network/run (46948) is <The NULL inode> (0), should be /etc/network (46953). Fix<y>? yes
Couldn't fix parent of inode 46948: Couldn't find parent directory entry
Pass 4: Checking reference counts
Unattached inode 46945
Connect to /lost+found<y>? yes
Inode 46945 ref count is 2, should be 1. Fix<y>? yes
Inode 46953 ref count is 5, should be 4. Fix<y>? yes
Pass 5: Checking group summary information
Block bitmap differences: -(208264--208266) -(210062--210068) -(211343--211491) -(213241--213250) -(213344--213393) -213397 -(213457--213463) -(213516--213521) -(213628--213655) -(213683--213688) -(213709--213728) -(215265--215300) -(215346--215365) -(221541--221551) -(221696--221704) -227517
Fix<y>? yes
Free blocks count wrong for group #6 (17247, counted=17611).
Fix<y>? yes
Free blocks count wrong (161691, counted=162055).
Fix<y>? yes
Inode bitmap differences: +(47089--47090) +47093 +47095 +(47097--47099) +(47101--47104) -(47219--47220) -47222 -47224 -47228 -47231 -(47347--47348) -47350 -47352 -47356 -47359 -(47457--47488) -47985 -47996 -(47999--48000) -48017 -(48027--48028) -(48030--48032) -48049 -(48059--48060) -(48062--48064) -48081 -(48091--48092) -(48094--48096)
Fix<y>? yes
Free inodes count wrong for group #6 (7608, counted=7624).
Fix<y>? yes
Free inodes count wrong (61919, counted=61935).
Fix<y>? yes
embeddedrootwrite: ***** FILE SYSTEM WAS MODIFIED *****
embeddedrootwrite: ********** WARNING: Filesystem still has errors **********
embeddedrootwrite: 657/62592 files (24.4% non-contiguous), 87882/249937 blocks
Embedded-PC-failsafe:~#
Embedded-PC-failsafe:~# e2fsck /dev/sda3
e2fsck 1.41.3 (12-Oct-2008)
embeddedrootwrite contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Directory entry for '.' in ... (46948) is big.
Split<y>? yes
Missing '..' in directory inode 46948.
Fix<y>? yes
Setting filetype for entry '..' in ... (46948) to 2.
Pass 3: Checking directory connectivity
'..' in /etc/network/run (46948) is <The NULL inode> (0), should be /etc/network (46953). Fix<y>? yes
Pass 4: Checking reference counts
Inode 2 ref count is 12, should be 13. Fix<y>? yes
Pass 5: Checking group summary information
embeddedrootwrite: ***** FILE SYSTEM WAS MODIFIED *****
embeddedrootwrite: 657/62592 files (24.4% non-contiguous), 87882/249937 blocks
Embedded-PC-failsafe:~#
Embedded-PC-failsafe:~# e2fsck /dev/sda3
e2fsck 1.41.3 (12-Oct-2008)
embeddedrootwrite: clean, 657/62592 files, 87882/249937 blocks
Asked Dec 4 '12 by Jeremy Friesner.
- Have you all thought of changing to ext4 or ZFS? – mdpc Dec 4 '12 at 2:14
- I've thought about changing to ext4, at least... would that help address this issue? Would ZFS be better still? – Jeremy Friesner Dec 4 '12 at 2:17
- Neither option would fix this. We still use devices with supercapacitors in ZFS, and battery or flash-protected cache is recommended for ext4 in server applications. – ewwhite Dec 4 '12 at 2:54
Answer by ewwhite (Dec 4 '12):
You're both wrong (maybe?)... ext3 is coping the best it can with having its underlying storage removed so abruptly.
Your SSD probably has some type of onboard cache. You don't mention the make/model of SSD in use, but this sounds like a consumer-level SSD versus an enterprise or industrial-grade model.
Either way, the cache is used to help coalesce writes and prolong the life of the drive. If there are writes in transit, the sudden loss of power is definitely the source of your corruption. True enterprise and industrial SSDs have supercapacitors that maintain power long enough to move data from cache to nonvolatile storage, much in the same way battery-backed and flash-backed RAID controller caches work.
If your drive doesn't have a supercap, the in-flight transactions are being lost, hence the filesystem corruption. ext3 is probably being told that everything is on stable storage, but that's just a function of the cache.
- Sorry to you and everyone who upvoted this, but you're just wrong. Handling the loss of in-progress writes is exactly what the journal is for, and as long as the drive correctly reports whether it has a write cache and obeys commands to flush it, the journal guarantees that the metadata will not be inconsistent. You only need a supercap or battery-backed RAID cache so you can enable write cache without having to enable barriers, which sacrifices some performance to maintain data correctness. – psusi Dec 5 '12 at 19:12
- @psusi The SSD in use probably has cache explicitly enabled or relies on an internal buffer regardless of the OS-level setting. The data in that cache is what a supercapacitor-enabled SSD would protect. – ewwhite Dec 5 '12 at 19:30
- The data in the cache doesn't need protecting if you enable IO barriers. Most consumer-type drives ship with write caching disabled by default and you have to enable it if you want it, exactly because it causes corruption issues if the OS is not careful. – psusi Dec 5 '12 at 19:35
- @pusi Old now, but you mention this: "as long as the drive correctly reports whether it has a write cache and obeys commands to flush it, the journal guarantees that the metadata will not be inconsistent." That's the thing: because of storage controllers that tend to assume older disks, SSDs will sometimes lie about whether data is flushed. You do need that supercap. – Joel Coel Aug 9 '15 at 22:01
Answer by psusi (Dec 5 '12):
You are right and your coworker is wrong. Barring something going wrong, the journal makes sure you never have inconsistent fs metadata. You might check with hdparm to see if the drive's write cache is enabled. If it is, and you have not enabled IO barriers (off by default on ext3, on by default in ext4), then that would be the cause of the problem.
The barriers are needed to force the drive write cache to flush at the correct time to maintain consistency, but some drives are badly behaved and either report that their write cache is disabled when it is not, or silently ignore the flush commands. This prevents the journal from doing its job.
- -1 for reading-comprehension... – ewwhite Dec 5 '12 at 19:34
- @ewwhite, maybe you should try reading, and actually writing a useful response instead of this childish insult. – psusi Dec 5 '12 at 19:36
- +1 This answer could probably be improved, as could any other answer in any Q&A, but at least it provides some light and hints. @downvoters: improve the answer yourselves, or comment on possible flaws, but downvoting this answer without proper justification is just disgusting! – Alberto Dec 6 '12 at 21:44
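Following psusi's suggestion, the drive side can be checked with hdparm -W /dev/sdX, which reports whether write caching is on, and the filesystem side by inspecting mount options. A sketch for the latter, assuming bash and awk; the function name is hypothetical, and it assumes that ext3 lists barrier=1 among its options only when barriers are enabled, while ext4 shows nobarrier or barrier=0 when they are disabled. Defaults have changed across kernel versions, so verify this against your kernel before relying on it:

```shell
#!/usr/bin/env bash
# Print mount points of ext3/ext4 filesystems whose options suggest that
# write barriers are off. Input is a /proc/mounts-style file
# (device mountpoint type options dump pass); defaults to /proc/mounts.
mounts_without_barrier() {
    local mounts=${1:-/proc/mounts}
    awk '
        { opts = "," $4 "," }   # pad with commas so whole options can be matched
        $3 == "ext3" && opts !~ /,barrier=1,/                          { print $2 }
        $3 == "ext4" && (opts ~ /,nobarrier,/ || opts ~ /,barrier=0,/) { print $2 }
    ' "$mounts"
}
```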
Oct 15, 2013 | www.linuxquestions.org
katmai90210
hi guys,
I have a problem. Yesterday there was a power outage at one of my datacenters, where I have a relatively large fileserver: 2 arrays, 1 x 14 TB and 1 x 18 TB, both in RAID 6, with a 3ware card.
After the outage, the server came back online, the XFS partitions were mounted, and everything looked okay. I could access the data and everything seemed just fine.
Today I woke up to lots of I/O errors, and when I rebooted the server, the partitions would not mount:
Oct 14 04:09:17 kp4 kernel:
Oct 14 04:09:17 kp4 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN a<ffffffff80056933>] pdflush+0x0/0x1fb
Oct 14 04:09:17 kp4 kernel: [<ffffffff80056a84>] pdflush+0x151/0x1fb
Oct 14 04:09:17 kp4 kernel: [<ffffffff800cd931>] wb_kupdate+0x0/0x16a
Oct 14 04:09:17 kp4 kernel: [<ffffffff80032c2b>] kthread+0xfe/0x132
Oct 14 04:09:17 kp4 kernel: [<ffffffff8005dfc1>] child_rip+0xa/0x11
Oct 14 04:09:17 kp4 kernel: [<ffffffff800a3ab7>] keventd_create_kthread+0x0/0xc4
Oct 14 04:09:17 kp4 kernel: [<ffffffff80032b2d>] kthread+0x0/0x132
Oct 14 04:09:17 kp4 kernel: [<ffffffff8005dfb7>] child_rip+0x0/0x11
Oct 14 04:09:17 kp4 kernel:
Oct 14 04:09:17 kp4 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN at line 279 of file fs/xfs/xfs_alloc.c. Caller 0xffffffff88342331
Oct 14 04:09:17 kp4 kernel:
Got a bunch of these in dmesg.
The array is fine:
[root@kp4 ~]# tw_cli
//kp4> focus c6
//kp4/c6> show
Unit  UnitType  Status  %RCmpl  %V/I/M  Stripe  Size(GB)  Cache  AVrfy
------------------------------------------------------------------------------
u0    RAID-6    OK      -       -       256K    13969.8   RiW    ON
u1    RAID-6    OK      -       -       256K    16763.7   RiW    ON

VPort  Status  Unit  Size     Type  Phy  Encl-Slot  Model
------------------------------------------------------------------------------
p0     OK      u1    2.73 TB  SATA  0    -          Hitachi HDS723030AL
p1     OK      u1    2.73 TB  SATA  1    -          Hitachi HDS723030AL
p2     OK      u1    2.73 TB  SATA  2    -          Hitachi HDS723030AL
p3     OK      u1    2.73 TB  SATA  3    -          Hitachi HDS723030AL
p4     OK      u1    2.73 TB  SATA  4    -          Hitachi HDS723030AL
p5     OK      u1    2.73 TB  SATA  5    -          Hitachi HDS723030AL
p6     OK      u1    2.73 TB  SATA  6    -          Hitachi HDS723030AL
p7     OK      u1    2.73 TB  SATA  7    -          Hitachi HDS723030AL
p8     OK      u0    2.73 TB  SATA  8    -          Hitachi HDS723030AL
p9     OK      u0    2.73 TB  SATA  9    -          Hitachi HDS723030AL
p10    OK      u0    2.73 TB  SATA  10   -          Hitachi HDS723030AL
p11    OK      u0    2.73 TB  SATA  11   -          Hitachi HDS723030AL
p12    OK      u0    2.73 TB  SATA  12   -          Hitachi HDS723030AL
p13    OK      u0    2.73 TB  SATA  13   -          Hitachi HDS723030AL
p14    OK      u0    2.73 TB  SATA  14   -          Hitachi HDS723030AL

Name  OnlineState  BBUReady  Status  Volt  Temp  Hours  LastCapTest
---------------------------------------------------------------------------
bbu   On           Yes       OK      OK    OK    0      xx-xxx-xxxx

I googled for solutions and I think I jumped the horse by doing
xfs_repair -L /dev/sdc
It would not clean it with xfs_repair /dev/sdc, and everybody pretty much says the same thing.
This is what I was getting when trying to mount the array:
Filesystem Corruption of in-memory data detected. Shutting down filesystem xfs_check
Did I jump the gun by using the -L switch :/ ?
jefro
Here is the RH data on that.
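Regarding the -L question: xfs_repair(8) documents that a dirty log should first be replayed by mounting the filesystem, and that -L, which zeroes the log and can lose recent metadata changes, is a last resort when mounting fails. A sketch, assuming bash, that only prints that sequence in order of increasing risk; xfs_recovery_plan is a hypothetical name, the device and mount point are whatever the caller passes, and nothing is executed:

```shell
#!/usr/bin/env bash
# Print (never run) a conservative XFS recovery sequence for device $1
# and mount point $2, least destructive step first.
xfs_recovery_plan() {
    local dev=$1 mnt=$2
    cat <<EOF
mount $dev $mnt          # a successful mount replays the XFS log
umount $mnt
xfs_repair -n $dev       # no-modify mode: report problems only
xfs_repair $dev          # real repair; refuses to run while the log is dirty
xfs_repair -L $dev       # LAST RESORT: zeroes the log, recent metadata changes are lost
EOF
}
```

For example, xfs_recovery_plan /dev/sdc /mnt/data prints the five commands for review before anything is run by hand.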
Jan 29, 2019 | thwack.solarwinds.com
George Sutherland, Jul 8, 2015 9:58 AM (in response to RandyBrown):
Had a similar thing happen with an HVAC tech who confused the BLACK button that got pushed to exit the room with the RED button clearly marked EMERGENCY POWER OFF. A clear plastic cover was installed within 24 hours... after 3 hours of recovery!
PS... He told his boss that he did not do it.... the camera that focused on the door told a much different story. He was persona non grata at our site after that.
Jan 29, 2019 | thwack.solarwinds.com
sleeper_777 Jul 15, 2015 1:07 PM
Worked at a bank. 6" raised floor. Liebert cooling units on floor with all network equipment. Two units developed a water drain issue over a weekend.
About an hour into Monday morning, devices, servers and routers, in a domino effect, started shorting out and shutting down or blowing up, literally.
Opened the floor tiles to find three inches of water.
We did not have water alarms on the floor at the time.
Shortly after the incident, we did.
But the mistake was very costly and multiple 24 hour shifts of IT people made it a week of pure h3ll.
Jan 29, 2019 | thwack.solarwinds.com
aaronleet Jul 13, 2015 8:45 AM
In a former life, I had every server crash over the weekend when the facilities group took down the climate control and HVAC systems without warning.
Jan 29, 2019 | www.linuxquestions.org
07-01-2012, 12:56 PM # 1 damateem LQ Newbie
Unable to mount root file system after a power failure
We had a storm yesterday and the power dropped out, causing my Ubuntu server to shut off. Now, when booting, I get
[ 0.564310] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
It looks like file system corruption, but I'm having a hard time fixing the problem. I'm using Rescue Remix 12-04 to boot from USB and get access to the system.
Using
sudo fdisk -l
Shows the hard drive as
/dev/sda1: Linux
/dev/sda2: Extended
/dev/sda5: Linux LVM
Using
sudo lvdisplay
Shows LV Names as
/dev/server1/root
/dev/server1/swap_1
Using
sudo blkid
Shows types as
/dev/sda1: ext2
/dev/sda5: LVM2_member
/dev/mapper/server1-root: ext4
/dev/mapper/server1-swap_1: swap
I can mount sda1 and server1/root and all the files appear normal, although I'm not really sure what issues I should be looking for. On sda1, I see a grub folder and several other files. On root, I see the file system as it was before I started having trouble.
I've run the following fsck commands and none of them report any errors:
sudo fsck -f /dev/sda1
sudo fsck -f /dev/server1/root
sudo fsck.ext2 -f /dev/sda1
sudo fsck.ext4 -f /dev/server1/root
and I still get the same error when the system boots.
I've hit a brick wall.
What should I try next?
What can I look at to give me a better understanding of what the problem is?
Thanks,
David
07-02-2012, 05:58 AM # 2 syg00 LQ Veteran
Might depend a bit on what messages we aren't seeing. Normally I'd reckon that means that either the filesystem or disk controller support isn't available. But with something like Ubuntu you'd expect that to all be in place from the initrd. And that is on the /boot partition, and shouldn't be subject to update activity in a normal environment. Unless maybe you're real unlucky and an update was in flight.
Can you chroot into the server (disk) install and run from there successfully?
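A chroot from a rescue system usually needs the pseudo-filesystems bind-mounted first. A minimal bash sketch, assuming the damaged root is already mounted at /mnt (for example `mount /dev/server1/root /mnt` and `mount /dev/sda1 /mnt/boot`, device names taken from this thread); `chroot_into`, `run` and `DRY_RUN` are hypothetical helpers, not part of any tool.

```bash
#!/usr/bin/env bash
# DRY_RUN=1 prints each command instead of executing it (hypothetical switch).
run() { if [[ -n "${DRY_RUN:-}" ]]; then echo "$*"; else "$@"; fi; }

# chroot_into MOUNTPOINT: bind /dev, /proc and /sys from the rescue system
# into the mounted root, then start a shell inside it. Needs root.
chroot_into() {
    local target="${1:?usage: chroot_into MOUNTPOINT}" fs
    for fs in dev proc sys; do
        run mount --bind "/$fs" "$target/$fs"
    done
    run chroot "$target" /bin/bash
}
```

When done, exit the shell and unmount the bind mounts before rebooting.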
07-02-2012, 06:08 PM # 3 damateem LQ Newbie
I had a very hard time getting the Grub menu to appear. There must be a very small window for detecting the shift key. Holding it down through the boot didn't work. Repeatedly hitting it at about twice per second didn't work. Increasing the rate to about 4 hits per second got me into it. Once there, I was able to select an older kernel (2.6.32-39-server). The non-booting kernel was 2.6.32-40-server. 39 booted without any problems.
When I initially setup this system, I couldn't send email from it. It wasn't important to me at the time, so I planned to come back and fix it later. Last week (before the power drop), email suddenly started working on its own. I was surprised because I haven't specifically performed any updates. However, I seem to remember setting up automatic updates, so perhaps an auto update was done that introduced a problem, but it wasn't seen until the reboot that was forced by the power outage.
Next, I'm going to try updating to the latest kernel and see if it has the same problem.
Thanks,
David
07-02-2012, 06:24 PM # 4 frieza Senior Member Contributing Member
IMHO auto updates are dangerous. If you want my opinion, make sure auto updates are off and only have the system tell you there are updates; that way you can choose not to install them during a power failure.
As for a possible future solution for what you went through: unlike other keys, the Shift key being held doesn't register as a stuck key, to the best of my knowledge, so you can hold Shift to get into GRUB. After that, edit the recovery line (the e key) to add init=/bin/bash at the end, then boot the system using the keys specified at the bottom of the screen. Once booted to a prompt, you would run
fsck -f {root partition}
(In this state, the root partition should be either not mounted or mounted read-only, so you can safely run fsck on the drive.) Note that -f forces a check even if the filesystem appears clean, so it is more thorough than a standard run of fsck.
Then reboot, and hopefully that fixes things.
Glad things seem to be working for the moment, though.
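Before running fsck in that init=/bin/bash shell, it is worth confirming that / really is mounted read-only. A minimal sketch, assuming Linux /proc/mounts; `root_is_readonly` is a hypothetical helper, and its optional file argument exists only so the check can be tested.

```bash
#!/usr/bin/env bash
# root_is_readonly [TABLE]: succeed only if "/" is currently mounted read-only.
# TABLE defaults to /proc/mounts (hypothetical test hook).
root_is_readonly() {
    local table="${1:-/proc/mounts}"
    # Fields: device mountpoint fstype options ...; the last "/" entry is the
    # one in effect, and its first mount option is either "ro" or "rw".
    awk '$2 == "/" { split($4, opt, ","); ro = (opt[1] == "ro") }
         END { exit !ro }' "$table"
}

# Hypothetical usage, with the root LV from this thread:
#   root_is_readonly && fsck -f /dev/mapper/server1-root
```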
07-02-2012, 06:32 PM # 5 suicidaleggroll LQ Guru Contributing Member
07-04-2012, 10:18 AM # 6 damateem LQ Newbie
I discovered the root cause of the problem. When I attempted the update, I found that the boot partition was full. So I suspect that caused issues for the auto update, but they went undetected until the reboot. I next tried to purge old kernels using the instructions at
http://www.liberiangeek.net/2011/11/...neiric-ocelot/
but that failed because a previous install had not completed, and it couldn't complete because of the full partition. So I had no choice but to manually rm the oldest kernel and its associated files. With that done, the command
apt-get -f install
got far enough that I could then purge the unwanted kernels. Finally,
sudo apt-get update
sudo apt-get upgrade
brought everything up to date.
I will be deactivating the auto updates.
Thanks for all the help!
David
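Since a full /boot was the root cause, a space check before updates could catch it early. A minimal sketch, assuming POSIX `df -P` output and an arbitrary 80% threshold; `usage_pct` and `boot_has_room` are hypothetical helpers, and where to call them (a cron job, an update hook) is left open.

```bash
#!/usr/bin/env bash
# usage_pct: print the Use% column of `df -P` output (read on stdin) as a bare integer.
usage_pct() {
    awk 'NR == 2 { sub(/%$/, "", $5); print $5 }'
}

# boot_has_room [LIMIT]: fail if /boot is fuller than LIMIT percent
# (default 80, an arbitrary assumption), e.g. before installing a new kernel.
boot_has_room() {
    local limit="${1:-80}" pct
    pct=$(df -P /boot | usage_pct)
    if (( pct > limit )); then
        echo "/boot is ${pct}% full; remove old kernels before updating" >&2
        return 1
    fi
}
```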
Jan 28, 2019 | thwack.solarwinds.com
gcp Jul 8, 2015 10:33 PM
Many years ago I worked at an IBM Mainframe site. To make systems more robust they installed a UPS system for the mainframe with battery bank and a honkin' great diesel generator in the yard.
During the commissioning of the system, they decided to test the UPS cutover one afternoon - everything goes *dark* in seconds. Frantic running around to get power back on and MF restarted and databases recovered (afternoon, remember? during the work day...). Oh! The UPS batteries were not charged! Oops.
Over the next few weeks, they did two more 'tests' during the working day, with everything going *dark* in seconds for various reasons. Oops.
Then they decided - perhaps we should test this outside of office hours. (YAY!)
Still took a few more efforts to get everything working - diesel generator wouldn't start automatically, fixed that and forgot to fill up the diesel tank so cutover was fine until the fuel ran out.
Many, many lessons learned from this episode.
Jan 28, 2019 | www.reddit.com
radiomix Jack of All Trades 3 years ago
I was in my main network facility for a municipal fiber optic ring. Outside were two technicians replacing our backup air conditioning unit. I walk inside after talking with the two technicians, turn on the lights and begin walking around, just visually checking things around the room. All of a sudden I started smelling that dreaded electric hot/burning smell. In this place I have my core switch, primary router, a handful of servers, some customer equipment and a couple of racks for my service provider. I start running around the place like a madman sniffing all the equipment. I even called in the AC technicians to help me sniff. After 15 minutes we could not narrow down where it was coming from. Finally I noticed that one of the fluorescent lights had not come on. I grabbed a ladder and opened it up.
The ballast had burned out on the light, and it just so happened to be the light right in front of the AC vent, blowing the smell all over the room.
The last time I had smelled that smell in that room, a major piece of equipment went belly up and there was nothing I could do about it.
benjunmun 3 years ago
The exact same thing has happened to me. Nothing quite as terrifying as the sudden smell of ozone as you're surrounded by critical computers and electrical gear.
Jan 28, 2019 | www.reddit.com
eraser_6776 VP IT/Sec (a damn suit) 3 years ago
May 22, 2004. There was a rather massive storm here that spurred one of the biggest tornadoes recorded in Nebraska (www.tornadochaser.net/hallam.html), and I was a sysadmin for a small company. It was a Saturday, aka beer day, and as all hell was breaking loose my friends' and roommates' pagers and phones were all going off. "Ha ha!" I said, looking at a silent cellphone, "sucks to be you!" Next morning around 10 my phone rings, and I groggily answer it because it's the owner of the company. "You'd better come in here, none of the computers will turn on," he says. Slight panic, but I hadn't received any emails. So it must have been breakers, and I can get that fixed. No problem.
I get into the office and something strikes me. That eerie sound of silence. Not a single machine is on... why not? Still shaking off too much beer from the night before, I go into the server room and find out why I didn't get paged. Machines are running, but every switch in the cabinet is dead. Some servers are dead. Panic sets in.
I start walking around the office trying to turn on machines and... dead. All of them. Every last desktop won't power on. That's when panic REALLY set in.
In the aftermath I found out two things - one, when the building was built, it was built with a steel roof and steel trusses. Two, when my predecessor had the network cabling wired he hired an idiot who didn't know fire code and ran the network cabling, conveniently, along the trusses into the ceiling. Thus, when lightning hit the building it had a perfect ground path to every workstation in the company. Some servers that weren't in the primary cabinet had been wired to a wall jack (which, in turn, went up into the ceiling then back down into the cabinet because you know, wire management!). Thankfully they were all "legacy" servers.
The only thing that saved the main servers was that Cisco 2924 XL-EN's are some badass mofo's that would die before they let that voltage pass through to the servers in the cabinet. At least that's what I told myself.
All in all, it ended up being one of the longest work weeks ever as I first had to source a bunch of switches, fast to get things like mail and the core network back up. Next up was feeding my buddies a bunch of beer and pizza after we raided every box store in town for spools of Cat 5 and threw wire along the floor.
Finally I found out that CDW can and would get you a whole lot of desktops delivered to your door with your software pre-installed in less than 24 hours if you have an open checkbook. Thanks to a great insurance policy, we did. Shipping and "handling" for those were more than the cost of the machines (again, this was back in 2004 and they were business desktops so you can imagine).
Still, for weeks after I had non-stop user complaints that generally involved "..I think this is related to the lightning ". I drank a lot that summer.
Jan 28, 2019 | www.reddit.com
VexingRaven 3 years ago
Not really a horror story, but definitely one of my first "Oh shit" moments. I was the FNG helpdesk/sysadmin at a company of 150 people. I start getting calls that something (I think it was Outlook) wasn't working in Citrix; apparently something was broken on one of the Citrix servers. I'm 100% positive it will be fixed with a reboot (I've seen this before on individual PCs), so I diligently start working to get people off that Citrix server (one of three) so I can reboot it. I get it cleared out, hit Reboot... and almost immediately get a call from the call center manager saying every single person just got kicked off Citrix. Oh shit. But there was nobody on that server! Apparently that server also housed the Secure Gateway server, which my senior hadn't bothered to tell me about or simply didn't know (it was set up by a consulting firm). Whoops. Thankfully the servers were pretty fast and people's sessions reconnected a few minutes later, no harm no foul. And on the plus side, it did indeed fix the problem.
And that's how I learned to always check with somebody else before rebooting a production server, no matter how minor it may seem.
Jan 14, 2019 | www.nzlinux.com
- Francois Marier October 21, 2009 at 10:34 am
Another related tool, to prevent accidental reboots of servers this time, is molly-guard:
http://packages.debian.org/sid/molly-guard
It asks you to type the hostname of the machine you want to reboot as an extra confirmation step.
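The idea is simple enough to sketch in bash. This is not molly-guard's code, only an illustration under assumptions: `confirm_host` and `guarded_reboot` are hypothetical names, and the short hostname is derived from `uname -n`.

```bash
#!/usr/bin/env bash
# confirm_host EXPECTED: prompt for a hostname and succeed only on an exact match.
confirm_host() {
    local expected="$1" answer
    printf 'Type the hostname of the machine to reboot: ' >&2
    IFS= read -r answer || return 1
    [[ "$answer" == "$expected" ]]
}

# guarded_reboot: hypothetical wrapper that reboots only after the short
# hostname has been typed correctly.
guarded_reboot() {
    local host
    host=$(uname -n)
    host=${host%%.*}          # short name, like `hostname -s`
    if confirm_host "$host"; then
        shutdown -r now
    else
        echo "Hostname mismatch; not rebooting $host." >&2
        return 1
    fi
}
```

The prompt deliberately does not print the expected name, so the user has to know which box they are on.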
Oct 05, 2018 | www.reddit.com
ardwin 5 years ago
Due to a configuration change I wasn't privy to, the software I was responsible for rebooted all the 911 operators' servers at once.
cobra10101010 5 years ago
Oh God... that is scary in the true sense. Hope everything was okay.
ardwin 5 years ago
I quickly learned that the 911 operators are trained to do their jobs without any kind of computer support. It made me feel better.
reebzor 5 years ago
I did this too!
edit: except I was the one that deployed the software that rebooted the machines
vocatus 5 years ago
Hey, maybe you should go apologize to ardwin. I bet he was pissed.
Jan 15, 2010 | Linux Today
Re: Solution looking for a problem
> How exactly does someone "accidentally" issue a shutdown or reboot command? ... Failing that highly likely scenario, this is someone shopping around a solution for a problem that doesn't really exist. Give me a break.
I haven't checked out the actual package in question, but based on the fact that (according to the posted output) it notes the connection is via SSH and asks for a hostname, I would say the author of the article did not articulate well what the purpose of the package is. The purpose appears to be to avoid shutting down the *wrong* computer when connecting remotely.
I've never had that problem, but more than once I've shut down the local computer when I intended to shut down a remote computer. I think after the second time (after I stopped swearing!) I created aliases for halt and reboot that first query with a message like: Really halt {hostname} [yn]?
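Such aliases might look like this in ~/.bashrc. A minimal sketch, assuming bash; `really` is a hypothetical function name, and the prompt mirrors the "Really halt {hostname} [yn]?" message.

```bash
#!/usr/bin/env bash
# really CMD [ARGS...]: ask "Really CMD HOST [yn]?" and run CMD only on y/Y.
really() {
    local host="${HOSTNAME:-$(uname -n)}" answer
    printf 'Really %s %s [yn]? ' "$1" "$host" >&2
    IFS= read -r answer || return 1
    case "$answer" in
        [yY]) "$@" ;;
        *) echo "Cancelled." >&2; return 1 ;;
    esac
}

# Aliases expand only in interactive shells; "$@" inside really is not
# alias-expanded, so there is no recursion.
alias halt='really halt'
alias reboot='really reboot'
```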
Re: Solution looking for a problem
Re: How exactly does someone "accidentally" issue a shutdown or reboot command?
I've done it while I was distracted: open a shell to one box, then open one to another box, go to lunch, forget which shell you are using, and send the wrong command to the wrong machine. Nobody is perfect.
Michael Steele
When I was first starting out I worked for a Telecom as an 'Application Administrator', and I sat in a small room with half a dozen other admins; together we took calls from users as their calls escalated up from tier I support. We were tier II in a three-tier organization.
A month earlier someone from tier I confused a production server with a test server and rebooted it in the middle of the day. These servers were remotely connected over a large distance so it can be confusing. Care is needed before rebooting.
The tier I culprit took a great deal of abuse for this mistake and soon became the victim of several jokes. An outage had been caused in a high-availability environment, which meant management, interviews, reports; it went on and on and was pretty brutal.
And I was just as brutal as anyone.
Their entire organization soon became victimized by everyone from our organization. The abuse traveled right up the management tree and all participated.
It was hilarious, for us.
Until I did the same thing a month later.
There is nothing more humbling than 2000 people all knowing who you are for the wrong reason, and I have never longed for anonymity more.
Now I always do a 'uname' or 'hostname' before a reboot, even when I'm right in front of it.
(3) At the same institution, we were running system software that had a serious bug: if anyone had logged out ungracefully, the system wouldn't let any more users onto the system, and users who were logged on couldn't execute any new commands. (The newest release of the software later on did fix this bug.) I had to reboot the machine to restore the system to a sane state. I did a
wall <<EOF
We need to shutdown blah blah...
EOF
and then shutdown. Well, I should've waited, since at that precise moment one of our users was doing a once-a-year massive conversion of our financial data (talk about bad luck). I had shut down in the middle of a very long disk write and thus data was lost. We did recover that data and life went on.
Moral: make damn sure that *no one* is doing anything on your system before you reboot, even if other users are vociferously clamoring for you to reboot.
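Part of that check can be automated. A minimal sketch, assuming bash and the usual `who` output, whose first field is the login name; `other_users` and `safe_to_reboot` are hypothetical helpers. It only sees login sessions, so a long-running job without an interactive login would not show up; it complements asking around rather than replacing it.

```bash
#!/usr/bin/env bash
# other_users ME: count distinct login names other than ME in `who` output on stdin.
other_users() {
    awk -v me="$1" '$1 != me && !seen[$1]++ { n++ } END { print n + 0 }'
}

# safe_to_reboot: hypothetical pre-reboot check that fails while anyone
# else is logged in, and lists who that is.
safe_to_reboot() {
    local n
    n=$(who | other_users "$(id -un)")
    if (( n > 0 )); then
        echo "$n other user(s) logged in:" >&2
        who >&2
        return 1
    fi
}
```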
Anyone who has never made a mistake has never tried anything new. -- Albert Einstein.
Here are a few mistakes that I made while working at the UNIX prompt. Some mistakes caused me a good amount of downtime. Most of these mistakes are from my early days as a UNIX admin.
... ... ...
Rebooted Solaris Box
On Linux, the killall command kills processes by name (killall httpd). On Solaris it kills all active processes. As root I killed all processes; this was our main Oracle DB box:
killall process-name
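One way to guard against that trap, sketched under assumptions: bash, and `uname -s` reporting `SunOS` on Solaris. The wrapper refuses to call `killall` there and points to `pkill`, which matches by name on both Linux and Solaris. `safe_killall` and the `SAFE_KILLALL_OS` override are hypothetical names.

```bash
#!/usr/bin/env bash
# safe_killall NAME...: refuse to run killall on Solaris, where it kills every
# process instead of matching by name. SAFE_KILLALL_OS overrides the detected
# OS (hypothetical, for testing).
safe_killall() {
    local os="${SAFE_KILLALL_OS:-$(uname -s)}"
    if [[ "$os" == "SunOS" ]]; then
        echo "safe_killall: on Solaris killall kills ALL processes; try: pkill $*" >&2
        return 1
    fi
    killall "$@"
}

alias killall='safe_killall'   # interactive shells only
```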
Selected Comments
UnixEagle
Rebooted the wrong box
While adding an alias to the main network interface I ended up changing the main IP address; the system froze right away and I had to call for a reboot
Instead of appending text to the Apache config file, I overwrote its contents
Firewall lockdown while changing the ssh port
Wrongly ran a script containing a recursive chmod and chown as root on /, which caused about 12 hours of downtime and a complete reinstall
Some mistakes are really silly, and when they happen you can't believe you did that, but every mistake, regardless of its silliness, should be a lesson learned.
If you made a trivial mistake, you should not just overlook it; you have to think about the reasons that made you do it, like not having had much sleep or being preoccupied with your personal life, etc.
I like Einstein's quote: you really have to make mistakes to learn.
Re: Accidental shutdown by Todd A. Jacobson 2009-05-21T20:46:53+00:00.
On Thu, May 21, 2009 at 12:31:47AM +0100, Bhasker C V wrote:
> I can rename and shell wrap the binaries poweroff/shutdown/reboot but
> that would not be a clean method and I am sure there should be much
> better way than that.
Nope. You could disable the reboot command in your sudoers file, but that isn't going to prevent you from rebooting the wrong machine if you really make an effort.
You might also consider editing sudoers to change the sudo password prompt to include the hostname of the box you're on, so that you're less likely to issue commands to the wrong box if you're paying attention.
However, the real problem here is that you're assuming Linux should protect you from yourself. It won't; part of being a power user is not running privileged commands without exercising due care. With power comes responsibility!
As has been said before: "*nix is user friendly. It's just picky about who its friends are!"
--
"Oh, look: rocks!"
-- Doctor Who, "Destiny of the Daleks"
by Scott Gifford on 2009-05-21T20:51:53+00:00.
Bhasker C V writes:
> Is there a method to prevent accidental power-down of a Linux box? Or at least an alert?
If you get in the habit of running "shutdown -r +1" instead of "reboot", it will warn users for 1 minute before shutting down the server. That should give you enough time to run "shutdown -c" to cancel the shutdown if you realize it's on the wrong machine.
Hope this helps,
-----Scott.
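That habit can be enforced with a shell function that shadows `reboot` in interactive shells. A minimal sketch, assuming bash and a `shutdown` that accepts `-r +1` followed by a wall message (both systemd and SysV shutdown do); the `SHUTDOWN` override is hypothetical and exists only for dry runs.

```bash
#!/usr/bin/env bash
# reboot: schedule the reboot one minute out instead of rebooting at once,
# leaving time for `shutdown -c`. SHUTDOWN=echo gives a dry run (hypothetical).
reboot() {
    local host="${HOSTNAME:-$(uname -n)}"
    "${SHUTDOWN:-shutdown}" -r +1 "Reboot of $host requested by ${USER:-$(id -un)}"
    echo "Reboot of $host scheduled in 1 minute; run 'shutdown -c' to cancel." >&2
}
```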
The Last but not Least Technology is dominated by two types of people: those who understand what they do not manage and those who manage what they do not understand ~Archibald Putt. Ph.D
Copyright © 1996-2021 by Softpanorama Society. www.softpanorama.org was initially created as a service to the (now defunct) UN Sustainable Development Networking Programme (SDNP) without any remuneration. This document is an industrial compilation designed and created exclusively for educational use and is distributed under the Softpanorama Content License. Original materials copyright belong to respective owners. Quotes are made for educational purposes only in compliance with the fair use doctrine.
FAIR USE NOTICE This site contains copyrighted material the use of which has not always been specifically authorized by the copyright owner. We are making such material available to advance understanding of computer science, IT technology, economic, scientific, and social issues. We believe this constitutes a 'fair use' of any such copyrighted material as provided by section 107 of the US Copyright Law according to which such material can be distributed without profit exclusively for research and educational purposes.
This is a Spartan WHYFF (We Help You For Free) site written by people for whom English is not a native language. Grammar and spelling errors should be expected. The site contains some broken links as it develops like a living tree...
Disclaimer:
The statements, views and opinions presented on this web page are those of the author (or referenced source) and are not endorsed by, nor do they necessarily reflect, the opinions of the Softpanorama society. We do not warrant the correctness of the information provided or its fitness for any purpose. The site uses AdSense, so you need to be aware of Google's privacy policy. If you do not want to be tracked by Google, please disable JavaScript for this site. This site is perfectly usable without JavaScript.
Last modified: February 19, 2020