Softpanorama

May the source be with you, but remember the KISS principle ;-)
Skepticism and critical thinking is not panacea, but can help to understand the world better

Slightly Skeptical View on Enterprise Unix Administration

News Webliography of problems with "pure" cloud environment Recommended Books Recommended Links Shadow IT Project Management Programmable Keyboards
Sysadmin Horror Stories Missing backup horror stories Creative uses of rm Recovery of LVM partitions Notes on hard drives partitioning for Linux Top Vulnerabilities in Linux Environment Root filesystem is mounted read only on boot
The tar pit of Red Hat overcomplexity Systemd invasion into Linux Server space Registering a server using Red Hat Subscription Manager (RHSM) Nagios in Large Enterprise Environment Sudoer File Examples Dealing with multiple flavors of Unix SSH Configuration
Unix Configuration Management Tools Job schedulers Red Hat Certification Program Red Hat Enterprise Linux Life Cycle Registering a server using Red Hat Subscription Manager (RHSM) Unix System Monitoring Recommended Tools to Enhance Command Line Usage in Windows
Is DevOps a yet another "for profit" technocult Using HP ILO virtual CDROM iDRAC7 goes unresponsive - can't connect to iDRAC7 Resetting frozen iDRAC without unplugging the server Troubleshooting HPOM agents Saferm -- wrapper for rm command ILO command line interface
Bare metal recovery of Linux systems Relax-and-Recover on RHEL HP Operations Manager Troubleshooting HPOM agents Number of Servers per Sysadmin Open source politics: IBM acquires Red Hat Tivoli Workload Scheduler
Over 50 and unemployed Surviving a Bad Performance Review Understanding Micromanagers and Control Freaks Bosos or Empty Suits (Aggressive Incompetent Managers) Narcissists Female Sociopaths Bully Managers
Slackerism Information Overload Workaholism and Burnout Unix Sysadmin Tips Orthodox Editors Admin Humor Sysadmin Health Issues


The KISS rule can be expanded as: Keep It Simple, Sysadmin ;-)

This page is written as a protest against overcomplexity and the bizarre data center atmosphere typical of "semi-outsourced" or fully outsourced datacenters ;-). Unix/Linux sysadmins are being killed by the overcomplexity of the environment. Large swaths of Linux knowledge (and many excellent books) were invalidated by the introduction of systemd. This hit hardest the older, most experienced members of the team, who hold a unique set of organizational knowledge and whose careers allowed them to watch the development of Linux almost from version 0.92.

System administration is still a unique area where people with the ability to program can display their creativity with relative ease and can still enjoy the "old style" atmosphere of software development, in which you yourself write the specification, implement it, test the program and then use it in your daily work. This is a very exciting, unique opportunity that no DevOps role can ever provide. Why, then, are an increasing number of sysadmins far from excited about working in those positions, or outright want to quit the field (or, at least, work four days a week)? And that includes sysadmins who have tremendous speed and capacity to process and learn new information. Even for them "enough is enough." The answer is different for each individual sysadmin, but it is usually some variation of the following themes:

  1. Too rapid a pace of change, with a lot of "change for the sake of change" often serving as a smokescreen for outsourcing efforts (VMware yesterday, Azure today, Amazon cloud tomorrow, etc.)
  2. Job insecurity due to outsourcing/offshoring -- constant pressure to cut headcount in the name of "efficiency," which in reality is more connected with the size of top brass bonuses than with anything related to how the IT datacenter functions. Sysadmins over 50 are an especially vulnerable category here, and if they are laid off they have almost no chance of getting back into the IT workforce at the previous level of salary and benefits; often the only job they can find is at Home Depot or a similar retail outlet.
  3. A back-breaking level of overcomplexity and bizarre tech decisions crippling the data center (aka crapification). A Potemkin-style culture often prevails in the evaluation of software in large US corporations. The surface sheen is more important than the substance. The marketing brochures and manuals are no different from mainstream news media in the level of BS they spew. IBM is especially guilty (look how they marketed IBM Watson; as Oren Etzioni, CEO of the Allen Institute for AI, noted, "the only intelligent thing about Watson was IBM PR department [push]").
  4. Bureaucratization/fossilization of large companies' IT environments. That includes using "Performance Reviews" (the variant of waterboarding prevalent in IT ;-) for the enforcement of management policies, priorities, whims, etc. That creates alienation from the company (as it should). One can think of the modern corporate data center as an organization where the administration has tremendous power in the decision-making process and eats up an ever larger share of the corporate budget, while the people who do the actual work are increasingly ignored and their share of the budget shrinks.
  5. "Neoliberal austerity" (which is essentially another name for the "war on labor") -- drastic cost-cutting measures at the expense of the workforce, such as the elimination of external vendor training, crapification of benefits, limits on business trips, and the forced adoption of useless or outright harmful "new" products instead of "tried and true" old ones with the same function. They are accompanied by a new cultural obsession with 'character' (as in "he/she has the right character" -- which in "Neoliberal speak" means he/she is a toothless conformist ;-), glorification of groupthink, and the intensification of surveillance.

As Charlie Schluting noted in 2010 (Enterprise Networking Planet, April 7, 2010):

What happened to the old "sysadmin" of just a few years ago? We've split what used to be the sysadmin into application teams, server teams, storage teams, and network teams. There were often at least a few people, the holders of knowledge, who knew how everything worked, and I mean everything. Every application, every piece of network gear, and how every server was configured -- these people could save a business in times of disaster.

Now look at what we've done. Knowledge is so decentralized we must invent new roles to act as liaisons between all the IT groups.

Architects now hold much of the high-level "how it works" knowledge, but without knowing how any one piece actually does work.

In organizations with more than a few hundred IT staff and developers, it becomes nearly impossible for one person to do and know everything. This movement toward specializing in individual areas seems almost natural. That, however, does not provide a free ticket for people to turn a blind eye.

Specialization

You know the story: Company installs new application, nobody understands it yet, so an expert is hired. Often, the person with a certification in using the new application only really knows how to run that application. Perhaps they aren't interested in learning anything else, because their skill is in high demand right now. And besides, everything else in the infrastructure is run by people who specialize in those elements. Everything is taken care of.

Except, how do these teams communicate when changes need to take place? Are the storage administrators teaching the Windows administrators about storage multipathing; or worse, logging in and setting it up because it's faster for the storage gurus to do it themselves? A fundamental level of knowledge is often lacking, which makes it very difficult for teams to brainstorm about new ways to evolve IT services. The business environment has made it OK for IT staffers to specialize and only learn one thing.

If you hire someone certified in the application, operating system, or network vendor you use, that is precisely what you get. Certifications may be a nice filter to quickly identify who has direct knowledge in the area you're hiring for, but often they indicate specialization or compensation for lack of experience.

Resource Competition

Does your IT department function as a unit? Even 20-person IT shops have turf wars, so the answer is very likely, "no." As teams are split into more and more distinct operating units, grouping occurs. One IT budget gets split between all these groups. Often each group will have a manager who pitches his needs to upper management in hopes they will realize how important the team is.

The "us vs. them" mentality manifests itself at all levels, and it's reinforced by management having to define each team's worth in the form of a budget. One strategy is to illustrate a doomsday scenario. If you paint a bleak enough picture, you may get more funding. Only if you are careful enough to illustrate the failings are due to lack of capital resources, not management or people. A manager of another group may explain that they are not receiving the correct level of service, so they need to duplicate the efforts of another group and just implement something themselves. On and on, the arguments continue.

Most often, I've seen competition between server groups result in horribly inefficient uses of hardware. For example, what happens in your organization when one team needs more server hardware? Assume that another team has five unused servers sitting in a blade chassis. Does the answer change? No, it does not. Even in test environments, sharing doesn't often happen between IT groups.

With virtualization, some aspects of resource competition get better and some remain the same. When first implemented, most groups will be running their own type of virtualization for their platform. The next step, I've most often seen, is for test servers to get virtualized. If a new group is formed to manage the virtualization infrastructure, virtual machines can be allocated to various application and server teams from a central pool and everyone is now sharing. Or, they begin sharing and then demand their own physical hardware to be isolated from others' resource hungry utilization. This is nonetheless a step in the right direction. Auto migration and guaranteed resource policies can go a long way toward making shared infrastructure, even between competing groups, a viable option.

Blamestorming

The most damaging side effect of splitting into too many distinct IT groups is the reinforcement of an "us versus them" mentality. Aside from the notion that specialization creates a lack of knowledge, blamestorming is what this article is really about. When a project is delayed, it is all too easy to blame another group. The SAN people didn't allocate storage on time, so another team was delayed. That is the timeline of the project, so all work halted until that hiccup was restored. Having someone else to blame when things get delayed makes it all too easy to simply stop working for a while.

More related to the initial points at the beginning of this article, perhaps, is the blamestorm that happens after a system outage.

Say an ERP system becomes unresponsive a few times throughout the day. The application team says it's just slowing down, and they don't know why. The network team says everything is fine. The server team says the application is "blocking on IO," which means it's a SAN issue. The SAN team says there is nothing wrong, and other applications on the same devices are fine. You've run through nearly every team, but still without an answer. The SAN people don't have access to the application servers to help diagnose the problem. The server team doesn't even know how the application runs.

See the problem? Specialized teams are distinct and by nature adversarial. Specialized staffers often relegate themselves into a niche knowing that as long as they continue working at large enough companies, "someone else" will take care of all the other pieces.

I unfortunately don't have an answer to this problem. Maybe rotating employees between departments will help. They gain knowledge and also get to know other people, which should lessen the propensity to view them as outsiders.

The tragic part of the current environment is that it is like shifting sands. And it is not only due to the "natural process of crapification of operating systems," in which the OS gradually loses its architectural integrity. The pace of change is simply too fast for mere humans to adapt to. And most of it represents "change for the sake of change," not some valuable improvement or extension of capabilities.

If you are a sysadmin who writes his own scripts, you write on sand: you spend a lot of time thinking through and debugging your scripts, which raises your productivity and diminishes the number of possible errors. But the next OS version wipes out a considerable part of your work, and you need to revise your scripts again. The tale of Sisyphus can now be re-interpreted as a prescient warning about the thankless task of the sysadmin who has to learn new stuff and maintain his own script library forever ;-) Sometimes a lot of work is wiped out because the corporate brass decides to switch to a different flavor of Linux, or "yet another flavor" is added due to a large acquisition. Add to this the inevitable technological changes, and the question arises: can't you find a more respectable profession, one in which 66% of your knowledge is not replaced within the next ten years?

The Balkanization of Linux is also demonstrated by the Tower of Babel of system programming languages (C, C++, Perl, Python, Ruby, Go, Java, to name a few) and by systems that supposedly should help you but mostly do quite the opposite (Puppet, Ansible, Chef, etc.). Add to this the monitoring infrastructure (say, Nagios) and you definitely have information overload.

Inadequate training just adds to the stress. First of all, corporations no longer want to pay for it, so you are on your own and need to do it mostly in your free time, as the workload is substantial in most organizations -- using free or low-cost courses if they are available, or buying your own books and trying to learn new stuff from them (which, of course, is the mark of any good sysadmin, but should not be the only source of new knowledge). The days when you could travel to a vendor training center for a week and have a chance to communicate with admins from other organizations (which probably was the most valuable part of the whole exercise) are long in the past. I can say that training by Sun (Solaris) and IBM (AIX) in the late 1990s was of really high quality, delivered by highly qualified instructors from whom you could learn a lot beyond the main topic of the course; unlike "Trump University," Sun courses could probably have been called "Sun University." Most training now is done via the Web, and the chances for face-to-face communication have disappeared. Also, the stress is now on learning "how" rather than "why"; the "why" topics are typically reserved for "advanced" courses.

Also, the necessity to relearn stuff again and again wears you down, especially when the new technologies/daemons/versions of the OS are either the same as, or even inferior to, the previous ones, or represent an open scam in which training is just a way to extract money from lemmings (Agile, most of the DevOps hoopla, etc.). This is the typical neoliberal mentality ("greed is good") applied to education. There is also a tendency to treat virtual machines and cloud infrastructure as separate technologies, which require separate training and separate sets of certifications (AWS, Azure). This is a kind of infantilization of the profession, in which a person who learned a lot of stuff over the previous 10 years needs to forget it and relearn most of it again and again.

Of course, sysadmins are not the only ones who suffer. Computer scientists also now struggle with the excessive level of complexity and the too-quickly shifting sands. Look at the tragedy of Donald Knuth and his lifelong project to create a comprehensive monograph for system programmers (The Art of Computer Programming). He was flattened by the shifting sands and probably will not be able to finish even volume 4 (out of the seven that were planned) in his lifetime.

Of course, much depends on the evolution of hardware and on the changes it causes, such as the mass introduction of large SSDs, multi-core CPUs and large amounts of RAM.

Nobody is now surprised to see a server with 128GB of RAM, a laptop with 16GB of RAM, or a cellphone with 4GB of RAM and a 1GHz CPU. (Note that the IBM PC started with a 1 MB address space, of which only 640KB was available for programs, and a 4.77 MHz -- not GHz -- single-core CPU without a floating point unit.) Such changes, while painful, are inevitable, and hardware progress has slowed down recently as it approaches the physical limits of the technology (we probably will not see 2-nanometer lithography CPUs or 8GHz clock speeds in our lifetimes).

Changes caused by fashion and by the desire of the dominant player to entrench its position are more difficult to accept. It is difficult, or even impossible, to predict which technology will become fashionable tomorrow and how long DevOps will remain in fashion. Typically such things last around ten years. After that everything typically fades into oblivion, or is even crossed out, and former idols are shattered. This strange period of re-invention of the "glass-walls datacenter" under the banner of DevOps (and old-timers still remember that IBM datacenters were hated with a passion, and this hate created an additional, non-technological incentive first for minicomputers and later for the IBM PC) is characterized by a level of hype usually reserved for women's fashion. It sometimes looks to me as if the movie The Devil Wears Prada is a subtle parable about sysadmin work.

Add to this the horrible job market, especially for recent university graduates and older sysadmins (see Over 50 and unemployed), and one starts to suspect that the life of the modern sysadmin is far from paradise. When you read some job descriptions on sites like Monster, Dice or Indeed, you have to ask yourself whether those people really want to hire anybody, or whether this is just a smokescreen for H1B labor certification. The level of detail is often so precise that it is almost impossible to change your current specialization. They do not care about the level of talent; they do not want to train a suitable candidate. They want a person who fits 100% from day one. Also, in places like NYC or SF, rents and property prices keep growing while income growth has been stagnant.

The vandalism of Unix performed by Red Hat with RHEL 7 makes the current environment somewhat unhealthy. It is clear that this was done at the whim of the Red Hat brass, not in the interest of the community. This is a typical Microsoft-style trick, which made dozens of high-quality books written by very talented authors instantly semi-obsolete. And the question arises whether it makes sense to write any book about RHEL other than for a solid advance. It generated some backlash, but Red Hat's position as the Microsoft of Linux allowed it to shove its inferior technical decisions down users' throats. In a way it reminds me of the way Microsoft dealt with Windows 7, replacing it with Windows 10 and essentially destroying the previous Windows interface ecosystem (while preserving binary compatibility).

See also

Here are my notes/reflections on sysadmin problems that often arise in the rather strange (and sometimes pretty toxic) IT departments of large corporations:



NEWS CONTENTS

Old News ;-)


For the list of top articles see Recommended Links section


"I appreciate Woody Allen's humor because one of my safety valves is an appreciation for life's absurdities. His message is that life isn't a funeral march to the grave. It's a polka."

-- Dennis Kucinich

[Nov 09, 2019] Mirroring a running system into a ramdisk Oracle Linux Blog

Nov 09, 2019 | blogs.oracle.com


By Greg Marsden

In this blog post, Oracle Linux kernel developer William Roche presents a method to mirror a running system into a ramdisk.

A RAM mirrored System ?

There are cases where a system can boot correctly but after some time, can lose its system disk access - for example an iSCSI system disk configuration that has network issues, or any other disk driver problem. Once the system disk is no longer accessible, we rapidly face a hang situation followed by I/O failures, without the possibility of local investigation on this machine. I/O errors can be reported on the console:

 XFS (dm-0): Log I/O Error Detected....

Or losing access to basic commands like:

# ls
-bash: /bin/ls: Input/output error

The approach presented here allows a small system disk space to be mirrored in memory to avoid the above I/O failures situation, which provides the ability to investigate the reasons for the disk loss. The system disk loss will be noticed as an I/O hang, at which point there will be a transition to use only the ram-disk.

To enable this, the Oracle Linux developer Philip "Bryce" Copeland created the following method (more details will follow):

Disk and memory sizes:

As we are going to mirror the entire system installation to the memory, this system installation image has to fit in a fraction of the memory - giving enough memory room to hold the mirror image and necessary running space.

Of course this is a trade-off between the memory available to the server and the minimal disk size needed to run the system. For example a 12GB disk space can be used for a minimal system installation on a 16GB memory machine.

A standard Oracle Linux installation uses XFS as root fs, which (currently) can't be shrunk. In order to generate a usable "small enough" system, it is recommended to proceed to the OS installation on a correctly sized disk space. Of course, a correctly sized installation location can be created using partitions of large physical disk. Then, the needed application filesystems can be mounted from their current installation disk(s). Some system adjustments may also be required (services added, configuration changes, etc...).

This configuration phase should not be underestimated as it can be difficult to separate the system from the needed applications, and keeping both on the same space could be too large for a RAM disk mirroring.

The idea is not to keep an entire system load active when losing disks access, but to be able to have enough system to avoid system commands access failure and analyze the situation.

We are also going to avoid the use of swap. When the system disk access is lost, we don't want to require it for swap data. Also, we don't want to use more memory space to hold a swap space mirror. The memory is better used directly by the system itself.

The system installation can have a swap space (for example a 1.2GB space on our 12GB disk example) but we are neither going to mirror it nor use it.

Our 12GB disk example could be used with: 1GB /boot space, 11GB LVM Space (1.2GB swap volume, 9.8 GB root volume).

Ramdisk memory footprint:

The ramdisk size has to be a little larger (8M) than the root volume size that we are going to mirror, making room for metadata. But we can deal with 2 types of ramdisk: the classic block RAM disk (brd) and the compressed zram device.

We can expect roughly 30% to 50% memory space gain from zram compared to brd, but zram must use 4k I/O blocks only. This means that the filesystem used for root has to only deal with a multiple of 4k I/Os.

Basic commands:

Here is a simple list of commands to manually create and use a ramdisk and mirror the root filesystem space. We create a temporary configuration that needs to be undone or the subsequent reboot will not work. But we also provide below a way of automating at startup and shutdown.

Note the root volume size (considered to be ol/root in this example):

# lvs --units k -o lv_size ol/root
  LSize
  10268672.00k

Create a ramdisk a little larger than that (at least 8M larger):

# modprobe brd rd_nr=1 rd_size=$((10268672 + 8*1024))

Verify the created disk:

# lsblk /dev/ram0
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
ram0   1:0    0 9.8G  0 disk

Put the disk under LVM control:

# pvcreate /dev/ram0
  Physical volume "/dev/ram0" successfully created.
# vgextend ol /dev/ram0
  Volume group "ol" successfully extended
# vgscan --cache
  Reading volume groups from cache.
  Found volume group "ol" using metadata type lvm2
# lvconvert -y -m 1 ol/root /dev/ram0
  Logical volume ol/root successfully converted.

We now have ol/root mirrored to our /dev/ram0 disk.

# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor--- 9.79g                                     40.70            root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor--- 9.79g                                                      /dev/sda2(307)
  [root_rimage_1] ol Iwi-aor--- 9.79g                                                      /dev/ram0(1)
  [root_rmeta_0]  ol ewi-aor--- 4.00m                                                      /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor--- 4.00m                                                      /dev/ram0(0)
  swap            ol -wi-ao---- <1.20g                                                     /dev/sda2(0)

A few minutes (or seconds) later, the synchronization is completed:

# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor--- 9.79g                                     100.00           root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor--- 9.79g                                                      /dev/sda2(307)
  [root_rimage_1] ol iwi-aor--- 9.79g                                                      /dev/ram0(1)
  [root_rmeta_0]  ol ewi-aor--- 4.00m                                                      /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor--- 4.00m                                                      /dev/ram0(0)
  swap            ol -wi-ao---- <1.20g                                                     /dev/sda2(0)

We have our mirrored configuration running !

For security, we can also remove the swap and /boot, /boot/efi(if it exists) mount points:

# swapoff -a
# umount /boot/efi
# umount /boot

Stopping the system also requires some actions as you need to cleanup the configuration so that it will not be looking for a gone ramdisk on reboot.

# lvconvert -y -m 0 ol/root /dev/ram0
  Logical volume ol/root successfully converted.
# vgreduce ol /dev/ram0
  Removed "/dev/ram0" from volume group "ol"
# mount /boot
# mount /boot/efi
# swapon -a
What about in-memory compression ?

As indicated above, zRAM devices can compress data in-memory, but 2 main problems need to be fixed:

Make lvm work with zram:

The lvm configuration file has to be changed to take into account the "zram" type of devices. Including the following "types" entry to the /etc/lvm/lvm.conf file in its "devices" section:

devices {
    types = [ "zram", 16 ]
}
Root file system I/Os:

A standard Oracle Linux installation uses XFS, and we can check the sector size used (depending on the disk type used) with

# xfs_info /
meta-data=/dev/mapper/ol-root    isize=256    agcount=4, agsize=641792 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=2567168, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

We can notice here that the sector size (sectsz) used on this root fs is a standard 512 bytes. This fs type cannot be mirrored with a zRAM device, and needs to be recreated with 4k sector sizes.

Transforming the root file system to 4k sector size:

This is simply a backup (to a zram disk) and restore procedure after recreating the root FS. To do so, the system has to be booted from another system image. Booting from an installation DVD image can be a good possibility.

sh-4.2 # vgchange -a y ol
  2 logical volume(s) in volume group "ol" now active
sh-4.2 # mount /dev/mapper/ol-root /mnt
sh-4.2 # modprobe zram
sh-4.2 # echo 10G > /sys/block/zram0/disksize
sh-4.2 # mkfs.xfs /dev/zram0
meta-data=/dev/zram0             isize=256    agcount=4, agsize=655360 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0
data     =                       bsize=4096   blocks=2621440, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
sh-4.2 # mkdir /mnt2
sh-4.2 # mount /dev/zram0 /mnt2
sh-4.2 # xfsdump -L BckUp -M dump -f /mnt2/ROOT /mnt
xfsdump: using file dump (drive_simple) strategy
xfsdump: version 3.1.7 (dump format 3.0) - type ^C for status and control
xfsdump: level 0 dump of localhost:/mnt
...
xfsdump: dump complete: 130 seconds elapsed
xfsdump: Dump Summary:
xfsdump:   stream 0 /mnt2/ROOT OK (success)
xfsdump: Dump Status: SUCCESS
sh-4.2 # umount /mnt
sh-4.2 # mkfs.xfs -f -s size=4096 /dev/mapper/ol-root
meta-data=/dev/mapper/ol-root    isize=256    agcount=4, agsize=641792 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0, sparse=0
data     =                       bsize=4096   blocks=2567168, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
sh-4.2 # mount /dev/mapper/ol-root /mnt
sh-4.2 # xfsrestore -f /mnt2/ROOT /mnt
xfsrestore: using file dump (drive_simple) strategy
xfsrestore: version 3.1.7 (dump format 3.0) - type ^C for status and control
xfsrestore: searching media for dump
...
xfsrestore: restore complete: 337 seconds elapsed
xfsrestore: Restore Summary:
xfsrestore:   stream 0 /mnt2/ROOT OK (success)
xfsrestore: Restore Status: SUCCESS
sh-4.2 # umount /mnt
sh-4.2 # umount /mnt2
sh-4.2 # reboot
$ xfs_info /
meta-data=/dev/mapper/ol-root    isize=256    agcount=4, agsize=641792 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=0        finobt=0 spinodes=0
data     =                       bsize=4096   blocks=2567168, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=2560, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

With sectsz=4096, our system is now ready for zRAM mirroring.

Basic commands with a zRAM device:

# modprobe zram
# zramctl --find --size 10G
/dev/zram0
# pvcreate /dev/zram0
  Physical volume "/dev/zram0" successfully created.
# vgextend ol /dev/zram0
  Volume group "ol" successfully extended
# vgscan --cache
  Reading volume groups from cache.
  Found volume group "ol" using metadata type lvm2
# lvconvert -y -m 1 ol/root /dev/zram0
  Logical volume ol/root successfully converted.
# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor--- 9.79g                                     12.38            root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor--- 9.79g                                                      /dev/sda2(307)
  [root_rimage_1] ol Iwi-aor--- 9.79g                                                      /dev/zram0(1)
  [root_rmeta_0]  ol ewi-aor--- 4.00m                                                      /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor--- 4.00m                                                      /dev/zram0(0)
  swap            ol -wi-ao---- <1.20g                                                     /dev/sda2(0)
# lvs -a -o +devices
  LV              VG Attr       LSize   Pool Origin Data% Meta% Move Log Cpy%Sync Convert Devices
  root            ol rwi-aor--- 9.79g                                     100.00           root_rimage_0(0),root_rimage_1(0)
  [root_rimage_0] ol iwi-aor--- 9.79g                                                      /dev/sda2(307)
  [root_rimage_1] ol iwi-aor--- 9.79g                                                      /dev/zram0(1)
  [root_rmeta_0]  ol ewi-aor--- 4.00m                                                      /dev/sda2(2814)
  [root_rmeta_1]  ol ewi-aor--- 4.00m                                                      /dev/zram0(0)
  swap            ol -wi-ao---- <1.20g                                                     /dev/sda2(0)
# zramctl
NAME       ALGORITHM DISKSIZE DATA  COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lzo            10G 9.8G  5.3G  5.5G        1

The compressed disk uses a total of 5.5GB of memory to mirror a 9.8G volume size (using in this case 8.5G).

Removal is performed the same way as brd, except that the device is /dev/zram0 instead of /dev/ram0.

Automating the process:

Fortunately, the procedure can be automated on system boot and shutdown with the following scripts (given as examples).

The start method: /usr/sbin/start-raid1-ramdisk: [ https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/start-raid1-ramdisk ]

After a chmod 555 /usr/sbin/start-raid1-ramdisk, running this script on a 4k xfs root file system should show something like:

# /usr/sbin/start-raid1-ramdisk
  Volume group "ol" is already consistent.
RAID1 ramdisk: intending to use 10276864 K of memory for facilitation of [ / ]
  Physical volume "/dev/zram0" successfully created.
  Volume group "ol" successfully extended
  Logical volume ol/root successfully converted.
Waiting for mirror to synchronize...
LVM RAID1 sync of [ / ] took 00:01:53 sec
  Logical volume ol/root changed.
NAME       ALGORITHM DISKSIZE DATA  COMPR TOTAL STREAMS MOUNTPOINT
/dev/zram0 lz4           9.8G 9.8G  5.5G  5.8G        1
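For orientation, the core of what such a start script has to do boils down to something like the sketch below (a simplification assuming the ol/root volume used throughout this post and a zram mirror; the real script linked above adds proper size calculation, error handling and logging):

#!/bin/bash
# Sketch only: mirror ol/root onto a freshly created zram device
ROOT_LV=ol/root
SIZE_K=$(lvs --noheadings --nosuffix --units k -o lv_size $ROOT_LV | tr -d ' ')
ZDEV=$(zramctl --find --size $(( ${SIZE_K%.*} + 8*1024 ))K)
pvcreate $ZDEV
vgextend ol $ZDEV
lvconvert -y -m 1 $ROOT_LV $ZDEV
# wait until the mirror is fully synchronized
until [ "$(lvs --noheadings -o sync_percent $ROOT_LV | tr -d ' ')" = "100.00" ]; do
    sleep 5
done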

The stop method: /usr/sbin/stop-raid1-ramdisk: [ https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/stop-raid1-ramdisk ]

After a chmod 555 /usr/sbin/stop-raid1-ramdisk, running this script should show something like:

# /usr/sbin/stop-raid1-ramdisk
  Volume group "ol" is already consistent.
  Logical volume ol/root changed.
  Logical volume ol/root successfully converted.
  Removed "/dev/zram0" from volume group "ol"
  Labels on physical volume "/dev/zram0" successfully wiped.

A service Unit file can also be created: /etc/systemd/system/raid1-ramdisk.service [https://github.com/oracle/linux-blog-sample-code/blob/ramdisk-system-image/raid1-ramdisk.service]

[Unit]
Description=Enable RAMdisk RAID 1 on LVM
After=local-fs.target
Before=shutdown.target reboot.target halt.target

[Service]
ExecStart=/usr/sbin/start-raid1-ramdisk
ExecStop=/usr/sbin/stop-raid1-ramdisk
Type=oneshot
RemainAfterExit=yes
TimeoutSec=0

[Install]
WantedBy=multi-user.target
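Once this unit file is in place, it is enabled like any other unit:

# systemctl daemon-reload
# systemctl enable raid1-ramdisk.service
# systemctl start raid1-ramdisk.service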
Conclusion:

When the system disk access problem manifests itself, the ramdisk mirror branch will provide the possibility to investigate the situation. The goal of this procedure is not to keep the system running on this memory-mirror configuration, but to help investigate a bad situation.

When the problem is identified and fixed, I really recommend coming back to a standard configuration -- enjoying the entire memory of the system, a standard system disk, a possible swap space, etc.

Hoping the method described here can help. I also want to thank, for their reviews, Philip "Bryce" Copeland, who also created the first prototype of the above scripts, and Mark Kanda, who helped test many aspects of this work.

[Nov 09, 2019] chkservice Is A systemd Unit Manager With A Terminal User Interface

The site is https://github.com/linuxenko/chkservice . The tool is written in C++.
It looks like in version 0.3 the author increased the complexity by adding features which are probably not needed at all.
Nov 07, 2019 | www.linuxuprising.com

chkservice systemd manager
chkservice, a terminal user interface (TUI) for managing systemd units, has been updated recently with window resize and search support.

chkservice is a simplistic systemd unit manager that uses ncurses for its terminal interface. Using it you can enable or disable, and start or stop, a systemd unit. It also shows each unit's status (enabled, disabled, static or masked).

You can navigate the chkservice user interface using keyboard shortcuts:

To enable or disable a unit press Space, and to start or stop a unit press s. You can access the help screen, which shows all available keys, by pressing ?.

The command line tool had its first release in August 2017, with no new releases until a few days ago when version 0.2 was released, quickly followed by 0.3.

With the latest 0.3 release, chkservice adds a search feature that allows easily searching through all systemd units.

To search, type / followed by your search query, and press Enter . To search for the next item matching your search query you'll have to type / again, followed by Enter or Ctrl + m (without entering any search text).

Another addition to the latest chkservice is window resize support. In the 0.1 version, the tool would close when the user tried to resize the terminal window. That's no longer the case: chkservice now allows resizing of the terminal window it runs in.

And finally, the last addition to the latest chkservice 0.3 is G-g navigation support . Press G ( Shift + g ) to navigate to the bottom, and g to navigate to the top.

Download and install chkservice

The initial (0.1) chkservice version can be found in the official repositories of a few Linux distributions, including Debian and Ubuntu (and Debian or Ubuntu based Linux distribution -- e.g. Linux Mint, Pop!_OS, Elementary OS and so on).

There are some third-party repositories available as well, including a Fedora Copr, Ubuntu / Linux Mint PPA, and Arch Linux AUR, but at the time I'm writing this, only the AUR package was updated to the latest chkservice version 0.3.

You may also install chkservice from source. Use the instructions provided in the tool's readme to either create a DEB package or install it directly.
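On Debian or Ubuntu the packaged (older 0.1) version can be installed straight from the repositories; for the latest release, a build from source will usually follow the familiar CMake routine sketched below. This is only a hedged sketch (the CMake flow and the ncurses/systemd development headers are my assumptions about a typical small C++ ncurses tool); the project's readme remains the authoritative reference:

$ sudo apt install chkservice                              # packaged 0.1 version
$ git clone https://github.com/linuxenko/chkservice.git    # or build the latest from source
$ cd chkservice
$ mkdir build && cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make
$ sudo make install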

[Nov 08, 2019] Multiple Linux sysadmins working as root

No new interesting ideas for such an important topic whatsoever. One of the main problems here is documenting the actions of each administrator in such a way that the full set of actions is visible to everybody in a convenient and transparent manner -- the problem is not so much access as accountability. With multiple terminals open, the shell history is not a file from which you can deduce each sysadmin's actions, as the parts of the history from the additional terminals are missing. Actually, Solaris implemented some relevant ideas in Solaris 10, but they never made it to Linux.
May 21, 2012 | serverfault.com

In our team we have three seasoned Linux sysadmins having to administer a few dozen Debian servers. Previously we have all worked as root using SSH public key authentication. But we had a discussion on what is the best practice for that scenario and couldn't agree on anything.

Everybody's SSH public key is put into ~root/.ssh/authorized_keys2

Using personalized accounts and sudo

That way we would login with personalized accounts using SSH public keys and use sudo to do single tasks with root permissions. In addition we could give ourselves the "adm" group that allows us to view log files.
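A minimal sketch of this setup (user and group names are examples; the "sudo" and "adm" groups are the Debian conventions mentioned above):

# adduser alice                    # personal, non-root account
# usermod -aG sudo,adm alice       # sudo rights, plus read access to logs via "adm"
# sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
# systemctl reload sshd            # or "service ssh reload" on pre-systemd Debian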

Using multiple UID 0 users

This is a very unique proposal from one of the sysadmins. He suggest to create three users in /etc/passwd all having UID 0 but different login names. He claims that this is not actually forbidden and allow everyone to be UID 0 but still being able to audit.
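For illustration only, this is roughly what that proposal amounts to; the -o flag lets useradd accept a duplicate UID (shown to make the discussion concrete, not as a recommendation):

# useradd -o -u 0 -g 0 -M -d /root -s /bin/bash admjohn
# passwd admjohn
# grep '^admjohn' /etc/passwd
admjohn:x:0:0::/root:/bin/bash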

Comments:

The second option is the best one IMHO. Personal accounts, sudo access. Disable root access via SSH completely. We have a few hundred servers and half a dozen system admins, this is how we do it.

How does agent forwarding break exactly?

Also, if it's such a hassle using sudo in front of every task you can invoke a sudo shell with sudo -s or switch to a root shell with sudo su -

-- thepearson

With regard to the 3rd suggested strategy, other than perusal of the useradd -o -u userXXX options as recommended by @jlliagre, I am not familiar with running multiple users as the same uid. (Hence if you do go ahead with that, I would be interested if you could update the post with any issues (or successes) that arise...)

I guess my first observation regarding the first option "Everybody's SSH public key is put into ~root/.ssh/authorized_keys2", is that unless you absolutely are never going to work on any other systems;

  1. then at least some of the time, you are going to have to work with user accounts and sudo

The second observation would be, that if you work on systems that aspire to HIPAA, PCI-DSS compliance, or stuff like CAPP and EAL, then you are going to have to work around the issues of sudo because;

  1. It is an industry standard to provide non-root individual user accounts that can be audited, disabled, expired, etc., typically using some centralized user database.

So; Using personalized accounts and sudo

It is unfortunate that as a sysadmin, almost everything you will need to do on a remote machine is going to require some elevated permissions, however it is annoying that most of the SSH based tools and utilities are busted while you are in sudo

Hence I can pass on some tricks that I use to work-around the annoyances of sudo that you mention. The first problem is that if root login is blocked using PermitRootLogin=no or that you do not have the root using ssh key, then it makes SCP files something of a PITA.

Problem 1 : You want to scp files from the remote side, but they require root access, however you cannot login to the remote box as root directly.

Boring Solution : copy the files to home directory, chown, and scp down.

ssh userXXX@remotesystem , sudo su - etc, cp /etc/somefiles to /home/userXXX/somefiles , chown -R userXXX /home/userXXX/somefiles , use scp to retrieve files from remote.

Less Boring Solution : sftp supports the -s sftp_server flag, hence you can do something like the following (if you have configured password-less sudo in /etc/sudoers );

sftp  -s '/usr/bin/sudo /usr/libexec/openssh/sftp-server' \
userXXX@remotehost:/etc/resolv.conf

(you can also use this hack-around with sshfs, but I am not sure its recommended... ;-)
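The password-less sudo entry this relies on would be a single sudoers line along these lines (the path is the Red Hat location used above; Debian-based systems ship sftp-server as /usr/lib/openssh/sftp-server):

userXXX ALL = (root) NOPASSWD: /usr/libexec/openssh/sftp-server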

If you don't have password-less sudo rights, or for some configured reason that method above is broken, I can suggest one more less boring file transfer method, to access remote root files.

Port Forward Ninja Method :

Login to the remote host, but specify that the remote port 3022 (can be anything free, and non-reserved for admins, ie >1024) is to be forwarded back to port 22 on the local side.

 [localuser@localmachine ~]$ ssh userXXX@remotehost -R 3022:localhost:22
Last login: Mon May 21 05:46:07 2012 from 123.123.123.123
------------------------------------------------------------------------
This is a private system; blah blah blah
------------------------------------------------------------------------

Get root in the normal fashion...

-bash-3.2$ sudo su -
[root@remotehost ~]#

Now you can scp the files in the other direction avoiding the boring boring step of making a intermediate copy of the files;

[root@remotehost ~]#  scp -o NoHostAuthenticationForLocalhost=yes \
 -P3022 /etc/resolv.conf localuser@localhost:~
localuser@localhost's password: 
resolv.conf                                 100%  
[root@remotehost ~]#

Problem 2: SSH agent forwarding : If you load the root profile, e.g. by specifying a login shell, the necessary environment variables for SSH agent forwarding such as SSH_AUTH_SOCK are reset, hence SSH agent forwarding is "broken" under sudo su - .

Half baked answer :

Anything that properly loads a root shell is going to rightfully reset the environment; however, there is a slight work-around you can use when you need BOTH root permission AND the ability to use the SSH Agent, AT THE SAME TIME

This achieves a kind of chimera profile, that should really not be used, because it is a nasty hack , but is useful when you need to SCP files from the remote host as root, to some other remote host.

Anyway, you can enable that your user can preserve their ENV variables, by setting the following in sudoers;

 Defaults:userXXX    !env_reset

this allows you to create nasty hybrid login environments like so;

login as normal;

[localuser@localmachine ~]$ ssh userXXX@remotehost 
Last login: Mon May 21 12:33:12 2012 from 123.123.123.123
------------------------------------------------------------------------
This is a private system; blah blah blah
------------------------------------------------------------------------
-bash-3.2$ env | grep SSH_AUTH
SSH_AUTH_SOCK=/tmp/ssh-qwO715/agent.1971

create a bash shell, that runs /root/.profile and /root/.bashrc . but preserves SSH_AUTH_SOCK

-bash-3.2$ sudo -E bash -l

So this shell has root permissions, and root $PATH (but a borked home directory...)

bash-3.2# id
uid=0(root) gid=0(root) groups=0(root),1(bin),2(daemon),3(sys),4(adm),6(disk),10(wheel) context=user_u:system_r:unconfined_t
bash-3.2# echo $PATH
/usr/kerberos/sbin:/usr/local/sbin:/usr/sbin:/sbin:/home/xtrabm/xtrabackup-manager:/usr/kerberos/bin:/opt/admin/bin:/usr/local/bin:/bin:/usr/bin:/opt/mx/bin

But you can use that invocation to do things that require remote sudo root, but also the SSH agent access like so;

bash-3.2# scp /root/.ssh/authorized_keys ssh-agent-user@some-other-remote-host:~
/root/.ssh/authorized_keys              100%  126     0.1KB/s   00:00    
bash-3.2#

-- Tom H

The 3rd option looks ideal - but have you actually tried it out to see what's happening? While you might see the additional usernames in the authentication step, any reverse lookup is going to return the same value.

Allowing root direct ssh access is a bad idea, even if your machines are not connected to the internet / use strong passwords.

Usually I use 'su' rather than sudo for root access.

-- symcbean

I use (1), but I happened to type

rm -rf / tmp *

on one ill-fated day. I can see this being bad enough if you have more than a handful of admins.

(2) Is probably more engineered - and you can become full-fledged root through sudo su -. Accidents are still possible though.

(3) I would not touch with a barge pole. I used it on Suns, in order to have a non-barebone-sh root account (if I remember correctly) but it was never robust - plus I doubt it would be very auditable.

Definitely answer 2.
  1. Means that you're allowing SSH access as root . If this machine is in any way public facing, this is just a terrible idea; back when I ran SSH on port 22, my VPS got multiple attempts hourly to authenticate as root. I had a basic IDS set up to log and ban IPs that made multiple failed attempts, but they kept coming. Thankfully, I'd disabled SSH access as the root user as soon as I had my own account and sudo configured. Additionally, you have virtually no audit trail doing this.
  2. Provides root access as and when it is needed. Yes, you barely have any privileges as a standard user, but this is pretty much exactly what you want; if an account does get compromised, you want it to be limited in its abilities. You want any super user access to require a password re-entry. Additionally, sudo access can be controlled through user groups, and restricted to particular commands if you like, giving you more control over who has access to what. Additionally, commands run as sudo can be logged, so it provides a much better audit trail if things go wrong. Oh, and don't just run "sudo su -" as soon as you log in. That's terrible, terrible practice.
  3. Your sysadmin's idea is bad. And he should feel bad. No, *nix machines probably won't stop you from doing this, but both your file system, and virtually every application out there expects each user to have a unique UID. If you start going down this road, I can guarantee that you'll run into problems. Maybe not immediately, but eventually. For example, despite displaying nice friendly names, files and directories use UID numbers to designate their owners; if you run into a program that has a problem with duplicate UIDs down the line, you can't just change a UID in your passwd file later on without having to do some serious manual file system cleanup.

sudo is the way forward. It may cause additional hassle with running commands as root, but it provides you with a more secure box, both in terms of access and auditing.

-- Rohaq

Definitely option 2, but use groups to give each user as much control as possible without needing to use sudo. sudo in front of every command loses half the benefit because you are always in the danger zone. If you make the relevant directories writable by the sysadmins without sudo you return sudo to the exception which makes everyone feel safer.

-- Julian

In the old days, sudo did not exist. As a consequence, having multiple UID 0 users was the only available alternative. But it's still not that good, notably with logging based on the UID to obtain the username. Nowadays, sudo is the only appropriate solution. Forget anything else.

It is in fact documented as permissible. BSD unices have had their toor account for a long time, and bashroot users tend to be accepted practice on systems where csh is standard (accepted malpractice ;)

Perhaps I'm weird, but method (3) is what popped into my mind first as well. Pros: you'd have every user's name in the logs and would know who did what as root. Cons: they'd each be root all the time, so mistakes can be catastrophic.

I'd like to question why you need all admins to have root access. All 3 methods you propose have one distinct disadvantage: once an admin runs a sudo bash -l or sudo su - or such, you lose your ability to track who does what and after that, a mistake can be catastrophic. Moreover, in case of possible misbehaviour, this even might end up a lot worse.

Instead you might want to consider going another way:

This way, martin would be able to safely handle postfix, and in case of mistake or misbehaviour, you'd only lose your postfix system, not entire server.

Same logic can be applied to any other subsystem, such as apache, mysql, etc.
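The bullet list with the concrete proposal is elided above, but the gist can be sketched as a sudoers fragment that delegates a single subsystem to a single admin (the user name and command paths here are my examples, not part of the original answer):

# martin administers postfix, and nothing else, as root
Cmnd_Alias POSTFIX_CMDS = /usr/sbin/postfix, /usr/sbin/postqueue, /usr/sbin/postsuper
martin ALL = (root) POSTFIX_CMDS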

Of course, this is purely theoretical at this point, and might be hard to set up. It does look like a better way to go tho. At least to me. If anyone tries this, please let me know how it went.

-- Tuncay Göncüoğlu

[Nov 08, 2019] How to use cron in Linux by David Both

Nov 06, 2017 | opensource.com
No time for commands? Scheduling tasks with cron means programs can run but you don't have to stay up late.

Instead, I use two service utilities that allow me to run commands, programs, and tasks at predetermined times. The cron and at services enable sysadmins to schedule tasks to run at a specific time in the future. The at service specifies a one-time task that runs at a certain time. The cron service can schedule tasks on a repetitive basis, such as daily, weekly, or monthly.
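For example, a one-time job can be handed to at by piping the command to it together with a time specification (the script path here is just a placeholder):

$ echo "/usr/local/bin/mybackup.sh" | at 02:00 tomorrow
$ atq          # list the queued one-time jobs
$ atrm 15      # remove job number 15 if it is no longer needed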

In this article, I'll introduce the cron service and how to use it.

Common (and uncommon) cron uses

I use the cron service to schedule obvious things, such as regular backups that occur daily at 2 a.m. I also use it for less obvious things.

The crond daemon is the background service that enables cron functionality.

The cron service checks for files in the /var/spool/cron and /etc/cron.d directories and the /etc/anacrontab file. The contents of these files define cron jobs that are to be run at various intervals. The individual user cron files are located in /var/spool/cron , and system services and applications generally add cron job files in the /etc/cron.d directory. The /etc/anacrontab is a special case that will be covered later in this article.

Using crontab

The cron utility runs based on commands specified in a cron table ( crontab ). Each user, including root, can have a cron file. These files don't exist by default, but can be created in the /var/spool/cron directory using the crontab -e command that's also used to edit a cron file (see the script below). I strongly recommend that you not use a standard editor (such as Vi, Vim, Emacs, Nano, or any of the many other editors that are available). Using the crontab command not only allows you to edit the command, it also restarts the crond daemon when you save and exit the editor. The crontab command uses Vi as its underlying editor, because Vi is always present (on even the most basic of installations).

New cron files are empty, so commands must be added from scratch. I added the job definition example below to my own cron files, just as a quick reference, so I know what the various parts of a command mean. Feel free to copy it for your own use.

# crontab -e
SHELL=/bin/bash
MAILTO=root@example.com
PATH=/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin

# For details see man 4 crontabs

# Example of job definition:
# .---------------- minute (0 - 59)
# | .------------- hour (0 - 23)
# | | .---------- day of month (1 - 31)
# | | | .------- month (1 - 12) OR jan,feb,mar,apr ...
# | | | | .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# | | | | |
# * * * * * user-name command to be executed

# backup using the rsbu program to the internal 4TB HDD and then 4TB external
01 01 * * * /usr/local/bin/rsbu -vbd1 ; /usr/local/bin/rsbu -vbd2

# Set the hardware clock to keep it in sync with the more accurate system clock
03 05 * * * /sbin/hwclock --systohc

# Perform monthly updates on the first of the month
# 25 04 1 * * /usr/bin/dnf -y update

The crontab command is used to view or edit the cron files.
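Besides crontab -e, a few other invocations are worth knowing:

$ crontab -l               # list your own crontab
$ crontab -r               # remove your crontab entirely -- use with care
# crontab -u student -e    # root can view or edit another user's crontab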

The first three lines in the code above set up a default environment. The environment must be set to whatever is necessary for a given user because cron does not provide an environment of any kind. The SHELL variable specifies the shell to use when commands are executed. This example specifies the Bash shell. The MAILTO variable sets the email address where cron job results will be sent. These emails can provide the status of the cron job (backups, updates, etc.) and consist of the output you would see if you ran the program manually from the command line. The third line sets up the PATH for the environment. Even though the path is set here, I always prepend the fully qualified path to each executable.

There are several comment lines in the example above that detail the syntax required to define a cron job. I'll break those commands down, then add a few more to show you some more advanced capabilities of crontab files.

01 01 * * * /usr/local/bin/rsbu -vbd1 ; /usr/local/bin/rsbu -vbd2

This line in my /etc/crontab runs a script that performs backups for my systems.

This line runs my self-written Bash shell script, rsbu , that backs up all my systems. This job kicks off at 1:01 a.m. (01 01) every day. The asterisks (*) in positions three, four, and five of the time specification are like file globs, or wildcards, for other time divisions; they specify "every day of the month," "every month," and "every day of the week." This line runs my backups twice; one backs up to an internal dedicated backup hard drive, and the other backs up to an external USB drive that I can take to the safe deposit box.

The following line sets the hardware clock on the computer using the system clock as the source of an accurate time. This line is set to run at 5:03 a.m. (03 05) every day.

03 05 * * * /sbin/hwclock --systohc

This line sets the hardware clock using the system time as the source.

I was using the third and final cron job (commented out) to perform a dnf or yum update at 04:25 a.m. on the first day of each month, but I commented it out so it no longer runs.

# 25 04 1 * * /usr/bin/dnf -y update

This line used to perform a monthly update, but I've commented it out.

Other scheduling tricks

Now let's do some things that are a little more interesting than these basics. Suppose you want to run a particular job every Thursday at 3 p.m.:

00 15 * * Thu /usr/local/bin/mycronjob.sh

This line runs mycronjob.sh every Thursday at 3 p.m.

Or, maybe you need to run quarterly reports after the end of each quarter. The cron service has no option for "The last day of the month," so instead you can use the first day of the following month, as shown below. (This assumes that the data needed for the reports will be ready when the job is set to run.)

02 03 1 1,4,7,10 * /usr/local/bin/reports.sh

This cron job runs quarterly reports on the first day of the month after a quarter ends.

The following shows a job that runs one minute past every hour between 9:01 a.m. and 5:01 p.m.

01 09-17 * * * /usr/local/bin/hourlyreminder.sh

Sometimes you want to run jobs at regular times during normal business hours.

I have encountered situations where I need to run a job every two, three, or four hours. That can be accomplished by dividing the hours by the desired interval, such as */3 for every three hours, or 6-18/3 to run every three hours between 6 a.m. and 6 p.m. Other intervals can be divided similarly; for example, the expression */15 in the minutes position means "run the job every 15 minutes."

*/5 08-18/2 * * * /usr/local/bin/mycronjob.sh

This cron job runs every five minutes during the even-numbered hours between 8 a.m. and 6 p.m.; the last run of the day is at 6:55 p.m.

One thing to note: The division expressions must result in a remainder of zero for the job to run. That's why, in this example, the job runs every five minutes (08:00, 08:05, 08:10, etc.) during even-numbered hours from 8 a.m. to 6 p.m., but not during any odd-numbered hours. For example, the job will not run at all from 9:00 a.m. to 9:59 a.m.
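
As a quick illustration, here are two hypothetical crontab entries (the script names and paths are placeholders, not from the article) that use these step values:

# Run a check every 15 minutes, around the clock
*/15 * * * * /usr/local/bin/quickcheck.sh
# Run a job every three hours between 6 a.m. and 6 p.m.
00 6-18/3 * * * /usr/local/bin/threehourly.sh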

I am sure you can come up with many other possibilities based on these examples.

Limiting cron access


Regular users with cron access could make mistakes that, for example, might cause system resources (such as memory and CPU time) to be swamped. To prevent possible misuse, the sysadmin can limit user access by creating a /etc/cron.allow file that contains a list of all users with permission to create cron jobs. The root user cannot be prevented from using cron.
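
A minimal sketch of that restriction might look like this (the username is hypothetical); once /etc/cron.allow exists, any user not listed in it is refused when they try to edit a crontab:

# As root, permit only the user "student" (in addition to root) to use cron
echo "student" >> /etc/cron.allow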

If you prevent non-root users from creating their own cron jobs, it may be necessary for root to add their cron jobs to the root crontab. "But wait!" you say. "Doesn't that run those jobs as root?" Not necessarily. In the first example in this article, the username field shown in the comments can be used to specify the user ID a job should run under. This prevents the specified non-root user's jobs from running as root. The following example shows a job definition that runs a job as the user "student":

04 07 * * * student /usr/local/bin/mycronjob.sh

If no user is specified, the job is run as the user that owns the crontab file, root in this case.

cron.d

The directory /etc/cron.d is where some applications, such as SpamAssassin and sysstat, install cron files. Because there is no spamassassin or sysstat user, these programs need a place to put their cron files, so they are placed in /etc/cron.d.

The /etc/cron.d/sysstat file below contains cron jobs that relate to system activity reporting (SAR). These cron files use the same format as /etc/crontab, including the username field.

# Run system activity accounting tool every 10 minutes
*/10 * * * * root /usr/lib64/sa/sa1 1 1
# Generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A

The sysstat package installs the /etc/cron.d/sysstat cron file to run programs for SAR.

The sysstat cron file has two lines that perform tasks. The first line runs the sa1 program every 10 minutes to collect data stored in special binary files in the /var/log/sa directory. Then, every night at 23:53, the sa2 program runs to create a daily summary.

Scheduling tips

Some of the times I set in the crontab files seem rather random -- and to some extent they are. Trying to schedule cron jobs can be challenging, especially as the number of jobs increases. I usually have only a few tasks to schedule on each of my computers, which is simpler than in some of the production and lab environments where I have worked.

One system I administered had around a dozen cron jobs that ran every night and an additional three or four that ran on weekends or the first of the month. That was a challenge, because if too many jobs ran at the same time -- especially the backups and compiles -- the system would run out of RAM and nearly fill the swap file, which resulted in system thrashing while performance tanked, so nothing got done. We added more memory and improved how we scheduled tasks. We also removed a task that was very poorly written and used large amounts of memory.

The crond service assumes that the host computer runs all the time. That means that if the computer is turned off during a period when cron jobs were scheduled to run, they will not run until the next time they are scheduled. This might cause problems if they are critical cron jobs. Fortunately, there is another option for running jobs at regular intervals: anacron .

anacron

The anacron program performs the same function as crond, but it adds the ability to run jobs that were skipped, such as if the computer was off or otherwise unable to run the job for one or more cycles. This is very useful for laptops and other computers that are turned off or put into sleep mode.

As soon as the computer is turned on and booted, anacron checks to see whether configured jobs missed their last scheduled run. If they have, those jobs run immediately, but only once (no matter how many cycles have been missed). For example, if a weekly job was not run for three weeks because the system was shut down while you were on vacation, it would be run soon after you turn the computer on, but only once, not three times.

The anacron program provides some easy options for running regularly scheduled tasks. Just install your scripts in the /etc/cron.[hourly|daily|weekly|monthly] directories, depending how frequently they need to be run.

How does this work? The sequence is simpler than it first appears.

  1. The crond service runs the cron job specified in /etc/cron.d/0hourly .
# Run the hourly jobs
SHELL=/bin/bash
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
01 * * * * root run-parts /etc/cron.hourly

The contents of /etc/cron.d/0hourly cause the shell scripts located in /etc/cron.hourly to run.

  2. The cron job specified in /etc/cron.d/0hourly runs the run-parts program once per hour.
  3. The run-parts program runs all the scripts located in the /etc/cron.hourly directory.
  4. The /etc/cron.hourly directory contains the 0anacron script, which runs the anacron program using the /etc/anacrontab configuration file shown here.
# /etc/anacrontab: configuration file for anacron

# See anacron(8) and anacrontab(5) for details.

SHELL=/bin/sh
PATH=/sbin:/bin:/usr/sbin:/usr/bin
MAILTO=root
# the maximal random delay added to the base delay of the jobs
RANDOM_DELAY=45
# the jobs will be started during the following hours only
START_HOURS_RANGE=3-22

#period in days   delay in minutes   job-identifier   command
1         5       cron.daily         nice run-parts /etc/cron.daily
7         25      cron.weekly        nice run-parts /etc/cron.weekly
@monthly  45      cron.monthly       nice run-parts /etc/cron.monthly

The /etc/anacrontab file runs the executable files in the cron.[daily|weekly|monthly] directories at the appropriate times.

  5. The anacron program runs the programs located in /etc/cron.daily once per day; it runs the jobs located in /etc/cron.weekly once per week, and the jobs in cron.monthly once per month. Note the specified delay times in each line that help prevent these jobs from overlapping themselves and other cron jobs.

Instead of placing complete Bash programs in the cron.X directories, I install them in the /usr/local/bin directory, which allows me to run them easily from the command line. Then I add a symlink in the appropriate cron directory, such as /etc/cron.daily .
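
A hypothetical example of that approach (the script name is a placeholder); note that on some distributions run-parts skips file names containing a dot, so the symlink is named without an extension:

# Keep the script in /usr/local/bin so it can also be run by hand
cp mybackup.sh /usr/local/bin/mybackup.sh
chmod 755 /usr/local/bin/mybackup.sh
# Link it into cron.daily so run-parts picks it up once a day
ln -s /usr/local/bin/mybackup.sh /etc/cron.daily/mybackup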

The anacron program is not designed to run programs at specific times. Rather, it is intended to run programs at intervals that begin at the specified times, such as 3 a.m. (see the START_HOURS_RANGE line in the /etc/anacrontab file above) of each day, on Sunday (to begin the week), and on the first day of the month. If one or more cycles are missed, anacron will run the missed jobs once, as soon as possible.
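
If you want to see when anacron believes each job class last ran, its timestamp files under /var/spool/anacron are worth a look (the output below is only an example):

$ sudo cat /var/spool/anacron/cron.daily
20191108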

More on setting limits

I use most of these methods for scheduling tasks to run on my computers. All those tasks are ones that need to run with root privileges. It's rare in my experience that regular users really need a cron job. One case was a developer user who needed a cron job to kick off a daily compile in a development lab.

It is important to restrict access to cron functions by non-root users. However, there are circumstances when a user needs to set a task to run at pre-specified times, and cron can allow them to do that. Many users do not understand how to properly configure these tasks using cron and they make mistakes. Those mistakes may be harmless, but, more often than not, they can cause problems. By setting functional policies that cause users to interact with the sysadmin, individual cron jobs are much less likely to interfere with other users and other system functions.

It is possible to set limits on the total resources that can be allocated to individual users or groups, but that is an article for another time.

For more information, the man pages for cron , crontab , anacron , anacrontab , and run-parts all have excellent information and descriptions of how the cron system works.


Ben Cotton on 06 Nov 2017 Permalink

One problem I used to have in an old job was cron jobs that would hang for some reason. This old sysadvent post had some good suggestions for how to deal with that: http://sysadvent.blogspot.com/2009/12/cron-practices.html

Jesper Larsen on 06 Nov 2017 Permalink

Cron is definitely a good tool. But if you need to do more advanced scheduling then Apache Airflow is great for this.

Airflow has a number of advantages over cron. The most important are: dependencies (let tasks run after other tasks), a nice web-based overview, automatic failure recovery, and a centralized scheduler. The disadvantage is that you will need to set up the scheduler and some other centralized components on one server, plus a worker on each machine you want to run jobs on.

You definitely want to use Cron for some stuff. But if you find that Cron is too limited for your use case I would recommend looking into Airflow.

Leslle Satenstein on 13 Nov 2017 Permalink

Hi David,
you have a well-done article. Much appreciated. I make use of the @reboot crontab entry in root's crontab, where I run the following:

@reboot /bin/dofstrim.sh

I wanted to run fstrim for my SSD drive once and only once per week.
dofstrim.sh is a script that runs the "fstrim" program once per week, irrespective of the number of times the system is rebooted. I happen to have several Linux systems sharing one computer, and each system has a root crontab with that entry. Since I may hop from Linux to Linux during the day, or several times per week, my dofstrim.sh only runs fstrim once per week, irrespective of which Linux system I boot. I make use of a partition common to all the Linux systems, mounted as "/scratch", and the wonderful Linux command-line "date" program.

The dofstrim.sh listing follows below.

#!/bin/bash
# run fstrim either once/week or once/day, not once for every reboot
#
# Use the date program to extract today's day number or week number
# the day number range is 1..366, weekno is 1 to 53
#WEEKLY=0   # once per day
WEEKLY=1    # once per week
lockdir='/scratch/lock/'

if [[ $WEEKLY -eq 1 ]]; then
    dayno="$lockdir/dofstrim.weekno"
    today=$(date +%V)
else
    dayno="$lockdir/dofstrim.dayno"
    today=$(date +%j)
fi

prevval="000"

if [ -f "$dayno" ]
then
    prevval=$(cat ${dayno})
    if [ x$prevval = x ]; then
        prevval="000"
    fi
else
    mkdir -p $lockdir
fi

if [ ${prevval} -ne ${today} ]
then
    /sbin/fstrim -a
    echo $today > $dayno
fi

I had thought to use anacron, but then fstrim would be run frequently, as each Linux's anacron would have a similar entry.
The "date" program produces a day number or a week number, depending on whether you pass it +%j or +%V.

Leslle Satenstein on 13 Nov 2017 Permalink

Running a report on the last day of the month is easy if you use the date program. Use the date function from Linux as shown

*/9 15 28-31 * * [ `date -d +'1 day' +\%d` -eq 1 ] && echo "Tomorrow is the first of month Today(now) is `date`" >> /root/message

During that window on the 28th through the 31st, the date check is executed.
If the date one day ahead is the first of the month, today must be the last day of the month.

sgtrock on 14 Nov 2017 Permalink

Why not use crontab to launch something like Ansible playbooks instead of simple bash scripts? A lot easier to troubleshoot and manage these days. :-)

[Nov 08, 2019] Bash aliases you can't live without by Seth Kenlon

Jul 31, 2019 | opensource.com

Tired of typing the same long commands over and over? Do you feel inefficient working on the command line? Bash aliases can make a world of difference. 28 comments

A Bash alias is a method of supplementing or overriding Bash commands with new ones. Bash aliases make it easy for users to customize their experience in a POSIX terminal. They are often defined in $HOME/.bashrc or $HOME/.bash_aliases (which must be loaded by $HOME/.bashrc ).

Most distributions add at least some popular aliases in the default .bashrc file of any new user account. These are simple ones to demonstrate the syntax of a Bash alias:

alias ls='ls -F'
alias ll='ls -lh'

Not all distributions ship with pre-populated aliases, though. If you add aliases manually, then you must load them into your current Bash session:

$ source ~/.bashrc

Otherwise, you can close your terminal and re-open it so that it reloads its configuration file.

With those aliases defined in your Bash initialization script, you can then type ll and get the results of ls -lh, and when you type ls, you get the output of ls -F instead of plain old ls.

Those aliases are great to have, but they just scratch the surface of what's possible. Here are the top 10 Bash aliases that, once you try them, you won't be able to live without.

Set up first

Before beginning, create a file called ~/.bash_aliases :

$ touch ~/.bash_aliases

Then, make sure that this code appears in your ~/.bashrc file:

if [ -e $HOME/.bash_aliases ]; then
    source $HOME/.bash_aliases
fi

If you want to try any of the aliases in this article for yourself, enter them into your .bash_aliases file, and then load them into your Bash session with the source ~/.bashrc command.

Sort by file size

If you started your computing life with GUI file managers like Nautilus in GNOME, the Finder in MacOS, or Explorer in Windows, then you're probably used to sorting a list of files by their size. You can do that in a terminal as well, but it's not exactly succinct.

Add this alias to your configuration on a GNU system:

alias lt='ls --human-readable --size -1 -S --classify'

This alias replaces lt with an ls command that displays the size of each item, and then sorts it by size, in a single column, with a notation to indicate the kind of file. Load your new alias, and then try it out:

$ source ~/.bashrc
$ lt
total 344K
140K configure*
44K aclocal.m4
36K LICENSE
32K config.status*
24K Makefile
24K Makefile.in
12K config.log
8.0K README.md
4.0K info.slackermedia.Git-portal.json
4.0K git-portal.spec
4.0K flatpak.path.patch
4.0K Makefile.am*
4.0K dot-gitlab.ci.yml
4.0K configure.ac*
0 autom4te.cache/
0 share/
0 bin/
0 install-sh@
0 compile@
0 missing@
0 COPYING@

On MacOS or BSD, the ls command doesn't have the same options, so this alias works instead:

alias lt='du -sh * | sort -h'

The results of this version are a little different:

$ du -sh * | sort -h
0 compile
0 COPYING
0 install-sh
0 missing
4.0K configure.ac
4.0K dot-gitlab.ci.yml
4.0K flatpak.path.patch
4.0K git-portal.spec
4.0K info.slackermedia.Git-portal.json
4.0K Makefile.am
8.0K README.md
12K config.log
16K bin
24K Makefile
24K Makefile.in
32K config.status
36K LICENSE
44K aclocal.m4
60K share
140K configure
476K autom4te.cache

In fact, even on Linux, that command is useful, because using ls lists directories and symlinks as being 0 in size, which may not be the information you actually want. It's your choice.

Thanks to Brad Alexander for this alias idea.

View only mounted drives

The mount command used to be so simple. With just one command, you could get a list of all the mounted filesystems on your computer, and it was frequently used for an overview of what drives were attached to a workstation. It used to be impressive to see more than three or four entries because most computers don't have many more USB ports than that, so the results were manageable.

Computers are a little more complicated now, and between LVM, physical drives, network storage, and virtual filesystems, the results of mount can be difficult to parse:

sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,seclabel,size=8131024k,nr_inodes=2032756,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
[...]
/dev/nvme0n1p2 on /boot type ext4 (rw,relatime,seclabel)
/dev/nvme0n1p1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
[...]
gvfsd-fuse on /run/user/100977/gvfs type fuse.gvfsd-fuse (rw,nosuid,nodev,relatime,user_id=100977,group_id=100977)
/dev/sda1 on /run/media/seth/pocket type ext4 (rw,nosuid,nodev,relatime,seclabel,uhelper=udisks2)
/dev/sdc1 on /run/media/seth/trip type ext4 (rw,nosuid,nodev,relatime,seclabel,uhelper=udisks2)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)

To solve that problem, try an alias like this:

alias mnt="mount | awk -F' ' '{ printf \"%s\t%s\n\",\$1,\$3; }' | column -t | egrep ^/dev/ | sort"

This alias uses awk to parse the output of mount by column, reducing the output to what you are probably looking for (which hard drives, not filesystems, are mounted):

$ mnt
/dev/mapper/fedora-root /
/dev/nvme0n1p1 /boot/efi
/dev/nvme0n1p2 /boot
/dev/sda1 /run/media/seth/pocket
/dev/sdc1 /run/media/seth/trip

On MacOS, the mount command doesn't provide terribly verbose output, so an alias may be overkill. However, if you prefer a succinct report, try this:

alias mnt='mount | grep -E ^/dev | column -t'

The results:

$ mnt
/dev/disk1s1 on / (apfs, local, journaled)
/dev/disk1s4 on /private/var/vm (apfs, local, noexec, journaled, noatime, nobrowse)

Find a command in your grep history

Sometimes you figure out how to do something in the terminal, and promise yourself that you'll never forget what you've just learned. Then an hour goes by, and you've completely forgotten what you did.

Searching through your Bash history is something everyone has to do from time to time. If you know exactly what you're searching for, you can use Ctrl+R to do a reverse search through your history, but sometimes you can't remember the exact command you want to find.

Here's an alias to make that task a little easier:

alias gh='history | grep'

Here's an example of how to use it:

$ gh bash
482 cat ~/.bashrc | grep _alias
498 emacs ~/.bashrc
530 emacs ~/.bash_aliases
531 source ~/.bashrc

Sort by modification time

It happens every Monday: You get to work, you sit down at your computer, you open a terminal, and you find you've forgotten what you were doing last Friday. What you need is an alias to list the most recently modified files.

You can use the ls command to create an alias to help you find where you left off:

alias left='ls -t -1'

The output is simple, although you can extend it with the -l (long) option if you prefer. The alias, as listed, displays this:

$ left
demo.jpeg
demo.xcf
design-proposal.md
rejects.txt
brainstorm.txt
query-letter.xml

Count files

If you need to know how many files you have in a directory, the solution is one of the most classic examples of UNIX command construction: You list files with the ls command, control its output to be only one column with the -1 option, and then pipe that output to the wc (word count) command to count how many lines of single files there are.
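
Spelled out, that classic construction looks like this (the directory and the count are purely illustrative):

$ ls -1 | wc -l
22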

It's a brilliant demonstration of how the UNIX philosophy allows users to build their own solutions using small system components. This command combination is also a lot to type if you happen to do it several times a day, and it doesn't exactly work for a directory of directories without using the -R option, which introduces new lines to the output and renders the exercise useless.

Instead, this alias makes the process easy:

alias count='find . -type f | wc -l'

This one counts files, ignoring directories, but not the contents of directories. If you have a project folder containing two directories, each of which contains two files, the alias returns four, because there are four files in the entire project.

$ ls
foo bar
$ count
4

Create a Python virtual environment

Do you code in Python?

Do you code in Python a lot?

If you do, then you know that creating a Python virtual environment requires, at the very least, 53 keystrokes.
That's 49 too many, but that's easily circumvented with two new aliases called ve and va :

alias ve='python3 -m venv ./venv'
alias va='source ./venv/bin/activate'

Running ve creates a new directory, called venv , containing the usual virtual environment filesystem for Python3. The va alias activates the environment in your current shell:

$ cd my-project
$ ve
$ va
(venv) $

Add a copy progress bar

Everybody pokes fun at progress bars because they're infamously inaccurate. And yet, deep down, we all seem to want them. The UNIX cp command has no progress bar, but it does have a -v option for verbosity, meaning that it echoes the name of each file being copied to your terminal. That's a pretty good hack, but it doesn't work so well when you're copying one big file and want some indication of how much of the file has yet to be transferred.

The pv command provides a progress bar during copy, but it's not common as a default application. On the other hand, the rsync command is included in the default installation of nearly every POSIX system available, and it's widely recognized as one of the smartest ways to copy files both remotely and locally.

Better yet, it has a built-in progress bar.

alias cpv='rsync -ah --info=progress2'

Using this alias is the same as using the cp command:

$ cpv bigfile.flac /run/media/seth/audio/
3.83M 6% 213.15MB/s 0:00:00 (xfr#4, to-chk=0/4)

An interesting side effect of using this command is that rsync copies both files and directories without the -r flag that cp would otherwise require.
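
For example, copying an entire directory works as-is, because the -a in the alias already implies recursion (the paths here are hypothetical):

$ cpv Pictures /run/media/seth/backup/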

Protect yourself from file removal accidents

You shouldn't use the rm command. The rm manual even says so:

Warning : If you use 'rm' to remove a file, it is usually possible to recover the contents of that file. If you want more assurance that the contents are truly unrecoverable, consider using 'shred'.

If you want to remove a file, you should move the file to your Trash, just as you do when using a desktop.

POSIX makes this easy, because the Trash is an accessible, actual location in your filesystem. That location may change, depending on your platform: On a FreeDesktop-compliant system, the Trash is located at ~/.local/share/Trash , while on MacOS it's ~/.Trash , but either way, it's just a directory into which you place files that you want out of sight until you're ready to erase them forever.

This simple alias provides a way to toss files into the Trash bin from your terminal:

alias tcn='mv --force -t ~/.local/share/Trash'

This alias uses a little-known mv flag that enables you to provide the file you want to move as the final argument, ignoring the usual requirement for that file to be listed first. Now you can use your new command to move files and folders to your system Trash:

$ ls
foo bar
$ tcn foo
$ ls
bar

Now the file is "gone," but only until you realize in a cold sweat that you still need it. At that point, you can rescue the file from your system Trash; be sure to tip the Bash and mv developers on the way out.

Note: If you need a more robust Trash command with better FreeDesktop compliance, see Trashy .

Simplify your Git workflow

Everyone has a unique workflow, but there are usually repetitive tasks no matter what. If you work with Git on a regular basis, then there's probably some sequence you find yourself repeating pretty frequently. Maybe you find yourself going back to the master branch and pulling the latest changes over and over again during the day, or maybe you find yourself creating tags and then pushing them to the remote, or maybe it's something else entirely.

No matter what Git incantation you've grown tired of typing, you may be able to alleviate some pain with a Bash alias. Largely thanks to its ability to pass arguments to hooks, Git has a rich set of introspective commands that save you from having to perform uncanny feats in Bash.

For instance, while you might struggle to locate, in Bash, a project's top-level directory (which, as far as Bash is concerned, is an entirely arbitrary designation, since the absolute top level to a computer is the root directory), Git knows its top level with a simple query. If you study up on Git hooks, you'll find yourself able to find out all kinds of information that Bash knows nothing about, but you can leverage that information with a Bash alias.

Here's an alias to find the top level of a Git project, no matter where in that project you are currently working, and then to change directory to it, change to the master branch, and perform a Git pull:

alias startgit='cd `git rev-parse --show-toplevel` && git checkout master && git pull'

This kind of alias is by no means a universally useful alias, but it demonstrates how a relatively simple alias can eliminate a lot of laborious navigation, commands, and waiting for prompts.

A simpler, and probably more universal, alias returns you to the Git project's top level. This alias is useful because when you're working on a project, that project more or less becomes your "temporary home" directory. It should be as simple to go "home" as it is to go to your actual home, and here's an alias to do it:

alias cg='cd `git rev-parse --show-toplevel`'

Now the command cg takes you to the top of your Git project, no matter how deep into its directory structure you have descended.
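
A hypothetical session (the project path is made up) might look like this:

$ pwd
/home/seth/example-project/docs/api/v2
$ cg
$ pwd
/home/seth/example-project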

Change directories and view the contents at the same time

It was once (allegedly) proposed by a leading scientist that we could solve many of the planet's energy problems by harnessing the energy expended by geeks typing cd followed by ls .
It's a common pattern, because generally when you change directories, you have the impulse or the need to see what's around.

But "walking" your computer's directory tree doesn't have to be a start-and-stop process.

This one's cheating, because it's not an alias at all, but it's a great excuse to explore Bash functions. While aliases are great for quick substitutions, Bash allows you to add local functions in your .bashrc file (or a separate functions file that you load into .bashrc , just as you do your aliases file).

To keep things modular, create a new file called ~/.bash_functions and then have your .bashrc load it:

if [ -e $HOME/.bash_functions ]; then
    source $HOME/.bash_functions
fi

In the functions file, add this code:

function cl() {
    DIR="$*";
    # if no DIR given, go home
    if [ $# -lt 1 ]; then
        DIR=$HOME;
    fi;
    builtin cd "${DIR}" && \
    # use your preferred ls command
        ls -F --color=auto
}

Load the function into your Bash session and then try it out:

$ source ~/.bash_functions
$ cl Documents
foo bar baz
$ pwd
/home/seth/Documents
$ cl ..
Desktop Documents Downloads
[...]
$ pwd
/home/seth

Functions are much more flexible than aliases, but with that flexibility comes the responsibility for you to ensure that your code makes sense and does what you expect. Aliases are meant to be simple, so keep them easy, but useful. For serious modifications to how Bash behaves, use functions or custom shell scripts saved to a location in your PATH .

For the record, there are some clever hacks to implement the cd and ls sequence as an alias, so if you're patient enough, then the sky is the limit even using humble aliases.

Start aliasing and functioning

Customizing your environment is what makes Linux fun, and increasing your efficiency is what makes Linux life-changing. Get started with simple aliases, graduate to functions, and post your must-have aliases in the comments!


ACG on 31 Jul 2019 Permalink

One function I like a lot is a function that diffs a file and its backup.
It goes something like

#!/usr/bin/env bash
file="${1:?File not given}"

if [[ ! -e "$file" || ! -e "$file"~ ]]; then
echo "File doesn't exist or has no backup" 1>&2
exit 1
fi

diff --color=always "$file"{~,} | less -r

I may have gotten the if wrong, but you get the idea. I'm typing this on my phone, away from home.
Cheers

Seth Kenlon on 31 Jul 2019 Permalink

That's pretty slick! I like it.

My backup tool of choice (rdiff-backup) handles these sorts of comparisons pretty well, so I tend to be confident in my backup files. That said, there's always the edge case, and this kind of function is a great solution for those. Thanks!

Kevin Cole on 13 Aug 2019 Permalink

A few of my "cannot-live-withouts" are regex based:

Decomment removes full-line comments and blank lines. For example, when looking at a "stock" /etc/httpd/whatever.conf file that has a gazillion lines in it,

alias decomment='egrep -v "^[[:space:]]*((#|;|//).*)?$" '

will show you that only four lines in the file actually DO anything, and the gazillion minus four are comments. I use this ALL the time with config files, Python (and other languages) code, and god knows where else.

Then there's unprintables and expletives which are both very similar:

alias unprintable='grep --color="auto" -P -n "[\x00-\x1E]"'
alias expletives='grep --color="auto" -P -n "[^\x00-\x7E]" '

The first shows which lines (with line numbers) in a file contain control characters, and the second shows which lines in a file contain anything "above" a RUBOUT, er, excuse me, I mean above ASCII 127. (I feel old.) ;-) Handy when, for example, someone gives you a program that they edited or created with LibreOffice, and oops... half of the quoted strings have "real" curly opening and closing quote marks instead of ASCII 0x22 "straight" quote mark delimiters... But there's actually a few curlies you want to keep, so a "nuke 'em all in one swell foop" approach won't work.

Seth Kenlon on 14 Aug 2019 Permalink

These are great!

Dan Jones on 13 Aug 2019 Permalink

Your `cl` function could be simplified, since `cd` without arguments already goes to home.

```
function cl() {
cd "$@" && \
ls -F --color=auto
}
```

Seth Kenlon on 14 Aug 2019 Permalink

Nice!

jkeener on 20 Aug 2019 Permalink

The first alias in my .bash_aliases file is always:

alias realias='vim ~/.bash_aliases; source ~/.bash_aliases'

replace vim with your favorite editor or $VISUAL

bhuvana on 04 Oct 2019 Permalink

Thanks for this post! I have created a Github repo- https://github.com/bhuvana-guna/awesome-bash-shortcuts
with a motive to create an extended list of aliases/functions for various programs. As I am a newbie to terminal and linux, please do contribute to it with these and other super awesome utilities and help others easily access them.

[Nov 08, 2019] Perl tricks for system administrators by Ruth Holloway Feed

Notable quotes:
"... /home/<department>/<username> ..."
Jul 27, 2016 | opensource.com

Did you know that Perl is a great programming language for system administrators? Perl is platform-independent, so you can do things on different operating systems without rewriting your scripts. Scripting in Perl is quick and easy, and its portability makes your scripts amazingly useful. Here are a few examples, just to get your creative juices flowing!

Renaming a bunch of files

Suppose you need to rename a whole bunch of files in a directory. In this case, we've got a directory full of .xml files, and we want to rename them all to .html . Easy-peasy!

#!/usr/bin/perl
use strict;
use warnings;

foreach my $file (glob "*.xml") {
    my $new = substr($file, 0, -3) . "html";
    rename $file, $new;
}

Then just cd to the directory where you need to make the change, and run the script. You could put this in a cron job, if you needed to run it regularly, and it is easily enhanced to accept parameters.
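
A crontab entry along these lines, for instance, would run the conversion nightly against a particular directory (the path, schedule, and script name are only placeholders):

# Rename exported .xml files to .html every night at 2:30 a.m.
30 02 * * * cd /var/www/exports && /usr/local/bin/xml-to-html.pl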

Speaking of accepting parameters, let's take a look at a script that does just that.

Creating a Linux user account


Suppose you need to regularly create Linux user accounts on your system, and the format of the username is first initial/last name, as is common in many businesses. (This is, of course, a good idea, until you get John Smith and Jane Smith working at the same company -- or want John to have two accounts, as he works part-time in two different departments. But humor me, okay?) Each user account needs to be in a group based on their department, and home directories are of the format /home/<department>/<username> . Let's take a look at a script to do that:

#!/usr/bin/env perl
use strict;
use warnings;

my $adduser = '/usr/sbin/adduser';

use Getopt::Long qw(GetOptions);

# If the user calls the script with no parameters,
# give them help!

if (not @ARGV) {
    usage();
}

# Gather our options; if they specify any undefined option,
# they'll get sent some help!

my %opts;
GetOptions( \%opts,
    'fname=s',
    'lname=s',
    'dept=s',
    'run',
) or usage();

# Let's validate our inputs. All three parameters are
# required, and must be alphabetic.
# You could be clever, and do this with a foreach loop,
# but let's keep it simple for now.

if (not $opts{fname} or $opts{fname} !~ /^[a-zA-Z]+$/) {
    usage("First name must be alphabetic");
}
if (not $opts{lname} or $opts{lname} !~ /^[a-zA-Z]+$/) {
    usage("Last name must be alphabetic");
}
if (not $opts{dept} or $opts{dept} !~ /^[a-zA-Z]+$/) {
    usage("Department must be alphabetic");
}

# Construct the username and home directory

my $username = lc(substr($opts{fname}, 0, 1) . $opts{lname});
my $home = "/home/$opts{dept}/$username";

# Show them what we've got ready to go.

print "Name: $opts{fname} $opts{lname}\n";
print "Username: $username\n";
print "Department: $opts{dept}\n";
print "Home directory: $home\n\n";

# use qq() here, so that the quotes in the --gecos flag
# get carried into the command!

my $cmd = qq($adduser --home $home --ingroup $opts{dept} \\
    --gecos "$opts{fname} $opts{lname}" $username);

print "$cmd\n";
if ($opts{run}) {
    system $cmd;
} else {
    print "You need to add the --run flag to actually execute\n";
}

sub usage {
    my ($msg) = @_;
    if ($msg) {
        print "$msg\n\n";
    }
    print "Usage: $0 --fname FirstName --lname LastName --dept Department --run\n";
    exit;
}

As with the previous script, there are opportunities for enhancement, but something like this might be all that you need for this task.
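
Here is what a dry run might look like, using made-up names and assuming the script has been saved as mkuser.pl; the output follows directly from the print statements above:

$ sudo ./mkuser.pl --fname Jane --lname Smith --dept engineering
Name: Jane Smith
Username: jsmith
Department: engineering
Home directory: /home/engineering/jsmith

/usr/sbin/adduser --home /home/engineering/jsmith --ingroup engineering \
    --gecos "Jane Smith" jsmith
You need to add the --run flag to actually execute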

One more, just for fun!

Change copyright text in every Perl source file in a directory tree

Now we're going to try a mass edit. Suppose you've got a directory full of code, and each file has a copyright statement somewhere in it. (Rich Bowen wrote a great article, Copyright statements proliferate inside open source code a couple of years ago that discusses the wisdom of copyright statements in open source code. It is a good read, and I recommend it highly. But again, humor me.) You want to change that text in each and every file in the directory tree. File::Find and File::Slurp are your friends!

#!/usr/bin/perl
use strict;
use warnings;

use File::Find qw(find);
use File::Slurp qw(read_file write_file);

# If the user gives a directory name, use that. Otherwise,
# use the current directory.

my $dir = $ARGV[0] || '.';

# File::Find::find is kind of dark-arts magic.
# You give it a reference to some code,
# and a directory to hunt in, and it will
# execute that code on every file in the
# directory, and all subdirectories. In this
# case, \&change_file is the reference
# to our code, a subroutine. You could, if
# what you wanted to do was really short,
# include it in a { } block instead. But doing
# it this way is nice and readable.

find(\&change_file, $dir);

sub change_file {
    my $name = $_;

    # If the file is a directory, symlink, or other
    # non-regular file, don't do anything

    if (not -f $name) {
        return;
    }

    # If it's not Perl, don't do anything.

    if (substr($name, -3) ne ".pl") {
        return;
    }
    print "$name\n";

    # Gobble up the file, complete with carriage
    # returns and everything.
    # Be wary of this if you have very large files
    # on a system with limited memory!

    my $data = read_file($name);

    # Use a regex to make the change. If the string appears
    # more than once, this will change it everywhere!

    $data =~ s/Copyright Old/Copyright New/g;

    # Let's not ruin our original files

    my $backup = "$name.bak";
    rename $name, $backup;
    write_file($name, $data);

    return;
}

Because of Perl's portability, you could use this script on a Windows system as well as a Linux system -- it Just Works because of the underlying Perl interpreter code. The create-an-account script above, on the other hand, is not portable; it is Linux-specific because it uses Linux commands such as adduser .

In my experience, I've found it useful to have a Git repository of these things somewhere that I can clone on each new system I'm working with. Over time, you'll think of changes to make to the code to enhance the capabilities, or you'll add new scripts, and Git can help you make sure that all your tools and tricks are available on all your systems.
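
Something as simple as this (the repository URL and path are placeholders) gets the whole toolkit onto a new host:

$ git clone https://git.example.com/admin/sysadmin-scripts.git ~/sysadmin-scripts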

I hope these little scripts have given you some ideas how you can use Perl to make your system administration life a little easier. In addition to these longer scripts, take a look at a fantastic list of Perl one-liners, and links to other Perl magic assembled by Mischa Peterson.

[Nov 08, 2019] Manage NTP with Chrony by David Both

Dec 03, 2018 | opensource.com

Chronyd is a better choice for most networks than ntpd for keeping computers synchronized with the Network Time Protocol.

"Does anybody really know what time it is? Does anybody really care?"
Chicago , 1969

Perhaps that rock group didn't care what time it was, but our computers do need to know the exact time. Timekeeping is very important to computer networks. In banking, stock markets, and other financial businesses, transactions must be maintained in the proper order, and exact time sequences are critical for that. For sysadmins and DevOps professionals, it's easier to follow the trail of email through a series of servers or to determine the exact sequence of events using log files on geographically dispersed hosts when exact times are kept on the computers in question.

I used to work at an organization that received over 20 million emails per day and had four servers just to accept and do a basic filter on the incoming flood of email. From there, emails were sent to one of four other servers to perform more complex anti-spam assessments, then they were delivered to one of several additional servers where the emails were placed in the correct inboxes. At each layer, the emails would be sent to one of the next-level servers, selected only by the randomness of round-robin DNS. Sometimes we had to trace a new message through the system until we could determine where it "got lost," according to the pointy-haired bosses. We had to do this with frightening regularity.

Most of that email turned out to be spam. Some people actually complained that their [joke, cat pic, recipe, inspirational saying, or other-strange-email]-of-the-day was missing and asked us to find it. We did reject those opportunities.

Our email and other transactional searches were aided by log entries with timestamps that -- today -- can resolve down to the nanosecond in even the slowest of modern Linux computers. In very high-volume transaction environments, even a few microseconds of difference in the system clocks can mean sorting thousands of transactions to find the correct one(s).

The NTP server hierarchy

Computers worldwide use the Network Time Protocol (NTP) to synchronize their times with internet standard reference clocks via a hierarchy of NTP servers. The primary servers are at stratum 1, and they are connected directly to various national time services at stratum 0 via satellite, radio, or even modems over phone lines. The time service at stratum 0 may be an atomic clock, a radio receiver tuned to the signals broadcast by an atomic clock, or a GPS receiver using the highly accurate clock signals broadcast by GPS satellites.

To prevent time requests from time servers lower in the hierarchy (i.e., with a higher stratum number) from overwhelming the primary reference servers, there are several thousand public NTP stratum 2 servers that are open and available for anyone to use. Many organizations with large numbers of hosts that need an NTP server will set up their own time servers so that only one local host accesses the stratum 2 time servers, then they configure the remaining network hosts to use the local time server which, in my case, is a stratum 3 server.

NTP choices

The original NTP daemon, ntpd , has been joined by a newer one, chronyd . Both keep the local host's time synchronized with the time server. Both services are available, and I have seen nothing to indicate that this will change anytime soon.

Chrony has several features that make it the better choice for most environments, among them faster and smoother clock synchronization and better handling of systems that are rebooted, suspended, or only intermittently connected to the network.

The NTP and Chrony RPM packages are available from standard Fedora repositories. You can install both and switch between them, but modern Fedora, CentOS, and RHEL releases have moved from NTP to Chrony as their default time-keeping implementation. I have found that Chrony works well, provides a better interface for the sysadmin, presents much more information, and increases control.

Just to make it clear, NTP is a protocol that is implemented with either NTP or Chrony. If you'd like to know more, read this comparison between NTP and Chrony as implementations of the NTP protocol.

This article explains how to configure Chrony clients and servers on a Fedora host, but the configuration for CentOS and RHEL current releases works the same.

Chrony structure

The Chrony daemon, chronyd , runs in the background and monitors the time and status of the time server specified in the chrony.conf file. If the local time needs to be adjusted, chronyd does it smoothly without the programmatic trauma that would occur if the clock were instantly reset to a new time.

Chrony's chronyc tool allows someone to monitor the current status of Chrony and make changes if necessary. The chronyc utility can be used as a command that accepts subcommands, or it can be used as an interactive text-mode program. This article will explain both uses.

Client configuration

The NTP client configuration is simple and requires little or no intervention. The NTP server can be defined during the Linux installation or provided by the DHCP server at boot time. The default /etc/chrony.conf file (shown below in its entirety) requires no intervention to work properly as a client. For Fedora, Chrony uses the Fedora NTP pool, and CentOS and RHEL have their own NTP server pools. Like many Red Hat-based distributions, the configuration file is well commented.

# Use public servers from the pool.ntp.org project.
# Please consider joining the pool (http://www.pool.ntp.org/join.html).
pool 2.fedora.pool.ntp.org iburst

# Record the rate at which the system clock gains/losses time.
driftfile /var/lib/chrony/drift

# Allow the system clock to be stepped in the first three updates
# if its offset is larger than 1 second.
makestep 1.0 3

# Enable kernel synchronization of the real-time clock (RTC).
rtcsync
# Enable hardware timestamping on all interfaces that support it.
#hwtimestamp *

# Increase the minimum number of selectable sources required to adjust
# the system clock.
#minsources 2

# Allow NTP client access from local network.
#allow 192.168.0.0/16

# Serve time even if not synchronized to a time source.
#local stratum 10

# Specify file containing keys for NTP authentication.
keyfile /etc/chrony.keys

# Get TAI-UTC offset and leap seconds from the system tz database.
leapsectz right/UTC

# Specify directory for log files.
logdir /var/log/chrony

# Select which information is logged.
#log measurements statistics tracking

Let's look at the current status of NTP on a virtual machine I use for testing. The chronyc command, when used with the tracking subcommand, provides statistics that report how far off the local system is from the reference server.

[root@studentvm1 ~]# chronyc tracking
Reference ID : 23ABED4D (ec2-35-171-237-77.compute-1.amazonaws.com)
Stratum : 3
Ref time (UTC) : Fri Nov 16 16:21:30 2018
System time : 0.000645622 seconds slow of NTP time
Last offset : -0.000308577 seconds
RMS offset : 0.000786140 seconds
Frequency : 0.147 ppm slow
Residual freq : -0.073 ppm
Skew : 0.062 ppm
Root delay : 0.041452706 seconds
Root dispersion : 0.022665167 seconds
Update interval : 1044.2 seconds
Leap status : Normal
[root@studentvm1 ~]#

The Reference ID in the first line of the result is the server the host is synchronized to -- in this case, a stratum 3 reference server that was last contacted by the host at 16:21:30 2018. The other lines are described in the chronyc(1) man page .

The sources subcommand is also useful because it provides information about the time source configured in chrony.conf .

[root@studentvm1 ~]# chronyc sources
210 Number of sources = 5
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 192.168.0.51 3 6 377 0 -2613us[-2613us] +/- 63ms
^+ dev.smatwebdesign.com 3 10 377 28m -2961us[-3534us] +/- 113ms
^+ propjet.latt.net 2 10 377 465 -1097us[-1085us] +/- 77ms
^* ec2-35-171-237-77.comput> 2 10 377 83 +2388us[+2395us] +/- 95ms
^+ PBX.cytranet.net 3 10 377 507 -1602us[-1589us] +/- 96ms
[root@studentvm1 ~]#

The first source in the list is the time server I set up for my personal network. The others were provided by the pool. Even though my NTP server doesn't appear in the Chrony configuration file above, my DHCP server provides its IP address for the NTP server. The "S" column -- Source State -- indicates with an asterisk ( * ) the server our host is synced to. This is consistent with the data from the tracking subcommand.

The -v option provides a nice description of the fields in this output.

[root@studentvm1 ~]# chronyc sources -v
210 Number of sources = 5

.-- Source mode '^' = server, '=' = peer, '#' = local clock.
/ .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| / '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
|| .- xxxx [ yyyy ] +/- zzzz
|| Reachability register (octal) -. | xxxx = adjusted offset,
|| Log2(Polling interval) --. | | yyyy = measured offset,
|| \ | | zzzz = estimated error.
|| | | \
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^+ 192.168.0.51 3 7 377 28 -2156us[-2156us] +/- 63ms
^+ triton.ellipse.net 2 10 377 24 +5716us[+5716us] +/- 62ms
^+ lithium.constant.com 2 10 377 351 -820us[ -820us] +/- 64ms
^* t2.time.bf1.yahoo.com 2 10 377 453 -992us[ -965us] +/- 46ms
^- ntp.idealab.com 2 10 377 799 +3653us[+3674us] +/- 87ms
[root@studentvm1 ~]#

If I wanted my server to be the preferred reference time source for this host, I would add the line below to the /etc/chrony.conf file.

server 192.168.0.51 iburst prefer

I usually place this line just above the first pool server statement near the top of the file. There is no special reason for this, except I like to keep the server statements together. It would work just as well at the bottom of the file, and I have done that on several hosts. This configuration file is not sequence-sensitive.

The prefer option marks this as the preferred reference source. As such, this host will always be synchronized with this reference source (as long as it is available). We can also use the fully qualified hostname for a remote reference server or the hostname only (without the domain name) for a local reference time source as long as the search statement is set in the /etc/resolv.conf file. I prefer the IP address to ensure that the time source is accessible even if DNS is not working. In most environments, the server name is probably the better option, because NTP will continue to work even if the server's IP address changes.

If you don't have a specific reference source you want to synchronize to, it is fine to use the defaults.

Configuring an NTP server with Chrony

The nice thing about the Chrony configuration file is that this single file configures the host as both a client and a server. To add a server function to our host -- it will always be a client, obtaining its time from a reference server -- we just need to make a couple of changes to the Chrony configuration, then configure the host's firewall to accept NTP requests.

Open the /etc/chrony.conf file in your favorite text editor and uncomment the local stratum 10 line. This enables the Chrony NTP server to continue to act as if it were connected to a remote reference server if the internet connection fails; this enables the host to continue to be an NTP server to other hosts on the local network.
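
After the change, that portion of /etc/chrony.conf simply looks like this:

# Serve time even if not synchronized to a time source.
local stratum 10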

Let's restart chronyd and track how the service is working for a few minutes. Before we enable our host as an NTP server, we want to test a bit.

[root@studentvm1 ~]# systemctl restart chronyd ; watch chronyc tracking

The results should look like this. The watch command runs the chronyc tracking command every two seconds so we can watch changes occur over time.

Every 2.0s: chronyc tracking studentvm1: Fri Nov 16 20:59:31 2018

Reference ID : C0A80033 (192.168.0.51)
Stratum : 4
Ref time (UTC) : Sat Nov 17 01:58:51 2018
System time : 0.001598277 seconds fast of NTP time
Last offset : +0.001791533 seconds
RMS offset : 0.001791533 seconds
Frequency : 0.546 ppm slow
Residual freq : -0.175 ppm
Skew : 0.168 ppm
Root delay : 0.094823152 seconds
Root dispersion : 0.021242738 seconds
Update interval : 65.0 seconds
Leap status : Normal

Notice that my NTP server, the studentvm1 host, synchronizes to the host at 192.168.0.51, which is my internal network NTP server, at stratum 4. Synchronizing directly to the Fedora pool machines would result in synchronization at stratum 3. Notice also that the amount of error decreases over time. Eventually, it should stabilize with a tiny variation around a fairly small range of error. The size of the error depends upon the stratum and other network factors. After a few minutes, use Ctrl+C to break out of the watch loop.

To turn our host into an NTP server, we need to allow it to listen on the local network. Uncomment the following line to allow hosts on the local network to access our NTP server.

# Allow NTP client access from local network.
allow 192.168.0.0/16

Note that the server can listen for requests on any local network it's attached to. The IP address in the "allow" line is just intended for illustrative purposes. Be sure to change the IP network and subnet mask in that line to match your local network's.

Restart chronyd .

[root@studentvm1 ~]# systemctl restart chronyd

To allow other hosts on your network to access this server, configure the firewall to allow inbound UDP packets on port 123. Check your firewall's documentation to find out how to do that.
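
On Fedora, CentOS, and RHEL hosts running firewalld, one common way to do that (adjust the zone to suit your network) is:

[root@studentvm1 ~]# firewall-cmd --permanent --add-service=ntp
[root@studentvm1 ~]# firewall-cmd --reload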

Testing

Your host is now an NTP server. You can test it with another host or a VM that has access to the network on which the NTP server is listening. Configure the client to use the new NTP server as the preferred server in the /etc/chrony.conf file, then monitor that client using the chronyc tools we used above.

Chronyc as an interactive tool

As I mentioned earlier, chronyc can be used as an interactive command tool. Simply run the command without a subcommand and you get a chronyc command prompt.

[root@studentvm1 ~]# chronyc
chrony version 3.4
Copyright (C) 1997-2003, 2007, 2009-2018 Richard P. Curnow and others
chrony comes with ABSOLUTELY NO WARRANTY. This is free software, and
you are welcome to redistribute it under certain conditions. See the
GNU General Public License version 2 for details.

chronyc>

You can enter just the subcommands at this prompt. Try using the tracking , ntpdata , and sources commands. The chronyc command line allows command recall and editing for chronyc subcommands. You can use the help subcommand to get a list of possible commands and their syntax.

Conclusion

Chrony is a powerful tool for synchronizing the times of client hosts, whether they are all on the local network or scattered around the globe. It's easy to configure because, despite the large number of options available, only a few configurations are required for most circumstances.

After my client computers have synchronized with the NTP server, I like to set the system hardware clock from the system (OS) time by using the following command:

/sbin/hwclock --systohc

This command can be added as a cron job or a script in cron.daily to keep the hardware clock synced with the system time.
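
For example, a tiny script like this dropped into /etc/cron.daily would do it (the file name is up to you):

#!/bin/bash
# /etc/cron.daily/hwclock-sync: keep the hardware clock in step with system time
/sbin/hwclock --systohc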

Chrony and NTP (the service) use similar configuration directives, and many entries carry over between their configuration files. The man pages for chronyd , chronyc , and chrony.conf contain an amazing amount of information that can help you get started or learn about esoteric configuration options.

Do you run your own NTP server? Let us know in the comments and be sure to tell us which implementation you are using, NTP or Chrony.

[Nov 08, 2019] Quiet log noise with Python and machine learning by Tristan de Cacqueray

Sep 28, 2018 | opensource.com

Logreduce machine learning model is trained using previous successful job runs to extract anomalies from failed runs' logs.

This principle can also be applied to other use cases, for example, extracting anomalies from Journald or other systemwide regular log files.

Using machine learning to reduce noise

A typical log file contains many nominal events ("baselines") along with a few exceptions that are relevant to the developer. Baselines may contain random elements such as timestamps or unique identifiers that are difficult to detect and remove. To remove the baseline events, we can use a k -nearest neighbors pattern recognition algorithm ( k -NN).


Log events must be converted to numeric values for k -NN regression. Using the generic feature extraction tool HashingVectorizer enables the process to be applied to any type of log. It hashes each word and encodes each event in a sparse matrix. To further reduce the search space, tokenization removes known random words, such as dates or IP addresses.


Once the model is trained, the k -NN search tells us the distance of each new event from the baseline.


This Jupyter notebook demonstrates the process and graphs the sparse matrix vectors.


Introducing Logreduce

The Logreduce Python software transparently implements this process. Logreduce's initial goal was to assist with Zuul CI job failure analyses using the build database, and it is now integrated into the Software Factory development forge's job logs process.

At its simplest, Logreduce compares files or directories and removes lines that are similar. Logreduce builds a model for each source file and outputs any of the target's lines whose distances are above a defined threshold by using the following syntax: distance | filename:line-number: line-content .

$ logreduce diff /var/log/audit/audit.log.1 /var/log/audit/audit.log
INFO  logreduce.Classifier - Training took 21.982s at 0.364MB/s (1.314kl/s) (8.000 MB - 28.884 kilo-lines)
0.244 | audit.log:19963: type=USER_AUTH acct="root" exe="/usr/bin/su" hostname=managesf.sftests.com
INFO  logreduce.Classifier - Testing took 18.297s at 0.306MB/s (1.094kl/s) (5.607 MB - 20.015 kilo-lines)
99.99% reduction (from 20015 lines to 1)

A more advanced Logreduce use can train a model offline to be reused. Many variants of the baselines can be used to fit the k -NN search tree.

$ logreduce dir-train audit.clf /var/log/audit/audit.log.*
INFO  logreduce.Classifier - Training took 80.883s at 0.396MB/s (1.397kl/s) (32.001 MB - 112.977 kilo-lines)
DEBUG logreduce.Classifier - audit.clf: written
$ logreduce dir-run audit.clf /var/log/audit/audit.log

Logreduce also implements interfaces to discover baselines for Journald time ranges (days/weeks/months) and Zuul CI job build histories. It can also generate HTML reports that group anomalies found in multiple files in a simple interface.

Managing baselines

The key to using k -NN regression for anomaly detection is to have a database of known good baselines, which the model uses to detect lines that deviate too far. This method relies on the baselines containing all nominal events, as anything that isn't found in the baseline will be reported as anomalous.

CI jobs are great targets for k -NN regression because the job outputs are often deterministic and previous runs can be automatically used as baselines. Logreduce features Zuul job roles that can be used as part of a failed job post task in order to issue a concise report (instead of the full job's logs). This principle can be applied to other cases, as long as baselines can be constructed in advance. For example, a nominal system's SoS report can be used to find issues in a defective deployment.

baselines.png

Anomaly classification service

The next version of Logreduce introduces a server mode to offload log processing to an external service where reports can be further analyzed. It also supports importing existing reports and requests to analyze a Zuul build. The services run analyses asynchronously and feature a web interface to adjust scores and remove false positives.

classification-interface.png

Reviewed reports can be archived as a standalone dataset with the target log files and the scores for anomalous lines recorded in a flat JSON file.

Project roadmap

Logreduce is already being used effectively, but there are many opportunities for improving the tool. Plans for the future include:

If you are interested in getting involved in this project, please contact us on the #log-classify Freenode IRC channel. Feedback is always appreciated!


Tristan Cacqueray will present Reduce your log noise using machine learning at the OpenStack Summit , November 13-15 in Berlin.

[Nov 08, 2019] Getting started with Logstash by Jamie Riedesel

Nov 08, 2019 | opensource.com

No longer a simple log-processing pipeline, Logstash has evolved into a powerful and versatile data processing tool. Here are the basics to get you started.

Logstash, an open source tool released by Elastic, is designed to ingest and transform data. It was originally built to be a log-processing pipeline to ingest logging data into ElasticSearch. Several versions later, it can do much more.

At its core, Logstash is a form of Extract-Transform-Load (ETL) pipeline. Unstructured log data is extracted, filters transform it, and the results are loaded into some form of data store.

Logstash can take a line of text like this syslog example:

Sep 11 14:13:38 vorthys sshd[16998]: Received disconnect from 192.0.2.11 port 53730:11: disconnected by user

and transform it into a much richer data structure:

{
  "timestamp": "1505157218000",
  "host": "vorthys",
  "program": "sshd",
  "pid": "16998",
  "message": "Received disconnect from 192.0.2.11 port 53730:11: disconnected by user",
  "sshd_action": "disconnect",
  "sshd_tuple": "192.0.2.11:53730"
}

Depending on what you are using for your backing store, you can find events like this by using indexed fields rather than grepping terabytes of text. If you're generating tens to hundreds of gigabytes of logs a day, that matters.

Internal architecture

Logstash has a three-stage pipeline implemented in JRuby:

The input stage plugins extract data. This can be from logfiles, a TCP or UDP listener, one of several protocol-specific plugins such as syslog or IRC, or even queuing systems such as Redis, AMQP, or Kafka. This stage tags incoming events with metadata surrounding where the events came from.

The filter stage plugins transform and enrich the data. This is the stage that produces the sshd_action and sshd_tuple fields in the example above. This is where you'll find most of Logstash's value.

The output stage plugins load the processed events into something else, such as ElasticSearch or another document database, or a queuing system such as Redis, AMQP, or Kafka. It can also be configured to communicate with an API. It is also possible to hook up something like PagerDuty to your Logstash outputs.

Have a cron job that checks if your backups completed successfully? It can issue an alarm into the logging stream. That alarm is picked up by an input, a filter configured to catch those events marks it up, and a conditional output then knows the event is meant for it. This is how you can add alarms to scripts that would otherwise need to create their own notification layers, or that operate on systems that aren't allowed to communicate with the outside world.

Threads

In general, each input runs in its own thread. The filter and output stages are more complicated. In Logstash 1.5 through 2.1, the filter stage had a configurable number of threads, with the output stage occupying a single thread. That changed in Logstash 2.2, when the filter-stage threads were extended to handle the output stage as well. With one fewer internal queue to keep track of, throughput improved.

If you're running an older version, it's worth upgrading to at least 2.2. When we moved from 1.5 to 2.2, we saw a 20-25% increase in overall throughput. Logstash also spent less time in wait states, so we used more of the CPU (47% vs 75%).

Configuring the pipeline

Logstash can take a single file or a directory for its configuration. If a directory is given, it reads the files in lexical order. This is important, as ordering is significant for filter plugins (we'll discuss that in more detail later).
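For example, a common convention (not required by Logstash, and the path below assumes a package install) is to number the files so that lexical order matches the pipeline stages:

$ ls /etc/logstash/conf.d/
10-inputs.conf  50-filters.conf  90-outputs.conf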

Here is a bare Logstash config file:

input { }
filter { }
output { }

Each of these will contain zero or more plugin configurations, and there can be multiple blocks.

Input config

An input section can look like this:

input {
  syslog {
    port => 514
    type => "syslog_server"
  }
}

This tells Logstash to listen on port 514 with the syslog { } plugin and to set the document type for each event coming in through that plugin to syslog_server . This plugin follows RFC 3164 only, not the newer RFC 5424.

Here is a slightly more complex input block:

# Pull in syslog data
input {
  file {
    path => [
      "/var/log/syslog",
      "/var/log/auth.log"
    ]
    type => "syslog"
  }
}

# Pull in application-log data. They emit data in JSON form.
input {
  file {
    path => [
      "/var/log/app/worker_info.log",
      "/var/log/app/broker_info.log",
      "/var/log/app/supervisor.log"
    ]
    exclude => "*.gz"
    type => "applog"
    codec => "json"
  }
}

This one uses two different input { } blocks to call different invocations of the file { } plugin: one tracks system-level logs, the other tracks application-level logs. By using two different input { } blocks, a Java thread is spawned for each one. For a multi-core system, different cores keep track of the configured files; if one thread blocks, the other will continue to function.

Both of these file { } blocks could be put into the same input { } block; they would simply run in the same thread -- Logstash doesn't really care.

Filter config

The filter section is where you transform your data into something that's newer and easier to work with. Filters can get quite complex. Here are a few examples of filters that accomplish different goals:

filter {
  if [program] == "metrics_fetcher" {
    mutate {
      add_tag => [ 'metrics' ]
    }
  }
}

In this example, if the program field, populated by the syslog plugin in the example input at the top, reads metrics_fetcher , then it tags the event metrics . This tag could be used in a later filter plugin to further enrich the data.

filter {
  if "metrics" in [tags] {
    kv {
      source => "message"
      target => "metrics"
    }
  }
}

This one runs only if metrics is in the list of tags. It then uses the kv { } plugin to populate a new set of fields based on the key=value pairs in the message field. These new keys are placed as sub-fields of the metrics field, allowing the text pages_per_second=42 faults=0 to become metrics.pages_per_second = 42 and metrics.faults = 0 on the event.

Why wouldn't you just put this in the same conditional that set the tag value? Because there are multiple ways an event could get the metrics tag -- this way, the kv filter will handle them all.

Because the filters are ordered, it is important to make sure that the filter plugin defining the metrics tag runs before the conditional that checks for it. Here are guidelines to ensure your filter sections are optimally ordered:

  1. Your early filters should apply as much metadata as possible.
  2. Using the metadata, perform detailed parsing of events.
  3. In your late filters, regularize your data to reduce problems downstream.
    • Ensure field data types get cast to a unified value. priority could be boolean, integer, or string.
      • Some systems, including ElasticSearch, will quietly convert types for you. Sending strings into a boolean field won't give you the results you want.
      • Other systems will reject a value outright if it isn't in the right data type.
    • The mutate { } plugin is helpful here, as it has methods to coerce fields into specific data types.
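For instance, a minimal mutate sketch (the priority field comes from the list above, the nested metrics field from the earlier kv example, and the target types are illustrative):

filter {
  mutate {
    convert => {
      "priority" => "integer"
      "[metrics][faults]" => "integer"
    }
  }
}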

Here are useful plugins to extract fields from long strings:
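One such plugin is grok, which matches named patterns against a string field. Here is a hedged sketch in the spirit of the sshd example at the top of this article (the pattern only covers the disconnect message and is not a complete sshd parser; the field names are assumptions):

filter {
  if [program] == "sshd" {
    grok {
      match => {
        "message" => "Received disconnect from %{IP:ssh_client_ip} port %{NUMBER:ssh_client_port}"
      }
      add_field => { "sshd_action" => "disconnect" }
    }
  }
}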

Output config

Elastic would like you to send it all into ElasticSearch, but anything that can accept a JSON document, or the data structure it represents, can be an output. Keep in mind that events can be sent to multiple outputs. Consider this example of metrics:

output {
  # Send to the local ElasticSearch port, and rotate the index daily.
  elasticsearch {
    hosts => [
      "localhost",
      "logelastic.prod.internal"
    ]
    template_name => "logstash"
    index => "logstash-{+YYYY.MM.dd}"
  }

  if "metrics" in [tags] {
    influxdb {
      host => "influx.prod.internal"
      db => "logstash"
      measurement => "appstats"
      # This next bit only works because it is already a hash.
      data_points => "%{metrics}"
      send_as_tags => [ 'environment', 'application' ]
    }
  }
}

Remember the metrics example above? This is how we can output it. The events tagged metrics will get sent to ElasticSearch in their full event form. In addition, the subfields under the metrics field on that event will be sent to influxdb , in the logstash database, under the appstats measurement. Along with the measurements, the values of the environment and application fields will be submitted as indexed tags.

There are a great many outputs. Here are some grouped by type:

There are many more output plugins .

[Nov 08, 2019] Vim universe. fzf - command line fuzzy finder by Alexey Samoshkin

Nov 08, 2019 | www.youtube.com

Zeeshan Jan , 1 month ago (edited)

Alexey thanks for great video, I have a question, how did you integrate the fzf and bat. When I am in my zsh using tmux then when I type fzf and search for a file I am not able to select multiple files using TAB I can do this inside VIM but not in the tmux iTerm terminal also I am not able to see the preview I have already installed bat using brew on my mac book pro. also when I type cd ** it doesn't work

Paul Hale , 4 months ago

Thanks for the video. When searching in vim dotfiles are hidden. How can we configure so that dotfiles are shown but .git and .git subfolders are ignored?

[Nov 08, 2019] 10 resources every sysadmin should know about Opensource.com

Nov 08, 2019 | opensource.com

Cheat

Having a hard time remembering a command? Normally you might resort to a man page, but some man pages have a hard time getting to the point. It's the reason Chris Allen Lane came up with the idea (and more importantly, the code) for a cheat command .

The cheat command displays cheatsheets for common tasks in your terminal. It's a man page without the preamble. It cuts to the chase and tells you exactly how to do whatever it is you're trying to do. And if it lacks a common example that you think ought to be included, you can submit an update.

$ cheat tar
# To extract an uncompressed archive:
tar -xvf '/path/to/foo.tar'

# To extract a .gz archive:
tar -xzvf '/path/to/foo.tgz'
[ ... ]

You can also treat cheat as a local cheatsheet system, which is great for all the in-house commands you and your team have invented over the years. You can easily add a local cheatsheet to your own home directory, and cheat will find and display it just as if it were a popular system command.
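For instance, a hedged sketch of creating a cheatsheet for a hypothetical in-house command called offload (the -e and -l flags exist in current cheat releases; where personal cheatsheets are stored varies by version and configuration):

$ cheat -e offload     # opens $EDITOR to create or edit the personal "offload" cheatsheet
$ cheat offload        # display it like any other cheatsheet
$ cheat -l             # list every available cheatsheet, including personal ones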

[Nov 08, 2019] 5 alerting and visualization tools for sysadmins

Nov 08, 2019 | opensource.com

Common types of alerts and visualizations

Alerts

Let's first cover what alerts are not . Alerts should not be sent if the human responder can't do anything about the problem. This includes alerts that are sent to multiple individuals with only a few who can respond, or situations where every anomaly in the system triggers an alert. This leads to alert fatigue and receivers ignoring all alerts within a specific medium until the system escalates to a medium that isn't already saturated.

For example, if an operator receives hundreds of emails a day from the alerting system, that operator will soon ignore all emails from the alerting system. The operator will respond to a real incident only when he or she is experiencing the problem, emailed by a customer, or called by the boss. In this case, alerts have lost their meaning and usefulness.

Alerts are not a constant stream of information or a status update. They are meant to convey a problem from which the system can't automatically recover, and they are sent only to the individual most likely to be able to recover the system. Everything that falls outside this definition isn't an alert and will only damage your employees and company culture.

Everyone has a different set of alert types, so I won't discuss things like priority levels (P1-P5) or models that use words like "Informational," "Warning," and "Critical." Instead, I'll describe the generic categories emergent in complex systems' incident response.

You might have noticed I mentioned an "Informational" alert type right after I wrote that alerts shouldn't be informational. Well, not everyone agrees, but I don't consider something an alert if it isn't sent to anyone. It is a data point that many systems refer to as an alert. It represents some event that should be known but not responded to. It is generally part of the visualization system of the alerting tool and not an event that triggers actual notifications. Mike Julian covers this and other aspects of alerting in his book Practical Monitoring . It's a must read for work in this area.

Non-informational alerts consist of types that can be responded to or require action. I group these into two categories: internal outage and external outage. (Most companies have more than two levels for prioritizing their response efforts.) Degraded system performance is considered an outage in this model, as the impact to each user is usually unknown.

Internal outages are a lower priority than external outages, but they still need to be responded to quickly. They often include internal systems that company employees use or components of applications that are visible only to company employees.

External outages consist of any system outage that would immediately impact a customer. These don't include a system outage that prevents releasing updates to the system. They do include customer-facing application failures, database outages, and networking partitions that hurt availability or consistency if either can impact a user. They also include outages of tools that may not have a direct impact on users, as the application continues to run but this transparent dependency impacts performance. This is common when the system uses some external service or data source that isn't necessary for full functionality but may cause delays as the application performs retries or handles errors from this external dependency.

Visualizations

There are many visualization types, and I won't cover them all here. It's a fascinating area of research. On the data analytics side of my career, learning and applying that knowledge is a constant challenge. We need to provide simple representations of complex system outputs for the widest dissemination of information. Google Charts and Tableau have a wide selection of visualization types. We'll cover the most common visualizations and some innovative solutions for quickly understanding systems.

Line chart

The line chart is probably the most common visualization. It does a pretty good job of producing an understanding of a system over time. A line chart in a metrics system would have a line for each unique metric or some aggregation of metrics. This can get confusing when there are a lot of metrics in the same dashboard (as shown below), but most systems can select specific metrics to view rather than having all of them visible. Also, anomalous behavior is easy to spot if it's significant enough to escape the noise of normal operations. Below we can see purple, yellow, and light blue lines that might indicate anomalous behavior.

monitoring_guide_line_chart.png

Another feature of a line chart is that you can often stack them to show relationships. For example, you might want to look at requests on each server individually, but also in aggregate. This allows you to understand the overall system as well as each instance in the same graph.

monitoring_guide_line_chart_aggregate.png

Heatmaps

Another common visualization is the heatmap. It is useful when looking at histograms. This type of visualization is similar to a bar chart but can show gradients within the bars representing the different percentiles of the overall metric. For example, suppose you're looking at request latencies and you want to quickly understand the overall trend as well as the distribution of all requests. A heatmap is great for this, and it can use color to disambiguate the quantity of each section with a quick glance.

The heatmap below shows the higher concentration around the centerline of the graph with an easy-to-understand visualization of the distribution vertically for each time bucket. We might want to review a couple of points in time where the distribution gets wide while the others are fairly tight like at 14:00. This distribution might be a negative performance indicator.

monitoring_guide_histogram.png

Gauges

The last common visualization I'll cover here is the gauge, which helps users understand a single metric quickly. Gauges can represent a single metric, like your speedometer represents your driving speed or your gas gauge represents the amount of gas in your car. Similar to the gas gauge, most monitoring gauges clearly indicate what is good and what isn't. Often (as is shown below), good is represented by green, getting worse by orange, and "everything is breaking" by red. The middle row below shows traditional gauges.

monitoring_guide_gauges.png Image source: Grafana.org (© Grafana Labs)

This image shows more than just traditional gauges. The other gauges are single stat representations that are similar to the function of the classic gauge. They all use the same color scheme to quickly indicate system health with just a glance. Arguably, the bottom row is probably the best example of a gauge that allows you to glance at a dashboard and know that everything is healthy (or not). This type of visualization is usually what I put on a top-level dashboard. It offers a full, high-level understanding of system health in seconds.

Flame graphs

A less common visualization is the flame graph, introduced by Netflix's Brendan Gregg in 2011. It's not ideal for dashboarding or quickly observing high-level system concerns; it's normally seen when trying to understand a specific application problem. This visualization focuses on CPU and memory and the associated frames. The X-axis lists the frames alphabetically, and the Y-axis shows stack depth. Each rectangle is a stack frame and includes the function being called. The wider the rectangle, the more it appears in the stack. This method is invaluable when trying to diagnose system performance at the application level and I urge everyone to give it a try.

monitoring_guide_flame_graph.png Image source: Wikimedia.org (Creative Commons BY SA 3.0)

Tool options

There are several commercial options for alerting, but since this is Opensource.com, I'll cover only systems that are being used at scale by real companies that you can use at no cost. Hopefully, you'll be able to contribute new and innovative features to make these systems even better.

Alerting tools

Bosun

If you've ever done anything with computers and gotten stuck, the help you received was probably thanks to a Stack Exchange system. Stack Exchange runs many different websites around a crowdsourced question-and-answer model. Stack Overflow is very popular with developers, and Super User is popular with operations. However, there are now hundreds of sites ranging from parenting to sci-fi and philosophy to bicycles.

Stack Exchange open-sourced its alert management system, Bosun , around the same time Prometheus and its AlertManager system were released. There were many similarities in the two systems, and that's a really good thing. Like Prometheus, Bosun is written in Golang. Bosun's scope is more extensive than Prometheus' as it can interact with systems beyond metrics aggregation. It can also ingest data from log and event aggregation systems. It supports Graphite, InfluxDB, OpenTSDB, and Elasticsearch.

Bosun's architecture consists of a single server binary, a backend like OpenTSDB, Redis, and scollector agents . The scollector agents automatically detect services on a host and report metrics for those processes and other system resources. This data is sent to a metrics backend. The Bosun server binary then queries the backends to determine if any alerts need to be fired. Bosun can also be used by tools like Grafana to query the underlying backends through one common interface. Redis is used to store state and metadata for Bosun.

A really neat feature of Bosun is that it lets you test your alerts against historical data. This was something I missed in Prometheus several years ago, when I had data for an issue I wanted alerts on but no easy way to test it. To make sure my alerts were working, I had to create and insert dummy data. This system alleviates that very time-consuming process.

Bosun also has the usual features like showing simple graphs and creating alerts. It has a powerful expression language for writing alerting rules. However, it only has email and HTTP notification configurations, which means connecting to Slack and other tools requires a bit more customization ( which its documentation covers ). Similar to Prometheus, Bosun can use templates for these notifications, which means they can look as awesome as you want them to. You can use all your HTML and CSS skills to create the baddest email alert anyone has ever seen.
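To give a flavor of that expression language, here is a heavily hedged sketch of an alert definition (the metric name assumes scollector's os.cpu metric in OpenTSDB, the template name and thresholds are illustrative, and the exact syntax should be checked against the Bosun documentation):

# Hypothetical Bosun alert: warn/crit when average CPU over 10 minutes is high
alert os.cpu.high {
    template = default
    $cpu = avg(q("avg:rate:os.cpu{host=*}", "10m", ""))
    warn = $cpu > 80
    crit = $cpu > 95
}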

Cabot

Cabot was created by a company called Arachnys . You may not know who Arachnys is or what it does, but you have probably felt its impact: It built the leading cloud-based solution for fighting financial crimes. That sounds pretty cool, right? At a previous company, I was involved in similar functions around "know your customer" laws. Most companies would consider it a very bad thing to be linked to a terrorist group, for example, funneling money through their systems. These solutions also help defend against less-atrocious offenders like fraudsters who could also pose a risk to the institution.

So why did Arachnys create Cabot? Well, it is kind of a Christmas present to everyone, as it was a Christmas project built because its developers couldn't wrap their heads around Nagios . And really, who can blame them? Cabot was written with Django and Bootstrap, so it should be easy for most to contribute to the project. (Another interesting factoid: The name comes from the creator's dog.)

The Cabot architecture is similar to Bosun in that it doesn't collect any data. Instead, it accesses data through the APIs of the tools it is alerting for. Therefore, Cabot uses a pull (rather than a push) model for alerting. It reaches out into each system's API and retrieves the information it needs to make a decision based on a specific check. Cabot stores the alerting data in a Postgres database and also has a cache using Redis.

Cabot natively supports Graphite , but it also supports Jenkins , which is rare in this area. Arachnys uses Jenkins like a centralized cron, but I like this idea of treating build failures like outages. Obviously, a build failure isn't as critical as a production outage, but it could still alert the team and escalate if the failure isn't resolved. Who actually checks Jenkins every time an email comes in about a build failure? Yeah, me too!

Another interesting feature is that Cabot can integrate with Google Calendar for on-call rotations. Cabot calls this feature Rota, which is a British term for a roster or rotation. This makes a lot of sense, and I wish other systems would take this idea further. Cabot doesn't support anything more complex than primary and backup personnel, but there is certainly room for additional features. The docs say if you want something more advanced, you should look at a commercial option.

StatsAgg

StatsAgg ? How did that make the list? Well, it's not every day you come across a publishing company that has created an alerting platform. I think that deserves recognition. Of course, Pearson isn't just a publishing company anymore; it has several web presences and a joint venture with O'Reilly Media . However, I still think of it as the company that published my schoolbooks and tests.

StatsAgg isn't just an alerting platform; it's also a metrics aggregation platform. And it's kind of like a proxy for other systems. It supports Graphite, StatsD, InfluxDB, and OpenTSDB as inputs, but it can also forward those metrics to their respective platforms. This is an interesting concept, but potentially risky as loads increase on a central service. However, if the StatsAgg infrastructure is robust enough, it can still produce alerts even when a backend storage platform has an outage.

StatsAgg is written in Java and consists only of the main server and UI, which keeps complexity to a minimum. It can send alerts based on regular expression matching and is focused on alerting by service rather than host or instance. Its goal is to fill a void in the open source observability stack, and I think it does that quite well.

Visualization tools

Grafana

Almost everyone knows about Grafana , and many have used it. I have used it for years whenever I need a simple dashboard. The tool I used before was deprecated, and I was fairly distraught about that until Grafana made it okay. Grafana was gifted to us by Torkel Ödegaard. Like Cabot, Grafana was also created around Christmastime, and released in January 2014. It has come a long way in just a few years. It started life as a Kibana dashboarding system, and Torkel forked it into what became Grafana.

Grafana's sole focus is presenting monitoring data in a more usable and pleasing way. It can natively gather data from Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB. There's an Enterprise version that uses plugins for more data sources, but there's no reason those other data source plugins couldn't be created as open source, as the Grafana plugin ecosystem already offers many other data sources.

What does Grafana do for me? It provides a central location for understanding my system. It is web-based, so anyone can access the information, although it can be restricted using different authentication methods. Grafana can provide knowledge at a glance using many different types of visualizations. However, it has started integrating alerting and other features that aren't traditionally combined with visualizations.

Now you can set alerts visually. That means you can look at a graph, maybe even one showing where an alert should have triggered due to some degradation of the system, click on the graph where you want the alert to trigger, and then tell Grafana where to send the alert. That's a pretty powerful addition that won't necessarily replace an alerting platform, but it can certainly help augment it by providing a different perspective on alerting criteria.

Grafana has also introduced more collaboration features. Users have been able to share dashboards for a long time, meaning you don't have to create your own dashboard for your Kubernetes cluster because there are several already available -- with some maintained by Kubernetes developers and others by Grafana developers.

The most significant addition around collaboration is annotations. Annotations allow a user to add context to part of a graph. Other users can then use this context to understand the system better. This is an invaluable tool when a team is in the middle of an incident and communication and common understanding are critical. Having all the information right where you're already looking makes it much more likely that knowledge will be shared across the team quickly. It's also a nice feature to use during blameless postmortems when the team is trying to understand how the failure occurred and learn more about their system.

Vizceral

Netflix created Vizceral to understand its traffic patterns better when performing a traffic failover. Unlike Grafana, which is a more general tool, Vizceral serves a very specific use case. Netflix no longer uses this tool internally and says it is no longer actively maintained, but it still updates the tool periodically. I highlight it here primarily to point out an interesting visualization mechanism and how it can help solve a problem. It's worth running it in a demo environment just to better grasp the concepts and witness what's possible with these systems.

[Nov 08, 2019] What breaks our systems A taxonomy of black swans by Laura Nolan Feed

Oct 25, 2018 | opensource.com
Find and fix outlier events that create issues before they trigger severe production problems. Black swans are a metaphor for outlier events that are severe in impact (like the 2008 financial crash). In production systems, these are the incidents that trigger problems that you didn't know you had, cause major visible impact, and can't be fixed quickly and easily by a rollback or some other standard response from your on-call playbook. They are the events you tell new engineers about years after the fact.

Black swans, by definition, can't be predicted, but sometimes there are patterns we can find and use to create defenses against categories of related problems.

For example, a large proportion of failures are a direct result of changes (code, environment, or configuration). Each bug triggered in this way is distinctive and unpredictable, but the common practice of canarying all changes is somewhat effective against this class of problems, and automated rollbacks have become a standard mitigation.

As our profession continues to mature, other kinds of problems are becoming well-understood classes of hazards with generalized prevention strategies.

Black swans observed in the wild

All technology organizations have production problems, but not all of them share their analyses. The organizations that publicly discuss incidents are doing us all a service. The following incidents each describe one class of problem and are by no means isolated instances. We all have black swans lurking in our systems; it's just that some of us don't know it yet.

Hitting limits


Running headlong into any sort of limit can produce very severe incidents. A canonical example of this was Instapaper's outage in February 2017 . I challenge any engineer who has carried a pager to read the outage report without a chill running up their spine. Instapaper's production database was on a filesystem that, unknown to the team running the service, had a 2TB limit. With no warning, it stopped accepting writes. Full recovery took days and required migrating its database.

Limits can strike in various ways. Sentry hit limits on maximum transaction IDs in Postgres . Platform.sh hit size limits on a pipe buffer . SparkPost triggered AWS's DDoS protection . Foursquare hit a performance cliff when one of its datastores ran out of RAM .

One way to get advance knowledge of system limits is to test periodically. Good load testing (on a production replica) ought to involve write transactions and should involve growing each datastore beyond its current production size. It's easy to forget to test things that aren't your main datastores (such as Zookeeper). If you hit limits during testing, you have time to fix the problems. Given that resolution of limits-related issues can involve major changes (like splitting a datastore), time is invaluable.

When it comes to cloud services, if your service generates unusual loads or uses less widely used products or features (such as older or newer ones), you may be more at risk of hitting limits. It's worth load testing these, too. But warn your cloud provider first.

Finally, where limits are known, add monitoring (with associated documentation) so you will know when your systems are approaching those ceilings. Don't rely on people still being around to remember.

Spreading slowness
"The world is much more correlated than we give credit to. And so we see more of what Nassim Taleb calls 'black swan events' -- rare events happen more often than they should because the world is more correlated."
-- Richard Thaler

HostedGraphite's postmortem on how an AWS outage took down its load balancers (which are not hosted on AWS) is a good example of just how much correlation exists in distributed computing systems. In this case, the load-balancer connection pools were saturated by slow connections from customers that were hosted in AWS. The same kinds of saturation can happen with application threads, locks, and database connections -- any kind of resource monopolized by slow operations.

HostedGraphite's incident is an example of externally imposed slowness, but often slowness can result from saturation somewhere in your own system creating a cascade and causing other parts of your system to slow down. An incident at Spotify demonstrates such spread -- the streaming service's frontends became unhealthy due to saturation in a different microservice. Enforcing deadlines for all requests, as well as limiting the length of request queues, can prevent such spread. Your service will serve at least some traffic, and recovery will be easier because fewer parts of your system will be broken.

Retries should be limited with exponential backoff and some jitter. An outage at Square, in which its Redis datastore became overloaded due to a piece of code that retried failed transactions up to 500 times with no backoff, demonstrates the potential severity of excessive retries. The Circuit Breaker design pattern can be helpful here, too.
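A minimal sketch of capped exponential backoff with full jitter (the TransientError class is a placeholder for whatever retryable exception your client library raises):

import random
import time

class TransientError(Exception):
    """Placeholder for the retryable error your client library raises."""

def call_with_backoff(op, max_attempts=5, base=0.5, cap=30.0):
    """Retry op() with capped exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # bound the total number of retries instead of retrying forever
            # Sleep a random interval in [0, min(cap, base * 2**attempt)).
            time.sleep(random.uniform(0, min(cap, base * 2 ** attempt)))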

Dashboards should be designed to clearly show utilization, saturation, and errors for all resources so problems can be found quickly.

Thundering herds

Often, failure scenarios arise when a system is under unusually heavy load. This can arise organically from users, but often it arises from systems. A surge of cron jobs that starts at midnight is a venerable example. Mobile clients can also be a source of coordinated demand if they are programmed to fetch updates at the same time (of course, it is much better to jitter such requests).

Events occurring at pre-configured times aren't the only source of thundering herds. Slack experienced multiple outages over a short time due to large numbers of clients being disconnected and immediately reconnecting, causing large spikes of load. CircleCI saw a severe outage when a GitLab outage ended, leading to a surge of builds queued in its database, which became saturated and very slow.

Almost any service can be the target of a thundering herd. Planning for such eventualities -- and testing that your plan works as intended -- is therefore a must. Client backoff and load shedding are often core to such approaches.

If your systems must constantly ingest data that can't be dropped, it's key to have a scalable way to buffer this data in a queue for later processing.

Automation systems are complex systems
"Complex systems are intrinsically hazardous systems."
-- Richard Cook, MD

The trend for the past several years has been strongly towards more automation of software operations. Automation of anything that can reduce your system's capacity (e.g., erasing disks, decommissioning devices, taking down serving jobs) needs to be done with care. Accidents (due to bugs or incorrect invocations) with this kind of automation can take down your system very efficiently, potentially in ways that are hard to recover from.

Christina Schulman and Etienne Perot of Google describe some examples in their talk Help Protect Your Data Centers with Safety Constraints . One incident sent Google's entire in-house content delivery network (CDN) to disk-erase.

Schulman and Perot suggest using a central service to manage constraints, which limits the pace at which destructive automation can operate, and being aware of system conditions (for example, avoiding destructive operations if the service has recently had an alert).

Automation systems can also cause havoc when they interact with operators (or with other automated systems). Reddit experienced a major outage when its automation restarted a system that operators had stopped for maintenance. Once you have multiple automation systems, their potential interactions become extremely complex and impossible to predict.

It will help to deal with the inevitable surprises if all this automation writes logs to an easily searchable, central place. Automation systems should always have a mechanism to allow them to be quickly turned off (fully or only for a subset of operations or targets).

Defense against the dark swans

These are not the only black swans that might be waiting to strike your systems. There are many other kinds of severe problem that can be avoided using techniques such as canarying, load testing, chaos engineering, disaster testing, and fuzz testing -- and of course designing for redundancy and resiliency. Even with all that, at some point your system will fail.

To ensure your organization can respond effectively, make sure your key technical staff and your leadership have a way to coordinate during an outage. For example, one unpleasant issue you might have to deal with is a complete outage of your network. It's important to have a fail-safe communications channel completely independent of your own infrastructure and its dependencies. For instance, if you run on AWS, using a service that also runs on AWS as your fail-safe communication method is not a good idea. A phone bridge or an IRC server that runs somewhere separate from your main systems is good. Make sure everyone knows what the communications platform is and practices using it.

Another principle is to ensure that your monitoring and your operational tools rely on your production systems as little as possible. Separate your control and your data planes so you can make changes even when systems are not healthy. Don't use a single message queue for both data processing and config changes or monitoring, for example -- use separate instances. In SparkPost: The Day the DNS Died , Jeremy Blosser presents an example where critical tools relied on the production DNS setup, which failed.

The psychology of battling the black swan

Dealing with major incidents in production can be stressful. It really helps to have a structured incident-management process in place for these situations. Many technology organizations ( including Google ) successfully use a version of FEMA's Incident Command System. There should be a clear way for any on-call individual to call for assistance in the event of a major problem they can't resolve alone.

For long-running incidents, it's important to make sure people don't work for unreasonable lengths of time and get breaks to eat and sleep (uninterrupted by a pager). It's easy for exhausted engineers to make a mistake or overlook something that might resolve the incident faster.

Learn more

There are many other things that could be said about black (or formerly black) swans and strategies for dealing with them. If you'd like to learn more, I highly recommend these two books dealing with resilience and stability in production: Susan Fowler's Production-Ready Microservices and Michael T. Nygard's Release It! .


Laura Nolan will present What Breaks Our Systems: A Taxonomy of Black Swans at LISA18 , October 29-31 in Nashville, Tennessee, USA.

[Nov 08, 2019] How to prevent and recover from accidental file deletion in Linux Enable Sysadmin

Trashy ( Trashy · GitLab ) might make sense in simple cases. But often massive file deletions are attempts to free up space.
Nov 08, 2019 | www.redhat.com
Back up

You knew this would come first. Data recovery is a time-intensive process and rarely produces 100% correct results. If you don't have a backup plan in place, start one now.

Better yet, implement two. First, provide users with local backups with a tool like rsnapshot . This utility creates snapshots of each user's data in a ~/.snapshots directory, making it trivial for them to recover their own data quickly.
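A minimal sketch of an rsnapshot configuration along those lines (paths and retention values are illustrative; how snapshots end up visible under each user's ~/.snapshots, for example via a bind mount or symlink, is left to your setup). Note that in the real file the fields must be separated by tab characters, not spaces:

# /etc/rsnapshot.conf (fields separated by tabs)
config_version  1.2
snapshot_root   /.snapshots/
retain  hourly  6
retain  daily   7
backup  /home/  localhost/

Then schedule it from cron, for example "rsnapshot hourly" every few hours and "rsnapshot daily" once a day.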

There are a great many other open source backup applications that permit your users to manage their own backup schedules.

Second, while these local backups are convenient, also set up a remote backup plan for your organization. Tools like AMANDA or BackupPC are solid choices for this task. You can run them as a daemon so that backups happen automatically.

Backup planning and preparation pay for themselves in both time and peace of mind. There's nothing like not needing emergency response procedures in the first place.

Ban rm

On modern operating systems, there is a Trash or Bin folder where users drag the files they don't want out of sight without deleting them just yet. Traditionally, the Linux terminal has no such holding area, so many terminal power users have the bad habit of permanently deleting data they believe they no longer need. Since there is no "undelete" command, this habit can be quite problematic should a power user (or administrator) accidentally delete a directory full of important data.

Many users say they favor the absolute deletion of files, claiming that they prefer their computers to do exactly what they tell them to do. Few of those users, though, forego their rm command for the more complete shred , which really removes their data. In other words, most terminal users invoke the rm command because it removes data, but take comfort in knowing that file recovery tools exist as a hacker's un-rm. Still, using those tools takes up their administrator's precious time. Don't let your users -- or yourself -- fall prey to this breach of logic.

If you really want to remove data, then rm is not sufficient. Use the shred -u command instead, which overwrites, and then thoroughly deletes, the specified data.

However, if you don't want to actually remove data, don't use rm . This command is not feature-complete: it has no undo feature, yet what it deletes can often still be recovered. Instead, use trashy or trash-cli to "delete" files into a trash bin while using your terminal, like so:

$ trash ~/example.txt
$ trash --list
example.txt

One advantage of these commands is that the trash bin they use is the same as your desktop's trash bin. With them, you can recover your trashed files by opening either your desktop Trash folder or the terminal.

If you've already developed a bad rm habit and find the trash command difficult to remember, create an alias for yourself:

$ echo "alias rm='trash'"

Even better, create this alias for everyone. Your time as a system administrator is too valuable to spend hours struggling with file recovery tools just because someone mis-typed an rm command.

Respond efficiently

Unfortunately, it can't be helped. At some point, you'll have to recover lost files, or worse. Let's take a look at emergency response best practices to make the job easier. Before you even start, understanding what caused the data to be lost in the first place can save you a lot of time:

No matter how the problem began, start your rescue mission with a few best practices:

Once you have a sense of what went wrong, it's time to choose the right tool to fix the problem. Two such tools are Scalpel and TestDisk , both of which operate just as well on a disk image as on a physical drive.
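Working from an image also protects the original media from further damage. A minimal sketch of creating one first (the device name and destination path are assumptions; double-check them before running anything):

# Clone the affected device to an image file, padding unreadable sectors instead of aborting
$ sudo dd if=/dev/sdb of=/mnt/rescue/sdb.img bs=4M conv=noerror,sync status=progress

# Then point TestDisk (or Scalpel) at the image instead of the raw device
$ sudo testdisk /mnt/rescue/sdb.img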

Practice (or, go break stuff)

At some point in your career, you'll have to recover data. The smart practices discussed above can minimize how often this happens, but there's no avoiding this problem. Don't wait until disaster strikes to get familiar with data recovery tools. After you set up your local and remote backups, implement command-line trash bins, and limit the rm command, it's time to practice your data recovery techniques.

Download and practice using Scalpel, TestDisk, or whatever other tools you feel might be useful. Be sure to practice data recovery safely, though. Find an old computer, install Linux onto it, and then generate, destroy, and recover. If nothing else, doing so teaches you to respect data structures, filesystems, and a good backup plan. And when the time comes and you have to put those skills to real use, you'll appreciate knowing what to do.

[Nov 08, 2019] My first sysadmin mistake by Jim Hall

Wiping out the /etc directory is something sysadmins do accidentally. It often happens when another directory is named etc, for example /Backup/etc: you automatically put a slash in front of etc because it is ingrained in your mind, you type it subconsciously without realizing what you are doing, and then you face the consequences. If you do not use saferm, the results are pretty devastating. In most cases the server does not die, but new logins become impossible; existing SSH sessions survive. That's why it is important to back up /etc at the first login to the server. On modern servers it takes a couple of seconds.
If the subdirectories are intact, you can still copy their content from another server. But the content of the sysconfig subdirectory on Linux is unique to the server, and you need a backup to restore it.
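A minimal sketch of that first-login backup (the destination path is an assumption):

# Archive /etc right after the first login; restoring later is a matter of running "tar xpf" from /
$ sudo tar czf /root/etc-backup-$(date +%F).tar.gz /etc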
Notable quotes:
"... As root. I thought I was deleting some stale cache files for one of our programs. Instead, I wiped out all files in the /etc directory by mistake. Ouch. ..."
"... I put together a simple strategy: Don't reboot the server. Use an identical system as a template, and re-create the ..."
Nov 08, 2019 | opensource.com
rm command in the wrong directory. As root. I thought I was deleting some stale cache files for one of our programs. Instead, I wiped out all files in the /etc directory by mistake. Ouch.

My clue that I'd done something wrong was an error message that rm couldn't delete certain subdirectories. But the cache directory should contain only files! I immediately stopped the rm command and looked at what I'd done. And then I panicked. All at once, a million thoughts ran through my head. Did I just destroy an important server? What was going to happen to the system? Would I get fired?

Fortunately, I'd run rm * and not rm -rf * so I'd deleted only files. The subdirectories were still there. But that didn't make me feel any better.

Immediately, I went to my supervisor and told her what I'd done. She saw that I felt really dumb about my mistake, but I owned it. Despite the urgency, she took a few minutes to do some coaching with me. "You're not the first person to do this," she said. "What would someone else do in your situation?" That helped me calm down and focus. I started to think less about the stupid thing I had just done, and more about what I was going to do next.

I put together a simple strategy: Don't reboot the server. Use an identical system as a template, and re-create the /etc directory.

Once I had my plan of action, the rest was easy. It was just a matter of running the right commands to copy the /etc files from another server and edit the configuration so it matched the system. Thanks to my practice of documenting everything, I used my existing documentation to make any final adjustments. I avoided having to completely restore the server, which would have meant a huge disruption.

To be sure, I learned from that mistake. For the rest of my years as a systems administrator, I always confirmed what directory I was in before running any command.

I also learned the value of building a "mistake strategy." When things go wrong, it's natural to panic and think about all the bad things that might happen next. That's human nature. But creating a "mistake strategy" helps me stop worrying about what just went wrong and focus on making things better. I may still think about it, but knowing my next steps allows me to "get over it."

[Nov 08, 2019] 13 open source backup solutions by Don Watkins

This is mostly just a list; you need to do your own research. Some important backup applications are not mentioned. It is also unclear from the list what methods each tool uses, or why each of them is preferable to tar. The stress in the list is on portability (Linux plus Mac and Windows, not just Linux).
Mar 07, 2019 | opensource.com

Recently, we published a poll that asked readers to vote on their favorite open source backup solution. We offered six solutions recommended by our moderator community -- Cronopete, Deja Dup, Rclone, Rdiff-backup, Restic, and Rsync -- and invited readers to share other options in the comments. And you came through, offering 13 other solutions (so far) that we either hadn't considered or hadn't even heard of.

By far the most popular suggestion was BorgBackup . It is a deduplicating backup solution that features compression and encryption. It is supported on Linux, MacOS, and BSD and has a BSD License.
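For example, a minimal BorgBackup session might look like this (the repository path and source directories are assumptions):

$ borg init --encryption=repokey /backup/borg-repo                      # create an encrypted repository
$ borg create --stats /backup/borg-repo::'{hostname}-{now}' ~/Documents ~/projects
$ borg list /backup/borg-repo                                           # show existing archives
$ borg prune --keep-daily 7 --keep-weekly 4 /backup/borg-repo           # thin out old archives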

Second was UrBackup , which does full and incremental image and file backups; you can save whole partitions or single directories. It has clients for Windows, Linux, and MacOS and has a GNU Affero Public License.

Third was LuckyBackup ; according to its website, "it is simple to use, fast (transfers over only changes made and not all data), safe (keeps your data safe by checking all declared directories before proceeding in any data manipulation), reliable, and fully customizable." It carries a GNU Public License.

Casync is content-addressable synchronization -- it's designed for backup and synchronizing and stores and retrieves multiple related versions of large file systems. It is licensed with the GNU Lesser Public License.

Syncthing synchronizes files between two computers. It is licensed with the Mozilla Public License and, according to its website, is secure and private. It works on MacOS, Windows, Linux, FreeBSD, Solaris, and OpenBSD.

Duplicati is a free backup solution that works on Windows, MacOS, and Linux and a variety of standard protocols, such as FTP, SSH, and WebDAV, and cloud services. It features strong encryption and is licensed with the GPL.

Dirvish is a disk-based virtual image backup system licensed under OSL-3.0. It also requires Rsync, Perl5, and SSH to be installed.

Bacula 's website says it "is a set of computer programs that permits the system administrator to manage backup, recovery, and verification of computer data across a network of computers of different kinds." It is supported on Linux, FreeBSD, Windows, MacOS, OpenBSD, and Solaris and the bulk of its source code is licensed under AGPLv3.

BackupPC "is a high-performance, enterprise-grade system for backing up Linux, Windows, and MacOS PCs and laptops to a server's disk," according to its website. It is licensed under the GPLv3.

Amanda is a backup system written in C and Perl that allows a system administrator to back up an entire network of client machines to a single server using tape, disk, or cloud-based systems. It was developed and copyrighted in 1991 at the University of Maryland and has a BSD-style license.

Back in Time is a simple backup utility designed for Linux. It provides a command line client and a GUI, both written in Python. To do a backup, just specify where to store snapshots, what folders to back up, and the frequency of the backups. BackInTime is licensed with GPLv2.

Timeshift is a backup utility for Linux that is similar to System Restore for Windows and Time Capsule for MacOS. According to its GitHub repository, "Timeshift protects your system by taking incremental snapshots of the file system at regular intervals. These snapshots can be restored at a later date to undo all changes to the system."

Kup is a backup solution that was created to help users back up their files to a USB drive, but it can also be used to perform network backups. According to its GitHub repository, "When you plug in your external hard drive, Kup will automatically start copying your latest changes."

[Nov 08, 2019] What you probably didn't know about sudo

Nov 08, 2019 | opensource.com

Enable features for a certain group of users

The sudo command comes with a huge set of defaults. Still, there are situations when you want to override some of these. This is when you use the Defaults statement in the configuration. Usually, these defaults are enforced on every user, but you can narrow the setting down to a subset of users based on host, username, and so on. Here is an example that my generation of sysadmins loves to hear about: insults. These are just some funny messages for when someone mistypes a password:

czanik@linux-mewy:~> sudo ls
[sudo] password for root:
Hold it up to the light --- not a brain in sight!
[sudo] password for root:
My pet ferret can type better than you!
[sudo] password for root:
sudo: 3 incorrect password attempts
czanik@linux-mewy:~>

Because not everyone is a fan of sysadmin humor, these insults are disabled by default. The following example shows how to enable this setting only for your seasoned sysadmins, who are members of the wheel group:

Defaults !insults
Defaults:%wheel insults

I do not have enough fingers to count how many people thanked me for bringing these messages back.

Digest verification

There are, of course, more serious features in sudo as well. One of them is digest verification. You can include the digest of applications in your configuration:

peter ALL = sha224:11925141bb22866afdf257ce7790bd6275feda80b3b241c108b79c88 /usr/bin/passwd

In this case, sudo checks and compares the digest of the application to the one stored in the configuration before running the application. If they do not match, sudo refuses to run the application. While it is difficult to maintain this information in your configuration -- there are no automated tools for this purpose -- these digests can provide you with an additional layer of protection.
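To produce the digest for such an entry, you can hash the binary yourself with coreutils; a minimal sketch:

# Print the SHA-224 digest of the binary; the first field is what goes after "sha224:" in sudoers
$ sha224sum /usr/bin/passwd | awk '{ print $1 }'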

Session recording

Session recording is also a lesser-known feature of sudo . After my demo, many people leave my talk with plans to implement it on their infrastructure. Why? Because with session recording, you see not just the command name, but also everything that happened in the terminal. You can see what your admins are doing even if they have shell access and logs only show that bash is started.
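A minimal sketch of how session recording is typically switched on, assuming a sudo build with I/O logging support (log locations and defaults vary by distribution):

# In /etc/sudoers (edit with visudo): record terminal output, and optionally keystrokes, of sudo sessions
Defaults log_output
Defaults log_input

# Later, list the recorded sessions and replay one by its TSID
$ sudoreplay -l
$ sudoreplay 000001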

There is one limitation, currently. Records are stored locally, so with enough permissions, users can delete their traces. Stay tuned for upcoming features.

New features

There is a new version of sudo right around the corner. Version 1.9 will include many interesting new features. Here are the most important planned features:

Conclusion

I hope this article proved to you that sudo is a lot more than just a simple prefix. There are tons of possibilities to fine-tune permissions on your system. You can not only fine-tune permissions, but also improve security by checking digests. Session recordings enable you to check what is happening on your systems. You can also extend the functionality of sudo using plugins, either using something already available or writing your own. Finally, given the list of upcoming features, you can see that even though sudo is decades old, it is a living project that is constantly evolving.

If you want to learn more about sudo , here are a few resources:

[Nov 08, 2019] Winterize your Bash prompt in Linux

Nov 08, 2019 | opensource.com

Your Linux terminal probably supports Unicode, so why not take advantage of that and add a seasonal touch to your prompt?


Hello once again for another installment of the Linux command-line toys advent calendar. If this is your first visit to the series, you might be asking yourself what a command-line toy even is? Really, we're keeping it pretty open-ended: It's anything that's a fun diversion at the terminal, and we're giving bonus points for anything holiday-themed.

Maybe you've seen some of these before, maybe you haven't. Either way, we hope you have fun.

Today's toy is super-simple: It's your Bash prompt. Your Bash prompt? Yep! We've got a few more weeks of the holiday season left to stare at it, and even more weeks of winter here in the northern hemisphere, so why not have some fun with it.

Your Bash prompt currently might be a simple dollar sign ( $ ), or more likely, it's something a little longer. If you're not sure what makes up your Bash prompt right now, you can find it in an environment variable called $PS1. To see it, type:

echo $PS1

For me, this returns:

[\u@\h \W]\$

The \u , \h , and \W are special characters for username, hostname, and working directory. There are others you can use as well; for help building out your Bash prompt, you can use EzPrompt , an online generator of PS1 configurations that includes lots of options including date and time, Git status, and more.

You may have other variables that make up your Bash prompt set as well; $PS2 for me contains the closing brace of my command prompt. See this article for more information.

To change your prompt, simply set the environment variable in your terminal like this:

$ PS1='\u is cold: '
jehb is cold:

To set it permanently, add the same code to your ~/.bashrc using your favorite text editor.

So what does this have to do with winterization? Well, chances are that on a modern machine, your terminal supports Unicode, so you're not limited to the standard ASCII character set. You can use any emoji that's part of the Unicode specification, including a snowflake ❄, a snowman ☃, or a pair of skis 🎿. You've got plenty of wintery options to choose from.

🎄 Christmas Tree
🧥 Coat
🦌 Deer
🧤 Gloves
🤶 Mrs. Claus
🎅 Santa Claus
🧣 Scarf
🎿 Skis
🏂 Snowboarder
❄ Snowflake
☃ Snowman
⛄ Snowman Without Snow
🎁 Wrapped Gift

Pick your favorite, and enjoy some winter cheer. Fun fact: modern filesystems also support Unicode characters in their filenames, meaning you can technically name your next program "❄❄❄❄❄.py" . That said, please don't.
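For example, a minimal sketch that makes a snowman prompt permanent (the prompt layout here is just an illustration):

# add to ~/.bashrc
PS1='☃ \u@\h \W \$ '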

Do you have a favorite command-line toy that you think I ought to include? The calendar for this series is mostly filled out but I've got a few spots left. Let me know in the comments below, and I'll check it out. If there's space, I'll try to include it. If not, but I get some good submissions, I'll do a round-up of honorable mentions at the end.

[Nov 08, 2019] How to change the default shell prompt

Jun 29, 2014 | access.redhat.com
**PS1** - The value of this parameter is expanded and used as the primary prompt string. The default value is \u@\h \W\\$ .
**PS2** - The value of this parameter is expanded as with PS1 and used as the secondary prompt string. The default is "> ".
**PS3** - The value of this parameter is used as the prompt for the select command.
**PS4** - The value of this parameter is expanded as with PS1 and the value is printed before each command bash displays during an execution trace. The first character of PS4 is replicated multiple times, as necessary, to indicate multiple levels of indirection. The default is "+ ".

Commonly used escape sequences:

\u = username
\h = hostname
\W = current working directory

Check the current prompt:

# echo $PS1

Change it for the current session:

# PS1='[[prod]\u@\h \W]\$'
[[prod]root@hostname ~]#

To make the change permanent, find this line in /etc/bashrc:

[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[\u@\h \W]\\$ "

And change it as needed:

[ "$PS1" = "\\s-\\v\\\$ " ] && PS1="[[prod]\u@\h \W]\\$ "

This solution is part of Red Hat's fast-track publication program, providing a huge library of solutions that Red Hat engineers have created while supporting our customers. To give you the knowledge you need the instant it becomes available, these articles may be presented in a raw and unedited form.

2 Comments

6 October 2016 1:53 PM Mike Willis

This solution has simply "Red Hat Enterprise Linux" in the Environment section implying it applies to all versions of Red Hat Enterprise Linux.

Editing /etc/bashrc is against the advice of the comments in /etc/bashrc on Red Hat Enterprise Linux 7 which say

# It's NOT a good idea to change this file unless you know what you
# are doing. It's much better to create a custom.sh shell script in
# /etc/profile.d/ to make custom changes to your environment, as this
# will prevent the need for merging in future updates.

On RHEL 7 instead of the solution suggested above create a /etc/profile.d/custom.sh which contains

PS1="[[prod]\u@\h \W]\\$ "
27 March 2019 12:44 PM Mike Chanslor

Hello Red Hat community! I also found this useful:

Special prompt variable characters:
 \d   The date, in "Weekday Month Date" format (e.g., "Tue May 26"). 

 \h   The hostname, up to the first . (e.g. deckard) 
 \H   The hostname. (e.g. deckard.SS64.com)

 \j   The number of jobs currently managed by the shell. 

 \l   The basename of the shell's terminal device name. 

 \s   The name of the shell, the basename of $0 (the portion following 
      the final slash). 

 \t   The time, in 24-hour HH:MM:SS format. 
 \T   The time, in 12-hour HH:MM:SS format. 
 \@   The time, in 12-hour am/pm format. 

 \u   The username of the current user. 

 \v   The version of Bash (e.g., 2.00) 

 \V   The release of Bash, version + patchlevel (e.g., 2.00.0) 

 \w   The current working directory. 
 \W   The basename of $PWD. 

 \!   The history number of this command. 
 \#   The command number of this command. 

 \$   If you are not root, inserts a "$"; if you are root, you get a "#"  (root uid = 0) 

 \nnn   The character whose ASCII code is the octal value nnn. 

 \n   A newline. 
 \r   A carriage return. 
 \e   An escape character (typically a color code). 
 \a   A bell character.
 \\   A backslash. 

 \[   Begin a sequence of non-printing characters. (like color escape sequences). This
      allows bash to calculate word wrapping correctly.

 \]   End a sequence of non-printing characters.
Using single quotes instead of double quotes when exporting your PS variables is recommended: it makes the prompt a tiny bit faster to evaluate, and you can then do an echo $PS1 to see the current prompt settings.
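As a quick illustration of the \[ and \] markers described above (a sketch only; the escape codes assume an ANSI-capable terminal):

# username@host in green, then the working directory in the default color
PS1='\[\e[0;32m\]\u@\h\[\e[0m\] \W\$ '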

[Nov 08, 2019] How to escape unicode characters in bash prompt correctly - Stack Overflow

Nov 08, 2019 | stackoverflow.com



Andy Ray, Aug 18, 2011 at 19:08

I have a specific method for my bash prompt, let's say it looks like this:
CHAR="༇ "
my_function="
    prompt=\" \[\$CHAR\]\"
    echo -e \$prompt"

PS1="\$(${my_function}) \$ "

To explain the above, I'm building my bash prompt by executing a function stored in a string, which was a decision made as the result of this question. Let's pretend like it works fine, because it does, except when unicode characters get involved.

I am trying to find the proper way to escape a unicode character, because right now it messes with the bash line length. An easy way to test if it's broken is to type a long command, execute it, press CTRL-R and type to find it, and then pressing CTRL-A CTRL-E to jump to the beginning / end of the line. If the text gets garbled then it's not working.

I have tried several things to properly escape the unicode character in the function string, but nothing seems to be working.

Special characters like this work:

COLOR_BLUE=$(tput sgr0 && tput setaf 6)

my_function="
    prompt="\\[\$COLOR_BLUE\\] \"
    echo -e \$prompt"

Which is the main reason I made the prompt a function string. That escape sequence does NOT mess with the line length, it's just the unicode character.

Andy Ray, Aug 23, 2011 at 2:09

The \[...\] sequence says to ignore this part of the string completely, which is useful when your prompt contains a zero-length sequence, such as a control sequence which changes the text color or the title bar, say. But in this case, you are printing a character, so the length of it is not zero. Perhaps you could work around this by, say, using a no-op escape sequence to fool Bash into calculating the correct line length, but it sounds like that way lies madness.

The correct solution would be for the line length calculations in Bash to correctly grok UTF-8 (or whichever Unicode encoding it is that you are using). Uhm, have you tried without the \[...\] sequence?

Edit: The following implements the solution I propose in the comments below. The cursor position is saved, then two spaces are printed, outside of \[...\] , then the cursor position is restored, and the Unicode character is printed on top of the two spaces. This assumes a fixed font width, with double width for the Unicode character.

PS1='\['"`tput sc`"'\]  \['"`tput rc`"'༇ \] \$ '

At least in the OSX Terminal, Bash 3.2.17(1)-release, this passes cursory [sic] testing.

In the interest of transparency and legibility, I have ignored the requirement to have the prompt's functionality inside a function, and the color coding; this just changes the prompt to the character, space, dollar prompt, space. Adapt to suit your somewhat more complex needs.

tripleee, Aug 23, 2011 at 7:01

@tripleee wins it, posting the final solution here because it's a pain to post code in comments:
CHAR="༇"
my_function="
    prompt=\" \\[`tput sc`\\]  \\[`tput rc`\\]\\[\$CHAR\\] \"
    echo -e \$prompt"

PS1="\$(${my_function}) \$ "

The trick as pointed out in @tripleee's link is the use of the commands tput sc and tput rc which save and then restore the cursor position. The code is effectively saving the cursor position, printing two spaces for width, restoring the cursor position to before the spaces, then printing the special character so that the width of the line is from the two spaces, not the character.


(Not the answer to your problem, but some pointers and general experience related to your issue.)

I see the behaviour you describe about cmd-line editing (Ctrl-R, ... Cntrl-A Ctrl-E ...) all the time, even without unicode chars.

At one work-site, I spent the time to figure out the diff between the terminals interpretation of the TERM setting VS the TERM definition used by the OS (well, stty I suppose).

NOW, when I have this problem, I escape out of my current attempt to edit the line, bring the line up again, and then immediately go to the 'vi' mode, which opens the vi editor. (press just the 'v' char, right?). All the ease of use of a full-fledged session of vi; why go with less ;-)?

Looking again at your problem description, when you say

my_function="
    prompt=\" \[\$CHAR\]\"
    echo -e \$prompt"

That is just a string definition, right? I'm assuming you're simplifying the problem description, and that this is the output of your my_function . In the steps of creating the function definition, calling the function, and using the values returned, there are a lot of opportunities for shell-quoting to not work the way you want it to.

If you edit your question to include the my_function definition and its complete use (reducing your function to just what is causing the problem), it may be easier for others to help with this too. Finally, do you use set -vx regularly? It can help show the how/when/what of variable expansions; you may find something there.

Failing all of those, look at O'Reilly's termcap & terminfo . You may need to look at the man page for your local system's stty and related commands, and you may do well to look for user groups specific to your Linux system (I'm assuming you use a Linux variant).

I hope this helps.

[Nov 08, 2019] 7 Bash history shortcuts you will actually use by Ian Miell

Nov 08, 2019 | opensource.com

7 Bash history shortcuts you will actually use. Save time on the command line with these essential Bash shortcuts. 02 Oct 2019

This article outlines the shortcuts I actually use every day. It is based on some of the contents of my book, Learn Bash the hard way (you can read a preview of it to learn more).

When people see me use these shortcuts, they often ask me, "What did you do there!?" There's minimal effort or intelligence required, but to really learn them, I recommend using one each day for a week, then moving to the next one. It's worth taking your time to get them under your fingers, as the time you save will be significant in the long run.

1. The "last argument" one: !$

If you only take one shortcut from this article, make it this one. It substitutes in the last argument of the last command into your line.

Consider this scenario:

$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory

Ach, I put the wrongfile filename in my command. I should have put rightfile instead.

You might decide to retype the last command and replace wrongfile with rightfile completely. Instead, you can type:

$ mv /path/to/rightfile !$
mv /path/to/rightfile /some/other/place

and the command will work.

There are other ways to achieve the same thing in Bash with shortcuts, but this trick of reusing the last argument of the last command is one I use the most.

2. The " n th argument" one: !:2

Ever done anything like this?

$ tar -cvf afolder afolder.tar
tar: failed to open

Like many others, I get the arguments to tar (and ln ) wrong more often than I would like to admit.


When you mix up arguments like that, you can run:

$ !:0 !:1 !:3 !:2
tar -cvf afolder.tar afolder

and your reputation will be saved.

The last command's items are zero-indexed and can be substituted in with the number after the !: .

Obviously, you can also use this to reuse specific arguments from the last command rather than all of them.

3. The "all the arguments" one: !:1-$

Imagine I run a command like:

$ grep '(ping|pong)' afile

The arguments are correct; however, I want to match ping or pong in a file, but I used grep rather than egrep .

I start typing egrep , but I don't want to retype the other arguments. So I can use the !:1-$ shortcut to ask for all the arguments to the previous command from the second one (remember they're zero-indexed) to the last one (represented by the $ sign).

$ egrep !:1-$
egrep '(ping|pong)' afile
ping

You don't need to pick 1-$ ; you can pick a subset like 1-2 or 3-9 (if you had that many arguments in the previous command).

4. The "last but n" one: !-2:$

The shortcuts above are great when I know immediately how to correct my last command, but often I run commands after the original one, which means that the last command is no longer the one I want to reference.

For example, using the mv example from before, if I follow up my mistake with an ls check of the folder's contents:

$ mv /path/to/wrongfile /some/other/place
mv: cannot stat '/path/to/wrongfile': No such file or directory
$ ls /path/to/
rightfile

I can no longer use the !$ shortcut.

In these cases, I can insert -n (where n is the number of commands to go back in the history) after the ! to grab the last argument from an older command:

$ mv /path/to/rightfile !-2:$
mv /path/to/rightfile /some/other/place

Again, once you learn it, you may be surprised at how often you need it.

5. The "get me the folder" one: !$:h

This one looks less promising on the face of it, but I use it dozens of times daily.

Imagine I run a command like this:

$ tar -cvf system.tar /etc/system
tar: /etc/system: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors.

The first thing I might want to do is go to the /etc folder to see what's in there and work out what I've done wrong.

I can do this at a stroke with:

$ cd !$:h
cd /etc

This one says: "Get the last argument to the last command ( /etc/system ) and take off its last filename component, leaving only the /etc ."

6. The "the current line" one: !#:1

For years, I occasionally wondered if I could reference an argument on the current line before finally looking it up and learning it. I wish I'd done so a long time ago. I most commonly use it to make backup files:

$ cp / path / to / some / file ! #:1.bak
cp / path / to / some / file / path / to / some / file.bak

but once it's under the fingers, it can be a very quick alternative to retyping the full path.

7. The "search and replace" one: !!:gs

This one searches across the referenced command and replaces the text between the first pair of / characters with the text between the second pair.

Say I want to tell the world that my s key does not work and outputs f instead:

$ echo my f key doef not work
my f key doef not work

Then I realize that I was just hitting the f key by accident. To replace all the f s with s es, I can type:

$ !!:gs/f/s/
echo my s key does not work
my s key does not work

It doesn't work only on single characters; I can replace words or sentences, too:

$ !!:gs/does/did/
echo my s key did not work
my s key did not work

Test them out

Just to show you how these shortcuts can be combined, can you work out what these toenail clippings will output?

$ ping !#:0:gs/i/o
$ vi /tmp/!:0.txt
$ ls !$:h
$ cd !-2:h
$ touch !$!-3:$ !! !$.txt
$ cat !:1-$

Conclusion

Bash can be an elegant source of shortcuts for the day-to-day command-line user. While there are thousands of tips and tricks to learn, these are my favorites that I frequently put to use.

If you want to dive even deeper into all that Bash can teach you, pick up my book, Learn Bash the hard way or check out my online course, Master the Bash shell .


This article was originally posted on Ian's blog, Zwischenzugs.com , and is reused with permission.

[Nov 08, 2019] A Linux user's guide to Logical Volume Management Opensource.com

Nov 08, 2019 | opensource.com

In Figure 1, two complete physical hard drives and one partition from a third hard drive have been combined into a single volume group. Two logical volumes have been created from the space in the volume group, and a filesystem, such as an EXT3 or EXT4 filesystem, has been created on each of the two logical volumes.

Figure 1: LVM allows combining partitions and entire hard drives into Volume Groups.

Adding disk space to a host is fairly straightforward but, in my experience, is done relatively infrequently. The basic steps needed are listed below. You can either create an entirely new volume group or you can add the new space to an existing volume group and either expand an existing logical volume or create a new one.

Adding a new logical volume

There are times when it is necessary to add a new logical volume to a host. For example, after noticing that the directory containing virtual disks for my VirtualBox virtual machines was filling up the /home filesystem, I decided to create a new logical volume in which to store the virtual machine data, including the virtual disks. This would free up a great deal of space in my /home filesystem and also allow me to manage the disk space for the VMs independently.

The basic steps for adding a new logical volume are as follows.

  1. If necessary, install a new hard drive.
  2. Optional: Create a partition on the hard drive.
  3. Create a physical volume (PV) of the complete hard drive or a partition on the hard drive.
  4. Assign the new physical volume to an existing volume group (VG) or create a new volume group.
  5. Create a new logical volume (LV) from the space in the volume group.
  6. Create a filesystem on the new logical volume.
  7. Add appropriate entries to /etc/fstab for mounting the filesystem.
  8. Mount the filesystem.

Now for the details. The following sequence is taken from an example I used as a lab project when teaching about Linux filesystems.

Example

This example shows how to use the CLI to extend an existing volume group to add more space to it, create a new logical volume in that space, and create a filesystem on the logical volume. This procedure can be performed on a running, mounted filesystem.

WARNING: Only certain filesystems, including EXT3 and EXT4, can be resized on the fly on a running, mounted filesystem. Check your filesystem's documentation before attempting an online resize of anything else.

Install hard drive

If there is not enough space in the volume group on the existing hard drive(s) in the system to add the desired amount of space it may be necessary to add a new hard drive and create the space to add to the Logical Volume. First, install the physical hard drive, and then perform the following steps.

Create Physical Volume from hard drive

It is first necessary to create a new Physical Volume (PV). Use the command below, which assumes that the new hard drive is assigned as /dev/hdd.

pvcreate /dev/hdd

It is not necessary to create a partition of any kind on the new hard drive. This creation of the Physical Volume which will be recognized by the Logical Volume Manager can be performed on a newly installed raw disk or on a Linux partition of type 83. If you are going to use the entire hard drive, creating a partition first does not offer any particular advantages and uses disk space for metadata that could otherwise be used as part of the PV.

Extend the existing Volume Group

In this example we will extend an existing volume group rather than creating a new one; you can choose to do it either way. After the Physical Volume has been created, extend the existing Volume Group (VG) to include the space on the new PV. In this example the existing Volume Group is named MyVG01.

vgextend /dev/MyVG01 /dev/hdd
Create the Logical Volume

First create the Logical Volume (LV) from existing free space within the Volume Group. The command below creates a LV with a size of 50GB. The Volume Group name is MyVG01 and the Logical Volume Name is Stuff.

lvcreate -L 50G --name Stuff MyVG01
Create the filesystem

Creating the Logical Volume does not create the filesystem. That task must be performed separately. The command below creates an EXT4 filesystem that fits the newly created Logical Volume.

mkfs -t ext4 /dev/MyVG01/Stuff
Add a filesystem label

Adding a filesystem label makes it easy to identify the filesystem later in case of a crash or other disk related problems.

e2label /dev/MyVG01/Stuff Stuff
Mount the filesystem

At this point you can create a mount point, add an appropriate entry to the /etc/fstab file, and mount the filesystem.
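For example, a minimal sketch using the label created above (the /Stuff mount point is just an illustration):

mkdir /Stuff
echo "LABEL=Stuff  /Stuff  ext4  defaults  1 2" >> /etc/fstab
mount /Stuff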

You should also check to verify the volume has been created correctly. You can use the df , lvs, and vgs commands to do this.

Resizing a logical volume in an LVM filesystem

The need to resize a filesystem has been around since the beginning of the first versions of Unix and has not gone away with Linux. It has gotten easier, however, with Logical Volume Management.

  1. If necessary, install a new hard drive.
  2. Optional: Create a partition on the hard drive.
  3. Create a physical volume (PV) of the complete hard drive or a partition on the hard drive.
  4. Assign the new physical volume to an existing volume group (VG) or create a new volume group.
  5. Create one or more logical volumes (LV) from the space in the volume group, or expand an existing logical volume with some or all of the new space in the volume group.
  6. If you created a new logical volume, create a filesystem on it. If adding space to an existing logical volume, use the resize2fs command to enlarge the filesystem to fill the space in the logical volume.
  7. Add appropriate entries to /etc/fstab for mounting the filesystem.
  8. Mount the filesystem.
Example

This example describes how to resize an existing Logical Volume in an LVM environment using the CLI. It adds about 50GB of space to the /Stuff filesystem. This procedure can be used on a mounted, live filesystem only with the Linux 2.6 Kernel (and higher) and EXT3 and EXT4 filesystems. I do not recommend that you do so on any critical system, but it can be done and I have done so many times; even on the root (/) filesystem. Use your judgment.

WARNING: Only certain filesystems, including EXT3 and EXT4, can be resized on the fly on a running, mounted filesystem. Check your filesystem's documentation before attempting an online resize of anything else.

Install the hard drive

If there is not enough space on the existing hard drive(s) in the system to add the desired amount of space it may be necessary to add a new hard drive and create the space to add to the Logical Volume. First, install the physical hard drive and then perform the following steps.

Create a Physical Volume from the hard drive

It is first necessary to create a new Physical Volume (PV). Use the command below, which assumes that the new hard drive is assigned as /dev/hdd.

pvcreate /dev/hdd

It is not necessary to create a partition of any kind on the new hard drive. This creation of the Physical Volume which will be recognized by the Logical Volume Manager can be performed on a newly installed raw disk or on a Linux partition of type 83. If you are going to use the entire hard drive, creating a partition first does not offer any particular advantages and uses disk space for metadata that could otherwise be used as part of the PV.

Add PV to existing Volume Group

For this example, we will use the new PV to extend an existing Volume Group. After the Physical Volume has been created, extend the existing Volume Group (VG) to include the space on the new PV. In this example, the existing Volume Group is named MyVG01.

vgextend /dev/MyVG01 /dev/hdd
Extend the Logical Volume

Extend the Logical Volume (LV) from existing free space within the Volume Group. The command below expands the LV by 50GB. The Volume Group name is MyVG01 and the Logical Volume Name is Stuff.

lvextend -L +50G /dev/MyVG01/Stuff
Expand the filesystem

Extending the Logical Volume will also expand the filesystem if you use the -r option. If you do not use the -r option, that task must be performed separately. The command below resizes the filesystem to fit the newly resized Logical Volume.

resize2fs /dev/MyVG01/Stuff
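As noted, the extend and resize steps can be combined by passing -r to lvextend; a minimal sketch with the same names:

lvextend -r -L +50G /dev/MyVG01/Stuff    # extend the LV and grow the filesystem in one step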

You should check to verify the resizing has been performed correctly. You can use the df , lvs, and vgs commands to do this.

Tips

Over the years I have learned a few things that can make logical volume management even easier than it already is. Hopefully these tips can prove of some value to you.

I know that, like me, many sysadmins have resisted the change to Logical Volume Management. I hope that this article will encourage you to at least try LVM. I am really glad that I did; my disk management tasks are much easier since I made the switch.

[Nov 08, 2019] 10 killer tools for the admin in a hurry Opensource.com

Nov 08, 2019 | opensource.com

NixCraft
Use the site's internal search function. With more than a decade of regular updates, there's gold to be found here -- useful scripts and handy hints that can solve your problem straight away. This is often the second place I look after Google.

Webmin
This gives you a nice web interface to remotely edit your configuration files. It cuts down on a lot of time spent having to juggle directory paths and sudo nano , which is handy when you're handling several customers.

Windows Subsystem for Linux
The reality of the modern workplace is that most employees are on Windows, while the grown-up gear in the server room is on Linux. So sometimes you find yourself trying to do admin tasks from (gasp) a Windows desktop.

What do you do? Install a virtual machine? It's actually much faster and far less work to configure if you install the Windows Subsystem for Linux compatibility layer, now available at no cost on Windows 10.

This gives you a Bash terminal in a window where you can run Bash scripts and Linux binaries on the local machine, have full access to both Windows and Linux filesystems, and mount network drives. It's available in Ubuntu, OpenSUSE, SLES, Debian, and Kali flavors.

mRemoteNG
This is an excellent SSH and remote desktop client for when you have 100+ servers to manage.

Setting up a network so you don't have to do it again

A poorly planned network is the sworn enemy of the admin who hates working overtime.

IP Addressing Schemes that Scale
The diabolical thing about running out of IP addresses is that, when it happens, the network's grown large enough that a new addressing scheme is an expensive, time-consuming pain in the proverbial.

Ain't nobody got time for that!

At some point, IPv6 will finally arrive to save the day. Until then, these one-size-fits-most IP addressing schemes should keep you going, no matter how many network-connected wearables, tablets, smart locks, lights, security cameras, VoIP headsets, and espresso machines the world throws at us.

Linux Chmod Permissions Cheat Sheet
A short but sweet cheat sheet of Bash commands to set permissions across the network. This is so when Bill from Customer Service falls for that ransomware scam, you're recovering just his files and not the entire company's.

VLSM Subnet Calculator
Just put in the number of networks you want to create from an address space and the number of hosts you want per network, and it calculates what the subnet mask should be for everything.

Single-purpose Linux distributions

Need a Linux box that does just one thing? It helps if someone else has already sweated the small stuff on an operating system you can install and have ready immediately.

Each of these has, at one point, made my work day so much easier.

Porteus Kiosk
This is for when you want a computer totally locked down to just a web browser. With a little tweaking, you can even lock the browser down to just one website. This is great for public access machines. It works with touchscreens or with a keyboard and mouse.

Parted Magic
This is an operating system you can boot from a USB drive to partition hard drives, recover data, and run benchmarking tools.

IPFire
Hahahaha, I still can't believe someone called a router/firewall/proxy combo "I pee fire." That's my second favorite thing about this Linux distribution. My favorite is that it's a seriously solid software suite. It's so easy to set up and configure, and there is a heap of plugins available to extend it.

What about your top tools and cheat sheets?

So, how about you? What tools, resources, and cheat sheets have you found to make the workday easier? I'd love to know. Please share in the comments.

[Nov 08, 2019] Command-line tools for collecting system statistics Opensource.com

Nov 08, 2019 | opensource.com

Examining collected data

The output from the sar command can be detailed, or you can choose to limit the data displayed. For example, enter the sar command with no options, which displays only aggregate CPU performance data. The sar command uses the current day by default, starting at midnight, so you should only see the CPU data for today.

On the other hand, using the sar -A command shows all of the data that has been collected for today. Enter the sar -A | less command now and page through the output to view the many types of data collected by SAR, including disk and network usage, CPU context switches (how many times per second the CPU switched from one program to another), page swaps, memory and swap space usage, and much more. Use the man page for the sar command to interpret the results and to get an idea of the many options available. Many of those options allow you to view specific data, such as network and disk performance.

I typically use the sar -A command because many of the types of data available are interrelated, and sometimes I find something that gives me a clue to a performance problem in a section of the output that I might not have looked at otherwise. The -A option displays all of the collected data types.

Look at the entire output of the sar -A | less command to get a feel for the type and amount of data displayed. Be sure to look at the CPU usage data as well as the processes started per second (proc/s) and context switches per second (cswch/s). If the number of context switches increases rapidly, that can indicate that running processes are being swapped off the CPU very frequently.

You can limit the total amount of data to the total CPU activity with the sar -u command. Try that and notice that you only get the composite CPU data, not the data for the individual CPUs. Also try the -r option for memory, and -S for swap space. Combining these options is also possible; the following command displays CPU, memory, and swap space:

sar -urS

Using the -p option displays block device names for hard drives instead of the much more cryptic device identifiers, and -d displays only the block devices -- the hard drives. Issue the following command to view all of the block device data in a readable format using the names as they are found in the /dev directory:

sar -dp | less

If you want only data between certain times, you can use -s and -e to define the start and end times, respectively. The following command displays all CPU data, both individual and aggregate for the time period between 7:50 AM and 8:11 AM today:

sar -P ALL -s 07:50:00 -e 08:11:00

Note that all times must be in 24-hour format. If you have multiple CPUs, each CPU is detailed individually, and the average for all CPUs is also given.

The next command uses the -n option to display network statistics for all interfaces:

sar -n ALL | less
Data for previous days

Data collected for previous days can also be examined by specifying the desired log file. Assume that today's date is September 3 and you want to see the data for yesterday; the following command displays all collected data for September 2. The last two digits of each file are the day of the month on which the data was collected:

sar -A -f /var/log/sa/sa02 | less

You can use the command below, where DD is the day of the month for yesterday:

sar -A -f /var/log/sa/saDD | less
Realtime data

You can also use SAR to display (nearly) realtime data. The following command displays memory usage in 5-second intervals for 10 iterations:

sar -r 5 10

This is an interesting option for sar as it can provide a series of data points for a defined period of time that can be examined in detail and compared.

The /proc filesystem

All of this data for SAR and the system monitoring tools covered in my previous article must come from somewhere. Fortunately, all of that kernel data is easily available in the /proc filesystem. In fact, because the kernel performance data stored there is all in ASCII text format, it can be displayed using simple commands like cat so that the individual programs do not have to load their own kernel modules to collect it. This saves system resources and makes the data more accurate. SAR and the system monitoring tools I have discussed in my previous article all collect their data from the /proc filesystem.

Note that /proc is a virtual filesystem and only exists in RAM while Linux is running. It is not stored on the hard drive.

Even though I won't get into detail, the /proc filesystem also contains the live kernel tuning parameters and variables. Thus you can change the kernel tuning by simply changing the appropriate kernel tuning variable in /proc; no reboot is required.
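For instance, a brief illustration of a live tuning change (net.ipv4.ip_forward is just a common example; choose parameters appropriate to your system):

cat /proc/sys/net/ipv4/ip_forward        # view the current value
echo 1 > /proc/sys/net/ipv4/ip_forward   # change it immediately; no reboot required
sysctl -w net.ipv4.ip_forward=1          # the same change via the sysctl front end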

Change to the /proc directory and list the files there. You will see, in addition to the data files, a large quantity of numbered directories. Each of these directories represents a process, where the directory name is the Process ID (PID). You can delve into those directories to locate information about individual processes that might be of interest.

To view this data, simply cat some of the following files:
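For example, a few commonly inspected entries (shown only as an illustration):

cat /proc/cpuinfo    # CPU model, core count, and feature flags
cat /proc/meminfo    # detailed memory usage
cat /proc/loadavg    # 1-, 5-, and 15-minute load averages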

You will see that, although the data is available in these files, much of it is not annotated in any way. That means you will have work to do to identify and extract the desired data. However, the monitoring tools discussed earlier already do that for the data they are designed to display.

There is so much more data in the /proc filesystem that the best way to learn more about it is to refer to the proc(5) man page, which contains detailed information about the various files found there.

Next time I will pull all this together and discuss how I have used these tools to solve problems.

David Both - David Both is an Open Source Software and GNU/Linux advocate, trainer, writer, and speaker who lives in Raleigh North Carolina. He is a strong proponent of and evangelist for the "Linux Philosophy." David has been in the IT industry for nearly 50 years. He has taught RHCE classes for Red Hat and has worked at MCI Worldcom, Cisco, and the State of North Carolina. He has been working with Linux and Open Source Software for over 20 years.

[Nov 08, 2019] How to use Sanoid to recover from data disasters Opensource.com

Nov 08, 2019 | opensource.com

Sanoid's companion tool, syncoid, uses filesystem-level snapshot replication to move data from one machine to another, fast . For enormous blobs like virtual machine images, we're talking several orders of magnitude faster than rsync .

If that isn't cool enough already, you don't even necessarily need to restore from backup if you lost the production hardware; you can just boot up the VM directly on the local hotspare hardware, or the remote disaster recovery hardware, as appropriate. So even in case of catastrophic hardware failure , you're still looking at that 59m RPO, <1m RTO.

https://www.youtube.com/embed/5hEixXutaPo

Backups -- and recoveries -- don't get much easier than this.

The syntax is dead simple:

root@box1:~# syncoid pool/images/vmname root@box2:poolname/images/vmname

Or if you have lots of VMs, like I usually do... recursion!

root@box1:~# syncoid -r pool/images/vmname root@box2:poolname/images/vmname

This makes it not only possible, but easy to replicate multiple-terabyte VM images hourly over a local network, and daily over a VPN. We're not talking enterprise 100mbps symmetrical fiber, either. Most of my clients have 5mbps or less available for upload, which doesn't keep them from automated, nightly over-the-air backups, usually to a machine sitting quietly in an owner's house.
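As a sketch of how such a schedule might be wired up with cron (the file path, dataset names, and hosts here are hypothetical):

# /etc/cron.d/syncoid -- hypothetical replication schedule
0 * * * *   root  syncoid -r pool/images root@hotspare:pool/images      # hourly, over the local network
30 2 * * *  root  syncoid -r pool/images root@offsite:backup/images     # nightly, over the VPN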

Preventing your own Humpty Level Events

Sanoid is open source software, and so are all its dependencies. You can run Sanoid and Syncoid themselves on pretty much anything with ZFS. I developed it and use it on Linux myself, but people are using it (and I support it) on OpenIndiana, FreeBSD, and FreeNAS too.

You can find the GPLv3 licensed code on the website (which actually just redirects to Sanoid's GitHub project page), and there's also a Chef Cookbook and an Arch AUR repo available from third parties.

[Nov 07, 2019] 5 alerting and visualization tools for sysadmins Opensource.com

Nov 07, 2019 | opensource.com

Common types of alerts and visualizations

Alerts

Let's first cover what alerts are not . Alerts should not be sent if the human responder can't do anything about the problem. This includes alerts that are sent to multiple individuals with only a few who can respond, or situations where every anomaly in the system triggers an alert. This leads to alert fatigue and receivers ignoring all alerts within a specific medium until the system escalates to a medium that isn't already saturated.

For example, if an operator receives hundreds of emails a day from the alerting system, that operator will soon ignore all emails from the alerting system. The operator will respond to a real incident only when he or she is experiencing the problem, emailed by a customer, or called by the boss. In this case, alerts have lost their meaning and usefulness.

Alerts are not a constant stream of information or a status update. They are meant to convey a problem from which the system can't automatically recover, and they are sent only to the individual most likely to be able to recover the system. Everything that falls outside this definition isn't an alert and will only damage your employees and company culture.

Everyone has a different set of alert types, so I won't discuss things like priority levels (P1-P5) or models that use words like "Informational," "Warning," and "Critical." Instead, I'll describe the generic categories emergent in complex systems' incident response.

You might have noticed I mentioned an "Informational" alert type right after I wrote that alerts shouldn't be informational. Well, not everyone agrees, but I don't consider something an alert if it isn't sent to anyone. It is a data point that many systems refer to as an alert. It represents some event that should be known but not responded to. It is generally part of the visualization system of the alerting tool and not an event that triggers actual notifications. Mike Julian covers this and other aspects of alerting in his book Practical Monitoring . It's a must read for work in this area.

Non-informational alerts consist of types that can be responded to or require action. I group these into two categories: internal outage and external outage. (Most companies have more than two levels for prioritizing their response efforts.) Degraded system performance is considered an outage in this model, as the impact to each user is usually unknown.

Internal outages are a lower priority than external outages, but they still need to be responded to quickly. They often include internal systems that company employees use or components of applications that are visible only to company employees.

External outages consist of any system outage that would immediately impact a customer. These don't include a system outage that prevents releasing updates to the system. They do include customer-facing application failures, database outages, and networking partitions that hurt availability or consistency if either can impact a user. They also include outages of tools that may not have a direct impact on users, as the application continues to run but this transparent dependency impacts performance. This is common when the system uses some external service or data source that isn't necessary for full functionality but may cause delays as the application performs retries or handles errors from this external dependency.

Visualizations

There are many visualization types, and I won't cover them all here. It's a fascinating area of research. On the data analytics side of my career, learning and applying that knowledge is a constant challenge. We need to provide simple representations of complex system outputs for the widest dissemination of information. Google Charts and Tableau have a wide selection of visualization types. We'll cover the most common visualizations and some innovative solutions for quickly understanding systems.

Line chart

The line chart is probably the most common visualization. It does a pretty good job of producing an understanding of a system over time. A line chart in a metrics system would have a line for each unique metric or some aggregation of metrics. This can get confusing when there are a lot of metrics in the same dashboard (as shown below), but most systems can select specific metrics to view rather than having all of them visible. Also, anomalous behavior is easy to spot if it's significant enough to escape the noise of normal operations. Below we can see purple, yellow, and light blue lines that might indicate anomalous behavior.

monitoring_guide_line_chart.png

Another feature of a line chart is that you can often stack them to show relationships. For example, you might want to look at requests on each server individually, but also in aggregate. This allows you to understand the overall system as well as each instance in the same graph.

monitoring_guide_line_chart_aggregate.png

Heatmaps

Another common visualization is the heatmap. It is useful when looking at histograms. This type of visualization is similar to a bar chart but can show gradients within the bars representing the different percentiles of the overall metric. For example, suppose you're looking at request latencies and you want to quickly understand the overall trend as well as the distribution of all requests. A heatmap is great for this, and it can use color to disambiguate the quantity of each section with a quick glance.

The heatmap below shows the higher concentration around the centerline of the graph with an easy-to-understand visualization of the distribution vertically for each time bucket. We might want to review a couple of points in time where the distribution gets wide while the others are fairly tight like at 14:00. This distribution might be a negative performance indicator.

monitoring_guide_histogram.png

Gauges

The last common visualization I'll cover here is the gauge, which helps users understand a single metric quickly. Gauges can represent a single metric, like your speedometer represents your driving speed or your gas gauge represents the amount of gas in your car. Similar to the gas gauge, most monitoring gauges clearly indicate what is good and what isn't. Often (as is shown below), good is represented by green, getting worse by orange, and "everything is breaking" by red. The middle row below shows traditional gauges.

monitoring_guide_gauges.png (Image source: Grafana.org, © Grafana Labs)

This image shows more than just traditional gauges. The other gauges are single stat representations that are similar to the function of the classic gauge. They all use the same color scheme to quickly indicate system health with just a glance. Arguably, the bottom row is probably the best example of a gauge that allows you to glance at a dashboard and know that everything is healthy (or not). This type of visualization is usually what I put on a top-level dashboard. It offers a full, high-level understanding of system health in seconds.

Flame graphs

A less common visualization is the flame graph, introduced by Netflix's Brendan Gregg in 2011. It's not ideal for dashboarding or quickly observing high-level system concerns; it's normally seen when trying to understand a specific application problem. This visualization focuses on CPU and memory and the associated frames. The X-axis lists the frames alphabetically, and the Y-axis shows stack depth. Each rectangle is a stack frame and includes the function being called. The wider the rectangle, the more it appears in the stack. This method is invaluable when trying to diagnose system performance at the application level and I urge everyone to give it a try.

monitoring_guide_flame_graph.png (Image source: Wikimedia.org, Creative Commons BY SA 3.0)

Tool options

There are several commercial options for alerting, but since this is Opensource.com, I'll cover only systems that are being used at scale by real companies that you can use at no cost. Hopefully, you'll be able to contribute new and innovative features to make these systems even better.

Alerting tools

Bosun

If you've ever done anything with computers and gotten stuck, the help you received was probably thanks to a Stack Exchange system. Stack Exchange runs many different websites around a crowdsourced question-and-answer model. Stack Overflow is very popular with developers, and Super User is popular with operations. However, there are now hundreds of sites ranging from parenting to sci-fi and philosophy to bicycles.

Stack Exchange open-sourced its alert management system, Bosun , around the same time Prometheus and its AlertManager system were released. There were many similarities in the two systems, and that's a really good thing. Like Prometheus, Bosun is written in Golang. Bosun's scope is more extensive than Prometheus' as it can interact with systems beyond metrics aggregation. It can also ingest data from log and event aggregation systems. It supports Graphite, InfluxDB, OpenTSDB, and Elasticsearch.

Bosun's architecture consists of a single server binary, a backend like OpenTSDB, Redis, and scollector agents . The scollector agents automatically detect services on a host and report metrics for those processes and other system resources. This data is sent to a metrics backend. The Bosun server binary then queries the backends to determine if any alerts need to be fired. Bosun can also be used by tools like Grafana to query the underlying backends through one common interface. Redis is used to store state and metadata for Bosun.

A really neat feature of Bosun is that it lets you test your alerts against historical data. This was something I missed in Prometheus several years ago, when I had data for an issue I wanted alerts on but no easy way to test it. To make sure my alerts were working, I had to create and insert dummy data. This system alleviates that very time-consuming process.

Bosun also has the usual features like showing simple graphs and creating alerts. It has a powerful expression language for writing alerting rules. However, it only has email and HTTP notification configurations, which means connecting to Slack and other tools requires a bit more customization ( which its documentation covers ). Similar to Prometheus, Bosun can use templates for these notifications, which means they can look as awesome as you want them to. You can use all your HTML and CSS skills to create the baddest email alert anyone has ever seen.

Cabot

Cabot was created by a company called Arachnys . You may not know who Arachnys is or what it does, but you have probably felt its impact: It built the leading cloud-based solution for fighting financial crimes. That sounds pretty cool, right? At a previous company, I was involved in similar functions around "know your customer" laws. Most companies would consider it a very bad thing to be linked to a terrorist group, for example, funneling money through their systems. These solutions also help defend against less-atrocious offenders like fraudsters who could also pose a risk to the institution.

So why did Arachnys create Cabot? Well, it is kind of a Christmas present to everyone, as it was a Christmas project built because its developers couldn't wrap their heads around Nagios . And really, who can blame them? Cabot was written with Django and Bootstrap, so it should be easy for most to contribute to the project. (Another interesting factoid: The name comes from the creator's dog.)

The Cabot architecture is similar to Bosun in that it doesn't collect any data. Instead, it accesses data through the APIs of the tools it is alerting for. Therefore, Cabot uses a pull (rather than a push) model for alerting. It reaches out into each system's API and retrieves the information it needs to make a decision based on a specific check. Cabot stores the alerting data in a Postgres database and also has a cache using Redis.

Cabot natively supports Graphite , but it also supports Jenkins , which is rare in this area. Arachnys uses Jenkins like a centralized cron, but I like this idea of treating build failures like outages. Obviously, a build failure isn't as critical as a production outage, but it could still alert the team and escalate if the failure isn't resolved. Who actually checks Jenkins every time an email comes in about a build failure? Yeah, me too!

Another interesting feature is that Cabot can integrate with Google Calendar for on-call rotations. Cabot calls this feature Rota, which is a British term for a roster or rotation. This makes a lot of sense, and I wish other systems would take this idea further. Cabot doesn't support anything more complex than primary and backup personnel, but there is certainly room for additional features. The docs say if you want something more advanced, you should look at a commercial option.

StatsAgg

StatsAgg ? How did that make the list? Well, it's not every day you come across a publishing company that has created an alerting platform. I think that deserves recognition. Of course, Pearson isn't just a publishing company anymore; it has several web presences and a joint venture with O'Reilly Media . However, I still think of it as the company that published my schoolbooks and tests.

StatsAgg isn't just an alerting platform; it's also a metrics aggregation platform. And it's kind of like a proxy for other systems. It supports Graphite, StatsD, InfluxDB, and OpenTSDB as inputs, but it can also forward those metrics to their respective platforms. This is an interesting concept, but potentially risky as loads increase on a central service. However, if the StatsAgg infrastructure is robust enough, it can still produce alerts even when a backend storage platform has an outage.

StatsAgg is written in Java and consists only of the main server and UI, which keeps complexity to a minimum. It can send alerts based on regular expression matching and is focused on alerting by service rather than host or instance. Its goal is to fill a void in the open source observability stack, and I think it does that quite well.

Visualization tools

Grafana

Almost everyone knows about Grafana , and many have used it. I have used it for years whenever I need a simple dashboard. The tool I used before was deprecated, and I was fairly distraught about that until Grafana made it okay. Grafana was gifted to us by Torkel Ödegaard. Like Cabot, Grafana was also created around Christmastime, and released in January 2014. It has come a long way in just a few years. It started life as a Kibana dashboarding system, and Torkel forked it into what became Grafana.

Grafana's sole focus is presenting monitoring data in a more usable and pleasing way. It can natively gather data from Graphite, Elasticsearch, OpenTSDB, Prometheus, and InfluxDB. There's an Enterprise version that uses plugins for more data sources, but there's no reason those other data source plugins couldn't be created as open source, as the Grafana plugin ecosystem already offers many other data sources.

What does Grafana do for me? It provides a central location for understanding my system. It is web-based, so anyone can access the information, although it can be restricted using different authentication methods. Grafana can provide knowledge at a glance using many different types of visualizations. However, it has started integrating alerting and other features that aren't traditionally combined with visualizations.

Now you can set alerts visually. That means you can look at a graph, maybe even one showing where an alert should have triggered due to some degradation of the system, click on the graph where you want the alert to trigger, and then tell Grafana where to send the alert. That's a pretty powerful addition that won't necessarily replace an alerting platform, but it can certainly help augment it by providing a different perspective on alerting criteria.

Grafana has also introduced more collaboration features. Users have been able to share dashboards for a long time, meaning you don't have to create your own dashboard for your Kubernetes cluster because there are several already available -- with some maintained by Kubernetes developers and others by Grafana developers.

The most significant addition around collaboration is annotations. Annotations allow a user to add context to part of a graph. Other users can then use this context to understand the system better. This is an invaluable tool when a team is in the middle of an incident and communication and common understanding are critical. Having all the information right where you're already looking makes it much more likely that knowledge will be shared across the team quickly. It's also a nice feature to use during blameless postmortems when the team is trying to understand how the failure occurred and learn more about their system.

Vizceral

Netflix created Vizceral to understand its traffic patterns better when performing a traffic failover. Unlike Grafana, which is a more general tool, Vizceral serves a very specific use case. Netflix no longer uses this tool internally and says it is no longer actively maintained, but it still updates the tool periodically. I highlight it here primarily to point out an interesting visualization mechanism and how it can help solve a problem. It's worth running it in a demo environment just to better grasp the concepts and witness what's possible with these systems.

[Nov 07, 2019] What breaks our systems A taxonomy of black swans Opensource.com

Nov 07, 2019 | opensource.com

What breaks our systems: A taxonomy of black swans. Find and fix outlier events that create issues before they trigger severe production problems. 25 Oct 2018 | Laura Nolan

Get the highlights in your inbox every week.

https://opensource.com/eloqua-embedded-email-capture-block.html?offer_id=70160000000QzXNAA0

Black swans, by definition, can't be predicted, but sometimes there are patterns we can find and use to create defenses against categories of related problems.

For example, a large proportion of failures are a direct result of changes (code, environment, or configuration). Each bug triggered in this way is distinctive and unpredictable, but the common practice of canarying all changes is somewhat effective against this class of problems, and automated rollbacks have become a standard mitigation.

As our profession continues to mature, other kinds of problems are becoming well-understood classes of hazards with generalized prevention strategies.

Black swans observed in the wild

All technology organizations have production problems, but not all of them share their analyses. The organizations that publicly discuss incidents are doing us all a service. The following incidents each describe one class of problem and are by no means isolated instances. We all have black swans lurking in our systems; it's just that some of us don't know it yet.

Hitting limits


Running headlong into any sort of limit can produce very severe incidents. A canonical example of this was Instapaper's outage in February 2017 . I challenge any engineer who has carried a pager to read the outage report without a chill running up their spine. Instapaper's production database was on a filesystem that, unknown to the team running the service, had a 2TB limit. With no warning, it stopped accepting writes. Full recovery took days and required migrating its database.

Limits can strike in various ways. Sentry hit limits on maximum transaction IDs in Postgres. Platform.sh hit size limits on a pipe buffer. SparkPost triggered AWS's DDoS protection. Foursquare hit a performance cliff when one of its datastores ran out of RAM.

One way to get advance knowledge of system limits is to test periodically. Good load testing (on a production replica) ought to involve write transactions and should involve growing each datastore beyond its current production size. It's easy to forget to test things that aren't your main datastores (such as Zookeeper). If you hit limits during testing, you have time to fix the problems. Given that resolution of limits-related issues can involve major changes (like splitting a datastore), time is invaluable.

When it comes to cloud services, if your service generates unusual loads or uses less widely used products or features (such as older or newer ones), you may be more at risk of hitting limits. It's worth load testing these, too. But warn your cloud provider first.

Finally, where limits are known, add monitoring (with associated documentation) so you will know when your systems are approaching those ceilings. Don't rely on people still being around to remember.
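
As a hedged illustration of that kind of monitoring, a small cron-driven script along these lines can nag you before a filesystem creeps toward a hard limit; the 80% threshold and the mail recipient are arbitrary assumptions, not values taken from the incidents above.

#!/bin/bash
# Sketch: warn when any locally mounted filesystem passes a usage threshold.
# THRESHOLD and the mail recipient are placeholders to adjust for your site.
THRESHOLD=80
df -P --local | awk 'NR > 1 {gsub("%", "", $5); print $5, $6}' | while read -r used mount; do
    if [ "$used" -ge "$THRESHOLD" ]; then
        echo "WARNING: $mount is at ${used}% of capacity" | mail -s "Filesystem nearing its limit" root
    fi
done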

Spreading slowness
"The world is much more correlated than we give credit to. And so we see more of what Nassim Taleb calls 'black swan events' -- rare events happen more often than they should because the world is more correlated."
-- Richard Thaler

HostedGraphite's postmortem on how an AWS outage took down its load balancers (which are not hosted on AWS) is a good example of just how much correlation exists in distributed computing systems. In this case, the load-balancer connection pools were saturated by slow connections from customers that were hosted in AWS. The same kinds of saturation can happen with application threads, locks, and database connections -- any kind of resource monopolized by slow operations.

HostedGraphite's incident is an example of externally imposed slowness, but often slowness can result from saturation somewhere in your own system creating a cascade and causing other parts of your system to slow down. An incident at Spotify demonstrates such spread -- the streaming service's frontends became unhealthy due to saturation in a different microservice. Enforcing deadlines for all requests, as well as limiting the length of request queues, can prevent such spread. Your service will serve at least some traffic, and recovery will be easier because fewer parts of your system will be broken.

Retries should be limited with exponential backoff and some jitter. An outage at Square, in which its Redis datastore became overloaded due to a piece of code that retried failed transactions up to 500 times with no backoff, demonstrates the potential severity of excessive retries. The Circuit Breaker design pattern can be helpful here, too.
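
A minimal sketch of that advice in shell form, assuming a placeholder command and arbitrary retry limits, looks like this:

#!/bin/bash
# Sketch: retry a flaky operation with exponential backoff plus jitter,
# and give up after a bounded number of attempts instead of hammering
# the dependency. "some_command" and the limits are placeholders.
max_attempts=5
base_delay=1
for attempt in $(seq 1 "$max_attempts"); do
    if some_command; then
        exit 0
    fi
    sleep_time=$(( base_delay * 2 ** (attempt - 1) + RANDOM % 3 ))
    echo "Attempt $attempt failed; sleeping ${sleep_time}s before retrying" >&2
    sleep "$sleep_time"
done
echo "Giving up after $max_attempts attempts" >&2
exit 1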

Dashboards should be designed to clearly show utilization, saturation, and errors for all resources so problems can be found quickly.

Thundering herds

Often, failure scenarios arise when a system is under unusually heavy load. This can arise organically from users, but often it arises from systems. A surge of cron jobs that starts at midnight is a venerable example. Mobile clients can also be a source of coordinated demand if they are programmed to fetch updates at the same time (of course, it is much better to jitter such requests).
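
One low-tech way to jitter scheduled work, sketched below with a placeholder script path and an arbitrary ten-minute window, is simply to sleep a random amount before the real job starts:

# Sketch of a crontab entry. SHELL=/bin/bash is needed because $RANDOM is a bashism;
# the script path and the 600-second window are placeholders.
SHELL=/bin/bash
0 0 * * * sleep $((RANDOM % 600)) && /usr/local/bin/nightly-job.sh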

Events occurring at pre-configured times aren't the only source of thundering herds. Slack experienced multiple outages over a short time due to large numbers of clients being disconnected and immediately reconnecting, causing large spikes of load. CircleCI saw a severe outage when a GitLab outage ended, leading to a surge of builds queued in its database, which became saturated and very slow.

Almost any service can be the target of a thundering herd. Planning for such eventualities -- and testing that your plan works as intended -- is therefore a must. Client backoff and load shedding are often core to such approaches.

If your systems must constantly ingest data that can't be dropped, it's key to have a scalable way to buffer this data in a queue for later processing.

Automation systems are complex systems
"Complex systems are intrinsically hazardous systems."
-- Richard Cook, MD

The trend for the past several years has been strongly towards more automation of software operations. Automation of anything that can reduce your system's capacity (e.g., erasing disks, decommissioning devices, taking down serving jobs) needs to be done with care. Accidents (due to bugs or incorrect invocations) with this kind of automation can take down your system very efficiently, potentially in ways that are hard to recover from.

Christina Schulman and Etienne Perot of Google describe some examples in their talk Help Protect Your Data Centers with Safety Constraints . One incident sent Google's entire in-house content delivery network (CDN) to disk-erase.

Schulman and Perot suggest using a central service to manage constraints, which limits the pace at which destructive automation can operate, and being aware of system conditions (for example, avoiding destructive operations if the service has recently had an alert).

Automation systems can also cause havoc when they interact with operators (or with other automated systems). Reddit experienced a major outage when its automation restarted a system that operators had stopped for maintenance. Once you have multiple automation systems, their potential interactions become extremely complex and impossible to predict.

It will help to deal with the inevitable surprises if all this automation writes logs to an easily searchable, central place. Automation systems should always have a mechanism to allow them to be quickly turned off (fully or only for a subset of operations or targets).
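
A hedged sketch of those two ideas, a kill switch plus a crude rate limit wrapped around a destructive action, might look like the following; all paths, the limit, and the decommission step itself are assumptions for illustration only.

#!/bin/bash
# Sketch: wrapper for destructive automation with a global "off switch",
# a crude per-hour rate limit, and central logging via syslog.
# All paths and the limit are placeholders; a real implementation would
# also reset the counter every hour.
KILL_SWITCH=/etc/automation/disabled
RATE_FILE=/var/run/decommission.count
MAX_PER_HOUR=5
TARGET="$1"

if [ -e "$KILL_SWITCH" ]; then
    logger -t decommission "Kill switch present, refusing to touch $TARGET"
    exit 1
fi

count=$(cat "$RATE_FILE" 2>/dev/null || echo 0)
if [ "$count" -ge "$MAX_PER_HOUR" ]; then
    logger -t decommission "Rate limit reached ($count this hour), refusing to touch $TARGET"
    exit 1
fi

echo $((count + 1)) > "$RATE_FILE"
logger -t decommission "Decommissioning $TARGET"
# ... the actual destructive work would go here ...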

Defense against the dark swans

These are not the only black swans that might be waiting to strike your systems. There are many other kinds of severe problem that can be avoided using techniques such as canarying, load testing, chaos engineering, disaster testing, and fuzz testing -- and of course designing for redundancy and resiliency. Even with all that, at some point your system will fail.

To ensure your organization can respond effectively, make sure your key technical staff and your leadership have a way to coordinate during an outage. For example, one unpleasant issue you might have to deal with is a complete outage of your network. It's important to have a fail-safe communications channel completely independent of your own infrastructure and its dependencies. For instance, if you run on AWS, using a service that also runs on AWS as your fail-safe communication method is not a good idea. A phone bridge or an IRC server that runs somewhere separate from your main systems is good. Make sure everyone knows what the communications platform is and practices using it.

Another principle is to ensure that your monitoring and your operational tools rely on your production systems as little as possible. Separate your control and your data planes so you can make changes even when systems are not healthy. Don't use a single message queue for both data processing and config changes or monitoring, for example -- use separate instances. In SparkPost: The Day the DNS Died , Jeremy Blosser presents an example where critical tools relied on the production DNS setup, which failed.

The psychology of battling the black swan

Dealing with major incidents in production can be stressful. It really helps to have a structured incident-management process in place for these situations. Many technology organizations (including Google) successfully use a version of FEMA's Incident Command System. There should be a clear way for any on-call individual to call for assistance in the event of a major problem they can't resolve alone.

For long-running incidents, it's important to make sure people don't work for unreasonable lengths of time and get breaks to eat and sleep (uninterrupted by a pager). It's easy for exhausted engineers to make a mistake or overlook something that might resolve the incident faster.

Learn more

There are many other things that could be said about black (or formerly black) swans and strategies for dealing with them. If you'd like to learn more, I highly recommend these two books dealing with resilience and stability in production: Susan Fowler's Production-Ready Microservices and Michael T. Nygard's Release It! .


Laura Nolan will present What Breaks Our Systems: A Taxonomy of Black Swans at LISA18 , October 29-31 in Nashville, Tennessee, USA.

[Nov 07, 2019] How to prevent and recover from accidental file deletion in Linux Enable Sysadmin

trashy (Trashy · GitLab) might make sense in simple cases, but deletions are often about increasing free space, which moving files to a trash directory does not provide.
Nov 07, 2019 | www.redhat.com
Back up

You knew this would come first. Data recovery is a time-intensive process and rarely produces 100% correct results. If you don't have a backup plan in place, start one now.

Better yet, implement two. First, provide users with local backups with a tool like rsnapshot . This utility creates snapshots of each user's data in a ~/.snapshots directory, making it trivial for them to recover their own data quickly.
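
As a minimal sketch, driving rsnapshot from root's crontab might look like the lines below; the schedule and the "hourly"/"daily" retain levels are assumptions that must match what is defined in rsnapshot.conf.

# Sketch of crontab entries for rsnapshot; adjust to your rsnapshot.conf retain levels.
0 */4 * * *   /usr/bin/rsnapshot hourly
30 3  * * *   /usr/bin/rsnapshot daily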

There are a great many other open source backup applications that permit your users to manage their own backup schedules.

Second, while these local backups are convenient, also set up a remote backup plan for your organization. Tools like AMANDA or BackupPC are solid choices for this task. You can run them as a daemon so that backups happen automatically.

Backup planning and preparation pay for themselves in both time and peace of mind. There's nothing like not needing emergency response procedures in the first place.

Ban rm

On modern operating systems, there is a Trash or Bin folder where users drag the files they don't want out of sight without deleting them just yet. Traditionally, the Linux terminal has no such holding area, so many terminal power users have the bad habit of permanently deleting data they believe they no longer need. Since there is no "undelete" command, this habit can be quite problematic should a power user (or administrator) accidentally delete a directory full of important data.

Many users say they favor the absolute deletion of files, claiming that they prefer their computers to do exactly what they tell them to do. Few of those users, though, forego their rm command for the more complete shred , which really removes their data. In other words, most terminal users invoke the rm command because it removes data, but take comfort in knowing that file recovery tools exist as a hacker's un-rm. Still, using those tools takes up their administrator's precious time. Don't let your users -- or yourself -- fall prey to this breach of logic.

If you really want to remove data, then rm is not sufficient. Use the shred -u command instead, which overwrites and then thoroughly deletes the specified data.

However, if you don't want to actually remove data, don't use rm . This command has no undo feature, even though deleted files can sometimes be clawed back with recovery tools. Instead, use trashy or trash-cli to "delete" files into a trash bin while using your terminal, like so:

$ trash ~/example.txt
$ trash --list
example.txt

One advantage of these commands is that the trash bin they use is the same as your desktop's trash bin. With them, you can recover your trashed files either by opening your desktop Trash folder or from the terminal.

If you've already developed a bad rm habit and find the trash command difficult to remember, create an alias for yourself:

$ echo "alias rm='trash'"

Even better, create this alias for everyone. Your time as a system administrator is too valuable to spend hours struggling with file recovery tools just because someone mis-typed an rm command.
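
One hedged way to do that, assuming trash-cli is installed system-wide, is to drop the alias into /etc/profile.d so every login shell picks it up:

# Sketch: create the alias for all users (assumes trash-cli is installed system-wide).
echo "alias rm='trash'" | sudo tee /etc/profile.d/trash-alias.sh
# Users get it at their next login, or immediately with:
source /etc/profile.d/trash-alias.sh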

Respond efficiently

Unfortunately, it can't be helped. At some point, you'll have to recover lost files, or worse. Let's take a look at emergency response best practices to make the job easier. Before you even start, understanding what caused the data to be lost in the first place can save you a lot of time.

No matter how the problem began, start your rescue mission with a few best practices.

Once you have a sense of what went wrong, it's time to choose the right tool to fix the problem. Two such tools are Scalpel and TestDisk , both of which operate just as well on a disk image as on a physical drive.
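
Because both tools can work from an image, a cautious first step is to stop writing to the affected disk and recover from a copy. The sketch below uses placeholder device and path names and assumes the relevant file types are enabled in scalpel.conf.

# Sketch: unmount the affected disk, image it, and carve files from the image.
# /dev/sdX and the rescue paths are placeholders.
umount /dev/sdX1
dd if=/dev/sdX of=/mnt/rescue/sdX.img bs=4M conv=sync,noerror status=progress
# Point the recovery tool at the image instead of the live disk:
scalpel -o /mnt/rescue/carved /mnt/rescue/sdX.img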

Practice (or, go break stuff)

At some point in your career, you'll have to recover data. The smart practices discussed above can minimize how often this happens, but there's no avoiding this problem. Don't wait until disaster strikes to get familiar with data recovery tools. After you set up your local and remote backups, implement command-line trash bins, and limit the rm command, it's time to practice your data recovery techniques.

Download and practice using Scalpel, TestDisk, or whatever other tools you feel might be useful. Be sure to practice data recovery safely, though. Find an old computer, install Linux onto it, and then generate, destroy, and recover. If nothing else, doing so teaches you to respect data structures, filesystems, and a good backup plan. And when the time comes and you have to put those skills to real use, you'll appreciate knowing what to do.

[Nov 07, 2019] Linux commands to display your hardware information

Nov 07, 2019 | opensource.com

Get the details on what's inside your computer from the command line. 16 Sep 2019, Howard Fosdick.

The easiest way to get this information is with one of the standard Linux GUI system-information or monitoring programs.

Alternatively, you could open up the box and read the labels on the disks, memory, and other devices. Or you could enter the boot-time panels -- the so-called UEFI or BIOS panels. Just hit the proper program function key during the boot process to access them. These two methods give you hardware details but omit software information.

Or, you could issue a Linux line command. Wait a minute -- that sounds difficult. Why would you do this?

The Linux Terminal

Sometimes it's easy to find a specific bit of information through a well-targeted line command. Perhaps you don't have a GUI program available or don't want to install one.

Probably the main reason to use line commands is for writing scripts. Whether you employ the Linux shell or another programming language, scripting typically requires coding line commands.

Many line commands for detecting hardware must be issued under root authority. So either switch to the root user ID, or issue the command under your regular user ID preceded by sudo :

sudo <the_line_command>

and respond to the prompt for the root password.

This article introduces many of the most useful line commands for system discovery. The quick reference chart at the end summarizes them.

Hardware overview

There are several line commands that will give you a comprehensive overview of your computer's hardware.

The inxi command lists details about your system, CPU, graphics, audio, networking, drives, partitions, sensors, and more. Forum participants often ask for its output when they're trying to help others solve problems. It's a standard diagnostic for problem-solving:

inxi -Fxz

The -F flag means you'll get full output, x adds details, and z masks out personally identifying information like MAC and IP addresses.

The hwinfo and lshw commands display much of the same information in different formats:

hwinfo --short

or

lshw -short

The long forms of these two commands spew out exhaustive -- but hard to read -- output:

hwinfo

or

lshw
CPU details

You can learn everything about your CPU through line commands. View CPU details by issuing either the lscpu command or its close relative lshw :

lscpu

or

lshw -C cpu

In both cases, the last few lines of output list all the CPU's capabilities. Here you can find out whether your processor supports specific features.

With all these commands, you can reduce verbiage and narrow any answer down to a single detail by parsing the command output with the grep command. For example, to view only the CPU make and model:

lshw -C cpu | grep -i product

To view just the CPU's speed in megahertz:

lscpu | grep -i mhz

or its BogoMips power rating:

lscpu | grep -i bogo

The -i flag on the grep command simply ensures your search ignores whether the output it searches is upper or lower case.

Memory

Linux line commands enable you to gather all possible details about your computer's memory. You can even determine whether you can add extra memory to the computer without opening up the box.

To list each memory stick and its capacity, issue the dmidecode command:

dmidecode -t memory | grep -i size

For more specifics on system memory, including type, size, speed, and voltage of each RAM stick, try:

lshw -short -C memory

One thing you'll surely want to know is the maximum memory you can install on your computer:

dmidecode -t memory | grep -i max

Now find out whether there are any open slots to insert additional memory sticks. You can do this without opening your computer by issuing this command:

lshw -short -C memory | grep -i empty

A null response means all the memory slots are already in use.

Determining how much video memory you have requires a pair of commands. First, list all devices with the lspci command and limit the output displayed to the video device you're interested in:

lspci | grep -i vga

The output line that identifies the video controller will typically look something like this:

00:02.0 VGA compatible controller: Intel Corporation 82Q35 Express Integrated Graphics Controller (rev 02)

Now reissue the lspci command, referencing the video device number as the selected device:

lspci -v -s 00:02.0

The output line identified as prefetchable is the amount of video RAM on your system:

...
Memory at f0100000 (32-bit, non-prefetchable) [size=512K]
I/O ports at 1230 [size=8]
Memory at e0000000 (32-bit, prefetchable) [size=256M]
Memory at f0000000 (32-bit, non-prefetchable) [size=1M]
...

Finally, to show current memory use in megabytes, issue:

free -m

This tells how much memory is free, how much is in use, the size of the swap area, and whether it's being used. For example, the output might look like this:

              total        used        free      shared  buff/cache   available
Mem:          11891        1326        8877         212        1687       10077
Swap:          1999           0        1999

The top command gives you more detail on memory use. It shows current overall memory and CPU use and also breaks it down by process ID, user ID, and the commands being run. It displays full-screen text output:

top
Disks, filesystems, and devices

You can easily determine whatever you wish to know about disks, partitions, filesystems, and other devices.

To display a single line describing each disk device:

lshw -short -C disk

Get details on any specific SATA disk, such as its model and serial numbers, supported modes, sector count, and more with:

hdparm -i /dev/sda

Of course, you should replace sda with sdb or another device mnemonic if necessary.

To list all disks with all their defined partitions, along with the size of each, issue:

lsblk

For more detail, including the number of sectors, size, filesystem ID and type, and partition starting and ending sectors:

fdisk -l

To start up Linux, you need to identify mountable partitions to the GRUB bootloader. You can find this information with the blkid command. It lists each partition's unique identifier (UUID) and its filesystem type (e.g., ext3 or ext4):

blkid

To list the mounted filesystems, their mount points, and the space used and available for each (in megabytes):

df -m

Finally, you can list details for all USB and PCI buses and devices with these commands:

lsusb

or

lspci
Network

Linux offers tons of networking line commands. Here are just a few.

To see hardware details about your network card, issue:

lshw -C network

Traditionally, the command to show network interfaces was ifconfig :

ifconfig -a

But many people now use:

ip link show

or

netstat -i

In reading the output, it helps to know common network abbreviations:

Abbreviation Meaning
lo Loopback interface
eth0 or enp* Ethernet interface
wlan0 Wireless interface
ppp0 Point-to-Point Protocol interface (used by a dial-up modem, PPTP VPN connection, or USB modem)
vboxnet0 or vmnet* Virtual machine interface

The asterisks in this table are wildcard characters, serving as a placeholder for whatever series of characters appear from system to system.

To show your default gateway and routing tables, issue either of these commands:

ip route | column -t

or

netstat -r
Software

Let's conclude with two commands that display low-level software details. For example, what if you want to know whether you have the latest firmware installed? This command shows the UEFI or BIOS date and version:

dmidecode -t bios

What is the kernel version, and is it 64-bit? And what is the network hostname? To find out, issue:

uname -a
Quick reference chart

This chart summarizes all the commands covered in this article:

Display info about all hardware: inxi -Fxz --or-- hwinfo --short --or-- lshw -short
Display all CPU info: lscpu --or-- lshw -C cpu
Show CPU features (e.g., PAE, SSE2): lshw -C cpu | grep -i capabilities
Report whether the CPU is 32- or 64-bit: lshw -C cpu | grep -i width
Show current memory size and configuration: dmidecode -t memory | grep -i size --or-- lshw -short -C memory
Show maximum memory for the hardware: dmidecode -t memory | grep -i max
Determine whether memory slots are available: lshw -short -C memory | grep -i empty (a null answer means no slots are available)
Determine the amount of video memory: lspci | grep -i vga, then reissue lspci with the device number, for example lspci -v -s 00:02.0 (the VRAM is the prefetchable value)
Show current memory use: free -m --or-- top
List the disk drives: lshw -short -C disk
Show detailed information about a specific disk drive: hdparm -i /dev/sda (replace sda if necessary)
List information about disks and partitions: lsblk (simple) --or-- fdisk -l (detailed)
List partition IDs (UUIDs): blkid
List mounted filesystems, their mount points, and megabytes used and available for each: df -m
List USB devices: lsusb
List PCI devices: lspci
Show network card details: lshw -C network
Show network interfaces: ifconfig -a --or-- ip link show --or-- netstat -i
Display routing tables: ip route | column -t --or-- netstat -r
Display UEFI/BIOS info: dmidecode -t bios
Show kernel version, network hostname, and more: uname -a

Do you have a favorite command that I overlooked? Please add a comment and share it.

[Nov 07, 2019] An agentless servers inventory with Ansible Ansible-CMDB by Nitin J Mutkawoa

Nov 07, 2019 | tunnelix.com

09/16/2018

Building an agentless inventory system for Linux servers from scratch is a very time-consuming task. To have precise information about your servers' inventory, Ansible comes in very handy, especially if you are not allowed to install an agent on the servers. However, there are some pieces of information that Ansible's default inventory mechanism cannot retrieve. In those cases, a playbook needs to be created to retrieve them; examples are VMware Tools and other application versions which you might want to include in your inventory system. Since Ansible makes it easy to create JSON files, its output can easily be turned into other interesting artifacts, say, a static HTML page. I would recommend Ansible-CMDB, which is very handy for such conversion: it builds a pure HTML file based on the JSON files generated by Ansible. Ansible-CMDB is another amazing tool created by Ferry Boender .

Let's have a look at how the agentless servers inventory with Ansible and Ansible-CMDB works. It's important to understand the prerequisites needed before installing Ansible. There are other articles on Ansible which I have published:

Ansible Basics and Pre-requisites

1. In this article, you will get an overview of what Ansible inventory is capable of. Start by gathering the information that you will need for your inventory system. The goal is to make a plan first.

2. As explained in the article Getting started with Ansible deployment , you have to define a group and record the names of your servers (which can be resolved through the hosts file or a DNS server) or their IPs. Let's assume that the name of the group is "test".
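
As a minimal sketch (the hostnames are placeholders), appending such a group to the default inventory could look like this:

# Sketch: add a "test" group to the default Ansible inventory; hostnames are placeholders.
cat >> /etc/ansible/hosts <<'EOF'
[test]
server1.example.com
server2.example.com
EOF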

3. Launch the following command to see JSON output describing the inventory of the machines. As you may notice, Ansible fetches all the data.

ansible -m setup test

4. You can also write the output to a specific directory for future use with Ansible-cmdb. I would advise creating a dedicated directory (I created /home/Ansible-Workdesk ) to avoid confusion about where the files are written.

ansible -m setup --tree out/ test

5. At this point, you will have several files created in a tree format, i.e., one file named after each server containing JSON information about that server's inventory.

Getting Hands-on with Ansible-cmdb

6. Now, you will have to install Ansible-cmdb which is pretty fast and easy. Do make sure that you follow all the requirements before installation:

git clone https://github.com/fboender/ansible-cmdb
cd ansible-cmdb && make install

7. To convert the JSON files into HTML, use the following command:

ansible-cmdb -t html_fancy_split out/

8. You should notice a directory called "cmdb" which contains some HTML files. Open the index.html and view your server inventory system.

Tweaking the default template

9. As mentioned previously, there is some information which is not available by default on the index.html template. You can tweak the /usr/local/lib/ansible-cmdb/ansiblecmdb/data/tpl/html_fancy_defs.html page and add more content, for example, 'uptime' of the servers. To make the "Uptime" column visible, add the following line in the "Column definitions" section:

{"title": "Uptime",        "id": "uptime",        "func": col_uptime,         "sType": "string", "visible": True},

Also, add the following lines in the "Column functions" section:

<%def name="col_uptime(host, **kwargs)">
${jsonxs(host, 'ansible_facts.uptime', default='')}
</%def>

Whatever comes after the dot in ansible_facts.<xxx> is the key that will be looked up in the JSON file. Repeat step 7. Here is how the end result looks.

[Nov 07, 2019] 13 open source backup solutions Opensource.com

Nov 07, 2019 | opensource.com

13 open source backup solutions. Readers suggest more than a dozen of their favorite solutions for protecting data. 07 Mar 2019, Don Watkins (Community Moderator).

We recently ran a poll that asked readers to vote on their favorite open source backup solution. We offered six solutions recommended by our moderator community -- Cronopete, Deja Dup, Rclone, Rdiff-backup, Restic, and Rsync -- and invited readers to share other options in the comments. And you came through, offering 13 other solutions (so far) that we either hadn't considered or hadn't even heard of.

By far the most popular suggestion was BorgBackup . It is a deduplicating backup solution that features compression and encryption. It is supported on Linux, MacOS, and BSD and has a BSD License.

Second was UrBackup , which does full and incremental image and file backups; you can save whole partitions or single directories. It has clients for Windows, Linux, and MacOS and has a GNU Affero Public License.

Third was LuckyBackup ; according to its website, "it is simple to use, fast (transfers over only changes made and not all data), safe (keeps your data safe by checking all declared directories before proceeding in any data manipulation), reliable, and fully customizable." It carries a GNU Public License.

Casync is content-addressable synchronization -- it's designed for backup and synchronizing and stores and retrieves multiple related versions of large file systems. It is licensed with the GNU Lesser Public License.

Syncthing synchronizes files between two computers. It is licensed with the Mozilla Public License and, according to its website, is secure and private. It works on MacOS, Windows, Linux, FreeBSD, Solaris, and OpenBSD.

Duplicati is a free backup solution that works on Windows, MacOS, and Linux and supports a variety of standard protocols, such as FTP, SSH, and WebDAV, as well as cloud services. It features strong encryption and is licensed with the GPL.

Dirvish is a disk-based virtual image backup system licensed under OSL-3.0. It also requires Rsync, Perl5, and SSH to be installed.

Bacula 's website says it "is a set of computer programs that permits the system administrator to manage backup, recovery, and verification of computer data across a network of computers of different kinds." It is supported on Linux, FreeBSD, Windows, MacOS, OpenBSD, and Solaris and the bulk of its source code is licensed under AGPLv3.

BackupPC "is a high-performance, enterprise-grade system for backing up Linux, Windows, and MacOS PCs and laptops to a server's disk," according to its website. It is licensed under the GPLv3.

Amanda is a backup system written in C and Perl that allows a system administrator to back up an entire network of client machines to a single server using tape, disk, or cloud-based systems. It was developed and copyrighted in 1991 at the University of Maryland and has a BSD-style license.

Back in Time is a simple backup utility designed for Linux. It provides a command line client and a GUI, both written in Python. To do a backup, just specify where to store snapshots, what folders to back up, and the frequency of the backups. BackInTime is licensed with GPLv2.

Timeshift is a backup utility for Linux that is similar to System Restore for Windows and Time Capsule for MacOS. According to its GitHub repository, "Timeshift protects your system by taking incremental snapshots of the file system at regular intervals. These snapshots can be restored at a later date to undo all changes to the system."

Kup is a backup solution that was created to help users back up their files to a USB drive, but it can also be used to perform network backups. According to its GitHub repository, "When you plug in your external hard drive, Kup will automatically start copying your latest changes."

[Nov 06, 2019] Sysadmin 101 Alerting Linux Journal

Nov 06, 2019 | www.linuxjournal.com

A common pitfall sysadmins run into when setting up monitoring systems is to alert on too many things. These days, it's simple to monitor just about any aspect of a server's health, so it's tempting to overload your monitoring system with all kinds of system checks. One of the main ongoing maintenance tasks for any monitoring system is setting appropriate alert thresholds to reduce false positives. This means the more checks you have in place, the higher the maintenance burden. As a result, I have a few different rules I apply to my monitoring checks when determining thresholds for notifications.

Critical alerts must be something I want to be woken up about at 3am.

A common cause of sysadmin burnout is being woken up with alerts for systems that don't matter. If you don't have a 24x7 international development team, you probably don't care if the build server has a problem at 3am, or even if you do, you probably are going to wait until the morning to fix it. By restricting critical alerts to just those systems that must be online 24x7, you help reduce false positives and make sure that real problems are addressed quickly.

Critical alerts must be actionable.

Some organizations send alerts when just about anything happens on a system. If I'm being woken up at 3am, I want to have a specific action plan associated with that alert so I can fix it. Again, too many false positives will burn out a sysadmin that's on call, and nothing is more frustrating than getting woken up with an alert that you can't do anything about. Every critical alert should have an obvious action plan the sysadmin can follow to fix it.

Warning alerts tell me about problems that will be critical if I don't fix them.

There are many problems on a system that I may want to know about and may want to investigate, but they aren't worth getting out of bed at 3am. Warning alerts don't trigger a pager, but they still send me a quieter notification. For instance, if load, used disk space or RAM grows to a certain point where the system is still healthy but if left unchecked may not be, I get a warning alert so I can investigate when I get a chance. On the other hand, if I got only a warning alert, but the system was no longer responding, that's an indication I may need to change my alert thresholds.

Repeat warning alerts periodically.

I think of warning alerts like this thing nagging at you to look at it and fix it during the work day. If you send warning alerts too frequently, they just spam your inbox and are ignored, so I've found that spacing them out to alert every hour or so is enough to remind me of the problem but not so frequent that I ignore it completely.

Everything else is monitored, but doesn't send an alert.

There are many things in my monitoring system that help provide overall context when I'm investigating a problem, but by themselves, they aren't actionable and aren't anything I want to get alerts about. In other cases, I want to collect metrics from my systems to build trending graphs later. I disable alerts altogether on those kinds of checks. They still show up in my monitoring system and provide a good audit trail when I'm investigating a problem, but they don't page me with useless notifications.

Kyle's rule.

One final note about alert thresholds: I've developed a practice in my years as a sysadmin that I've found is important enough as a way to reduce burnout that I take it with me to every team I'm on. My rule is this:

If sysadmins are kept up during the night because of false alarms, they can clear their projects for the next day and spend time tuning alert thresholds so it doesn't happen again.

There is nothing worse than being kept up all night because of false positive alerts and knowing that the next night will be the same and that there's nothing you can do about it. If that kind of thing continues, it inevitably will lead either to burnout or to sysadmins silencing their pagers. Setting aside time for sysadmins to fix false alarms helps, because they get a chance to improve their night's sleep the next night. As a team lead or manager, sometimes this has meant that I've taken on a sysadmin's tickets for them during the day so they can fix alerts.

Paging

Sending an alert often is referred to as paging or being paged, because in the past, sysadmins, like doctors, carried pagers on them. Their monitoring systems were set to send a basic numerical alert to the pager when there was a problem, so that sysadmins could be alerted even when they weren't at a computer or when they were asleep. Although we still refer to it as paging, and some older-school teams still pass around an actual pager, these days, notifications more often are handled by alerts to mobile phones.

The first question you need to answer when you set up alerting is what method you will use for notifications. When you are deciding how to set up pager notifications, look for a few specific qualities.

Something that will alert you wherever you are geographically.

A number of cool office projects on the web exist where a broken software build triggers a big red flashing light in the office. That kind of notification is fine for office-hour alerts for non-critical systems, but it isn't appropriate as a pager notification even during the day, because a sysadmin who is in a meeting room or at lunch would not be notified. These days, this generally means some kind of notification needs to be sent to your phone.

An alert should stand out from other notifications.

False alarms can be a big problem with paging systems, as sysadmins naturally will start ignoring alerts. Likewise, if you use the same ringtone for alerts that you use for any other email, your brain will start to tune alerts out. If you use email for alerts, use filtering rules so that on-call alerts generate a completely different and louder ringtone from regular emails and vibrate the phone as well, so you can be notified even if you silence your phone or are in a loud room. In the past, when BlackBerries were popular, you could set rules such that certain emails generated a "Level One" alert that was different from regular email notifications.

The BlackBerry days are gone now, and currently, many organizations (in particular startups) use Google Apps for their corporate email. The Gmail Android application lets you set per-folder (called labels) notification rules so you can create a filter that moves all on-call alerts to a particular folder and then set that folder so that it generates a unique alert, vibrates and does so for every new email to that folder. If you don't have that option, most email software that supports multiple accounts will let you set different notifications for each account so you may need to resort to a separate email account just for alerts.

Something that will wake you up all hours of the night.

Some sysadmins are deep sleepers, and whatever notification system you choose needs to be something that will wake them up in the middle of the night. After all, servers always seem to misbehave at around 3am. Pick a ringtone that is loud, possibly obnoxious if necessary, and also make sure to enable phone vibrations. Also configure your alert system to re-send notifications if an alert isn't acknowledged within a couple minutes. Sometimes the first alert isn't enough to wake people up completely, but it might move them from deep sleep to a lighter sleep so the follow-up alert will wake them up.

While ChatOps (using chat as a method of getting notifications and performing administration tasks) might be okay for general non-critical daytime notifications, they are not appropriate for pager alerts. Even if you have an application on your phone set to notify you about unread messages in chat, many chat applications default to a "quiet time" in the middle of the night. If you disable that, you risk being paged in the middle of the night just because someone sent you a message. Also, many third-party ChatOps systems aren't necessarily known for their mission-critical reliability and have had outages that have spanned many hours. You don't want your critical alerts to rely on an unreliable system.

Something that is fast and reliable.

Your notification system needs to be reliable and able to alert you quickly at all times. To me, this means alerting is done in-house, but many organizations opt for third parties to receive and escalate their notifications. Every additional layer you can add to your alerting is another layer of latency and another place where a notification may be dropped. Just make sure whatever method you choose is reliable and that you have some way of discovering when your monitoring system itself is offline.

In the next section, I cover how to set up escalations -- meaning, how you alert other members of the team if the person on call isn't responding. Part of setting up escalations is picking a secondary, backup method of notification that relies on a different infrastructure from your primary one. So if you use your corporate Exchange server for primary notifications, you might select a personal Gmail account as a secondary. If you have a Google Apps account as your primary notification, you may pick SMS as your secondary alert.

Email servers have outages like anything else, and the goal here is to make sure that even if your primary method of notifications has an outage, you have some alternate way of finding out about it. I've had a number of occasions where my SMS secondary alert came in before my primary just due to latency with email syncing to my phone.

Create some means of alerting the whole team.

In addition to having individual alerting rules that will page someone who is on call, it's useful to have some way of paging an entire team in the event of an "all hands on deck" crisis. This may be a particular email alias or a particular key word in an email subject. However you set it up, it's important that everyone knows that this is a "pull in case of fire" notification and shouldn't be abused with non-critical messages.

Alert Escalations

Once you have alerts set up, the next step is to configure alert escalations. Even the best-designed notification system alerting the most well-intentioned sysadmin will fail from time to time, whether because the sysadmin's phone crashed, had no cell signal, or, for whatever reason, the sysadmin didn't notice the alert. When that happens, you want to make sure that others on the team (and the on-call person's secondary notification) are alerted so someone can address the alert.

Alert escalations are one of those areas that some monitoring systems do better than others. Although the configuration can be challenging compared to other systems, I've found Nagios to provide a rich set of escalation schedules. Other organizations may opt to use a third-party notification system specifically because their chosen monitoring solution doesn't have the ability to define strong escalation paths. A simple escalation system might look like the following: page the on-call sysadmin first; if the alert isn't acknowledged within a set number of minutes, page that person's secondary contact; and if it still isn't acknowledged after a further interval, page the entire team.
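
As a hedged sketch of what such an escalation can look like in Nagios, the object definition below starts paging a wider contact group from the third notification onward; the host, service, contact groups, and timings are illustrative assumptions, not recommended values.

# Sketch: from the 3rd notification onward, also page the backup and the whole team
# every 15 minutes until the problem is acknowledged or resolved. Names are placeholders.
cat > /etc/nagios/conf.d/escalation-example.cfg <<'EOF'
define serviceescalation {
    host_name               db01
    service_description     MySQL
    first_notification      3
    last_notification       0
    notification_interval   15
    contact_groups          oncall-backup, sysadmins-all
}
EOF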

The idea here is to give the on-call sysadmin time to address the alert so you aren't waking everyone up at 3am, yet also provide the rest of the team with a way to find out about the alert if the first sysadmin can't fix it in time or is unavailable. Depending on your particular SLAs, you may want to shorten or lengthen these time periods between escalations or make them more sophisticated with the addition of an on-call backup who is alerted before the full team. In general, organize your escalations so they strike the right balance between giving the on-call person a chance to respond before paging the entire team, yet not letting too much time pass in the event of an outage in case the person on call can't respond.

If you are part of a larger international team, you even may be able to set up escalations that follow the sun. In that case, you would select on-call administrators for each geographic region and set up the alerts so that they were aware of the different time periods and time of day in those regions, and then alert the appropriate on-call sysadmin first. Then you can have escalations page the rest of the team, regardless of geography, in the event that an alert isn't solved.

On-Call Rotation

During World War One, the horrors of being in the trenches at the front lines were such that they caused a new range of psychological problems (labeled shell shock) that, given time, affected even the most hardened soldiers. The steady barrage of explosions, gun fire, sleep deprivation and fear day in and out took its toll, and eventually both sides in the war realized the importance of rotating troops away from the front line to recuperate.

It's not fair to compare being on call with the horrors of war, but that said, it also takes a kind of psychological toll that, if left unchecked, will burn out your team. The responsibility of being on call is a burden even if you aren't alerted during a particular period. It usually means you must carry your laptop with you at all times, and in some organizations, it may affect whether you can go to the movies or on vacation. In some badly run organizations, being on call means a nightmare of alerts where you can expect to have a ruined weekend of firefighting every time. Because being on call can be stressful, in particular if you get a lot of nighttime alerts, it's important to rotate out sysadmins on call so they get a break.

The length of time for being on call will vary depending on the size of your team and how much of a burden being on call is. Generally speaking, a one- to four-week rotation is common, with two-week rotations often hitting the sweet spot. With a large enough team, a two-week rotation is short enough that any individual member of the team doesn't shoulder too much of the burden. But, even if you have only a three-person team, it means a sysadmin gets a full month without worrying about being on call.

Holiday on call.

Holidays place a particular challenge on your on-call rotation, because it ends up being unfair for whichever sysadmin it lands on. In particular, being on call in late December can disrupt all kinds of family time. If you have a professional, trustworthy team with good teamwork, what I've found works well is to share the on-call burden across the team during specific known holiday days, such as Thanksgiving, Christmas Eve, Christmas and New Year's Eve. In this model, alerts go out to every member of the team, and everyone responds to the alert and to each other based on their availability. After all, not everyone eats Thanksgiving dinner at the same time, so if one person is sitting down to eat, but another person has two more hours before dinner, when the alert goes out, the first person can reply "at dinner", but the next person can reply "on it", and that way, the burden is shared.

If you are new to on-call alerting, I hope you have found this list of practices useful. You will find a lot of these practices in place in many larger organizations with seasoned sysadmins, because over time, everyone runs into the same kinds of problems with monitoring and alerting. Most of these policies should apply whether you are in a large organization or a small one, and even if you are the only DevOps engineer on staff, all that means is that you have an advantage at creating an alerting policy that will avoid some common pitfalls and overall burnout.

[Nov 06, 2019] Sysadmin 101 Leveling Up by Kyle Rankin

Nov 06, 2019 | www.linuxjournal.com

This is the fourth in a series of articles on systems administrator fundamentals. These days, DevOps has made even the job title "systems administrator" seem a bit archaic, like the "systems analyst" title it replaced. These DevOps positions are rather different from sysadmin jobs of the past, with a much larger emphasis on software development far beyond basic shell scripting, and as a result they often are filled with people from software development backgrounds without much prior sysadmin experience.

In the past, a sysadmin would enter the role at a junior level and be mentored by a senior sysadmin on the team, but in many cases these days, companies go quite a while with cloud outsourcing before their first DevOps hire. As a result, the DevOps engineer might be thrust into the role at a junior level with no mentor around apart from search engines and Stack Overflow posts.

In the first article in this series, I explained how to approach alerting and on-call rotations as a sysadmin. In the second article , I discussed how to automate yourself out of a job. In the third , I covered why and how you should use tickets. In this article, I describe the overall sysadmin career path and what I consider the attributes that might make you a "senior sysadmin" instead of a "sysadmin" or "junior sysadmin", along with some tips on how to level up.

Keep in mind that titles are pretty fluid and loose things, and that they mean different things to different people. Also, it will take different people different amounts of time to "level up" depending on their innate sysadmin skills, their work ethic and the opportunities they get to gain more experience. That said, be suspicious of anyone who leveled up to a senior level in any field in only a year or two -- it takes time in a career to make the kinds of mistakes and learn the kinds of lessons you need to learn before you can move up to the next level.

Kyle Rankin is a Tech Editor and columnist at Linux Journal and the Chief Security Officer at Purism. He is the author of Linux Hardening in Hostile Networks , DevOps Troubleshooting , The Official Ubuntu Server Book , Knoppix Hacks , Knoppix Pocket Reference , Linux Multimedia Hacks and Ubuntu Hacks , and also a contributor to a number of other O'Reilly books. Rankin speaks frequently on security and open-source software including at BsidesLV, O'Reilly Security Conference, OSCON, SCALE, CactusCon, Linux World Expo and Penguicon. You can follow him at @kylerankin.

[Nov 06, 2019] 7 Ways to Make Fewer Mistakes at Work by Carey-Lee Dixon

May 31, 2015 | www.linkedin.com
Carey-Lee Dixon, Digital Marketing Executive at LASCO Financial Services

Though mistakes are not intentional and are inevitable, that doesn't mean we should take a carefree approach to getting things done. Some mistakes we make in the workplace could easily be avoided if we paid a little more attention to what we were doing. Agree? We've all made them, and possibly mulled over a few silly mistakes from the past. But I am here to tell you that mistakes don't make you a 'bad' person; they are more of a great learning experience - a chance to see what you can do better and how you can get it right the next time. And having made a few silly mistakes in my own work life, I am pretty sure that if you adopt a few of the approaches I have been applying, you too will make fewer mistakes at work.

1. Give your full attention to what you are doing

...dedicate uninterrupted time to accomplish that [important] task. Do whatever it takes to get it done with your full attention; if that means eliminating distractions, taking breaks in between, and working with a to-do list, do it. Trying to send emails, edit that blog post, and do whatever else at the same time may lead to a few unwanted mistakes.

Tip: Eliminate distractions.

2. Ask Questions

Often, we make mistakes because we didn't ask that one question. Either we were too proud to or we thought we had it 'covered.' Unsure about the next step to take or how to undertake a task? Do your homework and ask someone who is more knowledgeable than you are, someone who can guide you accordingly. Worried about what others will think? Who cares? Asking questions only makes you smarter, not dumb. And so what if others think you are dumb? Their opinion doesn't matter anyway; asking questions helps you make fewer mistakes, and as my mom would say, 'Put on the mask and ask'. Each task usually comes with a challenge and requires you to learn something new, so use the resources available to you, like more experienced colleagues, to get all the information you need to make fewer mistakes.

Tip: Do your homework. Ask for help.

3. Use checklists

Checklists can be used to help you structure what needs to be done before you publish that article or submit that project. They are quite useful, especially when you have a million things to do. Since I am responsible for getting multiple tasks done, I often use checklists/to-do lists to help keep me structured and to ensure I don't leave anything undone. In general, lists are great, and using one to detail things to do, or the steps required to move to the next stage, will help to minimize errors, especially when you have a number of things on your plate. And did I mention that Richard Branson is also big on lists? That's how he gets a lot of things done.

4. Review, review, review

Carefully review your work. I must admit, I get a little paranoid about delivering error-free work. Like, seriously, I don't like making mistakes and often beat myself up if I send an email with some silly grammatical errors. That's why reviewing your work before you click send is a must-do. Often, we submit our work with errors because we are working against a tight deadline and didn't give ourselves enough time to review what was done. The last thing you really need is your boss on your neck for the document that was due last week, which you just completed without much time to review it. So, if you have spent endless hours working on a project, are proud of your work, and are ready to show it to the team - take a break and come back to review it. Taking a break and then getting back to review what was done will allow you to find those mistakes before others can. And yes, the checklist is quite useful in the review process - so use it.

Tip: Get a second eye.

5. Get a second eye

Even when you have done a careful review, chances are there will still be mistakes. It happens. Getting a second eye, especially from a more experienced person, can catch that one error you overlooked. Sometimes we overlook details because we are in a hurry or not 100% focused on the task at hand, so having another set of eyes check for errors, or for an important point you missed, is always useful.

Tip: Get a second eye from someone more experienced or knowledgeable.

6. Allow enough time

Looking at my mistakes at work, I realise I am more prone to making them when I am working against a tight deadline. Failure to allow enough time for a project or for review can lead to missed requirements and incompleteness, which results in failure to meet expectations. That's why it is essential to be smart in estimating the time needed to accomplish a task, which should include time for review. Ideally, you want to give yourself enough time to do research, complete the document or project, review what was done, and ask for a second eye, so setting realistic schedules is most important in making fewer mistakes.

Tip: Limit working against tight deadlines.

7. Learn from others' mistakes

No matter how much you know or think you know, it is always important to learn from the mistakes of others. What silly mistakes did a co-worker make that caused a big stir in the office? Make note of them and intentionally try not to make the same mistakes yourself. Some of the greatest lessons are those we learn from others. So pay attention to the mistakes others have made, what they did right, what they didn't nail, and how they got out of the rut.

Tip: Pay close attention to the mistakes others make.

Remember, mistakes are meant to teach you, not break you. So if you make mistakes, it only shows that sometimes we need to take a different approach to getting things done.

Mistakes are meant to teach you, not break you

No one wants to make mistakes; I sure don't. But that does not mean we should be afraid of them. I have made quite a few mistakes in my work life, which has only proven that I need to be more attentive and that I need to ask for help more often than I usually do. So take the necessary steps to make fewer mistakes, but at the same time don't beat yourself up over the ones you make.

A great resource on mistakes in the workplace is Mistakes I Made at Work. A great resource on focusing on less and increasing productivity is One Thing.

____________________________________________________

For more musings, career lessons and tips that you can apply to your personal and professional life, visit my personal blog, www.careyleedixon.com. I enjoy working on becoming a better version of myself and helping others grow in their personal and professional lives while doing what matters. For questions, or to book me for writing/speaking engagements on career and personal development, email me at careyleedixon@gmail.com

[Nov 06, 2019] 10+ mistakes Linux newbies make - TechRepublic

Nov 06, 2019 | www.techrepublic.com


7: Giving up too quickly

Here's another issue I see all too often. After a few hours (or a couple of days) working with Linux, new users will give up for one reason or another. I understand giving up when they realize something simply doesn't work (such as when they MUST use a proprietary application or file format). But seeing Linux not work under average demands is rare these days. If you see new Linux users getting frustrated, try to give them a little extra guidance. Sometimes getting over that initial hump is the biggest challenge they will face.

[Nov 06, 2019] Destroying multiple production databases by Jan Gerrit Kootstra

Aug 08, 2019 | www.redhat.com
In my 22-year career as an IT specialist, I encountered two major issues where -- due to my mistakes -- important production databases were blown apart. Here are my stories.

Freshman mistake

The first time was in the late 1990s when I started working at a service provider for my local municipality's social benefit agency. I got an assignment as a newbie system administrator to remove retired databases from the server where databases for different departments were consolidated.

Due to a typo in a top-level directory name, I removed two live database files instead of the one retired database. What was worse, due to the complexity of the database consolidation, other databases were hit during the restore, too. Repairing all databases took approximately 22 hours.

What helped

A good backup that was tested each night by recovering an empty file at the end of the tar archive catalog, after the backup was made.

Future-looking statement

It's important to learn from our mistakes. What I learned is this:

Senior sysadmin mistake

In a period when partly offshoring IT activities was common practice in order to reduce costs, I had to take over a database filesystem extension on a Red Hat 5 cluster. Although I had set up this system a couple of years before, I had not checked its current state.

I assumed the offshore team was familiar with the need to attach all shared LUNs to both nodes of the two-node cluster. My bad; never assume. As an Australian tourist once said when a friend and I were on vacation in Ireland after my Latin grammar school graduation: "Do not make an arse out of you and me." Or, as another phrase puts it: "Assuming is the mother of all mistakes."

Well, I fell into my own trap. I went for the filesystem extension on the active node and, without checking the passive node's (node2) status, tested a failover. Because we had agreed to run the database on node2 until the next update window, I had put myself in trouble.

As the databases started to fail, we brought the database cluster down. No issues yet, but all hell broke loose when I ran a filesystem check on an LVM-based system with missing physical volumes.

Looking back

I would tell myself: that was stupid. Running pvs, lvs, or vgs would have alerted me that LVM had detected issues. Comparing the multipath configuration files on both nodes would also have revealed the probable cause.

So, next time, I would first check whether LVM reports issues before going for the last resort: a filesystem check and trying to fix the millions of errors. Most of the time you will destroy files that way anyway.
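A quick sanity check along those lines takes only a minute. The commands below are a minimal sketch of the idea, not the author's actual procedure; vg_db and node2 are made-up names standing in for the real volume group and the second cluster node:

# Check LVM health before touching any filesystem (vg_db and node2 are example names)
pvs                                     # a PV listed as "unknown device" means a disk is missing
vgs -o vg_name,vg_attr,pv_count vg_db   # a 'p' in vg_attr marks a partial (incomplete) VG
lvs -o lv_name,lv_attr,devices vg_db    # shows which LVs sit on which PVs
# Compare the multipath view on both cluster nodes
multipath -ll > /tmp/mpath.$(hostname)
ssh node2 'multipath -ll' > /tmp/mpath.node2
diff /tmp/mpath.$(hostname) /tmp/mpath.node2

Only if all of that looks clean does it make sense to reach for fsck.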

What saved my day

What saved my day back then was:

Future-looking statement

I definitely learned some things. For example, always check the environment you're about to work on before any change. Never assume that you know how an environment looks -- change is a constant in IT.

Also, share what you learned from your mistakes. Train offshore colleagues instead of blaming them. Also, inform them about the impact the issue had on the customer's business. A continent's major transport hub cannot be put on hold due to a sysadmin's mistake.

A shutdown of the transport hub might have been needed if we had failed to solve the issue, and the backup site in another service provider's data centre would have been hurt too. Part of the hub is a harbour, and if both a cotton ship and an oil tanker had disappeared from the harbour master's map and collided, we could have blown up a part of the harbour next to a village of about 10,000 people.

General lessons learned

I learned some important lessons overall from these and other mistakes:

I cannot stress this enough: Learn from your mistakes to avoid them in the future, rather than learning how to make them on a weekly basis.

Jan Gerrit Kootstra, Solution Designer (for Telco network services), Red Hat Accelerator.

[Nov 06, 2019] My 10 Linux and UNIX Command Line Mistakes by Vivek Gite

May 20, 2018 | www.cyberciti.biz

I had only one backup copy of my QT project and I just wanted to extract a directory called functions. I ended up deleting the entire backup (note the -c switch instead of -x):
cd /mnt/bacupusbharddisk
tar -zcvf project.tar.gz functions

I had no backup. Similarly, I once ran an rsync command with source and destination swapped and deleted all new files by overwriting them with the old backup set (I have since switched to rsnapshot):
rsync -av --delete /dest /src
Again, I had no backup.
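Both mistakes are the kind that a verification habit catches. The following is a small illustrative sketch with hypothetical paths:

# List the archive first; -t lists, -x extracts, -c (re)creates and clobbers
tar -tzvf project.tar.gz | head
# Extract only the directory you need, into a scratch location
mkdir /tmp/restore && tar -xzvf project.tar.gz -C /tmp/restore functions
# With rsync, do a dry run and re-read the source/destination order before the real run
rsync -av --delete --dry-run /src/ /dest/
rsync -av --delete /src/ /dest/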

... ... ...

All men make mistakes, but only wise men learn from their mistakes -- Winston Churchill.
From all those mistakes I have learned that:
  1. You must keep a good set of backups. Test your backups regularly too.
  2. The clear choice for preserving all data on UNIX file systems is dump, which is the only tool that guarantees recovery under all conditions (see the Torture-testing Backup and Archive Programs paper).
  3. Never use rsync with a single backup directory. Create snapshots using rsync or rsnapshot (see the sketch after this list).
  4. Use CVS/git to store configuration files.
  5. Wait and read the command line twice before hitting the damn [Enter] key.
  6. Use your well-tested perl/shell scripts and open source configuration management software such as Puppet, Ansible, Cfengine or Chef to configure all servers. This also applies to day-to-day jobs such as creating users and more.
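For item 3, one lightweight way to get rsync-based snapshots, short of installing rsnapshot, is rsync's --link-dest option, which hard-links files that have not changed against the previous snapshot so each run costs little extra space. A minimal sketch with made-up paths, not a tested production script:

# Rotating rsync snapshots using hard links (/src/data and /backup are example paths)
today=$(date +%F)
rsync -av --delete --link-dest=/backup/latest /src/data/ /backup/$today/
# On the very first run /backup/latest does not exist yet; rsync just copies everything
ln -sfn /backup/$today /backup/latest   # repoint "latest" at the snapshot just taken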

Mistakes are inevitable, so have you made any that caused some sort of downtime? Please add them in the comments section below.

[Nov 02, 2019] LVM spanning over multiple disks What disk is a file on? Can I lose a drive without total loss

Notable quotes:
"... If you lose a drive in a volume group, you can force the volume group online with the missing physical volume, but you will be unable to open the LV's that were contained on the dead PV, whether they be in whole or in part. ..."
"... So, if you had for instance 10 LV's, 3 total on the first drive, #4 partially on first drive and second drive, then 5-7 on drive #2 wholly, then 8-10 on drive 3, you would be potentially able to force the VG online and recover LV's 1,2,3,8,9,10.. #4,5,6,7 would be completely lost. ..."
"... LVM doesn't really have the concept of a partition it uses PVs (Physical Volumes), which can be a partition. These PVs are broken up into extents and then these are mapped to the LVs (Logical Volumes). When you create the LVs you can specify if the data is striped or mirrored but the default is linear allocation. So it would use the extents in the first PV then the 2nd then the 3rd. ..."
"... As Peter has said the blocks appear as 0's if a PV goes missing. So you can potentially do data recovery on files that are on the other PVs. But I wouldn't rely on it. You normally see LVM used in conjunction with RAIDs for this reason. ..."
"... it's effectively as if a huge chunk of your disk suddenly turned to badblocks. You can patch things back together with a new, empty drive to which you give the same UUID, and then run an fsck on any filesystems on logical volumes that went across the bad drive to hope you can salvage something. ..."
Mar 16, 2015 | serverfault.com

LVM spanning over multiple disks: What disk is a file on? Can I lose a drive without total loss?

I have three 990GB partitions over three drives in my server. Using LVM, I can create one ~3TB partition for file storage.

1) How does the system determine what partition to use first?
2) Can I find what disk a file or folder is physically on?
3) If I lose a drive in the LVM, do I lose all data, or just the data physically on that disk?

Asked Dec 2 '10 by Luke has no name; edited Mar 16 '15 by HopelessN00b.

3 Answers
  1. The system fills from the first disk in the volume group to the last, unless you configure striping with extents.
  2. I don't think this is possible, but where I'd start to look is in the lvs/vgs commands man pages.
  3. If you lose a drive in a volume group, you can force the volume group online with the missing physical volume, but you will be unable to open the LV's that were contained on the dead PV, whether they be in whole or in part.
  4. So, if you had for instance 10 LV's, 3 total on the first drive, #4 partially on first drive and second drive, then 5-7 on drive #2 wholly, then 8-10 on drive 3, you would be potentially able to force the VG online and recover LV's 1,2,3,8,9,10.. #4,5,6,7 would be completely lost.
-- Peter Grace
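The "force the volume group online" step that Peter describes corresponds to LVM's partial activation mode. A hedged sketch, with vg0 as a placeholder volume group name (the option spelling differs a little between LVM releases):

# Activate a VG even though a PV is missing (vg0 is an example name)
vgchange -ay --activationmode partial vg0   # newer LVM2; older releases use: vgchange -ay --partial vg0
# List which LVs have extents on which devices before deciding what can be salvaged
lvs -o lv_name,lv_attr,devices vg0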

1) How does the system determine what partition to use first?

LVM doesn't really have the concept of a partition; it uses PVs (Physical Volumes), which can be partitions. These PVs are broken up into extents, and these are then mapped to the LVs (Logical Volumes). When you create the LVs you can specify whether the data is striped or mirrored, but the default is linear allocation. So it would use the extents in the first PV, then the 2nd, then the 3rd.

2) Can I find what disk a file or folder is physically on?

You can determine what PVs a LV has allocation extents on. But I don't know of a way to get that information for an individual file.
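For the question of where an LV physically lives, LVM can report the LV-to-PV mapping even though it cannot map individual files. A short sketch, where vg0 and mylv are example names:

# Which PVs does each LV occupy? (vg0/mylv are example names)
lvs -a -o +devices vg0
lvdisplay --maps /dev/vg0/mylv   # segment-by-segment detail
pvdisplay -m                     # the reverse view: what each PV is used for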

3) If I lose a drive in the LVM, do I lose all data, or just data physically on that disk?

As Peter has said the blocks appear as 0's if a PV goes missing. So you can potentially do data recovery on files that are on the other PVs. But I wouldn't rely on it. You normally see LVM used in conjunction with RAIDs for this reason.

-- 3dinfluence

I don't know the answer to #2, so I'll leave that to someone else. I suspect "no", but I'm willing to be happily surprised.

1 is: you tell it, when you combine the physical volumes into a volume group.

3 is: it's effectively as if a huge chunk of your disk suddenly turned to badblocks. You can patch things back together with a new, empty drive to which you give the same UUID, and then run an fsck on any filesystems on logical volumes that went across the bad drive to hope you can salvage something.

And to the overall, unasked question: yeah, you probably don't really want to do that.
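The "new, empty drive with the same UUID" trick mentioned above is normally done from LVM's own metadata backups under /etc/lvm. A rough sketch, assuming a backup file for an example volume group vg0 and a replacement disk /dev/sdX; the UUID shown is a placeholder that would be copied from that backup file:

# Recreate the lost PV, reusing the old UUID recorded in the metadata backup
pvcreate --uuid "Wbr6p2-...-placeholder" --restorefile /etc/lvm/backup/vg0 /dev/sdX
vgcfgrestore vg0                 # restore the volume group metadata
vgchange -ay vg0                 # reactivate the VG
fsck -y /dev/vg0/mylv            # then fsck any filesystem that spanned the dead drive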

[Nov 02, 2019] Raid-5 is obsolete if you use large drives, such as 2TB or 3TB disks. You should instead use raid-6 (two disks can fail)

Notable quotes:
"... RAID5 can survive a single drive failure. However, once you replace that drive, it has to be initialized. Depending on the controller and other things, this can take anywhere from 5-18 hours. During this time, all drives will be in constant use to re-create the failed drive. It is during this time that people worry that the rebuild would cause another drive near death to die, causing a complete array failure. ..."
"... If during a rebuild one of the remaining disks experiences BER, your rebuild stops and you may have headaches recovering from such a situation, depending on controller design and user interaction. ..."
"... RAID5 + a GOOD backup is something to consider, though. ..."
"... Raid-5 is obsolete if you use large drives , such as 2TB or 3TB disks. You should instead use raid-6 ..."
"... RAID 6 offers more redundancy than RAID 5 (which is absolutely essential, RAID 5 is a walking disaster) at the cost of multiple parity writes per data write. This means the performance will be typically worse (although it's not theoretically much worse, since the parity operations are in parallel). ..."
Oct 03, 2019 | hardforum.com

RAID5 can survive a single drive failure. However, once you replace that drive, it has to be initialized. Depending on the controller and other things, this can take anywhere from 5-18 hours. During this time, all drives will be in constant use to re-create the failed drive. It is during this time that people worry that the rebuild would cause another drive near death to die, causing a complete array failure.

This isn't the only danger. The problem with 2TB disks, especially if they are not 4K-sector disks, is that they have a relatively high BER (bit error rate) for their capacity, so the likelihood of a BER event actually occurring and translating into an unreadable sector is something to worry about.

If during a rebuild one of the remaining disks experiences BER, your rebuild stops and you may have headaches recovering from such a situation, depending on controller design and user interaction.

So I would say that with modern high-BER drives:

So essentially you'll lose one parity disk alone for the BER issue. Not everyone will agree with my analysis, but considering RAID5 with today's high-capacity drives 'safe' is open for debate.

RAID5 + a GOOD backup is something to consider, though.
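To put a rough number on that worry, here is a back-of-the-envelope sketch assuming the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 1e14 bits and a five-drive RAID5 of 2TB disks; the figures are illustrative only, not vendor data:

# Rough odds of hitting at least one URE while rebuilding a 5 x 2TB RAID5
awk 'BEGIN {
    ber  = 1e-14                 # unrecoverable read errors per bit (assumed consumer spec)
    bits = 4 * 2e12 * 8          # the four surviving 2TB drives must be read in full
    p    = 1 - exp(-ber * bits)  # Poisson approximation
    printf "P(rebuild hits a URE) ~ %.0f%%\n", p * 100
}'
# prints roughly 47% under these assumptions

With an enterprise-class 1e15 spec the same arithmetic gives about 6%, which is part of why the thread keeps coming back to RAID6 plus good backups.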

  1. So you're saying BER is the error count that 'escapes' the ECC correction? I do not believe that is correct, but I'm open to good arguments or links.

    As I understand it, the BER is what prompts bad sectors: where the number of errors exceeds what the ECC can correct, you end up with an unrecoverable sector (Current Pending Sector in the SMART output).

    Also these links are interesting in this context:

    http://blog.econtech.selfip.org/200...s-not-fully-readable-a-lawsuit-in-the-making/

    The short story first: Your consumer level 1TB SATA drive has a 44% chance that it can be completely read without any error. If you run a RAID setup, this is really bad news because it may prevent rebuilding an array in the case of disk failure, making your RAID not so Redundant.
    Not sure on the numbers the article comes up with, though.

    Also this one is interesting:
    http://lefthandnetworks.typepad.com/virtual_view/2008/02/what-does-data.html

    BER simply means that while reading your data from the disk drive you will get an average of one non-recoverable error in so many bits read, as specified by the manufacturer.
    Rebuilding the data on a replacement drive with most RAID algorithms requires that all the other data on the other drives be pristine and error free. If there is a single error in a single sector, then the data for the corresponding sector on the replacement drive cannot be reconstructed, and therefore the RAID rebuild fails and data is lost. The frequency of this disastrous occurrence is derived from the BER. Simple calculations will show that the chance of data loss due to BER is much greater than all other reasons combined.
    These links do suggest that BER works to produce unrecoverable sectors, rather than 'escape' them as undetected bad sectors, if I understood you correctly.
  1. parityOCP said:
    That guy's a bit of a scaremonger to be honest. He may have a point with consumer drives, but the article is sensationalised to a certain degree. However, there are still a few outfits that won't go past 500GB/drive in an array (even with enterprise drives), simply to reduce the failure window during a rebuild.
    Why is he a scaremonger? He is correct. Have you read his article? In fact, he has copied his argument from Adam Leventhal(?), who was one of the ZFS developers, I believe.

    Adam's argument goes like this:
    Disks are getting larger all the time; in fact, storage increases exponentially. At the same time, bandwidth is increasing far more slowly: we are still at roughly 100MB/sec even after decades. So bandwidth has increased maybe 20x over decades, while storage has increased from 10MB to 3TB, a factor of 300,000.

    The trend is clear. In the future when we have 10TB drives, they will not be much faster than today. This means that repairing a RAID built from 3TB disks today will take several days, maybe even a week. With 10TB drives, it will take several weeks, maybe a month.

    Repairing a RAID stresses the other disks heavily, which means they can break too. Experienced sysadmins report that this happens quite often during a repair, perhaps because the disks come from the same batch and share the same weakness. Some sysadmins therefore mix disks from different vendors and batches.

    Hence, I would not want to run a RAID with 3TB disks using only raid-5. During those days of rebuilding, if just one more disk crashes you have lost all your data.

    Hence, that article is correct, and he is not a scaremonger. Raid-5 is obsolete if you use large drives, such as 2TB or 3TB disks. You should instead use raid-6 (two disks can fail). That is the conclusion of the article: use raid-6 with large disks, forget raid-5. This is true, and not scaremongery.

    In fact, ZFS therefore has something called raidz3, which means that three disks can fail without problems. To the OT: no, raid-5 is not safe. Neither is raid-6, because neither of them can always repair, or even detect, corrupted data. There are cases when they don't even notice that you got corrupted bits. See my other thread for more information about this. That is the reason people are switching to ZFS, which always CAN detect and repair those corrupted bits. I suggest you sell your hardware raid card and use ZFS, which requires no hardware. ZFS just uses JBOD.

    Here are research papers on raid-5, raid-6 and ZFS and corruption:
    http://hardforum.com/showpost.php?p=1036404173&postcount=73

  1. brutalizer said:
    The trend is clear. In the future when we have 10TB drives, they will not be much faster than today. This means that repairing a RAID built from 3TB disks today will take several days, maybe even a week. With 10TB drives, it will take several weeks, maybe a month.
    While I agree with the general claim that the larger HDDs (1.5, 2, 3TBs) are best used in RAID 6, your claim about rebuild times is way off.

    I think it is not unreasonable to assume that the 10TB drives will be able to read and write at 200 MB/s or more. We already have 2TB drives with 150MB/s sequential speeds, so 200 MB/s is actually a conservative estimate.

    10e12/200e6 = 50000 secs = 13.9 hours. Even if there is 100% overhead (half the throughput), that is less than 28 hours to do the rebuild. It is a long time, but it is nowhere near a month! Try to ground your claims in reality.

    And you have again made the false claim that "ZFS - which always CAN detect and repair those corrupted bits". ZFS can usually detect corrupted bits, and can usually correct them if you have duplication or parity, but nothing can always detect and repair. ZFS is safer than many alternatives, but nothing is perfectly safe. Corruption can and has happened with ZFS, and it will happen again.

Is RAID5 safe with Five 2TB Hard Drives? | [H]ard|Forum

https://hardforum.com/threads/is-raid5-safe-with-five-2tb-hard-drives.1560198/


RAID 5 Data Recovery How to Rebuild a Failed RAID 5 - YouTube

RAID 5 vs RAID 10: Recommended RAID For Safety and ...

https://www.cyberciti.biz/tips/raid5-vs-raid-10-safety-performance.html

RAID 6 offers more redundancy than RAID 5 (which is absolutely essential, RAID 5 is a walking disaster) at the cost of multiple parity writes per data write. This means the performance will be typically worse (although it's not theoretically much worse, since the parity operations are in parallel).
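For anyone taking the RAID6 advice on a Linux box with software RAID, the mdadm incantation is short. A minimal sketch; /dev/md0 and /dev/sd[b-g] are placeholder device names, and a real array still needs a filesystem, monitoring and backups on top:

# Create a 6-drive RAID6 array (device names are examples)
mdadm --create /dev/md0 --level=6 --raid-devices=6 /dev/sd[b-g]
mdadm --detail /dev/md0                  # check state and rebuild progress
mdadm --detail --scan >> /etc/mdadm.conf # persist the array definition (config path varies by distro)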

[Oct 29, 2019] Blame the Policies, Not the Robots

Oct 29, 2019 | economistsview.typepad.com

anne , October 26, 2019 at 11:59 AM

http://cepr.net/publications/op-eds-columns/blame-the-policies-not-the-robots

October 23, 2019

Blame the Policies, Not the Robots
By Jared Bernstein and Dean Baker - Washington Post

The claim that automation is responsible for massive job losses has been made in almost every one of the Democratic debates. In the last debate, technology entrepreneur Andrew Yang told of automation closing stores on Main Street and of self-driving trucks that would shortly displace "3.5 million truckers or the 7 million Americans who work in truck stops, motels, and diners" that serve them. Rep. Tulsi Gabbard (Hawaii) suggested that the "automation revolution" was at "the heart of the fear that is well-founded."

When Sen. Elizabeth Warren (Mass.) argued that trade was a bigger culprit than automation, the fact-checker at the Associated Press claimed she was "off" and that "economists mostly blame those job losses on automation and robots, not trade deals."

In fact, such claims about the impact of automation are seriously at odds with the standard data that we economists rely on in our work. And because the data so clearly contradict the narrative, the automation view misrepresents our actual current challenges and distracts from effective solutions.

Output-per-hour, or productivity, is one of those key data points. If a firm applies a technology that increases its output without adding additional workers, its productivity goes up, making it a critical diagnostic in this space.

Contrary to the claim that automation has led to massive job displacement, data from the Bureau of Labor Statistics (BLS) show that productivity is growing at a historically slow pace. Since 2005, it has been increasing at just over a 1 percent annual rate. That compares with a rate of almost 3 percent annually in the decade from 1995 to 2005.

This productivity slowdown has occurred across advanced economies. If the robots are hiding from the people compiling the productivity data at BLS, they are also managing to hide from the statistical agencies in other countries.

Furthermore, the idea that jobs are disappearing is directly contradicted by the fact that we have the lowest unemployment rate in 50 years. The recovery that began in June 2009 is the longest on record. To be clear, many of those jobs are of poor quality, and there are people and places that have been left behind, often where factories have closed. But this, as Warren correctly claimed, was more about trade than technology.

Consider, for example, the "China shock" of the 2000s, when sharply rising imports from countries with much lower-paid labor than ours drove up the U.S. trade deficit by 2.4 percentage points of GDP (almost $520 billion in today's economy). From 2000 to 2007 (before the Great Recession), the country lost 3.4 million manufacturing jobs, or 20 percent of the total.

Addressing that loss, Susan Houseman, an economist who has done exhaustive, evidence-based analysis debunking the automation explanation, argues that "intuitively and quite simply, there doesn't seem to have been a technology shock that could have caused a 20 to 30 percent decline in manufacturing employment in the space of a decade." What really happened in those years was that policymakers sat by while millions of U.S. factory workers and their communities were exposed to global competition with no plan for transition or adjustment to the shock, decimating parts of Ohio, Michigan and Pennsylvania. That was the fault of the policymakers, not the robots.

Before the China shock, from 1970 to 2000, the number (not the share) of manufacturing jobs held remarkably steady at around 17 million. Conversely, since 2010 and post-China shock, the trade deficit has stabilized and manufacturing has been adding jobs at a modest pace. (Most recently, the trade war has significantly dented the sector and worsened the trade deficit.) Over these periods, productivity, automation and robotics all grew apace.

In other words, automation isn't the problem. We need to look elsewhere to craft a progressive jobs agenda that focuses on the real needs of working people.

First and foremost, the low unemployment rate -- which wouldn't prevail if the automation story were true -- is giving workers at the middle and the bottom a bit more of the bargaining power they require to achieve real wage gains. The median weekly wage has risen at an annual average rate, after adjusting for inflation, of 1.5 percent over the past four years. For workers at the bottom end of the wage ladder (the 10th percentile), it has risen 2.8 percent annually, boosted also by minimum wage increases in many states and cities.

To be clear, these are not outsize wage gains, and they certainly are not sufficient to reverse four decades of wage stagnation and rising inequality. But they are evidence that current technologies are not preventing us from running hotter-for-longer labor markets with the capacity to generate more broadly shared prosperity.

National minimum wage hikes will further boost incomes at the bottom. Stronger labor unions will help ensure that workers get a fairer share of productivity gains. Still, many toiling in low-wage jobs, even with recent gains, will still be hard-pressed to afford child care, health care, college tuition and adequate housing without significant government subsidies.

Contrary to those hawking the automation story, faster productivity growth -- by boosting growth and pretax national income -- would make it easier to meet these challenges. The problem isn't and never was automation. Working with better technology to produce more efficiently, not to mention more sustainably, is something we should obviously welcome.

The thing to fear isn't productivity growth. It's false narratives and bad economic policy.

Paine -> anne... , October 27, 2019 at 06:54 AM
The domestic manufacturing sector and employment both shrank because of the net off-shoring of formerly domestic production

Simple fact


The net job losses are not evenly distributed. Nor are the jobs lost overseas primarily low-wage-rate jobs

Okay, so we need special federal actions in areas with high concentrations of off-shoring-induced job losses

But more easily we can simply raise service sector wages by heating up demand

Caution

Two sectors need controls, however: health and housing. Otherwise wage gains will be drained by rent-sucking operations in these two sectors

Mr. Bill -> Paine... , October 28, 2019 at 02:21 PM
It is easy to spot the ignorance of those that have enough. Comfort reprises a certain arrogance.

The aura of deservedness is palpable. There are those here that would be excommunicated by society when the troubles come to their town.

[Oct 27, 2019] by Michael Olenick

Notable quotes:
"... "Too much automation is really all about narrowing the choices in your life and making it cheaper instead of enabling a richer lifestyle." Many times the only way to automate the creation of a product is to change it to fit the machine. ..."
"... You've gotta' get out of Paris: great French bread remains awesome. I live here. I've lived here for over half a decade and know many elderly French. The bread, from the right bakeries, remains great. ..."
"... I agree with others here who distinguish between labor saving automation and labor eliminating automation, but I don't think the former per se is the problem as much as the gradual shift toward the mentality and "rightness" of mass production and globalization. ..."
"... When I started doing robotics, I developed a working definition of a robot as: (a.) Senses its environment; (b.) Has goals and goal-seeking logic; (c.) Has means to affect environment in order to get goal and reality (the environment) to converge. Under that definition, Amazon's Alexa and your household air conditioning and heating system both qualify as "robot". ..."
"... The addition of a computer (with a program, or even downloadable-on-the-fly programs) to a static machine, e.g. today's computer-controlled-manufacturing machines (lathes, milling, welding, plasma cutters, etc.) makes a massive change in utility. It's almost the same physically, but ever so much more flexible, useful, and more profitable to own/operate. ..."
"... And if you add massive databases, internet connectivity, the latest machine-learning, language and image processing and some nefarious intent, then you get into trouble. ..."
Oct 25, 2019 | www.nakedcapitalism.com

By Michael Olenick, a research fellow at INSEAD who writes regularly at Olen on Economics and Innowiki . Originally published at Innowiki

Part I, "Automation Armageddon: a Legitimate Worry?", reviewed the history of automation, focusing on projections of gloom-and-doom.

"It smells like death," is how a friend of mine described a nearby chain grocery store. He tends to exaggerate and visiting France admittedly brings about strong feelings of passion. Anyway, the only reason we go there is for things like foil or plastic bags that aren't available at any of the smaller stores.

Before getting to why that matters – and, yes, it does matter – first a tasty digression.

I live in a French village. To the French, high-quality food is a vital component of a good life.

My daughter counts eight independent bakeries on the short drive between home and school. Most are owned by a couple of people. Counting high-quality bakeries embedded in grocery stores would add a few more. Going out of our way more than a minute or two would more than double that number.

Typical Bakery: Bread is cooked at least twice daily

Despite so many, the bakeries seem to do well. In the half-decade I've been here, three new ones opened and none of the old ones closed. They all seem to be busy. Bakeries are normally owner operated. The busiest might employ a few people but many are mom-and-pop operations with him baking and her selling. To remain economically viable, they rely on a dance of people and robots. Flour arrives in sacks with high-quality grains milled by machines. People measure ingredients, with each bakery using slightly different recipes. A human-fed robot mixes and kneads the ingredients into the dough. Some kind of machine churns the lumps of dough into baguettes.

https://www.youtube.com/embed/O22jWIjcdaY?feature=oembed


Baguette Forming Machine: This would make a good animated GIF

The baker places the formed baguettes onto baking trays then puts them in the oven. Big ovens maintain a steady temperature while timers keep track of how long various loaves of bread have been baking. Despite the sensors, bakers make the final decision when to pull the loaves out, with some preferring a bien cuit more cooked flavor and others a softer crust. Finally, a person uses a robot in the form of a cash register to ring up transactions and processes payments, either by cash or card.

Nobody -- not the owners, workers, or customers -- thinks twice about any of this. I doubt most people realize how much automation technology is involved, or even that much of the equipment is automation tech. There would be no improvement in quality from mixing and kneading the dough by hand. There would, however, be an enormous increase in cost. The baguette forming machines churn out exactly what a person would do by hand, only faster and at a far lower cost. We take the thermostatically controlled ovens for granted. However, for anybody who has tried to cook over wood, controlling heat via airflow and fuel, thermostatically controlled ovens are clearly automation technology.

Is the cash register really a robot? James Ritty, who invented it, didn't think so; he sold the patent for cheap. The person who bought the patent built it into NCR, a seminal company laying the groundwork of the modern computer revolution.

Would these bakeries be financially viable if forced to do all this by hand? Probably not. They'd be forced to produce less output at higher cost; many would likely fail. Bread would cost more leaving less money for other purchases. Fewer jobs, less consumer spending power, and hungry bellies to boot; that doesn't sound like good public policy.

Getting back to the grocery store my friend thinks smells like death; just a few weeks ago they started using robots in a new and, to many, not especially welcome way.

As any tourist knows, most stores in France are closed on Sunday afternoons, including and especially grocery stores. That's part of French labor law: grocery stores must close Sunday afternoons. Except that the chain grocery store near me announced they are opening Sunday afternoon. How? Robots, and sleight-of-hand. Grocers may not work on Sunday afternoons but guards are allowed.

Not my store but similar.

Dimanche means Sunday. Après-midi means afternoon.

I stopped in to get a feel for how the system works. Instead of grocers, the store uses security guards and self-checkout kiosks.

When you step inside, a guard reminds you there are no grocers. Nobody restocks the shelves but, presumably for half a day, it doesn't matter. On Sunday afternoons, in place of a bored-looking person wearing a store uniform and overseeing the robo-checkout kiosks sits a bored-looking person wearing a security guard uniform doing the same. There are no human-assisted checkout lanes open but this store seldom has more than one operating anyway.

I have no idea how long the French government will allow this loophole to continue. I thought it might attract yellow vest protestors or at least a cranky store worker – maybe a few locals annoyed at an ancient tradition being buried – but there was nobody complaining. There were hardly any customers, either.

The use of robots to sidestep labor law and replace people, in one of the most labor-friendly countries in the world, produced a big yawn.

Paul Krugman and Matt Stoller argue convincingly that it's the bosses, not the robots, that crush the spirits and souls of workers. Krugman calls it "automation obsession" and Stoller points out predictions of robo-Armageddon have existed for decades. The well over 100+ examples I have of major automation-tech ultimately led to more jobs, not fewer.

Jerry Yang envisions some type of forthcoming automation-induced dystopia. Zuck and the tech-bros argue for a forthcoming Star Trek style robo-utopia.

My guess is we're heading for something in-between, a place where artisanal bakers use locally grown wheat, made affordable thanks to machine milling. Where small family-owned bakeries rely on automation tech to do the undifferentiated grunt-work. The robots in my future are more likely to look more like cash registers and less like Terminators.

It's an admittedly blander vision of the future; neither utopian nor dystopian, at least not one fueled by automation tech. However, it's a vision supported by the historic adoption of automation technology.


The Rev Kev , October 25, 2019 at 10:46 am

I have no real disagreement with a lot of automation. But how it is done is another matter altogether. Using the main example in this article, Australia is probably like a lot of countries in that most of the bread loaves you get in a supermarket are typically bland and come in plastic bags, but they are cheap. You only really know what you grow up with.

When I first went to Germany I stepped into a Bakerie and it was a revelation. There were dozens of different sorts and types of bread on display with flavours that I had never experienced. I didn't know whether to order a loaf or to go for my camera instead. And that is the point. Too much automation is really all about narrowing the choices in your life and making it cheaper instead of enabling a richer lifestyle.

We are all familiar with crapification and I contend that it is automation that enables this to become a thing.

WobblyTelomeres , October 25, 2019 at 11:08 am

"I contend that it is automation that enables this to become a thing."

As does electricity. And math. Automation doesn't necessarily narrow choices; economies of scale and the profit motive do. What I find annoying (as in pollyannish) is the avoidance of the issue of those that cannot operate the machinery, those that cannot open their own store, etc.

I gave a guest lecture to a roomful of young roboticists (largely undergrad, some first year grad engineering students) a decade ago. After discussing the economics/finance of creating and selling a burgerbot, asked about those that would be unemployed by the contraption. One student immediately snorted out, "Not my problem!" Another replied, "But what if they cannot do anything else?". Again, "Not my problem!". And that is San Josie in a nutshell.

washparkhorn , October 26, 2019 at 3:25 am

A capitalist market that fails to account for the cost of a product's negative externalities is underpricing (and incentivizing more of the same). It's cheating (or sanctioned cheating due to ignorance and corruption). It is not capitalism (unless that is the only reasonable outcome of capitalism).

Tom Pfotzer , October 25, 2019 at 11:33 am

The author's vision of "appropriate tech" local enterprise supported by relatively simple automation is also my answer to the vexing question of "how do I cope with automation?"

In a recent posting here at NC, I said the way to cope with automation of your job(s) is to get good at automation. My remark caused a howl of outrage: "most people can't do automation! Your solution is unrealistic for the masses. Dismissed with prejudice!".

Thank you for that outrage, as it provides a wonderful foil for this article. The article shows a small business which learned to re-design its business processes and acquire machines that reduce costs. It's a good example of someone who "got good at automation".

Instead of being the victim of automation, these people adapted. They bought automation, took control of it, and operated it for their own benefit.

Key point: this entrepreneur is now harvesting the benefits of automation, rather than being systematically marginalized by it. Another noteworthy aspect of this article is that local-scale "appropriate" automation serves to reduce the scale advantages of the big players. The availability of small-scale machines that enable efficiencies comparable to the big guys is a huge problem. Most of the machines made for small-scale operators like this are manufactured in China, or India or Iran or Russia, Italy where industrial consolidation (scale) hasn't squashed the little players yet.

Suppose you're a grain farmer, but only have 50 acres (not 100s or 1000s like the big guys). You need a combine – that's a big machine that cuts the grain stalks and separates the grain from the stalk (threshing). This cut/thresh function is terribly labor intensive, so the combine is a must-have. Right now, there is no small-size ($50K or less) combine manufactured in the U.S., to my knowledge. They cost upwards of $200K, and sometimes a great deal more. The 50-acre farmer can't afford $200K (plus maintenance costs), and therefore can't farm at that scale, and has to sell out.

So, the design, production, and sales of these sort of small-scale, high-productivity machines is what is needed to re-distribute production (organically, not by revolution, thanks) back into the hands of the middle class.

If we make it possible for the middle class to capture the benefits of automation, we solve 1) the social dilemma of concentrated wealth and 2) the declining standard of living of the middle and lower classes, and 3) we get a chance to re-design an economy (business processes and collaborating suppliers delivering end-user products and services) that actually fixes the planet as we make our living, instead of degrading it at every ka-ching of the cash register.

Point 3 is the most important, and this isn't the time or place to expand on that, but I hope others might consider it a bit.

marcel , October 25, 2019 at 12:07 pm

Regarding the combine, I have seen them operating on small-sized lands for the last 50 years. Without exception, you have one guy (sometimes a farmer, often not) who has this kind of harvester, works 24h a day for a week or something, harvesting for all farmers in the neighborhood, and then moves to the next crop (eg corn). Wintertime is used for maintenance. So that one person/farm/company specializes in these services, and everybody gets along well.

Tom Pfotzer , October 25, 2019 at 2:49 pm

Marcel – great solution to the problem. Choosing the right supplier (using combine service instead of buying a dedicated combine) is a great skill to develop. On the flip side, the fellow that provides that combine service probably makes a decent side-income from it. Choosing the right service to provide is another good skill to develop.

Jesper , October 25, 2019 at 5:59 pm

One counter-argument might be that while hoping for the best it might be prudent to prepare for the worst. Currently, and for a couple of decades, the efficiency gains have been left to the market to allocate. Some might argue that for the common good then the government might need to be more active.

What would happen if efficiency gains continued to be distributed according to the market? According to the relative bargaining power of the market participants where one side, the public good as represented by government, is asking for and therefore getting almost nothing?

As is, I do believe that people who are concerned do have reason to be concerned.

Kent , October 25, 2019 at 11:33 am

"Too much automation is really all about narrowing the choices in your life and making it cheaper instead of enabling a richer lifestyle." Many times the only way to automate the creation of a product is to change it to fit the machine.

Brooklin Bridge , October 25, 2019 at 12:02 pm

Some people make a living saying these sorts of things about automation. The quality of French bread is simply not what it used to be (or at least it is harder to find), though that is a complicated subject having to do with flour and wheat as well as human preparation and many other things; and the cost (in terms of purchasing power), in my opinion, has gone up, not down, since the 70's.

As some might say, "It's complicated," but automation does (not sure about "has to") come with trade offs in quality while price remains closer to what an ever more sophisticated set of algorithms say can be "gotten away with."

This may be totally different for cars or other things, but the author chose French bread and the only overall improvement, or even non change, in quality there has come, if at all, from the dark art of marketing magicians.

Brooklin Bridge , October 25, 2019 at 12:11 pm

/ from the dark art of marketing magicians, AND people's innate ability to accept/be unaware of decreases in quality/quantity if they are implemented over time in small enough steps.

Michael , October 25, 2019 at 1:47 pm

You've gotta' get out of Paris: great French bread remains awesome. I live here. I've lived here for over half a decade and know many elderly French. The bread, from the right bakeries, remains great. But you're unlikely to find it where tourists might wander: the rent is too high.

As a general rule, if the bakers have a large staff or speak English you're probably in the wrong bakery. Except for one of my favorites where she learned her English watching every episode of Friends multiple times and likes to practice with me, though that's more of a fluke.

Brooklin Bridge , October 25, 2019 at 3:11 pm

It's a difficult subject to argue. I suspect that comparatively speaking, French bread remains good and there are still bakers who make high quality bread (given what they have to work with). My experience when talking to family in France (not Paris) is that indeed, they are in general quite happy with the quality of bread and each seems to know a bakery where they can still get that "je ne sais quoi" that makes it so special.

I, on the other hand, who have only been there once every few years since the 70's, kind of like once every so many frames of the movie, see a lowering of quality in general in France and of flour and bread in particular though I'll grant it's quite gradual.

The French love food and were among the best farmers in the world in the 1930s and have made a point of resisting radical change at any given point in time when it comes to the things they love (wine, cheese, bread, etc.) , so they have a long way to fall, and are doing so slowly; but gradually, it's happening.

I agree with others here who distinguish between labor saving automation and labor eliminating automation, but I don't think the former per se is the problem as much as the gradual shift toward the mentality and "rightness" of mass production and globalization.

Oregoncharles , October 26, 2019 at 12:58 am

I was exposed to that conflict, in a small way, because my father was an investment manager. He told me they were considering investing in a smallish Swiss pasta (IIRC) factory. He was frustrated with the negotiations; the owners just weren't interested in getting a lot bigger – which would be the point of the investment, from the investors' POV.

I thought, but I don't think I said very articulately, that of course, they thought of themselves as craftspeople – making people's food, after all. It was a fundamental culture clash. All that was 50 years ago; looks like the European attitude has been receding.

Incidentally, this is a possible approach to a better, more sustainable economy: substitute craft for capital and resources, on as large a scale as possible. More value with less consumption. But how we get there from here is another question.

Carolinian , October 25, 2019 at 12:42 pm

I have been touring around by car and was surprised to see that all Oregon gas stations are full serve with no self serve allowed (I vaguely remember Oregon Charles talking about this). It applies to every station including the ones with a couple of dozen pumps like we see back east. I have since been told that this system has been in place for years.

It's hard to see how this is more efficient and in fact just the opposite as there are fewer attendants than waiting customers and at a couple of stations the action seemed chaotic. Gas is also more expensive although nothing could be more expensive than California gas (over $5/gal occasionally spotted). It's also unclear how this system was preserved–perhaps out of fire safety concerns–but it seems unlikely that any other state will want to imitate just as those bakeries aren't going to bring back their wood fired ovens.

JohnnyGL , October 25, 2019 at 1:40 pm

I think NJ is still required to do all full-serve gas stations. Most in MA have only self-serve, but there's a few towns that have by-laws requiring full-serve.

Brooklin Bridge , October 25, 2019 at 2:16 pm

I'm not sure just how much I should be jumping up and down about our ability to get more gasoline into our cars quicker. But convenient for sure.

The Observer , October 25, 2019 at 4:33 pm

In the 1980s when self-serve gas started being implemented, NIOSH scientists said oh no, now 'everyone' will be increasingly exposed to benzene while filling up. Benzene is close to various radioactive elements in causing damage and cancer.

Oregoncharles , October 26, 2019 at 1:06 am

It was preserved by a series of referenda; turns out it's a 3rd rail here, like the sales tax. The motive was explicitly to preserve entry-level jobs while allowing drivers to keep the gas off their hands. And we like the more personal quality.

Also, we go to states that allow self-serve and observe that the gas isn't any cheaper. It's mainly the tax that sets the price, and location.

There are several bakeries in this area with wood-fired ovens. They charge a premium, of course. One we love is way out in the country, in Falls City. It's a reason to go there.

shinola , October 25, 2019 at 12:47 pm

Unless I misunderstood, the author of this article seems to equate mechanization/automation of nearly any type with robotics.

"Is the cash register really a robot? James Ritty, who invented it, didn't think so;" – Nor do I.

To me, "robot" implies a machine with a high degree of autonomy. Would the author consider an old fashioned manual typewriter or adding machine (remember those?) to be robotic? How about when those machines became electrified?

I think the author uses the term "robot" over broadly.

Dan , October 25, 2019 at 1:05 pm

Agree. Those are just electrified extensions of the lever or sand timer.
It's the "thinking" that is A.I.

Refuse to allow A.I.to destroy jobs and cheapen our standard of living.
Never interact with a robo call, just hang up.
Never log into a website when there is a human alternative.
Refuse to do business with companies that have no human alternative.
Never join a medical "portal" of any kind, demand to talk to medical personnel.
etc.

Sabotage A.I. whenever possible.
The Ten Commandments do not apply to corporations.

https://medium.com/@TerranceT/im-never-going-to-stop-stealing-from-the-self-checkout-22cbfff9919b

marieann , October 25, 2019 at 1:49 pm

I don't use self checkouts but sometimes I will allow a cashier to use one for me while I am supposedly learning how to work the machine.

Sancho Panza , October 25, 2019 at 1:52 pm

During a Chicago hotel stay my wife ordered an extra bath towel from the front desk. About 5 minutes later, a mini version of R2D2 rolled up to her door with towel in tow. It was really cute and interacted with her in a human-like way. Cute but really scary in the way that you indicate in your comment. It seems many low wage activities would be in immediate risk of replacement. But sabotage? I would never encourage sabotage; in fact, when it comes to true robots like this one, I would highly discourage any of the following: yanking its recharge cord in the middle of the night, zapping it with a car battery, lift its payload and replace with something else, give it a hip high-five to help it calibrate its balance, and of course, the good old kick'm in the bolts.

Sancho Panza , October 26, 2019 at 9:53 am

Here's a clip of that robot, Leo, bringing bottled water and a bath towel to my wife.

Sancho Panza , October 26, 2019 at 10:49 pm

https://www.youtube.com/watch?v=TXygNznHSs0

Barbara , October 26, 2019 at 11:48 am

Stop and Shop supermarket chain now has robots in the store. According to Stop and Shop they are oh so innocent! and friendly! why don't you just go up and say hello?
All the robots do, they say, is go around scanning the shelves, looking for shelf price tags that don't match the current price and for merchandise in the wrong place (that cereal box you picked up in the breakfast aisle and decided, in the laundry aisle, that you didn't want, and put on a shelf with detergent). All the robots do is notify management of wrong prices and misplaced merchandise.

The damn robot is cute, perky lit up eyes and a smile – so why does it remind me of the Stepford Wives.

S&S is the closest supermarket near me, so I go there when I need something in a hurry, but the bulk of my shopping is now done elsewhere. Thank goodness there are some stores that are not doing this: the area Shoprites and FoodTowns don't, and they are all run by family businesses. Shoprite succeeds by having a large assortment of brands in every grocery category and keeping prices really competitive. FoodTown operates at a higher price and quality level, with real butcher and seafood counters as well as prepackaged assortments in open cases, and a cooked-food counter of the most excellent quality with the store's cooks behind the counter to serve you and answer questions. You never have to come home from work tired and hungry, knowing that you just don't want to cook, and settle for a power bar.

Danny , October 26, 2019 at 9:23 pm

OK, so how do you sabotage the cute SS robot? I suggest a laser pointer to blind its sensors. Or, maybe smear some peanut butter on them. What happens when it runs over your foot that just happens to get in its way? Contingency tort lawsuit?

The more automation you see in a business that you still patronize, the bigger the discount you should ask for. The Ten Commandments do not apply to corporations or job destroyers.

Barbara , October 26, 2019 at 11:30 pm

My husband recently retired from teaching. I'll have to see if he still has his laser pointer or if he had to give it back :-)

Carolinian , October 25, 2019 at 1:11 pm

A robot is a machine -- especially one programmable by a computer -- capable of carrying out a complex series of actions automatically. Robots can be guided by an external control device or the control may be embedded

https://en.wikipedia.org/wiki/Robot

Those early cash registers were perhaps an early form of analog computer. But Wiki reminds that the origin of the term is a work of fiction.

The term comes from a Czech word, robota, meaning "forced labor"; the word 'robot' was first used to denote a fictional humanoid in a 1920 play R.U.R. (Rossumovi Univerzální Roboti – Rossum's Universal Robots) by the Czech writer, Karel Čapek

Michael , October 25, 2019 at 1:42 pm

The adding machine – yes, definitely.

The typewriter – maybe. Before the typewriter printed type had to be typeset by hand, a laborious and expensive process.

The French call food processors "robots." Industrial robots have no autonomy, much less a high degree. They repeatedly perform a task they're programmed to do but they're robots.

The idea that a robot has some type of autonomy, that it thinks, makes for good science fiction. But by that definition robots don't exist because there are no thinking machines with a high level of autonomy outside the sci-fi genre. Except that they do exist and they're everywhere.

Math is Your Friend , October 25, 2019 at 3:15 pm

"The typewriter – maybe. Before the typewriter printed type had to be typeset by hand, a laborious and expensive process."

This is really not the appropriate comparison.

The typewriter replaced pen and ink.

Hand assembly of type, a character at a time, was replaced by linotype machines in the late 1800s, which were in turn replaced by phototypesetting in the second half of the 20th century, in turn replaced by computerized typesetting.

Where you go, and if you go, after that, depends on use cases – laser printers, e-books, web pages, videos and probably a few other methods of delivering information.

shinola , October 25, 2019 at 4:26 pm

Perhaps I didn't qualify "autonomous" properly. I didn't mean to imply a 'Rosie the Robot' level of autonomy but the ability of a machine to perform its programmed task without human intervention (other than switching on/off or maintenance & adjustments).

If viewed this way, an adding machine or typewriter are not robots because they require constant manual input in order to function – if you don't push the keys, nothing happens. A computer printer might be considered robotic because it can be programmed to function somewhat autonomously (as in print 'x' number of copies of this document).

"Robotics" is a subset of mechanized/automated functions.

Michael , October 26, 2019 at 7:56 am

Adding machines are robots under your definition. You input what you want them to calculate but the actual calculations are done by the machine (which is the point of having the machine in the first place). You're touching the machine a lot more because it needs you to input more to do its thing but the calculation part is automatic. Typewriters, now that I think about it, really aren't: they don't automate anything.

The difference is a machine that extends human abilities with no automation (ex: a shovel) vs a machine that mimics human abilities (ex: a calculator or bread mixing machine).

The next level would be an autonomous machine but, depending on the definition of autonomy, I think those are a long way off. For example, a self-driving car – once perfected – can handle all sorts of different conditions but still can't really think. It comes down to the definition of autonomy, or maybe it's more accurate to say the degree of autonomy.

Brooklin Bridge , October 25, 2019 at 2:30 pm

I think the author confuses automation -in its most general sense- with progress in a down home, folksy, "let's be real" sort of way. The two are not always one and the same for all people and certainly not so for all other forms of life on the planet. At the end, he does at least make a passing gesture to automation in its more sinister form than that of helping artisans avoid drudgery.

Stephen Gardner , October 25, 2019 at 4:48 pm

When I first got out of grad school I worked at United Technologies Research Center, in the robotics lab. In general, at least in those days, we made a distinction between robotics and hard automation. A robot is programmable to do multiple tasks, while hard automation is limited to a single task unless retooled. The machines the author is talking about are hard automation. We had ASEA robots that could be programmed to do various things. One of ours drilled, riveted and sealed the skin on the horizontal stabilators (the wing on the tail of a helicopter that controls pitch) of a Sikorsky Sea Hawk. The same robot, with just a change of the fixture on the end, could be programmed to paint a car or weld a seam on equipment. The drilling and riveting robot was capable of modifying where the rivets were placed (in the robot's frame of reference) based on the location of precisely milled blocks built into the fixture that held the stabilator. There was always some variation, and it was important to place the rivets precisely because the spars were very narrow (weight at the tail is bad because of the lever arm). It was considered state of the art back in the day, but now auto companies have far more sophisticated robotics.

Michael , October 26, 2019 at 8:03 am

By that definition aren't mixers still robots? You can put in a whisk and they'll mix one way. Put in a bread hook, set the right setting, and it will knead dough – just like the helicopter building robots. Same with cash registers: press one button and they add money, another and they calculate change, a third and they'll do a return, a fourth and they'll print a total. Though, despite multiple functions, you can't reprogram the old mechanical ones (of course, the newer ones are computers running a program). A baguette making machine seems like what you're describing as hard automation: it has one and only one function.

Alex Cox , October 25, 2019 at 12:55 pm

The Oregon gas attendant rule is a job-creation scheme. It works well, and very rarely is there an annoying wait.

Gas in Oregon is considerably cheaper than in California.

A few years ago my wife told me I had to go out and get a real job. I realized I was only qualified to do two things: teach, or pump gas.

Thank you Oregon for giving me a choice!

Carolinian , October 25, 2019 at 1:17 pm

Gas in Oregon is considerably cheaper than in California.

And considerably more expensive than in low tax South Carolina (2.19/gal a recent example).

Socal Rhino , October 25, 2019 at 1:44 pm

But what happens when the bread machine is connected to the internet, can't function without an active internet connection, and requires an annual subscription to use?

That is the issue to me: however we define the tools, who will own them?

The Rev Kev , October 25, 2019 at 6:53 pm

You know, that is quite a good point. It is not so much the automation that is the threat as the rent-seeking that anything connected to the internet allows to be implemented.

*_* , October 25, 2019 at 2:28 pm

Until 100 petaflops costs less than a typical human worker total automation isn't going to happen. Developments in AI software can't overcome basic hardware limits.

breadbaker , October 25, 2019 at 2:29 pm

The story about automation not worsening the quality of bread is not exactly true. Bakers had to develop and incorporate a new method called autolyze ( https://www.kingarthurflour.com/blog/2017/09/29/using-the-autolyse-method ) in the mid-20th-century to bring back some of the flavor lost with modern baking. There is also a trend of a new generation of bakeries that use natural yeast, hand shaping and kneading to get better flavors and quality bread.

But it is certainly true that much of the automation gives almost as good quality for much lower labor costs.

Tom Pfotzer , October 25, 2019 at 3:05 pm

On the subject of the machine-robot continuum

When I started doing robotics, I developed a working definition of a robot as: (a.) Senses its environment; (b.) Has goals and goal-seeking logic; (c.) Has means to affect environment in order to get goal and reality (the environment) to converge. Under that definition, Amazon's Alexa and your household air conditioning and heating system both qualify as "robot".

How you implement a, b, and c above can have more or less sophistication, depending upon the complexity, variability, etc. of the environment, or the solutions, or the means used to affect the environment.

A machine, like a typewriter, or a lawn-mower engine has the logic expressed in metal; it's static.

The addition of a computer (with a program, or even downloadable-on-the-fly programs) to a static machine, e.g. today's computer-controlled-manufacturing machines (lathes, milling, welding, plasma cutters, etc.) makes a massive change in utility. It's almost the same physically, but ever so much more flexible, useful, and more profitable to own/operate.

And if you add massive databases, internet connectivity, the latest machine-learning, language and image processing and some nefarious intent, then you get into trouble.

:)

Phacops , October 25, 2019 at 3:08 pm

Sometimes automation is necessary to eliminate the risks of manual processes. There are parenteral (injectable) drugs that cannot be sterilized except by filtration. Most of the work of filling, post filling processing, and sealing is done using automation in areas that make surgical suites seem filthy and people are kept from these operations.

Manual operations are only undertaken to correct issues with the automation, and the procedures are tested to ensure that they do not introduce contamination, microbial or otherwise. Because even one non-sterile unit is a failure and testing is a destructive process, a full lot of product cannot, of course, be tested to state that all units are sterile. Periodic testing of the automated process and of manual interventions is done instead, and it is expensive and time consuming to test to a level of confidence that there is far less than a one-in-a-million chance of any unit in a lot being non-sterile.

In that respect, automation and the skills necessary to interface with it are fundamental to the safety of drugs frequently used on already compromised patients.

Brooklin Bridge , October 25, 2019 at 3:27 pm

Agree. Good example. Digital technology and miniaturization seem particularly well suited to many aspects of the medical world. But I doubt they will eliminate the doctor or the nurse very soon. Insurance companies, on the other hand...

lyman alpha blob , October 25, 2019 at 8:34 pm

Bill Burr has some thoughts on self checkouts and the potential bonanza for shoppers – https://www.youtube.com/watch?v=FxINJzqzn4w

TG , October 26, 2019 at 11:51 am

"There would be no improvement in quality mixing and kneading the dough by hand. There would, however, be an enormous increase in cost." WRONG! If you had an unlimited supply of 50-cents-an-hour disposable labor, mixing and kneading the dough by hand would be cheaper. It is only because labor is expensive in France that the machine saves money.

In Japan there is a lot of automation, and wages and living standards are high. In Bangladesh there is very little automation, and wages and livings standards are very low.

Are we done with the 'automation is destroying jobs' meme yet? Excessive population growth is the problem, not robots. And the root cause of excessive population growth is the corporate-sponsored virtual taboo of talking about it seriously.

[Oct 25, 2019] Get inode number of a file on linux - Fibrevillage

Oct 25, 2019 | www.fibrevillage.com

Get inode number of a file on linux

An inode is a data structure in UNIX operating systems that contains important information pertaining to files within a file system. When a file system is created in UNIX, a set amount of inodes is created, as well. Usually, about 1 percent of the total file system disk space is allocated to the inode table.
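A quick way to see how many inodes a filesystem was created with, and how many are currently in use, is df -i (the device name and numbers below are illustrative, not from any particular system):

$ df -i /
Filesystem      Inodes  IUsed   IFree IUse% Mounted on
/dev/sda1      1310720 275016 1035704   21% /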

How do we find a file's inode ?

ls -i Command: display inode
$ls -i /etc/bashrc
131094 /etc/bashrc
131094 is the inode of /etc/bashrc.
Stat Command: display Inode
$stat /etc/bashrc
  File: `/etc/bashrc'
  Size: 1386          Blocks: 8          IO Block: 4096   regular file
Device: fd00h/64768d    Inode: 131094      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2013-12-10 10:01:29.509908811 -0800
Modify: 2013-06-06 11:31:51.792356252 -0700
Change: 2013-06-06 11:31:51.792356252 -0700
find command: display inode
$find ./ -iname sysfs_fc_tools.tar -printf '%p %i\n'
./sysfs_fc_tools.tar 28311964

Notes :

    %p stands for file path
    %i stands for inode number
tree command: display inode under a directory
#tree -a -L 1 --inodes /etc
/etc
├── [ 132896]  a2ps
├── [ 132898]  a2ps.cfg
├── [ 132897]  a2ps-site.cfg
├── [ 133315]  acpi
├── [ 131864]  adjtime
├── [ 132340]  akonadi
...
usecase of using inode
find / -inum XXXXXX -print to find the full path for each file pointing to inode XXXXXX.

You can combine this with an rm action as well, but I discourage doing so: passing find results straight to rm is risky, and on a different filesystem the same inode number refers to a completely different file.
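As a small sketch of that use case (assuming GNU coreutils), you can print just the inode number with stat and then list every name on the same filesystem that links to it; -xdev keeps find from crossing into other filesystems, where the same inode number would belong to a different file:

$ stat -c '%i' /etc/bashrc
131094
$ find /etc -xdev -inum 131094
/etc/bashrc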

filesystem repair

If you are unlucky and your filesystem gets corrupted, most of the time running fsck will fix it. It helps if you have the filesystem's inode information at hand. That is another big topic; I'll cover it in a separate article.

[Oct 25, 2019] Howto Delete files by inode number by Erik

Feb 10, 2011 | erikimh.com
linux administration - tips, notes and projects


Ever mistakenly piped output to a file with special characters in its name that you then couldn't remove?

-rw-r--r-- 1 eriks eriks 4 2011-02-10 22:37 --fooface

Good luck. Anytime you pass this filename to a command, it's going to be interpreted as a flag. You can't fool rm, echo, sed, or anything else into actually treating it as a file at this point. You do, however, have an inode for every file.

Traditional methods fail:

[eriks@jaded: ~]$ rm -f --fooface
rm: unrecognized option '--fooface'
Try `rm ./--fooface' to remove the file `--fooface'.
Try `rm --help' for more information.
[eriks@jaded: ~]$ rm -f '--fooface'
rm: unrecognized option '--fooface'
Try `rm ./--fooface' to remove the file `--fooface'.
Try `rm --help' for more information.

So now what, do you live forever with this annoyance of a file sitting inside your filesystem, never to be removed or touched again? Nah.

We can remove a file, simply by an inode number, but first we must find out the file inode number:

$ ls -il | grep foo

Output:

[eriks@jaded: ~]$ ls -il | grep foo
508160 drwxr-xr-x 3 eriks eriks 4096 2010-10-27 18:13 foo3
500724 -rw-r--r-- 1 eriks eriks 4 2011-02-10 22:37 --fooface
589907 drwxr-xr-x 2 eriks eriks 4096 2010-11-22 18:52 tempfoo
589905 drwxr-xr-x 2 eriks eriks 4096 2010-11-22 18:48 tmpfoo

The number you see prior to the file permission set is actually the inode # of the file itself.

Hint: 500724 is the inode number of the file we want removed.

Now use find command to delete file by inode:

# find . -inum 500724 -exec rm -i {} \;

There she is.

[eriks@jaded: ~]$ find . -inum 500724 -exec rm -i {} \;
rm: remove regular file `./--fooface'? y
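For the record, a file whose name starts with a dash can usually also be removed without going through the inode at all, either by telling rm that option parsing is over or by giving a path that does not begin with a dash:

$ rm -- --fooface
$ rm ./--fooface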

[Oct 25, 2019] unix - Remove a file on Linux using the inode number - Super User

Oct 25, 2019 | superuser.com


Some other methods include:

escaping the special chars:

[~]$rm \"la\*

use the find command and only search the current directory. The find command can search for inode numbers, and has a handy -delete switch:

[~]$ls -i
7404301 "la*

[~]$find . -maxdepth 1 -type f -inum 7404301
./"la*

[~]$find . -maxdepth 1 -type f -inum 7404301 -delete
[~]$ls -i
[~]$


Maybe I'm missing something, but...
rm '"la*'

Anyways, filenames don't have inodes, files do. Trying to remove a file without removing all filenames that point to it will damage your filesystem.

[Oct 25, 2019] Linux - Unix Find Inode Of a File Command

Jun 21, 2012 | www.cyberciti.biz
... ... ..

stat Command: Display Inode

You can also use the stat command as follows:
$ stat fileName-Here
$ stat /etc/passwd

Sample outputs:

  File: `/etc/passwd'
  Size: 1644            Blocks: 8          IO Block: 4096   regular file
Device: fe01h/65025d    Inode: 25766495    Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2012-05-05 16:29:42.000000000 +0530
Modify: 2012-05-05 16:29:20.000000000 +0530
Change: 2012-05-05 16:29:21.000000000 +0530


Posted by: Vivek Gite

The author is the creator of nixCraft and a seasoned sysadmin, DevOps engineer, and a trainer for the Linux operating system/Unix shell scripting.

[Oct 23, 2019] How To Record Everything You Do In Terminal - OSTechNix

Oct 23, 2019 | www.ostechnix.com

Run the following command to start the Terminal session recording.

$ script -a my_terminal_activities

Here, the -a flag is used to append the output to the file (the typescript), retaining any prior contents. The above command records everything you do in the Terminal and appends the output to a file called 'my_terminal_activities' in your current working directory.

Sample output would be:

Script started, file is my_terminal_activities

Now, run some random Linux commands in your Terminal.

$ mkdir ostechnix
$ cd ostechnix/
$ touch hello_world.txt
$ cd ..
$ uname -r

After running all commands, end the 'script' command's session using command:

$ exit

After typing exit, you will see the following output.

exit
Script done, file is my_terminal_activities

Record Everything You Do In Terminal Using Script Command In Linux

As you can see, the Terminal activities have been stored in a file called 'my_terminal_activities' in the current working directory.

You can also save the Terminal activities to a file in a different location, like below.

$ script -a /home/ostechnix/documents/myscripts.txt

All commands will be stored in /home/ostechnix/documents/myscripts.txt file.

To view your Terminal activities, just open this file in any text editor or simply display it using the 'cat' command.

$ cat my_terminal_activities

Sample output:

Script started on 2019-10-22 12:07:37+0530
sk@ostechnix:~$ mkdir ostechnix
sk@ostechnix:~$ cd ostechnix/
sk@ostechnix:~/ostechnix$ touch hello_world.txt
sk@ostechnix:~/ostechnix$ cd ..
sk@ostechnix:~$ uname -r
5.0.0-31-generic
sk@ostechnix:~$ exit
exit

Script done on 2019-10-22 12:08:10+0530

View Terminal Activities

As you can see in the above output, the script command has recorded all my Terminal activities, including the start and end times of the session. Awesome, isn't it? The reason to use the script command is that it records not just the commands, but also their output. To put it simply, the script command will record everything you do in the Terminal.

Bonus tip:

As one of our readers, Mr. Alastair Montgomery, mentioned in the comment section, we could set up an alias which would timestamp the recorded sessions.

Create an alias for the script command like below.

$ alias rec='script -aq ~/term.log-$(date "+%Y%m%d-%H-%M")'

Now simply enter the following command to start recording the Terminal.

$ rec

Now all your Terminal activities will be logged in a text file with a timestamp in its name, for example term.log-20191022-12-16.

Record Terminal activities with timestamps
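The script command can also capture timing data, so a session can be replayed later at its original speed with scriptreplay. A minimal sketch with the util-linux versions of the tools (older releases spell the option as -t and write the timing data to stderr):

$ script --timing=session.tm session.log
$ ls -l /etc
$ exit
$ scriptreplay --timing=session.tm session.log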



[Oct 23, 2019] Apply Tags To Linux Commands To Easily Retrieve Them From History

Oct 23, 2019 | www.ostechnix.com

Let us take the following one-liner Linux command as an example.

$ find . -size +10M -type f -print0 | xargs -0 ls -Ssh | sort -z

For those wondering, the above command will find and list files bigger than 10 MB in the current directory and sort them by size. I admit that I couldn't remember this command. I guess some of you can't remember it either. This is why we are going to apply a tag to commands like this.

To apply a tag, just type the command and add the comment ( i.e. tag) at the end of the command as shown below.

$ find . -size +10M -type f -print0 | xargs -0 ls -Ssh | sort -z #ListFilesBiggerThanXSize

Here, #ListFilesBiggerThanXSize is the tag name for the above command. Make sure you leave a space between the command and the tag name. Also, keep the tag name as simple, short and clear as possible so that it is easy to remember later. Otherwise, you may need another tool to recall the tags.

To run it again, simply use the tag name like below.

$ !? #ListFilesBiggerThanXSize

Here, the ! (Exclamation mark) and ? (Question mark) operators are used to fetch and run the command which we tagged earlier from the BASH history.
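If the history expansion syntax is hard to remember, the same tag can also be retrieved with an ordinary history search (or interactively with Ctrl+R) and re-run by its history number; the number 1234 below is just a placeholder:

$ history | grep ListFilesBiggerThanXSize
 1234  find . -size +10M -type f -print0 | xargs -0 ls -Ssh | sort -z #ListFilesBiggerThanXSize
$ !1234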

[Oct 22, 2019] Bank of America Says It Saves $2 Billion Per Year By Ignoring Amazon and Microsoft and Building Its Own Cloud Instead

Oct 22, 2019 | slashdot.org

(businessinsider.com) Bank of America says it saves $2 billion per year by building its own private cloud software rather than outsourcing to companies like Amazon, Microsoft, and Google. From a report: The investment, including a $350 million charge in 2017, hasn't been cheap, but it has had a striking payoff, CEO Brian Moynihan said during the company's third-quarter earnings call. He said the decision helped reduce the firm's servers to 70,000 from 200,000 and its data centers to 23 from 60, and it has resulted in $2 billion in annual infrastructure savings.

[Oct 22, 2019] Flaw In Sudo Enables Non-Privileged Users To Run Commands As Root

Notable quotes:
"... the function which converts user id into its username incorrectly treats -1, or its unsigned equivalent 4294967295, as 0, which is always the user ID of root user. ..."
Oct 22, 2019 | linux.slashdot.org

(thehackernews.com) Posted by BeauHD on Monday October 14, 2019 @07:30PM from the Su-doh dept. exomondo shares a report from The Hacker News:

... ... ...

The vulnerability, tracked as CVE-2019-14287 and discovered by Joe Vennix of Apple Information Security, is more concerning because the sudo utility has been designed to let users use their own login password to execute commands as a different user without requiring their password.

What's more interesting is that this flaw can be exploited by an attacker to run commands as root just by specifying the user ID "-1" or "4294967295."

That's because the function which converts user id into its username incorrectly treats -1, or its unsigned equivalent 4294967295, as 0, which is always the user ID of root user.

The vulnerability affects all Sudo versions prior to the latest released version 1.8.28, which has been released today.
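As an illustration (the user name, host and command below are assumptions, not from the advisory), this is the kind of sudoers rule the flaw bypasses on sudo versions before 1.8.28:

# /etc/sudoers -- alice may run the command as any user except root
alice myhost = (ALL, !root) /usr/bin/vim

# despite the !root restriction, alice can still get the command run as root:
$ sudo -u#-1 /usr/bin/vim
$ sudo -u#4294967295 /usr/bin/vim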

• Re: Not many systems vulnerable -- mysidia (191772):

      If you have been blessed with the power to run commands as ANY user you want, then you are still specially privileged, even though you are not fully privileged.

It's a rare/unusual configuration to say (ALL, !root) --- the people using this configuration on their systems should probably KNOW there are going to exist some ways that access can be abused to ultimately circumvent the intended !root rule - if not within sudo itself, then by using sudo to get a shell as a different user UID that belongs to some person or program who DOES have root permissions, and then causing crafted code to run as that user --- for example, by installing a Trojanned version of the screen command and modifying files in the home directory of a legitimate root user to alias the screen command to the trojanned version, which will log the password the next time that other user logs in normally and uses the sudo command.

[Oct 15, 2019] Economist's View The Opportunity Cost of Computer Programming

Oct 15, 2019 | economistsview.typepad.com

From Reuters Odd News :

Man gets the poop on outsourcing , By Holly McKenna, May 2, Reuters

Computer programmer Steve Relles has the poop on what to do when your job is outsourced to India. Relles has spent the past year making his living scooping up dog droppings as the "Delmar Dog Butler." "My parents paid for me to get a (degree) in math and now I am a pooper scooper," "I can clean four to five yards in an hour if they are close together." Relles, who lost his computer programming job about three years ago ... has over 100 clients who pay $10 each for a once-a-week cleaning of their yard.

Relles competes for business with another local company called "Scoopy Do." Similar outfits have sprung up across America, including Petbutler.net, which operates in Ohio. Relles says his business is growing by word of mouth and that most of his clients are women who either don't have the time or desire to pick up the droppings. "St. Bernard (dogs) are my favorite customers since they poop in large piles which are easy to find," Relles said. "It sure beats computer programming because it's flexible, and I get to be outside,"

[Oct 09, 2019] The gzip Recovery Toolkit

Oct 09, 2019 | www.aaronrenn.com

So you thought you had your files backed up - until it came time to restore. Then you found out that you had bad sectors and you've lost almost everything because gzip craps out 10% of the way through your archive. The gzip Recovery Toolkit has a program - gzrecover - that attempts to skip over bad data in a gzip archive. This saved me from exactly the above situation. Hopefully it will help you as well.

I'm very eager for feedback on this program. If you download and try it, I'd appreciate an email letting me know what your results were. My email is arenn@urbanophile.com . Thanks.

ATTENTION

99% of "corrupted" gzip archives are caused by transferring the file via FTP in ASCII mode instead of binary mode. Please re-transfer the file in the correct mode first before attempting to recover from a file you believe is corrupted.

Disclaimer and Warning

This program is provided AS IS with absolutely NO WARRANTY. It is not guaranteed to recover anything from your file, nor is what it does recover guaranteed to be good data. The bigger your file, the more likely that something will be extracted from it. Also keep in mind that this program gets faked out and is likely to "recover" some bad data. Everything should be manually verified.

Downloading and Installing

Note that version 0.8 contains major bug fixes and improvements. See the ChangeLog for details. Upgrading is recommended. The old version is provided in the event you run into troubles with the new release.

You need the following packages:

First, build and install zlib if necessary. Next, unpack the gzrt sources. Then cd to the gzrt directory and build the gzrecover program by typing make . Install manually by copying to the directory of your choice.

Usage

Run gzrecover on a corrupted .gz file. If you leave the filename blank, gzrecover will read from the standard input. Anything that can be read from the file will be written to a file with the same name, but with a .recovered appended (any .gz is stripped). You can override this with the -o option. The default filename when reading from the standard input is "stdin.recovered". To write recovered data to the standard output, use the -p option. (Note that -p and -o cannot be used together).

To get a verbose readout of exactly where gzrecover is finding bad bytes, use the -v option to enable verbose mode. This will probably overflow your screen with text so best to redirect the stderr stream to a file. Once gzrecover has finished, you will need to manually verify any data recovered as it is quite likely that our output file is corrupt and has some garbage data in it. Note that gzrecover will take longer than regular gunzip. The more corrupt your data the longer it takes. If your archive is a tarball, read on.
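Putting those options together, a typical verbose run with the noisy stderr stream redirected to a log file looks something like this (the filename matches the example below):

$ gzrecover -v my-corrupted-backup.tar.gz 2> gzrecover.log
$ ls *.recovered
my-corrupted-backup.tar.recovered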

For tarballs, the tar program will choke because GNU tar cannot handle errors in the file format. Fortunately, GNU cpio (tested at version 2.6 or higher) handles corrupted files out of the box.

Here's an example:

$ ls *.gz
my-corrupted-backup.tar.gz
$ gzrecover my-corrupted-backup.tar.gz
$ ls *.recovered
my-corrupted-backup.tar.recovered
$ cpio -F my-corrupted-backup.tar.recovered -i -v

Note that newer versions of cpio can spew voluminous error messages to your terminal. You may want to redirect the stderr stream to /dev/null. Also, cpio might take quite a long while to run.

Copyright

The gzip Recovery Toolkit v0.8
Copyright (c) 2002-2013 Aaron M. Renn ( arenn@urbanophile.com )

The gzrecover program is licensed under the GNU General Public License .

[Oct 09, 2019] gzip - How can I recover files from a corrupted .tar.gz archive - Stack Overflow

Oct 09, 2019 | stackoverflow.com

George ,Jun 24, 2016 at 2:49

Are you sure that it is a gzip file? I would first run 'file SMS.tar.gz' to validate that.

Then I would read the The gzip Recovery Toolkit page.

JohnEye ,Oct 4, 2016 at 11:27

Recovery is possible but it depends on what caused the corruption.

If the file is just truncated, getting some partial result out is not too hard; just run

gunzip < SMS.tar.gz > SMS.tar.partial

which will give some output despite the error at the end.
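The salvaged SMS.tar.partial can then be handed to tar, which will typically extract every complete member before erroring out at the truncation point:

$ tar -tvf SMS.tar.partial    # list what survived
$ tar -xvf SMS.tar.partial    # extract it; expect an error at the truncated end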

If the compressed file has large missing blocks, it's basically hopeless after the bad block.

If the compressed file is systematically corrupted in small ways (e.g. transferring the binary file in ASCII mode, which smashes carriage returns and newlines throughout the file), it is possible to recover but requires quite a bit of custom programming, it's really only worth it if you have absolutely no other recourse (no backups) and the data is worth a lot of effort. (I have done it successfully.) I mentioned this scenario in a previous question .

The answers for .zip files differ somewhat, since zip archives have multiple separately-compressed members, so there's more hope (though most commercial tools are rather bogus, they eliminate warnings by patching CRCs, not by recovering good data). But your question was about a .tar.gz file, which is an archive with one big member.


Here is one possible scenario that we encountered. We had a tar.gz file that would not decompress; trying to unzip it gave this error:
gzip -d A.tar.gz
gzip: A.tar.gz: invalid compressed data--format violated

I figured out that the file may have been originally uploaded over a non-binary FTP connection (we don't know for sure).

The solution was relatively simple using the unix dos2unix utility

dos2unix A.tar.gz
dos2unix: converting file A.tar.gz to UNIX format ...
tar -xvf A.tar
file1.txt
file2.txt 
....etc.

It worked! This is one slim possibility, and maybe worth a try - it may help somebody out there.

[Oct 08, 2019] Forward root email on Linux server

Oct 08, 2019 | www.reddit.com

Hi, generally I configure /etc/aliases to forward root messages to my work email address. I found this useful, because sometimes I become aware of something wrong...

I create specific email filter on my MUA to put everything with "fail" in subject in my ALERT subfolder, "update" or "upgrade" in my UPGRADE subfolder, and so on.

It is a bit annoying, because with more than 50 servers there is a lot of noise, anyway.

How do you manage that?

Thank you!
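For reference, the setup described above is usually just an entry in /etc/aliases plus a rebuild of the alias database; the address below is an assumption:

# /etc/aliases
root: sysadmin-alerts@example.com

$ sudo newaliases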

[Oct 08, 2019] I swear to god we spend 60% of our time planning our sprints, and 40% of the time doing the work, and management wonders why our true productivity has fallen through the floor...

Notable quotes:
"... Scrum is dead, long live Screm! We need to implement it immediately. We must innovate and stay ahead of the curve! level 7 ..."
"... First you scream, then you ahh. Now you can screm ..."
"... Are you saying quantum synergy coupled with block chain neutral intelligence can not be used to expedite artificial intelligence amalgamation into that will metaphor into cucumber obsession? ..."
Oct 08, 2019 | www.reddit.com

MadManMorbo 58 points · 6 days ago

We recently implemented DevOps practices, Scrum, and sprints have become the norm... I swear to god we spend 60% of our time planning our sprints, and 40% of the time doing the work, and management wonders why our true productivity has fallen through the floor... level 5

Angdrambor 26 points · 6 days ago

Let me guess - they left out the retrospectives because somebody brought up how bad they were fucking it all up? level 6

lurker_lurks 15 points · 6 days ago

Scrum is dead, long live Screm! We need to implement it immediately. We must innovate and stay ahead of the curve! level 7

JustCallMeFrij 1 point · 6 days ago

First you scream, then you ahh. Now you can screm

StormlitRadiance 1 point · 5 days ago

It consists of three managers for every engineer and they all screm all day at a different quartet of three managers and an engineer. level 6

water_mizu 7 points · 6 days ago

Are you saying quantum synergy coupled with block chain neutral intelligence can not be used to expedite artificial intelligence amalgamation into that will metaphor into cucumber obsession?

malikto44 9 points · 6 days ago

I worked at a place where the standup meetings went at least 4-6 hours each day. It was amazing how little got done there. Glad I bailed.

Posted by u/bpitts2 (Rant): If I go outside of process to help you for your "urgent" issue, be cool and don't abuse the relationship.

What is it with these people? Someone brought me an "urgent" request (of course there wasn't a ticket), so I said no worries, I'll help you out. Just open a ticket for me so we can track the work and document the conversation. We got that all knocked out and everyone was happy.

So a day or two later, I suddenly get an instant message for yet another "urgent" issue. ... Ok ... Open a ticket, and I'll get it assigned to one of my team members to take a look.

And a couple days later ... he's back and I'm being asked for help troubleshooting an application that we don't own. At least there's a ticket and an email thread... but wtf man.

What the heck man?

This is like when you get a free drink or dessert from your waiter. Don't keep coming back and asking for more free pie. You know damn well you're supposed to pay for pie. Be cool. I'll help you out when you're really in a tight spot, but the more you cry "urgent", the less I care about your issues.

IT folks are constantly looked at as being dicks because we force people to follow the support process, but this is exactly why we have to make them follow the process.

Posted by u/SpicyTunaNinja: Let's talk about mental health and stress

Hey r/Sysadmin , please don't suffer in silence. I know the job can be very difficult at times, especially with competing objectives, tight (or impossible) deadlines, bad bosses and needy end users, but please - always remember that there are ways to manage that stress. Speaking to friends and family regularly to vent, getting a therapist, or taking time off.

Yes, you do have the ability to take personal leave/medical leave if its that bad. No, it doesn't matter what your colleagues or boss will think..and no, you are not a quitter, weak, or a loser if you take time for yourself - to heal mentally, physically or emotionally.

Don't let yourself get to the point that this one IT employee did at the Paris Police headquarters. Ended up taking the lives of multiple others, and ultimately losing his life. https://www.nbcnews.com/news/world/paris-policeman-kills-2-officers-injures-3-others-knife-attack-n1061861

EDIT: Holy Cow! Thanks for the silver and platinum kind strangers. All i wanted to do was to get some more awareness on this subject, and create a reminder that we all deserve happiness and peace of mind. A reminder that hopefully sticks with you for the days and weeks to come.

Work is just one component of life, and not to get so wrapped up and dedicate yourself to the detriment of your health.

Posted by u/fresh1003: By 2025 80% of enterprises will shutdown their data center and move to cloud... do you guys believe this?

Posted by u/eternalterra (Career / Job Related): The more tasks I have, the slower I become

Good morning,

We, sysadmins, have times when we don't really have anything to do but maintenance. BUT, there are times when it seems like chaos comes out of nowhere. When I have a lot of tasks to do, I tend to get slower. The more tasks I have pending, the slower I become. I cannot help thinking about 3 or 4 different problems at the same time, and I can't focus! I only have 2 years of experience as a sysadmin.

Do you guys experience the same?

Cheers, 321 comments 482 Posted by u/proudcanadianeh 6 days ago General Discussion Cloudflare, Google and Firefox to add support for HTTP/3, shifting away from TCP

Per this article: https://www.techspot.com/news/82111-cloudflare-google-firefox-add-support-http3-shifting-away.html

Not going to lie, this is the first I have heard of HTTP/3. Anyone have any insight into what this shift is going to mean on the systems end? Is this a new protocol entirely?

Posted by u/_sadme_ (Career / Job Related): Leaving the IT world...

Hello everyone,

Have you ever wondered if your whole career will be related to IT stuff? I have, since my early childhood. It was more than 30 years ago - in the marvelous world of an 8-bit era. After writing my first code (10 PRINT " my_name " : 20 GOTO 10) I exactly knew what I wanted to do in the future. Now, after spending 18 years in this industry, which is half of my age, I'm not so sure about it.

I had plenty of time to do almost everything. I was writing software for over 100K users and I was covered in dust while drilling holes for ethernet cables in houses of our customers. I was a main network administrator for a small ISP and systems administrator for a large telecom operator. I made few websites and I was managing a team of technical support specialists. I was teaching people - on individual courses on how to use Linux and made some trainings for admins on how to troubleshoot multicast transmissions in their own networks. I was active in some Open Source communities, including running forums about one of Linux distributions (the forum was quite popular in my country) and I was punching endless Ctrl+C/Ctrl+V combos from Stack Overflow. I even fixed my aunt's computer!

And suddenly I realised that I don't want to do this any more. I've completely burnt out. It was like a snap of a finger.

During many years I've collected a wide range of skills that are (or will be) obsolete. I don't want to spend rest of my life maintaining a legacy code written in C or PHP or learning a new language which is currently on top and forcing myself to write in a coding style I don't really like. That's not all... If you think you'll enjoy setting up vlans on countless switches, you're probably wrong. If you think that managing clusters of virtual machines is an endless fun, you'll probably be disappointed. If you love the smell of a brand new blade server and the "click" sound it makes when you mount it into the rack, you'll probably get fed up with it. Sooner or later.

But there's a good side of having those skills. With skills come experience, knowledge and good premonition. And these features don't get old. Remember that!

My employer offered me a project manager position and I eagerly agreed to it. It means that I'm leaving the world of "hardcore IT"; I'll be doing some other, less crazy stuff. I'm logging out of my console and I'll run Excel. But I'll keep all good memories from all those years. I'd like to thank all of you for doing what you're doing, because it's really amazing. Good luck! The world lies in your hands!

Posted by u/remrinds (General Discussion): UPDATE: So our cloud Exchange server was down for 17 hours on Friday

my original post got deleted because i behaved wrongly and posted some slurs. I apologise for that.


anyway, so, my company is using Office365 ProPlus and we migrated our on-premise exchange server to the cloud a while ago, and on friday last week, all of our users (1000 or so) could not access their exchange box. we are a TV broadcasting station so you can only imagine the damage when we could not use our mailing system.


initially, we opened a ticket with microsoft and they just kept us on hold for 12 hours (we are in japan so they had to communicate with the US etc., which took time), and then they told us it's our network infra that's wrong when we kept telling them it's not. we asked them to check their environment at least once, which they did not do until 12 hours later.


in the end, it was their exchange server that was the problem, i will copy and paste the whole incident report below


Title: Can't access Exchange

User Impact: Users are unable to access the Exchange Online service.

Current status: We've determined that a recent sync between Exchange Online and Azure Active Directory (AAD) inadvertently resulted in access issues with the Exchange Online service. We've restored the affected environment and updated the Global Location Service (GLS) records, which we believe has resolved the issue. We're awaiting confirmation from your representatives that this issue is resolved.

Scope of impact: Your organization is affected by this event, and this issue impacts all users.

Start time: Friday, October 4, 2019, 4:51 AM
Root cause: A recent Service Intelligence (SI) move inadvertently resulted in access issues with the Exchange Online service.


they won't explain further than what they posted on the incident page, but if anyone here is good with microsoft's cloud environment, can anyone tell me what was the root cause of this? from what i can gather, the AAD and exchange server couldn't sync, but they won't tell us what the actual problem is. what the hell is Service Intelligence and how does it fix our exchange server when they updated the global location service?


any insight on these report would be more than appreciated


thanks!

Posted by u/Rocco_Saint: KB4524148 Kills Print Spooler? Thought it was supposed to fix that issue?

I rolled out this patch this weekend to my test group and it appears that some of the workstations this was applied to are having print spooler issues.

Here's the details for the patch.

I'm in the middle of troubleshooting it now, but wanted to reach out and see if anyone else was having issues.

Posted by u/GrizzlyWhosSteve: Finally Learned Docker

I hadn't found a use case for containers in my environment so I had put off learning Docker for a while. But, I'm writing a rails app to simplify/automate some of our administrative tasks. Setting up my different dev and test environments was definitely non trivial, and I plan on onboarding another person or 2 so they can help me out and add it to their resume.

I installed Docker desktop on my Mac, wrote 2 files essentially copied from Docker's website, built it, then ran it. It took a total of 10 minutes to go from zero Docker to fully configured and running. It's really that easy to start using it. So, now I've decided to set up Kubernetes at work this week and see what I can find to do with it.
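For readers who haven't seen them, the two files mentioned are typically a Dockerfile and a docker-compose.yml. A minimal, hypothetical sketch of the Dockerfile route for a Rails-style app (base image, port and image name are assumptions, not the poster's actual setup):

# Dockerfile
FROM ruby:2.6
WORKDIR /app
COPY Gemfile Gemfile.lock ./
RUN bundle install
COPY . .
EXPOSE 3000
CMD ["bundle", "exec", "rails", "server", "-b", "0.0.0.0"]

$ docker build -t myapp .
$ docker run -p 3000:3000 myapp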

Edit: leaning towards OKD. Has anyone used it/OpenShift that wants to talk me out of it?

Posted by u/Reverent (Off Topic): How to trigger a sysadmin in two words

Vendor Requirements.

Posted by u/stewardson (General Discussion): Monday From Hell

Let me tell you about the Monday from hell I encountered yesterday.

I work for xyz corp which is an MSP for IT services. One of the companies we support, we'll call them abc corp .

I come in to work Monday morning and look at my alerts report from the previous day and find that all of the servers (about 12+) at abc corp are showing offline. Manager asks me to go on site to investigate, it's around the corner so nbd .

I get to the client, head over to the server room and open up the KVM console. I switch inputs and see no issues with the ESX hosts. I then switch over to the (for some reason, physical) vCenter server and find this lovely little message:

HELLO, this full-disk encryption.

E-mail: email@protonmail.com

rsrv: email@scryptmail.com

Now, I've never seen this before and it looks sus but just in case, i reboot the machine - same message . Do a quick google search and found that the server was hit with an MBR level ransomware encryption. I then quickly switched over to the server that manages the backups and found that it's also encrypted - f*ck.

At this point, I call Mr. CEO and account manager to come on site. While waiting for them, I found that the SANs had also been logged in to and had all data deleted and snapshots deleted off the datastores and the EQL volume was also encrypted - PERFECT!

At this point, I'm basically freaking out. ABC Corp is owned by a parent company who apparently also got hit however we don't manage them *phew*.

Our only saving grace at this point is the offsite backups. I log in to the server and wouldn't ya know it, I see this lovely message:

Last replication time: 6/20/2019 13:00:01

BackupsTech had a script that ran to report on replication status daily and the reports were showing that they were up to date. Obviously, they weren't so at this point we're basically f*cked.

We did eventually find out this originated from parentcompany and that the accounts used were from the old IT Manager that recently left a few weeks ago. Unfortunately, they never disabled the accounts in either domain and the account used was a domain admin account.

We're currently going through and attempting to undelete the VMFS data to regain access to the VM files. If anyone has any suggestions on this, feel free to let me know.

TL;DR - ransomware, accounts not disabled, backups deleted, f*cked.

Posted by u/rdns98: They say, No more IT or system or server admins needed very soon...

Sick and tired of listening to these so-called architects and full stack developers who watch a bunch of videos on YouTube and Pluralsight and find articles online. They go around the workplace throwing around words like containers, devops, NoOps, azure, infrastructure as code, serverless, etc., and they don't understand half of the stuff. I do some of the devops tasks in our company; I understand what it takes to implement and manage these technologies. Every meeting is infested with these A holes.

ntengineer 619 points · 6 days ago

Your best defense against these is to come up with non-sarcastic and quality questions to ask these people during the meeting, and watch them not have a clue how to answer them.

For example, a friend of mine worked at a smallish company, some manager really wanted to move more of their stuff into Azure including AD and Exchange environment. But they had common problems with their internet connection due to limited bandwidth and them not wanting to spend more. So during a meeting my friend asked a question something like this:

"You said on this slide that moving the AD environment and Exchange environment to Azure will save us money. Did you take into account that we will need to increase our internet speed by a factor of at least 4 in order to accommodate the increase in traffic going out to the Azure cloud? "

Of course, they hadn't. So the CEO asked my friend if he had the numbers, which he had already done his homework, and it was a significant increase in cost every month and taking into account the cost for Azure and the increase in bandwidth wiped away the manager's savings.

I know this won't work for everyone. Sometimes there is real savings in moving things to the cloud. But often times there really isn't. Calling the uneducated people out on what they see as facts can be rewarding. level 2

PuzzledSwitch 99 points · 6 days ago

my previous boss was that kind of a guy. he waited till other people were done throwing their weight around in a meeting and then calmly and politely dismantled them with facts.

no amount of corporate pressuring or bitching could ever stand up to that. level 3

themastermatt 43 points · 6 days ago

Ive been trying to do this. Problem is that everyone keeps talking all the way to the end of the meeting leaving no room for rational facts. level 4

PuzzledSwitch 33 points · 6 days ago

make a follow-up in email, then.

or, you might have to interject for a moment.


williamfny Jack of All Trades 25 points · 6 days ago

This is my approach. I don't yell or raise my voice, I just wait. Then I start asking questions that they generally cannot answer and slowly take them apart. I don't have to be loud to get my point across. level 4

MaxHedrome 5 points · 6 days ago

Listen to this guy OP

This tactic is called "the box game". Just continuously ask them logical questions that can't be answered with their stupidity. (Box them in), let them be their own argument against themselves.


notechno 34 points · 6 days ago

Not to mention downtime. We have two ISPs in our area. Most of our clients have both in order to keep a fail-over. However, depending on where the client is located, one ISP is fast but goes down every time it rains and the other is solid but slow. Now our only AzureAD customers are those who have so many remote workers that they throw their hands up and deal with the outages as they come. Maybe this works in Europe or South Korea, but this here 'Murica got too many internet holes. level 3

katarh 12 points · 6 days ago

Yup. If you're in a city with fiber it can probably work. If you have even one remote site, and all they have is DSL (or worse, satellite, as a few offices I once supported were literally in the woods when I worked for a timber company) then even Citrix becomes out of the question.


elasticinterests 202 points · 6 days ago

Definitely this, if you know your stuff and can wrap some numbers around it you can regain control of the conversation.

I use my dad as a prime example, he was an electrical engineer for ~40 years, ended up just below board level by the time he retired. He sat in on a product demo once, the kit they were showing off would speed up jointing cable in the road by 30 minutes per joint. My dad asked 3 questions and shut them down:

"how much will it cost to supply all our jointing teams?" £14million

"how many joints do our teams complete each day?" (this they couldn't answer so my dad helped them out) 3

"So are we going to tell the jointers that they get an extra hour and a half hour lunch break or a pay cut?"

Room full of executives that had been getting quite excited at this awesome new investment were suddenly much more interested in showing these guys the door and getting to their next meeting. level 3

Cutriss '); DROP TABLE memes;-- 61 points · 6 days ago

I'm confused a bit by your story. Let's assume they work 8-hour days and so the jointing takes 2.66 hours per operation.

This enhancement will cut that down to 2.16 hours. That's awfully close to enabling a team to increase jointing-per-day from 3 to 4.

That's nearly a 33% increase in productivity. Factoring in overhead it probably is slightly higher.

Is there some reason the workers can't do more than 3 in a day? level 4

slickeddie Sysadmin 87 points · 6 days ago

I think they did 3 as that was the workload so being able to do 4 isn't relevant if there isn't a fourth to do.

That's what I get out of his story anyway.

And also if it was going to be 14 million in costs to equip everyone, the savings have to be there. If adding 1 unit of productivity per day didn't save 14 million in a year or two, it's not really worth it. level 5

Cutriss '); DROP TABLE memes;-- 40 points · 6 days ago

That was basically what I figured was the missing piece - the logistical inability to process 4 units.

As far as the RoI, I had to assume that the number of teams involved and their operational costs had already factored into whether or not 14m was even a price anyone could consider paying. In other words, by virtue of the meeting even happening I figured that the initial costing had not already been laughed out of the room, but perhaps that's a bit too much of an assumption to make. level 6

beer_kimono 14 points · 6 days ago

In my limited experience they slow roll actually pricing anything. Of course physical equipment pricing might be more straightforward, which would explain why his dad got a number instead of a discussion of licensing options. level 5

Lagkiller 7 points · 6 days ago

And also if it was going to be 14 million in costs to equip everyone, the savings have to be there. If adding 1 unit of productivity per day didn't save 14 million in a year or two, it's not really worth it.

That entirely depends. If you have 10 people producing 3 joints a day, with this new kit you could reduce your headcount by 2 and still produce the same, or take on additional workload and increase your production. Not to mention that you don't need to equip everyone with these kits either, you could save them for the projects which needed more daily production thus saving money on the kits and increasing production on an as needed basis.

The story is missing a lot of specifics and while it sounds great, I'm quite certain there was likely a business case to be made.


Standardly 14 points · 6 days ago

He's saying they get 3 done in a day, and the product would save them 30 minutes per joint. That's an hour and a half saved per day, not even enough time to finish a fourth joint, hence the "so do i just give my workers an extra 30 min on lunch"? He just worded it all really weird. level 4

elasticinterests 11 points · 6 days ago

You're ignoring travel time there; there are also factors to do with outside contractors carrying out the digging and reinstatement works, and getting sign-off to actually dig the hole in the road in the first place.

There is also the possibility I'm remembering the time wrong... it's been a while! level 5

Cutriss '); DROP TABLE memes;-- 10 points · 6 days ago

Travel time actually works in my favour. If it takes more time to go from job to job, then the impact of the enhancement is magnified because the total time per job shrinks. level 4

wildcarde815 Jack of All Trades 8 points · 6 days ago

Id bet because nobody wants to pay overtime.


say592 4 points · 6 days ago

That seems like a poor example. Why would you ignore efficiency improvements just to give your workers something to do? Why not find another task for them or figure out a way to consolidate the teams some? We fight this same mentality on our manufacturing floors, the idea that if we automate a process someone will lose their job or we wont have enough work for everyone. Its never the case. However, because of the automation improvements we have done in the last 10 years, we are doing 2.5x the output with only a 10% increase in the total number of laborers.

So maybe in your example for a time they would have nothing for these people to do. Thats a management problem. Have them come back and sweep the floor or wash their trucks. Eventually you will be at a point where there is more work, and that added efficiency will save you from needing to put more crews on the road.


SithLordAJ 75 points · 6 days ago

Calling the uneducated people out on what they see as facts can be rewarding.

I wouldnt call them all uneducated. I think what they are is basically brainwashed. They constantly hear from the sales teams of vendors like Microsoft pitching them the idea of moving everything to Azure.

They do not hear the cons. They do not hear from the folks who know their environment and would know if something is a good fit. At least, they dont hear it enough, and they never see it first hand.

Now, I do think this is their fault... they need to seek out that info more, weigh things critically, and listen to what's going on with their teams more. Isolation from the team is their own doing.

After long enough standing on the edge and only hearing "jump!!", something stupid happens. level 3

AquaeyesTardis 18 points · 6 days ago

Apart from performance, what would be some of the downsides of containers? level 4

ztherion Programmer/Infrastructure/Linux 51 points · 6 days ago

There's little downside to containers by themselves. They're just a method of sandboxing processes and packaging a filesystem as a distributable image. From a performance perspective the impact is near negligible (unless you're doing some truly intensive disk I/O).

What can be problematic is taking a process that was designed to run on exactly n dedicated servers and converting it to a modern 2 to n autoscaling deployment that shares hosting with other apps on a platform like Kubernetes. It's a significant challenge that requires a lot of expertise and maintenance, so there needs to be a clear business advantage to justify hiring at least one additional full time engineer to deal with it.

AirFell85 11 points · 6 days ago

ELI5:

More logistical layers require more engineers to support.


justabofh 33 points · 6 days ago

Containers are great for stateless stuff. So your webservers/application servers can be shoved into containers. Think of containers as being the modern version of statically linked binaries or fat applications. Static binaries have the problem that any security vulnerability requires a full rebuild of the application, and that problem is escalated in containers (where you might not even know that a broken library exists)

If you are using the typical business application, you need one or more storage components for data which needs to be available, possibly changed and access controlled.

Containers are a bad fit for stateful databases, or any stateful component, really.

Containers also enable microservices, which are great ideas at a certain organisation size (if you aren't sure you need microservices, just use a simple monolithic architecture). The problem with microservices is that you replace complexity in your code with complexity in the communications between the various components, and that is harder to see and debug. level 5

Untgradd 6 points · 6 days ago

Containers are fine for stateful services -- you can manage persistence at the storage layer the same way you would have to manage it if you were running the process directly on the host.


malikto44 5 points · 6 days ago

Backing up containers can be a pain, so you don't want to use them for valuable data unless the data is stored elsewhere, like a database or even a file server.

For spinning up stateless applications to take workload behind a load balancer, containers are excellent.

9 more replies

33 more replies level 3

malikto44 3 points · 6 days ago

The problem is that there is an overwhelming din from vendors. Everybody and their brother, sister, mother, uncle, cousin, dog, cat, and gerbil is trying to sell you some pay-by-the-month cloud "solution".

The reason is that the cloud forces people into monthly payments, which is a guaranteed income for companies, but costs a lot more in the long run, and if something happens and one can't make the payments, business is halted, ensuring that bankruptcies hit hard and fast. Even with the mainframe, a company could limp along without support for a few quarters until cash flow recovered.

If we have a serious economic downturn, the fact that businesses will be completely shuttered if they can't afford their AWS bill just means fewer companies can limp along when the economy is bad, which will intensify a downturn.

1 more reply

3 more replies level 2

wildcarde815 Jack of All Trades 12 points · 6 days ago

Also if you can't work without cloud access you better have a second link. level 2

pottertown 10 points · 6 days ago

Our company viewed the move to Azure less as a cost savings measure and more of a move towards agility and "right now" sizing of our infrastructure.

Your point is very accurate; as an example, our location is wholly incapable of moving much to the cloud due to half of us being connected via satellite network and the other half being bent over a barrel by the only ISP in town. level 2

_The_Judge 27 points · 6 days ago

I'm sorry, but I find management around tech these days wholly inadequate. The idea that you can get an MBA and manage shit you have no idea about is absurd, and it just wastes everyone else's time to constantly ELI5 so the manager can do their job effectively. level 2

laserdicks 57 points · 6 days ago

"Calling the uneducated people out on what they see as facts can be rewarding"

Aaand political suicide in a corporate environment. Instead I use the following:

"I love this idea! We've actually been looking into a similar solution however we weren't able to overcome some insurmountable cost sinkholes (remember: nothing is impossible; just expensive). Will this idea require an increase in internet speed to account for the traffic going to the azure cloud?" level 3

lokko12 71 points · 6 days ago

"Will this idea require an increase in internet speed to account for the traffic going to the azure cloud?"

No.

...then people rant on /r/sysadmin about stupid investments and say "but I told them". level 4

HORACE-ENGDAHL Jack of All Trades 61 points · 6 days ago

This exactly. You can't compromise your own argument, in the name of giving your managers a way to save face, to the point that it's easy to shoot down; and if you deliver the facts after that sugar-coating it will look even worse, as it will be interpreted as you setting them up to look like idiots. Being frank, objective and non-blaming is always the best route. level 4

linuxdragons 13 points · 6 days ago

Yeah, this is a terrible example. If I were his manager I would be starting the paperwork trail after that meeting.

6 more replies level 3

messburg 61 points · 6 days ago

I think it's quite an American thing to do it so enthusiastically, to hurt no one, but the result is so condescending. It must be annoying to walk on eggshells to survive a day in the office.


And this is not a rant against soft skills in IT, at all. level 4

vagrantprodigy07 13 points · 6 days ago

It is definitely annoying. level 4

widowhanzo 27 points · 6 days ago
· edited 6 days ago

We work with Americans and they're always so positive, it's kinda annoying. They enthusiastically say "This is very interesting" when in reality it sucks and they know it.

Another less professional example, one of my (non-american) co-workers always wants to go out for coffee (while we have free and better coffee in the office), and the American coworker is always nice like "I'll go with you but I'm not having any" and I just straight up reply "No. I'm not paying for shitty coffee, I'll make a brew in the office". And that's that. Sometimes telling it as it is makes the whole conversation much shorter :D level 5

superkp 42 points · 6 days ago

Maybe the american would appreciate the break and a chance to spend some time with a coworker away from screens, but also doesn't want the shit coffee?

Sounds quite pleasant, honestly. level 6

egamma Sysadmin 39 points · 6 days ago

Yes. "Go out for coffee" is code for "leave the office so we can complain about management/company/customer/Karen". Some conversations shouldn't happen inside the building. level 7

auru21 5 points · 6 days ago

And complain about that jerk who never joins them

1 more reply level 6

Adobe_Flesh 6 points · 6 days ago

Inferior American - please compute this - the sole intention was consumption of coffee. Therefore due to existence of coffee in office, trip to coffee shop is not sane. Resistance to my reasoning is futile. level 7

superkp 1 point · 6 days ago

Coffee is not the sole intention. If that was the case, then there wouldn't have been an invitation.

Another intention would be the social aspect - which americans are known to love, especially favoring it over doing any actual work in the context of a typical 9-5 corporate job.

I've effectively resisted your reasoning, therefore [insert ad hominem insult about inferior logic].

5 more replies level 5

ITaggie Tier II Support/Linux Admin 10 points · 6 days ago

I mean, I've taken breaks just to get away from the screens for a little while. They might just like being around you.

1 more reply

11 more replies level 3

tastyratz 7 points · 6 days ago

There is still value to tact in your delivery, but don't slit your own throat. Remind them they hired you for a reason.

"I appreciate the ideas being raised here by management for cost savings and it certainly merits a discussion point. As a business, we have many challenges and one of them includes controlling costs. Bridging these conversations with subject matter experts and management this early can really help fill out the picture. I'd like to try to provide some of that value to the conversation here" level 3

renegadecanuck 2 points · 6 days ago

If it's political suicide to point out potential downsides, then you need to work somewhere else.

Especially since your response, in my experience, won't get anything done (people will either just say "no, that's not an issue", or they'll find your tone really condescending) and will just piss people off.

I worked with someone like that who would always be super passive-aggressive in how she brought things up, and it pissed me off to no end, because it felt less like bringing up potential issues and more like belittling. level 4

laserdicks 1 point · 4 days ago

Agreed, but I'm too early on in my career to make that jump. level 3

A_A_A_U_U_U 1 point · 6 days ago

Feels good to get to a point in my career where I can call people out whenever the hell I feel like it. I've got recruiters banging down my door; I couldn't swing a stick without hitting a job offer or three.

Of course I'm not suggesting you be combative for no reason, and you should be tactful about it, but if you don't call out fools like that then you're being negligent in your duties. level 4

adisor19 3 points · 6 days ago

This. The current IT market is in our favor. Say it like it is and if they don't like it, GTFO.

14 more replies level 1

DragonDrew Jack of All Trades 777 points · 6 days ago

"I am resolute in my ability to elevate this collaborative, forward-thinking team into the revenue powerhouse that I believe it can be. We will transition into a DevOps team specialising in migrating our existing infrastructure entirely to code and go completely serverless!" - CFO that outsources IT level 2

OpenScore Sysadmin 529 points · 6 days ago

"We will utilize Artificial Intelligence, machine learning, Cloud technologies, python, data science and blockchain to achieve business value" level 3

omfgitzfear 472 points · 6 days ago

We're gonna be AGILE level 4

whetu 113 points · 6 days ago

Synergy. level 5

Erok2112 92 points · 6 days ago
Gold

Weird Al even wrote a song about it!

https://www.youtube.com/watch?v=GyV_UG60dD4 level 6

uptimefordays Netadmin 32 points · 6 days ago

It's so good, I hate it. level 7

Michelanvalo 31 points · 6 days ago

I love Al, I've seen him in concert a number of times, Alapalooza was the first CD I ever opened, I own hundreds of dollars of merchandise.

I cannot stand this song because it drives me insane to hear all this corporate shit in one 4:30 space.

4 more replies

8 more replies level 5

geoff1210 9 points · 6 days ago

I can't attend keynotes without this playing in the back of my head

17 more replies level 4

MadManMorbo 58 points · 6 days ago

We recently implemented DevOps practices, Scrum, and sprints have become the norm... I swear to god we spend 60% of our time planning our sprints, and 40% of the time doing the work, and management wonders why our true productivity has fallen through the floor... level 5

Angdrambor 26 points · 6 days ago

Let me guess - they left out the retrospectives because somebody brought up how bad they were fucking it all up? level 6

ValensEtVolens 1 point · 6 days ago

Those should be fairly short too. But how do you improve if you don't apply lessons learned?

Glad I work for a top IT company. level 5
StormlitRadiance 23 points · 6 days ago

If you spend three whole days out of every five in planning meetings, this is a problem with your meeting planners, not with screm. If these people stay in charge, you'll be stuck in planning hell no matter what framework or buzzwords they try to fling around. level 6

lurker_lurks 15 points · 6 days ago

Scrum is dead, long live Screm! We need to implement it immediately. We must innovate and stay ahead of the curve! level 7

Solaris17 Sysadmin 3 points · 6 days ago

My last company used screm. No 3 day meeting events, 20min of loose direction and we were off to the races. We all came back with parts to different projects. SYNERGY. level 7

JustCallMeFrij 1 point · 6 days ago

First you scream, then you ahh. Now you can screm level 8

lurker_lurks 4 points · 6 days ago

You screm, I screm, we all screm for ice crem. level 7

StormlitRadiance 1 point · 5 days ago

It consists of three managers for every engineer and they all screm all day at a different quartet of three managers and an engineer. level 6

water_mizu 7 points · 6 days ago

Are you saying quantum synergy coupled with block chain neutral intelligence can not be used to expedite artificial intelligence amalgamation into that will metaphor into cucumber obsession?

3 more replies level 5

malikto44 9 points · 6 days ago

I worked at a place where the standup meetings went at least 4-6 hours each day. It was amazing how little got done there. Glad I bailed.

7 more replies level 4

opmrcrab 23 points · 6 days ago

fr agile, FTFY :P level 4

ChristopherBurr 20 points · 6 days ago

Haha, we just fired our Agile scrum masters. Turns out, they couldn't make development any faster or more streamlined.

I was so tired of seeing all the colored post it notes and white boards set up everywhere.

JasonHenley 8 points · 6 days ago

We prefer to call them Scrum Lords here. level 4

Mr-Shank 85 points · 6 days ago

Agile is cancer...

Skrp 66 points · 6 days ago

It doesn't have to be. But oftentimes it is, yes. level 6

Farren246 74 points · 6 days ago

Agile is good. "Agile" is very very bad. level 7

nineteen999 55 points · 6 days ago
· edited 6 days ago

Everyone says this, meaning "the way I do Agile is good, the way everyone else does it sucks. Buy my Agile book! Or my Agile training course! Only $199.99". level 8

fariak 54 points · 6 days ago

There are different ways to do Agile? From the past couple of places I worked at I thought Agile was just standing in a corner for 5 minutes each morning. Do some people sit? level 9

nineteen999 45 points · 6 days ago

Wait until they have you doing "retrospectives" on a Friday afternoon with a bunch of alcohol involved. By Monday morning nobody remembers what the fuck they retrospected about on Friday. level 10

fariak 51 points · 6 days ago

Now that's a scrum Continue this thread

6 more replies level 9

Ryuujinx DevOps Engineer 25 points · 6 days ago

No, that's what it's supposed to look like. A quick 'Is anyone blocked? Does anyone need anything/can anyone chip in with X? Ok get back to it'

What it usually looks like is a round-table 'Tell us what you're working on' that takes at least 30 minutes and, depending on team size, closer to an hour. level 10

become_taintless 13 points · 6 days ago

our weekly 'stand-up' is often 60-90 minutes long, because they treat it like not only a roundtable discussion about what you're working on, but an opportunity to hash out every discussion to death, in front of C-levels.

also, the C-levels are at our 'stand-ups', because of course Continue this thread

3 more replies

8 more replies level 8

togetherwem0m0 12 points · 6 days ago

To me agile is an unfortunate framework to confront and dismantle a lot of hampering, low-value business processes. I call it a "get-er-done" framework. But yes, it's not all roses and sunshine in agile. Still, it's important to destroy processes that make delivering value impossible.

1 more reply level 7

PublicyPolicy 9 points · 6 days ago

Haha. all the places I worked with agile.

We gotta do agile.

But we set how much work gets done and when. Oh you are behind schedule. No problem. No unit test and no testing for you. Can't fall behind.

Then the CIO: guess what, we moved the December deadline up to September. Be agile! It's already been promised. We just have to pivot, fuckers!

11 more replies level 5

Thameus We are Pakleds make it go 8 points · 6 days ago

"That's not real Agile" level 6

pioto 36 points · 6 days ago

No true Scotsman Scrum Master

1 more reply level 5

StormlitRadiance 3 points · 6 days ago

Psychotic middle managers will always have their little spastic word salad, no matter what those words are. level 6

make_havoc 2 points · 6 days ago

Why? Why? Why is it that I can only give you one upvote? You need a thousand for this truth bomb! level 5

sobrique 2 points · 6 days ago

Like all such things - it's a useful technique, that turns into a colossal pile of wank if it's misused. This is true of practically every buzzword laden methodology I've seen introduced in the last 20 years. level 5

Angdrambor 2 points · 6 days ago

For me, the fact that my team is moderately scrummy is a decent treatment for my ADHD. The patterns are right up there with Ritalin in terms of making me less neurologically crippled. level 5

corsicanguppy DevOps Zealot 1 point · 6 days ago

The 'fr' on the front isn't usually pronounced level 4

Thangleby_Slapdiback 3 points · 6 days ago

Christ I hate that word. level 4

NHarvey3DK 2 points · 6 days ago

I think we've moved on to AI level 4

blaze13541 1 point · 6 days ago

I think I'm going to snap if I have one more meeting that discusses seamless migrations and seamless movement across a complex, multi-forest, non-standardized network.

pooley92 1 point · 6 days ago

Try the business bullshit generator https://www.atrixnet.com/bs-generator.html level 4

pooley92 1 point · 6 days ago

Or try the tech bullshit generator https://www.makebullshit.com/
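
This is not the code behind either of those sites -- just a toy Python sketch of the idea: pick one word from each list and you get plausible-sounding strategy. The word lists are arbitrary examples.

    # Toy buzzword generator; word lists are illustrative, not taken from the sites above.
    import random

    ADVERBS = ["synergistically", "proactively", "holistically", "seamlessly"]
    VERBS = ["leverage", "operationalize", "incentivize", "right-size"]
    ADJECTIVES = ["mission-critical", "cloud-native", "best-of-breed", "agile"]
    NOUNS = ["paradigms", "deliverables", "value streams", "ecosystems"]

    def bullshit():
        return " ".join(random.choice(words)
                        for words in (ADVERBS, VERBS, ADJECTIVES, NOUNS))

    print(bullshit())   # e.g. "proactively leverage cloud-native paradigms"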

unixwasright 49 points · 6 days ago

Do we not still need to get the word "paradigm" in there somewhere? level 4

wallybeavis 36 points · 6 days ago

Last time I tried shifting some paradigms, I threw out my back. level 5

jackology 19 points · 6 days ago

Pivot yourself. level 6

EViLTeW 23 points · 6 days ago

If this doesn't work, circle back around and do the needful.

[Oct 06, 2019] Weird Al Yankovic - Mission Statement

Highly recommended!
This song seriously streamlined my workflow.
Oct 06, 2019 | www.youtube.com

FanmaR , 4 years ago

Props to the artist who actually found a way to visualize most of this meaningless corporate lingo. I'm sure it wasn't easy to come up with everything.

Maxwelhse , 3 years ago

He missed "sea change" and "vertical integration". Otherwise, that was pretty much all of the useless corporate meetings I've ever attended distilled down to 4.5 minutes. Oh, and you're getting laid off and/or no raises this year.

VenetianTemper , 4 years ago

From my experiences as an engineer, never trust a company that describes their product with the word "synergy".

Swag Mcfresh , 5 years ago

For those too young to get the joke, this is a style parody of Crosby, Stills & Nash, a folk-pop super-group from the 60's. They were hippies who spoke out against corporate interests, war, and politics. Al took their sound (flawlessly), and wrote a song in corporate jargon (the exact opposite of everything CSN was about). It's really brilliant, to those who get the joke.

112steinway , 4 years ago

Only in corporate speak can you use a whole lot of words while saying nothing at all.

Jonathan Ingersoll , 3 years ago

As a business major this is basically every essay I wrote.

A.J. Collins , 3 years ago

"The company has undergone organization optimization due to our strategy modification, which includes empowering the support to the operation in various global markets" - Red 5 on why they laid off 40 people suddenly. Weird Al would be proud.

meanmanturbo , 3 years ago

So this is basically a Dilbert strip turned into a song. I approve.

zyxwut321 , 4 years ago

In his big long career this has to be one of the best songs Weird Al's ever done. Very ambitious rendering of one of the most ambitious songs in pop music history.

teenygozer , 3 years ago

This should be played before corporate meetings to shame anyone who's about to get up and do the usual corporate presentation. Genius as usual, Mr. Yankovic!

Dunoid , 4 years ago

Maybe I'm too far gone to the world of computer nerds, but "Cloud Computing" seems like it should have been in the song somewhere.

Snoo Lee , 4 years ago

The "paradigm shift" at the end of the video / song is when the corporation screws everybody at the end. Brilliantly done, Al.

A Piece Of Bread , 3 years ago

Don't forget to triangulate the automatonic business monetizer to create exceptional synergy.

GeoffryHawk , 3 years ago

There's a quote that goes something like: "A politician is someone who speaks for hours while saying nothing at all." That's exactly it, and it's brilliant.

Sefie Ezephiel , 4 months ago

From the current GameStop earnings call: "... address the challenges that have impacted our results, and execute both deliberately and with urgency. We believe we will transform the business and shape the strategy for the GameStop of the future. This will be driven by our go-forward leadership team that is now in place, a multi-year transformation effort underway, a commitment to focusing on the core elements of our business that are meaningful to our future, and a disciplined approach to capital allocation." Yeah, Weird Al totally nailed it.

Phil H , 6 months ago

"People who enjoy meetings should not be put in charge of anything." -Thomas Sowell

Laff , 3 years ago

I heard "monetize our asses" for some reason...

Brett Naylor , 4 years ago

Excuse me, but "proactive" and "paradigm"? Aren't these just buzzwords that dumb people use to sound important? Not that I'm accusing you of anything like that. [pause] I'm fired, aren't I? ~ George Meyer

Mark Kahn , 4 years ago

Brilliant social commentary on how the height of 60's optimism was bastardized into corporate enthusiasm. I hope Steve Jobs got to see this.

Mark , 4 years ago

That's the strangest "Draw My Life" I've ever seen.

Δ , 17 hours ago

I watch this at least once a day to take the edge off my job search, whenever I have to decipher fifteen daily want-ads claiming to seek "Hospitality Ambassadors", "Customer Satisfaction Specialists", "Brand Representatives" and "Team Commitment Associates", only to discover eventually that they want someone to run a cash register and sweep up.

Mike The SandbridgeKid , 5 years ago

The irony is a song about Corporate Speak in the style of tie-dyed, hippie-dippy CSN (+/- Y) four-part harmony. Suite: Judy Blue Eyes via Almost Cut My Hair filtered through Carry On. "Fantastic" middle finger to Wall Street, The City, and the monstrous excesses of Unbridled Capitalism.

Geetar Bear , 4 years ago (edited)

This reminds me of George Carlin so much.

Vaugn Ripen , 2 years ago

If you understand who and what he's taking a jab at, this is one of the greatest songs and videos of all time. So spot on. This and Frank's 2000 inch tv are my favorite songs of yours. Thanks Al!

Joolz Godfree , 4 years ago

hahaha, "Client-Centric Solutions...!" (or in my case at the time, 'Customer-Centric' solutions) now THAT's a term i haven't heard/read/seen in years, since last being an office drone. =D

Miles Lacey , 4 years ago

When I interact with this musical visual medium I am motivated to conceptualize how the English language can be better compartmentalized to synergize with the client-centric requirements of the microcosmic community focussed social entities that I administrate on social media while interfacing energetically about the inherent shortcomings of the current socio-economic and geo-political order in which we co-habitate. Now does this tedium flow in an effortless stream of coherent verbalisations capable of comprehension?

Soufriere , 5 years ago

When I bought "Mandatory Fun", put it in my car, and first heard this song, I busted a gut, laughing so hard I nearly crashed. All the corporate buzzwords! (except "pivot", apparently).

[Oct 06, 2019] DevOps created huge opportunities for a new generation of snake oil salesmen

Highly recommended!
Oct 06, 2019 | www.reddit.com

DragonDrew Jack of All Trades 772 points · 4 days ago

"I am resolute in my ability to elevate this collaborative, forward-thinking team into the revenue powerhouse that I believe it can be. We will transition into a DevOps team specialising in migrating our existing infrastructure entirely to code and go completely serverless!" - CFO that outsources IT level 2 OpenScore Sysadmin 527 points · 4 days ago

"We will utilize Artificial Intelligence, machine learning, Cloud technologies, python, data science and blockchain to achieve business value"

[Oct 06, 2019] This talk of going serverless or getting rid of traditional IT admins has gotten very old. In some ways it is true, but in many ways it is greatly exaggerated. There will always be a need for onsite technical support

Oct 06, 2019 | www.reddit.com

remi_in_2016_LUL NOC/SOC Analyst 109 points · 4 days ago

I agree with the sentiment. This talk of going serverless or getting rid of traditional IT admins has gotten very old. In some ways it is true, but in many ways it is greatly exaggerated. There will always be a need for onsite technical support. There are still users today that cannot plug in a mouse or keyboard into a USB port. Not to mention layer 1 issues; good luck getting your cloud provider to run a cable drop for you. Besides, who is going to manage your cloud instances? They don't just operate and manage themselves.

TLDR; most of us aren't going anywhere.

[Oct 05, 2019] Sick and tired of listening to these so-called architects and full stack developers who watch a bunch of videos on YouTube and Pluralsight and find articles online. They go around the workplace throwing words like containers, devops, NoOps, azure, infrastructure as code, serverless, etc., but they don't understand half of the stuff

DevOps created a new generation of bullshitters
Oct 05, 2019 | www.reddit.com

They say: "No more IT or system or server admins needed very soon..."

Sick and tired of listening to these so-called architects and full stack developers who watch a bunch of videos on YouTube and Pluralsight and find articles online. They go around the workplace throwing words like containers, devops, NoOps, azure, infrastructure as code, serverless, etc., but they don't understand half of the stuff. I do some of the devops tasks in our company, so I understand what it takes to implement and manage these technologies. Every meeting is infested with these A-holes.

ntengineer 613 points · 4 days ago

Your best defense against these is to come up with non-sarcastic and quality questions to ask these people during the meeting, and watch them not have a clue how to answer them.

For example, a friend of mine worked at a smallish company where some manager really wanted to move more of their stuff into Azure, including the AD and Exchange environments. But they had constant problems with their internet connection due to limited bandwidth and an unwillingness to spend more. So during a meeting my friend asked a question something like this:

"You said on this slide that moving the AD environment and Exchange environment to Azure will save us money. Did you take into account that we will need to increase our internet speed by a factor of at least 4 in order to accommodate the increase in traffic going out to the Azure cloud? "

Of course, they hadn't. So the CEO asked my friend if he had the numbers. He had already done his homework: the bandwidth upgrade was a significant extra cost every month, and once you added the cost of Azure itself it wiped away the manager's projected savings.

I know this won't work for everyone. Sometimes there are real savings in moving things to the cloud, but often there really aren't. Calling the uneducated people out on what they see as facts can be rewarding.
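
As a rough illustration of the kind of back-of-the-envelope check described above, here is a minimal Python sketch. Every figure in it is a hypothetical assumption rather than a number from the post; the point is only the shape of the comparison (claimed savings versus the extra bandwidth and cloud subscription costs).

# Hypothetical sanity check: does the claimed cloud saving survive the
# bandwidth upgrade it requires? All numbers below are made-up assumptions.

claimed_monthly_savings = 1500.0  # manager's slide: $/month saved by moving AD/Exchange to Azure
current_bandwidth_cost = 400.0    # $/month for the existing internet link
bandwidth_multiplier = 4          # link must grow roughly 4x to carry the extra outbound traffic
azure_monthly_cost = 900.0        # estimated Azure subscription and egress cost, $/month

new_bandwidth_cost = current_bandwidth_cost * bandwidth_multiplier
extra_monthly_cost = (new_bandwidth_cost - current_bandwidth_cost) + azure_monthly_cost
net_monthly_saving = claimed_monthly_savings - extra_monthly_cost

print(f"Extra monthly cost:   ${extra_monthly_cost:,.2f}")
print(f"Net monthly 'saving': ${net_monthly_saving:,.2f}")
# With these assumed figures the net 'saving' is negative, i.e. the migration
# costs more than it saves -- which is exactly the friend's point.

Swap in your own link pricing and cloud estimates; the exercise matters more than the particular numbers.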

PuzzledSwitch 101 points · 4 days ago

My previous boss was that kind of a guy. He waited till other people were done throwing their weight around in a meeting and then calmly and politely dismantled them with facts.

No amount of corporate pressuring or bitching could ever stand up to that.

themastermatt 42 points · 4 days ago

I've been trying to do this. The problem is that everyone keeps talking all the way to the end of the meeting, leaving no room for rational facts.

PuzzledSwitch 35 points · 4 days ago

make a follow-up in email, then.

or, you might have to interject for a moment.

williamfny Jack of All Trades 26 points · 4 days ago

This is my approach. I don't yell or raise my voice, I just wait. Then I start asking questions that they generally cannot answer and slowly take them apart. I don't have to be loud to get my point across.

MaxHedrome 6 points · 4 days ago

Listen to this guy OP

This tactic is called "the box game": just continuously ask them logical questions that they can't answer with their stupidity (box them in), and let them be their own argument against themselves.

CrazyTachikoma 4 days ago

Most DevOps people I've met are devs trying to bypass the sysadmins. This, and the Cloud fad, are burning serious amounts of money at companies managed by stupid people who are easily impressed by PR stunts and shiny conferences. Then when everything goes to shit, they call the infrastructure team to fix it...

[Oct 05, 2019] Summary of Eric Hoffer's The True Believer (Reason and Meaning)

Oct 05, 2019 | reasonandmeaning.com

Summary of Eric Hoffer's The True Believer | September 4, 2017 | Book Reviews - Politics, Politics - Tyranny | John Messerly

Eric Hoffer in 1967, in the Oval Office, visiting President Lyndon Baines Johnson

" Hatred is the most accessible and comprehensive of all the unifying agents Mass movements can rise and spread without belief in a god, but never without a belief in a devil. " ~ Eric Hoffer, The True Believer: Thoughts on the Nature of Mass Movements

(This article was reprinted in the online magazine of the Institute for Ethics & Emerging Technologies, October 19, 2017.)

Eric Hoffer (1898 – 1983) was an American moral and social philosopher who worked for more than twenty years as a longshoreman in San Francisco. The author of ten books, he was awarded the Presidential Medal of Freedom in 1983. His first book, The True Believer: Thoughts on the Nature of Mass Movements (1951), is a work of social psychology which discusses the psychological causes of fanaticism. It is widely considered a classic.

Overview

The first lines of Hoffer's book clearly state its purpose:

This book deals with some peculiarities common to all mass movements, be they religious movements, social revolutions or nationalist movements. It does not maintain that all movements are identical, but that they share certain essential characteristics which give them a family likeness.

All mass movements generate in their adherents a readiness to die and a proclivity for united action; all of them, irrespective of the doctrine they preach and the program they project, breed fanaticism, enthusiasm, fervent hope, hatred and intolerance; all of them are capable of releasing a powerful flow of activity in certain departments of life; all of them demand blind faith and single-hearted allegiance...

The assumption that mass movements have many traits in common does not imply that all movements are equally beneficent or poisonous. The book passes no judgments, and expresses no preferences. It merely tries to explain... (pp. xi-xiii)

Part 1 – The Appeal of Mass Movements

Hoffer says that mass movements begin when discontented, frustrated, powerless people lose faith in existing institutions and demand change. Feeling hopeless, such people participate in movements that allow them to become part of a larger collective. They become true believers in a mass movement that "appeals not to those intent on bolstering and advancing a cherished self, but to those who crave to be rid of an unwanted self because it can satisfy the passion for self-renunciation." (p. 12)

Put another way, Hoffer says: "Faith in a holy cause is to a considerable extent a substitute for the loss of faith in ourselves." (p. 14) Leaders inspire these movements, but the seeds of mass movements must already exist for the leaders to be successful. And while mass movements typically blend nationalist, political and religious ideas, they all compete for angry and/or marginalized people.

Part 2 – The Potential Converts

The destitute are not usually converts to mass movements; they are too busy trying to survive to become engaged. But what Hoffer calls the "new poor," those who previously had wealth or status but who believe they have now lost it, are potential converts. Such people are resentful and blame others for their problems.

Mass movements also attract the partially assimilated -- those who feel alienated from mainstream culture. Others include misfits, outcasts, adolescents, and sinners, as well as the ambitious, selfish, impotent and bored. Wha